aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ipv6: Stop /128 route from disappearing after pmtu updateMartin KaFai Lau2015-05-011-2/+2
| | | | | | | | | | | | | | | | | This patch is mostly from Steffen Klassert <steffen.klassert@secunet.com>. I only removed the (rt6->rt6i_dst.plen == 128) check from ip6_rt_update_pmtu() because the (rt6->rt6i_flags & RTF_CACHE) test has already implied it. This patch: 1. Create RTF_CACHE route for /128 non local route 2. After (1), all routes that allow pmtu update should have a RTF_CACHE clone. Hence, stop updating MTU for any non RTF_CACHE route. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Extend the route lookups to low priority metrics.Steffen Klassert2015-05-011-5/+23
| | | | | | | | | | | | | | | | We search only for routes with highest priority metric in find_rr_leaf(). However if one of these routes is marked as invalid, we may fail to find a route even if there is a appropriate route with lower priority. Then we loose connectivity until the garbage collector deletes the invalid route. This typically happens if a host route expires afer a pmtu event. Fix this by searching also for routes with a lower priority metric. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Consider RTF_CACHE when searching the fib6 treeMartin KaFai Lau2015-05-012-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a prep work for the later bug-fix patch which will stop /128 route from disappearing after pmtu update. The later bug-fix patch will allow a /128 route and its RTF_CACHE clone both exist at the same fib6_node. To do this, we need to prepare the existing fib6 tree search to expect RTF_CACHE for /128 route. Note that the fn->leaf is sorted by rt6i_metric. Hence, RTF_CACHE (if there is any) is always at the front. This property leads to the following: 1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which the caller is used to ask for deleting clone or non-clone. The rtm_to_fib6_config() should also check the RTM_F_CLONED and then set RTF_CACHE accordingly so that: - 'ip -6 r del...' will make ip6_route_del() to delete a route and all its clones. Note that its clones is flushed by fib6_del() - 'ip -6 r flush table cache' will make ip6_route_del() to only delete clone(s). 2. Exclude RTF_CACHE from addrconf_get_prefix_route() which should not configure on a cloned route. 3. No change is need for rt6_device_match() since it currently could return a RTF_CACHE clone route, so the later bug-fix patch will not affect it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: speedup ip_idents_reserve()Eric Dumazet2015-05-011-9/+11
| | | | | | | | | | | | Under stress, ip_idents_reserve() is accessing a contended cache line twice, with non optimal MESI transactions. If we place timestamps in separate location, we reduce this pressure by ~50% and allow atomic_add_return() to issue a Request for Ownership. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers: net: xgene: fix kbuild warningsIyappan Subramanian2015-04-302-1/+3
| | | | | | | | | Fixed the following kbuild warnings: 1. unused variable 'of_id' 2. buffer overflow 'ring_cfg' 5 <= 5 Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* test_bpf: indicate whether bpf prog got jited in test suiteDaniel Borkmann2015-04-301-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I think this is useful to verify whether a filter could be JITed or not in case of bpf_prog_enable >= 1, which otherwise the test suite doesn't tell besides taking a good peek at the performance numbers. Nicolas Schichan reported a bug in the ARM JIT compiler that rejected and waved the filter to the interpreter although it shouldn't have. Nevertheless, the test passes as expected, but such information is not visible. It's i.e. useful for the remaining classic JITs, but also for implementing remaining opcodes that are not yet present in eBPF JITs (e.g. ARM64 waves some of them to the interpreter). This minor patch allows to grep through dmesg to find those accordingly, but also provides a total summary, i.e.: [<X>/53 JIT'ed] # echo 1 > /proc/sys/net/core/bpf_jit_enable # insmod lib/test_bpf.ko # dmesg | grep "jited:0" dmesg example on the ARM issue with JIT rejection: [...] [ 67.925387] test_bpf: #2 ADD_SUB_MUL_K jited:1 24 PASS [ 67.930889] test_bpf: #3 DIV_MOD_KX jited:0 794 PASS [ 67.943940] test_bpf: #4 AND_OR_LSH_K jited:1 20 20 PASS [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* be2net: log link statusIvan Vecera2015-04-301-0/+2
| | | | | | | | | | | | | | The driver unlike other drivers does not log link state changes. It's better for an user when asynchronous link states are logged to the system log. v3: Changes from v2 discarded as "not necessary" Cc: Sathya Perla <sathya.perla@emulex.com> Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com> Cc: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ibmveth: Add support for Large Receive OffloadThomas Falcon2015-04-302-1/+17
| | | | | | | | | | Enables receiving large packets from other LPARs. These packets have a -1 IP header checksum, so we must recalculate to have a valid checksum. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ibmveth: Add GRO supportThomas Falcon2015-04-301-1/+1
| | | | | | Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ibmveth: Add support for TSOThomas Falcon2015-04-302-1/+19
| | | | | | | | | | | Add support for TSO. TSO is turned off by default and must be enabled and configured by the user. The driver version number is increased so that users can be sure that they are using ibmveth with TSO support. Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ibmveth: change rx buffer default allocation for CMOThomas Falcon2015-04-302-1/+5
| | | | | | | | | | This patch enables 64k rx buffer pools by default. If Cooperative Memory Overcommitment (CMO) is enabled, the number of 64k buffers is reduced to save memory. Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'xgene-next'David S. Miller2015-04-3010-81/+530
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Iyappan Subramanian says: ==================== drivers: net: xgene: Add ethernet with ring manager v2 support Adding XFI based 10GbE and SGMII based 1GbE with ring manager v2 support for APM X-Gene ethernet driver. The ring manager v2 is used by 2nd generation SoC. v1: * Initial version ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * drivers: net: xgene: Add SGMII based 1GbE support with ring manager v2Iyappan Subramanian2015-04-305-29/+68
| | | | | | | | | | Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * drivers: net: xgene: Add 10GbE support with ring manager v2Iyappan Subramanian2015-04-305-25/+152
| | | | | | | | | | Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * drivers: net: xgene: Add ring manager v2 functionsIyappan Subramanian2015-04-305-1/+253
| | | | | | | | | | | | | | Adding ring manager v2 support for APM X-Gene ethernet driver. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * drivers: net: xgene: Change ring manager to use function pointersIyappan Subramanian2015-04-304-30/+61
|/ | | | | | | | | | | This is a preparatory patch for adding ethernet support for APM X-Gene ethernet driver to work with ring manager v2. Added xgene_ring_ops structure for storing chip specific ring manager properties and functions. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp_westwood: fix tcp_westwood_info()Eric Dumazet2015-04-301-7/+8
| | | | | | | | | | | I forgot to update tcp_westwood when changing get_info() behavior, this patch should fix this. Fixes: 64f40ff5bbdb ("tcp: prepare CC get_info() access from getsockopt()") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: update reordering first before detecting lossYuchung Cheng2015-04-291-4/+2
| | | | | | | | | | | tcp_mark_lost_retrans is not used when FACK is disabled. Since tcp_update_reordering may disable FACK, it should be called first before tcp_mark_lost_retrans. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add TCP_CC_INFO socket optionEric Dumazet2015-04-292-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | Some Congestion Control modules can provide per flow information, but current way to get this information is to use netlink. Like TCP_INFO, let's add TCP_CC_INFO so that applications can issue a getsockopt() if they have a socket file descriptor, instead of playing complex netlink games. Sample usage would be : union tcp_cc_info info; socklen_t len = sizeof(info); if (getsockopt(fd, SOL_TCP, TCP_CC_INFO, &info, &len) == -1) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: prepare CC get_info() access from getsockopt()Eric Dumazet2015-04-297-34/+46
| | | | | | | | | | | | | | | | | We would like that optional info provided by Congestion Control modules using netlink can also be read using getsockopt() This patch changes get_info() to put this information in a buffer, instead of skb, like tcp_get_info(), so that following patch can reuse this common infrastructure. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add tcpi_bytes_received to tcp_infoEric Dumazet2015-04-295-4/+20
| | | | | | | | | | | | | | | | | | | | | | | This patch tracks total number of payload bytes received on a TCP socket. This is the sum of all changes done to tp->rcv_nxt RFC4898 named this : tcpEStatsAppHCThruOctetsReceived This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_received was placed near tp->rcv_nxt for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Eric Salo <salo@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add tcpi_bytes_acked to tcp_infoEric Dumazet2015-04-295-4/+22
| | | | | | | | | | | | | | | | | | | | | | | This patch tracks total number of bytes acked for a TCP socket. This is the sum of all changes done to tp->snd_una, and allows for precise tracking of delivered data. RFC4898 named this : tcpEStatsAppHCThruOctetsAcked This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_acked was placed near tp->snd_una for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Eric Salo <salo@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2015-04-2753-317/+527
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) mlx4 doesn't check fully for supported valid RSS hash function, fix from Amir Vadai 2) Off by one in ibmveth_change_mtu(), from David Gibson 3) Prevent altera chip from reporting false error interrupts in some circumstances, from Chee Nouk Phoon 4) Get rid of that stupid endless loop trying to allocate a FIN packet in TCP, and in the process kill deadlocks. From Eric Dumazet 5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from Eric Dumazet 6) Fix two bugs in async rhashtable resizing, from Thomas Graf 7) Fix topology server listener socket namespace bug in TIPC, from Ying Xue 8) Add some missing HAS_DMA kconfig dependencies, from Geert Uytterhoeven 9) bgmac driver intends to force re-polling but does so by returning the wrong value from it's ->poll() handler. Fix from Rafał Miłecki 10) When the creater of an rhashtable configures a max size for it, don't bark in the logs and drop insertions when that is exceeded. Fix from Johannes Berg 11) Recover from out of order packets in ppp mppe properly, from Sylvain Rochet * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) bnx2x: really disable TPA if 'disable_tpa' option is set net:treewide: Fix typo in drivers/net net/mlx4_en: Prevent setting invalid RSS hash function mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions netfilter; Add some missing default cases to switch statements in nft_reject. ppp: mppe: discard late packet in stateless mode ppp: mppe: sanity error path rework net/bonding: Make DRV macros private net: rfs: fix crash in get_rps_cpus() altera tse: add support for fixed-links. pxa168: fix double deallocation of managed resources net: fix crash in build_skb() net: eth: altera: Resolve false errors from MSGDMA to TSE ehea: Fix memory hook reference counting crashes net/tg3: Release IRQs on permanent error net: mdio-gpio: support access that may sleep inet: fix possible panic in reqsk_queue_unlink() rhashtable: don't attempt to grow when at max_size bgmac: fix requests for extra polling calls from NAPI tcp: avoid looping in tcp_send_fin() ...
| * bnx2x: really disable TPA if 'disable_tpa' option is setMichal Schmidt2015-04-271-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bnx2x's 'disable_tpa=1' module option is not respected properly and TPA (transparent packet aggregation) remains enabled. Even though the module option causes LRO to be disabled, TPA is enabled in GRO mode. Additionally, disabling GRO via ethtool then has no effect. One can still observe tpa_* statistics increase and large packets being received in tcpdump. The bug was an unintended consequence of commit aebf6244cd39 "bnx2x: Be more forgiving toward SW GRO". Fix it by following the bp->disable_tpa flag when initializing fp's. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net:treewide: Fix typo in drivers/netMasanari Iida2015-04-272-2/+2
| | | | | | | | | | | | | | This patch fix spelling typo in printk. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Prevent setting invalid RSS hash functionAmir Vadai2015-04-271-13/+16
| | | | | | | | | | | | | | | | | | | | | | | | mlx4_en_check_rxfh_func() was checking for hardware support before setting a known RSS hash function, but didn't do any check before setting unknown RSS hash function. Need to make it fail on such values. In this occasion, moved the actual setting of the new value from the check function into mlx4_en_set_rxfh(). Fixes: 947cbb0 ("net/mlx4_en: Support for configurable RSS hash function") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functionsRojhalat Ibrahim2015-04-271-43/+17
| | | | | | | | | | | | | | | | | | | | | | Use the new gpiod_get_array and gpiod_put_array functions (added to mainline in the v4.1 merge window) for obtaining and disposing of GPIO descriptors. Cc: David Miller <davem@davemloft.net> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rojhalat Ibrahim <imr@rtschenk.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter; Add some missing default cases to switch statements in nft_reject.David S. Miller2015-04-272-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes: ==================== net/netfilter/nft_reject.c: In function ‘nft_reject_dump’: net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch] switch (priv->type) { ^ net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_ICMPX_UNREACH’ not handled in switch [-Wswi\ tch] net/netfilter/nft_reject_inet.c: In function ‘nft_reject_inet_dump’: net/netfilter/nft_reject_inet.c:105:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswi\ tch] switch (priv->type) { ^ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'ppp_mppe_desync'David S. Miller2015-04-261-16/+20
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sylvain Rochet says: ==================== ppp: mppe: fixes MPPE desync on links which don't guarantee packet ordering I am currently having an issue with PPP over L2TP (UDP) and MPPE in stateless mode (default mode), UDP does not guarantee packet ordering so we might get out of order packet. MPPE needs to be continuously synched so we should drop late UDP packet. I added a printk on the number of time we rekeyed in MPPE decompressor, this is what we currently have if we receive a slightly out of order UDP packet: [1731001.049206] mppe_decompress[1]: ccount 1559 [1731001.049216] mppe_decompress[1]: rekeyed 1 times [1731001.049228] mppe_decompress[1]: ccount 1560 [1731001.049232] mppe_decompress[1]: rekeyed 1 times [1731001.050170] mppe_decompress[1]: ccount 1562 [1731001.050182] mppe_decompress[1]: rekeyed 2 times [1731001.050191] mppe_decompress[1]: ccount 1561 [1731001.062576] mppe_decompress[1]: rekeyed 4095 times ^^^^ This is obviously wrong, we missed packet 1561 and we already rekeyed 2 times for 1562 we previously received, we can't recover the decryption key we need for 1561, we should drop it instead of rekeying 4095 times. This patch series drop any packet with are not within the 4096/2 forward window. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ppp: mppe: discard late packet in stateless modeSylvain Rochet2015-04-261-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When PPP is used over a link which does not guarantee packet ordering, we might get late MPPE packets. This is a problem because MPPE must be kept synchronized and the current implementation does not drop them and rekey 4095 times instead of 0, which is wrong. In order to prevent rekeying about a whole count space times (~ 4095 times), drop packets which are not within the forward 4096/2 window and increase sanity error counter. Signed-off-by: Sylvain Rochet <sylvain.rochet@finsecur.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ppp: mppe: sanity error path reworkSylvain Rochet2015-04-261-16/+13
| |/ | | | | | | | | | | | | | | We are going to need sanity error path a little further, rework to be able to use the sanity error path anywhere in decompressor. Signed-off-by: Sylvain Rochet <sylvain.rochet@finsecur.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/bonding: Make DRV macros privateMatan Barak2015-04-264-7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bonding modules currently defines four macros with general names that pollute the global namespace: DRV_VERSION DRV_RELDATE DRV_NAME DRV_DESCRIPTION Fixing that by defining a private bonding_priv.h header files which includes those defines. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: rfs: fix crash in get_rps_cpus()Eric Dumazet2015-04-262-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 567e4b79731c ("net: rfs: add hash collision detection") had one mistake : RPS_NO_CPU is no longer the marker for invalid cpu in set_rps_cpu() and get_rps_cpu(), as @next_cpu was the result of an AND with rps_cpu_mask This bug showed up on a host with 72 cpus : next_cpu was 0x7f, and the code was trying to access percpu data of an non existent cpu. In a follow up patch, we might get rid of compares against nr_cpu_ids, if we init the tables with 0. This is silly to test for a very unlikely condition that exists only shortly after table initialization, as we got rid of rps_reset_sock_flow() and similar functions that were writing this RPS_NO_CPU magic value at flow dismantle : When table is old enough, it never contains this value anymore. Fixes: 567e4b79731c ("net: rfs: add hash collision detection") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * altera tse: add support for fixed-links.Andreas Oetken2015-04-261-8/+29
| | | | | | | | | | | | | | | | | | Add support for fixed-links in configurations without PHY. (e.g. connection to a switch, SGMII point to point, SFPs) Check: Documentation/devicetree/bindings/net/fixed-link.txt. Signed-off-by: Andreas Oetken <ennoerlangen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * pxa168: fix double deallocation of managed resourcesAlexey Khoroshilov2015-04-261-11/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 43d3ddf87a57 ("net: pxa168_eth: add device tree support") starts to use managed resources by adding devm_clk_get() and devm_ioremap_resource(), but it leaves explicit iounmap() and clock_put() in pxa168_eth_remove() and in failure handling code of pxa168_eth_probe(). As a result double free can happen. The patch removes explicit resource deallocation. Also it converts clk_disable() to clk_disable_unprepare() to make it symmetrical with clk_prepare_enable(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: fix crash in build_skb()Eric Dumazet2015-04-253-13/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I added pfmemalloc support in build_skb(), I forgot netlink was using build_skb() with a vmalloc() area. In this patch I introduce __build_skb() for netlink use, and build_skb() is a wrapper handling both skb->head_frag and skb->pfmemalloc This means netlink no longer has to hack skb->head_frag [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26! [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1567.700067] Dumping ftrace buffer: [ 1567.700067] (ftrace buffer empty) [ 1567.700067] Modules linked in: [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167 [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000 [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3)) [ 1567.700067] RSP: 0018:ffff8802467779d8 EFLAGS: 00010202 [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049 [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000 [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000 [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000 [ 1567.700067] FS: 00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000 [ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0 [ 1567.700067] Stack: [ 1567.700067] ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000 [ 1567.700067] ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08 [ 1567.700067] ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821 [ 1567.700067] Call Trace: [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316) [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329) [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623) [ 1567.774369] sock_write_iter (net/socket.c:823) [ 1567.774369] ? sock_sendmsg (net/socket.c:806) [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491) [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249) [ 1567.774369] ? default_llseek (fs/read_write.c:487) [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701) [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4)) [ 1567.774369] vfs_write (fs/read_write.c:539) [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577) [ 1567.774369] ? SyS_read (fs/read_write.c:577) [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636) [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42) [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261) Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: eth: altera: Resolve false errors from MSGDMA to TSEChee Nouk Phoon2015-04-251-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch resolves false errors from MSGDMA in TX mSGDMA MM to ST mode, and is a continuation of the patch recently submitted by Andrea Oetken. The MSGDMA had a logic bug that masked detection of this issue prior to Quartus 14.1/Build 164. When the MSGDMA logic bug was addressed in Quartus 14.1/Build 164, the driver problem was exposed. The problem is corrected by making sure MSGDMA_DESC_CTL_TR_ERR_IRQ is not set for any of the transmit DMA descriptors, and only used for receive descriptors. Fixes: 71cd26e altera tse: Error-Bit on tx-avalon-stream always set. Signed-off-by: Chee Nouk Phoon <cnphoon@altera.com> Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>a Cc: Andreas Oetken <ennoerlangen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ehea: Fix memory hook reference counting crashesMichael Ellerman2015-04-251-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent commit to only register the EHEA memory hotplug hooks on adapter probe has a few problems. Firstly the reference counting is wrong for multiple adapters, in that the hooks are registered multiple times. Secondly the check in the tear down path is backward. Finally the error path doesn't decrement the count. The multiple registration of the hooks is the biggest problem, as it leads to oopses when the system is rebooted, and/or errors during memory hotplug, eg: $ ./mem-on-off-test.sh -r 2 ... ehea: memory is going offline ehea: LPAR memory changed - re-initializing driver ehea: re-initializing driver complete ehea: memory is going offline ehea: LPAR memory changed - re-initializing driver ehea: opcode=26c ret=fffffffffffffffc arg1=8000000003000003 arg2=0 arg3=700000060000d600 arg4=3fded0000 arg5=200 arg6=0 arg7=0 ehea: register_rpage_mr failed ehea: registering mr failed ehea: register MR failed - driver inoperable! ehea: memory is going offline Fixes: aa183323312d ("ehea: Register memory hotplug, reboot and crash hooks on adapter probe") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/tg3: Release IRQs on permanent errorGavin Shan2015-04-251-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When having permanent EEH error, the PCI device will be removed from the system. For this case, we shouldn't set pcierr_recovery to true wrongly, which blocks the driver to release the allocated interrupts and their handlers. Eventually, we can't disable MSI or MSIx successfully because of the MSI or MSIx interrupts still have associated interrupt actions, which is turned into following stack dump. Oops: Exception in kernel mode, sig: 5 [#1] : [c0000000003b76a8] .free_msi_irqs+0x80/0x1a0 (unreliable) [c00000000039f388] .pci_remove_bus_device+0x98/0x110 [c0000000000790f4] .pcibios_remove_pci_devices+0x9c/0x128 [c000000000077b98] .handle_eeh_events+0x2d8/0x4b0 [c0000000000782d0] .eeh_event_handler+0x130/0x1c0 [c000000000022bd4] .kernel_thread+0x54/0x70 Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: mdio-gpio: support access that may sleepVivien Didelot2015-04-241-5/+9
| | | | | | | | | | | | | | | | | | | | | | Some systems using mdio-gpio may use gpio on message based busses, which require sleeping (e.g. gpio from an I2C I/O expander). Since this driver does not use IRQ handler, it is safe to use the _cansleep suffixed gpio accessors. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * inet: fix possible panic in reqsk_queue_unlink()Eric Dumazet2015-04-249-46/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243 There is a race when reqsk_timer_handler() and tcp_check_req() call inet_csk_reqsk_queue_unlink() on the same req at the same time. Before commit fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer"), listener spinlock was held and race could not happen. To solve this bug, we change reqsk_queue_unlink() to not assume req must be found, and we return a status, to conditionally release a refcount on the request sock. This also means tcp_check_req() in non fastopen case might or not consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have to properly handle this. (Same remark for dccp_check_req() and its callers) inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is called 4 times in tcp and 3 times in dccp. Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: don't attempt to grow when at max_sizeJohannes Berg2015-04-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion of mac80211's station table to rhashtable had a bug that I found by accident in code review, that hadn't been found as rhashtable apparently managed to have a maximum hash chain length of one (!) in all our testing. In order to test the bug and verify the fix I set my rhashtable's max_size very low (4) in order to force getting hash collisions. At that point, rhashtable WARNed in rhashtable_insert_rehash() but didn't actually reject the hash table insertion. This caused it to lose insertions - my master list of stations would have 9 entries, but the rhashtable only had 5. This may warrant a deeper look, but that WARN_ON() just shouldn't happen. Fix this by not returning true from rht_grow_above_100() when the rhashtable's max_size has been reached - in this case the user is explicitly configuring it to be at most that big, so even if it's now above 100% it shouldn't attempt to resize. This fixes the "lost insertion" issue and consequently allows my code to display its error (and verify my fix for it.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bgmac: fix requests for extra polling calls from NAPIRafał Miłecki2015-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | After d75b1ade567f ("net: less interrupt masking in NAPI") polling function has to return whole budget when it wants NAPI to call it again. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Cc: Felix Fietkau <nbd@openwrt.org> Fixes: eb64e2923a886 ("bgmac: leave interrupts disabled as long as there is work to do") Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: avoid looping in tcp_send_fin()Eric Dumazet2015-04-241-21/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Presence of an unbound loop in tcp_send_fin() had always been hard to explain when analyzing crash dumps involving gigantic dying processes with millions of sockets. Lets try a different strategy : In case of memory pressure, try to add the FIN flag to last packet in write queue, even if packet was already sent. TCP stack will be able to deliver this FIN after a timeout event. Note that this FIN being delivered by a retransmit, it also carries a Push flag given our current implementation. By checking sk_under_memory_pressure(), we anticipate that cooking many FIN packets might deplete tcp memory. In the case we could not allocate a packet, even with __GFP_WAIT allocation, then not sending a FIN seems quite reasonable if it allows to get rid of this socket, free memory, and not block the process from eventually doing other useful work. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethernet: myri10ge: use arch_phys_wc_add()Luis R. Rodriguez2015-04-231-29/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This driver already uses ioremap_wc() on the same range so when write-combining is available that will be used instead. Cc: Hyong-Youb Kim <hykim@myri.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: netdev@vger.kernel.org Cc: Juergen Gross <jgross@suse.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Airlie <airlied@redhat.com> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * can: CAN_GRCAN should depend on HAS_DMAGeert Uytterhoeven2015-04-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | If NO_DMA=y: drivers/built-in.o: In function `grcan_free_dma_buffers': grcan.c:(.text+0x2d7716): undefined reference to `dma_free_coherent' drivers/built-in.o: In function `grcan_allocate_dma_buffers': grcan.c:(.text+0x2d779c): undefined reference to `dma_alloc_coherent' Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethernet: arc: ARC_EMAC and EMAC_ROCKCHIP should depend on HAS_DMAGeert Uytterhoeven2015-04-231-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If NO_DMA=y: drivers/built-in.o: In function `arc_emac_tx_clean': emac_main.c:(.text+0x2decde): undefined reference to `dma_unmap_single' drivers/built-in.o: In function `arc_emac_rx': emac_main.c:(.text+0x2dee1c): undefined reference to `dma_unmap_single' emac_main.c:(.text+0x2dee72): undefined reference to `dma_map_single' emac_main.c:(.text+0x2dee7e): undefined reference to `dma_mapping_error' drivers/built-in.o: In function `arc_emac_probe': (.text+0x2df2ee): undefined reference to `dmam_alloc_coherent' drivers/built-in.o: In function `arc_emac_open': emac_main.c:(.text+0x2df6d8): undefined reference to `dma_map_single' emac_main.c:(.text+0x2df6e4): undefined reference to `dma_mapping_error' drivers/built-in.o: In function `arc_emac_tx': emac_main.c:(.text+0x2df9e4): undefined reference to `dma_map_single' emac_main.c:(.text+0x2df9f0): undefined reference to `dma_mapping_error' Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethernet: amd: AMD_XGBE should depend on HAS_DMAGeert Uytterhoeven2015-04-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If NO_DMA=y: drivers/built-in.o: In function `xgbe_probe': xgbe-main.c:(.text+0x2def0a): undefined reference to `dma_set_mask' xgbe-main.c:(.text+0x2def20): undefined reference to `dma_supported' drivers/built-in.o: In function `xgbe_rx_poll': xgbe-drv.c:(.text+0x2e0320): undefined reference to `dma_sync_single_for_cpu' xgbe-drv.c:(.text+0x2e035e): undefined reference to `dma_sync_single_for_cpu' drivers/built-in.o: In function `xgbe_unmap_rdata': xgbe-desc.c:(.text+0x2e5fe4): undefined reference to `dma_unmap_page' xgbe-desc.c:(.text+0x2e5ffa): undefined reference to `dma_unmap_single' xgbe-desc.c:(.text+0x2e604a): undefined reference to `dma_unmap_page' xgbe-desc.c:(.text+0x2e6084): undefined reference to `dma_unmap_page' drivers/built-in.o: In function `xgbe_alloc_pages': xgbe-desc.c:(.text+0x2e6156): undefined reference to `dma_map_page' xgbe-desc.c:(.text+0x2e6164): undefined reference to `dma_mapping_error' drivers/built-in.o: In function `xgbe_free_ring': xgbe-desc.c:(.text+0x2e63d4): undefined reference to `dma_unmap_page' xgbe-desc.c:(.text+0x2e640e): undefined reference to `dma_unmap_page' xgbe-desc.c:(.text+0x2e644a): undefined reference to `dma_free_coherent' drivers/built-in.o: In function `xgbe_init_ring': xgbe-desc.c:(.text+0x2e64d4): undefined reference to `dma_alloc_coherent' drivers/built-in.o: In function `xgbe_map_tx_skb': xgbe-desc.c:(.text+0x2e6628): undefined reference to `dma_map_single' xgbe-desc.c:(.text+0x2e6638): undefined reference to `dma_mapping_error' xgbe-desc.c:(.text+0x2e66b2): undefined reference to `dma_map_single' xgbe-desc.c:(.text+0x2e66c2): undefined reference to `dma_mapping_error' xgbe-desc.c:(.text+0x2e6762): undefined reference to `dma_map_page' xgbe-desc.c:(.text+0x2e6772): undefined reference to `dma_mapping_error' Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: unix: garbage: fixed several comment and whitespace style issuesJason Eastman2015-04-231-42/+28
| | | | | | | | | | | | | | fixed several comment and whitespace style issues Signed-off-by: Jason Eastman <eastman.jason.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'tipc-fixes'David S. Miller2015-04-233-8/+5
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jon Maloy says: ==================== tipc: three bug fixes A set of unrelated corrections; one for the tipc netns implementation, one regarding problems with random link resets, and one removing a an erroneous refcount decrement when reading link statistsics via netlink. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>