aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* page_pool: Don't recycle non-reusable pagesSaeed Mahameed2019-11-201-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A page is NOT reusable when at least one of the following is true: 1) allocated when system was under some pressure. (page_is_pfmemalloc) 2) belongs to a different NUMA node than pool->p.nid. To update pool->p.nid users should call page_pool_update_nid(). Holding on to such pages in the pool will hurt the consumer performance when the pool migrates to a different numa node. Performance testing: XDP drop/tx rate and TCP single/multi stream, on mlx5 driver while migrating rx ring irq from close to far numa: mlx5 internal page cache was locally disabled to get pure page pool results. CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G) XDP Drop/TX single core: NUMA | XDP | Before | After --------------------------------------- Close | Drop | 11 Mpps | 10.9 Mpps Far | Drop | 4.4 Mpps | 5.8 Mpps Close | TX | 6.5 Mpps | 6.5 Mpps Far | TX | 3.5 Mpps | 4 Mpps Improvement is about 30% drop packet rate, 15% tx packet rate for numa far test. No degradation for numa close tests. TCP single/multi cpu/stream: NUMA | #cpu | Before | After -------------------------------------- Close | 1 | 18 Gbps | 18 Gbps Far | 1 | 15 Gbps | 18 Gbps Close | 12 | 80 Gbps | 80 Gbps Far | 12 | 68 Gbps | 80 Gbps In all test cases we see improvement for the far numa case, and no impact on the close numa case. The impact of adding a check per page is very negligible, and shows no performance degradation whatsoever, also functionality wise it seems more correct and more robust for page pool to verify when pages should be recycled, since page pool can't guarantee where pages are coming from. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* page_pool: Add API to update numa nodeSaeed Mahameed2019-11-203-0/+37
| | | | | | | | | | | | | | | | | | | | | | Add page_pool_update_nid() to be called by page pool consumers when they detect numa node changes. It will update the page pool nid value to start allocating from the new effective numa node. This is to mitigate page pool allocating pages from a wrong numa node, where the pool was originally allocated, and holding on to pages that belong to a different numa node, which causes performance degradation. For pages that are already being consumed and could be returned to the pool by the consumer, in next patch we will add a check per page to avoid recycling them back to the pool and return them to the page allocator. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'cpsw-switchdev'David S. Miller2019-11-2020-1293/+4704
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Grygorii Strashko says: ==================== net: ethernet: ti: introduce new cpsw switchdev based driver Thank you All for review of v6. There are no significant changes in this version, just fixed comments to v6. --- v6 The major change in this version is DT bindings conversation to json-schema, and fixed other comments to v5. Also added patch to clean up ALE on init and netif restart. --- v5 The major part of work done in this iteration is rebasing on top of net-next with XDP series from Ivan Khoronzhuk [3], and enable XDP support in the new CPSW switchdev driver (it was little bit painful ;(). There are mostly no functional changes in new CPSW driver, just few fixes, sync with old driver and cleanups/optimizations. So, I've kept rest of cover letter unchanged. --- This series originally based on work [1][2] done by Ilias Apalodimas <ilias.apalodimas@linaro.org>. This the RFC v5 which introduces new CPSW switchdev based driver which is operating in dual-emac mode by default, thus working as 2 individual network interfaces. The Switch mode can be enabled by configuring devlink driver parameter "switch_mode" to 1/true: devlink dev param set platform/48484000.switch \ name switch_mode value 1 cmode runtime This can be done regardless of the state of Port's netdev devices - UP/DOWN, but Port's netdev devices have to be in UP before joining the bridge to avoid overwriting of bridge configuration as CPSW switch driver completely reloads its configuration when first Port changes its state to UP. When the both interfaces joined the bridge - CPSW switch driver will start marking packets with offload_fwd_mark flag unless "ale_bypass=0". All configuration is implemented via switchdev API. The previous solution of tracking both Ports joined the bridge (from netdevice_notifier) proved to be not correct as changing CPSW switch driver mode required cleanup of ALE table and CPSW settings which happens while second Port is joined bridge and as result configuration loaded by bridge for the first Port became corrupted. The introduction of the new CPSW switchdev based driver (cpsw_new.c) is split on two parts: Part 1 - basic dual-emac driver; Part 2 switchdev support. Such approach has simplified code development and testing alot. And, I hope, it will help with better review. patches #1 - 5: preparation patches which also moves common code to cpsw_priv.c patches #6 - 9: Introduce TI CPSW switch driver based on switchdev and new DT bindings patch #10: new CPSW switchdev driver documentation patch #11: adds DT nodes for new CPSW switchdev driver added for DRA7 SoC patch #12: adds DT nodes for new cpsw switchdev driver for am571x-idk board patch #13: enables build of TI CPSW driver Most of the contents of the previous cover-letter have been added in new driver documentation, so please refer to that for configuration, testing and future work. These patches can be found at (branch contains some additional patches required for testing on top of net-next): https://github.com/grygoriyS/linux.git branch: lkml-5.4-switch-tbd-v7 changes in v7: - patch 2: added check for devm_kmalloc_array() return value - patch 6: fixed comments changes in v6: https://lkml.org/lkml/2019/11/9/108 - DT bindings converted to json-schema - netdev initialization is split on creation and registration. The netdevs registration happens now at the end of the pobe. - reworked cpsw_set_pauseparam() to use PHYlib APIs. - other comments for v5 fixed v5: https://patchwork.kernel.org/cover/11208785/ - rebase on top of net-next with XDP series from Ivan Khoronzhuk [3], and enable XDP support in the new CPSW switchdev driver cpsw driver (tested XDP_DROP only) - sync with old cpsw driver - implement comments from Ivan Khoronzhuk and Rob Herring - fixed "NETDEV WATCHDOG: .." warning after interface after interface UP/DOWN, missed TX wake in cpsw_adjust_link() v4: https://patchwork.kernel.org/cover/11010523/ - finished split of common CPSW code - added devlink support - changed CPSW mode configuration approach: from netdevice_notifier to devlink parameter - refactor and clean up ALE changes which allows to modify VLANs/MDBs entries - added missed support for port QDISC_CBS and QDISC_MQPRIO - the CPSW is split on two parts: basic dual_mac driver and switchdev support - added missed callback .ndo_get_port_parent_id() - reworked ingress frames marking in switch mode (offload_fwd_mark) - applied comments from Andrew Lunn v3: https://lwn.net/Articles/786677/ Changes in v3: - alot of work done to split properly common code between legacy and switchdev CPSW drivers and clean up code - CPSW switchdev interface updated to the current LKML switchdev interface - actually new CPSW switchdev based driver introduced - optimized dual_mac mode in new driver. Main change is that in promiscuous mode P0_UNI_FLOOD (both ports) is enabled in addition to ALLMULTI (current port) instead of ALE_BYPASS. So, port in non promiscuous mode will keep possibility of mcast and vlan filtering. - changed bridge join sequnce: now switch mode will be enabled only when both ports joined the bridge. CPSW will be switched to dual_mac mode if any port leave bridge. ALE table is completly cleared and then refiled while switching to switch mode - this simplidies code a lot, but introduces some limitation to bridge setup sequence: ip link add name br0 type bridge ip link set dev br0 type bridge ageing_time 1000 ip link set dev br0 type bridge vlan_filtering 0 <- disable echo 0 > /sys/class/net/br0/bridge/default_vlan ip link set dev sw0p1 up <- add ports ip link set dev sw0p2 up ip link set dev sw0p1 master br0 ip link set dev sw0p2 master br0 echo 1 > /sys/class/net/br0/bridge/default_vlan <- enable ip link set dev br0 type bridge vlan_filtering 1 bridge vlan add dev br0 vid 1 pvid untagged self - STP tested with vlan_filtering 1/0. To make STP work I've had to set NO_SA_UPDATE for all slave ports (see comment in code). It also required to statically register STP mcast address {0x01, 0x80, 0xc2, 0x0, 0x0, 0x0}; - allowed build both TI_CPSW and TI_CPSW_SWITCHDEV drivers - PTP can be enabled on both ports in dual_mac mode [1] https://patchwork.ozlabs.org/cover/929367/ [2] https://patches.linaro.org/cover/136709/ [3] https://patchwork.kernel.org/cover/11035813/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * arm: omap2plus_defconfig: enable new cpsw switchdev driverGrygorii Strashko2019-11-201-0/+1
| | | | | | | | | | | | | | Add CONFIG_TI_CPSW_SWITCHDEV option to enable new cpsw switchdev driver Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ARM: dts: am571x-idk: enable for new cpsw switch dev driverGrygorii Strashko2019-11-204-5/+37
| | | | | | | | | | | | | | | | Add DT nodes for new cpsw switchdev driver for am571x-idk board for now to enable testing of the new solution. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ARM: dts: dra7: add dt nodes for new cpsw switch dev driverGrygorii Strashko2019-11-201-0/+52
| | | | | | | | | | | | | | Add DT nodes for new cpsw switch dev driver. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Documentation: networking: add cpsw switchdev based driver documentationIlias Apalodimas2019-11-202-0/+219
| | | | | | | | | | | | | | | | | | A new cpsw dirver based on switchdev was added. Add documentation about basic configuration and future features Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * phy: ti: phy-gmii-sel: dependency from ti cpsw-switchdev driverGrygorii Strashko2019-11-201-2/+2
| | | | | | | | | | | | | | Add dependency from TI_CPSW_SWITCHDEV. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: ti: introduce cpsw switchdev based driver part 2 - switchIlias Apalodimas2019-11-205-9/+993
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CPSW switchdev based driver which is operating in dual-emac mode by default, thus working as 2 individual network interfaces. The Switch mode can be enabled by configuring devlink driver parameter "switch_mode" to 1: devlink dev param set platform/48484000.switch \ name switch_mode value 1 cmode runtime This can be done regardless of the state of Port's netdevs - UP/DOWN, but Port's netdev devices have to be UP before joining the bridge to avoid overwriting of bridge configuration as CPSW switch driver completely reloads its configuration when first Port changes its state to UP. When the both interfaces joined the bridge - CPSW switch driver will start marking packets with offload_fwd_mark flag unless "ale_bypass=0". All configuration is implemented via switchdev API and notifiers. Supported: - SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS - SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS: BR_MCAST_FLOOD - SWITCHDEV_ATTR_ID_PORT_STP_STATE - SWITCHDEV_OBJ_ID_PORT_VLAN - SWITCHDEV_OBJ_ID_PORT_MDB - SWITCHDEV_OBJ_ID_HOST_MDB Hence CPSW switchdev driver supports: - FDB offloading - MDB offloading - VLAN filtering and offloading - STP Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emacIlias Apalodimas2019-11-205-5/+1710
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part 1: Introduce basic CPSW dual_mac driver (cpsw_new.c) which is operating in dual-emac mode by default, thus working as 2 individual network interfaces. Main differences from legacy CPSW driver are: - optimized promiscuous mode: The P0_UNI_FLOOD (both ports) is enabled in addition to ALLMULTI (current port) instead of ALE_BYPASS. So, Ports in promiscuous mode will keep possibility of mcast and vlan filtering, which is provides significant benefits when ports are joined to the same bridge, but without enabling "switch" mode, or to different bridges. - learning disabled on ports as it make not too much sense for segregated ports - no forwarding in HW. - enabled basic support for devlink. devlink dev show platform/48484000.switch devlink dev param show platform/48484000.switch: name ale_bypass type driver-specific values: cmode runtime value false - "ale_bypass" devlink driver parameter allows to enable ALE_CONTROL(4).BYPASS mode for debug purposes. - updated DT bindings. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * dt-bindings: net: ti: add new cpsw switch driver bindingsGrygorii Strashko2019-11-201-0/+240
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add bindings for the new TI CPSW switch driver. Comparing to the legacy bindings (net/cpsw.txt): - ports definition follows DSA bindings (net/dsa/dsa.txt) and ports can be marked as "disabled" if not physically wired. - all deprecated properties dropped; - all legacy propertiies dropped which represent constant HW cpapbilities (cpdma_channels, ale_entries, bd_ram_size, mac_control, slaves, active_slave) - TI CPTS DT properties are reused as is, but grouped in "cpts" sub-node - TI Davinci MDIO DT bindings are reused as is, because Davinci MDIO is reused. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: ti: cpsw: move set of common functions in cpsw_privGrygorii Strashko2019-11-203-1264/+1293
| | | | | | | | | | | | | | | | | | | | | | As a preparatory patch to add support for a switchdev based cpsw driver, move common functions to cpsw-priv.c so that they can be used across both drivers. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: ti: cpsw: resolve build deps of cpsw driversGrygorii Strashko2019-11-203-8/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A following patches introduce new CPSW switchdev driver which uses common code with legacy CPSW driver. This will introduce build dependency between CPSW switchdev and CPSW legacy drivers related to for_each_slave() and cpsw_slave_index() - they can be compiled both, but only one of them will be not functional depending in Kconfig settings due to duffrences in Slave Ports indexes calculation. To fix this make for_each_slave() local (it's used now only by legacy CPSW driver) and convert cpsw_slave_index() to be a function pointer which is assigned in probe. Driver to probe is defined by DT. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: ti: ale: modify vlan/mdb api for switchdevIlias Apalodimas2019-11-202-10/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A following patch introduces switchdev functionality, so modify ALE engine VLANs/MDBs API: - cpsw_ale_del_mcast(): update so it will remove only selected ports from mcast port_mask or delete whole mcast record if !port_mask - cpsw_ale_del_vlan(): update so it will remove only selected ports from all VLAN record's masks or delete whole VLAN record if !port_mask - add cpsw_ale_vlan_add_modify() to add or modify existing VLAN record's masks - add cpsw_ale_set_unreg_mcast() for enabling unreg mcast on port VLANs Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: ti: cpsw: allow untagged traffic on host portGrygorii Strashko2019-11-203-10/+35
| | | | | | | | | | | | | | | | | | | | Now untagged vlan traffic is not support on Host P0 port. This patch adds in ALE context bitmap of VLANs for which Host P0 port bit set in Force Untagged Packet Egress bitmask in VLANs ALE entries, and adds corresponding check in VLAN incapsulation header parsing function cpsw_rx_vlan_encap(). Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: ti: ale: clean ale tbl on init and intf restartGrygorii Strashko2019-11-201-0/+2
|/ | | | | | | | Clean CPSW ALE on init and intf restart (up/down) to avoid reading obsolete or garbage entries from ALE table. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'nf_tables_offload-vlan-matching-support'David S. Miller2019-11-205-6/+59
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== nf_tables_offload: vlan matching support The following patchset contains Netfilter support for vlan matching offloads: 1) Constify nft_reg_load() as a preparation patch. 2) Restrict rule matching to ingress interface type ARPHRD_ETHER. 3) Add new vlan_tci field to flow_dissector_key_vlan structure, to allow to set up vlan_id, vlan_dei and vlan_priority in one go. 4) C-VLAN matching support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nft_payload: add C-VLAN offload supportPablo Neira Ayuso2019-11-201-0/+16
| | | | | | | | | | | | | | | | | | Match on h_vlan_encapsulated_proto and set up protocol dependency. Check for protocol dependency before accessing the tci field. Allow to match on the encapsulated ethertype too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nft_payload: add VLAN offload supportPablo Neira Ayuso2019-11-202-3/+28
| | | | | | | | | | | | | | | | | | Match on ethertype and set up protocol dependency. Check for protocol dependency before accessing the tci field. Allow to match on the encapsulated ethertype too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables_offload: allow ethernet interface type onlyPablo Neira Ayuso2019-11-203-0/+12
| | | | | | | | | | | | | | | | | | | | | | Hardware offload support at this stage assumes an ethernet device in place. The flow dissector provides the intermediate representation to express this selector, so extend it to allow to store the interface type. Flower does not uses this, so skb_flow_dissect_meta() is not extended to match on this new field. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables: constify nft_reg_load{8, 16, 64}()Pablo Neira Ayuso2019-11-201-3/+3
|/ | | | | | | | This patch constifies the pointer to source register data that is passed as an input parameter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* lwtunnel: add support for multiple geneve optsXin Long2019-11-201-36/+75
| | | | | | | | | | | | | | | | | | | | | | | geneve RFC (draft-ietf-nvo3-geneve-14) allows a geneve packet to carry multiple geneve opts, so it's necessary for lwtunnel to support adding multiple geneve opts in one lwtunnel route. But vxlan and erspan opts are still only allowed to add one option. With this patch, iproute2 could make it like: # ip r a 1.1.1.0/24 encap ip id 1 geneve_opts 0:0:12121212,1:2:12121212 \ dst 10.1.0.2 dev geneve1 # ip r a 1.1.1.0/24 encap ip id 1 vxlan_opts 456 \ dst 10.1.0.2 dev erspan1 # ip r a 1.1.1.0/24 encap ip id 1 erspan_opts 1:123:0:0 \ dst 10.1.0.2 dev erspan1 Which are pretty much like cls_flower and act_tunnel_key. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cxgb4: remove unneeded semicolon for switch blockRahul Lakkireddy2019-11-191-1/+1
| | | | | | | | | | | | Semicolon is not required at the end of switch block. So, remove it. Addresses coccinelle warning: drivers/net/ethernet/chelsio/cxgb4/sge.c:2260:2-3: Unneeded semicolon Fixes: 4846d5330daf ("cxgb4: add Tx and Rx path for ETHOFLD traffic") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: felix: Fix CPU port assignment when not last portVladimir Oltean2019-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the NXP LS1028A, there are 2 Ethernet links between the Felix switch and the ENETC: - eno2 <-> swp4, at 2.5G - eno3 <-> swp5, at 1G Only one of the above Ethernet port pairs can act as a DSA link for tagging. When adding initial support for the driver, it was tested only on the 1G eno3 <-> swp5 interface, due to the necessity of using PHYLIB initially (which treats fixed-link interfaces as emulated C22 PHYs, so it doesn't support fixed-link speeds higher than 1G). After making PHYLINK work, it appears that swp4 still can't act as CPU port. So it looks like ocelot_set_cpu_port was being called for swp4, but then it was called again for swp5, overwriting the CPU port assigned in the DT. It appears that when you call dsa_upstream_port for a port that is not defined in the device tree (such as swp5 when using swp4 as CPU port), its dp->cpu_dp pointer is not initialized by dsa_tree_setup_default_cpu, and this trips up the following condition in dsa_upstream_port: if (!cpu_dp) return port; So the moral of the story is: don't call dsa_upstream_port for a port that is not defined in the device tree, and therefore its dsa_port structure is not completely initialized (ds->num_ports is still 6). Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: dp83869: fix return of uninitialized variable retColin Ian King2019-11-181-1/+1
| | | | | | | | | | | | | In the case where the call to phy_interface_is_rgmii returns zero the variable ret is left uninitialized and this is returned at the end of the function dp83869_configure_rgmii. Fix this by returning 0 instead of the uninitialized value in ret. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 01db923e8377 ("net: phy: dp83869: Add TI dp83869 phy") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* lwtunnel: change to use nla_put_u8 for LWTUNNEL_IP_OPT_ERSPAN_VERXin Long2019-11-181-1/+1
| | | | | | | | | | LWTUNNEL_IP_OPT_ERSPAN_VER is u8 type, and nla_put_u8 should have been used instead of nla_put_u32(). This is a copy-paste error. Fixes: b0a21810bd5e ("lwtunnel: add options setting and dumping for erspan") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'bnxt_en-Updates'David S. Miller2019-11-186-51/+145
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael Chan says: ==================== bnxt_en: Updates. This series has the firmware interface update that changes the aRFS/ntuple interface on 57500 chips. The 2nd patch adds a counter and improves the hardware buffer error handling on the 57500 chips. The rest of the series is mainly enhancements on error recovery and firmware reset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Abort waiting for firmware response if there is no heartbeat.Pavan Chebbi2019-11-181-0/+10
| | | | | | | | | | | | | | | | | | | | | | This is especially beneficial during the NVRAM related firmware commands that have longer timeouts. If the BNXT_STATE_FW_FATAL_COND flag gets set while waiting for firmware response, abort and return error. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Add a warning message for driver initiated resetVasundhara Volam2019-11-181-0/+1
| | | | | | | | | | | | | | | | During loss of heartbeat, log this warning message. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Return proper error code for non-existent NVM variableVasundhara Volam2019-11-181-1/+8
| | | | | | | | | | | | | | | | | | For NVM params that are not supported in the current NVM configuration, return the error as -EOPNOTSUPP. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Report health status update after reset is doneVasundhara Volam2019-11-184-0/+26
| | | | | | | | | | | | | | | | | | | | Report health status update to devlink health reporter, once reset is completed. Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Set MASTER flag during driver registration.Vasundhara Volam2019-11-181-1/+2
| | | | | | | | | | | | | | | | | | | | The Linux driver is capable of being the master function to handle resets, so we set the flag to let firmware know. Some other drivers, such as DPDK, is not capable and will not set the flag. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Extend ETHTOOL_RESET to hot reset driver.Vasundhara Volam2019-11-183-2/+10
| | | | | | | | | | | | | | | | | | | | | | If firmware supports hot reset, extend ETHTOOL_RESET to support hot reset driver which does not require a driver reload after ETHTOOL_RESET. The driver will go through the same coordinated reset sequence as a firmware initiated fatal/non-fatal reset. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Increase firmware response timeout for coredump commands.Vasundhara Volam2019-11-181-1/+2
| | | | | | | | | | | | | | | | | | | | Use the larger HWRM_COREDUMP_TIMEOUT value for coredump related data response from the firmware. These commands take longer than normal commands. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Improve RX buffer error handling.Michael Chan2019-11-183-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | When hardware reports RX buffer errors, the latest 57500 chips do not require reset. The packet is discarded by the hardware and the ring will continue to operate. Also, add an rx_buf_errors counter for this type of error. It can help the user to identify if the aggregation ring is too small. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Update firmware interface spec to 1.10.1.12.Michael Chan2019-11-183-44/+77
|/ | | | | | | | The aRFS ring table interface has changed for the 57500 chips. Updating it accordingly so it will work with the latest production firmware. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'selftests-Add-ethtool-and-scale-tests'David S. Miller2019-11-185-4/+440
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ido Schimmel says: ==================== selftests: Add ethtool and scale tests This patch set adds generic ethtool tests and a mlxsw-specific router scale test for Spectrum-2. Patches #1-#2 from Danielle add the router scale test for Spectrum-2. It re-uses the same test as Spectrum-1, but it is invoked with a different scale, according to what it is queried from devlink-resource. Patches #3-#5 from Amit are a re-work of the ethtool tests that were posted in the past [1]. Patches #3-#4 add the necessary library routines, whereas patch #5 adds the test itself. The test checks both good and bad flows with autoneg on and off. The test plan it detailed in the commit message. Last time Andrew and Florian (copied) provided very useful feedback that is incorporated in this set. Namely: * Parse the value of the different link modes from /usr/include/linux/ethtool.h * Differentiate between supported and advertised speeds and use the latter in autoneg tests * Make the test generic and move it to net/forwarding/ instead of being mlxsw-specific [1] https://patchwork.ozlabs.org/cover/1112903/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * selftests: forwarding: Add speed and auto-negotiation testAmit Cohen2019-11-181-0/+318
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check configurations and packets transference with different variations of autoneg and speed. Test plan: 1. Test force of same speed with autoneg off 2. Test force of different speeds with autoneg off (should fail) 3. One side is autoneg on and other side sets force of common speeds 4. One side is autoneg on and other side only advertises a subset of the common speeds (one speed of the subset) 5. One side is autoneg on and other side only advertises a subset of the common speeds. Check that highest speed is negotiated 6. Test autoneg on, but each side advertises different speeds (should fail) Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * selftests: forwarding: lib.sh: Add wait for dev with timeoutAmit Cohen2019-11-181-3/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a function that waits for device with maximum number of iterations. It enables to limit the waiting and prevent infinite loop. This will be used by the subsequent patch which will set two ports to different speeds in order to make sure they cannot negotiate a link. Waiting for all the setup is limited with 10 minutes for each device. Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * selftests: forwarding: Add ethtool_lib.shAmit Cohen2019-11-181-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Functions: 1. speeds_arr_get The function returns an array of speed values from /usr/include/linux/ethtool.h The array looks as follows: [10baseT/Half] = 0, [10baseT/Full] = 1, ... 2. ethtool_set: params: cmd The function runs ethtool by cmd (ethtool -s cmd) and checks if there was an error in configuration 3. dev_speeds_get: params: dev, with_mode (0 or 1), adver (0 or 1) return value: Array of supported/Advertised link modes with/without mode * Example 1: speeds_get swp1 0 0 return: 1000 10000 40000 * Example 2: speeds_get swp1 1 1 return: 1000baseKX/Full 10000baseKR/Full 40000baseCR4/Full 4. common_speeds_get: params: dev1, dev2, with_mode (0 or 1), adver (0 or 1) return value: Array of common speeds of dev1 and dev2 * Example: common_speeds_get swp1 swp2 0 0 return: 1000 10000 Assuming that swp1 supports 1000 10000 40000 and swp2 supports 1000 10000 Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * selftests: mlxsw: Check devlink device before running testDanielle Ratson2019-11-181-0/+5
| | | | | | | | | | | | | | | | | | The scale test for Spectrum-2 should only be invoked for Spectrum-2. Skip the test otherwise. Signed-off-by: Danielle Ratson <danieller@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * selftests: mlxsw: Add router scale test for Spectrum-2Danielle Ratson2019-11-182-1/+22
|/ | | | | | | | | | | Same as for Spectrum-1, test the ability to add the maximum number of routes possible to the switch. Invoke the test from the 'resource_scale' wrapper script. Signed-off-by: Danielle Ratson <danieller@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'page_pool-followup-changes-to-restore-tracepoint-features'David S. Miller2019-11-184-14/+28
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jesper Dangaard says: ==================== page_pool: followup changes to restore tracepoint features This patchset is a followup to Jonathan patch, that do not release pool until inflight == 0. That changed page_pool to be responsible for its own delayed destruction instead of relying on xdp memory model. As the page_pool maintainer, I'm promoting the use of tracepoint to troubleshoot and help driver developers verify correctness when converting at driver to use page_pool. The role of xdp:mem_disconnect have changed, which broke my bpftrace tools for shutdown verification. With these changes, the same capabilities are regained. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * page_pool: extend tracepoint to also include the page PFNJesper Dangaard Brouer2019-11-181-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | The MM tracepoint for page free (called kmem:mm_page_free) doesn't provide the page pointer directly, instead it provides the PFN (Page Frame Number). This is annoying when writing a page_pool leak detector in BPF. This patch change page_pool tracepoints to also provide the PFN. The page pointer is still provided to allow other kinds of troubleshooting from BPF. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * page_pool: add destroy attempts counter and rename tracepointJesper Dangaard Brouer2019-11-183-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When Jonathan change the page_pool to become responsible to its own shutdown via deferred work queue, then the disconnect_cnt counter was removed from xdp memory model tracepoint. This patch change the page_pool_inflight tracepoint name to page_pool_release, because it reflects the new responsability better. And it reintroduces a counter that reflect the number of times page_pool_release have been tried. The counter is also used by the code, to only empty the alloc cache once. With a stuck work queue running every second and counter being 64-bit, it will overrun in approx 584 billion years. For comparison, Earth lifetime expectancy is 7.5 billion years, before the Sun will engulf, and destroy, the Earth. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * xdp: remove memory poison on free for struct xdp_mem_allocatorJesper Dangaard Brouer2019-11-181-5/+0
|/ | | | | | | | | | | | | | | | | | When looking at the details I realised that the memory poison in __xdp_mem_allocator_rcu_free doesn't make sense. This is because the SLUB allocator uses the first 16 bytes (on 64 bit), for its freelist, which overlap with members in struct xdp_mem_allocator, that were updated. Thus, SLUB already does the "poisoning" for us. I still believe that poisoning memory make sense in other cases. Kernel have gained different use-after-free detection mechanism, but enabling those is associated with a huge overhead. Experience is that debugging facilities can change the timing so much, that that a race condition will not be provoked when enabled. Thus, I'm still in favour of poisoning memory where it makes sense. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: avoid matching all-ones clause 45 PHY IDsRussell King2019-11-181-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently match clause 45 PHYs using any ID read from a MMD marked as present in the "Devices in package" registers 5 and 6. However, this is incorrect. 45.2 says: "The definition of the term package is vendor specific and could be a chip, module, or other similar entity." so a package could be more or less than the whole PHY - a PHY could be made up of several modules instantiated onto a single chip such as the Marvell 88x3310, or some of the MMDs could be disabled according to chip configuration, such as the Broadcom 84881. In the case of Broadcom 84881, the "Devices in package" registers contain 0xc000009b, meaning that there is a PHYXS present in the package, but all registers in MMD 4 return 0xffff. This leads to our matching code incorrectly binding this PHY to one of our generic PHY drivers. This patch changes the way we determine whether to attempt to match a MMD identifier, or use it to request a module - if the identifier is all-ones, then we skip over it. When reading the identifiers, we initialise phydev->c45_ids.device_ids to all-ones, only reading the device ID if the "Devices in package" registers indicates we should. This avoids the generic drivers incorrectly matching on a PHY ID of 0xffffffff. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'Add-support-for-SFPs-behind-PHYs'David S. Miller2019-11-186-1/+118
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Russell King says: ==================== Add support for SFPs behind PHYs This series adds partial support for SFP cages connected to PHYs, specifically optical SFPs. We add core infrastructure to phylib for this, and arrange for minimal code in the PHY driver - currently, this is code to verify that the module is one that we can support for Marvell 10G PHYs. v2: add yaml binding patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: phy: marvell10g: add SFP+ supportRussell King2019-11-181-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for SFP+ cages to the Marvell 10G PHY driver. This is slightly complicated by the way phylib works in that we need to use a multi-step process to attach the SFP bus, and we also need to track the phylink state machine to know when the module's transmit disable signal should change state. With appropriate DT changes, this allows the SFP+ canges on the Macchiatobin platform to be functional. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: phy: add core phylib sfp supportRussell King2019-11-183-0/+84
| | | | | | | | | | | | | | | | | | | | Add core phylib help for supporting SFP sockets on PHYs. This provides a mechanism to inform the SFP layer about PHY up/down events, and also unregister the SFP bus when the PHY is going away. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>