aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* tls: Add opt-in zerocopy mode of sendfile()Boris Pismenny2022-05-194-13/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TLS device offload copies sendfile data to a bounce buffer before transmitting. It allows to maintain the valid MAC on TLS records when the file contents change and a part of TLS record has to be retransmitted on TCP level. In many common use cases (like serving static files over HTTPS) the file contents are not changed on the fly. In many use cases breaking the connection is totally acceptable if the file is changed during transmission, because it would be received corrupted in any case. This commit allows to optimize performance for such use cases to providing a new optional mode of TLS sendfile(), in which the extra copy is skipped. Removing this copy improves performance significantly, as TLS and TCP sendfile perform the same operations, and the only overhead is TLS header/trailer insertion. The new mode can only be enabled with the new socket option named TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards compatibility with existing applications that rely on the copying behavior. The new mode is safe, meaning that unsolicited modifications of the file being sent can't break integrity of the kernel. The worst thing that can happen is sending a corrupted TLS record, which is in any case not forbidden when using regular TCP sockets. Sockets other than TLS device offload are not affected by the new socket option. The actual status of zerocopy sendfile can be queried with sock_diag. Performance numbers in a single-core test with 24 HTTPS streams on nginx, under 100% CPU load: * non-zerocopy: 33.6 Gbit/s * zerocopy: 79.92 Gbit/s CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz Signed-off-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* nfp: flower: support ct merging when mangle action existsYinjun Zhang2022-05-192-105/+189
| | | | | | | | | | | | | | | | Current implementation of ct merging doesn't support the case that the fields mangling in pre_ct rules are matched in post_ct rules. This change is to support merging when mangling mac address, ip address, tos, ttl and l4 port. VLAN and MPLS mangling is not involved yet. Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220518075055.130649-1-simon.horman@corigine.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: fec: Avoid allocating rx buffer using ATOMIC in ndo_openMichael Trimarchi2022-05-191-1/+1
| | | | | | | | | Make ndo_open less sensitive to memory pressure. Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220518062007.10056-1-michael@amarulasolutions.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/mlx5: fix multiple definitions of mlx5_lag_mpesw_init / ↵Jakub Kicinski2022-05-181-2/+2
| | | | | | | | | | | mlx5_lag_mpesw_cleanup static inline is needed in the header. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Acked-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20220518183022.2034373-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* sfc: siena: Have a unique wrapper ifndef for efx channels headerSaeed Mahameed2022-05-181-2/+2
| | | | | | | | | | | | | | | | | | Both sfc/efx_channels.h and sfc/siena/efx_channels.h used the same wrapper #ifndef EFX_CHANNELS_H, this patch changes the siena define to be EFX_SIENA_CHANNELS_H to avoid build system confusion. This fixes the following build break: drivers/net/ethernet/sfc/ptp.c:2191:28: error: ‘efx_copy_channel’ undeclared here (not in a function); did you mean ‘efx_ptp_channel’? 2191 | .copy = efx_copy_channel, Fixes: 6e173d3b4af9 ("sfc: Copy shared files needed for Siena (part 1)") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Cc: Edward Cree <ecree.xilinx@gmail.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220518065820.131611-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge branch 'octeon_ep-fix-the-error-handling-path-of-octep_request_irqs'Jakub Kicinski2022-05-181-11/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | Christophe says: ==================== octeon_ep: Fix the error handling path of octep_request_irqs() I send a small serie to ease review and because I'm sighly less confident with the 2nd patch. ==================== Link: https://lore.kernel.org/r/cover.1652819974.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * octeon_ep: Fix irq releasing in the error handling path of octep_request_irqs()Christophe JAILLET2022-05-181-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When taken, the error handling path does not undo correctly what has already been allocated. Introduce a new loop index, 'j', in order to simplify the error handling path and rewrite part of it. It is now written with the same logic and intermediate variables used when resources are allocated. This is much more straightforward. Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * octeon_ep: Fix a memory leak in the error handling path of octep_request_irqs()Christophe JAILLET2022-05-181-0/+2
|/ | | | | | | | | | | | | 'oct->non_ioq_irq_names' is not freed in the error handling path of octep_request_irqs(). Add the missing kfree(). Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Veerasenareddy Burru <vburru@marvell.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge branch 'adin-add-support-for-clock-output'Jakub Kicinski2022-05-183-0/+65
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Josua Mayer says: ==================== adin: add support for clock output This patch series adds support for configuring the two clock outputs of adin 1200 and 1300 PHYs. Certain network controllers require an external reference clock which can be provided by the PHY. One of the replies to v1 was asking why the common clock framework isn't used. Currently no PHY driver has implemented providing a clock to the network controller. Instead they rely on vendor extensions to make the appropriate configuration. For example ar8035 uses qca,clk-out-frequency - this patchset aimed to replicate the same functionality. Finally the 125MHz free-running clock is enabled in the device-tree for SolidRun i.MX6 SoMs, to support revisions 1.9 and later, where the original phy has been replaced with an adin 1300. To avoid introducing new warning messages during boot for SoMs before rev 1.9, the status field of the new phy node is disabled by default, and will be enabled by U-Boot on demand. ==================== Link: https://lore.kernel.org/r/20220517085143.3749-1-josua@solid-run.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ARM: dts: imx6qdl-sr-som: update phy configuration for som revision 1.9Josua Mayer2022-05-181-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since SoM revision 1.9 the PHY has been replaced with an ADIN1300, add an entry for it next to the original. As Russell King pointed out, additional phy nodes cause warnings like: mdio_bus 2188000.ethernet-1: MDIO device at address 1 is missing To avoid this the new node has its status set to disabled. U-Boot will be modified to enable the appropriate phy node after probing. The existing ar8035 nodes have to stay enabled by default to avoid breaking existing systems when they update Linux only. Co-developed-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Signed-off-by: Josua Mayer <josua@solid-run.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: phy: adin: add support for clock outputJosua Mayer2022-05-181-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ADIN1300 supports generating certain clocks on its GP_CLK pin, as well as providing the reference clock on CLK25_REF. Add support for selecting the clock via device-tree properties. Technically the phy also supports a recovered 125MHz clock for synchronous ethernet. SyncE should be configured dynamically at runtime, however Linux does not currently have a toggle for this, so support is explicitly omitted. Co-developed-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Signed-off-by: Josua Mayer<josua@solid-run.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * dt-bindings: net: adin: document phy clock output propertiesJosua Mayer2022-05-181-0/+15
|/ | | | | | | | | | | | | | | The ADIN1300 supports generating certain clocks on its GP_CLK pin, as well as providing the reference clock on CLK25_REF. Add DT properties to configure both pins. Technically the phy also supports a recovered 125MHz clock for synchronous ethernet. However SyncE should be configured dynamically at runtime, so it is explicitly omitted in this binding. Signed-off-by: Josua Mayer <josua@solid-run.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: smc911x: Fix min() use in debug codeGeert Uytterhoeven2022-05-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | If ENABLE_SMC_DEBUG_PKTS=1: drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_hardware_send_pkt’: include/linux/minmax.h:20:28: error: comparison of distinct pointer types lacks a cast [-Werror] 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ drivers/net/ethernet/smsc/smc911x.c:483:17: note: in expansion of macro ‘min’ 483 | PRINT_PKT(buf, min(len, 64)); Fix this by making the constant unsigned, to match the type of "len". While at it, replace the other missed ternary operator by min(), too. Convert the dummy PRINT_PKT() from a macro to a static inline function, to catch mistakes like this without having to enable debug options manually. Fixes: 5ff0348b7f755aac ("net: smc911x: replace ternary operator with min()") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethernet: sunplus: add missing of_node_put() in spl2sw_mdio_init()Yang Yingliang2022-05-181-3/+8
| | | | | | | | | | | | of_get_child_by_name() returns device node pointer with refcount incremented. The refcount should be decremented before returning from spl2sw_mdio_init(). Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* selftests: netdevsim: Increase sleep time in hw_stats_l3.sh testDanielle Ratson2022-05-181-2/+2
| | | | | | | | | | | | | | | hw_stats_l3.sh test is failing often for l3 stats shows less than 20 packets after 2 seconds sleep. This is happening since there is a race between the 2 seconds sleep and the netdevsim actually delivering the packets. Increase the sleep time so the packets will be delivered successfully on time. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* eth: sun: cassini: remove dead codeMartin Liška2022-05-181-2/+2
| | | | | | | | | | | | | Fixes the following GCC warning: drivers/net/ethernet/sun/cassini.c:1316:29: error: comparison between two arrays [-Werror=array-compare] drivers/net/ethernet/sun/cassini.c:3783:34: error: comparison between two arrays [-Werror=array-compare] Note that 2 arrays should be compared by comparing of their addresses: note: use ‘&cas_prog_workaroundtab[0] == &cas_prog_null[0]’ to compare the addresses Signed-off-by: Martin Liska <mliska@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: stmmac: remove unused get_addr() callbackVincent Whitchurch2022-05-185-28/+0
| | | | | | | | | | | | | The last caller of the stmmac_desc_ops::get_addr() callback was removed a while ago, so remove the unused callback. Note that the callback also only gets half the descriptor address on systems with 64-bit descriptor addresses, so that should be fixed if it needs to be resurrected later. Fixes: ec222003bd948de8f3 ("net: stmmac: Prepare to add Split Header support") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'armada-3720-turris-mox-and-orion-mdio'David S. Miller2022-05-182-7/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chris Packham says: ==================== armada-3720-turris-mox and orion-mdio This is a follow up to the change that converted the orion-mdio dt-binding from txt to DT schema format. At the time I thought the binding needed 'unevaluatedProperties: false' because the core mdio.yaml binding didn't handle the DSA switches. In reality it was simply the invalid reg property causing the downstream nodes to be unevaluated. Fixing the reg nodes means we can set 'unevaluatedProperties: true' Marek, I don't know if you had a change for the reg properties in flight. I didn't see anything on lore/lkml so sorry if this crosses with something you've done. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * dt-bindings: net: marvell,orion-mdio: Set unevaluatedProperties to falseChris Packham2022-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the binding was converted it appeared necessary to set 'unevaluatedProperties: true' because of the switch devices on the turris-mox board. Actually the error was because of the reg property being incorrect causing the rest of the properties to be unevaluated. After the reg properties are fixed for turris-mox we can set 'unevaluatedProperties: false' as is generally expected. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * arm64: dts: armada-3720-turris-mox: Correct reg property for mdio devicesChris Packham2022-05-181-6/+6
|/ | | | | | | | | | | | MDIO devices have #address-cells = <1>, #size-cells = <0>. Now that we have a schema enforcing this for marvell,orion-mdio we can see that the turris-mox has a unnecessary 2nd cell for the switch nodes reg property of it's switch devices. Remove the unnecessary 2nd cell from the switches reg property. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'dsa-microchip-ksz_switch-refactor'David S. Miller2022-05-188-457/+664
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arun Ramadoss says: ==================== net: dsa: microchip: refactor the ksz switch init function During the ksz_switch_register function, it calls the individual switches init functions (ksz8795.c and ksz9477.c). Both these functions have few things in common like, copying the chip specific data to struct ksz_dev, allocating ksz_port memory and mib_names memory & cnt. And to add the new LAN937x series switch, these allocations has to be replicated. Based on the review feedback of LAN937x part support patch, refactored the switch init function to move allocations to switch register. Link:https://patchwork.kernel.org/project/netdevbpf/patch/20220504151755.11737-8-arun.ramadoss@microchip.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: remove unused members in ksz_deviceArun Ramadoss2022-05-182-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | The name, regs_size and overrides members in struct ksz_device are unused. Hence remove it. And host_mask is used in only place of ksz8795.c file, which can be replaced by dev->info->cpu_ports Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: add the phylink get_capsArun Ramadoss2022-05-184-9/+117
| | | | | | | | | | | | | | | | | | | | | | | | This patch add the support for phylink_get_caps for ksz8795 and ksz9477 series switch. It updates the struct ksz_switch_chip with the details of the internal phys and xmii interface. Then during the get_caps based on the bits set in the structure, corresponding phy mode is set. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: move mib->cnt_ptr reset code to ksz_common.cPrasanna Vengateshan2022-05-183-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | mib->cnt_ptr resetting is handled in multiple places as part of port_init_cnt(). Hence moved mib->cnt_ptr code to ksz common layer and removed from individual product files. Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: move get_strings to ksz_commonArun Ramadoss2022-05-184-29/+20
| | | | | | | | | | | | | | | | | | | | ksz8795 and ksz9477 uses the same algorithm for copying the ethtool strings. Hence moved to ksz_common to remove the redundant code. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: move port memory allocation to ksz_commonArun Ramadoss2022-05-183-42/+21
| | | | | | | | | | | | | | | | | | | | ksz8795 and ksz9477 init function initializes the memory to dev->ports, mib counters and assigns the ds real number of ports. Since both the routines are same, moved the allocation of port memory to ksz_switch_register after init. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: move struct mib_names to ksz_chip_dataArun Ramadoss2022-05-184-157/+149
| | | | | | | | | | | | | | | | | | | | The ksz88xx family has one set of mib_names. The ksz87xx, ksz9477, LAN937x based switches has one set of mib_names. In order to remove redundant declaration, moved the struct mib_names to ksz_chip_data structure. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: perform the compatibility check for dev probedArun Ramadoss2022-05-186-22/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch perform the compatibility check for the device after the chip detect is done. It is to prevent the mismatch between the device compatible specified in the device tree and actual device found during the detect. The ksz9477 device doesn't use any .data in the of_device_id. But the ksz8795_spi uses .data for assigning the regmap between 8830 family and 87xx family switch. Changed the regmap assignment based on the chip_id from the .data. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: move ksz_chip_data to ksz_commonArun Ramadoss2022-05-184-206/+259
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the ksz_chip_data in ksz8795 and ksz9477 to ksz_common. At present, the dev->chip_id is iterated with the ksz_chip_data and then copy its value to the ksz_dev structure. These values are declared as constant. Instead of copying the values and referencing it, this patch update the dev->info to the ksz_chip_data based on the chip_id in the init function. And also update the ksz_chip_data values for the LAN937x based switches. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: microchip: ksz8795: update the port_cnt value in ksz_chip_dataArun Ramadoss2022-05-181-2/+3
|/ | | | | | | | | | | | | The port_cnt value in the structure is not used in the switch_init. Instead it uses the fls(chip->cpu_port), this is due to one of port in the ksz8794 unavailable. The cpu_port for the 8794 is 0x10, fls(0x10) = 5, hence updating it directly in the ksz_chip_data structure in order to same with all the other switches in ksz8795.c and ksz9477.c files. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'mlx5-updates-2022-05-17' of ↵David S. Miller2022-05-1827-166/+489
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-05-17 MISC updates to mlx5 dirver 1) Aya Levin allows relaxed ordering over VFs 2) Gal Pressman Adds support XDP SQs for uplink representors in switchdev mode 3) Add debugfs TC stats and command failure syndrome for debuggability 4) Tariq uses variants of vzalloc where it could help 5) Multiport eswitch support from Elic Cohen: Eli Cohen Says: =============== The multiport eswitch feature allows to forward traffic from a representor net device to the uplink port of an associated eswitch's uplink port. This feature requires creating a LAG object. Since LAG can be created only once for a function, the feature is mutual exclusive with either bonding or multipath. Multipath eswitch mode is entered automatically these conditions are met: 1. No other LAG related mode is active. 2. A rule that explicitly forwards to an uplink port is inserted. The implementation maintains a reference count on such rules. When the reference count reaches zero, the LAG is released and other modes may be used. When an explicit rule that explicitly forwards to an uplink port is inserted while another LAG mode is active, that rule will not be offloaded by the hardware since the hardware cannot guarantee that the rule will actually be forwarded to that port. Example rules that forwards to an uplink port is: $ tc filter add dev rep0 root flower action mirred egress \ redirect dev uplinkrep0 $ tc filter add dev rep0 root flower action mirred egress \ redirect dev uplinkrep1 This feature is supported only if LAG_RESOURCE_ALLOCATION firmware configuration parameter is set to true. The series consists of three patches: 1. Lag state machine refactor This patch does not add new functionality but rather changes the way the state of the LAG is maintained. 2. Small fix to remove unused argument. 3. The actual implementation of the feature. =============== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5: Support multiport eswitch modeEli Cohen2022-05-1711-40/+259
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multiport eswitch mode is a LAG mode that allows to add rules that forward traffic to a specific physical port without being affected by LAG affinity configuration. This mode of operation is mutual exclusive with the other LAG modes used by multipath and bonding. To make the transition between the modes, we maintain a counter on the number of rules specifying one of the uplink representors as the target of mirred egress redirect action. An example of such rule would be: $ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \ 00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0 If the reference count just grows to one and LAG is not in use, we create the LAG in multiport eswitch mode. Other mode changes are not allowed while in this mode. When the reference count reaches zero, we destroy the LAG and let other modes be used if needed. logic also changed such that if forwarding to some uplink destination cannot be guaranteed, we fail the operation so the rule will eventually be in software and not in hardware. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5: Remove unused argumentEli Cohen2022-05-171-3/+1
| | | | | | | | | | | | | | | | | | Argument ndev is not used in mlx5_handle_changeupper_event() Remove it. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5: Lag, refactor lag state machineEli Cohen2022-05-174-68/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LAG state machine is implemented using bit flags. However, all these bit flags, except for MLX5_LAG_FLAG_HASH_BASED, are really mutual exclusive. In addition, MLX5_LAG_FLAG_READY is used by bonding to mark if we have our netdevices successfully added to lag and does not really belong in the same flags variable as the other flags. Rename MLX5_LAG_FLAG_READY to MLX5_LAG_FLAG_NDEVS_READY to better reflect its purpose and put it in a new flags variable. For the rest of the flags, we introduce a mode enum to hold the state of the LAG. Remove the shared fdb boolean flag from struct mlx5_lag and store this configuration as a mode flag. Change all flag related operations to use standard Linux APIs. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5e: Add XDP SQs to uplink representors steering tablesGal Pressman2022-05-171-2/+15
| | | | | | | | | | | | | | | | | | This patch adds the XDP SQs to the uplink representors steering tables in swichdev mode and enables XDP usage on them. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5e: Correct the calculation of max channels for repMoshe Tal2022-05-173-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | Correct the calculation of maximum channels of rep to better utilize the hardware resources and allow a larger scale of reps. This will allow creation of all virtual ports configured. Fixes: 473baf2e9e8c ("net/mlx5e: Allow profile-specific limitation on max num of channels") Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5e: CT: Add ct driver countersSaeed Mahameed2022-05-171-4/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Connection offload is translated to multiple rules over several hardware flow tables. Unhandled end-cases may cause a hardware resource leak causing multiple system symptoms such as a host memory leak, decreased performance and other scale related issues. Export the current number of firmware FTEs related to the CT table as a debugfs counter. Also add a dropped packets counter to help debug packets dropped on restore failure. To show the offloaded count: cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/offloaded To show the dropped count: cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/rx_dropped Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Roi Dayan <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com>
| * net/mlx5e: Allow relaxed ordering over VFsAya Levin2022-05-172-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | By PCI spec, the config space of the VF always report relaxed ordering not supported while it inherits this property from its PF. Hence using pcie_relaxed_ordering_enable(), always disables the relaxed ordering on all VFs. Remove this check and rely on the firmware which queries the config space of the PF and set the capability bit accordingly. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Marina Varshaver <marinav@nvidia.com> Reviewed-by: Gal Shalom <galshalom@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5e: Support partial GSO for tunnels over vlansGal Pressman2022-05-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Offloading outer checksum on tunnels requires GSO partial, add it to 'vlan_features' to allow offloading tunnels over vlans. For example, running GENEVE over vlan & ipv6 (mandatory UDP checksum) now allows for hardware TSO instead of software segmentation in GSO only. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5e: IPoIB, Improve ethtool rxnfc callback structure in IPoIBGal Pressman2022-05-171-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | Followup commit 79ce39be1d63 ("net/mlx5e: Improve ethtool rxnfc callback structure") and handle CONFIG_MLX5_EN_RXNFC enabled/disabled inside the fs layer so the ethtool callbacks are always available. The fs layer will provide stubs when CONFIG_MLX5_EN_RXNFC is compiled out. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5e: Allocate virtually contiguous memory for reps structuresTariq Toukan2022-05-171-6/+6
| | | | | | | | | | | | | | | | | | | | Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5e: Allocate virtually contiguous memory for VLANs listTariq Toukan2022-05-171-2/+2
| | | | | | | | | | | | | | | | | | | | Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5: Allocate virtually contiguous memory in pci_irq.cTariq Toukan2022-05-171-4/+4
| | | | | | | | | | | | | | | | | | | | Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5: Allocate virtually contiguous memory in vport.cTariq Toukan2022-05-171-26/+26
| | | | | | | | | | | | | | | | | | | | Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5: Inline db alloc API functionTariq Toukan2022-05-172-7/+6
| | | | | | | | | | | | | | | | | | Take the wrapper version which picks default node into a header file. This reduces the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5: Add last command failure syndrome to debugfsMoshe Shemesh2022-05-173-2/+9
| | | | | | | | | | | | | | | | | | Add syndrome of last command failure per command type to debugfs to ease debugging of such failure. last_failed_syndrome - last command failed syndrome returned by FW. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * net/mlx5: sparse: error: context imbalance in 'mlx5_vf_get_core_dev'Saeed Mahameed2022-05-171-2/+0
|/ | | | | | Removing the annotation resolves the issue for some reason. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
* octeontx2-pf: Add support for adaptive interrupt coalescingSuman Ghosh2022-05-177-8/+99
| | | | | | | | | | Added support for adaptive IRQ coalescing. It uses net_dim algorithm to find the suitable delay/IRQ count based on the current packet rate. Signed-off-by: Suman Ghosh <sumang@marvell.com> Link: https://lore.kernel.org/r/20220517044055.876158-1-sumang@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* dn_route: set rt neigh to blackhole_netdev instead of loopback_dev in ifdownXin Long2022-05-171-1/+1
| | | | | | | | | Like other places in ipv4/6 dst ifdown, change to use blackhole_netdev instead of pernet loopback_dev in dn dst ifdown. Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/0cdf10e5a4af509024f08644919121fb71645bc2.1652751029.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ptp: ptp_clockmatrix: return -EBUSY if phase pull-in is in progressMin Li2022-05-172-32/+2
| | | | | | | | | Also removes PEROUT_ENABLE_OUTPUT_MASK Signed-off-by: Min Li <min.li.xe@renesas.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/1652712427-14703-2-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>