diff options
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/bareudp.rst | 52 | ||||
-rw-r--r-- | Documentation/networking/device_drivers/mellanox/mlx5.rst | 2 | ||||
-rw-r--r-- | Documentation/networking/devlink/devlink-trap.rst | 9 | ||||
-rw-r--r-- | Documentation/networking/devlink/mlx5.rst | 6 | ||||
-rw-r--r-- | Documentation/networking/ethtool-netlink.rst | 272 | ||||
-rw-r--r-- | Documentation/networking/index.rst | 1 | ||||
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 9 | ||||
-rw-r--r-- | Documentation/networking/page_pool.rst | 159 | ||||
-rw-r--r-- | Documentation/networking/sfp-phylink.rst | 49 |
9 files changed, 515 insertions, 44 deletions
diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst new file mode 100644 index 000000000000..465a8b251bfe --- /dev/null +++ b/Documentation/networking/bareudp.rst @@ -0,0 +1,52 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================================== +Bare UDP Tunnelling Module Documentation +======================================== + +There are various L3 encapsulation standards using UDP being discussed to +leverage the UDP based load balancing capability of different networks. +MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them. + +The Bareudp tunnel module provides a generic L3 encapsulation tunnelling +support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside +a UDP tunnel. + +Special Handling +---------------- +The bareudp device supports special handling for MPLS & IP as they can have +multiple ethertypes. +MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). +IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6). +This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC +with a flag called multiproto mode. + +Usage +------ + +1) Device creation & deletion + + a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847. + + This creates a bareudp tunnel device which tunnels L3 traffic with ethertype + 0x8847 (MPLS traffic). The destination port of the UDP header will be set to + 6635.The device will listen on UDP port 6635 to receive traffic. + + b) ip link delete bareudp0 + +2) Device creation with multiple proto mode enabled + +There are two ways to create a bareudp device for MPLS & IP with multiproto mode +enabled. + + a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847 multiproto + + b) ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls + +3) Device Usage + +The bareudp device could be used along with OVS or flower filter in TC. +The OVS or TC flower layer must set the tunnel information in SKB dst field before +sending packet buffer to the bareudp device for transmission. On reception the +bareudp device extracts and stores the tunnel information in SKB dst field before +passing the packet buffer to the network stack. diff --git a/Documentation/networking/device_drivers/mellanox/mlx5.rst b/Documentation/networking/device_drivers/mellanox/mlx5.rst index f575a49790e8..e9b65035cd47 100644 --- a/Documentation/networking/device_drivers/mellanox/mlx5.rst +++ b/Documentation/networking/device_drivers/mellanox/mlx5.rst @@ -101,7 +101,7 @@ Enabling the driver and kconfig options **External options** ( Choose if the corresponding mlx5 feature is required ) - CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled -- CONFIG_VXLAN: When chosen, mlx5 vxaln support will be enabled. +- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled. - CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool). Devlink info diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index 47a429bb8658..63350e7332ce 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -238,6 +238,12 @@ be added to the following table: - ``drop`` - Traps NVE packets that the device decided to drop because their overlay source MAC is multicast + * - ``ingress_flow_action_drop`` + - ``drop`` + - Traps packets dropped during processing of ingress flow action drop + * - ``egress_flow_action_drop`` + - ``drop`` + - Traps packets dropped during processing of egress flow action drop Driver-specific Packet Traps ============================ @@ -277,6 +283,9 @@ narrow. The description of these groups must be added to the following table: * - ``tunnel_drops`` - Contains packet traps for packets that were dropped by the device during tunnel encapsulation / decapsulation + * - ``acl_drops`` + - Contains packet traps for packets that were dropped by the device during + ACL processing Testing ======= diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 629a6e69c036..4e4b97f7971a 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -37,6 +37,12 @@ parameters. * ``smfs`` Software managed flow steering. In SMFS mode, the HW steering entities are created and manage through the driver without firmware intervention. + * - ``fdb_large_groups`` + - u32 + - driverinit + - Control the number of large groups (size > 1) in the FDB table. + + * The default value is 15, and the range is between 1 and 1024. The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index f1f868479ceb..31a601cafa3f 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -189,6 +189,14 @@ Userspace to kernel: ``ETHTOOL_MSG_DEBUG_SET`` set debugging settings ``ETHTOOL_MSG_WOL_GET`` get wake-on-lan settings ``ETHTOOL_MSG_WOL_SET`` set wake-on-lan settings + ``ETHTOOL_MSG_FEATURES_GET`` get device features + ``ETHTOOL_MSG_FEATURES_SET`` set device features + ``ETHTOOL_MSG_PRIVFLAGS_GET`` get private flags + ``ETHTOOL_MSG_PRIVFLAGS_SET`` set private flags + ``ETHTOOL_MSG_RINGS_GET`` get ring sizes + ``ETHTOOL_MSG_RINGS_SET`` set ring sizes + ``ETHTOOL_MSG_CHANNELS_GET`` get channel counts + ``ETHTOOL_MSG_CHANNELS_SET`` set channel counts ===================================== ================================ Kernel to userspace: @@ -204,6 +212,15 @@ Kernel to userspace: ``ETHTOOL_MSG_DEBUG_NTF`` debugging settings notification ``ETHTOOL_MSG_WOL_GET_REPLY`` wake-on-lan settings ``ETHTOOL_MSG_WOL_NTF`` wake-on-lan settings notification + ``ETHTOOL_MSG_FEATURES_GET_REPLY`` device features + ``ETHTOOL_MSG_FEATURES_SET_REPLY`` optional reply to FEATURES_SET + ``ETHTOOL_MSG_FEATURES_NTF`` netdev features notification + ``ETHTOOL_MSG_PRIVFLAGS_GET_REPLY`` private flags + ``ETHTOOL_MSG_PRIVFLAGS_NTF`` private flags + ``ETHTOOL_MSG_RINGS_GET_REPLY`` ring sizes + ``ETHTOOL_MSG_RINGS_NTF`` ring sizes + ``ETHTOOL_MSG_CHANNELS_GET_REPLY`` channel counts + ``ETHTOOL_MSG_CHANNELS_NTF`` channel counts ===================================== ================================= ``GET`` requests are sent by userspace applications to retrieve device @@ -521,6 +538,213 @@ Request contents: ``WAKE_MAGICSECURE`` mode. +FEATURES_GET +============ + +Gets netdev features like ``ETHTOOL_GFEATURES`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested reply header + ``ETHTOOL_A_FEATURES_HW`` bitset dev->hw_features + ``ETHTOOL_A_FEATURES_WANTED`` bitset dev->wanted_features + ``ETHTOOL_A_FEATURES_ACTIVE`` bitset dev->features + ``ETHTOOL_A_FEATURES_NOCHANGE`` bitset NETIF_F_NEVER_CHANGE + ==================================== ====== ========================== + +Bitmaps in kernel response have the same meaning as bitmaps used in ioctl +interference but attribute names are different (they are based on +corresponding members of struct net_device). Legacy "flags" are not provided, +if userspace needs them (most likely only ethtool for backward compatibility), +it can calculate their values from related feature bits itself. +ETHA_FEATURES_HW uses mask consisting of all features recognized by kernel (to +provide all names when using verbose bitmap format), the other three use no +mask (simple bit lists). + + +FEATURES_SET +============ + +Request to set netdev features like ``ETHTOOL_SFEATURES`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested request header + ``ETHTOOL_A_FEATURES_WANTED`` bitset requested features + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested reply header + ``ETHTOOL_A_FEATURES_WANTED`` bitset diff wanted vs. result + ``ETHTOOL_A_FEATURES_ACTIVE`` bitset diff old vs. new active + ==================================== ====== ========================== + +Request constains only one bitset which can be either value/mask pair (request +to change specific feature bits and leave the rest) or only a value (request +to set all features to specified set). + +As request is subject to netdev_change_features() sanity checks, optional +kernel reply (can be suppressed by ``ETHTOOL_FLAG_OMIT_REPLY`` flag in request +header) informs client about the actual result. ``ETHTOOL_A_FEATURES_WANTED`` +reports the difference between client request and actual result: mask consists +of bits which differ between requested features and result (dev->features +after the operation), value consists of values of these bits in the request +(i.e. negated values from resulting features). ``ETHTOOL_A_FEATURES_ACTIVE`` +reports the difference between old and new dev->features: mask consists of +bits which have changed, values are their values in new dev->features (after +the operation). + +``ETHTOOL_MSG_FEATURES_NTF`` notification is sent not only if device features +are modified using ``ETHTOOL_MSG_FEATURES_SET`` request or on of ethtool ioctl +request but also each time features are modified with netdev_update_features() +or netdev_change_features(). + + +PRIVFLAGS_GET +============= + +Gets private flags like ``ETHTOOL_GPFLAGS`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested reply header + ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags + ==================================== ====== ========================== + +``ETHTOOL_A_PRIVFLAGS_FLAGS`` is a bitset with values of device private flags. +These flags are defined by driver, their number and names (and also meaning) +are device dependent. For compact bitset format, names can be retrieved as +``ETH_SS_PRIV_FLAGS`` string set. If verbose bitset format is requested, +response uses all private flags supported by the device as mask so that client +gets the full information without having to fetch the string set with names. + + +PRIVFLAGS_SET +============= + +Sets or modifies values of device private flags like ``ETHTOOL_SPFLAGS`` +ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header + ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags + ==================================== ====== ========================== + +``ETHTOOL_A_PRIVFLAGS_FLAGS`` can either set the whole set of private flags or +modify only values of some of them. + + +RINGS_GET +========= + +Gets ring sizes like ``ETHTOOL_GRINGPARAM`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_RINGS_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_RINGS_HEADER`` nested reply header + ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring + ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring + ==================================== ====== ========================== + + +RINGS_SET +========= + +Sets ring sizes like ``ETHTOOL_SRINGPARAM`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_RINGS_HEADER`` nested reply header + ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring + ==================================== ====== ========================== + +Kernel checks that requested ring sizes do not exceed limits reported by +driver. Driver may impose additional constraints and may not suspport all +attributes. + + +CHANNELS_GET +============ + +Gets channel counts like ``ETHTOOL_GCHANNELS`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_CHANNELS_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_CHANNELS_HEADER`` nested reply header + ``ETHTOOL_A_CHANNELS_RX_MAX`` u32 max receive channels + ``ETHTOOL_A_CHANNELS_TX_MAX`` u32 max transmit channels + ``ETHTOOL_A_CHANNELS_OTHER_MAX`` u32 max other channels + ``ETHTOOL_A_CHANNELS_COMBINED_MAX`` u32 max combined channels + ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count + ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count + ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count + ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count + ===================================== ====== ========================== + + +CHANNELS_SET +============ + +Sets channel counts like ``ETHTOOL_SCHANNELS`` ioctl request. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_CHANNELS_HEADER`` nested request header + ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count + ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count + ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count + ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count + ===================================== ====== ========================== + +Kernel checks that requested channel counts do not exceed limits reported by +driver. Driver may impose additional constraints and may not suspport all +attributes. + + Request translation =================== @@ -547,35 +771,35 @@ have their netlink replacement yet. ``ETHTOOL_SEEPROM`` n/a ``ETHTOOL_GCOALESCE`` n/a ``ETHTOOL_SCOALESCE`` n/a - ``ETHTOOL_GRINGPARAM`` n/a - ``ETHTOOL_SRINGPARAM`` n/a + ``ETHTOOL_GRINGPARAM`` ``ETHTOOL_MSG_RINGS_GET`` + ``ETHTOOL_SRINGPARAM`` ``ETHTOOL_MSG_RINGS_SET`` ``ETHTOOL_GPAUSEPARAM`` n/a ``ETHTOOL_SPAUSEPARAM`` n/a - ``ETHTOOL_GRXCSUM`` n/a - ``ETHTOOL_SRXCSUM`` n/a - ``ETHTOOL_GTXCSUM`` n/a - ``ETHTOOL_STXCSUM`` n/a - ``ETHTOOL_GSG`` n/a - ``ETHTOOL_SSG`` n/a + ``ETHTOOL_GRXCSUM`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SRXCSUM`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GTXCSUM`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_STXCSUM`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GSG`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SSG`` ``ETHTOOL_MSG_FEATURES_SET`` ``ETHTOOL_TEST`` n/a ``ETHTOOL_GSTRINGS`` ``ETHTOOL_MSG_STRSET_GET`` ``ETHTOOL_PHYS_ID`` n/a ``ETHTOOL_GSTATS`` n/a - ``ETHTOOL_GTSO`` n/a - ``ETHTOOL_STSO`` n/a + ``ETHTOOL_GTSO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_STSO`` ``ETHTOOL_MSG_FEATURES_SET`` ``ETHTOOL_GPERMADDR`` rtnetlink ``RTM_GETLINK`` - ``ETHTOOL_GUFO`` n/a - ``ETHTOOL_SUFO`` n/a - ``ETHTOOL_GGSO`` n/a - ``ETHTOOL_SGSO`` n/a - ``ETHTOOL_GFLAGS`` n/a - ``ETHTOOL_SFLAGS`` n/a - ``ETHTOOL_GPFLAGS`` n/a - ``ETHTOOL_SPFLAGS`` n/a + ``ETHTOOL_GUFO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SUFO`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GGSO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SGSO`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GFLAGS`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SFLAGS`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_GET`` + ``ETHTOOL_SPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_SET`` ``ETHTOOL_GRXFH`` n/a ``ETHTOOL_SRXFH`` n/a - ``ETHTOOL_GGRO`` n/a - ``ETHTOOL_SGRO`` n/a + ``ETHTOOL_GGRO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SGRO`` ``ETHTOOL_MSG_FEATURES_SET`` ``ETHTOOL_GRXRINGS`` n/a ``ETHTOOL_GRXCLSRLCNT`` n/a ``ETHTOOL_GRXCLSRULE`` n/a @@ -589,10 +813,10 @@ have their netlink replacement yet. ``ETHTOOL_GSSET_INFO`` ``ETHTOOL_MSG_STRSET_GET`` ``ETHTOOL_GRXFHINDIR`` n/a ``ETHTOOL_SRXFHINDIR`` n/a - ``ETHTOOL_GFEATURES`` n/a - ``ETHTOOL_SFEATURES`` n/a - ``ETHTOOL_GCHANNELS`` n/a - ``ETHTOOL_SCHANNELS`` n/a + ``ETHTOOL_GFEATURES`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SFEATURES`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GCHANNELS`` ``ETHTOOL_MSG_CHANNELS_GET`` + ``ETHTOOL_SCHANNELS`` ``ETHTOOL_MSG_CHANNELS_SET`` ``ETHTOOL_SET_DUMP`` n/a ``ETHTOOL_GET_DUMP_FLAG`` n/a ``ETHTOOL_GET_DUMP_DATA`` n/a diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index d07d9855dcd3..3a83cfb66704 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -8,6 +8,7 @@ Contents: netdev-FAQ af_xdp + bareudp batman-adv can can_ucan_protocol diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 5f53faff4e25..ee961d322d93 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -958,6 +958,15 @@ ip_nonlocal_bind - BOOLEAN which can be quite useful - but may break some applications. Default: 0 +ip_autobind_reuse - BOOLEAN + By default, bind() does not select the ports automatically even if + the new socket and all sockets bound to the port have SO_REUSEADDR. + ip_autobind_reuse allows bind() to reuse the port and this is useful + when you use bind()+connect(), but may break some applications. + The preferred solution is to use IP_BIND_ADDRESS_NO_PORT and this + option should only be set by experts. + Default: 0 + ip_dynaddr - BOOLEAN If set non-zero, enables support for dynamic addresses. If set to a non-zero value larger than 1, a kernel log diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst new file mode 100644 index 000000000000..43088ddf95e4 --- /dev/null +++ b/Documentation/networking/page_pool.rst @@ -0,0 +1,159 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============= +Page Pool API +============= + +The page_pool allocator is optimized for the XDP mode that uses one frame +per-page, but it can fallback on the regular page allocator APIs. + +Basic use involves replacing alloc_pages() calls with the +page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages() +replacing dev_alloc_pages(). + +API keeps track of inflight pages, in order to let API user know +when it is safe to free a page_pool object. Thus, API users +must run page_pool_release_page() when a page is leaving the page_pool or +call page_pool_put_page() where appropriate in order to maintain correct +accounting. + +API user must call page_pool_put_page() once on a page, as it +will either recycle the page, or in case of refcnt > 1, it will +release the DMA mapping and inflight state accounting. + +Architecture overview +===================== + +.. code-block:: none + + +------------------+ + | Driver | + +------------------+ + ^ + | + | + | + v + +--------------------------------------------+ + | request memory | + +--------------------------------------------+ + ^ ^ + | | + | Pool empty | Pool has entries + | | + v v + +-----------------------+ +------------------------+ + | alloc (and map) pages | | get page from cache | + +-----------------------+ +------------------------+ + ^ ^ + | | + | cache available | No entries, refill + | | from ptr-ring + | | + v v + +-----------------+ +------------------+ + | Fast cache | | ptr-ring cache | + +-----------------+ +------------------+ + +API interface +============= +The number of pools created **must** match the number of hardware queues +unless hardware restrictions make that impossible. This would otherwise beat the +purpose of page pool, which is allocate pages fast from cache without locking. +This lockless guarantee naturally comes from running under a NAPI softirq. +The protection doesn't strictly have to be NAPI, any guarantee that allocating +a page will cause no race conditions is enough. + +* page_pool_create(): Create a pool. + * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV + * order: 2^order pages on allocation + * pool_size: size of the ptr_ring + * nid: preferred NUMA node for allocation + * dev: struct device. Used on DMA operations + * dma_dir: DMA direction + * max_len: max DMA sync memory size + * offset: DMA address offset + +* page_pool_put_page(): The outcome of this depends on the page refcnt. If the + driver bumps the refcnt > 1 this will unmap the page. If the page refcnt is 1 + the allocator owns the page and will try to recycle it in one of the pool + caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device + using dma_sync_single_range_for_device(). + +* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync + for the entire memory area configured in area pool->max_len. + +* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but caller + must guarantee safe context (e.g NAPI), since it will recycle the page + directly into the pool fast cache. + +* page_pool_release_page(): Unmap the page (if mapped) and account for it on + inflight counters. + +* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool + caches. + +* page_pool_get_dma_addr(): Retrieve the stored DMA address. + +* page_pool_get_dma_dir(): Retrieve the stored DMA direction. + +Coding examples +=============== + +Registration +------------ + +.. code-block:: c + + /* Page pool registration */ + struct page_pool_params pp_params = { 0 }; + struct xdp_rxq_info xdp_rxq; + int err; + + pp_params.order = 0; + /* internal DMA mapping in page_pool */ + pp_params.flags = PP_FLAG_DMA_MAP; + pp_params.pool_size = DESC_NUM; + pp_params.nid = NUMA_NO_NODE; + pp_params.dev = priv->dev; + pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; + page_pool = page_pool_create(&pp_params); + + err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0); + if (err) + goto err_out; + + err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool); + if (err) + goto err_out; + +NAPI poller +----------- + + +.. code-block:: c + + /* NAPI Rx poller */ + enum dma_data_direction dma_dir; + + dma_dir = page_pool_get_dma_dir(dring->page_pool); + while (done < budget) { + if (some error) + page_pool_recycle_direct(page_pool, page); + if (packet_is_xdp) { + if XDP_DROP: + page_pool_recycle_direct(page_pool, page); + } else (packet_is_skb) { + page_pool_release_page(page_pool, page); + new_page = page_pool_dev_alloc_pages(page_pool); + } + } + +Driver unload +------------- + +.. code-block:: c + + /* Driver unload */ + page_pool_put_full_page(page_pool, page, false); + xdp_rxq_info_unreg(&xdp_rxq); diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst index d753a309f9d1..5aec7c8857d0 100644 --- a/Documentation/networking/sfp-phylink.rst +++ b/Documentation/networking/sfp-phylink.rst @@ -74,10 +74,13 @@ phylib to the sfp/phylink support. Please send patches to improve this documentation. 1. Optionally split the network driver's phylib update function into - three parts dealing with link-down, link-up and reconfiguring the - MAC settings. This can be done as a separate preparation commit. + two parts dealing with link-down and link-up. This can be done as + a separate preparation commit. - An example of this preparation can be found in git commit fc548b991fb0. + An older example of this preparation can be found in git commit + fc548b991fb0, although this was splitting into three parts; the + link-up part now includes configuring the MAC for the link settings. + Please see :c:func:`mac_link_up` for more information on this. 2. Replace:: @@ -135,27 +138,27 @@ this documentation. .. code-block:: c - static int foo_ethtool_set_link_ksettings(struct net_device *dev, - const struct ethtool_link_ksettings *cmd) - { - struct foo_priv *priv = netdev_priv(dev); - - return phylink_ethtool_ksettings_set(priv->phylink, cmd); - } - - static int foo_ethtool_get_link_ksettings(struct net_device *dev, - struct ethtool_link_ksettings *cmd) - { - struct foo_priv *priv = netdev_priv(dev); + static int foo_ethtool_set_link_ksettings(struct net_device *dev, + const struct ethtool_link_ksettings *cmd) + { + struct foo_priv *priv = netdev_priv(dev); + + return phylink_ethtool_ksettings_set(priv->phylink, cmd); + } - return phylink_ethtool_ksettings_get(priv->phylink, cmd); - } + static int foo_ethtool_get_link_ksettings(struct net_device *dev, + struct ethtool_link_ksettings *cmd) + { + struct foo_priv *priv = netdev_priv(dev); + + return phylink_ethtool_ksettings_get(priv->phylink, cmd); + } -7. Replace the call to: +7. Replace the call to:: phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface); - and associated code with a call to: + and associated code with a call to:: err = phylink_of_phy_connect(priv->phylink, node, flags); @@ -207,6 +210,14 @@ this documentation. using. This is particularly important for in-band negotiation methods such as 1000base-X and SGMII. + The :c:func:`mac_link_up` method is used to inform the MAC that the + link has come up. The call includes the negotiation mode and interface + for reference only. The finalised link parameters are also supplied + (speed, duplex and flow control/pause enablement settings) which + should be used to configure the MAC when the MAC and PCS are not + tightly integrated, or when the settings are not coming from in-band + negotiation. + The :c:func:`mac_config` method is used to update the MAC with the requested state, and must avoid unnecessarily taking the link down when making changes to the MAC configuration. This means the |