aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/bareudp.rst52
-rw-r--r--Documentation/networking/device_drivers/mellanox/mlx5.rst2
-rw-r--r--Documentation/networking/devlink/devlink-trap.rst9
-rw-r--r--Documentation/networking/devlink/mlx5.rst6
-rw-r--r--Documentation/networking/ethtool-netlink.rst272
-rw-r--r--Documentation/networking/index.rst1
-rw-r--r--Documentation/networking/ip-sysctl.txt9
-rw-r--r--Documentation/networking/page_pool.rst159
-rw-r--r--Documentation/networking/sfp-phylink.rst49
9 files changed, 515 insertions, 44 deletions
diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst
new file mode 100644
index 000000000000..465a8b251bfe
--- /dev/null
+++ b/Documentation/networking/bareudp.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+Bare UDP Tunnelling Module Documentation
+========================================
+
+There are various L3 encapsulation standards using UDP being discussed to
+leverage the UDP based load balancing capability of different networks.
+MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them.
+
+The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
+support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
+a UDP tunnel.
+
+Special Handling
+----------------
+The bareudp device supports special handling for MPLS & IP as they can have
+multiple ethertypes.
+MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast).
+IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).
+This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC
+with a flag called multiproto mode.
+
+Usage
+------
+
+1) Device creation & deletion
+
+ a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847.
+
+ This creates a bareudp tunnel device which tunnels L3 traffic with ethertype
+ 0x8847 (MPLS traffic). The destination port of the UDP header will be set to
+ 6635.The device will listen on UDP port 6635 to receive traffic.
+
+ b) ip link delete bareudp0
+
+2) Device creation with multiple proto mode enabled
+
+There are two ways to create a bareudp device for MPLS & IP with multiproto mode
+enabled.
+
+ a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847 multiproto
+
+ b) ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls
+
+3) Device Usage
+
+The bareudp device could be used along with OVS or flower filter in TC.
+The OVS or TC flower layer must set the tunnel information in SKB dst field before
+sending packet buffer to the bareudp device for transmission. On reception the
+bareudp device extracts and stores the tunnel information in SKB dst field before
+passing the packet buffer to the network stack.
diff --git a/Documentation/networking/device_drivers/mellanox/mlx5.rst b/Documentation/networking/device_drivers/mellanox/mlx5.rst
index f575a49790e8..e9b65035cd47 100644
--- a/Documentation/networking/device_drivers/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/mellanox/mlx5.rst
@@ -101,7 +101,7 @@ Enabling the driver and kconfig options
**External options** ( Choose if the corresponding mlx5 feature is required )
- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled
-- CONFIG_VXLAN: When chosen, mlx5 vxaln support will be enabled.
+- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
Devlink info
diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst
index 47a429bb8658..63350e7332ce 100644
--- a/Documentation/networking/devlink/devlink-trap.rst
+++ b/Documentation/networking/devlink/devlink-trap.rst
@@ -238,6 +238,12 @@ be added to the following table:
- ``drop``
- Traps NVE packets that the device decided to drop because their overlay
source MAC is multicast
+ * - ``ingress_flow_action_drop``
+ - ``drop``
+ - Traps packets dropped during processing of ingress flow action drop
+ * - ``egress_flow_action_drop``
+ - ``drop``
+ - Traps packets dropped during processing of egress flow action drop
Driver-specific Packet Traps
============================
@@ -277,6 +283,9 @@ narrow. The description of these groups must be added to the following table:
* - ``tunnel_drops``
- Contains packet traps for packets that were dropped by the device during
tunnel encapsulation / decapsulation
+ * - ``acl_drops``
+ - Contains packet traps for packets that were dropped by the device during
+ ACL processing
Testing
=======
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 629a6e69c036..4e4b97f7971a 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -37,6 +37,12 @@ parameters.
* ``smfs`` Software managed flow steering. In SMFS mode, the HW
steering entities are created and manage through the driver without
firmware intervention.
+ * - ``fdb_large_groups``
+ - u32
+ - driverinit
+ - Control the number of large groups (size > 1) in the FDB table.
+
+ * The default value is 15, and the range is between 1 and 1024.
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index f1f868479ceb..31a601cafa3f 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -189,6 +189,14 @@ Userspace to kernel:
``ETHTOOL_MSG_DEBUG_SET`` set debugging settings
``ETHTOOL_MSG_WOL_GET`` get wake-on-lan settings
``ETHTOOL_MSG_WOL_SET`` set wake-on-lan settings
+ ``ETHTOOL_MSG_FEATURES_GET`` get device features
+ ``ETHTOOL_MSG_FEATURES_SET`` set device features
+ ``ETHTOOL_MSG_PRIVFLAGS_GET`` get private flags
+ ``ETHTOOL_MSG_PRIVFLAGS_SET`` set private flags
+ ``ETHTOOL_MSG_RINGS_GET`` get ring sizes
+ ``ETHTOOL_MSG_RINGS_SET`` set ring sizes
+ ``ETHTOOL_MSG_CHANNELS_GET`` get channel counts
+ ``ETHTOOL_MSG_CHANNELS_SET`` set channel counts
===================================== ================================
Kernel to userspace:
@@ -204,6 +212,15 @@ Kernel to userspace:
``ETHTOOL_MSG_DEBUG_NTF`` debugging settings notification
``ETHTOOL_MSG_WOL_GET_REPLY`` wake-on-lan settings
``ETHTOOL_MSG_WOL_NTF`` wake-on-lan settings notification
+ ``ETHTOOL_MSG_FEATURES_GET_REPLY`` device features
+ ``ETHTOOL_MSG_FEATURES_SET_REPLY`` optional reply to FEATURES_SET
+ ``ETHTOOL_MSG_FEATURES_NTF`` netdev features notification
+ ``ETHTOOL_MSG_PRIVFLAGS_GET_REPLY`` private flags
+ ``ETHTOOL_MSG_PRIVFLAGS_NTF`` private flags
+ ``ETHTOOL_MSG_RINGS_GET_REPLY`` ring sizes
+ ``ETHTOOL_MSG_RINGS_NTF`` ring sizes
+ ``ETHTOOL_MSG_CHANNELS_GET_REPLY`` channel counts
+ ``ETHTOOL_MSG_CHANNELS_NTF`` channel counts
===================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@@ -521,6 +538,213 @@ Request contents:
``WAKE_MAGICSECURE`` mode.
+FEATURES_GET
+============
+
+Gets netdev features like ``ETHTOOL_GFEATURES`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested reply header
+ ``ETHTOOL_A_FEATURES_HW`` bitset dev->hw_features
+ ``ETHTOOL_A_FEATURES_WANTED`` bitset dev->wanted_features
+ ``ETHTOOL_A_FEATURES_ACTIVE`` bitset dev->features
+ ``ETHTOOL_A_FEATURES_NOCHANGE`` bitset NETIF_F_NEVER_CHANGE
+ ==================================== ====== ==========================
+
+Bitmaps in kernel response have the same meaning as bitmaps used in ioctl
+interference but attribute names are different (they are based on
+corresponding members of struct net_device). Legacy "flags" are not provided,
+if userspace needs them (most likely only ethtool for backward compatibility),
+it can calculate their values from related feature bits itself.
+ETHA_FEATURES_HW uses mask consisting of all features recognized by kernel (to
+provide all names when using verbose bitmap format), the other three use no
+mask (simple bit lists).
+
+
+FEATURES_SET
+============
+
+Request to set netdev features like ``ETHTOOL_SFEATURES`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested request header
+ ``ETHTOOL_A_FEATURES_WANTED`` bitset requested features
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested reply header
+ ``ETHTOOL_A_FEATURES_WANTED`` bitset diff wanted vs. result
+ ``ETHTOOL_A_FEATURES_ACTIVE`` bitset diff old vs. new active
+ ==================================== ====== ==========================
+
+Request constains only one bitset which can be either value/mask pair (request
+to change specific feature bits and leave the rest) or only a value (request
+to set all features to specified set).
+
+As request is subject to netdev_change_features() sanity checks, optional
+kernel reply (can be suppressed by ``ETHTOOL_FLAG_OMIT_REPLY`` flag in request
+header) informs client about the actual result. ``ETHTOOL_A_FEATURES_WANTED``
+reports the difference between client request and actual result: mask consists
+of bits which differ between requested features and result (dev->features
+after the operation), value consists of values of these bits in the request
+(i.e. negated values from resulting features). ``ETHTOOL_A_FEATURES_ACTIVE``
+reports the difference between old and new dev->features: mask consists of
+bits which have changed, values are their values in new dev->features (after
+the operation).
+
+``ETHTOOL_MSG_FEATURES_NTF`` notification is sent not only if device features
+are modified using ``ETHTOOL_MSG_FEATURES_SET`` request or on of ethtool ioctl
+request but also each time features are modified with netdev_update_features()
+or netdev_change_features().
+
+
+PRIVFLAGS_GET
+=============
+
+Gets private flags like ``ETHTOOL_GPFLAGS`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested reply header
+ ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags
+ ==================================== ====== ==========================
+
+``ETHTOOL_A_PRIVFLAGS_FLAGS`` is a bitset with values of device private flags.
+These flags are defined by driver, their number and names (and also meaning)
+are device dependent. For compact bitset format, names can be retrieved as
+``ETH_SS_PRIV_FLAGS`` string set. If verbose bitset format is requested,
+response uses all private flags supported by the device as mask so that client
+gets the full information without having to fetch the string set with names.
+
+
+PRIVFLAGS_SET
+=============
+
+Sets or modifies values of device private flags like ``ETHTOOL_SPFLAGS``
+ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header
+ ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags
+ ==================================== ====== ==========================
+
+``ETHTOOL_A_PRIVFLAGS_FLAGS`` can either set the whole set of private flags or
+modify only values of some of them.
+
+
+RINGS_GET
+=========
+
+Gets ring sizes like ``ETHTOOL_GRINGPARAM`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_RINGS_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_RINGS_HEADER`` nested reply header
+ ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring
+ ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
+ ==================================== ====== ==========================
+
+
+RINGS_SET
+=========
+
+Sets ring sizes like ``ETHTOOL_SRINGPARAM`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_RINGS_HEADER`` nested reply header
+ ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
+ ==================================== ====== ==========================
+
+Kernel checks that requested ring sizes do not exceed limits reported by
+driver. Driver may impose additional constraints and may not suspport all
+attributes.
+
+
+CHANNELS_GET
+============
+
+Gets channel counts like ``ETHTOOL_GCHANNELS`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_CHANNELS_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_CHANNELS_HEADER`` nested reply header
+ ``ETHTOOL_A_CHANNELS_RX_MAX`` u32 max receive channels
+ ``ETHTOOL_A_CHANNELS_TX_MAX`` u32 max transmit channels
+ ``ETHTOOL_A_CHANNELS_OTHER_MAX`` u32 max other channels
+ ``ETHTOOL_A_CHANNELS_COMBINED_MAX`` u32 max combined channels
+ ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count
+ ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count
+ ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count
+ ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count
+ ===================================== ====== ==========================
+
+
+CHANNELS_SET
+============
+
+Sets channel counts like ``ETHTOOL_SCHANNELS`` ioctl request.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_CHANNELS_HEADER`` nested request header
+ ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count
+ ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count
+ ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count
+ ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count
+ ===================================== ====== ==========================
+
+Kernel checks that requested channel counts do not exceed limits reported by
+driver. Driver may impose additional constraints and may not suspport all
+attributes.
+
+
Request translation
===================
@@ -547,35 +771,35 @@ have their netlink replacement yet.
``ETHTOOL_SEEPROM`` n/a
``ETHTOOL_GCOALESCE`` n/a
``ETHTOOL_SCOALESCE`` n/a
- ``ETHTOOL_GRINGPARAM`` n/a
- ``ETHTOOL_SRINGPARAM`` n/a
+ ``ETHTOOL_GRINGPARAM`` ``ETHTOOL_MSG_RINGS_GET``
+ ``ETHTOOL_SRINGPARAM`` ``ETHTOOL_MSG_RINGS_SET``
``ETHTOOL_GPAUSEPARAM`` n/a
``ETHTOOL_SPAUSEPARAM`` n/a
- ``ETHTOOL_GRXCSUM`` n/a
- ``ETHTOOL_SRXCSUM`` n/a
- ``ETHTOOL_GTXCSUM`` n/a
- ``ETHTOOL_STXCSUM`` n/a
- ``ETHTOOL_GSG`` n/a
- ``ETHTOOL_SSG`` n/a
+ ``ETHTOOL_GRXCSUM`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SRXCSUM`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GTXCSUM`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_STXCSUM`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GSG`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SSG`` ``ETHTOOL_MSG_FEATURES_SET``
``ETHTOOL_TEST`` n/a
``ETHTOOL_GSTRINGS`` ``ETHTOOL_MSG_STRSET_GET``
``ETHTOOL_PHYS_ID`` n/a
``ETHTOOL_GSTATS`` n/a
- ``ETHTOOL_GTSO`` n/a
- ``ETHTOOL_STSO`` n/a
+ ``ETHTOOL_GTSO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_STSO`` ``ETHTOOL_MSG_FEATURES_SET``
``ETHTOOL_GPERMADDR`` rtnetlink ``RTM_GETLINK``
- ``ETHTOOL_GUFO`` n/a
- ``ETHTOOL_SUFO`` n/a
- ``ETHTOOL_GGSO`` n/a
- ``ETHTOOL_SGSO`` n/a
- ``ETHTOOL_GFLAGS`` n/a
- ``ETHTOOL_SFLAGS`` n/a
- ``ETHTOOL_GPFLAGS`` n/a
- ``ETHTOOL_SPFLAGS`` n/a
+ ``ETHTOOL_GUFO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SUFO`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GGSO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SGSO`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GFLAGS`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SFLAGS`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_GET``
+ ``ETHTOOL_SPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_SET``
``ETHTOOL_GRXFH`` n/a
``ETHTOOL_SRXFH`` n/a
- ``ETHTOOL_GGRO`` n/a
- ``ETHTOOL_SGRO`` n/a
+ ``ETHTOOL_GGRO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SGRO`` ``ETHTOOL_MSG_FEATURES_SET``
``ETHTOOL_GRXRINGS`` n/a
``ETHTOOL_GRXCLSRLCNT`` n/a
``ETHTOOL_GRXCLSRULE`` n/a
@@ -589,10 +813,10 @@ have their netlink replacement yet.
``ETHTOOL_GSSET_INFO`` ``ETHTOOL_MSG_STRSET_GET``
``ETHTOOL_GRXFHINDIR`` n/a
``ETHTOOL_SRXFHINDIR`` n/a
- ``ETHTOOL_GFEATURES`` n/a
- ``ETHTOOL_SFEATURES`` n/a
- ``ETHTOOL_GCHANNELS`` n/a
- ``ETHTOOL_SCHANNELS`` n/a
+ ``ETHTOOL_GFEATURES`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SFEATURES`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GCHANNELS`` ``ETHTOOL_MSG_CHANNELS_GET``
+ ``ETHTOOL_SCHANNELS`` ``ETHTOOL_MSG_CHANNELS_SET``
``ETHTOOL_SET_DUMP`` n/a
``ETHTOOL_GET_DUMP_FLAG`` n/a
``ETHTOOL_GET_DUMP_DATA`` n/a
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index d07d9855dcd3..3a83cfb66704 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -8,6 +8,7 @@ Contents:
netdev-FAQ
af_xdp
+ bareudp
batman-adv
can
can_ucan_protocol
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5f53faff4e25..ee961d322d93 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -958,6 +958,15 @@ ip_nonlocal_bind - BOOLEAN
which can be quite useful - but may break some applications.
Default: 0
+ip_autobind_reuse - BOOLEAN
+ By default, bind() does not select the ports automatically even if
+ the new socket and all sockets bound to the port have SO_REUSEADDR.
+ ip_autobind_reuse allows bind() to reuse the port and this is useful
+ when you use bind()+connect(), but may break some applications.
+ The preferred solution is to use IP_BIND_ADDRESS_NO_PORT and this
+ option should only be set by experts.
+ Default: 0
+
ip_dynaddr - BOOLEAN
If set non-zero, enables support for dynamic addresses.
If set to a non-zero value larger than 1, a kernel log
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
new file mode 100644
index 000000000000..43088ddf95e4
--- /dev/null
+++ b/Documentation/networking/page_pool.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Page Pool API
+=============
+
+The page_pool allocator is optimized for the XDP mode that uses one frame
+per-page, but it can fallback on the regular page allocator APIs.
+
+Basic use involves replacing alloc_pages() calls with the
+page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages()
+replacing dev_alloc_pages().
+
+API keeps track of inflight pages, in order to let API user know
+when it is safe to free a page_pool object. Thus, API users
+must run page_pool_release_page() when a page is leaving the page_pool or
+call page_pool_put_page() where appropriate in order to maintain correct
+accounting.
+
+API user must call page_pool_put_page() once on a page, as it
+will either recycle the page, or in case of refcnt > 1, it will
+release the DMA mapping and inflight state accounting.
+
+Architecture overview
+=====================
+
+.. code-block:: none
+
+ +------------------+
+ | Driver |
+ +------------------+
+ ^
+ |
+ |
+ |
+ v
+ +--------------------------------------------+
+ | request memory |
+ +--------------------------------------------+
+ ^ ^
+ | |
+ | Pool empty | Pool has entries
+ | |
+ v v
+ +-----------------------+ +------------------------+
+ | alloc (and map) pages | | get page from cache |
+ +-----------------------+ +------------------------+
+ ^ ^
+ | |
+ | cache available | No entries, refill
+ | | from ptr-ring
+ | |
+ v v
+ +-----------------+ +------------------+
+ | Fast cache | | ptr-ring cache |
+ +-----------------+ +------------------+
+
+API interface
+=============
+The number of pools created **must** match the number of hardware queues
+unless hardware restrictions make that impossible. This would otherwise beat the
+purpose of page pool, which is allocate pages fast from cache without locking.
+This lockless guarantee naturally comes from running under a NAPI softirq.
+The protection doesn't strictly have to be NAPI, any guarantee that allocating
+a page will cause no race conditions is enough.
+
+* page_pool_create(): Create a pool.
+ * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
+ * order: 2^order pages on allocation
+ * pool_size: size of the ptr_ring
+ * nid: preferred NUMA node for allocation
+ * dev: struct device. Used on DMA operations
+ * dma_dir: DMA direction
+ * max_len: max DMA sync memory size
+ * offset: DMA address offset
+
+* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
+ driver bumps the refcnt > 1 this will unmap the page. If the page refcnt is 1
+ the allocator owns the page and will try to recycle it in one of the pool
+ caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
+ using dma_sync_single_range_for_device().
+
+* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync
+ for the entire memory area configured in area pool->max_len.
+
+* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but caller
+ must guarantee safe context (e.g NAPI), since it will recycle the page
+ directly into the pool fast cache.
+
+* page_pool_release_page(): Unmap the page (if mapped) and account for it on
+ inflight counters.
+
+* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
+ caches.
+
+* page_pool_get_dma_addr(): Retrieve the stored DMA address.
+
+* page_pool_get_dma_dir(): Retrieve the stored DMA direction.
+
+Coding examples
+===============
+
+Registration
+------------
+
+.. code-block:: c
+
+ /* Page pool registration */
+ struct page_pool_params pp_params = { 0 };
+ struct xdp_rxq_info xdp_rxq;
+ int err;
+
+ pp_params.order = 0;
+ /* internal DMA mapping in page_pool */
+ pp_params.flags = PP_FLAG_DMA_MAP;
+ pp_params.pool_size = DESC_NUM;
+ pp_params.nid = NUMA_NO_NODE;
+ pp_params.dev = priv->dev;
+ pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
+ page_pool = page_pool_create(&pp_params);
+
+ err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0);
+ if (err)
+ goto err_out;
+
+ err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
+ if (err)
+ goto err_out;
+
+NAPI poller
+-----------
+
+
+.. code-block:: c
+
+ /* NAPI Rx poller */
+ enum dma_data_direction dma_dir;
+
+ dma_dir = page_pool_get_dma_dir(dring->page_pool);
+ while (done < budget) {
+ if (some error)
+ page_pool_recycle_direct(page_pool, page);
+ if (packet_is_xdp) {
+ if XDP_DROP:
+ page_pool_recycle_direct(page_pool, page);
+ } else (packet_is_skb) {
+ page_pool_release_page(page_pool, page);
+ new_page = page_pool_dev_alloc_pages(page_pool);
+ }
+ }
+
+Driver unload
+-------------
+
+.. code-block:: c
+
+ /* Driver unload */
+ page_pool_put_full_page(page_pool, page, false);
+ xdp_rxq_info_unreg(&xdp_rxq);
diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst
index d753a309f9d1..5aec7c8857d0 100644
--- a/Documentation/networking/sfp-phylink.rst
+++ b/Documentation/networking/sfp-phylink.rst
@@ -74,10 +74,13 @@ phylib to the sfp/phylink support. Please send patches to improve
this documentation.
1. Optionally split the network driver's phylib update function into
- three parts dealing with link-down, link-up and reconfiguring the
- MAC settings. This can be done as a separate preparation commit.
+ two parts dealing with link-down and link-up. This can be done as
+ a separate preparation commit.
- An example of this preparation can be found in git commit fc548b991fb0.
+ An older example of this preparation can be found in git commit
+ fc548b991fb0, although this was splitting into three parts; the
+ link-up part now includes configuring the MAC for the link settings.
+ Please see :c:func:`mac_link_up` for more information on this.
2. Replace::
@@ -135,27 +138,27 @@ this documentation.
.. code-block:: c
- static int foo_ethtool_set_link_ksettings(struct net_device *dev,
- const struct ethtool_link_ksettings *cmd)
- {
- struct foo_priv *priv = netdev_priv(dev);
-
- return phylink_ethtool_ksettings_set(priv->phylink, cmd);
- }
-
- static int foo_ethtool_get_link_ksettings(struct net_device *dev,
- struct ethtool_link_ksettings *cmd)
- {
- struct foo_priv *priv = netdev_priv(dev);
+ static int foo_ethtool_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_set(priv->phylink, cmd);
+ }
- return phylink_ethtool_ksettings_get(priv->phylink, cmd);
- }
+ static int foo_ethtool_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_get(priv->phylink, cmd);
+ }
-7. Replace the call to:
+7. Replace the call to::
phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface);
- and associated code with a call to:
+ and associated code with a call to::
err = phylink_of_phy_connect(priv->phylink, node, flags);
@@ -207,6 +210,14 @@ this documentation.
using. This is particularly important for in-band negotiation
methods such as 1000base-X and SGMII.
+ The :c:func:`mac_link_up` method is used to inform the MAC that the
+ link has come up. The call includes the negotiation mode and interface
+ for reference only. The finalised link parameters are also supplied
+ (speed, duplex and flow control/pause enablement settings) which
+ should be used to configure the MAC when the MAC and PCS are not
+ tightly integrated, or when the settings are not coming from in-band
+ negotiation.
+
The :c:func:`mac_config` method is used to update the MAC with the
requested state, and must avoid unnecessarily taking the link down
when making changes to the MAC configuration. This means the