Commit 3948b059 authored by Eric Dumazet, committed by Jakub Kicinski

net: introduce a config option to tweak MAX_SKB_FRAGS

Currently, the MAX_SKB_FRAGS value is 17.

For standard tcp sendmsg() traffic this is no big deal, because tcp_sendmsg()
attempts order-3 allocations and can therefore stuff 32768 bytes into each frag.

But with zero copy, we use order-0 pages.

For BIG TCP to show its full potential, we add a config option
making it possible to fit up to 45 frags per skb.
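
To make the arithmetic concrete, here is a quick user-space sketch
(illustrative only, not part of the patch; it assumes 4K pages, so an
order-3 allocation is 8 pages):

  #include <stdio.h>

  int main(void)
  {
          const unsigned long order3 = 8UL * 4096; /* order-3 alloc: 32768 bytes per frag */
          const unsigned long order0 = 4096UL;     /* order-0 page:   4096 bytes per frag */

          printf("17 frags, order-3: %lu bytes\n", 17 * order3); /* ~544KB */
          printf("17 frags, order-0: %lu bytes\n", 17 * order0); /* ~68KB  */
          printf("45 frags, order-0: %lu bytes\n", 45 * order0); /* ~180KB */
          return 0;
  }

With order-0 pages, 17 frags cap a zero-copy skb at about 68KB of
payload, while 45 frags allow about 180KB.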

This is also needed for BIG TCP rx zerocopy, as zerocopy currently
does not support skbs with frag list.

We used the MAX_SKB_FRAGS=45 value for years at Google before
we deployed 4K MTU, with no adverse effect other than
a recent issue in mlx4, fixed in commit 26782aad
("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS").

Back then, the goal was to be able to receive full-size (64KB) GRO
packets without the frag_list overhead.

Note that /proc/sys/net/core/max_skb_frags can also be used to limit
the number of fragments TCP can use in tx packets.

By default we keep the old/legacy value of 17 until we get
more coverage for the updated values.

Sizes of struct skb_shared_info on 64bit arches

MAX_SKB_FRAGS | sizeof(struct skb_shared_info):
==============================================
         17     320
         21     320+64  = 384
         25     320+128 = 448
         29     320+192 = 512
         33     320+256 = 576
         37     320+320 = 640
         41     320+384 = 704
         45     320+448 = 768
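
The table follows a simple linear model: each frag adds one 16-byte
skb_frag_t on 64bit arches. The sketch below reproduces it; note that
the ~48-byte fixed overhead is inferred from the table itself rather
than taken from kernel headers, and will vary with kernel version and
config:

  #include <stdio.h>

  int main(void)
  {
          const unsigned int fixed = 48;   /* assumed non-frag part of shinfo */
          const unsigned int frag_sz = 16; /* sizeof(skb_frag_t) on 64bit */

          for (unsigned int frags = 17; frags <= 45; frags += 4)
                  printf("MAX_SKB_FRAGS=%2u -> %u bytes\n",
                         frags, fixed + frags * frag_sz);
          return 0;
  }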

This inflation might cause problems for drivers that assume they can pack
both an incoming packet (for MTU=1500) and skb_shared_info into half a page,
using build_skb().
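
As an illustration of that constraint, consider a hypothetical
driver-side check (half_page_rx_buf_fits() is not from this patch; it
assumes 4K pages and 64-byte cachelines):

  #include <linux/if_ether.h>
  #include <linux/skbuff.h>

  /* A driver carving two rx buffers out of each 4K page with
   * build_skb() must fit headroom, a full MTU-sized frame and
   * skb_shared_info into PAGE_SIZE/2 bytes.
   */
  static bool half_page_rx_buf_fits(unsigned int mtu)
  {
          unsigned int need = SKB_DATA_ALIGN(NET_SKB_PAD + NET_IP_ALIGN +
                                             ETH_HLEN + mtu) +
                              SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

          /* For mtu=1500 this is ~1600 + sizeof(shinfo): fine at the
           * legacy 320 bytes, but over the 2048-byte budget once
           * MAX_SKB_FRAGS pushes skb_shared_info past 448 bytes.
           */
          return need <= PAGE_SIZE / 2;
  }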

v3: fix build error when CONFIG_NET=n
v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20230323162842.1935061-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parent e5b42483
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -2314,9 +2314,9 @@ static int cxgbi_sock_tx_queue_up(struct cxgbi_sock *csk, struct sk_buff *skb)
                         frags++;
 
         if (frags >= SKB_WR_LIST_SIZE) {
-                pr_err("csk 0x%p, frags %u, %u,%u >%lu.\n",
+                pr_err("csk 0x%p, frags %u, %u,%u >%u.\n",
                         csk, skb_shinfo(skb)->nr_frags, skb->len,
-                        skb->data_len, SKB_WR_LIST_SIZE);
+                        skb->data_len, (unsigned int)SKB_WR_LIST_SIZE);
                 return -EINVAL;
         }
 
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -345,18 +345,12 @@ struct sk_buff_head {
 
 struct sk_buff;
 
-/* To allow 64K frame to be packed as single skb without frag_list we
- * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
- * buffers which do not start on a page boundary.
- *
- * Since GRO uses frags we allocate at least 16 regardless of page
- * size.
- */
-#if (65536/PAGE_SIZE + 1) < 16
-#define MAX_SKB_FRAGS 16UL
-#else
-#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
-#endif
+#ifndef CONFIG_MAX_SKB_FRAGS
+# define CONFIG_MAX_SKB_FRAGS 17
+#endif
+
+#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
+
 extern int sysctl_max_skb_frags;
 
 /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -251,6 +251,18 @@ config PCPU_DEV_REFCNT
           network device refcount are using per cpu variables if this option is set.
           This can be forced to N to detect underflows (with a performance drop).
 
+config MAX_SKB_FRAGS
+        int "Maximum number of fragments per skb_shared_info"
+        range 17 45
+        default 17
+        help
+          Having more fragments per skb_shared_info can help GRO efficiency.
+          This helps BIG TCP workloads, but might expose bugs in some
+          legacy drivers.
+          This also increases memory overhead of small packets,
+          and in drivers using build_skb().
+          If unsure, say 17.
+
 config RPS
         bool
         depends on SMP && SYSFS
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2622,8 +2622,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 
                 nr_frags = skb_shinfo(skb)->nr_frags;
                 if (unlikely(nr_frags >= MAX_SKB_FRAGS)) {
-                        pr_err("Packet exceed the number of skb frags(%lu)\n",
-                               MAX_SKB_FRAGS);
+                        pr_err("Packet exceed the number of skb frags(%u)\n",
+                               (unsigned int)MAX_SKB_FRAGS);
                         return -EFAULT;
                 }
 
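
The two pr_err() hunks above follow from the type change: the old
MAX_SKB_FRAGS definition expanded to an unsigned long (16UL, or a
PAGE_SIZE-based expression), whereas the Kconfig-provided value is a
plain int. A minimal user-space analogue of the v2 build errors
(illustrative only):

  #include <stdio.h>

  #define OLD_MAX_SKB_FRAGS (65536UL / 4096 + 1) /* unsigned long, as before */
  #define NEW_MAX_SKB_FRAGS 17                   /* plain int, as from Kconfig */

  int main(void)
  {
          /* %lu matched the old unsigned long type... */
          printf("old: %lu\n", OLD_MAX_SKB_FRAGS);
          /* ...but the int-typed value needs %u plus a cast, as done
           * in the cxgbi and af_packet hunks above.
           */
          printf("new: %u\n", (unsigned int)NEW_MAX_SKB_FRAGS);
          return 0;
  }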