Commits · 2fa3ee93d13ce723d60a4fc4620ed9126c468901 · Kirill Smelkov / linux

10 Jun, 2022 23 commits

Jonathan Toppins authored Jun 08, 2022

Setting RLB_NULL_INDEX is not needed as this is done in bond_alb_initialize
which is called by bond_open.

Also reduce the number of rtnl_unlock calls by just using the standard
goto cleanup path.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2fa3ee93

bonding: netlink error message support for options · 2bff369b

Jonathan Toppins authored Jun 08, 2022

Add support for reporting errors via extack in both bond_newlink
and bond_changelink.

Instead of having to look in the kernel log for why an option was not
correct just report the error to the user via the extack variable.

What is currently reported today:
  ip link add bond0 type bond
  ip link set bond0 up
  ip link set bond0 type bond mode 4
 RTNETLINK answers: Device or resource busy

After this change:
  ip link add bond0 type bond
  ip link set bond0 up
  ip link set bond0 type bond mode 4
 Error: unable to set option because the bond is up.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2bff369b

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · ce1d8e74

Jakub Kicinski authored Jun 09, 2022

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-06-08

Michal prevents setting of VF VLAN capabilities in switchdev mode and
removes, not needed, specific switchdev VLAN operations.

Karol converts u16 variables to unsigned int for GNSS calculations.

Christophe Jaillet corrects the parameter order for a couple of
devm_kcalloc() calls.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Use correct order for the parameters of devm_kcalloc()
  ice: remove u16 arithmetic in ice_gnss
  ice: remove VLAN representor specific ops
  ice: don't set VF VLAN caps in switchdev
====================

Link: https://lore.kernel.org/r/20220608160757.2395729-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ce1d8e74

Merge branch 'net-few-debug-refinements' · 3000024c

Jakub Kicinski authored Jun 09, 2022

Eric Dumazet says:

====================
net: few debug refinements

Adopt DEBUG_NET_WARN_ON_ONCE() or WARN_ON_ONCE()
in some places where it makes sense.

Add checks in napi_consume_skb() and __napi_alloc_skb()

Make sure napi_get_frags() does not use page fragments
for skb->head.
====================

Link: https://lore.kernel.org/r/20220608160438.1342569-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3000024c

net: add napi_get_frags_check() helper · fd9ea57f

Eric Dumazet authored Jun 08, 2022

This is a follow up of commit 3226b158
("net: avoid 32 x truesize under-estimation for tiny skbs")

When/if we increase MAX_SKB_FRAGS, we better make sure
the old bug will not come back.

Adding a check in napi_get_frags() would be costly,
even if using DEBUG_NET_WARN_ON_ONCE().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fd9ea57f

net: add debug checks in napi_consume_skb and __napi_alloc_skb() · ee2640df

Eric Dumazet authored Jun 08, 2022

Commit 6454eca8 ("net: Use lockdep_assert_in_softirq()
in napi_consume_skb()") added a check in napi_consume_skb()
which is a bit weak.

napi_consume_skb() and __napi_alloc_skb() should only
be used from BH context, not from hard irq or nmi context,
otherwise we could have races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ee2640df

net: use DEBUG_NET_WARN_ON_ONCE() in skb_release_head_state() · 7890e2f0

Eric Dumazet authored Jun 08, 2022

Remove this check from fast path unless CONFIG_DEBUG_NET=y
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7890e2f0

af_unix: use DEBUG_NET_WARN_ON_ONCE() · dd29c67d

Eric Dumazet authored Jun 08, 2022

Replace four WARN_ON() that have not triggered recently
with DEBUG_NET_WARN_ON_ONCE().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dd29c67d

net: use WARN_ON_ONCE() in sk_stream_kill_queues() · c59f02f8

Eric Dumazet authored Jun 08, 2022

sk_stream_kill_queues() has three checks which have been
useful to detect kernel bugs in the past.

However they are potentially a problem because they
could flood the syslog.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c59f02f8

net: use WARN_ON_ONCE() in inet_sock_destruct() · 3e7f2b8d

Eric Dumazet authored Jun 08, 2022

inet_sock_destruct() has four warnings which have been
useful to point to kernel bugs in the past.

However they are potentially a problem because they
could flood the syslog.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3e7f2b8d

net: use DEBUG_NET_WARN_ON_ONCE() in dev_loopback_xmit() · 76458fae

Eric Dumazet authored Jun 08, 2022

One check in dev_loopback_xmit() has not caught issues
in the past.

Keep it for CONFIG_DEBUG_NET=y builds only.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

76458fae

net: use DEBUG_NET_WARN_ON_ONCE() in __release_sock() · 63fbdd3c

Eric Dumazet authored Jun 08, 2022

Check against skb dst in socket backlog has never triggered
in past years.

Keep the check omly for CONFIG_DEBUG_NET=y builds.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

63fbdd3c

Merge branch 'net-adopt-u64_stats_t-type' · f5f37fc9

Jakub Kicinski authored Jun 09, 2022

Eric Dumazet says:

====================
net: adopt u64_stats_t type

While KCSAN has not raised any reports yet, we should address the
potential load/store tearing problem happening with per cpu stats.

This series is not exhaustive, but hopefully a step in the right
direction.
====================

Link: https://lore.kernel.org/r/20220608154640.1235958-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f5f37fc9

team: adopt u64_stats_t · 9ec321ab

Eric Dumazet authored Jun 08, 2022

As explained in commit 316580b6 ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9ec321ab

drop_monitor: adopt u64_stats_t · c6cce71e

Eric Dumazet authored Jun 08, 2022

As explained in commit 316580b6 ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c6cce71e

devlink: adopt u64_stats_t · 958751e0

Eric Dumazet authored Jun 08, 2022

As explained in commit 316580b6 ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

958751e0

net: adopt u64_stats_t in struct pcpu_sw_netstats · 9962acef

Eric Dumazet authored Jun 08, 2022

As explained in commit 316580b6 ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9962acef

wireguard: receive: use dev_sw_netstats_rx_add() · eeb15885

Eric Dumazet authored Jun 08, 2022

We have a convenient helper, let's use it.
This will make the following patch easier to review and smaller.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eeb15885

ip6_tunnel: use dev_sw_netstats_rx_add() · afd2051b

Eric Dumazet authored Jun 08, 2022

We have a convenient helper, let's use it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

afd2051b

sit: use dev_sw_netstats_rx_add() · 3a960ca7

Eric Dumazet authored Jun 08, 2022

We have a convenient helper, let's use it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3a960ca7

ipvlan: adopt u64_stats_t · 5665f48e

Eric Dumazet authored Jun 08, 2022

As explained in commit 316580b6 ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.

Add READ_ONCE() when reading rx_errs & tx_drps.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5665f48e

vlan: adopt u64_stats_t · 09cca53c

Eric Dumazet authored Jun 08, 2022

As explained in commit 316580b6 ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.

Add READ_ONCE() when reading rx_errors & tx_dropped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

09cca53c

net: rename reference+tracking helpers · d62607c3

Jakub Kicinski authored Jun 07, 2022

Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
 dev_hold_track()    -> netdev_hold()
 dev_put_track()     -> netdev_put()
 dev_replace_track() -> netdev_ref_replace()

Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d62607c3

09 Jun, 2022 17 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · a98a62e4
Jakub Kicinski authored Jun 09, 2022
```
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
```
a98a62e4

Merge tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 825464e7

Linus Torvalds authored Jun 09, 2022

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - eth: amt: fix possible null-ptr-deref in amt_rcv()

  Previous releases - regressions:

   - tcp: use alloc_large_system_hash() to allocate table_perturb

   - af_unix: fix a data-race in unix_dgram_peer_wake_me()

   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling

   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF

  Previous releases - always broken:

   - ipv6: fix signed integer overflow in __ip6_append_data

   - netfilter:
       - nat: really support inet nat without l3 address
       - nf_tables: memleak flow rule from commit path

   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs

   - openvswitch: fix misuse of the cached connection on tuple changes

   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred

   - eth: altera: fix refcount leak in altera_tse_mdio_create

  Misc:

   - add Quentin Monnet to bpftool maintainers"

* tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  net: amd-xgbe: fix clang -Wformat warning
  tcp: use alloc_large_system_hash() to allocate table_perturb
  net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY
  net: dsa: mv88e6xxx: correctly report serdes link failure
  net: dsa: mv88e6xxx: fix BMSR error to be consistent with others
  net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
  net: altera: Fix refcount leak in altera_tse_mdio_create
  net: openvswitch: fix misuse of the cached connection on tuple changes
  net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag
  ip_gre: test csum_start instead of transport header
  au1000_eth: stop using virt_to_bus()
  ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
  ipv6: Fix signed integer overflow in __ip6_append_data
  nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred
  nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
  nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
  nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
  net: ipv6: unexport __init-annotated seg6_hmac_init()
  net: xfrm: unexport __init-annotated xfrm4_protocol_init()
  net: mdio: unexport __init-annotated mdio_bus_init()
  ...

825464e7

netfs: gcc-12: temporarily disable '-Wattribute-warning' for now · 507160f4

Linus Torvalds authored Jun 09, 2022

This is a pure band-aid so that I can continue merging stuff from people
while some of the gcc-12 fallout gets sorted out.

In particular, gcc-12 is very unhappy about the kinds of pointer
arithmetic tricks that netfs does, and that makes the fortify checks
trigger in afs and ceph:

  In function ‘fortify_memset_chk’,
      inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2,
      inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2,
      inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2:
  include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    258 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and the reason is that netfs_i_context_init() is passed a 'struct inode'
pointer, and then it does

        struct netfs_i_context *ctx = netfs_i_context(inode);

        memset(ctx, 0, sizeof(*ctx));

where that netfs_i_context() function just does pointer arithmetic on
the inode pointer, knowing that the netfs_i_context is laid out
immediately after it in memory.

This is all truly disgusting, since the whole "netfs_i_context is laid
out immediately after it in memory" is not actually remotely true in
general, but is just made to be that way for afs and ceph.

See for example fs/cifs/cifsglob.h:

  struct cifsInodeInfo {
        struct {
                /* These must be contiguous */
                struct inode    vfs_inode;      /* the VFS's inode record */
                struct netfs_i_context netfs_ctx; /* Netfslib context */
        };
	[...]

and realize that this is all entirely wrong, and the pointer arithmetic
that netfs_i_context() is doing is also very very wrong and wouldn't
give the right answer if netfs_ctx had different alignment rules from a
'struct inode', for example).

Anyway, that's just a long-winded way to say "the gcc-12 warning is
actually quite reasonable, and our code happens to work but is pretty
disgusting".

This is getting fixed properly, but for now I made the mistake of
thinking "the week right after the merge window tends to be calm for me
as people take a breather" and I did a sustem upgrade.  And I got gcc-12
as a result, so to continue merging fixes from people and not have the
end result drown in warnings, I am fixing all these gcc-12 issues I hit.

Including with these kinds of temporary fixes.

Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

507160f4

gcc-12: disable '-Warray-bounds' universally for now · f0be87c4

Linus Torvalds authored Jun 09, 2022

In commit 8b202ee2 ("s390: disable -Warray-bounds") the s390 people
disabled the '-Warray-bounds' warning for gcc-12, because the new logic
in gcc would cause warnings for their use of the S390_lowcore macro,
which accesses absolute pointers.

It turns out gcc-12 has many other issues in this area, so this takes
that s390 warning disable logic, and turns it into a kernel build config
entry instead.

Part of the intent is that we can make this all much more targeted, and
use this conflig flag to disable it in only particular configurations
that cause problems, with the s390 case as an example:

        select GCC12_NO_ARRAY_BOUNDS

and we could do that for other configuration cases that cause issues.

Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more
targeted way, and disable the warning only for particular uses: again
the s390 case as an example:

  KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds)

but this ends up just doing it globally in the top-level Makefile, since
the current issues are spread fairly widely all over:

  KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds

We'll try to limit this later, since the gcc-12 problems are rare enough
that *much* of the kernel can be built with it without disabling this
warning.

Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f0be87c4

mellanox: mlx5: avoid uninitialized variable warning with gcc-12 · 842c3b3d

Linus Torvalds authored Jun 09, 2022

gcc-12 started warning about 'tracker' being used uninitialized:

  drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c: In function ‘mlx5_do_bond’:
  drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c:786:28: warning: ‘tracker’ is used uninitialized [-Wuninitialized]
    786 |         struct lag_tracker tracker;
        |                            ^~~~~~~

which seems to be because it doesn't track how the use (and
initialization) is bound by the 'do_bond' flag.

But admittedly that 'do_bond' usage is fairly complicated, and involves
passing it around as an argument to helper functions, so it's somewhat
understandable that gcc doesn't see how that all works.

This function could be rewritten to make the use of that tracker
variable more obviously safe, but for now I'm just adding the forced
initialization of it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

842c3b3d

gcc-12: disable '-Wdangling-pointer' warning for now · 49beadbd

Linus Torvalds authored Jun 09, 2022

While the concept of checking for dangling pointers to local variables
at function exit is really interesting, the gcc-12 implementation is not
compatible with reality, and results in false positives.

For example, gcc sees us putting things on a local list head allocated
on the stack, which involves exactly those kinds of pointers to the
local stack entry:

  In function ‘__list_add’,
      inlined from ‘list_add_tail’ at include/linux/list.h:102:2,
      inlined from ‘rebuild_snap_realms’ at fs/ceph/snap.c:434:2:
  include/linux/list.h:74:19: warning: storing the address of local variable ‘realm_queue’ in ‘*&realm_27(D)->rebuild_item.prev’ [-Wdangling-pointer=]
     74 |         new->prev = prev;
        |         ~~~~~~~~~~^~~~~~

But then gcc - understandably - doesn't really understand the big
picture how the doubly linked list works, so doesn't see how we then end
up emptying said list head in a loop and the pointer we added has been
removed.

Gcc also complains about us (intentionally) using this as a way to store
a kind of fake stack trace, eg

  drivers/acpi/acpica/utdebug.c:40:38: warning: storing the address of local variable ‘current_sp’ in ‘acpi_gbl_entry_stack_pointer’ [-Wdangling-pointer=]
     40 |         acpi_gbl_entry_stack_pointer = &current_sp;
        |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~

which is entirely reasonable from a compiler standpoint, and we may want
to change those kinds of patterns, but not not.

So this is one of those "it would be lovely if the compiler were to
complain about us leaving dangling pointers to the stack", but not this
way.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

49beadbd

drm: imx: fix compiler warning with gcc-12 · 7aefd8b5

Linus Torvalds authored Jun 08, 2022

Gcc-12 correctly warned about this code using a non-NULL pointer as a
truth value:

  drivers/gpu/drm/imx/ipuv3-crtc.c: In function ‘ipu_crtc_disable_planes’:
  drivers/gpu/drm/imx/ipuv3-crtc.c:72:21: error: the comparison will always evaluate as ‘true’ for the address of ‘plane’ will never be NULL [-Werror=address]
     72 |                 if (&ipu_crtc->plane[1] && plane == &ipu_crtc->plane[1]->base)
        |                     ^

due to the extraneous '&' address-of operator.

Philipp Zabel points out that The mistake had no adverse effect since
the following condition doesn't actually dereference the NULL pointer,
but the intent of the code was obviously to check for it, not to take
the address of the member.

Fixes: eb8c8880 ("drm/imx: add deferred plane disabling")
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7aefd8b5

net: macb: change return type for gem_ptp_set_one_step_sync() · 263efe85

Claudiu Beznea authored Jun 08, 2022

gem_ptp_set_one_step_sync() always returns zero thus change its return
type to void.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220608080818.1495044-1-claudiu.beznea@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

263efe85

Merge branch 'vmxnet3-upgrade-to-version-7' · e4c437cd

Paolo Abeni authored Jun 09, 2022

Ronak Doshi says:

====================
vmxnet3: upgrade to version 7

vmxnet3 emulation has recently added several new features including
support for uniform passthrough(UPT). To make UPT work vmxnet3 has
to be enhanced as per the new specification. This patch series
extends the vmxnet3 driver to leverage these new features.

Compatibility is maintained using existing vmxnet3 versioning mechanism as
follows:
 - new features added to vmxnet3 emulation are associated with new vmxnet3
   version viz. vmxnet3 version 7.
 - emulation advertises all the versions it supports to the driver.
 - during initialization, vmxnet3 driver picks the highest version number
 supported by both the emulation and the driver and configures emulation
 to run at that version.

In particular, following changes are introduced:

Patch 1:
  This patch introduces utility macros for vmxnet3 version 7 comparison
  and updates Copyright information.

Patch 2:
  This patch adds new capability registers to fine control enablement of
  individual features based on emulation and passthrough.

Patch 3:
  This patch adds support for large passthrough BAR register.

Patch 4:
  This patch adds support for out of order rx completion processing.

Patch 5:
  This patch introduces new command to set ring buffer sizes to pass this
  information to the hardware.

Patch 6:
  For better performance, hardware has a requirement to limit number of TSO
  descriptors. This patch adds that support.

Patch 7:
  With vmxnet3 version 7, new descriptor fields are used to indicate
  encapsulation offload.

Patch 8:
  With all vmxnet3 version 7 changes incorporated in the vmxnet3 driver,
  with this patch, the driver can configure emulation to run at vmxnet3
  version 7.

Changes in v2->v3:
 - use correct byte ordering for ringBufSize

Changes in v2:
 - use local rss_fields variable for the rss capability checks in patch 2
====================

Link: https://lore.kernel.org/r/20220608032353.964-1-doshir@vmware.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

e4c437cd

vmxnet3: update to version 7 · acc38e04

Ronak Doshi authored Jun 07, 2022

With all vmxnet3 version 7 changes incorporated in the vmxnet3 driver,
the driver can configure emulation to run at vmxnet3 version 7, provided
the emulation advertises support for version 7.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

acc38e04

vmxnet3: use ext1 field to indicate encapsulated packet · 60cafa03

Ronak Doshi authored Jun 07, 2022

Till vmxnet3 version 6, om field of transmit descriptor was used
to indicate encapsulated offload packet and msscof was used to
indirectly indicate TSO/CSO. From version 7 and later, ext1 field
will be used to indicate whether packet is encapsulated or not and
om fields will continue to indicate if the packet is TSO or CSO.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

60cafa03

vmxnet3: limit number of TXDs used for TSO packet · d2857b99

Ronak Doshi authored Jun 07, 2022

Currently, vmxnet3 does not have a limit on number of descriptors
used for a TSO packet. However, with UPT, for hardware performance
reasons, this patch limits the number of transmit descriptors to 24
for a TSO packet.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

d2857b99

vmxnet3: add command to set ring buffer sizes · c7112ebd

Ronak Doshi authored Jun 07, 2022

This patch adds a new command to set ring buffer sizes. This is
required to pass the buffer size information to passthrough devices.
For performance reasons, with version7 and later, ring1 will contain
only mtu size buffers (bound to 3K). Packets > 3K will use both ring1
and ring2.

Also, ring sizes are round down to power of 2 and ring2 default
size is increased to 512.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

c7112ebd

vmxnet3: add support for out of order rx completion · 2c5a5748

Ronak Doshi authored Jun 07, 2022

Currently, vmxnet3 processes rx completions in-order i.e. no
out of order completion descriptor is expected. With UPT, if
hardware supports LRO, then hardware can report out of order
rx completions. This patch enhances vmxnet3 to add this support.
This supports gets effective only when the corresponding feature
bit is set.

Also, minor enhancements are done for performance.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

2c5a5748

vmxnet3: add support for large passthrough BAR register · 543fb674

Ronak Doshi authored Jun 07, 2022

For vmxnet3 to work in UPT mode, the BAR sizes have been increased.
The PT page has been extended to 2 pages and also includes OOB pages
as a part of PT BAR. This patch enhances vmxnet3 to use appropriate
BAR offsets based on the capability registered. To use new offsets,
VMXNET3_CAP_LARGE_BAR needs to be set by the device. If it is not set
then the device will use legacy PT page layout.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

543fb674

vmxnet3: add support for capability registers · 6f91f4ba

Ronak Doshi authored Jun 07, 2022

This patch enhances vmxnet3 to suuport capability registers which
allows it to enable features selectively. The DCR register tracks
the capabilities vmxnet3 device supports. The PTCR register states
the capabilities that the passthrough device supports.

With the help of these registers, vmxnet3 can enable only those
features which the passthrough device supoprts. This allows
smooth trasition to Uniform-Passthrough (UPT) mode if the virtual
nic requests it. If PTCR register returns nothing or error it means
UPT is not being requested and vnic will continue in emulation mode.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

6f91f4ba

vmxnet3: prepare for version 7 changes · 55f0395f

Ronak Doshi authored Jun 07, 2022

vmxnet3 is currently at version 6 and this patch initiates the
preparation to accommodate changes for upto version 7. Introduced
utility macros for vmxnet3 version 7 comparison and update Copyright
information.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

55f0395f