- 09 Jan, 2021 10 commits
-
-
Aya Levin authored
There are cases where GSO segment's length exceeds the egress MTU: - Forwarding of a TCP GRO skb, when DF flag is not set. - Forwarding of an skb that arrived on a virtualisation interface (virtio-net/vhost/tap) with TSO/GSO size set by other network stack. - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an interface with a smaller MTU. - Arriving GRO skb (or GSO skb in a virtualised environment) that is bridged to a NETIF_F_TSO tunnel stacked over an interface with an insufficient MTU. If so: - Consume the SKB and its segments. - Issue an ICMP packet with 'Packet Too Big' message containing the MTU, allowing the source host to reduce its Path MTU appropriately. Note: These cases are handled in the same manner in IPv4 output finish. This patch aligns the behavior of IPv6 and the one of IPv4. Fixes: 9e508490 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Manish Chopra authored
For all PCI functions on the netxen_nic adapter, interrupt mode (INTx or MSI) configuration is dependent on what has been configured by the PCI function zero in the shared interrupt register, as these adapters do not support mixed mode interrupts among the functions of a given adapter. Logic for setting MSI/MSI-x interrupt mode in the shared interrupt register based on PCI function id zero check is not appropriate for all family of netxen adapters, as for some of the netxen family adapters PCI function zero is not really meant to be probed/loaded in the host but rather just act as a management function on the device, which caused all the other PCI functions on the adapter to always use legacy interrupt (INTx) mode instead of choosing MSI/MSI-x interrupt mode. This patch replaces that check with port number so that for all type of adapters driver attempts for MSI/MSI-x interrupt modes. Fixes: b37eb210 ("netxen_nic: Avoid mixed mode interrupts") Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Link: https://lore.kernel.org/r/20210107101520.6735-1-manishc@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jakub Kicinski says: ==================== net: fix issues around register_netdevice() failures This series attempts to clean up the life cycle of struct net_device. Dave has added dev->needs_free_netdev in the past to fix double frees, we can lean on that mechanism a little more to fix remaining issues with register_netdevice(). This is the next chapter of the saga which already includes: commit 0e0eee24 ("net: correct error path in rtnl_newlink()") commit e51fb152 ("rtnetlink: fix a memory leak when ->newlink fails") commit cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.") commit 93ee31f1 ("[NET]: Fix free_netdev on register_netdev failure.") commit 814152a8 ("net: fix memleak in register_netdevice()") commit 10cc514f ("net: Fix null de-reference of device refcount") The immediate problem which gets fixed here is that calling free_netdev() right after unregister_netdevice() is illegal because we need to release rtnl_lock first, to let the unregistration finish. Note that unregister_netdevice() is just a wrapper of unregister_netdevice_queue(), it only does half of the job. Where this limitation becomes most problematic is in failure modes of register_netdevice(). There is a notifier call right at the end of it, which lets other subsystems veto the entire thing. At which point we should really go through a full unregister_netdevice(), but we can't because callers may go straight to free_netdev() after the failure, and that's no bueno (see the previous paragraph). This set makes free_netdev() more lenient, when device is still being unregistered free_netdev() will simply set dev->needs_free_netdev and let the unregister process do the freeing. With the free_netdev() problem out of the way failures in register_netdevice() can make use of net_todo, again. Users are still expected to call free_netdev() right after failure but that will only set dev->needs_free_netdev. To prevent the pathological case of: dev->needs_free_netdev = true; if (register_netdevice(dev)) { rtnl_unlock(); free_netdev(dev); } make register_netdevice()'s failure clear dev->needs_free_netdev. Problems described above are only present with register_netdevice() / unregister_netdevice(). We have two parallel APIs for registration of devices: - those called outside rtnl_lock (register_netdev(), and unregister_netdev()); - and those to be used under rtnl_lock - register_netdevice() and unregister_netdevice(). The former is trivial and has no problems. The alternative approach to fix the latter would be to also separate the freeing functions - i.e. add free_netdevice(). This has been implemented (incl. converting all relevant calls in the tree) but it feels a little unnecessary to put the burden of choosing the right free_netdev{,ice}() call on the programmer when we can "just do the right thing" by default. ==================== Link: https://lore.kernel.org/r/20210106184007.1821480-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
If register_netdevice() fails at the very last stage - the notifier call - some subsystems may have already seen it and grabbed a reference. struct net_device can't be freed right away without calling netdev_wait_all_refs(). Now that we have a clean interface in form of dev->needs_free_netdev and lenient free_netdev() we can undo what commit 93ee31f1 ("[NET]: Fix free_netdev on register_netdev failure.") has done and complete the unregistration path by bringing the net_set_todo() call back. After registration fails user is still expected to explicitly free the net_device, so make sure ->needs_free_netdev is cleared, otherwise rolling back the registration will cause the old double free for callers who release rtnl_lock before the free. This also solves the problem of priv_destructor not being called on notifier error. net_set_todo() will be moved back into unregister_netdevice_queue() in a follow up. Reported-by: Hulk Robot <hulkci@huawei.com> Reported-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
There are two flavors of handling netdev registration: - ones called without holding rtnl_lock: register_netdev() and unregister_netdev(); and - those called with rtnl_lock held: register_netdevice() and unregister_netdevice(). While the semantics of the former are pretty clear, the same can't be said about the latter. The netdev_todo mechanism is utilized to perform some of the device unregistering tasks and it hooks into rtnl_unlock() so the locked variants can't actually finish the work. In general free_netdev() does not mix well with locked calls. Most drivers operating under rtnl_lock set dev->needs_free_netdev to true and expect core to make the free_netdev() call some time later. The part where this becomes most problematic is error paths. There is no way to unwind the state cleanly after a call to register_netdevice(), since unreg can't be performed fully without dropping locks. Make free_netdev() more lenient, and defer the freeing if device is being unregistered. This allows error paths to simply call free_netdev() both after register_netdevice() failed, and after a call to unregister_netdevice() but before dropping rtnl_lock. Simplify the error paths which are currently doing gymnastics around free_netdev() handling. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Explain the two basic flows of struct net_device's operation. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tom Parkin authored
When setting up a channel bridge, ppp_bridge_channels sets the pch->bridge field before taking the associated reference on the bridge file instance. This opens up a refcount underflow bug if ppp_bridge_channels called via. iotcl runs concurrently with ppp_unbridge_channels executing via. file release. The bug is triggered by ppp_bridge_channels taking the error path through the 'err_unset' label. In this scenario, pch->bridge is set, but the reference on the bridged channel will not be taken because the function errors out. If ppp_unbridge_channels observes pch->bridge before it is unset by the error path, it will erroneously drop the reference on the bridged channel and cause a refcount underflow. To avoid this, ensure that ppp_bridge_channels holds a reference on each channel in advance of setting the bridge pointers. Signed-off-by: Tom Parkin <tparkin@katalix.com> Fixes: 4cf476ce ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls") Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20210107181315.3128-1-tparkin@katalix.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Baptiste Lepers authored
reuse->socks[] is modified concurrently by reuseport_add_sock. To prevent reading values that have not been fully initialized, only read the array up until the last known safe index instead of incorrectly re-reading the last index of the array. Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets") Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dongseok Yi authored
skbs in fraglist could be shared by a BPF filter loaded at TC. If TC writes, it will call skb_ensure_writable -> pskb_expand_head to create a private linear section for the head_skb. And then call skb_clone_fraglist -> skb_get on each skb in the fraglist. skb_segment_list overwrites part of the skb linear section of each fragment itself. Even after skb_clone, the frag_skbs share their linear section with their clone in PF_PACKET. Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have a link for the same frag_skbs chain. If a new skb (not frags) is queued to one of the sk_receive_queue, multiple ptypes can see and release this. It causes use-after-free. [ 4443.426215] ------------[ cut here ]------------ [ 4443.426222] refcount_t: underflow; use-after-free. [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 [ 4443.426808] Call trace: [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426823] skb_release_data+0x144/0x264 [ 4443.426828] kfree_skb+0x58/0xc4 [ 4443.426832] skb_queue_purge+0x64/0x9c [ 4443.426844] packet_set_ring+0x5f0/0x820 [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 [ 4443.426853] __sys_setsockopt+0x188/0x278 [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 [ 4443.426869] el0_svc_common+0xf0/0x1d0 [ 4443.426873] el0_svc_handler+0x74/0x98 [ 4443.426880] el0_svc+0x8/0xc Fixes: 3a1296a3 (net: Support GRO/GSO fraglist chaining.) Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Stephan Gerhold authored
At the moment it is quite hard to identify the network interface provided by IPA in userspace components: The network interface is created as virtual device, without any link to the IPA device. The interface name ("rmnet_ipa%d") is the only indication that the network interface belongs to IPA, but this is not very reliable. Add SET_NETDEV_DEV() to associate the network interface with the IPA parent device. This allows userspace services like ModemManager to properly identify that this network interface is provided by IPA and belongs to the modem. Cc: Alex Elder <elder@kernel.org> Fixes: a646d6ec ("soc: qcom: ipa: modem and microcontroller") Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20210106100755.56800-1-stephan@gerhold.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 08 Jan, 2021 25 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull more networking fixes from Jakub Kicinski: "Slightly lighter pull request to get back into the Thursday cadence. Current release - always broken: - can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions - dsa: hellcreek: fix led_classdev build errors Previous releases - regressions: - ipv6: fib: flush exceptions when purging route to avoid netdev reference leak - ip_tunnels: fix pmtu check in nopmtudisc mode - ip: always refragment ip defragmented packets to avoid MTU issues when forwarding through tunnels, correct "packet too big" message is prohibitively tricky to generate - s390/qeth: fix locking for discipline setup / removal and during recovery to prevent both deadlocks and races - mlx5: Use port_num 1 instead of 0 when delete a RoCE address Previous releases - always broken: - cdc_ncm: correct overhead calculation in delayed_ndp_size to prevent out of bound accesses with Huawei 909s-120 LTE module - fix stmmac dwmac-sun8i suspend/resume: - PHY being left powered off - MAC syscon configuration being reset - reference to the reset controller being improperly dropped - qrtr: fix null-ptr-deref in qrtr_ns_remove - can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver - mlx5e: CT: Use per flow counter when CT flow accounting is enabled - mlx5e: Fix SWP offsets when vlan inserted by driver Misc: - bpf: Fix a task_iter bug caused by a bpf -> net merge conflict resolution And the usual many fixes to various error paths" * tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE s390/qeth: fix L2 header access in qeth_l3_osa_features_check() s390/qeth: fix locking for discipline setup / removal s390/qeth: fix deadlock during recovery selftests: fib_nexthops: Fix wrong mausezahn invocation nexthop: Bounce NHA_GATEWAY in FDB nexthop groups nexthop: Unlink nexthop group entry in error path nexthop: Fix off-by-one error in error path octeontx2-af: fix memory leak of lmac and lmac->name chtls: Fix chtls resources release sequence chtls: Added a check to avoid NULL pointer dereference chtls: Replace skb_dequeue with skb_peek chtls: Avoid unnecessary freeing of oreq pointer chtls: Fix panic when route to peer not configured chtls: Remove invalid set_tcb call chtls: Fix hardware tid leak net: ip: always refragment ip defragmented packets net: fix pmtu check in nopmtudisc mode selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking docs: octeontx2: tune rst markup ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fixes from Herbert Xu: "This fixes a functional bug in arm/chacha-neon as well as a potential buffer overflow in ecdh" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ecdh - avoid buffer overflow in ecdh_set_secret() crypto: arm/chacha-neon - add missing counter increment
-
Linus Torvalds authored
The kernel test robot reported a -5.8% performance regression on the "poll2" test of will-it-scale, and bisected it to commit d55564cf ("x86: Make __put_user() generate an out-of-line call"). I didn't expect an out-of-line __put_user() to matter, because no normal core code should use that non-checking legacy version of user access any more. But I had overlooked the very odd poll() usage, which does a __put_user() to update the 'revents' values of the poll array. Now, Al Viro correctly points out that instead of updating just the 'revents' field, it would be much simpler to just copy the _whole_ pollfd entry, and then we could just use "copy_to_user()" on the whole array of entries, the same way we use "copy_from_user()" a few lines earlier to get the original values. But that is not what we've traditionally done, and I worry that threaded applications might be concurrently modifying the other fields of the pollfd array. So while Al's suggestion is simpler - and perhaps worth trying in the future - this instead keeps the "just update revents" model. To fix the performance regression, use the modern "unsafe_put_user()" instead of __put_user(), with the proper "user_write_access_begin()" guarding in place. This improves code generation enormously. Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Laight <David.Laight@aculab.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Petr Mladek authored
This reverts commit 757055ae. The commit caused that ttynull was used as the default console on several systems[1][2][3]. As a result, the console was blank even when a better alternative existed. It happened when there was no console configured on the command line and ttynull_init() was the first initcall calling register_console(). Or it happened when /dev/ did not exist when console_on_rootfs() was called. It was not able to open /dev/console even though a console driver was registered. It tried to add ttynull console but it obviously did not help. But ttynull became the preferred console and was used by /dev/console when it was available later. The commit tried to fix a historical problem that have been there for ages. The primary motivation was the commit 3cffa06a ("printk/console: Allow to disable console output by using console="" or console=null"). It provided a clean solution for a workaround that was widely used and worked only by chance. This revert causes that the console="" or console=null command line options will again work only by chance. These options will cause that a particular console will be preferred and the default (tty) ones will not get enabled. There will be no console registered at all. As a result there won't be stdin, stdout, and stderr for the init process. But it worked exactly this way even before. The proper solution has to fulfill many conditions: + Register ttynull only when explicitly required or as the ultimate fallback. + ttynull should get associated with /dev/console but it must not become preferred console when used as a fallback. Especially, it must still be possible to replace it by a better console later. Such a change requires clean up of the register_console() code. Otherwise, it would be even harder to follow. Especially, the use of has_preferred_console and CON_CONSDEV flag is tricky. The clean up is risky. The ordering of consoles is not well defined. And any changes tend to break existing user settings. Do the revert at the least risky solution for now. [1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/ [2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/ [3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Vineet Gupta <vgupta@synopsys.com> Reported-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxJakub Kicinski authored
Saeed Mahameed says: ==================== mlx5 fixes 2021-01-07 * tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups net/mlx5e: Fix two double free cases net/mlx5: Release devlink object if adev fails net/mlx5e: ethtool, Fix restriction of autoneg with 56G net/mlx5e: In skb build skip setting mark in switchdev mode net/mlx5: E-Switch, fix changing vf VLANID net/mlx5e: Fix SWP offsets when vlan inserted by driver net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address net/mlx5e: Add missing capability check for uplink follow net/mlx5: Check if lag is supported before creating one ==================== Link: https://lore.kernel.org/r/20210107202845.470205-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Aleksander Jan Bajkowski authored
Exclude RMII from modes that report 1 GbE support. Reduced MII supports up to 100 MbE. Fixes: 14fceff4 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210107195818.3878-1-olek2@wp.plSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Julian Wiedmann says: ==================== s390/qeth: fixes 2021-01-07 This brings two locking fixes for the device control path. Also one fix for a path where our .ndo_features_check() attempts to access a non-existent L2 header. ==================== Link: https://lore.kernel.org/r/20210107172442.1737-1-jwi@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Julian Wiedmann authored
ip_finish_output_gso() may call .ndo_features_check() even before the skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt to inspect the L2 header via vlan_eth_hdr(). Switch to vlan_get_protocol(), as already used further down in the common qeth_features_check() path. Fixes: f13ade19 ("s390/qeth: run non-offload L3 traffic over common xmit path") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Julian Wiedmann authored
Due to insufficient locking, qeth_core_set_online() and qeth_dev_layer2_store() can run in parallel, both attempting to load & setup the discipline (and stepping on each other toes along the way). A similar race can also occur between qeth_core_remove_device() and qeth_dev_layer2_store(). Access to .discipline is meant to be protected by the discipline_mutex, so add/expand the locking in qeth_core_remove_device() and qeth_core_set_online(). Adjust the locking in qeth_l*_remove_device() accordingly, as it's now handled by the callers in a consistent manner. Based on an initial patch by Ursula Braun. Fixes: 9dc48ccc ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Julian Wiedmann authored
When qeth_dev_layer2_store() - holding the discipline_mutex - waits inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete, we can hit a deadlock if qeth_do_reset() concurrently calls qeth_set_online() and thus tries to aquire the discipline_mutex. Move the discipline_mutex locking outside of qeth_set_online() and qeth_set_offline(), and turn the discipline into a parameter so that callers understand the dependency. To fix the deadlock, we can now relax the locking: As already established, qeth_l*_remove_device() waits for qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk of having card->discipline ripped out while it's running, and thus doesn't need to take the discipline_mutex. Fixes: 9dc48ccc ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ido Schimmel says: ==================== nexthop: Various fixes This series contains various fixes for the nexthop code. The bugs were uncovered during the development of resilient nexthop groups. Patches #1-#2 fix the error path of nexthop_create_group(). I was not able to trigger these bugs with current code, but it is possible with the upcoming resilient nexthop groups code which adds a user controllable memory allocation further in the function. Patch #3 fixes wrong validation of netlink attributes. Patch #4 fixes wrong invocation of mausezahn in a selftest. ==================== Link: https://lore.kernel.org/r/20210107144824.1135691-1-idosch@idosch.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
For IPv6 traffic, mausezahn needs to be invoked with '-6'. Otherwise an error is returned: # ip netns exec me mausezahn veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn" Failed to set source IPv4 address. Please check if source is set to a valid IPv4 address. Invalid command line parameters! Fixes: 7c741868 ("selftests: Add torture tests to nexthop tests") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
The function nh_check_attr_group() is called to validate nexthop groups. The intention of that code seems to have been to bounce all attributes above NHA_GROUP_TYPE except for NHA_FDB. However instead it bounces all these attributes except when NHA_FDB attribute is present--then it accepts them. NHA_FDB validation that takes place before, in rtm_to_nh_config(), already bounces NHA_OIF, NHA_BLACKHOLE, NHA_ENCAP and NHA_ENCAP_TYPE. Yet further back, NHA_GROUPS and NHA_MASTER are bounced unconditionally. But that still leaves NHA_GATEWAY as an attribute that would be accepted in FDB nexthop groups (with no meaning), so long as it keeps the address family as unspecified: # ip nexthop add id 1 fdb via 127.0.0.1 # ip nexthop add id 10 fdb via default group 1 The nexthop code is still relatively new and likely not used very broadly, and the FDB bits are newer still. Even though there is a reproducer out there, it relies on an improbable gateway arguments "via default", "via all" or "via any". Given all this, I believe it is OK to reformulate the condition to do the right thing and bounce NHA_GATEWAY. Fixes: 38428d68 ("nexthop: support for fdb ecmp nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
In case of error, remove the nexthop group entry from the list to which it was previously added. Fixes: 430a0491 ("nexthop: Add support for nexthop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
A reference was not taken for the current nexthop entry, so do not try to put it in the error path. Fixes: 430a0491 ("nexthop: Add support for nexthop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Colin Ian King authored
Currently the error return paths don't kfree lmac and lmac->name leading to some memory leaks. Fix this by adding two error return paths that kfree these objects Addresses-Coverity: ("Resource leak") Fixes: 1463f382 ("octeontx2-af: Add support for CGX link management") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210107123916.189748-1-colin.king@canonical.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ayush Sawal says: ==================== Bug fixes for chtls driver patch 1: Fix hardware tid leak. patch 2: Remove invalid set_tcb call. patch 3: Fix panic when route to peer not configured. patch 4: Avoid unnecessary freeing of oreq pointer. patch 5: Replace skb_dequeue with skb_peek. patch 6: Added a check to avoid NULL pointer dereference patch. patch 7: Fix chtls resources release sequence. ==================== Link: https://lore.kernel.org/r/20210106042912.23512-1-ayush.sawal@chelsio.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
CPL_ABORT_RPL is sent after releasing the resources by calling chtls_release_resources(sk); and chtls_conn_done(sk); eventually causing kernel panic. Fixing it by calling release in appropriate order. Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
In case of server removal lookup_stid() may return NULL pointer, which is used as listen_ctx. So added a check before accessing this pointer. Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
The skb is unlinked twice, one in __skb_dequeue in function chtls_reset_synq() and another in cleanup_syn_rcv_conn(). So in this patch using skb_peek() instead of __skb_dequeue(), so that unlink will be handled only in cleanup_syn_rcv_conn(). Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
In chtls_pass_accept_request(), removing the chtls_reqsk_free() call to avoid oreq freeing twice. Here oreq is the pointer to struct request_sock. Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
If route to peer is not configured, we might get non tls devices from dst_neigh_lookup() which is invalid, adding a check to avoid it. Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
At the time of SYN_RECV, connection information is not initialized at FW, updating tcb flag over uninitialized connection causes adapter crash. We don't need to update the flag during SYN_RECV state, so avoid this. Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
send_abort_rpl() is not calculating cpl_abort_req_rss offset and ends up sending wrong TID with abort_rpl WR causng tid leaks. Replaced send_abort_rpl() with chtls_send_abort_rpl() as it is redundant. Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull gcc-plugins fix from Kees Cook: "Bump c++ standard version for latest GCC versions (Valdis Kletnieks)" * tag 'gcc-plugins-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: fix gcc 11 indigestion with plugins...
-
- 07 Jan, 2021 5 commits
-
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Alexei Starovoitov says: ==================== pull-request: bpf 2021-01-07 We've added 4 non-merge commits during the last 10 day(s) which contain a total of 4 files changed, 14 insertions(+), 7 deletions(-). The main changes are: 1) Fix task_iter bug caused by the merge conflict resolution, from Yonghong. 2) Fix resolve_btfids for multiple type hierarchies, from Jiri. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpftool: Fix compilation failure for net.o with older glibc tools/resolve_btfids: Warn when having multiple IDs for single type bpf: Fix a task_iter bug caused by a merge conflict resolution selftests/bpf: Fix a compile error for BPF_F_BPRM_SECUREEXEC ==================== Link: https://lore.kernel.org/r/20210107221555.64959-1-alexei.starovoitov@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Florian Westphal says: ==================== net: fix netfilter defrag/ip tunnel pmtu blackhole Christian Perle reported a PMTU blackhole due to unexpected interaction between the ip defragmentation that comes with connection tracking and ip tunnels. Unfortunately setting 'nopmtudisc' on the tunnel breaks the test scenario even without netfilter. Christinas setup looks like this: +--------+ +---------+ +--------+ |Router A|-------|Wanrouter|-------|Router B| | |.IPIP..| |..IPIP.| | +--------+ +---------+ +--------+ / mtu 1400 \ / \ +--------+ +--------+ |Client A| |Client B| +--------+ +--------+ MTU is 1500 everywhere, except on Router A to Wanrouter and Wanrouter to Router B. Router A and Router B use IPIP tunnel interfaces to tunnel traffic between Client A and Client B over WAN. Client A sends a 1400 byte UDP datagram to Client B. This packet gets encapsulated in the IPIP tunnel. This works, packet is received on client B. When conntrack (or anything else that forces ip defragmentation) is enabled on Router A, the packet gets dropped on Router A after encapsulation because they exceed the link MTU. Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse, no packets pass even in the no-netfilter scenario. Patch one is a reproducer script for selftest infra. Patch two is a fix for 'nopmtudisc' behaviour so ip_tunnel will send an icmp error to Client A. This allows 'nopmtudisc' tunnel to forward the UDP datagrams. Patch three enables ip refragmentation for all reassembled packets, just like ipv6. ==================== Link: https://lore.kernel.org/r/20210105231523.622-1-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
Conntrack reassembly records the largest fragment size seen in IPCB. However, when this gets forwarded/transmitted, fragmentation will only be forced if one of the fragmented packets had the DF bit set. In that case, a flag in IPCB will force fragmentation even if the MTU is large enough. This should work fine, but this breaks with ip tunnels. Consider client that sends a UDP datagram of size X to another host. The client fragments the datagram, so two packets, of size y and z, are sent. DF bit is not set on any of these packets. Middlebox netfilter reassembles those packets back to single size-X packet, before routing decision. packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit isn't set. At output time, ip refragmentation is skipped as well because x is still smaller than the mtu of the output device. If ttransmit device is an ip tunnel, the packet size increases to x+overhead. Also, tunnel might be configured to force DF bit on outer header. In this case, packet will be dropped (exceeds MTU) and an ICMP error is generated back to sender. But sender already respects the announced MTU, all the packets that it sent did fit the announced mtu. Force refragmentation as per original sizes unconditionally so ip tunnel will encapsulate the fragments instead. The only other solution I see is to place ip refragmentation in the ip_tunnel code to handle this case. Fixes: d6b915e2 ("ip_fragment: don't forward defragmented DF packet") Reported-by: Christian Perle <christian.perle@secunet.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
For some reason ip_tunnel insist on setting the DF bit anyway when the inner header has the DF bit set, EVEN if the tunnel was configured with 'nopmtudisc'. This means that the script added in the previous commit cannot be made to work by adding the 'nopmtudisc' flag to the ip tunnel configuration. Doing so breaks connectivity even for the without-conntrack/netfilter scenario. When nopmtudisc is set, the tunnel will skip the mtu check, so no icmp error is sent to client. Then, because inner header has DF set, the outer header gets added with DF bit set as well. IP stack then sends an error to itself because the packet exceeds the device MTU. Fixes: 23a3647b ("ip_tunnels: Use skb-len to PMTU check.") Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
Convert Christians bug description into a reproducer. Cc: Shuah Khan <shuah@kernel.org> Reported-by: Christian Perle <christian.perle@secunet.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-