- 02 Jan, 2018 40 commits
-
-
Juan Zea authored
commit 544c4605 upstream. usbip bind writes commands followed by random string when writing to match_busid attribute in sysfs, caused by using full variable size instead of string length. Signed-off-by:
Juan Zea <juan.zea@qindel.com> Acked-by:
Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Engelhardt authored
[ Upstream commit 59585b4b ] Commit v4.12-rc4-1-g9289ea7f introduced a mistake that made the 64-bit hweight stub call the 16-bit hweight function. Fixes: 9289ea7f ("sparc64: Use indirect calls in hamming weight stubs") Signed-off-by:
Jan Engelhardt <jengelh@inai.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
skb_copy_ubufs must unclone before it is safe to modify its skb_shared_info with skb_zcopy_clear. Commit b90ddd56 ("skbuff: skb_copy_ubufs must release uarg even without user frags") ensures that all skbs release their zerocopy state, even those without frags. But I forgot an edge case where such an skb arrives that is cloned. The stack does not build such packets. Vhost/tun skbs have their frags orphaned before cloning. TCP skbs only attach zerocopy state when a frag is added. But if TCP packets can be trimmed or linearized, this might occur. Tracing the code I found no instance so far (e.g., skb_linearize ends up calling skb_zcopy_clear if !skb->data_len). Still, it is non-obvious that no path exists. And it is fragile to rely on this. Fixes: b90ddd56 ("skbuff: skb_copy_ubufs must release uarg even without user frags") Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
[ Upstream commit b90ddd56 ] skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner. Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data. Fixes: 3ece7826 ("sock: skb_copy_ubufs support for compound pages") Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
[ Upstream commit 268b7906 ] Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data. skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect. Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref_>refcount to drop below zero. Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com> Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY") Reported-by:
Andreas Hartmann <andihartmann@01019freenet.de> Reported-by:
David Hill <dhill@redhat.com> Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Saeed Mahameed authored
[ Upstream commit 231243c8 ] Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo <new affinity> > /proc/irq/<x>/smp_affinity fails with -EIO This reverts commit a435393a. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. Fixes: a435393a ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jes Sorensen <jsorensen@fb.com> Reported-by:
Jes Sorensen <jsorensen@fb.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicolas Dichtel authored
[ Upstream commit 09400953 ] With commits 35e015e1 and a2d3f3e3, the global 'accept_dad' flag is also taken into account (default value is 1). If either global or per-interface flag is non-zero, DAD will be enabled on a given interface. This is not backward compatible: before those patches, the user could disable DAD just by setting the per-interface flag to 0. Now, the user instead needs to set both flags to 0 to actually disable DAD. Restore the previous behaviour by setting the default for the global 'accept_dad' flag to 0. This way, DAD is still enabled by default, as per-interface flags are set to 1 on device creation, but setting them to 0 is enough to disable DAD on a given interface. - Before 35e015e1f57a7 and a2d3f3e3: global per-interface DAD enabled [default] 1 1 yes X 0 no X 1 yes - After 35e015e1 and a2d3f3e3: global per-interface DAD enabled [default] 1 1 yes 0 0 no 0 1 yes 1 0 yes - After this fix: global per-interface DAD enabled 1 1 yes 0 0 no [default] 0 1 yes 1 0 yes Fixes: 35e015e1 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") Fixes: a2d3f3e3 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real") CC: Stefano Brivio <sbrivio@redhat.com> CC: Matteo Croce <mcroce@redhat.com> CC: Erik Kline <ek@google.com> Signed-off-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Phil Sutter authored
[ Upstream commit d03a4557 ] The recently added fib_metrics_match() causes a regression for routes with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has TCP_CONG_NEEDS_ECN flag set: | # ip link add d0 type dummy | # ip link set d0 up | # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp | # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp | RTNETLINK answers: No such process During route insertion, fib_convert_metrics() detects that the given CC algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in RTAX_FEATURES. During route deletion though, fib_metrics_match() compares stored RTAX_FEATURES value with that from userspace (which obviously has no knowledge about DST_FEATURE_ECN_CA) and fails. Fixes: 5f9ae3d9 ("ipv4: do metrics match when looking up and deleting a route") Signed-off-by:
Phil Sutter <phil@nwl.cc> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Russell King authored
[ Upstream commit 74ee0e8c ] Ensure that we mark AN as enabled at boot time, rather than leaving it disabled. This is noticable if your SFP module is fiber, and it supports faster speeds than 1G with 2.5G support in place. Fixes: 9525ae83 ("phylink: add phylink infrastructure") Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Russell King authored
[ Upstream commit 182088aa ] When setting the ethtool settings, ensure that the validated PHY interface mode is propagated to the current link settings, so that 2500BaseX can be selected. Fixes: 9525ae83 ("phylink: add phylink infrastructure") Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Calvin Owens authored
[ Upstream commit 2edbdb31 ] After applying 2270bc5d ("bnxt_en: Fix netpoll handling") and 903649e7 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."), we still see the following WARN fire: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160 bnxt_poll+0x0/0xd0 exceeded budget in poll <snip> Call Trace: [<ffffffff814be5cd>] dump_stack+0x4d/0x70 [<ffffffff8107e013>] __warn+0xd3/0xf0 [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60 [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160 [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250 [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440 [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250 [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110 [<ffffffff810c9549>] console_unlock+0x339/0x5b0 [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450 [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30 [<ffffffff81173df5>] printk+0x48/0x50 [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core] [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core] [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac] [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac] [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core] [<ffffffff81095f8b>] process_one_work+0x19b/0x480 [<ffffffff810967ca>] worker_thread+0x6a/0x520 [<ffffffff8109c7c4>] kthread+0xe4/0x100 [<ffffffff81884c52>] ret_from_fork+0x22/0x40 This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually given a non-zero budget. Signed-off-by:
Calvin Owens <calvinowens@fb.com> Acked-by:
Michael Chan <michael.chan@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Pirko authored
[ Upstream commit b59e6979 ] Move static key increments to the beginning of the init function so they pair 1:1 with decrements in ingress/clsact_destroy, which is called in case ingress/clsact_init fails. Fixes: 6529eaba ("net: sched: introduce tcf block infractructure") Signed-off-by:
Jiri Pirko <jiri@mellanox.com> Acked-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit f870c1ff ] Stefano Brivio says: Commit a985343b ("vxlan: refactor verification and application of configuration") introduced a change in the behaviour of initial MTU setting: earlier, the MTU for a link created on top of a given lower device, without an initial MTU specification, was set to the MTU of the lower device minus headroom as a result of this path in vxlan_dev_configure(): if (!conf->mtu) dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM); which is now gone. Now, the initial MTU, in absence of a configured value, is simply set by ether_setup() to ETH_DATA_LEN (1500 bytes). This breaks userspace expectations in case the MTU of the lower device is higher than 1500 bytes minus headroom. This patch restores the previous behaviour on newlink operation. Since max_mtu can be negative and we update dev->mtu directly, also check it for valid minimum. Reported-by:
Junhan Yan <juyan@redhat.com> Fixes: a985343b ("vxlan: refactor verification and application of configuration") Signed-off-by:
Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kamal Heib authored
[ Upstream commit bae115a2 ] Currently, if a size of zero is passed to mlx5_fpga_mem_{read|write}_i2c() the "err" return value will not be initialized, which triggers gcc warnings: [..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error: uninitialized symbol 'err'. [..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error: uninitialized symbol 'err'. fix that. Fixes: a9956d35 ('net/mlx5: FPGA, Add SBU infrastructure') Signed-off-by:
Kamal Heib <kamalh@mellanox.com> Reviewed-by:
Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 4688eb7c ] Only the retransmit timer currently refreshes tcp_mstamp We should do the same for delayed acks and keepalives. Even if RFC 7323 does not request it, this is consistent to what linux did in the past, when TS values were based on jiffies. Fixes: 385e2070 ("tcp: use tp->tcp_mstamp in output path") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Mike Maloney <maloney@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Acked-by:
Soheil Hassas Yeganeh <soheil@google.com> Acked-by:
Mike Maloney <maloney@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ido Schimmel authored
[ Upstream commit 58acfd71 ] Currently, parameters such as oif and source address are not taken into account during fibmatch lookup. Example (IPv4 for reference) before patch: $ ip -4 route show 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1 $ ip -6 route show 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium fe80::/64 dev dummy0 proto kernel metric 256 pref medium fe80::/64 dev dummy1 proto kernel metric 256 pref medium $ ip -4 route get fibmatch 192.0.2.2 oif dummy0 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 $ ip -4 route get fibmatch 192.0.2.2 oif dummy1 RTNETLINK answers: No route to host $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium After: $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 RTNETLINK answers: Network is unreachable The problem stems from the fact that the necessary route lookup flags are not set based on these parameters. Instead of duplicating the same logic for fibmatch, we can simply resolve the original route from its copy and dump it instead. Fixes: 18c3a61c ("net: ipv6: RTM_GETROUTE: return matched fib result when requested") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Acked-by:
David Ahern <dsahern@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zhao Qiang authored
[ Upstream commit c505873e ] 88E1145 also need this autoneg errata. Fixes: f2899788 ("net: phy: marvell: Limit errata to 88m1101") Signed-off-by:
Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wei Wang authored
[ Upstream commit 9ee11bd0 ] When ms timestamp is used, current logic uses 1us in tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us. This could cause rcv_rtt underestimation. Fix it by always using a min value of 1ms if ms timestamp is used. Fixes: 645f4c6f ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps") Signed-off-by:
Wei Wang <weiwan@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yuval Mintz authored
[ Upstream commit fccff086 ] Learning is currently enabled for ports which are OVS slaves - even though OVS doesn't need this indication. Since we're not associating a fid with the port, HW would continuously notify driver of learned [& aged] MACs which would be logged as errors. Fixes: 2b94e58d ("mlxsw: spectrum: Allow ports to work under OVS master") Signed-off-by:
Yuval Mintz <yuvalm@mellanox.com> Reviewed-by:
Ido Schimmel <idosch@mellanox.com> Signed-off-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Parthasarathy Bhuvaragan authored
[ Upstream commit 517d7c79 ] In commit 42b531de ("tipc: Fix missing connection request handling"), we replaced unconditional wakeup() with condtional wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND. This breaks the applications which do a connect followed by poll with POLLOUT flag. These applications are not woken when the connection is ESTABLISHED and hence sleep forever. In this commit, we fix it by including the POLLOUT event for sockets in TIPC_CONNECTING state. Fixes: 42b531de ("tipc: Fix missing connection request handling") Acked-by:
Jon Maloy <jon.maloy@ericsson.com> Signed-off-by:
Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xin Long authored
[ Upstream commit 2342b8d9 ] Now in sctp_setsockopt_reset_streams, it only does the check optlen < sizeof(*params) for optlen. But it's not enough, as params->srs_number_streams should also match optlen. If the streams in params->srs_stream_list are less than stream nums in params->srs_number_streams, later when dereferencing the stream list, it could cause a slab-out-of-bounds crash, as reported by syzbot. This patch is to fix it by also checking the stream numbers in sctp_setsockopt_reset_streams to make sure at least it's not greater than the streams in the list. Fixes: 7f9d68ac ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Reported-by:
Dmitry Vyukov <dvyukov@google.com> Signed-off-by:
Xin Long <lucien.xin@gmail.com> Acked-by:
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by:
Neil Horman <nhorman@tuxdriver.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Julian Wiedmann authored
[ Upstream commit ad3cbf61 ] Make sure to check both return code fields before processing the response. Otherwise we risk operating on invalid data. Fixes: c9475369 ("s390/qeth: rework RX/TX checksum offload") Signed-off-by:
Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
[ Upstream commit 4b52d010 ] The PHY on BCM7278 has an additional bit that needs to be cleared: IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out of suspend/resume cycles. Fixes: 0fe99338 ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch") Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bert Kenward authored
[ Upstream commit d4a7a889 ] The bytes_compl and pkts_compl pointers passed to efx_dequeue_buffers cannot be NULL. Add a paranoid warning to check this condition and fix the one case where they were NULL. efx_enqueue_unwind() is called very rarely, during error handling. Without this fix it would fail with a NULL pointer dereference in efx_dequeue_buffer, with efx_enqueue_skb in the call stack. Fixes: e9117e50 ("sfc: Firmware-Assisted TSO version 2") Reported-by:
Jarod Wilson <jarod@redhat.com> Signed-off-by:
Bert Kenward <bkenward@solarflare.com> Tested-by:
Jarod Wilson <jarod@redhat.com> Acked-by:
Jarod Wilson <jarod@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Garver authored
[ Upstream commit c48e7473 ] skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us. Fixes: 5108bbad ("openvswitch: add processing of L3 packets") Signed-off-by:
Eric Garver <e@erig.me> Reviewed-by:
Jiri Benc <jbenc@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Moni Shoua authored
[ Upstream commit dbff26e4 ] In error flow, when DESTROY_QP command should be executed, the wrong mailbox was set with data, not the one that is written to hardware, Fix that. Fixes: 09a7d9ec '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc' Signed-off-by:
Moni Shoua <monis@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gal Pressman authored
[ Upstream commit 0c1cc8b2 ] When calling add/remove VXLAN port, a lock must be held in order to prevent race scenarios when more than one add/remove happens at the same time. Fix by holding our state_lock (mutex) as done by all other parts of the driver. Note that the spinlock protecting the radix-tree is still needed in order to synchronize radix-tree access from softirq context. Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by:
Gal Pressman <galp@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gal Pressman authored
[ Upstream commit 23f4cc2c ] A refcount mechanism must be implemented in order to prevent unwanted scenarios such as: - Open an IPv4 VXLAN interface - Open an IPv6 VXLAN interface (different socket) - Remove one of the interfaces With current implementation, the UDP port will be removed from our VXLAN database and turn off the offloads for the other interface, which is still active. The reference count mechanism will only allow UDP port removals once all consumers are gone. Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by:
Gal Pressman <galp@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gal Pressman authored
[ Upstream commit 2989ad1e ] The assumption that the next header field contains the transport protocol is wrong for IPv6 packets with extension headers. Instead, we should look the inner-most next header field in the buffer. This will fix TSO offload for tunnels over IPv6 with extension headers. Performance testing: 19.25x improvement, cool! Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap. CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] TSO: Enabled Before: 4,926.24 Mbps Now : 94,827.91 Mbps Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by:
Gal Pressman <galp@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gal Pressman authored
[ Upstream commit 63235141 ] mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user context) and mlx5e_features_check (softirq), but the lock acquired does not disable bottom half and might result in deadlock. Fix it by simply replacing spin_lock() with spin_lock_bh(). While at it, replace all unnecessary spin_lock_irq() to spin_lock_bh(). lockdep's WARNING: inconsistent lock state [ 654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes: [ 654.028321] (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028528] {SOFTIRQ-ON-W} state was registered at: [ 654.028607] _raw_spin_lock+0x3c/0x70 [ 654.028689] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028794] mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core] [ 654.028878] process_one_work+0x1e9/0x640 [ 654.028942] worker_thread+0x4a/0x3f0 [ 654.029002] kthread+0x141/0x180 [ 654.029056] ret_from_fork+0x24/0x30 [ 654.029114] irq event stamp: 579088 [ 654.029174] hardirqs last enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0 [ 654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0 [ 654.029446] softirqs last enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80 [ 654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0 [ 654.029684] other info that might help us debug this: [ 654.029781] Possible unsafe locking scenario: [ 654.029868] CPU0 [ 654.029908] ---- [ 654.029947] lock(&(&vxlan_db->lock)->rlock); [ 654.030045] <Interrupt> [ 654.030090] lock(&(&vxlan_db->lock)->rlock); [ 654.030162] *** DEADLOCK *** Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by:
Gal Pressman <galp@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eran Ben Elisha authored
[ Upstream commit 37e92a9d ] In mlx5_ifc, struct size was not complete, and thus driver was sending garbage after the last defined field. Fixed it by adding reserved field to complete the struct size. In addition, rename all set_rate_limit to set_pp_rate_limit to be compliant with the Firmware <-> Driver definition. Fixes: 7486216b ("{net,IB}/mlx5: mlx5_ifc updates") Fixes: 1466cc5b ("net/mlx5: Rate limit tables support") Signed-off-by:
Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yousuk Seung authored
[ Upstream commit d4761754 ] Mark tcp_sock during a SACK reneging event and invalidate rate samples while marked. Such rate samples may overestimate bw by including packets that were SACKed before reneging. < ack 6001 win 10000 sack 7001:38001 < ack 7001 win 0 sack 8001:38001 // Reneg detected > seq 7001:8001 // RTO, SACK cleared. < ack 38001 win 10000 In above example the rate sample taken after the last ack will count 7001-38001 as delivered while the actual delivery rate likely could be much lower i.e. 7001-8001. This patch adds a new field tcp_sock.sack_reneg and marks it when we declare SACK reneging and entering TCP_CA_Loss, and unmarks it after the last rate sample was taken before moving back to TCP_CA_Open. This patch also invalidates rate samples taken while tcp_sock.is_sack_reneg is set. Fixes: b9f64820 ("tcp: track data delivery rate for a TCP connection") Signed-off-by:
Yousuk Seung <ysseung@google.com> Signed-off-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
Yuchung Cheng <ycheng@google.com> Acked-by:
Soheil Hassas Yeganeh <soheil@google.com> Acked-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Priyaranjan Jha <priyarjha@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
[ Upstream commit 35b99dff ] skb_complete_tx_timestamp must ingest the skb it is passed. Call kfree_skb if the skb cannot be enqueued. Fixes: b245be1f ("net-timestamp: no-payload only sysctl") Fixes: 9ac25fc0 ("net: fix socket refcounting in skb_complete_tx_timestamp()") Reported-by:
Richard Cochran <richardcochran@gmail.com> Signed-off-by:
Willem de Bruijn <willemb@google.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Grygorii Strashko authored
[ Upstream commit c1a8d0a3 ] Under some circumstances driver will perform PHY reset in ksz9031_read_status() to fix autoneg failure case (idle error count = 0xFF). When this happens ksz9031 will not detect link status change any more when connecting to Netgear 1G switch (link can be recovered sometimes by restarting netdevice "ifconfig down up"). Reproduced with TI am572x board equipped with ksz9031 PHY while connecting to Netgear 1G switch. Fix the issue by reconfiguring autonegotiation after PHY reset in ksz9031_read_status(). Fixes: d2fd719b ("net/phy: micrel: Add workaround for bad autoneg") Signed-off-by:
Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
[ Upstream commit 21b59443 ] (I can trivially verify that that idr_remove in cleanup_net happens after the network namespace count has dropped to zero --EWB) Function get_net_ns_by_id() does not check for net::count after it has found a peer in netns_ids idr. It may dereference a peer, after its count has already been finaly decremented. This leads to double free and memory corruption: put_net(peer) rtnl_lock() atomic_dec_and_test(&peer->count) [count=0] ... __put_net(peer) get_net_ns_by_id(net, id) spin_lock(&cleanup_list_lock) list_add(&net->cleanup_list, &cleanup_list) spin_unlock(&cleanup_list_lock) queue_work() peer = idr_find(&net->netns_ids, id) | get_net(peer) [count=1] | ... | (use after final put) v ... cleanup_net() ... spin_lock(&cleanup_list_lock) ... list_replace_init(&cleanup_list, ..) ... spin_unlock(&cleanup_list_lock) ... ... ... ... put_net(peer) ... atomic_dec_and_test(&peer->count) [count=0] ... spin_lock(&cleanup_list_lock) ... list_add(&net->cleanup_list, &cleanup_list) ... spin_unlock(&cleanup_list_lock) ... queue_work() ... rtnl_unlock() rtnl_lock() ... for_each_net(tmp) { ... id = __peernet2id(tmp, peer) ... spin_lock_irq(&tmp->nsid_lock) ... idr_remove(&tmp->netns_ids, id) ... ... ... net_drop_ns() ... net_free(peer) ... } ... | v cleanup_net() ... (Second free of peer) Also, put_net() on the right cpu may reorder with left's cpu list_replace_init(&cleanup_list, ..), and then cleanup_list will be corrupted. Since cleanup_net() is executed in worker thread, while put_net(peer) can happen everywhere, there should be enough time for concurrent get_net_ns_by_id() to pick the peer up, and the race does not seem to be unlikely. The patch fixes the problem in standard way. (Also, there is possible problem in peernet2id_alloc(), which requires check for net::count under nsid_lock and maybe_get_net(peer), but in current stable kernel it's used under rtnl_lock() and it has to be safe. Openswitch begun to use peernet2id_alloc(), and possibly it should be fixed too. While this is not in stable kernel yet, so I'll send a separate message to netdev@ later). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: 0c7aecd4 "netns: add rtnl cmd to add and get peer netns ids" Reviewed-by:
Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikolay Aleksandrov authored
[ Upstream commit 84aeb437 ] The early call to br_stp_change_bridge_id in bridge's newlink can cause a memory leak if an error occurs during the newlink because the fdb entries are not cleaned up if a different lladdr was specified, also another minor issue is that it generates fdb notifications with ifindex = 0. Another unrelated memory leak is the bridge sysfs entries which get added on NETDEV_REGISTER event, but are not cleaned up in the newlink error path. To remove this special case the call to br_stp_change_bridge_id is done after netdev register and we cleanup the bridge on changelink error via br_dev_delete to plug all leaks. This patch makes netlink bridge destruction on newlink error the same as dellink and ioctl del which is necessary since at that point we have a fully initialized bridge device. To reproduce the issue: $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1 RTNETLINK answers: Invalid argument $ rmmod bridge [ 1822.142525] ============================================================================= [ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown() [ 1822.144821] ----------------------------------------------------------------------------- [ 1822.145990] Disabling lock debugging due to kernel taint [ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100 [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87 [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 1822.150008] Call Trace: [ 1822.150510] dump_stack+0x78/0xa9 [ 1822.151156] slab_err+0xb1/0xd3 [ 1822.151834] ? __kmalloc+0x1bb/0x1ce [ 1822.152546] __kmem_cache_shutdown+0x151/0x28b [ 1822.153395] shutdown_cache+0x13/0x144 [ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb [ 1822.154669] SyS_delete_module+0x194/0x244 [ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a [ 1822.156343] RIP: 0033:0x7f929bd38b17 [ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0 [ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17 [ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0 [ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11 [ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80 [ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090 [ 1822.161278] INFO: Object 0x000000007645de29 @offset=0 [ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128 Fixes: 30313a3d ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device") Fixes: 5b8d5429 ("bridge: netlink: register netdevice before executing changelink") Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ido Schimmel authored
[ Upstream commit b4681c28 ] Since commit 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") the local table uses the same trie allocated for the main table when custom rules are not in use. When a net namespace is dismantled, the main table is flushed and freed (via an RCU callback) before the local table. In case the callback is invoked before the local table is iterated, a use-after-free can occur. Fix this by iterating over the FIB tables in reverse order, so that the main table is always freed after the local table. v3: Reworded comment according to Alex's suggestion. v2: Add a comment to make the fix more explicit per Dave's and Alex's feedback. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Acked-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit e5a9336a ] When ip6gre is created using ioctl, its features, such as scatter-gather, GSO and tx-checksumming will be turned off: # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: off scatter-gather: off tcp-segmentation-offload: off generic-segmentation-offload: off [requested on] But when netlink is used, they will be enabled: # ip link add gre6 type ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: on scatter-gather: on tcp-segmentation-offload: on generic-segmentation-offload: on This results in a loss of performance when gre6 is created via ioctl. The issue was found with LTP/gre tests. Fix it by moving the setup of device features to a separate function and invoke it with ndo_init callback because both netlink and ioctl will eventually call it via register_netdevice(): register_netdevice() - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init() - ip6gre_tunnel_init_common() - ip6gre_tnl_init_features() The moved code also contains two minor style fixes: * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line. * fixed the issue reported by checkpatch: "Unnecessary parentheses around 'nt->encap.type == TUNNEL_ENCAP_NONE'" Fixes: ac4eb009 ("ip6gre: Add support for basic offloads offloads excluding GSO") Signed-off-by:
Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikita V. Shirokov authored
[ Upstream commit 74c4b656 ] commit 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels") introduced new exit point in ipxip6_rcv. however rcu_read_unlock is missing there. this diff is fixing this v1->v2: instead of doing rcu_read_unlock in place, we are going to "drop" section (to prevent skb leakage) Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by:
Nikita V. Shirokov <tehnerd@fb.com> Acked-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tonghao Zhang authored
[ Upstream commit 8cb38a60 ] The patch(180d8cd9) replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to accessor macros. But the sockets_allocated field of sctp sock is not replaced at all. Then replace it now for unifying the code. Fixes: 180d8cd9 ("foundations of per-cgroup memory pressure controlling.") Cc: Glauber Costa <glommer@parallels.com> Signed-off-by:
Tonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-