- 02 Jul, 2019 20 commits
-
-
David Howells authored
If sendmsg() or sendmmsg() is called on a connected socket that hasn't had bind() called on it, then an oops will occur when the kernel tries to connect the call because no local endpoint has been allocated. Fix this by implicitly binding the socket if it is in the RXRPC_CLIENT_UNBOUND state, just like it does for the RXRPC_UNBOUND state. Further, the state should be transitioned to RXRPC_CLIENT_BOUND after this to prevent further attempts to bind it. This can be tested with: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/socket.h> #include <arpa/inet.h> #include <linux/rxrpc.h> static const unsigned char inet6_addr[16] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0xac, 0x14, 0x14, 0xaa }; int main(void) { struct sockaddr_rxrpc srx; struct cmsghdr *cm; struct msghdr msg; unsigned char control[16]; int fd; memset(&srx, 0, sizeof(srx)); srx.srx_family = 0x21; srx.srx_service = 0; srx.transport_type = AF_INET; srx.transport_len = 0x1c; srx.transport.sin6.sin6_family = AF_INET6; srx.transport.sin6.sin6_port = htons(0x4e22); srx.transport.sin6.sin6_flowinfo = htons(0x4e22); srx.transport.sin6.sin6_scope_id = htons(0xaa3b); memcpy(&srx.transport.sin6.sin6_addr, inet6_addr, 16); cm = (struct cmsghdr *)control; cm->cmsg_len = CMSG_LEN(sizeof(unsigned long)); cm->cmsg_level = SOL_RXRPC; cm->cmsg_type = RXRPC_USER_CALL_ID; *(unsigned long *)CMSG_DATA(cm) = 0; msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_iov = NULL; msg.msg_iovlen = 0; msg.msg_control = control; msg.msg_controllen = cm->cmsg_len; msg.msg_flags = 0; fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET); connect(fd, (struct sockaddr *)&srx, sizeof(srx)); sendmsg(fd, &msg, 0); return 0; } Leading to the following oops: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page ... RIP: 0010:rxrpc_connect_call+0x42/0xa01 ... Call Trace: ? mark_held_locks+0x47/0x59 ? __local_bh_enable_ip+0xb6/0xba rxrpc_new_client_call+0x3b1/0x762 ? rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_sendmsg+0x16b/0x1b5 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1a4/0x22a ? release_sock+0x19/0x9e ? reacquire_held_locks+0x136/0x160 ? release_sock+0x19/0x9e ? find_held_lock+0x2b/0x6e ? __lock_acquire+0x268/0xf73 ? rxrpc_connect+0xdd/0xe4 ? __local_bh_enable_ip+0xb6/0xba __sys_sendmsg+0x5e/0x94 do_syscall_64+0x7d/0x1bf entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 2341e077 ("rxrpc: Simplify connect() implementation and simplify sendmsg() op") Reported-by: syzbot+7966f2a0b2c7da8939b4@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Nikolay Aleksandrov says: ==================== net: bridge: fix possible stale skb pointers In the bridge driver we have a couple of places which call pskb_may_pull but we've cached skb pointers before that and use them after which can lead to out-of-bounds/stale pointer use. I've had these in my "to fix" list for some time and now we got a report (patch 01) so here they are. Patches 02-04 are fixes based on code inspection. Also patch 01 was tested by Martin Weinelt, Martin if you don't mind please add your tested-by tag to it by replying with Tested-by: name <email>. I've also briefly tested the set by trying to exercise those code paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Don't cache eth dest pointer before calling pskb_may_pull. Fixes: cf0f02d0 ("[BRIDGE]: use llc for receiving STP packets") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
We would cache ether dst pointer on input in br_handle_frame_finish but after the neigh suppress code that could lead to a stale pointer since both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to always reload it after the suppress code so there's no point in having it cached just retrieve it directly. Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Fixes: ed842fae ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
We get a pointer to the ipv6 hdr in br_ip6_multicast_query but we may call pskb_may_pull afterwards and end up using a stale pointer. So use the header directly, it's just 1 place where it's needed. Fixes: 08b202b6 ("bridge br_multicast: IPv6 MLD support.") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
We take a pointer to grec prior to calling pskb_may_pull and use it afterwards to get nsrcs so record nsrcs before the pull when handling igmp3 and we get a pointer to nsrcs and call pskb_may_pull when handling mld2 which again could lead to reading 2 bytes out-of-bounds. ================================================================== BUG: KASAN: use-after-free in br_multicast_rcv+0x480c/0x4ad0 [bridge] Read of size 2 at addr ffff8880421302b4 by task ksoftirqd/1/16 CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G OE 5.2.0-rc6+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x71/0xab print_address_description+0x6a/0x280 ? br_multicast_rcv+0x480c/0x4ad0 [bridge] __kasan_report+0x152/0x1aa ? br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_rcv+0x480c/0x4ad0 [bridge] kasan_report+0xe/0x20 br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_disable_port+0x150/0x150 [bridge] ? ktime_get_with_offset+0xb4/0x150 ? __kasan_kmalloc.constprop.6+0xa6/0xf0 ? __netif_receive_skb+0x1b0/0x1b0 ? br_fdb_update+0x10e/0x6e0 [bridge] ? br_handle_frame_finish+0x3c6/0x11d0 [bridge] br_handle_frame_finish+0x3c6/0x11d0 [bridge] ? br_pass_frame_up+0x3a0/0x3a0 [bridge] ? virtnet_probe+0x1c80/0x1c80 [virtio_net] br_handle_frame+0x731/0xd90 [bridge] ? select_idle_sibling+0x25/0x7d0 ? br_handle_frame_finish+0x11d0/0x11d0 [bridge] __netif_receive_skb_core+0xced/0x2d70 ? virtqueue_get_buf_ctx+0x230/0x1130 [virtio_ring] ? do_xdp_generic+0x20/0x20 ? virtqueue_napi_complete+0x39/0x70 [virtio_net] ? virtnet_poll+0x94d/0xc78 [virtio_net] ? receive_buf+0x5120/0x5120 [virtio_net] ? __netif_receive_skb_one_core+0x97/0x1d0 __netif_receive_skb_one_core+0x97/0x1d0 ? __netif_receive_skb_core+0x2d70/0x2d70 ? _raw_write_trylock+0x100/0x100 ? __queue_work+0x41e/0xbe0 process_backlog+0x19c/0x650 ? _raw_read_lock_irq+0x40/0x40 net_rx_action+0x71e/0xbc0 ? __switch_to_asm+0x40/0x70 ? napi_complete_done+0x360/0x360 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __schedule+0x85e/0x14d0 __do_softirq+0x1db/0x5f9 ? takeover_tasklets+0x5f0/0x5f0 run_ksoftirqd+0x26/0x40 smpboot_thread_fn+0x443/0x680 ? sort_range+0x20/0x20 ? schedule+0x94/0x210 ? __kthread_parkme+0x78/0xf0 ? sort_range+0x20/0x20 kthread+0x2ae/0x3a0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x35/0x40 The buggy address belongs to the page: page:ffffea0001084c00 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 flags: 0xffffc000000000() raw: 00ffffc000000000 ffffea0000cfca08 ffffea0001098608 0000000000000000 raw: 0000000000000000 0000000000000003 00000000ffffff7f 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888042130180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ffff888042130280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888042130300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Disabling lock debugging due to kernel taint Fixes: bc8c20ac ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") Reported-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hayes Wang authored
1. Rename r8153b_queue_wake() to r8153_queue_wake(). 2. Correct the setting. The enable bit should be 0xd38c bit 0. Besides, the 0xd38a bit 0 and 0xd398 bit 8 have to be cleared for both enabled and disabled. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Commit 86029d10 ("tls: zero the crypto information from tls_context before freeing") added memzero_explicit() calls to clear the key material before freeing struct tls_context, but it missed tls_device.c has its own way of freeing this structure. Replace the missing free. Fixes: 86029d10 ("tls: zero the crypto information from tls_context before freeing") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Neither drivers nor the tls offload code currently supports TLS version 1.3. Check the TLS version when installing connection state. TLS 1.3 will just fallback to the kernel crypto for now. Fixes: 130b392c ("net: tls: Add tls 1.3 support") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Cong Wang says: ==================== idr: fix overflow cases on 32-bit CPU idr_get_next_ul() is problematic by design, it can't handle the following overflow case well on 32-bit CPU: u32 id = UINT_MAX; idr_alloc_u32(&id); while (idr_get_next_ul(&id) != NULL) id++; when 'id' overflows and becomes 0 after UINT_MAX, the loop goes infinite. Fix this by eliminating external users of idr_get_next_ul() and migrating them to idr_for_each_entry_continue_ul(). And add an additional parameter for these iteration macros to detect overflow properly. Please merge this through networking tree, as all the users are in networking subsystem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cong Wang authored
Similarly, other callers of idr_get_next_ul() suffer the same overflow bug as they don't handle it properly either. Introduce idr_for_each_entry_continue_ul() to help these callers iterate from a given ID. cls_flower needs more care here because it still has overflow when does arg->cookie++, we have to fold its nested loops into one and remove the arg->cookie++. Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr") Fixes: 12d6066c ("net/mlx5: Add flow counters idr") Reported-by: Li Shuang <shuali@redhat.com> Cc: Davide Caratti <dcaratti@redhat.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Chris Mi <chrism@mellanox.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cong Wang authored
idr_for_each_entry_ul() is buggy as it can't handle overflow case correctly. When we have an ID == UINT_MAX, it becomes an infinite loop. This happens when running on 32-bit CPU where unsigned long has the same size with unsigned int. There is no better way to fix this than casting it to a larger integer, but we can't just 64 bit integer on 32 bit CPU. Instead we could just use an additional integer to help us to detect this overflow case, that is, adding a new parameter to this macro. Fortunately tc action is its only user right now. Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR") Reported-by: Li Shuang <shuali@redhat.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chris Mi <chrism@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Stefano Garzarella says: ==================== vsock/virtio: several fixes in the .probe() and .remove() During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock before registering the driver", Stefan pointed out some possible issues in the .probe() and .remove() callbacks of the virtio-vsock driver. This series tries to solve these issues: - Patch 1 adds RCU critical sections to avoid use-after-free of 'the_virtio_vsock' pointer. - Patch 2 stops workers before to call vdev->config->reset(vdev) to be sure that no one is accessing the device. - Patch 3 moves the works flush at the end of the .remove() to avoid use-after-free of 'vsock' object. v2: - Patch 1: use RCU to protect 'the_virtio_vsock' pointer - Patch 2: no changes - Patch 3: flush works only at the end of .remove() - Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers allocated. v1: https://patchwork.kernel.org/cover/10964733/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
This patch moves the flush of works after vdev->config->del_vqs(vdev), because we need to be sure that no workers run before to free the 'vsock' object. Since we stopped the workers using the [tx|rx|event]_run flags, we are sure no one is accessing the device while we are calling vdev->config->reset(vdev), so we can safely move the workers' flush. Before the vdev->config->del_vqs(vdev), workers can be scheduled by VQ callbacks, so we must flush them after del_vqs(), to avoid use-after-free of 'vsock' object. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
Before to call vdev->config->reset(vdev) we need to be sure that no one is accessing the device, for this reason, we add new variables in the struct virtio_vsock to stop the workers during the .remove(). This patch also add few comments before vdev->config->reset(vdev) and vdev->config->del_vqs(vdev). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
Some callbacks used by the upper layers can run while we are in the .remove(). A potential use-after-free can happen, because we free the_virtio_vsock without knowing if the callbacks are over or not. To solve this issue we move the assignment of the_virtio_vsock at the end of .probe(), when we finished all the initialization, and at the beginning of .remove(), before to release resources. For the same reason, we do the same also for the vdev->priv. We use RCU to be sure that all callbacks that use the_virtio_vsock ended before freeing it. This is not required for callbacks that use vdev->priv, because after the vdev->config->del_vqs() we are sure that they are ended and will no longer be invoked. We also take the mutex during the .remove() to avoid that .probe() can run while we are resetting the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Taehee Yoo authored
__vxlan_dev_create() destroys FDB using specific pointer which indicates a fdb when error occurs. But that pointer should not be used when register_netdevice() fails because register_netdevice() internally destroys fdb when error occurs. This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev internally. Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan dev. vxlan_fdb_insert() is called after calling register_netdevice(). This routine can avoid situation that ->ndo_uninit() destroys fdb entry in error path of register_netdevice(). Hence, error path of __vxlan_dev_create() routine can have an opportunity to destroy default fdb entry by hand. Test command ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \ dev enp0s9 dstport 4789 Splat looks like: [ 213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ #256 [ 213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan] [ 213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d [ 213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202 [ 213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000 [ 213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0 [ 213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000 [ 213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200 [ 213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0 [ 213.402178] FS: 00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 [ 213.402178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0 [ 213.402178] Call Trace: [ 213.402178] __vxlan_dev_create+0x3a9/0x7d0 [vxlan] [ 213.402178] ? vxlan_changelink+0x740/0x740 [vxlan] [ 213.402178] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 213.402178] ? __kasan_kmalloc.constprop.3+0xa0/0xd0 [ 213.402178] vxlan_newlink+0x8d/0xc0 [vxlan] [ 213.402178] ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan] [ 213.554119] ? __netlink_ns_capable+0xc3/0xf0 [ 213.554119] __rtnl_newlink+0xb75/0x1180 [ 213.554119] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
It allocates the extended area for outbound streams only on sendmsg calls, if they are not yet allocated. When using the priority stream scheduler, this initialization may imply into a subsequent allocation, which may fail. In this case, it was aborting the stream scheduler initialization but leaving the ->ext pointer (allocated) in there, thus in a partially initialized state. On a subsequent call to sendmsg, it would notice the ->ext pointer in there, and trip on uninitialized stuff when trying to schedule the data chunk. The fix is undo the ->ext initialization if the stream scheduler initialization fails and avoid the partially initialized state. Although syzkaller bisected this to commit 4ff40b86 ("sctp: set chunk transport correctly when it's a new asoc"), this bug was actually introduced on the commit I marked below. Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations") Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cong Wang authored
When the skb is associated with a new sock, just assigning it to skb->sk is not sufficient, we have to set its destructor to free the sock properly too. Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Jul, 2019 5 commits
-
-
Matteo Croce authored
Avoid the situation where an IPV6 only flag is applied to an IPv4 address: # ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute # ip -4 addr show dev dummy0 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global noprefixroute dummy0 valid_lft forever preferred_lft forever Or worse, by sending a malicious netlink command: # ip -4 addr show dev dummy0 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0 valid_lft forever preferred_lft forever Signed-off-by: Matteo Croce <mcroce@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vandana BN authored
Fix GUE_PFLAG_REMCSUM to use "U" cast to avoid shifting signed 32-bit value by 31 bits problem. Signed-off-by: Vandana BN <bnvandana@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vandana BN authored
Fix DST_FEATURE_ECN_CA to use "U" cast to avoid shifting signed 32-bit value by 31 bits problem. Signed-off-by: Vandana BN <bnvandana@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hangbin Liu authored
default_ttl should be integer instead of bool Reported-by: Ying Xu <yinxu@redhat.com> Fixes: a59166e4 ("mpls: allow TTL propagation from IP packets to be configured") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Skbs may have their checksum value populated by HW. If this is a checksum calculated over the entire packet then the CHECKSUM_COMPLETE field is marked. Changes to the data pointer on the skb throughout the network stack still try to maintain this complete csum value if it is required through functions such as skb_postpush_rcsum. The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when changes are made to packet data without a push or a pull. This occurs when the ethertype of the MAC header is changed or when MPLS lse fields are modified. The modification is carried out using the csum_partial function to get the csum of a buffer and add it into the larger checksum. The buffer is an inversion of the data to be removed followed by the new data. Because the csum is calculated over 16 bits and these values align with 16 bits, the effect is the removal of the old value from the CHECKSUM_COMPLETE and addition of the new value. However, the csum fed into the function and the outcome of the calculation are also inverted. This would only make sense if it was the new value rather than the old that was inverted in the input buffer. Fix the issue by removing the bit inverts in the csum_partial calculation. The bug was verified and the fix tested by comparing the folded value of the updated CHECKSUM_COMPLETE value with the folded value of a full software checksum calculation (reset skb->csum to 0 and run skb_checksum_complete(skb)). Prior to the fix the outcomes differed but after they produce the same result. Fixes: 25cd9ba0 ("openvswitch: Add basic MPLS support to kernel") Fixes: bc7cc599 ("openvswitch: update checksum in {push,pop}_mpls") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Jun, 2019 6 commits
-
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: Bug fixes. Miscellaneous bug fix patches, including two resource handling fixes for the RDMA driver, a PCI shutdown patch to add pci_disable_device(), a patch to fix ethtool selftest crash, and the last one suppresses an unnecessry error message. Please also queue patches 1, 2, and 3 for -stable. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Some firmware versions do not support this so use the silent variant to send the message to firmware to suppress the harmless error. This error message is unnecessarily alarming the user. Fixes: afdc8a84 ("bnxt_en: Add DCBNL DSCP application protocol support.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
In an earlier commit to improve NQ reservations on 57500 chips, we set the resv_irqs on the 57500 VFs to the fixed value assigned by the PF regardless of how many are actually used. The current code assumes that resv_irqs minus the ones used by the network driver must be the ones for the RDMA driver. This is no longer true and we may return more MSIX vectors than requested, causing inconsistency. Fix it by capping the value. Fixes: 01989c6b ("bnxt_en: Improve NQ reservations.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
The current logic assumes that the RDMA driver uses one statistics context adjacent to the ones used by the network driver. This assumption is not true and the statistics context used by the RDMA driver is tied to its MSIX base vector. This wrong assumption can cause RDMA driver failure after changing ethtool rings on the network side. Fix the statistics reservation logic accordingly. Fixes: 780baad4 ("bnxt_en: Reserve 1 stat_ctx for RDMA driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
After ethtool loopback packet tests, we re-open the nic for the next IRQ test. If the open fails, we must not proceed with the IRQ test or we will crash with NULL pointer dereference. Fix it by checking the bnxt_open_nic() return code before proceeding. Reported-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Fixes: 67fea463 ("bnxt_en: Add interrupt test to ethtool -t selftest.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Some chips with older firmware can continue to perform DMA read from context memory even after the memory has been freed. In the PCI shutdown method, we need to call pci_disable_device() to shutdown DMA to prevent this DMA before we put the device into D3hot. DMA memory request in D3hot state will generate PCI fatal error. Similarly, in the driver remove method, the context memory should only be freed after DMA has been shutdown for correctness. Fixes: 98f04cf0 ("bnxt_en: Check context memory requirements from firmware.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 29 Jun, 2019 9 commits
-
-
Baruch Siach authored
Add a 1ms delay after reset deactivation. Otherwise the chip returns bogus ID value. This is observed with 88E6390 (Peridot) chip. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guilherme G. Piccoli authored
Currently bnx2x ptp worker tries to read a register with timestamp information in case of TX packet timestamping and in case it fails, the routine reschedules itself indefinitely. This was reported as a kworker always at 100% of CPU usage, which was narrowed down to be bnx2x ptp_task. By following the ioctl handler, we could narrow down the problem to an NTP tool (chrony) requesting HW timestamping from bnx2x NIC with RX filter zeroed; this isn't reproducible for example with ptp4l (from linuxptp) since this tool requests a supported RX filter. It seems NIC FW timestamp mechanism cannot work well with RX_FILTER_NONE - driver's PTP filter init routine skips a register write to the adapter if there's not a supported filter request. This patch addresses the problem of bnx2x ptp thread's everlasting reschedule by retrying the register read 10 times; between the read attempts the thread sleeps for an increasing amount of time starting in 1ms to give FW some time to perform the timestamping. If it still fails after all retries, we bail out in order to prevent an unbound resource consumption from bnx2x. The patch also adds an ethtool statistic for accounting the skipped TX timestamp packets and it reduces the priority of timestamping error messages to prevent log flooding. The code was tested using both linuxptp and chrony. Reported-and-tested-by: Przemyslaw Hausman <przemyslaw.hausman@canonical.com> Suggested-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
im->tomb and/or im->sources might not be NULL, but we currently overwrite their values blindly. Using swap() will make sure the following call to kfree_pmc(pmc) will properly free the psf structures. Tested with the C repro provided by syzbot, which basically does : socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IP, IP_ADD_MEMBERSHIP, "\340\0\0\2\177\0\0\1\0\0\0\0", 12) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=0}) = 0 setsockopt(3, SOL_IP, IP_MSFILTER, "\340\0\0\2\177\0\0\1\1\0\0\0\1\0\0\0\377\377\377\377", 20) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=IFF_UP}) = 0 exit_group(0) = ? BUG: memory leak unreferenced object 0xffff88811450f140 (size 64): comm "softirq", pid 0, jiffies 4294942448 (age 32.070s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 ................ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace: [<00000000c7bad083>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000c7bad083>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000c7bad083>] slab_alloc mm/slab.c:3326 [inline] [<00000000c7bad083>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<000000009acc4151>] kmalloc include/linux/slab.h:547 [inline] [<000000009acc4151>] kzalloc include/linux/slab.h:742 [inline] [<000000009acc4151>] ip_mc_add1_src net/ipv4/igmp.c:1976 [inline] [<000000009acc4151>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2100 [<000000004ac14566>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2484 [<0000000052d8f995>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:959 [<000000004ee1e21f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1248 [<0000000066cdfe74>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2618 [<000000009383a786>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3126 [<00000000d8ac0c94>] __sys_setsockopt+0x98/0x120 net/socket.c:2072 [<000000001b1e9666>] __do_sys_setsockopt net/socket.c:2083 [inline] [<000000001b1e9666>] __se_sys_setsockopt net/socket.c:2080 [inline] [<000000001b1e9666>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2080 [<00000000420d395e>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<000000007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 24803f38 ("igmp: do not remove igmp souce list info when set link down") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Reported-by: syzbot+6ca1abd0db68b5173a4f@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Harini Katakam says: ==================== Sub ns increment fixes in Macb PTP The subns increment register fields are not captured correctly in the driver. Fix the same and also increase the subns incr resolution. Sub ns resolution was increased to 24 bits in r1p06f2 version. To my knowledge, this PTP driver, with its current BD time stamp implementation, is only useful to that version or above. So, I have increased the resolution unconditionally. Please let me know if there is any IP versions incompatible with this - there is no register to obtain this information from. Changes from RFC: None ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Harini Katakam authored
The subns increment register has 24 bits as follows: RegBit[15:0] = Subns[23:8]; RegBit[31:24] = Subns[7:0] Fix the same in the driver and increase sub ns resolution to the best capable, 24 bits. This should be the case on all GEM versions that this PTP driver supports. Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Harini Katakam authored
The scaled ppm parameter passed to _adjfine() contains a 16 bit fraction. This just happens to be the same as SUBNSINCR_SIZE now. Hence define this separately. Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiunn Chang authored
Shifting signed 32-bit value by 31 bits is undefined. Changing most significant bit to unsigned. Changes included in v2: - use subsystem specific subject lines - CC required mailing lists Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
netfilter did not expect that skb_dst_force() can cause skb to lose its dst entry. I got a bug report with a skb->dst NULL dereference in netfilter output path. The backtrace contains nf_reinject(), so the dst might have been cleared when skb got queued to userspace. Other users were fixed via if (skb_dst(skb)) { skb_dst_force(skb); if (!skb_dst(skb)) goto handle_err; } But I think its preferable to make the 'dst might be cleared' part of the function explicit. In netfilter case, skb with a null dst is expected when queueing in prerouting hook, so drop skb for the other hooks. v2: v1 of this patch returned true in case skb had no dst entry. Eric said: Say if we have two skb_dst_force() calls for some reason on the same skb, only the first one will return false. This now returns false even when skb had no dst, as per Erics suggestion, so callers might need to check skb_dst() first before skb_dst_force(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Now when sctp_connect() is called with a wrong sa_family, it binds to a port but doesn't set bp->port, then sctp_get_af_specific will return NULL and sctp_connect() returns -EINVAL. Then if sctp_bind() is called to bind to another port, the last port it has bound will leak due to bp->port is NULL by then. sctp_connect() doesn't need to bind ports, as later __sctp_connect will do it if bp->port is NULL. So remove it from sctp_connect(). While at it, remove the unnecessary sockaddr.sa_family len check as it's already done in sctp_inet_connect. Fixes: 644fbdea ("sctp: fix the issue that flags are ignored when using kernel_connect") Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-