1. 02 Oct, 2019 5 commits
    • Dongli Zhang's avatar
      xen-netfront: do not use ~0U as error return value for xennet_fill_frags() · a761129e
      Dongli Zhang authored
      xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
      to cache extra fragments. This is incorrect because the return type of
      xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
      ring buffer index.
      
      In the situation when the rsp_cons is approaching 0xffffffff, the return
      value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
      caller) would regard as error. As a result, queue->rx.rsp_cons is set
      incorrectly because it is updated only when there is error. If there is no
      error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
      Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
      queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
      This leads to NULL pointer access in the next iteration to process rx ring
      buffer entries.
      
      The symptom is similar to the one fixed in
      commit 00b36850 ("xen-netfront: do not assume sk_buff_head list is
      empty in error handling").
      
      This patch changes the return type of xennet_fill_frags() to indicate
      whether it is successful or failed. The queue->rx.rsp_cons will be
      always updated inside this function.
      
      Fixes: ad4f15dc ("xen/netfront: don't bug in case of too many frags")
      Signed-off-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a761129e
    • David Ahern's avatar
      ipv6: Handle race in addrconf_dad_work · a3ce2a21
      David Ahern authored
      Rajendra reported a kernel panic when a link was taken down:
      
      [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
      <snip>
      
      [ 6870.570501] Call Trace:
      [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
      [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
      [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
      [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
      [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
      [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
      [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
      [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
      [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
      [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
      [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrcond_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      This scenario does not occur when the ipv6 address is not kept
      (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
      state of the ifp to DEAD. Handle when the addresses are kept by checking
      IF_READY which is reset by addrconf_ifdown.
      
      The 'dead' flag for an inet6_addr is set only under rtnl, in
      addrconf_ifdown and it means the device is getting removed (or IPv6 is
      disabled). The interesting cases for changing the idev flag are
      addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
      (reset the flag). The former does not have the idev lock - only rtnl;
      the latter has both. Based on that the existing dead + IF_READY check
      can be moved to right after the rtnl_lock in addrconf_dad_work.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: default avatarRajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a3ce2a21
    • Eric Dumazet's avatar
      tcp: adjust rto_base in retransmits_timed_out() · 3256a2d6
      Eric Dumazet authored
      The cited commit exposed an old retransmits_timed_out() bug
      which assumed it could call tcp_model_timeout() with
      TCP_RTO_MIN as rto_base for all states.
      
      But flows in SYN_SENT or SYN_RECV state uses a different
      RTO base (1 sec instead of 200 ms, unless BPF choses
      another value)
      
      This caused a reduction of SYN retransmits from 6 to 4 with
      the default /proc/sys/net/ipv4/tcp_syn_retries value.
      
      Fixes: a41e8a88 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3256a2d6
    • Dexuan Cui's avatar
      vsock: Fix a lockdep warning in __vsock_release() · 0d9138ff
      Dexuan Cui authored
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d9138ff
    • Johan Hovold's avatar
      hso: fix NULL-deref on tty open · 8353da9f
      Johan Hovold authored
      Fix NULL-pointer dereference on tty open due to a failure to handle a
      missing interrupt-in endpoint when probing modem ports:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000006
      	...
      	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
      	...
      	Call Trace:
      	hso_start_serial_device+0xdc/0x140 [hso]
      	hso_serial_open+0x118/0x1b0 [hso]
      	tty_open+0xf1/0x490
      
      Fixes: 542f5482 ("tty: Modem functions for the HSO driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8353da9f
  2. 01 Oct, 2019 31 commits
  3. 30 Sep, 2019 3 commits
    • Haishuang Yan's avatar
      erspan: remove the incorrect mtu limit for erspan · 0e141f75
      Haishuang Yan authored
      erspan driver calls ether_setup(), after commit 61e84623
      ("net: centralize net_device min/max MTU checking"), the range
      of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
      
      It causes the dev mtu of the erspan device to not be greater
      than 1500, this limit value is not correct for ipgre tap device.
      
      Tested:
      Before patch:
      # ip link set erspan0 mtu 1600
      Error: mtu greater than device maximum.
      After patch:
      # ip link set erspan0 mtu 1600
      # ip -d link show erspan0
      21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
      mode DEFAULT group default qlen 1000
          link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
      
      Fixes: 61e84623 ("net: centralize net_device min/max MTU checking")
      Signed-off-by: default avatarHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0e141f75
    • Eric Dumazet's avatar
      sch_cbq: validate TCA_CBQ_WRROPT to avoid crash · e9789c7c
      Eric Dumazet authored
      syzbot reported a crash in cbq_normalize_quanta() caused
      by an out of range cl->priority.
      
      iproute2 enforces this check, but malicious users do not.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      Modules linked in:
      CPU: 1 PID: 26447 Comm: syz-executor.1 Not tainted 5.3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:cbq_normalize_quanta.part.0+0x1fd/0x430 net/sched/sch_cbq.c:902
      RSP: 0018:ffff8801a5c333b0 EFLAGS: 00010206
      RAX: 0000000020000003 RBX: 00000000fffffff8 RCX: ffffc9000712f000
      RDX: 00000000000043bf RSI: ffffffff83be8962 RDI: 0000000100000018
      RBP: ffff8801a5c33420 R08: 000000000000003a R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000002ef
      R13: ffff88018da95188 R14: dffffc0000000000 R15: 0000000000000015
      FS:  00007f37d26b1700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c7cec CR3: 00000001bcd0a006 CR4: 00000000001626f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       [<ffffffff83be9d57>] cbq_normalize_quanta include/net/pkt_sched.h:27 [inline]
       [<ffffffff83be9d57>] cbq_addprio net/sched/sch_cbq.c:1097 [inline]
       [<ffffffff83be9d57>] cbq_set_wrr+0x2d7/0x450 net/sched/sch_cbq.c:1115
       [<ffffffff83bee8a7>] cbq_change_class+0x987/0x225b net/sched/sch_cbq.c:1537
       [<ffffffff83b96985>] tc_ctl_tclass+0x555/0xcd0 net/sched/sch_api.c:2329
       [<ffffffff83a84655>] rtnetlink_rcv_msg+0x485/0xc10 net/core/rtnetlink.c:5248
       [<ffffffff83cadf0a>] netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2510
       [<ffffffff83a7db6d>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5266
       [<ffffffff83cac2c6>] netlink_unicast_kernel net/netlink/af_netlink.c:1324 [inline]
       [<ffffffff83cac2c6>] netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1350
       [<ffffffff83cacd4a>] netlink_sendmsg+0x89a/0xd50 net/netlink/af_netlink.c:1939
       [<ffffffff8399d46e>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff8399d46e>] sock_sendmsg+0x12e/0x170 net/socket.c:684
       [<ffffffff8399f1fd>] ___sys_sendmsg+0x81d/0x960 net/socket.c:2359
       [<ffffffff839a2d05>] __sys_sendmsg+0x105/0x1d0 net/socket.c:2397
       [<ffffffff839a2df9>] SYSC_sendmsg net/socket.c:2406 [inline]
       [<ffffffff839a2df9>] SyS_sendmsg+0x29/0x30 net/socket.c:2404
       [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305
       [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9789c7c
    • Michal Vokáč's avatar
      net: dsa: qca8k: Use up to 7 ports for all operations · 7ae6d93c
      Michal Vokáč authored
      The QCA8K family supports up to 7 ports. So use the existing
      QCA8K_NUM_PORTS define to allocate the switch structure and limit all
      operations with the switch ports.
      
      This was not an issue until commit 0394a63a ("net: dsa: enable and
      disable all ports") disabled all unused ports. Since the unused ports 7-11
      are outside of the correct register range on this switch some registers
      were rewritten with invalid content.
      
      Fixes: 6b93fb46 ("net-next: dsa: add new driver for qca8xxx family")
      Fixes: a0c02161 ("net: dsa: variable number of ports")
      Fixes: 0394a63a ("net: dsa: enable and disable all ports")
      Signed-off-by: default avatarMichal Vokáč <michal.vokac@ysoft.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ae6d93c
  4. 29 Sep, 2019 1 commit
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 02dc96ef
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Sanity check URB networking device parameters to avoid divide by
          zero, from Oliver Neukum.
      
       2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
          don't work properly. Longer term this needs a better fix tho. From
          Vijay Khemka.
      
       3) Small fixes to selftests (use ping when ping6 is not present, etc.)
          from David Ahern.
      
       4) Bring back rt_uses_gateway member of struct rtable, it's semantics
          were not well understood and trying to remove it broke things. From
          David Ahern.
      
       5) Move usbnet snaity checking, ignore endpoints with invalid
          wMaxPacketSize. From Bjørn Mork.
      
       6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
      
       7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
          Alex Vesker, and Yevgeny Kliteynik
      
       8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
      
       9) Fix crash when removing sch_cbs entry while offloading is enabled,
          from Vinicius Costa Gomes.
      
      10) Signedness bug fixes, generally in looking at the result given by
          of_get_phy_mode() and friends. From Dan Crapenter.
      
      11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
      
      12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
      
      13) Fix quantization code in tcp_bbr, from Kevin Yang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
        net: tap: clean up an indentation issue
        nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
        tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
        sk_buff: drop all skb extensions on free and skb scrubbing
        tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
        mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
        Documentation: Clarify trap's description
        mlxsw: spectrum: Clear VLAN filters during port initialization
        net: ena: clean up indentation issue
        NFC: st95hf: clean up indentation issue
        net: phy: micrel: add Asym Pause workaround for KSZ9021
        net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
        ptp: correctly disable flags on old ioctls
        lib: dimlib: fix help text typos
        net: dsa: microchip: Always set regmap stride to 1
        nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
        nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
        net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
        vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
        net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
        ...
      02dc96ef