1. 30 Jan, 2019 12 commits
    • virtio_net: Don't call free_old_xmit_skbs for xdp_frames · 534da5e8
      Toshiaki Makita authored
      When napi_tx is enabled, virtnet_poll_cleantx() called
      free_old_xmit_skbs() even for xdp send queues.
      This is bogus since those queues hold xdp_frames, not sk_buffs. It
      mangled the device tx byte counters, because skb->len is a meaningless
      value there, and even triggered an oops due to a general protection
      fault on freeing them.

      Since xdp send queues do not acquire locks, old xdp_frames should be
      freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for
      xdp send queues.

      Similarly, virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI
      handler is called even without calling start_xmit() because the cb for
      tx is enabled by default. Once the handler is called, it enables the cb
      again, and then the handler is called again. We don't need this handler
      for XDP, so neither enable the cb nor call free_old_xmit_skbs() there.
      
      Also, we need to disable tx NAPI when disabling XDP, so
      virtnet_poll_tx() can safely access curr_queue_pairs and
      xdp_queue_pairs, which are not atomically updated while disabling XDP.
      
      Fixes: b92f1e67 ("virtio-net: transmit napi")
      Fixes: 7b0411ef ("virtio-net: clean tx descriptors from rx napi")
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: Don't enable NAPI when interface is down · 8be4d9a4
      Toshiaki Makita authored
      Commit 4e09ff53 ("virtio-net: disable NAPI only when enabled during
      XDP set") tried to fix inappropriate NAPI enabling/disabling when
      !netif_running(), but was not complete.
      
      On the error path, virtio_net could enable NAPI even when
      !netif_running(). This can enable NAPI twice on virtnet_open(), which
      would trigger BUG_ON() in napi_enable().
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'erspan-always-reports-output-key-to-userspace' · 41ef81be
      David S. Miller authored
      Lorenzo Bianconi says:
      
      ====================
      erspan: always reports output key to userspace
      
      The ERSPAN protocol relies on the output key to set the session id
      header field. However, the TUNNEL_KEY bit is cleared in order not to add
      a key field to the external GRE header, and so the configured o_key is
      not reported to userspace.
      Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter when
      dumping device info.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: always reports o_key to userspace · c706863b
      Lorenzo Bianconi authored
      Like Erspan_v4, the Erspan_v6 protocol relies on o_key to configure the
      session id header field. However, the TUNNEL_KEY bit is cleared in
      ip6erspan_tunnel_xmit since the ERSPAN protocol does not set the key field
      of the external GRE header, and so the configured o_key is not reported
      to userspace. The issue can be triggered with the following reproducer:
      
      $ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
          key 1 seq erspan_ver 1
      $ip link set ip6erspan1 up
      $ip -d link sh ip6erspan1
      
      ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
          link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
          ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
      
      Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter in
      ip6gre_fill_info.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip_gre: always reports o_key to userspace · feaf5c79
      Lorenzo Bianconi authored
      The ERSPAN protocol (versions 1 and 2) relies on o_key to configure the
      session id header field. However, the TUNNEL_KEY bit is cleared in
      erspan_xmit since the ERSPAN protocol does not set the key field
      of the external GRE header, and so the configured o_key is not reported
      to userspace. The issue can be triggered with the following reproducer:
      
      $ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
          key 1 seq erspan_ver 1
      $ip link set erspan1 up
      $ip -d link sh erspan1
      
      erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
        link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
        erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0
      
      Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter in
      ipgre_fill_info.
      
      Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
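The two ERSPAN fixes above share one idea: the tunnel's stored o_flags no longer carries TUNNEL_KEY, so the dump path adds the bit back before reporting to userspace. The following is a minimal Python model of that idea, not the kernel code. The bit values, the `is_erspan` parameter, and the function name `dump_o_flags` are assumptions made for illustration.

```python
# Illustrative flag bits; the kernel's real TUNNEL_* constants are not
# reproduced here.
TUNNEL_KEY = 0x04
TUNNEL_SEQ = 0x08


def dump_o_flags(stored_o_flags: int, is_erspan: bool) -> int:
    """Return the o_flags value to report to userspace.

    ERSPAN keeps TUNNEL_KEY cleared in the stored flags so that no key field
    is added to the outer GRE header; the bit is restored only in the copy
    that is dumped.
    """
    o_flags = stored_o_flags
    if is_erspan:
        o_flags |= TUNNEL_KEY  # make the configured o_key visible again
    return o_flags


# Stored flags after the key bit was cleared: only "seq" remains.
stored = TUNNEL_SEQ
reported = dump_o_flags(stored, is_erspan=True)
print(f"stored={stored:#x} reported={reported:#x}")
```

Working on a copy keeps the transmit path unchanged: the stored flags still lack the key bit, and only the reported value differs.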
    • ucc_geth: Reset BQL queue when stopping device · e15aa3b2
      Mathias Thore authored
      After a timeout event caused by, for example, a broadcast storm, when
      the MAC and PHY are reset, the BQL TX queue needs to be reset as
      well. Otherwise, the device will exhibit severe performance issues
      even after the storm has ended.
      Co-authored-by: David Gounaris <david.gounaris@infinera.com>
      Signed-off-by: Mathias Thore <mathias.thore@infinera.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-various-compat-ioctl-fixes' · 794827f3
      David S. Miller authored
      Johannes Berg says:
      
      ====================
      various compat ioctl fixes
      
      Back a long time ago, I already fixed a few of these by passing
      the size of the struct ifreq to do_sock_ioctl(). However, Robert
      found more cases, and now it won't be as simple because we'd have
      to pass that down all the way to e.g. bond_do_ioctl() which isn't
      really feasible.
      
      Therefore, restore the old code.
      
      While looking at why SIOCGIFNAME was broken, I realized that Al
      had removed that case - which had been handled in an explicit
      separate function - as well, and looking through his work at the
      time I saw that bond ioctls were also affected by the erroneous
      removal.
      
      I've restored SIOCGIFNAME and bond ioctls by going through the
      (now renamed) dev_ifsioc() instead of reintroducing their own
      helper functions, which I hope is correct but have only tested
      with SIOCGIFNAME.
      ====================
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socket: make bond ioctls go through compat_ifreq_ioctl() · 98406133
      Johannes Berg authored
      Same story as before, these use struct ifreq and thus need
      to be read with the shorter version to not cause faults.
      
      Cc: stable@vger.kernel.org
      Fixes: f92d4fc9 ("kill bond_ioctl()")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socket: fix SIOCGIFNAME in compat · c6c9fee3
      Johannes Berg authored
      As reported by Robert O'Callahan in
      https://bugzilla.kernel.org/show_bug.cgi?id=202273
      reverting the previous changes in this area broke
      the SIOCGIFNAME ioctl in compat again (I'd previously
      fixed it after his previous report of breakage in
      https://bugzilla.kernel.org/show_bug.cgi?id=199469).
      
      This is obviously because I fixed SIOCGIFNAME more or
      less by accident.
      
      Fix it explicitly now by making it pass through the
      restored compat translation code.
      
      Cc: stable@vger.kernel.org
      Fixes: 4cf808e7 ("kill dev_ifname32()")
      Reported-by: Robert O'Callahan <robert@ocallahan.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "kill dev_ifsioc()" · 37ac39bd
      Johannes Berg authored
      This reverts commit bf440573 ("kill dev_ifsioc()").
      
      This wasn't really unused as implied by the original commit,
      it still handles the copy to/from user differently, and the
      commit thus caused issues such as
        https://bugzilla.kernel.org/show_bug.cgi?id=199469
      and
        https://bugzilla.kernel.org/show_bug.cgi?id=202273
      
      However, deviating from a strict revert, rename dev_ifsioc()
      to compat_ifreq_ioctl() to be clearer as to its purpose and
      add a comment.
      
      Cc: stable@vger.kernel.org
      Fixes: bf440573 ("kill dev_ifsioc()")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "socket: fix struct ifreq size in compat ioctl" · 63ff03ab
      Johannes Berg authored
      This reverts commit 1cebf8f1 ("socket: fix struct ifreq
      size in compat ioctl"), it's a bugfix for another commit that
      I'll revert next.
      
      This is not a 'perfect' revert, I'm keeping some coding style
      intact rather than revert to the state with indentation errors.
      
      Cc: stable@vger.kernel.org
      Fixes: 1cebf8f1 ("socket: fix struct ifreq size in compat ioctl")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 62967898
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Need to save away the IV across tls async operations, from Dave
          Watson.
      
       2) Upon successful packet processing, we should liberate the SKB with
          dev_consume_skb{_irq}(). From Yang Wei.
      
       3) Only apply RX hang workaround on affected macb chips, from Harini
          Katakam.

       4) Dummy netdevs need a proper namespace assigned to them, from Josh
          Elsasser.

       5) Some paths of nft_compat run lockless now, and thus we need to use a
          proper refcount_t. From Florian Westphal.
      
       6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.
      
       7) netrom does not refcount sockets properly wrt. timers, fix that by
          using the sock timer API. From Cong Wang.
      
       8) Fix locking of inexact inserts of xfrm policies, from Florian
          Westphal.
      
       9) Missing xfrm hash generation bump, also from Florian.
      
      10) Missing of_node_put() in hns driver, from Yonglong Liu.
      
      11) Fix DN_IFREQ_SIZE, from Johannes Berg.
      
      12) ip6mr notifier is invoked during traversal of wrong table, from Nir
          Dotan.
      
      13) TX promisc settings not performed correctly in qed, from Manish
          Chopra.
      
      14) Fix OOB access in vhost, from Jason Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
        MAINTAINERS: Add entry for XDP (eXpress Data Path)
        net: set default network namespace in init_dummy_netdev()
        net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
        net: caif: call dev_consume_skb_any when skb xmit done
        net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        net: macb: Apply RXUBR workaround only to versions with errata
        net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
        net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
        net: tls: Fix deadlock in free_resources tx
        net: tls: Save iv in tls_rec for async crypto requests
        vhost: fix OOB in get_rx_bufs()
        qed: Fix stack out of bounds bug
        qed: Fix system crash in ll2 xmit
        qed: Fix VF probe failure while FLR
        qed: Fix LACP pdu drops for VFs
        qed: Fix bug in tx promiscuous mode settings
        net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        netfilter: ipt_CLUSTERIP: fix warning unused variable cn
        ...
  2. 29 Jan, 2019 13 commits
  3. 28 Jan, 2019 15 commits
    • Merge branch 'qed-Bug-fixes' · bfe2599d
      David S. Miller authored
      Manish Chopra says:
      
      ====================
      qed: Bug fixes
      
      This series has SR-IOV and some general fixes.
      Please consider applying it to "net".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix stack out of bounds bug · ffb057f9
      Manish Chopra authored
      KASAN reported the following bug in qed_init_qm_get_idx_from_flags
      due to inappropriate casting of "pq_flags". Fix the type of "pq_flags".
      
      [  196.624707] BUG: KASAN: stack-out-of-bounds in qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
      [  196.624712] Read of size 8 at addr ffff809b00bc7360 by task kworker/0:9/1712
      [  196.624714]
      [  196.624720] CPU: 0 PID: 1712 Comm: kworker/0:9 Not tainted 4.18.0-60.el8.aarch64+debug #1
      [  196.624723] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL024 09/26/2018
      [  196.624733] Workqueue: events work_for_cpu_fn
      [  196.624738] Call trace:
      [  196.624742]  dump_backtrace+0x0/0x2f8
      [  196.624745]  show_stack+0x24/0x30
      [  196.624749]  dump_stack+0xe0/0x11c
      [  196.624755]  print_address_description+0x68/0x260
      [  196.624759]  kasan_report+0x178/0x340
      [  196.624762]  __asan_report_load_n_noabort+0x38/0x48
      [  196.624786]  qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
      [  196.624808]  qed_init_qm_info+0xec0/0x2200 [qed]
      [  196.624830]  qed_resc_alloc+0x284/0x7e8 [qed]
      [  196.624853]  qed_slowpath_start+0x6cc/0x1ae8 [qed]
      [  196.624864]  __qede_probe.isra.10+0x1cc/0x12c0 [qede]
      [  196.624874]  qede_probe+0x78/0xf0 [qede]
      [  196.624879]  local_pci_probe+0xc4/0x180
      [  196.624882]  work_for_cpu_fn+0x54/0x98
      [  196.624885]  process_one_work+0x758/0x1900
      [  196.624888]  worker_thread+0x4e0/0xd18
      [  196.624892]  kthread+0x2c8/0x350
      [  196.624897]  ret_from_fork+0x10/0x18
      [  196.624899]
      [  196.624902] Allocated by task 2:
      [  196.624906]  kasan_kmalloc.part.1+0x40/0x108
      [  196.624909]  kasan_kmalloc+0xb4/0xc8
      [  196.624913]  kasan_slab_alloc+0x14/0x20
      [  196.624916]  kmem_cache_alloc_node+0x1dc/0x480
      [  196.624921]  copy_process.isra.1.part.2+0x1d8/0x4a98
      [  196.624924]  _do_fork+0x150/0xfa0
      [  196.624926]  kernel_thread+0x48/0x58
      [  196.624930]  kthreadd+0x3a4/0x5a0
      [  196.624932]  ret_from_fork+0x10/0x18
      [  196.624934]
      [  196.624937] Freed by task 0:
      [  196.624938] (stack is not available)
      [  196.624940]
      [  196.624943] The buggy address belongs to the object at ffff809b00bc0000
      [  196.624943]  which belongs to the cache thread_stack of size 32768
      [  196.624946] The buggy address is located 29536 bytes inside of
      [  196.624946]  32768-byte region [ffff809b00bc0000, ffff809b00bc8000)
      [  196.624948] The buggy address belongs to the page:
      [  196.624952] page:ffff7fe026c02e00 count:1 mapcount:0 mapping:ffff809b4001c000 index:0x0 compound_mapcount: 0
      [  196.624960] flags: 0xfffff8000008100(slab|head)
      [  196.624967] raw: 0fffff8000008100 dead000000000100 dead000000000200 ffff809b4001c000
      [  196.624970] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
      [  196.624973] page dumped because: kasan: bad access detected
      [  196.624974]
      [  196.624976] Memory state around the buggy address:
      [  196.624980]  ffff809b00bc7200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624983]  ffff809b00bc7280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624985] >ffff809b00bc7300: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2
      [  196.624988]                                                        ^
      [  196.624990]  ffff809b00bc7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624993]  ffff809b00bc7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624995] ==================================================================
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix system crash in ll2 xmit · 7c81626a
      Manish Chopra authored
      Cache the number of fragments in the skb locally. In the case of a
      linear skb (with zero fragments), tx completion (or freeing of the
      skb) may happen before the driver reads the number of fragments from
      the skb, which could lead to a stale access to an already freed skb.
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix VF probe failure while FLR · 327852ec
      Manish Chopra authored
      VFs may hit a VF-PF channel timeout while probing. In some cases it
      was observed that a VF FLR and the VF "acquire" message transaction
      (i.e. the first message from VF to PF in the VF's probe flow) could
      occur simultaneously. The VF then fails to send the "acquire" message
      to the PF because the VF is marked disabled from the HW perspective
      due to the FLR, which results in a channel timeout and VF probe failure.

      In such cases, retry the VF "acquire" message so that a later attempt
      can pass the message to the PF once the VF FLR has completed, and the
      VF can be probed successfully.
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
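The retry approach described in the VF probe fix can be modelled as a bounded retry loop. This is only a sketch of the general pattern, not the driver's code. The names `acquire_with_retry` and `FlakyChannel`, and the limit of three attempts, are hypothetical; the commit message does not state how many retries the driver performs.

```python
def acquire_with_retry(send_acquire, max_attempts=3):
    """Call send_acquire() until it succeeds or the attempts run out.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        if send_acquire():
            return attempt
    return None


class FlakyChannel:
    """Hypothetical VF-PF channel that times out while an FLR is in progress."""

    def __init__(self, flr_busy_sends):
        self.flr_busy_sends = flr_busy_sends  # number of sends that will time out

    def send(self):
        if self.flr_busy_sends > 0:
            self.flr_busy_sends -= 1
            return False  # channel timeout: VF still disabled by the FLR
        return True


channel = FlakyChannel(flr_busy_sends=2)
succeeded_on = acquire_with_retry(channel.send)
print(succeeded_on)
```

Bounding the number of attempts means a VF whose FLR never completes still fails the probe instead of retrying forever.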
    • qed: Fix LACP pdu drops for VFs · ff929696
      Manish Chopra authored
      A VF is always configured to drop control frames
      (with reserved mac addresses), but for LACP to work
      on the VFs, LACP control frames must be
      forwarded or transmitted successfully.

      This patch fixes this so that trusted VFs
      (marked through ndo_set_vf_trust) are allowed to
      pass control frames such as LACP pdus.
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix bug in tx promiscuous mode settings · 9e71a15d
      Manish Chopra authored
      When running tx switched traffic between VNICs
      created via a bridge (to which VFs are added), the
      adapter drops the unicast packets in the tx flow because
      the VNIC's ucast mac is unknown to it. But VF interfaces
      being in promiscuous mode should have caused the adapter
      to accept all the unknown ucast packets. Later, it
      was found that the driver doesn't actually configure the tx
      promiscuous mode settings to accept all unknown unicast macs.

      This patch fixes the tx promiscuous mode settings to accept all
      unknown/unmatched unicast macs, which resolves the scenario.
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · ca899324
      Yang Wei authored
      dev_consume_skb_irq() should be called in i596_interrupt() when skb
      xmit is done. It makes drop profiles (dropwatch, perf) more friendly.
      Signed-off-by: Yang Wei <albin_yang@163.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · ff44a837
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for your net tree:
      
      1) The nftnl mutex is now per-netns, therefore use reference counter
         for matches and targets to deal with concurrent updates from netns.
         Moreover, place extensions in a pernet list. Patches from Florian Westphal.
      
      2) Bail out with EINVAL in case of negative timeouts via setsockopt()
         through ip_vs_set_timeout(), from ZhangXiaoxu.
      
      3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also
         from Florian.
      
      4) Reset TCP option header parser in case of fingerprint mismatch,
         otherwise follow up overlapping fingerprint definitions including
         TCP options do not work, from Fernando Fernandez Mancera.
      
      5) Compilation warning in ipt_CLUSTERIP with CONFIG_PROC_FS unset.
         From Anders Roxell.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "mm, memory_hotplug: initialize struct pages for the full memory section" · 4aa9fc2a
      Michal Hocko authored
      This reverts commit 2830bf6f.
      
      The underlying assumption that one sparse section belongs to a single
      numa node doesn't really hold. Robert Shteynfeld has reported a boot
      failure. The boot log was not captured but his memory layout is as
      follows:
      
        Early memory node ranges
          node   1: [mem 0x0000000000001000-0x0000000000090fff]
          node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
          node   1: [mem 0x0000000100000000-0x0000001423ffffff]
          node   0: [mem 0x0000001424000000-0x0000002023ffffff]
      
      This means that node0 starts in the middle of a memory section which is
      also in node1.  memmap_init_zone tries to initialize padding of a
      section even when it is outside of the given pfn range because there are
      code paths (e.g.  memory hotplug) which assume that the full worth of
      memory section is always initialized.
      
      In this particular case, though, such a range is already initialized and
      most likely already managed by the page allocator.  Scribbling over
      those pages corrupts the internal state and likely blows up when any of
      those pages gets used.
      Reported-by: Robert Shteynfeld <robert.shteynfeld@gmail.com>
      Fixes: 2830bf6f ("mm, memory_hotplug: initialize struct pages for the full memory section")
      Cc: stable@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
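The claim that node0 starts in the middle of a memory section can be checked with a little arithmetic on the reported layout. The sketch below assumes 128 MiB sparse sections (27 address bits), which is the x86_64 value. The report does not state the architecture, so treat `SECTION_SIZE_BITS` as an assumption.

```python
SECTION_SIZE_BITS = 27  # assumption: 128 MiB sections, as on x86_64
SECTION_SIZE = 1 << SECTION_SIZE_BITS

node1_end = 0x1423FFFFFF    # last byte of node 1's final range
node0_start = 0x1424000000  # first byte of node 0's range

# Section index = physical address divided by the section size.
section_of_node1_end = node1_end >> SECTION_SIZE_BITS
section_of_node0_start = node0_start >> SECTION_SIZE_BITS

# How far into its section node 0 begins.
offset_in_section = node0_start & (SECTION_SIZE - 1)

print(hex(section_of_node1_end), hex(section_of_node0_start))
print(offset_in_section // (1 << 20), "MiB into the section")
```

Under that assumption both addresses fall in the same section and node0 begins 64 MiB into it, so initializing the whole section for node0 would overwrite struct pages that already belong to node1.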
    • netfilter: ipt_CLUSTERIP: fix warning unused variable cn · 206b8cc5
      Anders Roxell authored
      When CONFIG_PROC_FS isn't set the variable cn isn't used.
      
      net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
      net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
        struct clusterip_net *cn = clusterip_pernet(net);
                              ^~
      
      Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".
      
      Fixes: b12f7bad ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink_osf: add missing fmatch check · 1a6a0951
      Fernando Fernandez Mancera authored
      When we check the tcp options of a packet and it doesn't match the current
      fingerprint, the tcp packet option pointer must be restored to its initial
      value in order to do the proper tcp options check for the next fingerprint.
      
      Here is an example.
      Assuming the following fingerprint base with two lines:
      
      S10:64:1:60:M*,S,T,N,W6:      Linux:3.0::Linux 3.0
      S20:64:1:60:M*,S,T,N,W7:      Linux:4.19:arch:Linux 4.1
      
      Where TCP options are the last field in the OS signature, all of them overlap
      except for the last one, i.e. 'W6' versus 'W7'.

      In case a packet for Linux 4.19 kicks in, the osf finds no match because the
      TCP options pointer has already been advanced while checking the TCP options
      in the first line.
      
      Therefore, reset pointer back to where it should be.
      
      Fixes: 11eeef41 ("netfilter: passive OS fingerprint xtables match")
      Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
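The effect of the missing pointer reset can be reproduced with a toy matcher. This is a simplified model, not the nfnetlink_osf code: options are plain strings instead of raw TCP option bytes, and the function name `find_os` and its `rewind` switch are invented for the illustration. The two fingerprints mirror the W6/W7 example from the commit message.

```python
def find_os(packet_opts, fingerprints, rewind=True):
    """Return the label of the first fingerprint whose options all match."""
    pos = 0  # cursor into the packet's TCP options
    for label, fp_opts in fingerprints:
        if rewind:
            pos = 0  # the fix: restart from the first option for each fingerprint
        matched = 0
        for opt in fp_opts:
            if pos < len(packet_opts) and packet_opts[pos] == opt:
                pos += 1
                matched += 1
            else:
                break
        if matched == len(fp_opts):
            return label
    return None


fingerprints = [
    ("Linux 3.0", ["M*", "S", "T", "N", "W6"]),
    ("Linux 4.19", ["M*", "S", "T", "N", "W7"]),
]
packet = ["M*", "S", "T", "N", "W7"]

print(find_os(packet, fingerprints))                # with the reset
print(find_os(packet, fingerprints, rewind=False))  # without the reset
```

Without the reset, the cursor is left at the fifth option after the first fingerprint fails on 'W6', so the second fingerprint is compared from the wrong position and nothing matches.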
    • netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present · 2035f3ff
      Florian Westphal authored
      Unlike ip(6)tables, ebtables only counts user-defined chains.
      
      The effect is that a 32bit ebtables binary on a 64bit kernel can do
      'ebtables -N FOO' only after adding at least one rule, else the request
      fails with -EINVAL.
      
      This is a similar fix to the one done in
      3f1e53ab ("netfilter: ebtables: don't attempt to allocate 0-sized compat array").
      
      Fixes: 7d7d7e02 ("netfilter: compat: reject huge allocation requests")
      Reported-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: dsa: mv88e6xxx: Fix serdes irq setup going recursive · 6fb6e637
      Andrew Lunn authored
      Due to a typo, mv88e6390_serdes_irq_setup() calls itself, rather than
      mv88e6390x_serdes_irq_setup(). It then blows the stack, and shortly
      after the machine blows up.
      
      Fixes: 2defda1f ("net: dsa: mv88e6xxx: Add support for SERDES on ports 2-8 for 6390X")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6mr: Fix notifiers call on mroute_clean_tables() · 146820cc
      Nir Dotan authored
      When the MC route socket is closed, mroute_clean_tables() is called to
      clean up existing routes. The notifiers call was mistakenly put on the
      cleanup of the unresolved MC route entries cache.
      In a case where the MC socket closes before an unresolved route expires,
      the notifier call leads to a crash, caused by the driver trying to
      increment an uninitialized refcount_t object [1] and then, when handling
      is done, to decrement it [2]. This was detected by a test recently added in
      commit 6d4efada ("selftests: forwarding: Add multicast routing test").

      Fix that by putting the notifiers call on the resolved entries traversal,
      instead of on the unresolved entries traversal.
      
      [1]
      
      [  245.748967] refcount_t: increment on 0; use-after-free.
      [  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
      ...
      [  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      [  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
      ...
      [  245.907487] Call Trace:
      [  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
      [  245.917913]  notifier_call_chain+0x45/0x7
      [  245.922484]  atomic_notifier_call_chain+0x15/0x20
      [  245.927729]  call_fib_notifiers+0x15/0x30
      [  245.932205]  mroute_clean_tables+0x372/0x3f
      [  245.936971]  ip6mr_sk_done+0xb1/0xc0
      [  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
      ...
      
      [2]
      
      [  246.128487] refcount_t: underflow; use-after-free.
      [  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
      [  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      ...
      [  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
      [  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
      ...
      [  246.298889] Call Trace:
      [  246.301617]  refcount_dec_and_test_checked+0x11/0x20
      [  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
      [  246.315531]  process_one_work+0x1fa/0x3f0
      [  246.320005]  worker_thread+0x2f/0x3e0
      [  246.324083]  kthread+0x118/0x130
      [  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
      [  246.332926]  ? kthread_park+0x80/0x80
      [  246.337013]  ret_from_fork+0x1f/0x30
      
      Fixes: 088aa3ee ("ip6mr: Support fib notifications")
      Signed-off-by: Nir Dotan <nird@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • decnet: fix DN_IFREQ_SIZE · 50c29366
      Johannes Berg authored
      Digging through the ioctls with Al because of the previous
      patches, we found that on 64-bit decnet's dn_dev_ioctl()
      is wrong, because struct ifreq::ifr_ifru is actually 24
      bytes (not 16 as expected from struct sockaddr) due to the
      ifru_map and ifru_settings members.
      
      Clearly, decnet expects the ioctl to be called with a struct
      like
        struct ifreq_dn {
          char ifr_name[IFNAMSIZ];
          struct sockaddr_dn ifr_addr;
        };
      
      since it does
        struct ifreq *ifr = ...;
        struct sockaddr_dn *sdn = (struct sockaddr_dn *)&ifr->ifr_addr;
      
      This means that DN_IFREQ_SIZE is too big for what it wants on
      64-bit, as it is
        sizeof(struct ifreq) - sizeof(struct sockaddr) +
        sizeof(struct sockaddr_dn)
      
      This assumes that sizeof(struct sockaddr) is the size of ifr_ifru
      but that isn't true.
      
      Fix this to use offsetof(struct ifreq, ifr_ifru).
      
      This indeed doesn't really matter much - the result is that we
      copy in/out 8 bytes more than we should on 64-bit platforms. In
      case the "struct ifreq_dn" lands just on the end of a page though
      it might lead to faults.
      
      As far as I can tell, it has been like this forever, so it seems
      very likely that nobody cares.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
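The 8-byte discrepancy described in the decnet fix can be reproduced by modelling the struct layouts with Python's `ctypes`. This is a sketch under stated assumptions. The union keeps only the two members needed to show the effect, `IfMap` approximates the 64-bit layout of struct ifmap with fixed-width fields, and `SockAddrDn` is a stand-in whose 26-byte size is illustrative only (its size cancels out in the difference).

```python
import ctypes

IFNAMSIZ = 16


class SockAddr(ctypes.Structure):
    # Generic socket address: 2-byte family plus 14 bytes of data (16 bytes).
    _fields_ = [("sa_family", ctypes.c_uint16), ("sa_data", ctypes.c_char * 14)]


class IfMap(ctypes.Structure):
    # Approximation of struct ifmap on a 64-bit build (pads to 24 bytes).
    _fields_ = [
        ("mem_start", ctypes.c_uint64),
        ("mem_end", ctypes.c_uint64),
        ("base_addr", ctypes.c_uint16),
        ("irq", ctypes.c_uint8),
        ("dma", ctypes.c_uint8),
        ("port", ctypes.c_uint8),
    ]


class IfrIfru(ctypes.Union):
    # Reduced model of ifr_ifru: only the members relevant to its size.
    _fields_ = [("ifru_addr", SockAddr), ("ifru_map", IfMap)]


class IfReq(ctypes.Structure):
    _fields_ = [("ifr_name", ctypes.c_char * IFNAMSIZ), ("ifr_ifru", IfrIfru)]


class SockAddrDn(ctypes.Structure):
    # Stand-in for struct sockaddr_dn; the size is an assumption.
    _fields_ = [("raw", ctypes.c_uint8 * 26)]


# Old definition: assumes ifr_ifru is exactly as large as struct sockaddr.
old_size = ctypes.sizeof(IfReq) - ctypes.sizeof(SockAddr) + ctypes.sizeof(SockAddrDn)

# Fixed definition: start of ifr_ifru plus the decnet address.
new_size = IfReq.ifr_ifru.offset + ctypes.sizeof(SockAddrDn)

print(ctypes.sizeof(IfrIfru), old_size, new_size, old_size - new_size)
```

The difference equals `sizeof(ifr_ifru) - sizeof(struct sockaddr)`, which is why the old macro copied 8 bytes too many on 64-bit platforms.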