1. 27 Mar, 2017 3 commits
    • Liping Zhang's avatar
      netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister · 9c3f3794
      Liping Zhang authored
      If one cpu is doing nf_ct_extend_unregister while another cpu is doing
      __nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
      there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
      NULL, so it's possible that we may access invalid pointer.
      
      But actually, most of the ct extends are built-in, so the problem listed
      above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
      and NF_CT_EXT_SYNPROXY.
      
      For _EXT_NAT, the panic will not happen, since adding the nat extend and
      unregistering the nat extend are located in the same file(nf_nat_core.c),
      this means that after the nat module is removed, we cannot add the nat
      extend too.
      
      For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
      synproxy extend unregister will be done by synproxy_core_exit. So after
      nf_synproxy_core.ko is removed, we may still try to add the synproxy
      extend, then kernel panic may happen.
      
      I know it's very hard to reproduce this issue, but I can play a tricky
      game to make it happen very easily :)
      
      Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
        # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
      Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
              Also note, in the userspace we only add a 20s' delay, then
              reinject the syn packet to the kernel:
        # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
      Step 3. Using "nc 2.2.2.2 1234" to connect the server.
      Step 4. Now remove the nf_synproxy_core.ko quickly:
        # iptables -F FORWARD
        # rmmod ipt_SYNPROXY
        # rmmod nf_synproxy_core
      Step 5. After 20s' delay, the syn packet is reinjected to the kernel.
      
      Now you will see the panic like this:
        kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
        Call Trace:
         ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
         init_conntrack+0x12b/0x600 [nf_conntrack]
         nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
         ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
         nf_reinject+0x104/0x270
         nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
         ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
         ? nla_parse+0xa0/0x100
         nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
         [...]
      
      One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
      introduce nf_conntrack_synproxy.c and only do ct extend register and
      unregister in it, similar to nf_conntrack_timeout.c.
      
      But having such a obscure restriction of nf_ct_extend_unregister is not a
      good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types
      to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then
      it will be easier if we add new ct extend in the future.
      
      Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary
      anymore, remove it too.
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      9c3f3794
    • Liping Zhang's avatar
      netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table · 83d90219
      Liping Zhang authored
      The nf_ct_helper_hash table is protected by nf_ct_helper_mutex, while
      nfct_helper operation is protected by nfnl_lock(NFNL_SUBSYS_CTHELPER).
      So it's possible that one CPU is walking the nf_ct_helper_hash for
      cthelper add/get/del, another cpu is doing nf_conntrack_helpers_unregister
      at the same time. This is dangrous, and may cause use after free error.
      
      Note, delete operation will flush all cthelpers added via nfnetlink, so
      using rcu to do protect is not easy.
      
      Now introduce a dummy list to record all the cthelpers added via
      nfnetlink, then we can walk the dummy list instead of walking the
      nf_ct_helper_hash. Also, keep nfnl_cthelper_dump_table unchanged, it
      may be invoked without nfnl_lock(NFNL_SUBSYS_CTHELPER) held.
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      83d90219
    • Liping Zhang's avatar
      netfilter: invoke synchronize_rcu after set the _hook_ to NULL · 3b7dabf0
      Liping Zhang authored
      Otherwise, another CPU may access the invalid pointer. For example:
          CPU0                CPU1
           -              rcu_read_lock();
           -              pfunc = _hook_;
        _hook_ = NULL;          -
        mod unload              -
           -                 pfunc(); // invalid, panic
           -             rcu_read_unlock();
      
      So we must call synchronize_rcu() to wait the rcu reader to finish.
      
      Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
      by later nf_conntrack_helper_unregister, but I'm inclined to add a
      explicit synchronize_rcu after set the nf_nat_snmp_hook to NULL. Depend
      on such obscure assumptions is not a good idea.
      
      Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object,
      so in cttimeout_exit, invoking rcu_barrier() is not necessary at all,
      remove it too.
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      3b7dabf0
  2. 22 Mar, 2017 2 commits
  3. 21 Mar, 2017 1 commit
    • Liping Zhang's avatar
      netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max · ae5c6821
      Liping Zhang authored
      The helper->expect_class_max must be set to the total number of
      expect_policy minus 1, since we will use the statement "if (class >
      helper->expect_class_max)" to validate the CTA_EXPECT_CLASS attr in
      ctnetlink_alloc_expect.
      
      So for compatibility, set the helper->expect_class_max to the
      NFCTH_POLICY_SET_NUM attr's value minus 1.
      
      Also: it's invalid when the NFCTH_POLICY_SET_NUM attr's value is zero.
      1. this will result "expect_policy = kzalloc(0, GFP_KERNEL);";
      2. we cannot set the helper->expect_class_max to a proper value.
      
      So if nla_get_be32(tb[NFCTH_POLICY_SET_NUM]) is zero, report -EINVAL to
      the userspace.
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      ae5c6821
  4. 17 Mar, 2017 9 commits
  5. 16 Mar, 2017 3 commits
  6. 15 Mar, 2017 13 commits
    • Eric Dumazet's avatar
      net: properly release sk_frag.page · 22a0e18e
      Eric Dumazet authored
      I mistakenly added the code to release sk->sk_frag in
      sk_common_release() instead of sk_destruct()
      
      TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call
      sk_common_release() at close time, thus leaking one (order-3) page.
      
      iSCSI is using such sockets.
      
      Fixes: 5640f768 ("net: use a per task frag allocator")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      22a0e18e
    • Lendacky, Thomas's avatar
      amd-xgbe: Fix jumbo MTU processing on newer hardware · 622c36f1
      Lendacky, Thomas authored
      Newer hardware does not provide a cumulative payload length when multiple
      descriptors are needed to handle the data. Once the MTU increases beyond
      the size that can be handled by a single descriptor, the SKB does not get
      built properly by the driver.
      
      The driver will now calculate the size of the data buffers used by the
      hardware.  The first buffer of the first descriptor is for packet headers
      or packet headers and data when the headers can't be split. Subsequent
      descriptors in a multi-descriptor chain will not use the first buffer. The
      second buffer is used by all the descriptors in the chain for payload data.
      Based on whether the driver is processing the first, intermediate, or last
      descriptor it can calculate the buffer usage and build the SKB properly.
      
      Tested and verified on both old and new hardware.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      622c36f1
    • Florian Fainelli's avatar
      net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled · 5371bbf4
      Florian Fainelli authored
      Suspending the PHY would be putting it in a low power state where it
      may no longer allow us to do Wake-on-LAN.
      
      Fixes: cc013fb4 ("net: bcmgenet: correctly suspend and resume PHY device")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5371bbf4
    • Pablo Neira's avatar
      MAINTAINERS: remove MACVLAN and VLAN entries · 88d339e2
      Pablo Neira authored
      macvlan.c file seems to be both in VLAN and MACVLAN DRIVER, so remove
      the MACVLAN DRIVER since this is redundant.
      
      I propose with this patch to remove the VLAN (802.1Q) entry so this just
      falls into the NETWORKING [GENERAL].
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88d339e2
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · e11607aa
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree, a
      rather large batch of fixes targeted to nf_tables, conntrack and bridge
      netfilter. More specifically, they are:
      
      1) Don't track fragmented packets if the socket option IP_NODEFRAG is set.
         From Florian Westphal.
      
      2) SCTP protocol tracker assumes that ICMP error messages contain the
         checksum field, what results in packet drops. From Ying Xue.
      
      3) Fix inconsistent handling of AH traffic from nf_tables.
      
      4) Fix new bitmap set representation with big endian. Fix mismatches in
         nf_tables due to incorrect big endian handling too. Both patches
         from Liping Zhang.
      
      5) Bridge netfilter doesn't honor maximum fragment size field, cap to
         largest fragment seen. From Florian Westphal.
      
      6) Fake conntrack entry needs to be aligned to 8 bytes since the 3 LSB
         bits are now used to store the ctinfo. From Steven Rostedt.
      
      7) Fix element comments with the bitmap set type. Revert the flush
         field in the nft_set_iter structure, not required anymore after
         fixing up element comments.
      
      8) Missing error on invalid conntrack direction from nft_ct, also from
         Liping Zhang.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e11607aa
    • Or Gerlitz's avatar
      net/openvswitch: Set the ipv6 source tunnel key address attribute correctly · 3d20f1f7
      Or Gerlitz authored
      When dealing with ipv6 source tunnel key address attribute
      (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel
      dst ip, fix that.
      
      Fixes: 6b26ba3a ('openvswitch: netlink attributes for IPv6 tunneling')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reported-by: default avatarPaul Blakey <paulb@mellanox.com>
      Acked-by: default avatarJiri Benc <jbenc@redhat.com>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d20f1f7
    • Taku Izumi's avatar
      fjes: Fix wrong netdevice feature flags · fe8daf5f
      Taku Izumi authored
      This patch fixes netdev->features for Extended Socket network device.
      
      Currently Extended Socket network device's netdev->feature claims
      NETIF_F_HW_CSUM, however this is completely wrong. There's no feature
      of checksum offloading.
      That causes invalid TCP/UDP checksum and packet rejection when IP
      forwarding from Extended Socket network device to other network device.
      
      NETIF_F_HW_CSUM should be omitted.
      Signed-off-by: default avatarTaku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe8daf5f
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 95422dec
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is a rather large set of fixes. The bulk are for lpfc correcting
        a lot of issues in the new NVME driver code which just went in in the
        merge window.
      
        The others are:
      
         - fix a hang in the vmware paravirt driver caused by incorrect
           handling of the new MSI vector allocation
      
         - long standing bug in storvsc, which recent block changes turned
           from being a harmless annoyance into a hang
      
         - yet more fallout (in mpt3sas) from the changes to device blocking
      
        The remainder are small fixes and updates"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (34 commits)
        scsi: lpfc: Add shutdown method for kexec
        scsi: storvsc: Workaround for virtual DVD SCSI version
        scsi: lpfc: revise version number to 11.2.0.10
        scsi: lpfc: code cleanups in NVME initiator discovery
        scsi: lpfc: code cleanups in NVME initiator base
        scsi: lpfc: correct rdp diag portnames
        scsi: lpfc: remove dead sli3 nvme code
        scsi: lpfc: correct double print
        scsi: lpfc: Rename LPFC_MAX_EQ_DELAY to LPFC_MAX_EQ_DELAY_EQID_CNT
        scsi: lpfc: Rework lpfc Kconfig for NVME options
        scsi: lpfc: add transport eh_timed_out reference
        scsi: lpfc: Fix eh_deadline setting for sli3 adapters.
        scsi: lpfc: add NVME exchange aborts
        scsi: lpfc: Fix nvme allocation bug on failed nvme_fc_register_localport
        scsi: lpfc: Fix IO submission if WQ is full
        scsi: lpfc: Fix NVME CMD IU byte swapped word 1 problem
        scsi: lpfc: Fix RCTL value on NVME LS request and response
        scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
        scsi: lpfc: fix missing spin_unlock on sql_list_lock
        scsi: lpfc: don't dereference dma_buf->iocbq before null check
        ...
      95422dec
    • Linus Torvalds's avatar
      Merge tag 'gfs2-4.11-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · aabcf5fc
      Linus Torvalds authored
      Pull gfs2 fix from Bob Peterson:
       "This is an emergency patch for 4.11-rc3
      
        The GFS2 developers uncovered a really nasty problem that can lead to
        random corruption and kernel panic, much like the last one. Andreas
        Gruenbacher wrote a simple one-line patch to fix the problem."
      
      * tag 'gfs2-4.11-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Avoid alignment hole in struct lm_lockname
      aabcf5fc
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · defc7d75
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
      
       - self-test failure of crc32c on powerpc
      
       - regressions of ecb(aes) when used with xts/lrw in s5p-sss
      
       - a number of bugs in the omap RNG driver
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: s5p-sss - Fix spinlock recursion on LRW(AES)
        hwrng: omap - Do not access INTMASK_REG on EIP76
        hwrng: omap - use devm_clk_get() instead of of_clk_get()
        hwrng: omap - write registers after enabling the clock
        crypto: s5p-sss - Fix completing crypto request in IRQ handler
        crypto: powerpc - Fix initialisation of crc32c context
      defc7d75
    • Liping Zhang's avatar
      netfilter: nft_ct: do cleanup work when NFTA_CT_DIRECTION is invalid · 4494dbc6
      Liping Zhang authored
      We should jump to invoke __nft_ct_set_destroy() instead of just
      return error.
      
      Fixes: edee4f1e ("netfilter: nft_ct: add zone id set support")
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      4494dbc6
    • Andreas Gruenbacher's avatar
      gfs2: Avoid alignment hole in struct lm_lockname · 28ea06c4
      Andreas Gruenbacher authored
      Commit 88ffbf3e switches to using rhashtables for glocks, hashing over
      the entire struct lm_lockname instead of its individual fields.  On some
      architectures, struct lm_lockname contains a hole of uninitialized
      memory due to alignment rules, which now leads to incorrect hash values.
      Get rid of that hole.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      CC: <stable@vger.kernel.org> #v4.3+
      28ea06c4
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ae50dfd6
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Ensure that mtu is at least IPV6_MIN_MTU in ipv6 VTI tunnel driver,
          from Steffen Klassert.
      
       2) Fix crashes when user tries to get_next_key on an LPM bpf map, from
          Alexei Starovoitov.
      
       3) Fix detection of VLAN fitlering feature for bnx2x VF devices, from
          Michal Schmidt.
      
       4) We can get a divide by zero when TCP socket are morphed into
          listening state, fix from Eric Dumazet.
      
       5) Fix socket refcounting bugs in skb_complete_wifi_ack() and
          skb_complete_tx_timestamp(). From Eric Dumazet.
      
       6) Use after free in dccp_feat_activate_values(), also from Eric
          Dumazet.
      
       7) Like bonding team needs to use ETH_MAX_MTU as netdev->max_mtu, from
          Jarod Wilson.
      
       8) Fix use after free in vrf_xmit(), from David Ahern.
      
       9) Don't do UDP Fragmentation Offload on IPComp ipsec packets, from
          Alexey Kodanev.
      
      10) Properly check napi_complete_done() return value in order to decide
          whether to re-enable IRQs or not in amd-xgbe driver, from Thomas
          Lendacky.
      
      11) Fix double free of hwmon device in marvell phy driver, from Andrew
          Lunn.
      
      12) Don't crash on malformed netlink attributes in act_connmark, from
          Etienne Noss.
      
      13) Don't remove routes with a higher metric in ipv6 ECMP route replace,
          from Sabrina Dubroca.
      
      14) Don't write into a cloned SKB in ipv6 fragmentation handling, from
          Florian Westphal.
      
      15) Fix routing redirect races in dccp and tcp, basically the ICMP
          handler can't modify the socket's cached route in it's locked by the
          user at this moment. From Jon Maxwell.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (108 commits)
        qed: Enable iSCSI Out-of-Order
        qed: Correct out-of-bound access in OOO history
        qed: Fix interrupt flags on Rx LL2
        qed: Free previous connections when releasing iSCSI
        qed: Fix mapping leak on LL2 rx flow
        qed: Prevent creation of too-big u32-chains
        qed: Align CIDs according to DORQ requirement
        mlxsw: reg: Fix SPVMLR max record count
        mlxsw: reg: Fix SPVM max record count
        net: Resend IGMP memberships upon peer notification.
        dccp: fix memory leak during tear-down of unsuccessful connection request
        tun: fix premature POLLOUT notification on tun devices
        dccp/tcp: fix routing redirect race
        ucc/hdlc: fix two little issue
        vxlan: fix ovs support
        net: use net->count to check whether a netns is alive or not
        bridge: drop netfilter fake rtable unconditionally
        ipv6: avoid write to a possibly cloned skb
        net: wimax/i2400m: fix NULL-deref at probe
        isdn/gigaset: fix NULL-deref at probe
        ...
      ae50dfd6
  7. 14 Mar, 2017 9 commits