1. 25 Apr, 2017 2 commits
    • netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded · 277a2928
      Liping Zhang authored
      Currently, after adding the following nft rules:
        # nft add set x target1 { type ipv4_addr \; flags timeout \;}
        # nft add rule x y set add ip daddr timeout 1d @target1 counter
      
      the counters will always be zero, regardless of whether elements are
      added to the dynamic set "target1", because we break the nft expr
      traversal unconditionally:
        # nft list ruleset
        ...
        set target1 {
            ...
            elements = { 8.8.8.8 expires 23h59m53s}
        }
        chain output {
            ...
            set add ip daddr timeout 1d @target1 counter packets 0 bytes 0
                                                                 ^       ^
            ...
        }
      
      Since we add the elements to the set successfully, we should continue
      to the next expression.
      
      Additionally, if elements are added to a "flow table" successfully, we
      will _always_ continue to the next expr, even if the operation is
      _OP_ADD. So it is better to keep the two cases consistent.
      
      Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
      Reported-by: Robert White <rwhite@pobox.com>
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
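
As a rough illustration of the behaviour change in the nft_dynset commit, the sketch below models rule evaluation in plain userspace C. All names (`eval_rule`, `dynset_add_old`, `dynset_add_new`, `counter_eval`, the `verdict` enum) are hypothetical and are not the kernel's nf_tables API. It only shows why an unconditional break keeps a trailing counter at zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified verdicts: keep walking the rule or stop. */
enum verdict { VERDICT_CONTINUE, VERDICT_BREAK };

struct rule_state {
    int set_add_ok;        /* 1 if adding the element to the set succeeds */
    unsigned long packets; /* what a trailing "counter" expr would record */
};

typedef enum verdict (*expr_fn)(struct rule_state *);

/* Old behaviour: break out of the rule even when the add succeeded. */
static enum verdict dynset_add_old(struct rule_state *st)
{
    (void)st;
    return VERDICT_BREAK;
}

/* New behaviour: only break when the add failed. */
static enum verdict dynset_add_new(struct rule_state *st)
{
    return st->set_add_ok ? VERDICT_CONTINUE : VERDICT_BREAK;
}

/* Stand-in for the "counter" expression that follows the set update. */
static enum verdict counter_eval(struct rule_state *st)
{
    st->packets++;
    return VERDICT_CONTINUE;
}

/* Walk the expressions in order, stopping at the first VERDICT_BREAK. */
static void eval_rule(const expr_fn *exprs, size_t n, struct rule_state *st)
{
    assert(exprs != NULL && st != NULL);
    for (size_t i = 0; i < n; i++)
        if (exprs[i](st) == VERDICT_BREAK)
            break;
}
```

With `set_add_ok = 1`, evaluating `{ dynset_add_old, counter_eval }` leaves `packets` at 0, while `{ dynset_add_new, counter_eval }` lets the counter run.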
    • bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port · cf3cb246
      Linus Lüssing authored
      Redirecting bridged frames to the bridge device itself or to a bridge
      port (brouting) via the dnat target currently fails:
      
      The ethernet destination of the frame is dnat'ed to the MAC address of
      the bridge device or port just fine. However, the IP code drops it in
      the beginning of ip_input.c/ip_rcv() as the dnat target left
      the skb->pkt_type as PACKET_OTHERHOST.
      
      Fix this by resetting skb->pkt_type to an appropriate type after
      dnat'ing.
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
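
One way to picture "resetting skb->pkt_type to an appropriate type" is to classify the frame again from its new destination MAC. The following is a minimal userspace sketch under that assumption. The `pkt_type` enum and `pkt_type_after_dnat` are hypothetical stand-ins and do not reproduce the actual ebtables patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's PACKET_* classifications. */
enum pkt_type { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/* A MAC address is a group (multicast) address if the low bit of byte 0 is set. */
static int mac_is_multicast(const uint8_t mac[6])
{
    return mac[0] & 0x01;
}

static int mac_is_broadcast(const uint8_t mac[6])
{
    static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
    return memcmp(mac, bcast, 6) == 0;
}

/* Choose a packet type that matches the rewritten destination MAC.
 * dev_mac is the address of the device that should receive the frame. */
static enum pkt_type pkt_type_after_dnat(const uint8_t new_dst[6],
                                         const uint8_t dev_mac[6])
{
    assert(new_dst != NULL && dev_mac != NULL);
    if (mac_is_broadcast(new_dst))
        return PKT_BROADCAST;
    if (mac_is_multicast(new_dst))
        return PKT_MULTICAST;
    if (memcmp(new_dst, dev_mac, 6) == 0)
        return PKT_HOST;
    return PKT_OTHERHOST;
}
```

A frame whose destination was rewritten to the device's own address is classified as `PKT_HOST`, so an IP receive path that drops other-host frames would accept it.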
  2. 24 Apr, 2017 9 commits
    • netfilter: xt_socket: Fix broken IPv6 handling · 6bd3d192
      Peter Tirsek authored
      Commit 834184b1 ("netfilter: defrag: only register defrag
      functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro
      which was removed earlier in commit 8db4c5be ("netfilter: move
      socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro
      never being defined, the xt_socket match emits an "Unknown family 10"
      warning when used with IPv6:
      
      WARNING: CPU: 0 PID: 1377 at net/netfilter/xt_socket.c:160 socket_mt_enable_defrag+0x47/0x50 [xt_socket]
      Unknown family 10
      Modules linked in: xt_socket nf_socket_ipv4 nf_socket_ipv6 nf_defrag_ipv4 [...]
      CPU: 0 PID: 1377 Comm: ip6tables-resto Not tainted 4.10.10 #1
      Hardware name: [...]
      Call Trace:
      ? __warn+0xe7/0x100
      ? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
      ? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
      ? warn_slowpath_fmt+0x39/0x40
      ? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
      ? socket_mt_v2_check+0x12/0x40 [xt_socket]
      ? xt_check_match+0x6b/0x1a0 [x_tables]
      ? xt_find_match+0x93/0xd0 [x_tables]
      ? xt_request_find_match+0x20/0x80 [x_tables]
      ? translate_table+0x48e/0x870 [ip6_tables]
      ? translate_table+0x577/0x870 [ip6_tables]
      ? walk_component+0x3a/0x200
      ? kmalloc_order+0x1d/0x50
      ? do_ip6t_set_ctl+0x181/0x490 [ip6_tables]
      ? filename_lookup+0xa5/0x120
      ? nf_setsockopt+0x3a/0x60
      ? ipv6_setsockopt+0xb0/0xc0
      ? sock_common_setsockopt+0x23/0x30
      ? SyS_socketcall+0x41d/0x630
      ? vfs_read+0xfa/0x120
      ? do_fast_syscall_32+0x7a/0x110
      ? entry_SYSENTER_32+0x47/0x71
      
      This patch brings the conditional back in line with how the rest of the
      file handles IPv6.
      
      Fixes: 834184b1 ("netfilter: defrag: only register defrag functionality if needed")
      Signed-off-by: Peter Tirsek <peter@tirsek.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj · 64f3967c
      Liping Zhang authored
      We should acquire the ct->lock before accessing or modifying the
      nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same
      time during its packet processing.
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: make it safer when updating ct->status · 53b56da8
      Liping Zhang authored
      After converting to use rcu for conntrack hash, one CPU may update
      the ct->status via ctnetlink, while another CPU may process the
      packets and update the ct->status.
      
      So the non-atomic operation "ct->status |= status;" via ctnetlink
      becomes unsafe, and this may clear the IPS_DYING_BIT bit set by
      another CPU unexpectedly. For example:
               CPU0                            CPU1
        ctnetlink_change_status        __nf_conntrack_find_get
            old = ct->status              nf_ct_gc_expired
                -                         nf_ct_kill
                -                      test_and_set_bit(IPS_DYING_BIT
            new = old | status;                 -
        ct->status = new; <-- oops, _DYING_ is cleared!
      
      Use a series of atomic bit operations to solve the above issue.

      Also note that the user shouldn't set IPS_TEMPLATE or IPS_SEQ_ADJUST
      directly, so make these two bits unchangeable too.

      If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free,
      but it was actually allocated by nf_conntrack_alloc.
      If we set the IPS_SEQ_ADJUST_BIT, this may cause a NULL pointer
      dereference, as nfct_seqadj(ct) may be NULL.

      Last, add some comments to describe the logic change due to
      commit a963d710 ("netfilter: ctnetlink: Fix regression in CTA_STATUS
      processing"), which I found a little confusing.
      
      Fixes: 76507f69 ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
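
The lost-update interleaving in the ct->status commit can be replayed deterministically in userspace. The sketch below is only an analogy using C11 `<stdatomic.h>`. The kernel uses its own bit operations, and the bit positions and helper names here (`racy_load`, `racy_store`, `atomic_set_bits`) are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical bit positions, chosen only for this example. */
#define DYING_BIT   (1UL << 0)
#define ASSURED_BIT (1UL << 1)

/* The non-atomic "status |= bits" split into its two halves, so a caller
 * can place another writer between the load and the store. */
static unsigned long racy_load(const unsigned long *status)
{
    return *status;
}

static void racy_store(unsigned long *status, unsigned long old,
                       unsigned long bits)
{
    *status = old | bits; /* overwrites anything set since racy_load() */
}

/* Atomic read-modify-write: no other update can slip in between. */
static void atomic_set_bits(atomic_ulong *status, unsigned long bits)
{
    assert(status != NULL);
    atomic_fetch_or(status, bits);
}
```

Replaying the CPU0/CPU1 sequence with `racy_load` and `racy_store` drops the bit set in between, while the same sequence with `atomic_set_bits` preserves it.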
    • netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice · 88be4c09
      Liping Zhang authored
      Currently, ctnetlink_change_conntrack is always protected by _expect_lock,
      but this will cause a deadlock when deleting the helper from a conntrack,
      as the _expect_lock will be acquired again by nf_ct_remove_expectations:
      
               CPU0
              ----
        lock(nf_conntrack_expect_lock);
        lock(nf_conntrack_expect_lock);
      
        *** DEADLOCK ***
        May be due to missing lock nesting notation
      
        2 locks held by lt-conntrack_gr/12853:
        #0:  (&table[i].mutex){+.+.+.}, at: [<ffffffffa05e2009>]
             nfnetlink_rcv_msg+0x399/0x6a9 [nfnetlink]
        #1:  (nf_conntrack_expect_lock){+.....}, at: [<ffffffffa05f2c1f>]
             ctnetlink_new_conntrack+0x17f/0x408 [nf_conntrack_netlink]
      
        Call Trace:
         dump_stack+0x85/0xc2
         __lock_acquire+0x1608/0x1680
         ? ctnetlink_parse_tuple_proto+0x10f/0x1c0 [nf_conntrack_netlink]
         lock_acquire+0x100/0x1f0
         ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
         _raw_spin_lock_bh+0x3f/0x50
         ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
         nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
         ctnetlink_change_helper+0xc6/0x190 [nf_conntrack_netlink]
         ctnetlink_new_conntrack+0x1b2/0x408 [nf_conntrack_netlink]
         nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink]
         ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink]
         ? nfnetlink_bind+0x1a0/0x1a0 [nfnetlink]
         netlink_rcv_skb+0xa4/0xc0
         nfnetlink_rcv+0x87/0x770 [nfnetlink]
      
      Since the operations are unrelated to nf_ct_expect, we can drop the
      _expect_lock. Also note that, after removing the _expect_lock protection,
      another CPU may invoke nf_conntrack_helper_unregister, so we should
      use rcu_read_lock to protect __nf_conntrack_helper_find invoked by
      ctnetlink_change_helper.
      
      Fixes: ca7433df ("netfilter: conntrack: seperate expect locking from nf_conntrack_lock")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: drop the incorrect cthelper module request · 14e56761
      Liping Zhang authored
      First, when creating a new ct, we will invoke request_module to try to
      load the related in-kernel cthelper. So there's no need to call
      request_module again when updating the ct helpinfo.
      
      Second, ctnetlink_change_helper may be called with rcu_read_lock held,
      i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse ->
      ctnetlink_glue_parse -> ctnetlink_glue_parse_ct ->
      ctnetlink_change_helper. But the request_module invocation may sleep,
      so we can't call it with the rcu_read_lock held.
      
      Remove it now.
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_bitmap: free dummy elements when destroy the set · 54a5f9d9
      Liping Zhang authored
      We forget to free dummy elements when deleting the set. So when I was
      running nft-test.py, I saw many kmemleak warnings:
        kmemleak: 1344 new suspected memory leaks ...
        # cat /sys/kernel/debug/kmemleak
        unreferenced object 0xffff8800631345c8 (size 32):
        comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s)
        hex dump (first 32 bytes):
          f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff  .c.c.....y.c....
          04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00  ................
        backtrace:
          [<ffffffff819059da>] kmemleak_alloc+0x4a/0xa0
          [<ffffffff81288174>] __kmalloc+0x164/0x310
          [<ffffffffa061269d>] nft_set_elem_init+0x3d/0x1b0 [nf_tables]
          [<ffffffffa06130da>] nft_add_set_elem+0x45a/0x8c0 [nf_tables]
          [<ffffffffa0613645>] nf_tables_newsetelem+0x105/0x1d0 [nf_tables]
          [<ffffffffa05fe6d4>] nfnetlink_rcv+0x414/0x770 [nfnetlink]
          [<ffffffff817f0ca6>] netlink_unicast+0x1f6/0x310
          [<ffffffff817f10c6>] netlink_sendmsg+0x306/0x3b0
        ...
      
      Fixes: e920dde5 ("netfilter: nft_set_bitmap: keep a list of dummy elements")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink · 66e5a6b1
      Liping Zhang authored
      cthelpers added via nfnetlink may have the same tuple, i.e. except for
      the l3proto and l4proto, all other fields are zero. So even with
      different names, adding them fails:
        # nfct helper add ssdp inet udp
        # nfct helper add tftp inet udp
        nfct v1.4.3: netlink error: File exists
      
      So in order to avoid unpredictable behaviour, we should:
      1. cthelpers can be selected by nft ct helper obj or xt_CT target, so
      report an error if a duplicated { name, l3proto, l4proto } tuple exists.
      2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when
      nf_ct_auto_assign_helper is enabled, so also report an error if a
      duplicated { l3proto, l4proto, src-port } tuple exists.

      Also note that if the cthelper is added from userspace, the src-port will
      always be zero, which is invalid for nf_ct_auto_assign_helper, so there is
      no need to check the second point listed above.
      
      Fixes: 893e093c ("netfilter: nf_ct_helper: bail out on duplicated helpers")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
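
The two duplicate checks listed in the nf_ct_helper commit can be sketched as a single predicate. This is a simplified userspace model. The `helper` struct and `helper_conflicts` are hypothetical, and the real code compares full tuples with masks rather than a bare source port.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of a conntrack helper registration. */
struct helper {
    const char *name;
    uint8_t l3proto;
    uint8_t l4proto;
    uint16_t src_port; /* always 0 for helpers added from userspace */
};

/* Returns 1 if registering `cand` alongside `cur` should be rejected. */
static int helper_conflicts(const struct helper *cur,
                            const struct helper *cand,
                            int cand_from_userspace)
{
    assert(cur != NULL && cand != NULL);
    if (cur->l3proto != cand->l3proto || cur->l4proto != cand->l4proto)
        return 0;
    /* Point 1: duplicated { name, l3proto, l4proto }. */
    if (strcmp(cur->name, cand->name) == 0)
        return 1;
    /* Point 2: duplicated { l3proto, l4proto, src-port }, skipped for
     * userspace helpers because their src-port is always zero. */
    if (!cand_from_userspace && cur->src_port == cand->src_port)
        return 1;
    return 0;
}
```

Under this model, the `ssdp` and `tftp` userspace helpers from the commit message no longer conflict, while a second helper with the same name and protocols still does.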
    • openvswitch: Delete conntrack entry clashing with an expectation. · cf5d7091
      Jarno Rajahalme authored
      Conntrack helpers do not check for a potentially clashing conntrack
      entry when creating a new expectation.  Also, nf_conntrack_in() will
      check expectations (via init_conntrack()) only if a conntrack entry
      can not be found.  The expectation for a packet which also matches an
      existing conntrack entry will not be removed by conntrack, and is
      currently handled inconsistently by OVS, as OVS expects the
      expectation to be removed when the connection tracking entry matching
      that expectation is confirmed.
      
      It should be noted that normally an IP stack would not allow reuse of
      a 5-tuple of an old (possibly lingering) connection for a new data
      connection, so this is a somewhat unlikely corner case.  However, it is
      possible that a misbehaving source could cause conntrack entries to be
      created that could then interfere with new related connections.
      
      Fix this in the OVS module by deleting the clashing conntrack entry
      after an expectation has been matched.  This causes the following
      nf_conntrack_in() call to also find the expectation and remove it when
      creating the new conntrack entry, and causes the forthcoming reply
      direction packets to match the new related connection instead of the
      old clashing conntrack entry.
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Reported-by: Yang Song <yangsong@vmware.com>
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_CT: fix refcnt leak on error path · 470acf55
      Gao Feng authored
      There are two cases which cause a refcnt leak.

      1. When nf_ct_timeout_ext_add fails in xt_ct_set_timeout, it should
      release the timeout refcnt.
      Now go to the err_put_timeout error handler instead of going ahead.

      2. When the timeout policy is not found, we should call module_put.
      Otherwise, the related cthelper module can no longer be removed.
      It is easy to reproduce by typing the following command:
        # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
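
The two leak cases in the xt_CT commit follow the usual goto-unwinding pattern, where each failure point releases exactly what was taken before it. The sketch below is a hypothetical model with plain counters (`refs`, `ct_setup`), not the xt_CT code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference counters standing in for the module and
 * timeout-policy references mentioned in the commit message. */
struct refs {
    int module_refs;
    int timeout_refs;
};

/* Returns 0 on success, -1 on failure. On failure, every reference
 * taken inside this function has been dropped again. */
static int ct_setup(struct refs *r, int policy_found, int ext_add_ok)
{
    assert(r != NULL);
    r->module_refs++;          /* helper module pinned */

    if (!policy_found)
        goto err_put_module;   /* case 2: policy lookup failed */
    r->timeout_refs++;         /* reference on the timeout policy */

    if (!ext_add_ok)
        goto err_put_timeout;  /* case 1: adding the extension failed */
    return 0;

err_put_timeout:
    r->timeout_refs--;
err_put_module:
    r->module_refs--;
    return -1;
}
```

Ordering the labels in reverse order of acquisition means a later failure falls through the earlier cleanup steps, so neither counter is left elevated on any error path.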
  3. 21 Apr, 2017 24 commits
    • Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux · 94836ecf
      Linus Torvalds authored
      Pull nfsd bugfix from Bruce Fields:
       "Fix a 4.11 regression that triggers a BUG() on an attempt to use an
        unsupported NFSv4 compound op"

      * tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux:
        nfsd: fix oops on unsupported operation
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 057a650b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Don't race in IPSEC dumps, from Yuejie Shi.
      
       2) Verify lengths properly in IPSEC requests, from Herbert Xu.
      
       3) Fix out of bounds access in ipv6 segment routing code, from David
          Lebrun.
      
       4) Don't write into the header of cloned SKBs in smsc95xx driver, from
          James Hughes.
      
       5) Several other drivers have this bug too, fix them. From Eric
          Dumazet.
      
       6) Fix access to uninitialized data in TC action cookie code, from
          Wolfgang Bumiller.
      
       7) Fix double free in IPV6 segment routing, again from David Lebrun.
      
       8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
      
       9) Fix use after free in qrtr code, from Dan Carpenter.
      
      10) Don't double-destroy devices in ip6mr code, from Nikolay
          Aleksandrov.
      
      11) Don't pass out-of-range TX queue indices into drivers, from Tushar
          Dave.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
        netpoll: Check for skb->queue_mapping
        ip6mr: fix notification device destruction
        bpf, doc: update bpf maintainers entry
        net: qrtr: potential use after free in qrtr_sendmsg()
        bpf: Fix values type used in test_maps
        net: ipv6: RTF_PCPU should not be settable from userspace
        gso: Validate assumption of frag_list segementation
        kaweth: use skb_cow_head() to deal with cloned skbs
        ch9200: use skb_cow_head() to deal with cloned skbs
        lan78xx: use skb_cow_head() to deal with cloned skbs
        sr9700: use skb_cow_head() to deal with cloned skbs
        cx82310_eth: use skb_cow_head() to deal with cloned skbs
        smsc75xx: use skb_cow_head() to deal with cloned skbs
        ipv6: sr: fix double free of skb after handling invalid SRH
        MAINTAINERS: Add "B:" field for networking.
        net sched actions: allocate act cookie early
        qed: Fix issue in populating the PFC config paramters.
        qed: Fix possible system hang in the dcbnl-getdcbx() path.
        qed: Fix sending an invalid PFC error mask to MFW.
        qed: Fix possible error in populating max_tc field.
        ...
    • netpoll: Check for skb->queue_mapping · c70b17b7
      Tushar Dave authored
      Reducing real_num_tx_queues needs to be in sync with skb queue_mapping,
      otherwise skbs with queue_mapping greater than real_num_tx_queues
      can be sent to the underlying driver and can result in a kernel panic.

      One such event is running netconsole and enabling VF on the same
      device, or running netconsole and changing the number of tx queues via
      ethtool on the same device.
      
      e.g.
      Unable to handle kernel NULL pointer dereference
      tsk->{mm,active_mm}->context = 0000000000001525
      tsk->{mm,active_mm}->pgd = fff800130ff9a000
                    \|/ ____ \|/
                    "@'/ .. \`@"
                    /_| \__/ |_\
                       \__U_/
      kworker/48:1(475): Oops [#1]
      CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G           OE
      4.11.0-rc3-davem-net+ #7
      Workqueue: events queue_process
      task: fff80013113299c0 task.stack: fff800131132c000
      TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
      00000000    Tainted: G           OE
      TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
      g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
      0000000000000001
      g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
      00000000000000c0
      o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
      0000000000000003
      o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
      000000000049ed94
      RPC: <set_next_entity+0x34/0xb80>
      l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
      0000000000000000
      l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
      fff8001fa7605028
      i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
      0000000000000000
      i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
      00000000103fa4b0
      I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
      Call Trace:
       [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
       [0000000000998c74] netpoll_start_xmit+0xf4/0x200
       [0000000000998e10] queue_process+0x90/0x160
       [0000000000485fa8] process_one_work+0x188/0x480
       [0000000000486410] worker_thread+0x170/0x4c0
       [000000000048c6b8] kthread+0xd8/0x120
       [0000000000406064] ret_from_fork+0x1c/0x2c
       [0000000000000000]           (null)
      Disabling lock debugging due to kernel taint
      Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
      Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
      Caller[0000000000998e10]: queue_process+0x90/0x160
      Caller[0000000000485fa8]: process_one_work+0x188/0x480
      Caller[0000000000486410]: worker_thread+0x170/0x4c0
      Caller[000000000048c6b8]: kthread+0xd8/0x120
      Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
      Caller[0000000000000000]:           (null)
      Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
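
A stale queue index can be brought back into range before the frame is handed to the driver. The sketch below shows one possible way to do that, by wrapping the index with a modulo. The commit message does not spell out the exact check, so this is an assumption, and `sanitize_queue_index` is a hypothetical helper name.

```c
#include <assert.h>

/* Map a possibly stale queue index into [0, real_num_tx_queues).
 * Indices that are already valid are returned unchanged. */
static unsigned int sanitize_queue_index(unsigned int q_index,
                                         unsigned int real_num_tx_queues)
{
    assert(real_num_tx_queues > 0);
    if (q_index >= real_num_tx_queues)
        q_index %= real_num_tx_queues;
    return q_index;
}
```

For example, a queued skb that was mapped to queue 5 before the device was reduced to 4 tx queues would be remapped to queue 1 instead of indexing past the driver's ring array.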
    • ip6mr: fix notification device destruction · 723b929c
      Nikolay Aleksandrov authored
      Andrey Konovalov reported a BUG in the ip6mr code, which is triggered
      because we call unregister_netdevice_many for a device that is already
      being destroyed. In IPv4's ipmr that was resolved by two commits a
      long time ago by introducing the "notify" parameter to the delete
      function and avoiding the unregister when called from a notifier, so
      let's do the same for ip6mr.
      
      The trace from Andrey:
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:6813!
      invalid opcode: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Workqueue: netns cleanup_net
      task: ffff880069208000 task.stack: ffff8800692d8000
      RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
      RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
      RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
      RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
      R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
      R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
      FS:  0000000000000000(0000) GS:ffff88006cb00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
      Call Trace:
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
       ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
       notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
       call_netdevice_notifiers net/core/dev.c:1663
       rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many net/core/dev.c:7880
       default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
       ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
       cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
       process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
       worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
      Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
      47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
      0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
      RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
      ---[ end trace e0b29c57e9b3292c ]---
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, doc: update bpf maintainers entry · cdb90499
      Daniel Borkmann authored
      Add various related files that have been missing under
      BPF entry covering essential parts of its infrastructure
      and also add myself as co-maintainer.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: potential use after free in qrtr_sendmsg() · 6f60f438
      Dan Carpenter authored
      If skb_pad() fails then it frees the skb so we should check for errors.

      Fixes: bdabad3e ("net: Add Qualcomm IPC router")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix values type used in test_maps · 89087c45
      David Miller authored
      Maps of per-cpu type have their value element size adjusted to 8 if it
      is specified smaller during various map operations.
      
      This makes test_maps fail as a 32-bit binary; in fact the kernel
      writes past the end of the value's array on the user's stack.
      
      To be quite honest, I think the kernel should reject creation of a
      per-cpu map that doesn't have a value size of at least 8 if that's
      what the kernel is going to silently adjust to later.
      
      If the user passed something smaller, it is a sizeof() calculation
      based upon the type they will actually use (just like in this testcase
      code) in later calls to the map operations.
      
      Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
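
The size mismatch in the test_maps commit is easy to see with a little arithmetic. The sketch below assumes each CPU's slot is padded to a multiple of 8 bytes, as the commit message describes. `round_up_8` and `percpu_lookup_bytes` are hypothetical helper names, not kernel or libbpf functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round a size up to the next multiple of 8. */
static size_t round_up_8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Bytes written to the caller's buffer by a per-cpu map lookup, assuming
 * one padded slot per CPU. */
static size_t percpu_lookup_bytes(size_t value_size, unsigned int nr_cpus)
{
    assert(nr_cpus > 0);
    return round_up_8(value_size) * nr_cpus;
}
```

With a 4-byte value type and 4 CPUs, a user buffer declared as an array of four such values holds 16 bytes, but the lookup writes 32, which is the overrun on the user's stack described in the commit message.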
    • net: ipv6: RTF_PCPU should not be settable from userspace · 557c44be
      David Ahern authored
      Andrey reported a fault in the IPv6 route code:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff880069809600 task.stack: ffff880062dc8000
      RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
      RSP: 0018:ffff880062dced30 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
      RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
      RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
      FS:  00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
      Call Trace:
       ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
       ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
      ...
      
      Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
      set. Flags passed to the kernel are blindly copied to the allocated
      rt6_info by ip6_route_info_create making a newly inserted route appear
      as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
      and expects rt->dst.from to be set - which it is not since it is not
      really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
      generates the fault.
      
      Fix by checking for the flag and failing with EINVAL.
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
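
Rejecting a kernel-internal flag at the userspace boundary amounts to a single mask test before any state is created. The following is a minimal sketch of that idea. `RTF_PCPU_SKETCH`, `route_cfg` and `route_info_create` are illustrative stand-ins, and the flag value is an assumption made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel-internal per-cpu route flag (value assumed). */
#define RTF_PCPU_SKETCH 0x40000000u

/* Hypothetical, reduced route configuration as received from userspace. */
struct route_cfg {
    unsigned int flags;
};

/* Validate the configuration before copying any flags into a new route.
 * Returns 0 on success or -EINVAL if an internal-only flag is present. */
static int route_info_create(const struct route_cfg *cfg)
{
    assert(cfg != NULL);
    if (cfg->flags & RTF_PCPU_SKETCH)
        return -EINVAL;
    return 0;
}
```

Failing early keeps the flag from ever being copied into the new route, so later code that trusts the flag never sees a route that merely claims to be a per-cpu copy.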
    • gso: Validate assumption of frag_list segementation · 43170c4e
      Ilan Tayari authored
      Commit 07b26c94 ("gso: Support partial splitting at the frag_list
      pointer") assumes that all SKBs in a frag_list (except maybe the last
      one) contain the same amount of GSO payload.
      
      This assumption is not always correct, resulting in the following
      warning message in the log:
          skb_segment: too many frags
      
      For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
      one frag, and some with 2 frags.
      After GRO, the frag_list SKBs end up having different amounts of payload.
      If this frag_list SKB is then forwarded, the aforementioned assumption
      is violated.
      
      Validate the assumption, and fall back to software GSO if it is not true.
      
      Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
      Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
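
The assumption being validated in the gso commit is that every buffer in the list, except possibly the last, carries the same amount of payload. It can be expressed as a short loop over payload lengths. This is a simplified model over a plain array. `frag_list_is_uniform` is a hypothetical name, and the real check operates on skb structures.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when every entry except possibly the last has the same
 * payload length as the first entry, 0 otherwise. */
static int frag_list_is_uniform(const unsigned int *lens, size_t n)
{
    assert(lens != NULL || n == 0);
    for (size_t i = 1; i + 1 < n; i++)
        if (lens[i] != lens[0])
            return 0;
    return 1;
}
```

A caller would take the partial-splitting fast path only when this returns 1 and fall back to ordinary software segmentation otherwise.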
    • Merge branch 'skb_cow_head' · 918b7024
      David S. Miller authored
      Eric Dumazet says:

      ====================
      net: use skb_cow_head() to deal with cloned skbs

      James Hughes found an issue with smsc95xx driver. Same problematic code
      is found in other drivers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kaweth: use skb_cow_head() to deal with cloned skbs · 39fba783
      Eric Dumazet authored
      We can use skb_cow_head() to properly deal with clones,
      especially the ones coming from the TCP stack that allow their head to
      be modified. This avoids a copy.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      ch9200: use skb_cow_head() to deal with cloned skbs · 6bc6895b
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: 4a476bd6 ("usbnet: New driver for QinHeng CH9200 devices")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6bc6895b
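      The two conditions named in the skb_cow_head() commits (enough
      headroom, and permission to modify a possibly shared header) can be
      illustrated with a small userspace model. This is only a sketch under
      stated assumptions: struct toy_skb and every toy_* function are
      hypothetical stand-ins, not kernel APIs, and unlike the real helper
      the model copies the whole payload when it has to make the buffer
      private.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an skb: payload stored after some reserved headroom.
 * This is NOT struct sk_buff; every name here is hypothetical. */
struct toy_skb {
    unsigned char *buf;  /* start of the allocation */
    size_t headroom;     /* free bytes in front of the payload */
    size_t len;          /* payload length */
    int *users;          /* number of toy_skbs sharing buf */
};

static int toy_alloc(struct toy_skb *s, size_t headroom, const char *payload)
{
    s->len = strlen(payload);
    s->headroom = headroom;
    s->buf = malloc(headroom + s->len);
    s->users = malloc(sizeof(*s->users));
    if (!s->buf || !s->users) {
        free(s->buf);
        free(s->users);
        return -1;
    }
    *s->users = 1;
    memcpy(s->buf + headroom, payload, s->len);
    return 0;
}

/* A clone shares the same buffer and bumps the share count. */
static struct toy_skb toy_clone(struct toy_skb *s)
{
    (*s->users)++;
    return *s;
}

/* Drop one reference; free the buffer when the last one goes away. */
static void toy_put(struct toy_skb *s)
{
    if (--(*s->users) == 0) {
        free(s->buf);
        free(s->users);
    }
}

/* Model of the idea behind skb_cow_head(): afterwards the caller has at
 * least `need` bytes of headroom AND a buffer nobody else shares.
 * Returns 0 on success, -1 on allocation failure (buffer left untouched). */
static int toy_cow_head(struct toy_skb *s, size_t need)
{
    if (s->headroom >= need && *s->users == 1)
        return 0; /* already private and roomy: nothing to do */

    size_t room = s->headroom > need ? s->headroom : need;
    unsigned char *nbuf = malloc(room + s->len);
    int *nusers = malloc(sizeof(*nusers));
    if (!nbuf || !nusers) {
        free(nbuf);
        free(nusers);
        return -1;
    }
    memcpy(nbuf + room, s->buf + s->headroom, s->len);
    *nusers = 1;
    toy_put(s); /* release our reference to the old, possibly shared buffer */
    s->buf = nbuf;
    s->users = nusers;
    s->headroom = room;
    return 0;
}

/* Claim n bytes of headroom for a new header, like a push operation. */
static unsigned char *toy_push(struct toy_skb *s, size_t n)
{
    s->headroom -= n;
    s->len += n;
    return s->buf + s->headroom;
}

/* Returns 0 if pushing a header onto a clone left the original intact. */
static int toy_demo(void)
{
    struct toy_skb a, b;

    if (toy_alloc(&a, 8, "data"))
        return -1;
    b = toy_clone(&a); /* a and b now share one buffer */
    if (toy_cow_head(&b, 4)) {
        toy_put(&b);
        toy_put(&a);
        return -1;
    }
    memcpy(toy_push(&b, 4), "HDR:", 4);

    int ok = b.buf != a.buf &&
             a.len == 4 && memcmp(a.buf + a.headroom, "data", 4) == 0 &&
             b.len == 8 && memcmp(b.buf + b.headroom, "HDR:data", 8) == 0;
    toy_put(&b);
    toy_put(&a);
    return ok ? 0 : -1;
}
```

      In the model, the clone has enough headroom from the start, yet
      toy_cow_head() still gives it a private buffer because the original
      is shared. Checking headroom alone would have let the pushed header
      overwrite memory that the other owner still uses.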
    • lan78xx: use skb_cow_head() to deal with cloned skbs · d4ca7359
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Cc: Woojung Huh <woojung.huh@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4ca7359
    • sr9700: use skb_cow_head() to deal with cloned skbs · d532c108
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: c9b37458 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d532c108
    • cx82310_eth: use skb_cow_head() to deal with cloned skbs · a9e840a2
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: cc28a20e ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9e840a2
    • smsc75xx: use skb_cow_head() to deal with cloned skbs · b7c6d267
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: d0cad871 ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7c6d267
    • ipv6: sr: fix double free of skb after handling invalid SRH · 95b9b88d
      David Lebrun authored
      The icmpv6_param_prob() function already does a kfree_skb(),
      this patch removes the duplicate one.
      
      Fixes: 1ababeba ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95b9b88d
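      The ownership mistake behind the double free above can be shown with
      a tiny userspace model. It is a sketch only: struct toy_pkt and the
      toy_* and handle_* functions are hypothetical, and frees are counted
      rather than performed so that the buggy path can be observed safely.
      The only fact taken from the commit message is that the error helper
      already frees the buffer it is handed.

```c
#include <assert.h>

/* Hypothetical packet that counts frees instead of really freeing. */
struct toy_pkt {
    int free_count;
};

static void toy_pkt_free(struct toy_pkt *p)
{
    p->free_count++;
}

/* Stand-in for an error helper that consumes the packet it is given,
 * as the commit message says icmpv6_param_prob() does. */
static void toy_report_error(struct toy_pkt *p)
{
    /* ... an error report would be sent here ... */
    toy_pkt_free(p);
}

/* Buggy pattern: the caller frees a packet the helper already consumed. */
static int handle_buggy(struct toy_pkt *p, int hdr_valid)
{
    if (!hdr_valid) {
        toy_report_error(p);
        toy_pkt_free(p); /* second free of the same packet */
        return -1;
    }
    return 0;
}

/* Fixed pattern: ownership passed to the helper, so just return. */
static int handle_fixed(struct toy_pkt *p, int hdr_valid)
{
    if (!hdr_valid) {
        toy_report_error(p);
        return -1;
    }
    return 0;
}

/* Returns 0 if the buggy path frees twice and the fixed path frees once. */
static int toy_free_demo(void)
{
    struct toy_pkt a = { 0 }, b = { 0 };

    handle_buggy(&a, 0);
    handle_fixed(&b, 0);
    return (a.free_count == 2 && b.free_count == 1) ? 0 : -1;
}
```

      The fix in the model is the same shape as the one the commit
      describes: remove the caller's free on the error path, because the
      helper has already taken care of it.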
    • Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 92b4fc75
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Just two fixes.
      
        The first fixes kprobing a stdu, and is marked for stable as it's been
        broken for ~ever. In hindsight this could have gone in next.
      
        The other is a fix for a change we merged this cycle, where if we take
        a certain exception when the kernel is running relocated (currently
        only used for kdump), we checkstop the box.
      
        Thanks to Ravi Bangoria"
      
      * tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
        powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
      92b4fc75
    • Merge tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · fe7ba289
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "Sorry this is so late. It's been in -next for over a week, but I
        forgot to send it on until now.
      
        A single fix to the DT binding of the HiSilicon PCIe host support"
      
      * tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)
      fe7ba289
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · a9aa1908
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
       "A couple of last minute fixes for regressions in this cycle. More
        specifically:
      
         - Two patches from Andy, adjusting the NVMe APST quirks to avoid some
           issues specific to one Toshiba drive, and some variant of Samsung
           on two specific Dell laptops.
      
         - A fix for mtip32xx, turning off mq scheduling on that device. We
           have a real fix for this, but it's too late in the cycle.
           Thankfully we already have a NO_SCHED flag we can apply here. A
           prep patch for this is ensuring that we honor the NO_SCHED flag
           when attempting to online switch schedulers; previously we only did
           so for drive load time. From Ming.
      
         - Fixing an oops in blk-mq polling with scheduling attached. This one
           is easily reproducible, it would be a shame to release 4.11 with
           that issue. From me.
      
        I'd prefer not having to send in patches at this point in time, but
        the above are all things that have regressed in this cycle and the
        fixes are relatively straightforward"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: fix potential oops with polling and blk-mq scheduler
        nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA"
        nvme: Adjust the Samsung APST quirk
        mtip32xx: pass BLK_MQ_F_NO_SCHED
        block: respect BLK_MQ_F_NO_SCHED
      a9aa1908
    • Merge tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4664e322
      Linus Torvalds authored
      Pull ACPI build fix from Rafael Wysocki:
       "This avoids a false-positive build warning from the compiler.
      
        Specifics:
      
         - Avoid a false-positive warning regarding a variable that may not be
           initialized that started to trigger after a previous general build
           fix (Arnd Bergmann)"
      
      * tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / power: Avoid maybe-uninitialized warning
      4664e322
    • Merge tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 11b211ed
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
      
         - kmalloc sdio scratch buffer to make it DMA-friendly
      
        MMC host:
      
         - dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used
      
         - sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50
           cards"
      
      * tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
        mmc: dw_mmc: Don't allow Runtime PM for SDIO cards
        mmc: sdio: fix alignment issue in struct sdio_func
      11b211ed
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 4d4dfc1c
      Linus Torvalds authored
      Pull input fixlet from Dmitry Torokhov:
       "An update to Elan PS/2 driver to allow working on yet another
        Lifebook"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled
      4d4dfc1c
    • MAINTAINERS: Add "B:" field for networking. · b0522e13
      David S. Miller authored
      We want people to report bugs to the netdev list.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b0522e13
  4. 20 Apr, 2017 5 commits