1. 18 Oct, 2018 40 commits
    • Taehee Yoo's avatar
      ip: frags: fix crash in ip_do_fragment() · 85e59af9
      Taehee Yoo authored
      commit 5d407b07 upstream
      
      A kernel crash occurrs when defragmented packet is fragmented
      in ip_do_fragment().
      In defragment routine, skb_orphan() is called and
      skb->ip_defrag_offset is set. but skb->sk and
      skb->ip_defrag_offset are same union member. so that
      frag->sk is not NULL.
      Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
      defragmented packet is fragmented.
      
      test commands:
         %iptables -t nat -I POSTROUTING -j MASQUERADE
         %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
      
      splat looks like:
      [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
      [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
      [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
      [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
      [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
      [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
      [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
      [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
      [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
      [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
      [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
      [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
      [  261.198158] Call Trace:
      [  261.199018]  ? dst_output+0x180/0x180
      [  261.205011]  ? save_trace+0x300/0x300
      [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
      [  261.213034]  ? sched_clock_local+0xd4/0x140
      [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
      [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
      [  261.227014]  ? find_held_lock+0x39/0x1c0
      [  261.233008]  ip_finish_output+0x51d/0xb50
      [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
      [  261.250152]  ? rcu_is_watching+0x77/0x120
      [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
      [  261.261033]  ? nf_hook_slow+0xb1/0x160
      [  261.265007]  ip_output+0x1c7/0x710
      [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
      [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.282996]  ? nf_hook_slow+0xb1/0x160
      [  261.287007]  raw_sendmsg+0x21f9/0x4420
      [  261.291008]  ? dst_output+0x180/0x180
      [  261.297003]  ? sched_clock_cpu+0x126/0x170
      [  261.301003]  ? find_held_lock+0x39/0x1c0
      [  261.306155]  ? stop_critical_timings+0x420/0x420
      [  261.311004]  ? check_flags.part.36+0x450/0x450
      [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.326142]  ? cyc2ns_read_end+0x10/0x10
      [  261.330139]  ? raw_bind+0x280/0x280
      [  261.334138]  ? sched_clock_cpu+0x126/0x170
      [  261.338995]  ? check_flags.part.36+0x450/0x450
      [  261.342991]  ? __lock_acquire+0x4500/0x4500
      [  261.348994]  ? inet_sendmsg+0x11c/0x500
      [  261.352989]  ? dst_output+0x180/0x180
      [  261.357012]  inet_sendmsg+0x11c/0x500
      [ ... ]
      
      v2:
       - clear skb->sk at reassembly routine.(Eric Dumarzet)
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85e59af9
    • Peter Oskolkov's avatar
      ip: process in-order fragments efficiently · 4077ddb2
      Peter Oskolkov authored
      This patch changes the runtime behavior of IP defrag queue:
      incoming in-order fragments are added to the end of the current
      list/"run" of in-order fragments at the tail.
      
      On some workloads, UDP stream performance is substantially improved:
      
      RX: ./udp_stream -F 10 -T 2 -l 60
      TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60
      
      with this patchset applied on a 10Gbps receiver:
      
        throughput=9524.18
        throughput_units=Mbit/s
      
      upstream (net-next):
      
        throughput=4608.93
        throughput_units=Mbit/s
      Reported-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit a4fd284a)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4077ddb2
    • Peter Oskolkov's avatar
      ip: add helpers to process in-order fragments faster. · e9e4ac48
      Peter Oskolkov authored
      This patch introduces several helper functions/macros that will be
      used in the follow-up patch. No runtime changes yet.
      
      The new logic (fully implemented in the second patch) is as follows:
      
      * Nodes in the rb-tree will now contain not single fragments, but lists
        of consecutive fragments ("runs").
      
      * At each point in time, the current "active" run at the tail is
        maintained/tracked. Fragments that arrive in-order, adjacent
        to the previous tail fragment, are added to this tail run without
        triggering the re-balancing of the rb-tree.
      
      * If a fragment arrives out of order with the offset _before_ the tail run,
        it is inserted into the rb-tree as a single fragment.
      
      * If a fragment arrives after the current tail fragment (with a gap),
        it starts a new "tail" run, as is inserted into the rb-tree
        at the end as the head of the new run.
      
      skb->cb is used to store additional information
      needed here (suggested by Eric Dumazet).
      Reported-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 353c9cb3)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9e4ac48
    • Peter Oskolkov's avatar
      ip: use rb trees for IP frag queue. · 10043954
      Peter Oskolkov authored
      (commit fa0f5273 upstream)
      
      Similar to TCP OOO RX queue, it makes sense to use rb trees to store
      IP fragments, so that OOO fragments are inserted faster.
      
      Tested:
      
      - a follow-up patch contains a rather comprehensive ip defrag
        self-test (functional)
      - ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`:
          netstat --statistics
          Ip:
              282078937 total packets received
              0 forwarded
              0 incoming packets discarded
              946760 incoming packets delivered
              18743456 requests sent out
              101 fragments dropped after timeout
              282077129 reassemblies required
              944952 packets reassembled ok
              262734239 packet reassembles failed
         (The numbers/stats above are somewhat better re:
          reassemblies vs a kernel without this patchset. More
          comprehensive performance testing TBD).
      Reported-by: default avatarJann Horn <jannh@google.com>
      Reported-by: default avatarJuha-Matti Tilli <juha-matti.tilli@iki.fi>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      10043954
    • Eric Dumazet's avatar
      net: add rb_to_skb() and other rb tree helpers · b475cf3b
      Eric Dumazet authored
      Geeralize private netem_rb_to_skb()
      
      TCP rtx queue will soon be converted to rb-tree,
      so we will need skb_rbtree_walk() helpers.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 18a4c0ea)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b475cf3b
    • Eric Dumazet's avatar
      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends · 791521e2
      Eric Dumazet authored
      After working on IP defragmentation lately, I found that some large
      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
      zero paddings on the last (small) fragment.
      
      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
      fragments had CHECKSUM_COMPLETE set.
      
      We can instead compute the checksum of the part we are trimming,
      usually smaller than the part we keep.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 88078d98)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      791521e2
    • Florian Westphal's avatar
      ipv6: defrag: drop non-last frags smaller than min mtu · a8444b1c
      Florian Westphal authored
      don't bother with pathological cases, they only waste cycles.
      IPv6 requires a minimum MTU of 1280 so we should never see fragments
      smaller than this (except last frag).
      
      v3: don't use awkward "-offset + len"
      v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
          There were concerns that there could be even smaller frags
          generated by intermediate nodes, e.g. on radio networks.
      
      Cc: Peter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 0ed4229b)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8444b1c
    • Peter Oskolkov's avatar
      net: modify skb_rbtree_purge to return the truesize of all purged skbs. · 87169595
      Peter Oskolkov authored
      Tested: see the next patch is the series.
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 385114de)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87169595
    • Eric Dumazet's avatar
      net: speed up skb_rbtree_purge() · f5d17b55
      Eric Dumazet authored
      As measured in my prior patch ("sch_netem: faster rb tree removal"),
      rbtree_postorder_for_each_entry_safe() is nice looking but much slower
      than using rb_next() directly, except when tree is small enough
      to fit in CPU caches (then the cost is the same)
      
      Also note that there is not even an increase of text size :
      $ size net/core/skbuff.o.before net/core/skbuff.o
         text	   data	    bss	    dec	    hex	filename
        40711	   1298	      0	  42009	   a419	net/core/skbuff.o.before
        40711	   1298	      0	  42009	   a419	net/core/skbuff.o
      
      From: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 7c90584c)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5d17b55
    • Peter Oskolkov's avatar
      ip: discard IPv4 datagrams with overlapping segments. · 82f36cbc
      Peter Oskolkov authored
      This behavior is required in IPv6, and there is little need
      to tolerate overlapping fragments in IPv4. This change
      simplifies the code and eliminates potential DDoS attack vectors.
      
      Tested: ran ip_defrag selftest (not yet available uptream).
      Suggested-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Acked-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 7969e5c4)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      82f36cbc
    • Eric Dumazet's avatar
      inet: frags: fix ip6frag_low_thresh boundary · d8384866
      Eric Dumazet authored
      Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
      since linker might place next to it a non zero value preventing a change
      to ip6frag_low_thresh.
      
      ip6frag_low_thresh is not used anymore in the kernel, but we do not
      want to prematuraly break user scripts wanting to change it.
      
      Since specifying a minimal value of 0 for proc_doulongvec_minmax()
      is moot, let's remove these zero values in all defrag units.
      
      Fixes: 6e00f7dd ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 3d234012)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8384866
    • Eric Dumazet's avatar
      inet: frags: get rid of ipfrag_skb_cb/FRAG_CB · 316986fe
      Eric Dumazet authored
      ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
      this integer is currently in a different cache line than skb->next,
      meaning that we use two cache lines per skb when finding the insertion point.
      
      By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
      in a single cache line and save precious memory bandwidth.
      
      Note that after the fast path added by Changli Gao in commit
      d6bebca9 ("fragment: add fast path for in-order fragments")
      this change wont help the fast path, since we still need
      to access prev->len (2nd cache line), but will show great
      benefits when slow path is entered, since we perform
      a linear scan of a potentially long list.
      
      Also, note that this potential long list is an attack vector,
      we might consider also using an rb-tree there eventually.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit bf663371)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      316986fe
    • Eric Dumazet's avatar
      inet: frags: reorganize struct netns_frags · 5b68fda0
      Eric Dumazet authored
      Put the read-mostly fields in a separate cache line
      at the beginning of struct netns_frags, to reduce
      false sharing noticed in inet_frag_kill()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit c2615cf5)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b68fda0
    • Eric Dumazet's avatar
      rhashtable: reorganize struct rhashtable layout · 6060bcdc
      Eric Dumazet authored
      While under frags DDOS I noticed unfortunate false sharing between
      @nelems and @params.automatic_shrinking
      
      Move @nelems at the end of struct rhashtable so that first cache line
      is shared between all cpus, because almost never dirtied.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit e5d672a0)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6060bcdc
    • Eric Dumazet's avatar
      ipv6: frags: rewrite ip6_expire_frag_queue() · cbc45497
      Eric Dumazet authored
      Make it similar to IPv4 ip_expire(), and release the lock
      before calling icmp functions.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 05c0b86b)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbc45497
    • Eric Dumazet's avatar
      inet: frags: do not clone skb in ip_expire() · 7a87ec92
      Eric Dumazet authored
      An skb_clone() was added in commit ec4fbd64 ("inet: frag: release
      spinlock before calling icmp_send()")
      
      While fixing the bug at that time, it also added a very high cost
      for DDOS frags, as the ICMP rate limit is applied after this
      expensive operation (skb_clone() + consume_skb(), implying memory
      allocations, copy, and freeing)
      
      We can use skb_get(head) here, all we want is to make sure skb wont
      be freed by another cpu.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 1eec5d56)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a87ec92
    • Eric Dumazet's avatar
      inet: frags: break the 2GB limit for frags storage · 7f617068
      Eric Dumazet authored
      Some users are willing to provision huge amounts of memory to be able
      to perform reassembly reasonnably well under pressure.
      
      Current memory tracking is using one atomic_t and integers.
      
      Switch to atomic_long_t so that 64bit arches can use more than 2GB,
      without any cost for 32bit arches.
      
      Note that this patch avoids an overflow error, if high_thresh was set
      to ~2GB, since this test in inet_frag_alloc() was never true :
      
      if (... || frag_mem_limit(nf) > nf->high_thresh)
      
      Tested:
      
      $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh
      
      <frag DDOS>
      
      $ grep FRAG /proc/net/sockstat
      FRAG: inuse 14705885 memory 16000002880
      
      $ nstat -n ; sleep 1 ; nstat | grep Reas
      IpReasmReqds                    3317150            0.0
      IpReasmFails                    3317112            0.0
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 3e67f106)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f617068
    • Eric Dumazet's avatar
      inet: frags: remove inet_frag_maybe_warn_overflow() · 965e2adc
      Eric Dumazet authored
      This function is obsolete, after rhashtable addition to inet defrag.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 2d44ed22)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      965e2adc
    • Eric Dumazet's avatar
      inet: frags: get rif of inet_frag_evicting() · 49106f36
      Eric Dumazet authored
      This refactors ip_expire() since one indentation level is removed.
      
      Note: in the future, we should try hard to avoid the skb_clone()
      since this is a serious performance cost.
      Under DDOS, the ICMP message wont be sent because of rate limits.
      
      Fact that ip6_expire_frag_queue() does not use skb_clone() is
      disturbing too. Presumably IPv6 should have the same
      issue than the one we fixed in commit ec4fbd64
      ("inet: frag: release spinlock before calling icmp_send()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 399d1404)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49106f36
    • Eric Dumazet's avatar
      inet: frags: remove some helpers · ea7496f0
      Eric Dumazet authored
      Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()
      
      Also since we use rhashtable we can bring back the number of fragments
      in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
      removed in commit 434d3054 ("inet: frag: don't account number
      of fragment queues")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 6befe4a7)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea7496f0
    • Eric Dumazet's avatar
      inet: frags: use rhashtables for reassembly units · 23ce9c5c
      Eric Dumazet authored
      Some applications still rely on IP fragmentation, and to be fair linux
      reassembly unit is not working under any serious load.
      
      It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
      
      A work queue is supposed to garbage collect items when host is under memory
      pressure, and doing a hash rebuild, changing seed used in hash computations.
      
      This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
      occurring every 5 seconds if host is under fire.
      
      Then there is the problem of sharing this hash table for all netns.
      
      It is time to switch to rhashtables, and allocate one of them per netns
      to speedup netns dismantle, since this is a critical metric these days.
      
      Lookup is now using RCU. A followup patch will even remove
      the refcount hold/release left from prior implementation and save
      a couple of atomic operations.
      
      Before this patch, 16 cpus (16 RX queue NIC) could not handle more
      than 1 Mpps frags DDOS.
      
      After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
      of storage for the fragments (exact number depends on frags being evicted
      after timeout)
      
      $ grep FRAG /proc/net/sockstat
      FRAG: inuse 1966916 memory 2140004608
      
      A followup patch will change the limits for 64bit arches.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 648700f7)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23ce9c5c
    • Eric Dumazet's avatar
      rhashtable: add schedule points · fb19348b
      Eric Dumazet authored
      Rehashing and destroying large hash table takes a lot of time,
      and happens in process context. It is safe to add cond_resched()
      in rhashtable_rehash_table() and rhashtable_free_and_destroy()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit ae6da1f5)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb19348b
    • Eric Dumazet's avatar
      ipv6: export ip6 fragments sysctl to unprivileged users · 620018dd
      Eric Dumazet authored
      IPv4 was changed in commit 52a773d6 ("net: Export ip fragment
      sysctl to unprivileged users")
      
      The only sysctl that is not per-netns is not used :
      ip6frag_secret_interval
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 18dcbe12)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      620018dd
    • Eric Dumazet's avatar
      inet: frags: refactor lowpan_net_frag_init() · 1b363f81
      Eric Dumazet authored
      We want to call lowpan_net_frag_init() earlier.
      Similar to commit "inet: frags: refactor ipv6_frag_init()"
      
      This is a prereq to "inet: frags: use rhashtables for reassembly units"
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 807f1844)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b363f81
    • Eric Dumazet's avatar
      inet: frags: refactor ipv6_frag_init() · dae73e7d
      Eric Dumazet authored
      We want to call inet_frags_init() earlier.
      
      This is a prereq to "inet: frags: use rhashtables for reassembly units"
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 5b975bab)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dae73e7d
    • Eric Dumazet's avatar
      inet: frags: refactor ipfrag_init() · bbf6d860
      Eric Dumazet authored
      We need to call inet_frags_init() before register_pernet_subsys(),
      as a prereq for following patch ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 483a6e4f)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbf6d860
    • Eric Dumazet's avatar
      inet: frags: add a pointer to struct netns_frags · 2ffb1c36
      Eric Dumazet authored
      In order to simplify the API, add a pointer to struct inet_frags.
      This will allow us to make things less complex.
      
      These functions no longer have a struct inet_frags parameter :
      
      inet_frag_destroy(struct inet_frag_queue *q  /*, struct inet_frags *f */)
      inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
      ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 093ba729)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ffb1c36
    • Eric Dumazet's avatar
      inet: frags: change inet_frags_init_net() return value · 7fca7715
      Eric Dumazet authored
      We will soon initialize one rhashtable per struct netns_frags
      in inet_frags_init_net().
      
      This patch changes the return value to eventually propagate an
      error.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 787bea77)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fca7715
    • Eric Dumazet's avatar
      inet: make sure to grab rcu_read_lock before using ireq->ireq_opt · 6b2f36b9
      Eric Dumazet authored
      [ Upstream commit 2ab2ddd3 ]
      
      Timer handlers do not imply rcu_read_lock(), so my recent fix
      triggered a LOCKDEP warning when SYNACK is retransmit.
      
      Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt
      usages instead of guessing what is done by callers, since it is
      not worth the pain.
      
      Get rid of ireq_opt_deref() helper since it hides the logic
      without real benefit, since it is now a standard rcu_dereference().
      
      Fixes: 1ad98e9d ("tcp/dccp: fix lockdep issue when SYN is backlogged")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b2f36b9
    • Eric Dumazet's avatar
      tcp/dccp: fix lockdep issue when SYN is backlogged · 4cded0a3
      Eric Dumazet authored
      [ Upstream commit 1ad98e9d ]
      
      In normal SYN processing, packets are handled without listener
      lock and in RCU protected ingress path.
      
      But syzkaller is known to be able to trick us and SYN
      packets might be processed in process context, after being
      queued into socket backlog.
      
      In commit 06f877d6 ("tcp/dccp: fix other lockdep splats
      accessing ireq_opt") I made a very stupid fix, that happened
      to work mostly because of the regular path being RCU protected.
      
      Really the thing protecting ireq->ireq_opt is RCU read lock,
      and the pseudo request refcnt is not relevant.
      
      This patch extends what I did in commit 449809a6 ("tcp/dccp:
      block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
      pair in the paths that might be taken when processing SYN from
      socket backlog (thus possibly in process context)
      
      Fixes: 06f877d6 ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4cded0a3
    • Eric Dumazet's avatar
      rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 · 069bb7fd
      Eric Dumazet authored
      [ Upstream commit 0e1d6eca ]
      
      We have an impressive number of syzkaller bugs that are linked
      to the fact that syzbot was able to create a networking device
      with millions of TX (or RX) queues.
      
      Let's limit the number of RX/TX queues to 4096, this really should
      cover all known cases.
      
      A separate patch will add various cond_resched() in the loops
      handling sysfs entries at device creation and dismantle.
      
      Tested:
      
      lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap
      RTNETLINK answers: Invalid argument
      
      lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap
      
      real	0m0.180s
      user	0m0.000s
      sys	0m0.107s
      
      Fixes: 76ff5cc9 ("rtnl: allow to specify number of rx and tx queues on device creation")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      069bb7fd
    • Florian Fainelli's avatar
      net: systemport: Fix wake-up interrupt race during resume · c3291efb
      Florian Fainelli authored
      [ Upstream commit 45ec3185 ]
      
      The AON_PM_L2 is normally used to trigger and identify the source of a
      wake-up event. Since the RX_SYS clock is no longer turned off, we also
      have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and
      that interrupt remains active up until the magic packet detector is
      disabled which happens much later during the driver resumption.
      
      The race happens if we have a CPU that is entering the SYSTEMPORT
      INTRL2_0 handler during resume, and another CPU has managed to clear the
      wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we
      have the first CPU stuck in the interrupt handler with an interrupt
      cause that has been cleared under its feet, and so we keep returning
      IRQ_NONE and we never make any progress.
      
      This was not a problem before because we would always turn off the
      RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned
      off as well, thus not latching the interrupt.
      
      The fix is to make sure we do not enable either the MPD or
      BRCM_TAG_MATCH interrupts since those are redundant with what the
      AON_PM_L2 interrupt controller already processes and they would cause
      such a race to occur.
      
      Fixes: bb9051a2 ("net: systemport: Add support for WAKE_FILTER")
      Fixes: 83e82f4c ("net: systemport: add Wake-on-LAN support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3291efb
    • Maxime Chevallier's avatar
      net: mvpp2: Extract the correct ethtype from the skb for tx csum offload · 353825f2
      Maxime Chevallier authored
      [ Upstream commit 35f3625c ]
      
      When offloading the L3 and L4 csum computation on TX, we need to extract
      the l3_proto from the ethtype, independently of the presence of a vlan
      tag.
      
      The actual driver uses skb->protocol as-is, resulting in packets with
      the wrong L4 checksum being sent when there's a vlan tag in the packet
      header and checksum offloading is enabled.
      
      This commit makes use of vlan_protocol_get() to get the correct ethtype
      regardless the presence of a vlan tag.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      353825f2
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Fix unbind ordering · 7f10b7e2
      Florian Fainelli authored
      [ Upstream commit bf3b452b ]
      
      The order in which we release resources is unfortunately leading to bus
      errors while dismantling the port. This is because we set
      priv->wol_ports_mask to 0 to tell bcm_sf2_sw_suspend() that it is now
      permissible to clock gate the switch. Later on, when dsa_slave_destroy()
      comes in from dsa_unregister_switch() and calls
      dsa_switch_ops::port_disable, we perform the same dismantling again, and
      this time we hit registers that are clock gated.
      
      Make sure that dsa_unregister_switch() is the first thing that happens,
      which takes care of releasing all user visible resources, then proceed
      with clock gating hardware. We still need to set priv->wol_ports_mask to
      0 to make sure that an enabled port properly gets disabled in case it
      was previously used as part of Wake-on-LAN.
      
      Fixes: d9338023 ("net: dsa: bcm_sf2: Make it a real platform device driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f10b7e2
    • Ido Schimmel's avatar
      team: Forbid enslaving team device to itself · 68d6d372
      Ido Schimmel authored
      [ Upstream commit 471b83bd ]
      
      team's ndo_add_slave() acquires 'team->lock' and later tries to open the
      newly enslaved device via dev_open(). This emits a 'NETDEV_UP' event
      that causes the VLAN driver to add VLAN 0 on the team device. team's
      ndo_vlan_rx_add_vid() will also try to acquire 'team->lock' and
      deadlock.
      
      Fix this by checking early at the enslavement function that a team
      device is not being enslaved to itself.
      
      A similar check was added to the bond driver in commit 09a89c21
      ("bonding: disallow enslaving a bond to itself").
      
      WARNING: possible recursive locking detected
      4.18.0-rc7+ #176 Not tainted
      --------------------------------------------
      syz-executor4/6391 is trying to acquire lock:
      (____ptrval____) (&team->lock){+.+.}, at: team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868
      
      but task is already holding lock:
      (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&team->lock);
        lock(&team->lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by syz-executor4/6391:
       #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
       #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4662
       #1: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947
      
      stack backtrace:
      CPU: 1 PID: 6391 Comm: syz-executor4 Not tainted 4.18.0-rc7+ #176
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
       check_deadlock kernel/locking/lockdep.c:1809 [inline]
       validate_chain kernel/locking/lockdep.c:2405 [inline]
       __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
       lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
       __mutex_lock_common kernel/locking/mutex.c:757 [inline]
       __mutex_lock+0x176/0x1820 kernel/locking/mutex.c:894
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
       team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868
       vlan_add_rx_filter_info+0x14a/0x1d0 net/8021q/vlan_core.c:210
       __vlan_vid_add net/8021q/vlan_core.c:278 [inline]
       vlan_vid_add+0x63e/0x9d0 net/8021q/vlan_core.c:308
       vlan_device_event.cold.12+0x2a/0x2f net/8021q/vlan.c:381
       notifier_call_chain+0x180/0x390 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
       call_netdevice_notifiers net/core/dev.c:1753 [inline]
       dev_open+0x173/0x1b0 net/core/dev.c:1433
       team_port_add drivers/net/team/team.c:1219 [inline]
       team_add_slave+0xa8b/0x1c30 drivers/net/team/team.c:1948
       do_set_master+0x1c9/0x220 net/core/rtnetlink.c:2248
       do_setlink+0xba4/0x3e10 net/core/rtnetlink.c:2382
       rtnl_setlink+0x2a9/0x400 net/core/rtnetlink.c:2636
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4665
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2455
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4683
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfd0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:642 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:652
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2126
       __sys_sendmsg+0x11d/0x290 net/socket.c:2164
       __do_sys_sendmsg net/socket.c:2173 [inline]
       __se_sys_sendmsg net/socket.c:2171 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2171
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x456b29
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f9706bf8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f9706bf96d4 RCX: 0000000000456b29
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004
      RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d3548 R14: 00000000004c8227 R15: 0000000000000000
      
      Fixes: 87002b03 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-and-tested-by: syzbot+bd051aba086537515cdb@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68d6d372
    • Giacinto Cifelli's avatar
      qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface · fef73013
      Giacinto Cifelli authored
      [ Upstream commit 4f761770 ]
      
      Added support for Gemalto's Cinterion ALASxx WWAN interfaces
      by adding QMI_FIXED_INTF with Cinterion's VID and PID.
      Signed-off-by: default avatarGiacinto Cifelli <gciofono@gmail.com>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fef73013
    • Shahed Shaikh's avatar
      qlcnic: fix Tx descriptor corruption on 82xx devices · 902730f1
      Shahed Shaikh authored
      [ Upstream commit c333fa0c ]
      
      In regular NIC transmission flow, driver always configures MAC using
      Tx queue zero descriptor as a part of MAC learning flow.
      But with multi Tx queue supported NIC, regular transmission can occur on
      any non-zero Tx queue and from that context it uses
      Tx queue zero descriptor to configure MAC, at the same time TX queue
      zero could be used by another CPU for regular transmission
      which could lead to Tx queue zero descriptor corruption and cause FW
      abort.
      
      This patch fixes this in such a way that driver always configures
      learned MAC address from the same Tx queue which is used for
      regular transmission.
      
      Fixes: 7e2cf4fe ("qlcnic: change driver hardware interface mechanism")
      Signed-off-by: default avatarShahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      902730f1
    • Yu Zhao's avatar
      net/usb: cancel pending work when unbinding smsc75xx · 91c41e3f
      Yu Zhao authored
      [ Upstream commit f7b2a56e ]
      
      Cancel pending work before freeing smsc75xx private data structure
      during binding. This fixes the following crash in the driver:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      IP: mutex_lock+0x2b/0x3f
      <snipped>
      Workqueue: events smsc75xx_deferred_multicast_write [smsc75xx]
      task: ffff8caa83e85700 task.stack: ffff948b80518000
      RIP: 0010:mutex_lock+0x2b/0x3f
      <snipped>
      Call Trace:
       smsc75xx_deferred_multicast_write+0x40/0x1af [smsc75xx]
       process_one_work+0x18d/0x2fc
       worker_thread+0x1a2/0x269
       ? pr_cont_work+0x58/0x58
       kthread+0xfa/0x10a
       ? pr_cont_work+0x58/0x58
       ? rcu_read_unlock_sched_notrace+0x48/0x48
       ret_from_fork+0x22/0x40
      Signed-off-by: default avatarYu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91c41e3f
    • Sean Tranchetti's avatar
      netlabel: check for IPV4MASK in addrinfo_get · a7fa1e6b
      Sean Tranchetti authored
      [ Upstream commit f88b4c01 ]
      
      netlbl_unlabel_addrinfo_get() assumes that if it finds the
      NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the
      NLBL_UNLABEL_A_IPV4MASK attribute as well. However, this is
      not necessarily the case as the current checks in
      netlbl_unlabel_staticadd() and friends are not sufficent to
      enforce this.
      
      If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR,
      NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes,
      these functions will all call netlbl_unlabel_addrinfo_get() which
      will then attempt dereference NULL when fetching the non-existent
      NLBL_UNLABEL_A_IPV4MASK attribute:
      
      Unable to handle kernel NULL pointer dereference at virtual address 0
      Process unlab (pid: 31762, stack limit = 0xffffff80502d8000)
      Call trace:
      	netlbl_unlabel_addrinfo_get+0x44/0xd8
      	netlbl_unlabel_staticremovedef+0x98/0xe0
      	genl_rcv_msg+0x354/0x388
      	netlink_rcv_skb+0xac/0x118
      	genl_rcv+0x34/0x48
      	netlink_unicast+0x158/0x1f0
      	netlink_sendmsg+0x32c/0x338
      	sock_sendmsg+0x44/0x60
      	___sys_sendmsg+0x1d0/0x2a8
      	__sys_sendmsg+0x64/0xb4
      	SyS_sendmsg+0x34/0x4c
      	el0_svc_naked+0x34/0x38
      Code: 51001149 7100113f 540000a0 f9401508 (79400108)
      ---[ end trace f6438a488e737143 ]---
      Kernel panic - not syncing: Fatal exception
      Signed-off-by: default avatarSean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7fa1e6b
    • Jeff Barnhill's avatar
      net/ipv6: Display all addresses in output of /proc/net/if_inet6 · ea14355a
      Jeff Barnhill authored
      [ Upstream commit 86f9bd1f ]
      
      The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
      handle starting/stopping the iteration.  The problem is that at some point
      during the iteration, an overflow is detected and the process is
      subsequently stopped.  The item being shown via seq_printf() when the
      overflow occurs is not actually shown, though.  When start() is
      subsequently called to resume iterating, it returns the next item, and
      thus the item that was being processed when the overflow occurred never
      gets printed.
      
      Alter the meaning of the private data member "offset".  Currently, when it
      is not 0 (which only happens at the very beginning), "offset" represents
      the next hlist item to be printed.  After this change, "offset" always
      represents the current item.
      
      This is also consistent with the private data member "bucket", which
      represents the current bucket, and also the use of "pos" as defined in
      seq_file.txt:
          The pos passed to start() will always be either zero, or the most
          recent pos used in the previous session.
      Signed-off-by: default avatarJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea14355a