1. 15 Nov, 2016 12 commits
    • Anoob Soman's avatar
      packet: call fanout_release, while UNREGISTERING a netdev · d72cb5fb
      Anoob Soman authored
      [ Upstream commit 66644982 ]
      
      If a socket has FANOUT sockopt set, a new proto_hook is registered
      as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
      af_packet, __fanout_unlink is called for all sockets, but prot_hook which was
      registered as part of fanout_add is not removed. Call fanout_release, on a
      NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the
      fanout_list.
      
      This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
      Signed-off-by: default avatarAnoob Soman <anoob.soman@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d72cb5fb
    • Andrew Collins's avatar
      net: Add netdev all_adj_list refcnt propagation to fix panic · 63091b2c
      Andrew Collins authored
      [ Upstream commit 93409033 ]
      
      This is a respin of a patch to fix a relatively easily reproducible kernel
      panic related to the all_adj_list handling for netdevs in recent kernels.
      
      The following sequence of commands will reproduce the issue:
      
      ip link add link eth0 name eth0.100 type vlan id 100
      ip link add link eth0 name eth0.200 type vlan id 200
      ip link add name testbr type bridge
      ip link set eth0.100 master testbr
      ip link set eth0.200 master testbr
      ip link add link testbr mac0 type macvlan
      ip link delete dev testbr
      
      This creates an upper/lower tree of (excuse the poor ASCII art):
      
                  /---eth0.100-eth0
      mac0-testbr-
                  \---eth0.200-eth0
      
      When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
      the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
      reference to eth0 is added, so this results in a panic.
      
      This change adds reference count propagation so things are handled properly.
      
      Matthias Schiffer reported a similar crash in batman-adv:
      
      https://github.com/freifunk-gluon/gluon/issues/680
      https://www.open-mesh.org/issues/247
      
      which this patch also seems to resolve.
      Signed-off-by: default avatarAndrew Collins <acollins@cradlepoint.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63091b2c
    • Shmulik Ladkani's avatar
      net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions · 9edbf4a0
      Shmulik Ladkani authored
      [ Upstream commit f39acc84 ]
      
      Generic skb_vlan_push/skb_vlan_pop functions don't properly handle the
      case where the input skb data pointer does not point at the mac header:
      
      - They're doing push/pop, but fail to properly unwind data back to its
        original location.
        For example, in the skb_vlan_push case, any subsequent
        'skb_push(skb, skb->mac_len)' calls make the skb->data point 4 bytes
        BEFORE start of frame, leading to bogus frames that may be transmitted.
      
      - They update rcsum per the added/removed 4 bytes tag.
        Alas if data is originally after the vlan/eth headers, then these
        bytes were already pulled out of the csum.
      
      OTOH calling skb_vlan_push/skb_vlan_pop with skb->data at mac_header
      present no issues.
      
      act_vlan is the only caller to skb_vlan_*() that has skb->data pointing
      at network header (upon ingress).
      Other calles (ovs, bpf) already adjust skb->data at mac_header.
      
      This patch fixes act_vlan to point to the mac_header prior calling
      skb_vlan_*() functions, as other callers do.
      Signed-off-by: default avatarShmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9edbf4a0
    • Paolo Abeni's avatar
      net: pktgen: fix pkt_size · bb7ffb6b
      Paolo Abeni authored
      [ Upstream commit 63d75463 ]
      
      The commit 879c7220 ("net: pktgen: Observe needed_headroom
      of the device") increased the 'pkt_overhead' field value by
      LL_RESERVED_SPACE.
      As a side effect the generated packet size, computed as:
      
      	/* Eth + IPh + UDPh + mpls */
      	datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
      		  pkt_dev->pkt_overhead;
      
      is decreased by the same value.
      The above changed slightly the behavior of existing pktgen users,
      and made the procfs interface somewhat inconsistent.
      Fix it by restoring the previous pkt_overhead value and using
      LL_RESERVED_SPACE as extralen in skb allocation.
      Also, change pktgen_alloc_skb() to only partially reserve
      the headroom to allow the caller to prefetch from ll header
      start.
      
      v1 -> v2:
       - fixed some typos in the comments
      
      Fixes: 879c7220 ("net: pktgen: Observe needed_headroom of the device")
      Suggested-by: default avatarBen Greear <greearb@candelatech.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb7ffb6b
    • Gavin Schenk's avatar
      net: fec: set mac address unconditionally · bc5d8ced
      Gavin Schenk authored
      [ Upstream commit b82d44d7 ]
      
      If the mac address origin is not dt, you can only safely assign a mac
      address after "link up" of the device. If the link is off the clocks are
      disabled and because of issues assigning registers when clocks are off the
      new mac address cannot be written in .ndo_set_mac_address() on some soc's.
      This fix sets the mac address unconditionally in fec_restart(...) and
      ensures consistency between fec registers and the network layer.
      Signed-off-by: default avatarGavin Schenk <g.schenk@eckelmann.de>
      Acked-by: default avatarFugang Duan <fugang.duan@nxp.com>
      Acked-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Fixes: 9638d19e ("net: fec: add netif status check before set mac address")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc5d8ced
    • Milton Miller's avatar
      tg3: Avoid NULL pointer dereference in tg3_io_error_detected() · 0ee4acb7
      Milton Miller authored
      [ Upstream commit 1b0ff898 ]
      
      While the driver is probing the adapter, an error may occur before the
      netdev structure is allocated and attached to pci_dev. In this case,
      not only netdev isn't available, but the tg3 private structure is also
      not available as it is just math from the NULL pointer, so dereferences
      must be skipped.
      
      The following trace is seen when the error is triggered:
      
        [1.402247] Unable to handle kernel paging request for data at address 0x00001a99
        [1.402410] Faulting instruction address: 0xc0000000007e33f8
        [1.402450] Oops: Kernel access of bad area, sig: 11 [#1]
        [1.402481] SMP NR_CPUS=2048 NUMA PowerNV
        [1.402513] Modules linked in:
        [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu
        [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: c000001fe4e88000
        [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: c000000000595ea0
        [1.402787] REGS: c000001fe4e8b790 TRAP: 0300   Not tainted  (4.4.0-36-generic)
        [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28000422  XER: 20000000
        [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: 42000000 SOFTE: 1
        GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 0000000000000000
        GPR04: 0000000000000001 0000000000000000 0000000000000039 0000000000000299
        GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 0000000000000006
        GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 c000003ca1bffd00
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d52768
        GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 0000000000000002
        GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 c000003ca1b52000
        [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340
        [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340
      
      This patch avoids the NULL pointer dereference by moving the access after
      the netdev NULL pointer check on tg3_io_error_detected(). Also, we add a
      check for netdev being NULL on tg3_io_resume() [suggested by Michael Chan].
      
      Fixes: 0486a063 ("tg3: prevent ifup/ifdown during PCI error recovery")
      Fixes: dfc8f370 ("net/tg3: Release IRQs on permanent error")
      Tested-by: default avatarGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: default avatarMilton Miller <miltonm@us.ibm.com>
      Signed-off-by: default avatarGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Acked-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ee4acb7
    • Nikolay Aleksandrov's avatar
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 6eb0061f
      Nikolay Aleksandrov authored
      [ Upstream commit 2cf75070 ]
      
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6eb0061f
    • Lance Richardson's avatar
      ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() · 4f312a80
      Lance Richardson authored
      [ Upstream commit db32e4e4 ]
      
      Similar to commit 3be07244 ("ip6_gre: fix flowi6_proto value in
      xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup.
      
      Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value.
      This affected output route lookup for packets sent on an ip6gretap device
      in cases where routing was dependent on the value of flowi6_proto.
      
      Since the correct proto is already set in the tunnel flowi6 template via
      commit 252f3f5a ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit
      path."), simply delete the line setting the incorrect flowi6_proto value.
      Suggested-by: default avatarJiri Benc <jbenc@redhat.com>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reviewed-by: default avatarShmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: default avatarLance Richardson <lrichard@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f312a80
    • Eric Dumazet's avatar
      tcp: fix a compile error in DBGUNDO() · aadcd6a9
      Eric Dumazet authored
      [ Upstream commit 019b1c9f ]
      
      If DBGUNDO() is enabled (FASTRETRANS_DEBUG > 1), a compile
      error will happen, since inet6_sk(sk)->daddr became sk->sk_v6_daddr
      
      Fixes: efe4208f ("ipv6: make lookups simpler and faster")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aadcd6a9
    • Douglas Caetano dos Santos's avatar
      tcp: fix wrong checksum calculation on MTU probing · ac401485
      Douglas Caetano dos Santos authored
      [ Upstream commit 2fe664f1 ]
      
      With TCP MTU probing enabled and offload TX checksumming disabled,
      tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
      into the probe's SKB had an odd length. This was caused by the direct use
      of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
      fragment being copied, if needed. When this fragment was not the last, a
      subsequent call used the previous checksum without considering this
      padding.
      
      The effect was a stale connection in one way, as even retransmissions
      wouldn't solve the problem, because the checksum was never recalculated for
      the full SKB length.
      Signed-off-by: default avatarDouglas Caetano dos Santos <douglascs@taghos.com.br>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac401485
    • Eric Dumazet's avatar
      net: avoid sk_forward_alloc overflows · d2e01b15
      Eric Dumazet authored
      [ Upstream commit 20c64d5c ]
      
      A malicious TCP receiver, sending SACK, can force the sender to split
      skbs in write queue and increase its memory usage.
      
      Then, when socket is closed and its write queue purged, we might
      overflow sk_forward_alloc (It becomes negative)
      
      sk_mem_reclaim() does nothing in this case, and more than 2GB
      are leaked from TCP perspective (tcp_memory_allocated is not changed)
      
      Then warnings trigger from inet_sock_destruct() and
      sk_stream_kill_queues() seeing a not zero sk_forward_alloc
      
      All TCP stack can be stuck because TCP is under memory pressure.
      
      A simple fix is to preemptively reclaim from sk_mem_uncharge().
      
      This makes sure a socket wont have more than 2 MB forward allocated,
      after burst and idle period.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2e01b15
    • Eric Dumazet's avatar
      tcp: fix overflow in __tcp_retransmit_skb() · a35ce624
      Eric Dumazet authored
      [ Upstream commit ffb4d6c8 ]
      
      If a TCP socket gets a large write queue, an overflow can happen
      in a test in __tcp_retransmit_skb() preventing all retransmits.
      
      The flow then stalls and resets after timeouts.
      
      Tested:
      
      sysctl -w net.core.wmem_max=1000000000
      netperf -H dest -- -s 1000000000
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a35ce624
  2. 10 Nov, 2016 28 commits