1. 30 Jul, 2015 13 commits
    • Sunil Goutham's avatar
      net: thunderx: Wakeup TXQ only if CQE_TX are processed · 74840b83
      Sunil Goutham authored
      Previously TXQ is wakedup whenever napi is executed
      and irrespective of if any CQE_TX are processed or not.
      Added 'txq_stop' and 'txq_wake' counters to aid in debugging
      if there are any future issues.
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      74840b83
    • Sunil Goutham's avatar
      net: thunderx: Suppress alloc_pages() failure warnings · f8ce9666
      Sunil Goutham authored
      Suppressing standard alloc_pages() warnings. Some kernel configs limit
      alloc size and the network driver may fail. Do not drop a kernel
      warning in this case, instead just drop a oneliner that the network
      driver could not be loaded since the buffer could not be allocated.
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f8ce9666
    • Sunil Goutham's avatar
      net: thunderx: Fix TSO packet statistic · 2cb468e0
      Sunil Goutham authored
      Fixing TSO packages not being counted.
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2cb468e0
    • Sunil Goutham's avatar
      net: thunderx: Fix memory leak when changing queue count · c62cd3c4
      Sunil Goutham authored
      Fix for memory leak when changing queue/channel count via ethtool
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c62cd3c4
    • Sunil Goutham's avatar
      net: thunderx: Fix RQ_DROP miscalculation · 32c1b965
      Sunil Goutham authored
      With earlier configured value sufficient number of CQEs are not
      being reserved for transmitted packets. Hence under heavy incoming
      traffic load, receive notifications will take away most of the CQ
      thus transmit notifications will be lost resulting in tx skbs not
      being freed.
      
      Finally SQ will be full and it will be stopped, watchdog timer
      will kick in. After this fix receive notifications will not take
      morethan half of CQ reserving the rest for transmit notifications.
      
      Also changed CQ & SQ sizes from 16k to 4k.
      This is also due to the receive notifications taking first half of
      CQ under heavy load and time taken by NAPI to clear transmit notifications
      will increase with higher queue sizes. Again results in SQ being stopped.
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      32c1b965
    • Sunil Goutham's avatar
      net: thunderx: Fix memory leak while tearing down interface · 143ceb0b
      Sunil Goutham authored
      Fixed 'tso_hdrs' memory not being freed properly.
      Also fixed SQ skbuff maintenance issues.
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      143ceb0b
    • Sunil Goutham's avatar
      net: thunderx: Fix data integrity issues with LDWB · 4b561c17
      Sunil Goutham authored
      Switching back to LDD transactions from LDWB.
      
      While transmitting packets out with LDWB transactions
      data integrity issues are seen very frequently.
      hence switching back to LDD.
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarRobert Richter <rrichter@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4b561c17
    • Eric Dumazet's avatar
      ipv6: flush nd cache on IFF_NOARP change · c8507fb2
      Eric Dumazet authored
      This patch is the IPv6 equivalent of commit
      6c8b4e3f ("arp: flush arp cache on IFF_NOARP change")
      
      Without it, we keep buggy neighbours in the cache, with destination
      MAC address equal to our own MAC address.
      
      Tested:
       tcpdump -i eth0 -s 0 ip6 -n -e &
       ip link set dev eth0 arp off
       ping6 remote   // sends buggy frames
       ip link set dev eth0 arp on
       ping6 remote   // should work once kernel is patched
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMario Fanelli <mariofanelli@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8507fb2
    • David S. Miller's avatar
      Merge branch 'netcp-fixes' · b2428f94
      David S. Miller authored
      Murali Karicheri says:
      
      ====================
      net: netcp: bug fixes for dynamic module support
      
      This series fixes few bugs to allow keystone netcp modules to be
      dynamically loaded and removed. Currently it allows following
      sequence multiple times
      
       insmod cpsw_ale.ko
       insmod davinci_mdio.ko
       insmod keystone_netcp.ko
       insmod keystone_netcp_ethss.ko
       ifup eth0
       ifup eth1
       ping <hosts on eth0>
       ping <hosts on eth1>
       ifdown eth1
       ifdown eth0
       rmmod keystone_netcp_ethss.ko
       rmmod keystone_netcp.ko
       rmmod davinci_mdio.ko
       rmmod cpsw_ale.ko
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b2428f94
    • Karicheri, Muralidharan's avatar
      net: netcp: ethss: cleanup gbe_probe() and gbe_remove() functions · 31a184b7
      Karicheri, Muralidharan authored
      This patch clean up error handle code to use goto label properly. In some
      cases, the code unnecessarily use goto instead of just returning the error
      code.  Code also make explicit calls to devm_* APIs on error which is
      not necessary. In the gbe_remove() also it makes similar calls which is
      also unnecessary.
      
      Also fix few checkpatch warnings
      Signed-off-by: default avatarMurali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31a184b7
    • Karicheri, Muralidharan's avatar
      net: netcp: ethss: fix up incorrect use of list api · c20afae7
      Karicheri, Muralidharan authored
      The code seems to assume a null is returned when the list is empty
      from first_sec_slave() to break the loop which is incorrect. Fix the
      code by using list_empty().
      Signed-off-by: default avatarMurali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c20afae7
    • Karicheri, Muralidharan's avatar
      net: netcp: fix cleanup interface list in netcp_remove() · 01a03099
      Karicheri, Muralidharan authored
      Currently if user do rmmod keystone_netcp.ko following warning is
      seen :-
      
      [   59.035891] ------------[ cut here ]------------
      [   59.040535] WARNING: CPU: 2 PID: 1619 at drivers/net/ethernet/ti/
      netcp_core.c:2127 netcp_remove)
      
      This is because the interface list is not cleaned up in netcp_remove.
      This patch fixes this. Also fix some checkpatch related warnings.
      Signed-off-by: default avatarMurali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01a03099
    • Daniel Borkmann's avatar
      ebpf, x86: fix general protection fault when tail call is invoked · 2482abb9
      Daniel Borkmann authored
      With eBPF JIT compiler enabled on x86_64, I was able to reliably trigger
      the following general protection fault out of an eBPF program with a simple
      tail call, f.e. tracex5 (or a stripped down version of it):
      
        [  927.097918] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
        [...]
        [  927.100870] task: ffff8801f228b780 ti: ffff880016a64000 task.ti: ffff880016a64000
        [  927.102096] RIP: 0010:[<ffffffffa002440d>]  [<ffffffffa002440d>] 0xffffffffa002440d
        [  927.103390] RSP: 0018:ffff880016a67a68  EFLAGS: 00010006
        [  927.104683] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000001
        [  927.105921] RDX: 0000000000000000 RSI: ffff88014e438000 RDI: ffff880016a67e00
        [  927.107137] RBP: ffff880016a67c90 R08: 0000000000000000 R09: 0000000000000001
        [  927.108351] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880016a67e00
        [  927.109567] R13: 0000000000000000 R14: ffff88026500e460 R15: ffff880220a81520
        [  927.110787] FS:  00007fe7d5c1f740(0000) GS:ffff880265000000(0000) knlGS:0000000000000000
        [  927.112021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [  927.113255] CR2: 0000003e7bbb91a0 CR3: 000000006e04b000 CR4: 00000000001407e0
        [  927.114500] Stack:
        [  927.115737]  ffffc90008cdb000 ffff880016a67e00 ffff88026500e460 ffff880220a81520
        [  927.117005]  0000000100000000 000000000000001b ffff880016a67aa8 ffffffff8106c548
        [  927.118276]  00007ffcdaf22e58 0000000000000000 0000000000000000 ffff880016a67ff0
        [  927.119543] Call Trace:
        [  927.120797]  [<ffffffff8106c548>] ? lookup_address+0x28/0x30
        [  927.122058]  [<ffffffff8113d176>] ? __module_text_address+0x16/0x70
        [  927.123314]  [<ffffffff8117bf0e>] ? is_ftrace_trampoline+0x3e/0x70
        [  927.124562]  [<ffffffff810c1a0f>] ? __kernel_text_address+0x5f/0x80
        [  927.125806]  [<ffffffff8102086f>] ? print_context_stack+0x7f/0xf0
        [  927.127033]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
        [  927.128254]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
        [  927.129461]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
        [  927.130654]  [<ffffffff8119ee4a>] trace_call_bpf+0x8a/0x140
        [  927.131837]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
        [  927.133015]  [<ffffffff8119f008>] kprobe_perf_func+0x28/0x220
        [  927.134195]  [<ffffffff811a1668>] kprobe_dispatcher+0x38/0x60
        [  927.135367]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
        [  927.136523]  [<ffffffff81061400>] kprobe_ftrace_handler+0xf0/0x150
        [  927.137666]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
        [  927.138802]  [<ffffffff8117950c>] ftrace_ops_recurs_func+0x5c/0xb0
        [  927.139934]  [<ffffffffa022b0d5>] 0xffffffffa022b0d5
        [  927.141066]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
        [  927.142199]  [<ffffffff81174b95>] seccomp_phase1+0x5/0x230
        [  927.143323]  [<ffffffff8102c0a4>] syscall_trace_enter_phase1+0xc4/0x150
        [  927.144450]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
        [  927.145572]  [<ffffffff8102c0a4>] ? syscall_trace_enter_phase1+0xc4/0x150
        [  927.146666]  [<ffffffff817f9a9f>] tracesys+0xd/0x44
        [  927.147723] Code: 48 8b 46 10 48 39 d0 76 2c 8b 85 fc fd ff ff 83 f8 20 77 21 83
                             c0 01 89 85 fc fd ff ff 48 8d 44 d6 80 48 8b 00 48 83 f8 00 74
                             0a <48> 8b 40 20 48 83 c0 33 ff e0 48 89 d8 48 8b 9d d8 fd ff
                             ff 4c
        [  927.150046] RIP  [<ffffffffa002440d>] 0xffffffffa002440d
      
      The code section with the instructions that traps points into the eBPF JIT
      image of the root program (the one invoking the tail call instruction).
      
      Using bpf_jit_disasm -o on the eBPF root program image:
      
        [...]
        4e:   mov    -0x204(%rbp),%eax
              8b 85 fc fd ff ff
        54:   cmp    $0x20,%eax               <--- if (tail_call_cnt > MAX_TAIL_CALL_CNT)
              83 f8 20
        57:   ja     0x000000000000007a
              77 21
        59:   add    $0x1,%eax                <--- tail_call_cnt++
              83 c0 01
        5c:   mov    %eax,-0x204(%rbp)
              89 85 fc fd ff ff
        62:   lea    -0x80(%rsi,%rdx,8),%rax  <--- prog = array->prog[index]
              48 8d 44 d6 80
        67:   mov    (%rax),%rax
              48 8b 00
        6a:   cmp    $0x0,%rax                <--- check for NULL
              48 83 f8 00
        6e:   je     0x000000000000007a
              74 0a
        70:   mov    0x20(%rax),%rax          <--- GPF triggered here! fetch of bpf_func
              48 8b 40 20                              [ matches <48> 8b 40 20 ... from above ]
        74:   add    $0x33,%rax               <--- prologue skip of new prog
              48 83 c0 33
        78:   jmpq   *%rax                    <--- jump to new prog insns
              ff e0
        [...]
      
      The problem is that rax has 5a5a5a5a5a5a5a5a, which suggests a tail call
      jump to map slot 0 is pointing to a poisoned page. The issue is the following:
      
      lea instruction has a wrong offset, i.e. it should be ...
      
        lea    0x80(%rsi,%rdx,8),%rax
      
      ... but it actually seems to be ...
      
        lea   -0x80(%rsi,%rdx,8),%rax
      
      ... where 0x80 is offsetof(struct bpf_array, prog), thus the offset needs
      to be positive instead of negative. Disassembling the interpreter, we btw
      similarly do:
      
        [...]
        c88:  lea     0x80(%rax,%rdx,8),%rax  <--- prog = array->prog[index]
              48 8d 84 d0 80 00 00 00
        c90:  add     $0x1,%r13d
              41 83 c5 01
        c94:  mov     (%rax),%rax
              48 8b 00
        [...]
      
      Now the other interesting fact is that this panic triggers only when things
      like CONFIG_LOCKDEP are being used. In that case offsetof(struct bpf_array,
      prog) starts at offset 0x80 and in non-CONFIG_LOCKDEP case at offset 0x50.
      Reason is that the work_struct inside struct bpf_map grows by 48 bytes in my
      case due to the lockdep_map member (which also has CONFIG_LOCK_STAT enabled
      members).
      
      Changing the emitter to always use the 4 byte displacement in the lea
      instruction fixes the panic on my side. It increases the tail call instruction
      emission by 3 more byte, but it should cover us from various combinations
      (and perhaps other future increases on related structures).
      
      After patch, disassembly:
      
        [...]
        9e:   lea    0x80(%rsi,%rdx,8),%rax   <--- CONFIG_LOCKDEP/CONFIG_LOCK_STAT
              48 8d 84 d6 80 00 00 00
        a6:   mov    (%rax),%rax
              48 8b 00
        [...]
      
        [...]
        9e:   lea    0x50(%rsi,%rdx,8),%rax   <--- No CONFIG_LOCKDEP
              48 8d 84 d6 50 00 00 00
        a6:   mov    (%rax),%rax
              48 8b 00
        [...]
      
      Fixes: b52f00e6 ("x86: bpf_jit: implement bpf_tail_call() helper")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2482abb9
  2. 29 Jul, 2015 7 commits
    • Nikolay Aleksandrov's avatar
      bridge: mdb: fix delmdb state in the notification · 7ae90a4f
      Nikolay Aleksandrov authored
      Since mdb states were introduced when deleting an entry the state was
      left as it was set in the delete request from the user which leads to
      the following output when doing a monitor (for example):
      $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
      (monitor) dev br0 port eth3 grp 239.0.0.1 permanent
      $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
      (monitor) dev br0 port eth3 grp 239.0.0.1 temp
      ^^^
      Note the "temp" state in the delete notification which is wrong since
      the entry was permanent, the state in a delete is always reported as
      "temp" regardless of the real state of the entry.
      
      After this patch:
      $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
      (monitor) dev br0 port eth3 grp 239.0.0.1 permanent
      $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
      (monitor) dev br0 port eth3 grp 239.0.0.1 permanent
      
      There's one important note to make here that the state is actually not
      matched when doing a delete, so one can delete a permanent entry by
      stating "temp" in the end of the command, I've chosen this fix in order
      not to break user-space tools which rely on this (incorrect) behaviour.
      
      So to give an example after this patch and using the wrong state:
      $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
      (monitor) dev br0 port eth3 grp 239.0.0.1 permanent
      $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp
      (monitor) dev br0 port eth3 grp 239.0.0.1 permanent
      
      Note the state of the entry that got deleted is correct in the
      notification.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ae90a4f
    • Satish Ashok's avatar
      bridge: mcast: give fast leave precedence over multicast router and querier · 544586f7
      Satish Ashok authored
      When fast leave is configured on a bridge port and an IGMP leave is
      received for a group, the group is not deleted immediately if there is
      a router detected or if multicast querier is configured.
      Ideally the group should be deleted immediately when fast leave is
      configured.
      Signed-off-by: default avatarSatish Ashok <sashok@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      544586f7
    • Toshiaki Makita's avatar
      bridge: Fix network header pointer for vlan tagged packets · df356d5e
      Toshiaki Makita authored
      There are several devices that can receive vlan tagged packets with
      CHECKSUM_PARTIAL like tap, possibly veth and xennet.
      When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded
      by bridge to a device with the IP_CSUM feature, they end up with checksum
      error because before entering bridge, the network header is set to
      ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(),
      get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later.
      
      Since the network header is exepected to be ETH_HLEN in flow-dissection
      and hash-calculation in RPS in rx path, and since the header pointer fix
      is needed only in tx path, set the appropriate network header on forwarding
      packets.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df356d5e
    • Alexander Drozdov's avatar
      packet: tpacket_snd(): fix signed/unsigned comparison · dbd46ab4
      Alexander Drozdov authored
      tpacket_fill_skb() can return a negative value (-errno) which
      is stored in tp_len variable. In that case the following
      condition will be (but shouldn't be) true:
      
      tp_len > dev->mtu + dev->hard_header_len
      
      as dev->mtu and dev->hard_header_len are both unsigned.
      
      That may lead to just returning an incorrect EMSGSIZE errno
      to the user.
      
      Fixes: 52f1454f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
      Signed-off-by: default avatarAlexander Drozdov <al.drozdov@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dbd46ab4
    • Eric Dumazet's avatar
      arp: filter NOARP neighbours for SIOCGARP · 11c91ef9
      Eric Dumazet authored
      When arp is off on a device, and ioctl(SIOCGARP) is queried,
      a buggy answer is given with MAC address of the device, instead
      of the mac address of the destination/gateway.
      
      We filter out NUD_NOARP neighbours for /proc/net/arp,
      we must do the same for SIOCGARP ioctl.
      
      Tested:
      
      lpaa23:~# ./arp 10.246.7.190
      MAC=00:01:e8:22:cb:1d      // correct answer
      
      lpaa23:~# ip link set dev eth0 arp off
      lpaa23:~# cat /proc/net/arp   # check arp table is now 'empty'
      IP address       HW type     Flags       HW address    Mask     Device
      lpaa23:~# ./arp 10.246.7.190
      MAC=00:1a:11:c3:0d:7f   // buggy answer before patch (this is eth0 mac)
      
      After patch :
      
      lpaa23:~# ip link set dev eth0 arp off
      lpaa23:~# ./arp 10.246.7.190
      ioctl(SIOCGARP) failed: No such device or address
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarVytautas Valancius <valas@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11c91ef9
    • David Ward's avatar
      net/ipv4: suppress NETDEV_UP notification on address lifetime update · 865b8042
      David Ward authored
      This notification causes the FIB to be updated, which is not needed
      because the address already exists, and more importantly it may undo
      intentional changes that were made to the FIB after the address was
      originally added. (As a point of comparison, when an address becomes
      deprecated because its preferred lifetime expired, a notification on
      this chain is not generated.)
      
      The motivation for this commit is fixing an incompatibility between
      DHCP clients which set and update the address lifetime according to
      the lease, and a commercial VPN client which replaces kernel routes
      in a way that outbound traffic is sent only through the tunnel (and
      disconnects if any further route changes are detected via netlink).
      Signed-off-by: default avatarDavid Ward <david.ward@ll.mit.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      865b8042
    • Nikolay Aleksandrov's avatar
      bridge: stp: when using userspace stp stop kernel hello and hold timers · 76b91c32
      Nikolay Aleksandrov authored
      These should be handled only by the respective STP which is in control.
      They become problematic for devices with limited resources with many
      ports because the hold_timer is per port and fires each second and the
      hello timer fires each 2 seconds even though it's global. While in
      user-space STP mode these timers are completely unnecessary so it's better
      to keep them off.
      Also ensure that when the bridge is up these timers are started only when
      running with kernel STP.
      Signed-off-by: default avatarSatish Ashok <sashok@cumulusnetworks.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76b91c32
  3. 27 Jul, 2015 20 commits