1. 18 Mar, 2015 1 commit
    • act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome · ced585c8
      Daniel Borkmann authored
      Revisiting commit d23b8ad8 ("tc: add BPF based action") with regard
      to eBPF support, I was thinking that it might be better to improve
      return semantics from a BPF program invoked through BPF_PROG_RUN().
      
      Currently, in case filter_res is 0, we overwrite the default action
      opcode with TC_ACT_SHOT. A default action opcode configured through tc's
      m_bpf can be: TC_ACT_RECLASSIFY, TC_ACT_PIPE, TC_ACT_SHOT, TC_ACT_UNSPEC,
      TC_ACT_OK.
      
      In cls_bpf, we have the possibility to overwrite the default class
      associated with the classifier in case filter_res is _not_ 0xffffffff
      (-1).
      
      That allows us to fold multiple [e]BPF programs into a single one, where
      they would otherwise need to be defined as a separate classifier with
      its own classid, needlessly redoing parsing work, etc.
      
      Similarly, we could do better in act_bpf: Since above TC_ACT* opcodes
      are exported to UAPI anyway, we reuse them for return-code-to-tc-opcode
      mapping, where we would allow above possibilities. Thus, like in cls_bpf,
      a filter_res of 0xffffffff (-1) means that the configured _default_ action
      is used. Any unknown return code from the BPF program would fail in
      tcf_bpf() with TC_ACT_UNSPEC.
      
      Should we one day want to make use of TC_ACT_STOLEN or TC_ACT_QUEUED,
      which both have the same semantics, we have the option to either use
      that as a default action (filter_res of 0xffffffff) or non-default BPF
      return code.
      
      All that will allow us to transparently use tcf_bpf() for both BPF
      flavours.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
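
The return-code mapping described in the commit above can be sketched in plain user-space C. This is an illustration only, not the kernel's tcf_bpf(): the helper name `map_bpf_result` is hypothetical, and the TC_ACT_* constants are redefined locally with values assumed to match the UAPI header `linux/pkt_cls.h`.

```c
#include <stdint.h>

/* Local copies of the TC_ACT_* opcodes; values assumed to match the
 * UAPI header linux/pkt_cls.h. */
enum {
	TC_ACT_UNSPEC     = -1,
	TC_ACT_OK         = 0,
	TC_ACT_RECLASSIFY = 1,
	TC_ACT_SHOT       = 2,
	TC_ACT_PIPE       = 3,
};

/* Hypothetical helper: translate a BPF program's return value into an
 * action opcode, falling back to the configured default on 0xffffffff. */
static int map_bpf_result(uint32_t filter_res, int default_action)
{
	if (filter_res == UINT32_MAX)	/* 0xffffffff (-1): use the default */
		return default_action;

	switch (filter_res) {
	case TC_ACT_OK:
	case TC_ACT_RECLASSIFY:
	case TC_ACT_SHOT:
	case TC_ACT_PIPE:
		return (int)filter_res;	/* known opcode: pass it through */
	default:
		return TC_ACT_UNSPEC;	/* unknown return code */
	}
}
```

With this shape, a program that returns 0xffffffff defers to whatever default action was configured, while any value outside the listed opcodes collapses to TC_ACT_UNSPEC.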
  2. 17 Mar, 2015 4 commits
    • Revert "smc91x: retrieve IRQ and trigger flags in a modern way" · 8d7d9cca
      Robert Jarzmik authored
      The commit breaks the legacy platforms, i.e. those not using device-tree
      and setting up the interrupt resources with a flag to activate edge
      detection. The issue was found on the zylonite platform.
      
      The reason is that zylonite uses platform resources to pass the interrupt number
      and the irq flags (here IORESOURCE_IRQ_HIGHEDGE). It expects the driver to
      request the irq with these flags, which in turn sets up the irq as high edge
      triggered.
      
      After the patch, this was supposed to be taken care of with:
        irq_resflags = irqd_get_trigger_type(irq_get_irq_data(ndev->irq));
      
      But irq_resflags is 0 for legacy platforms, while for example in
      arch/arm/mach-pxa/zylonite.c, in struct resource smc91x_resources[] the
      irq flag is specified. This breaks zylonite because the interrupt is not
      set up as edge triggered, and the hardware doesn't provide interrupts.
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: Clean up inet_csk_wait_for_connect() vs. might_sleep() · cb7cf8a3
      Eric Dumazet authored
      I got the following trace with the current net-next kernel:
      
      [14723.885290] WARNING: CPU: 26 PID: 22658 at kernel/sched/core.c:7285 __might_sleep+0x89/0xa0()
      [14723.885325] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810e8734>] prepare_to_wait_exclusive+0x34/0xa0
      [14723.885355] CPU: 26 PID: 22658 Comm: netserver Not tainted 4.0.0-dbg-DEV #1379
      [14723.885359]  ffffffff81a223a8 ffff881fae9e7ca8 ffffffff81650b5d 0000000000000001
      [14723.885364]  ffff881fae9e7cf8 ffff881fae9e7ce8 ffffffff810a72e7 0000000000000000
      [14723.885367]  ffffffff81a57620 000000000000093a 0000000000000000 ffff881fae9e7e64
      [14723.885371] Call Trace:
      [14723.885377]  [<ffffffff81650b5d>] dump_stack+0x4c/0x65
      [14723.885382]  [<ffffffff810a72e7>] warn_slowpath_common+0x97/0xe0
      [14723.885386]  [<ffffffff810a73e6>] warn_slowpath_fmt+0x46/0x50
      [14723.885390]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885393]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885396]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885399]  [<ffffffff810ccdc9>] __might_sleep+0x89/0xa0
      [14723.885403]  [<ffffffff81581846>] lock_sock_nested+0x36/0xb0
      [14723.885406]  [<ffffffff815829a3>] ? release_sock+0x173/0x1c0
      [14723.885411]  [<ffffffff815ea1f7>] inet_csk_accept+0x157/0x2a0
      [14723.885415]  [<ffffffff810e8900>] ? abort_exclusive_wait+0xc0/0xc0
      [14723.885419]  [<ffffffff8161b96d>] inet_accept+0x2d/0x150
      [14723.885424]  [<ffffffff8157db6f>] SYSC_accept4+0xff/0x210
      [14723.885428]  [<ffffffff8165a451>] ? retint_swapgs+0xe/0x44
      [14723.885431]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885437]  [<ffffffff81369c0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [14723.885441]  [<ffffffff8157ef40>] SyS_accept+0x10/0x20
      [14723.885444]  [<ffffffff81659872>] system_call_fastpath+0x12/0x17
      [14723.885447] ---[ end trace ff74cd83355b1873 ]---
      
      In commit 26cabd31, Peter added a sched_annotate_sleep() in sk_wait_event().
      
      Is the following patch needed as well?
      
      An alternative would be to use sk_wait_event() from inet_csk_wait_for_connect().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: fix error code when tunnel exists · 37355565
      Nicolas Dichtel authored
      After commit 2b0bb01b, the kernel returns -ENOBUFS when a user tries to add
      an existing tunnel with the ioctl API:
      $ ip -6 tunnel add ip6tnl1 mode ip6ip6 dev eth1
      add tunnel "ip6tnl0" failed: No buffer space available
      
      This is confusing; the right error is EEXIST.
      
      This patch also slightly changes the error codes returned:
       - ENOBUFS -> ENOMEM
       - ENOENT -> ENODEV
      
      Fixes: 2b0bb01b ("ip6_tunnel: Return an error when adding an existing tunnel.")
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Reported-by: Pierre Cheynier <me@pierre-cheynier.net>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevice.h: fix ndo_bridge_* comments · ad41faa8
      Nicolas Dichtel authored
      The argument 'flags' was missing in ndo_bridge_setlink().
      ndo_bridge_dellink() was missing.
      
      Fixes: 407af329 ("bridge: Add netlink interface to configure vlans on bridge ports")
      Fixes: add511b3 ("bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink")
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 16 Mar, 2015 10 commits
  4. 14 Mar, 2015 4 commits
  5. 13 Mar, 2015 4 commits
  6. 12 Mar, 2015 6 commits
    • virtio-net: correctly delete napi hash · ab3971b1
      Jason Wang authored
      We don't delete napi from hash list during module exit. This will
      cause the following panic when doing module load and unload:
      
      BUG: unable to handle kernel paging request at 0000004e00000075
      IP: [<ffffffff816bd01b>] napi_hash_add+0x6b/0xf0
      PGD 3c5d5067 PUD 0
      Oops: 0000 [#1] SMP
      ...
      Call Trace:
      [<ffffffffa0a5bfb7>] init_vqs+0x107/0x490 [virtio_net]
      [<ffffffffa0a5c9f2>] virtnet_probe+0x562/0x791815639d880be [virtio_net]
      [<ffffffff8139e667>] virtio_dev_probe+0x137/0x200
      [<ffffffff814c7f2a>] driver_probe_device+0x7a/0x250
      [<ffffffff814c81d3>] __driver_attach+0x93/0xa0
      [<ffffffff814c8140>] ? __device_attach+0x40/0x40
      [<ffffffff814c6053>] bus_for_each_dev+0x63/0xa0
      [<ffffffff814c7a79>] driver_attach+0x19/0x20
      [<ffffffff814c76f0>] bus_add_driver+0x170/0x220
      [<ffffffffa0a60000>] ? 0xffffffffa0a60000
      [<ffffffff814c894f>] driver_register+0x5f/0xf0
      [<ffffffff8139e41b>] register_virtio_driver+0x1b/0x30
      [<ffffffffa0a60010>] virtio_net_driver_init+0x10/0x12 [virtio_net]
      
      This patch fixes this by deleting the napi from the hash list in
      virtnet_free_queues(). It also stops deleting the napi in virtnet_freeze(),
      since that calls virtnet_free_queues(), which already does this.
      
      Fixes: 91815639 ("virtio-net: rx busy polling support")
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: avoid potential stack overflow · f862e07c
      Arnd Bergmann authored
      The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
      on the stack in order to pass a pair of addresses. This happens to just
      fit within the 1024 byte stack size warning limit on x86, but just
      exceeds that limit on ARM, which gives us this warning:
      
      net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      As the use of this large variable is basically bogus, we can rearrange
      the code to not do that. Instead of passing an rds socket into
      rds_iw_get_device, we now just pass the two addresses that we have
      available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
      to create two address structures on the stack there.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
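
The rearrangement described in the rds commit, passing two addresses instead of a whole socket object, can be illustrated with a small user-space sketch. Every name here (`big_sock`, `lookup_by_sock`, `caller_before`, `lookup_by_addrs`) is hypothetical and the structure layout is invented; none of it is the rds code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a large socket structure. */
struct big_sock {
	uint32_t bound_addr;
	uint32_t conn_addr;
	char     other_state[1024];	/* placeholder for unrelated fields */
};

/* Before: the callee takes the whole structure, so a caller that only
 * has two addresses must build a large temporary on its stack. */
static int lookup_by_sock(const struct big_sock *sk)
{
	return sk->bound_addr != 0 && sk->conn_addr != 0;
}

static int caller_before(uint32_t src, uint32_t dst)
{
	struct big_sock tmp;	/* over 1 KiB of stack to carry 8 bytes */

	memset(&tmp, 0, sizeof(tmp));
	tmp.bound_addr = src;
	tmp.conn_addr = dst;
	return lookup_by_sock(&tmp);
}

/* After: pass the two addresses directly; no large temporary needed. */
static int lookup_by_addrs(uint32_t src, uint32_t dst)
{
	return src != 0 && dst != 0;
}
```

Both variants compute the same result; only the stack frame of the caller changes.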
    • sock: fix possible NULL sk dereference in __skb_tstamp_tx · 3a8dd971
      Willem de Bruijn authored
      Test that sk != NULL before reading sk->sk_tsflags.
      
      Fixes: 49ca0d8b ("net-timestamp: no-payload option")
      Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
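
The ordering the fix above asks for, testing the pointer before reading a field through it, looks roughly like this. The structure, flag value, and function name are hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical flag and structure, for illustration only. */
#define OPT_TSONLY 0x1u

struct fake_sock {
	unsigned int sk_tsflags;
};

/* Returns 1 if timestamp-only mode is requested, 0 otherwise.
 * The NULL test comes first, so sk->sk_tsflags is never read through
 * a NULL pointer. */
static int wants_tsonly(const struct fake_sock *sk)
{
	if (!sk)
		return 0;
	return (sk->sk_tsflags & OPT_TSONLY) != 0;
}
```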
    • xps: must clear sender_cpu before forwarding · c29390c6
      Eric Dumazet authored
      John reported that my previous commit added a regression
      on his router.
      
      This is because sender_cpu & napi_id share a common location,
      so get_xps_queue() can see garbage and perform an out-of-bounds access.
      
      We need to make sure sender_cpu is cleared before doing the transmit,
      otherwise any NIC with busy polling enabled (skb_mark_napi_id()) can
      trigger this bug.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: John <jw@nuclearfallout.net>
      Bisected-by: John <jw@nuclearfallout.net>
      Fixes: 2bd82484 ("xps: fix xps for stacked devices")
      Signed-off-by: David S. Miller <davem@davemloft.net>
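
Why a shared location makes the stale value dangerous can be shown with a reduced model. The structure and helper names below are hypothetical; the only thing taken from the commit message is that sender_cpu and napi_id occupy the same storage.

```c
/* Hypothetical, reduced packet structure: the two fields overlap in
 * memory, as the commit message says sender_cpu and napi_id do. */
struct fake_skb {
	union {
		unsigned int napi_id;
		unsigned int sender_cpu;
	};
};

/* Receive path: a busy-poll capable NIC records its NAPI id. */
static void mark_napi_id(struct fake_skb *skb, unsigned int id)
{
	skb->napi_id = id;
}

/* Forwarding path: reset the shared slot before transmit, so the
 * transmit side does not mistake a NAPI id for a CPU number. */
static void sender_cpu_clear(struct fake_skb *skb)
{
	skb->sender_cpu = 0;
}
```

After `mark_napi_id()`, reading `sender_cpu` yields the NAPI id, which would be a meaningless index into any per-CPU table; calling `sender_cpu_clear()` before transmit removes that stale value.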
    • xen-netback: notify immediately after pushing Tx response. · c8a4d299
      David Vrabel authored
      This fixes a performance regression introduced by
      7fbb9d84 ("xen-netback: release pending index before pushing Tx
      responses").
      
      Moving the notify outside of the spin locks means it can be delayed a
      long time (if the dealloc thread is descheduled or there is an
      interrupt or softirq).
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Zoltan Kiss <zoltan.kiss@linaro.org>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sysctl_net_core: check SNDBUF and RCVBUF for min length · b1cb59cf
      Alexey Kodanev authored
      sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be
      set to incorrect values. Given that 'struct sk_buff' allocates from
      rcvbuf, an incorrectly set buffer length could result in memory
      allocation failures. For example, set them as follows:
      
          # sysctl net.core.rmem_default=64
            net.core.rmem_default = 64
          # sysctl net.core.wmem_default=64
            net.core.wmem_default = 64
          # ping localhost -s 1024 -i 0 > /dev/null
      
      This could result in the following failure:
      
      skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
      head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
      kernel BUG at net/core/skbuff.c:102!
      invalid opcode: 0000 [#1] SMP
      ...
      task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
      RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
      RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
      RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
      RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
      FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
      Stack:
       ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
       ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
       ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
      Call Trace:
       [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
       [<ffffffff81556f56>] sock_write_iter+0x146/0x160
       [<ffffffff811d9612>] new_sync_write+0x92/0xd0
       [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
       [<ffffffff811da499>] SyS_write+0x59/0xd0
       [<ffffffff81651532>] system_call_fastpath+0x12/0x17
      Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
            00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
            eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
      RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP <ffff88003ae8bc68>
      Kernel panic - not syncing: Fatal exception
      
      Moreover, the possible minimum is 1, so we can get another kernel panic:
      ...
      BUG: unable to handle kernel paging request at ffff88013caee5c0
      IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
      ...
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
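
The kind of lower-bound check the commit title refers to can be sketched as follows. This is a user-space illustration under stated assumptions: `set_buf_default` is a hypothetical setter, and `EXAMPLE_MIN_BUF` is a placeholder value, since the commit message above does not state the kernel's actual minimum.

```c
#include <errno.h>

/* Placeholder lower bound, chosen only for illustration; the kernel's
 * real minimum is not stated in the commit message above. */
#define EXAMPLE_MIN_BUF 2048

/* Hypothetical setter: reject values below the minimum instead of
 * storing them, leaving the current setting unchanged. */
static int set_buf_default(int *setting, int new_value)
{
	if (new_value < EXAMPLE_MIN_BUF)
		return -EINVAL;
	*setting = new_value;
	return 0;
}
```

Under this scheme, a write of 64 or 1 is refused up front, so the undersized values used in the reproduction steps never reach the allocation path.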
  7. 11 Mar, 2015 4 commits
  8. 10 Mar, 2015 7 commits