1. 07 Sep, 2014 1 commit
  2. 03 Sep, 2014 4 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink: deliver netlink errors on batch completion · cbb8125e
      Pablo Neira Ayuso authored
      We have to wait until the full batch has been processed to deliver the
      netlink error messages to userspace. Otherwise, we may deliver
      duplicated errors to userspace in case that we need to abort and replay
      the transaction if any of the required modules needs to be autoloaded.
      
      A simple way to reproduce this (assumming nft_meta is not loaded) with
      the following test file:
      
       add table filter
       add chain filter test
       add chain bad test                 # intentional wrong unexistent table
       add rule filter test meta mark 0
      
      Then, when trying to load the batch:
      
       # nft -f test
       test:4:1-19: Error: Could not process rule: No such file or directory
       add chain bad test
       ^^^^^^^^^^^^^^^^^^^
       test:4:1-19: Error: Could not process rule: No such file or directory
       add chain bad test
       ^^^^^^^^^^^^^^^^^^^
      
      The error is reported twice, once when the batch is aborted due to
      missing nft_meta and another when it is fully processed.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      cbb8125e
    • Pablo Neira Ayuso's avatar
      rhashtable: fix lockdep splat in rhashtable_destroy() · ae82ddcf
      Pablo Neira Ayuso authored
      No need for rht_dereference() from rhashtable_destroy() since the
      existing callers don't hold the mutex when invoking this function
      from:
      
      1) Netlink, this is called in case of memory allocation errors in the
         initialization path, no nl_sk_hash_lock is held.
      2) Netfilter, this is called from the rcu callback, no nfnl_lock is
         held either.
      
      I think it's reasonable to assume that the caller has to make sure
      that no hash resizing may happen before releasing the bucket array.
      Therefore, the caller should be responsible for releasing this in a
      safe way, document this to make people aware of it.
      
      This resolves a rcu lockdep splat in nft_hash:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.16.0+ #178 Not tainted
      -------------------------------
      lib/rhashtable.c:596 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 1
      1 lock held by ksoftirqd/2/18:
       #0:  (rcu_callback){......}, at: [<ffffffff810918fd>] rcu_process_callbacks+0x27e/0x4c7
      
      stack backtrace:
      CPU: 2 PID: 18 Comm: ksoftirqd/2 Not tainted 3.16.0+ #178
      Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
       0000000000000001 ffff88011706bb68 ffffffff8143debc 0000000000000000
       ffff880117062610 ffff88011706bb98 ffffffff81077515 ffff8800ca041a50
       0000000000000004 ffff8800ca386480 ffff8800ca041a00 ffff88011706bbb8
      Call Trace:
       [<ffffffff8143debc>] dump_stack+0x4e/0x68
       [<ffffffff81077515>] lockdep_rcu_suspicious+0xfa/0x103
       [<ffffffff81228b1b>] rhashtable_destroy+0x46/0x52
       [<ffffffffa06f21a7>] nft_hash_destroy+0x73/0x82 [nft_hash]
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      ae82ddcf
    • Pablo Neira Ayuso's avatar
      netfilter: nft_rbtree: no need for spinlock from set destroy path · d99407f4
      Pablo Neira Ayuso authored
      The sets are released from the rcu callback, after the rule is removed
      from the chain list, which implies that nfnetlink cannot update the
      rbtree and no packets are walking on the set anymore. Thus, we can get
      rid of the spinlock in the set destroy path there.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Reviewied-by: default avatarThomas Graf <tgraf@suug.ch>
      d99407f4
    • Pablo Neira Ayuso's avatar
      netfilter: nft_hash: no need for rcu in the hash set destroy path · 39f39016
      Pablo Neira Ayuso authored
      The sets are released from the rcu callback, after the rule is removed
      from the chain list, which implies that nfnetlink cannot update the
      hashes (thus, no resizing may occur) and no packets are walking on the
      set anymore.
      
      This resolves a lockdep splat in the nft_hash_destroy() path since the
      nfnl mutex is not held there.
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.16.0-rc2+ #168 Not tainted
      -------------------------------
      net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 1
      1 lock held by ksoftirqd/0/3:
       #0:  (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7
      
      stack backtrace:
      CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168
      Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
       0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006
       ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400
       ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08
      Call Trace:
       [<ffffffff8142c922>] dump_stack+0x4e/0x68
       [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103
       [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash]
       [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables]
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      39f39016
  3. 02 Sep, 2014 26 commits
  4. 01 Sep, 2014 6 commits
  5. 30 Aug, 2014 3 commits
    • Daniel Borkmann's avatar
      net: sctp: fix ABI mismatch through sctp_assoc_to_state helper · 38ab1fa9
      Daniel Borkmann authored
      Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
      tree, the official <netinet/sctp.h> header carries a copy of enum
      sctp_sstat_state that looks like (compared to the current in-kernel
      enumeration):
      
        User definition:                     Kernel definition:
      
        enum sctp_sstat_state {              typedef enum {
          SCTP_EMPTY             = 0,          <removed>
          SCTP_CLOSED            = 1,          SCTP_STATE_CLOSED            = 0,
          SCTP_COOKIE_WAIT       = 2,          SCTP_STATE_COOKIE_WAIT       = 1,
          SCTP_COOKIE_ECHOED     = 3,          SCTP_STATE_COOKIE_ECHOED     = 2,
          SCTP_ESTABLISHED       = 4,          SCTP_STATE_ESTABLISHED       = 3,
          SCTP_SHUTDOWN_PENDING  = 5,          SCTP_STATE_SHUTDOWN_PENDING  = 4,
          SCTP_SHUTDOWN_SENT     = 6,          SCTP_STATE_SHUTDOWN_SENT     = 5,
          SCTP_SHUTDOWN_RECEIVED = 7,          SCTP_STATE_SHUTDOWN_RECEIVED = 6,
          SCTP_SHUTDOWN_ACK_SENT = 8,          SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
        };                                   } sctp_state_t;
      
      This header was later on also placed into the uapi, so that user space
      programs can compile without having <netinet/sctp.h>, but the shipped
      with <linux/sctp.h> instead.
      
      While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that
      sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
      nevertheless have a what it appears to be dummy SCTP_EMPTY state from
      the very early days.
      
      While it seems to do just nothing, commit 0b8f9e25 ("sctp: remove
      completely unsed EMPTY state") did the right thing and removed this dead
      code. That however, causes an off-by-one when the user asks the SCTP
      stack via SCTP_STATUS API and checks for the current socket state thus
      yielding possibly undefined behaviour in applications as they expect
      the kernel to tell the right thing.
      
      The enumeration had to be changed however as based on the current socket
      state, we access a function pointer lookup-table through this. Therefore,
      I think the best way to deal with this is just to add a helper function
      sctp_assoc_to_state() to encapsulate the off-by-one quirk.
      Reported-by: default avatarTristan Su <sooqing@gmail.com>
      Fixes: 0b8f9e25 ("sctp: remove completely unsed EMPTY state")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      38ab1fa9
    • Eric Dumazet's avatar
      net: attempt a single high order allocation · d9b2938a
      Eric Dumazet authored
      In commit ed98df33 ("net: use __GFP_NORETRY for high order
      allocations") we tried to address one issue caused by order-3
      allocations.
      
      We still observe high latencies and system overhead in situations where
      compaction is not successful.
      
      Instead of trying order-3, order-2, and order-1, do a single order-3
      best effort and immediately fallback to plain order-0.
      
      This mimics slub strategy to fallback to slab min order if the high
      order allocation used for performance failed.
      
      Order-3 allocations give a performance boost only if they can be done
      without recurring and expensive memory scan.
      
      Quoting David :
      
      The page allocator relies on synchronous (sync light) memory compaction
      after direct reclaim for allocations that don't retry and deferred
      compaction doesn't work with this strategy because the allocation order
      is always decreasing from the previous failed attempt.
      
      This means sync light compaction will always be encountered if memory
      cannot be defragmented or reclaimed several times during the
      skb_page_frag_refill() iteration.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9b2938a
    • David S. Miller's avatar
      Merge branch 'mlx4-net' · bcc73547
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      Setup mlx4 user space Ethernet QPs to properly handle VXLAN
      
      This short series fixes the mlx4 driver setting of user space Ethernet QPs
      (e.g those opened by DPDK applications) such that they will properly handle
      VXLAN traffic/offloads
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bcc73547