1. 17 Mar, 2014 11 commits
    • netpoll: Move netpoll_trap under CONFIG_NETPOLL_TRAP · ad8d4752
      Eric W. Biederman authored
      Now that we no longer need to receive packets to safely drain the
      network driver's receive queue, move netpoll_trap and netpoll_set_trap
      under CONFIG_NETPOLL_TRAP.

      When CONFIG_NETPOLL_TRAP is not set, netpoll_trap and netpoll_set_trap
      become no-op inline functions.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
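
      A minimal C sketch of the stub pattern described in this commit,
      assuming int/void signatures for netpoll_trap and netpoll_set_trap
      (an illustration, not a copy of the kernel header). With
      CONFIG_NETPOLL_TRAP undefined, callers still compile and the calls
      fold away:

      ```c
      #include <assert.h>

      /* Illustrative only: the signatures below are assumptions. */
      #ifdef CONFIG_NETPOLL_TRAP
      int netpoll_trap(void);            /* real versions live in netpoll code */
      void netpoll_set_trap(int trap);
      #else
      /* Option disabled: trapping can never be active. */
      static inline int netpoll_trap(void)
      {
          return 0;
      }

      /* Option disabled: requests to enable trapping are ignored. */
      static inline void netpoll_set_trap(int trap)
      {
          (void)trap;
      }
      #endif
      ```

      Keeping the stubs static inline lets call sites stay free of #ifdef
      blocks while costing nothing when the option is off.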
    • netpoll: Don't drop all received packets. · b6bacd55
      Eric W. Biederman authored
      Change the strategy of netpoll from dropping all packets received
      during netpoll_poll_dev to calling napi poll with a budget of 0
      (to avoid processing the driver's rx queue), and to ignore packets
      received with netif_rx (those will safely be placed on the backlog queue).

      All of the netpoll-supporting drivers have been reviewed to ensure
      either that they use netif_rx or that their napi poll routine supports
      a budget of 0 and will not process the driver's rx queues with it.

      Not dropping packets makes NETPOLL_RX_DROP unnecessary, so it is removed.

      npinfo->rx_flags is removed, as rx_flags with just the NETPOLL_RX_ENABLED
      flag becomes a redundant mirror of list_empty(&npinfo->rx_np).
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
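
      A userspace model of the budget-0 contract, assuming a hypothetical
      fake_ring that only counts pending rx packets and tx completions. It
      sketches the driver behaviour the review looked for: a budget of 0
      leaves the rx queue untouched. Cleaning tx without charging it to the
      budget is an assumption of this sketch.

      ```c
      #include <assert.h>

      /* Hypothetical driver state: pending rx packets and tx completions. */
      struct fake_ring {
          int rx_pending;
          int tx_pending;
      };

      /*
       * Sketch of a napi poll routine that honours budget == 0.
       * tx completions are always reaped (not charged to the budget), but at
       * most `budget` rx packets are processed. Returns rx packets processed.
       */
      static int fake_napi_poll(struct fake_ring *ring, int budget)
      {
          int work = 0;

          ring->tx_pending = 0;              /* clean the whole tx ring */

          while (work < budget && ring->rx_pending > 0) {
              ring->rx_pending--;            /* "deliver" one rx packet */
              work++;
          }
          return work;
      }
      ```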
    • netpoll: Add netpoll_rx_processing · ff607631
      Eric W. Biederman authored
      Add a helper netpoll_rx_processing that reports when netpoll has
      receive side processing to perform.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
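
      One way such a helper could look, assuming it simply reports whether
      any netpoll instance is registered on npinfo->rx_np (the list whose
      emptiness the rx_flags change above relies on). The trimmed
      netpoll_info and the tiny list implementation are hypothetical
      stand-ins for the kernel types:

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Minimal circular doubly linked list in the style of <linux/list.h>. */
      struct list_head {
          struct list_head *next, *prev;
      };

      static inline void init_list_head(struct list_head *h) { h->next = h->prev = h; }
      static inline bool list_empty(const struct list_head *h) { return h->next == h; }

      static inline void list_add(struct list_head *n, struct list_head *h)
      {
          n->next = h->next;
          n->prev = h;
          h->next->prev = n;
          h->next = n;
      }

      /* Hypothetical, trimmed-down netpoll_info: only rx_np matters here. */
      struct netpoll_info {
          struct list_head rx_np;    /* netpoll instances that want rx */
      };

      /* Sketch: rx work exists exactly when some netpoll registered for rx. */
      static inline bool netpoll_rx_processing(const struct netpoll_info *npinfo)
      {
          return npinfo != NULL && !list_empty(&npinfo->rx_np);
      }
      ```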
    • netpoll: Warn if more packets are processed than are budgeted · e97dc3fc
      Eric W. Biederman authored
      There is already a warning for this case in the normal napi poll path,
      but put a copy here in case the way netpoll calls the poll functions
      causes a different result.

      netpoll will shortly call the napi poll routine with a budget of 0 to
      avoid processing any rx packets.  As nothing does that today, we may
      encounter drivers that have problems, so a netpoll-specific warning
      seems desirable.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
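
      A sketch of the check, with hypothetical names throughout: call a poll
      handler, then warn once if it reports more work than its budget. The
      kernel would use a WARN_ONCE-style macro; here a caller-owned flag
      plays that role so the behaviour is easy to observe.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      typedef int (*poll_fn)(void *ctx, int budget);

      /*
       * Call `poll` and warn (once per flag) if it claims more work than it
       * was allowed. Returns the work reported by the handler.
       */
      static int checked_poll(poll_fn poll, void *ctx, int budget, bool *warned)
      {
          int work = poll(ctx, budget);

          if (work > budget && !*warned) {
              fprintf(stderr, "poll handler did %d work with budget %d\n",
                      work, budget);
              *warned = true;        /* report only the first offence */
          }
          return work;
      }

      /* A deliberately buggy handler that ignores its budget. */
      static int greedy_poll(void *ctx, int budget)
      {
          (void)budget;
          return *(int *)ctx;        /* claims every pending packet as done */
      }
      ```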
    • netpoll: Visit all napi handlers in poll_napi · eb8143b4
      Eric W. Biederman authored
      In poll_napi, loop through all of the napi handlers even when the
      budget falls to 0, to ensure that we process all of the tx_queues and
      that we continue to call into drivers when our initial budget is 0.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
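
      A userspace sketch of the loop shape, assuming a hypothetical
      fake_napi that records how often it was polled. The point is the
      missing early exit: every handler is called even once the remaining
      budget is 0, so each still gets a chance to clean its tx queue.

      ```c
      #include <assert.h>

      #define NHANDLERS 3

      /* Hypothetical per-napi state. */
      struct fake_napi {
          int calls;       /* times this handler was polled */
          int rx_ready;    /* rx packets the handler could deliver */
      };

      /* Poll one handler; rx work is capped by the remaining budget. */
      static int poll_one(struct fake_napi *n, int budget)
      {
          int work = n->rx_ready < budget ? n->rx_ready : budget;

          n->calls++;      /* tx cleanup would happen here regardless */
          n->rx_ready -= work;
          return work;
      }

      /* Visit every handler, without breaking out when budget hits 0. */
      static int poll_all(struct fake_napi *napis, int count, int budget)
      {
          for (int i = 0; i < count; i++)
              budget -= poll_one(&napis[i], budget);
          return budget;   /* budget left over */
      }
      ```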
    • netpoll: Pass budget into poll_napi · 9852fbec
      Eric W. Biederman authored
      This moves the control logic to the top level in netpoll_poll_dev
      instead of having it dispersed throughout netpoll_poll_dev,
      poll_napi and poll_one_napi.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netpoll: move setting of NETPOLL_RX_DROP into netpoll_poll_dev · b249b51b
      Eric W. Biederman authored
      Today netpoll depends on setting NETPOLL_RX_DROP before networking
      drivers receive packets in interrupt context, so that the packets can
      be dropped.  Move this setting into netpoll_poll_dev from
      poll_one_napi so that, if ndo_poll_controller happens to receive
      packets, we drop them on the floor instead of letting them bounce
      through the networking stack and potentially cause problems.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
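
      A sketch of why the flag placement matters, using hypothetical
      fake_dev and fake_* functions with a plain bool standing in for
      NETPOLL_RX_DROP. Because the flag is raised in the top-level poll
      before the controller runs, a packet received inside the controller
      is dropped rather than delivered:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical device state for the sketch. */
      struct fake_dev {
          bool rx_drop;     /* stands in for NETPOLL_RX_DROP */
          int delivered;    /* packets passed up the stack */
          int dropped;      /* packets discarded while polling */
      };

      /* Receive path: while the drop flag is set, packets are discarded. */
      static void fake_receive(struct fake_dev *dev)
      {
          if (dev->rx_drop)
              dev->dropped++;
          else
              dev->delivered++;
      }

      /* Hypothetical ndo_poll_controller that happens to receive a packet. */
      static void fake_poll_controller(struct fake_dev *dev)
      {
          fake_receive(dev);
      }

      /* Top-level poll: set the flag before anything can receive. */
      static void fake_netpoll_poll_dev(struct fake_dev *dev)
      {
          dev->rx_drop = true;
          fake_poll_controller(dev);
          /* ...napi handlers would be polled here as well... */
          dev->rx_drop = false;
      }
      ```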
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · e86e180b
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for net-next,
      most relevantly they are:
      
      * Cleanup to remove a double semicolon, from Stephen Hemminger.

      * Silence a sparse warning in xt_ipcomp, from Fan Du.

      * nf_ct_labels support for nf_tables, from Florian Westphal.

      * New macros to simplify rcu dereferences in the scope of nfnetlink
        and nf_tables, from Patrick McHardy.

      * Accept parameters for the queue and drop verdicts (including the
        reason for a drop) in nf_tables verdict parsing, also from Patrick.

      * Remove unused random seed initialization in nfnetlink_log, from
        Florian Westphal.

      * Allow attaching user-specific information to nf_tables rules, useful
        to attach user comments to rules, from me.
      
      * Return errors in ipset according to the manpage documentation, from
        Jozsef Kadlecsik.
      
      * Fix coccinelle warnings related to incorrect bool type usage for ipset,
        from Fengguang Wu.
      
      * Add hash:ip,mark set type to ipset, from Vytas Dauksa.
      
      * Fix the message that ipset printed for each netns that is created,
        from Ilia Mirkin.
      
      * Add forceadd option to ipset, which evicts a random entry from the set
        if it becomes full, from Josh Hunt.
      
      * Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu.
      
      * Improve conntrack scalability by removing a central spinlock, original
        work from Eric Dumazet. Jesper Dangaard Brouer took them over to address
        the remaining issues. Several preparatory patches come first.

      * Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization
        on element removal, etc.), from Patrick McHardy.

      * Restore context in the rule deletion path, as we now release rule objects
        synchronously, from Patrick McHardy. This restores event notification for
        anonymous sets.
      
      * Fix NAT family validation in nft_nat, also from Patrick.
      
      * Improve scalability of xt_connlimit by using an array of spinlocks and
        by introducing a hashtable of rb-trees for faster lookup of accounted
        objects per network. This patch was preceded by several patches and
        refactorings to accommodate this change, including the use of kmem_cache,
        from Florian Westphal.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: connlimit: use rbtree for per-host conntrack obj storage · 7d084877
      Florian Westphal authored
      With the current match design, every invocation of the connlimit_match
      function means we have to perform (number_of_conntracks % 256) lookups
      in the conntrack table [ to perform GC/delete stale entries ].
      This is also the reason why ____nf_conntrack_find() in perf top has
      more than 20% cpu time per core.
      
      This patch changes the storage to an rbtree, which cuts down the number
      of ct objects that need testing.

      When looking up a new tuple, we only test the connections of the host
      objects we visit while searching for the wanted host/network (or
      the leaf we need to insert at).

      The slot count is reduced to 32.  Increasing the slot count doesn't
      speed things up much, because each slot's rbtree already keeps lookups
      logarithmic in the number of hosts it holds.
      
      before patch (50kpps rx, 10kpps tx):
      +  20.95%  ksoftirqd/0  [nf_conntrack] [k] ____nf_conntrack_find
      +  20.50%  ksoftirqd/1  [nf_conntrack] [k] ____nf_conntrack_find
      +  20.27%  ksoftirqd/2  [nf_conntrack] [k] ____nf_conntrack_find
      +   5.76%  ksoftirqd/1  [nf_conntrack] [k] hash_conntrack_raw
      +   5.39%  ksoftirqd/2  [nf_conntrack] [k] hash_conntrack_raw
      +   5.35%  ksoftirqd/0  [nf_conntrack] [k] hash_conntrack_raw
      
      after patch (90kpps rx, 51kpps tx):
      +  17.24%       swapper  [nf_conntrack]    [k] ____nf_conntrack_find
      +   6.60%   ksoftirqd/2  [nf_conntrack]    [k] ____nf_conntrack_find
      +   2.73%       swapper  [nf_conntrack]    [k] hash_conntrack_raw
      +   2.36%       swapper  [xt_connlimit]    [k] count_tree
      
      Obvious disadvantages compared to the previous version are the increased
      code complexity and the increased memory cost.

      Partially based on Eric Dumazet's fq scheduler.
      Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
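
      A simplified sketch of the storage layout described above: 32 slots,
      each holding a search tree of hosts keyed by masked source address,
      so a lookup only touches nodes on its search path. To stay short it
      swaps in a plain unbalanced binary search tree for the kernel's
      rbtree, keeps a bare counter per host instead of a list of conntrack
      tuples, and uses a hypothetical multiplicative hash in slot_of.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdlib.h>

      #define CONNLIMIT_SLOTS 32    /* slot count from the commit message */

      /* One node per source host/network. */
      struct host_node {
          uint32_t net;             /* source address after masking */
          unsigned int count;       /* connections accounted to it */
          struct host_node *left, *right;
      };

      /* Stand-in hash: multiplicative hashing, reduced to a slot index. */
      static unsigned int slot_of(uint32_t net)
      {
          return (net * 2654435761u) % CONNLIMIT_SLOTS;
      }

      /*
       * Find `net` in one slot's tree, inserting it if absent, and bump its
       * count. Returns the new count, or 0 if allocation fails.
       */
      static unsigned int count_and_add(struct host_node **root, uint32_t net)
      {
          while (*root) {
              if (net < (*root)->net)
                  root = &(*root)->left;
              else if (net > (*root)->net)
                  root = &(*root)->right;
              else
                  return ++(*root)->count;
          }
          *root = calloc(1, sizeof(**root));
          if (!*root)
              return 0;
          (*root)->net = net;
          return (*root)->count = 1;
      }
      ```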
    • netfilter: connlimit: make same_source_net signed · 50e0e9b1
      Florian Westphal authored
      same_source_net currently returns 1 if the masked source addresses are
      the same.  Make it work like memcmp/strcmp so it can be used as an rbtree
      search function.
      Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
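
      A sketch of a memcmp-style comparison for IPv4 source networks. The
      name same_source_net_cmp and the host-byte-order uint32_t arguments
      are assumptions for brevity, not the kernel signature. Returning a
      sign instead of a boolean is what lets a tree search decide whether
      to go left or right:

      ```c
      #include <assert.h>
      #include <stdint.h>

      /*
       * Compare two addresses under a netmask: negative if a's network sorts
       * before b's, 0 if both are in the same network, positive otherwise.
       */
      static int same_source_net_cmp(uint32_t a, uint32_t b, uint32_t mask)
      {
          uint32_t na = a & mask, nb = b & mask;

          if (na < nb)
              return -1;
          return na > nb;   /* 1 if greater, 0 if equal */
      }
      ```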
    • netfilter: connlimit: use keyed locks · 1442e750
      Florian Westphal authored
      connlimit currently suffers from spinlock contention; for example, on a
      4-core system with rps enabled:
      
      +  20.84%   ksoftirqd/2  [kernel.kallsyms] [k] _raw_spin_lock_bh
      +  20.76%   ksoftirqd/1  [kernel.kallsyms] [k] _raw_spin_lock_bh
      +  20.42%   ksoftirqd/0  [kernel.kallsyms] [k] _raw_spin_lock_bh
      +   6.07%   ksoftirqd/2  [nf_conntrack]    [k] ____nf_conntrack_find
      +   6.07%   ksoftirqd/1  [nf_conntrack]    [k] ____nf_conntrack_find
      +   5.97%   ksoftirqd/0  [nf_conntrack]    [k] ____nf_conntrack_find
      +   2.47%   ksoftirqd/2  [nf_conntrack]    [k] hash_conntrack_raw
      +   2.45%   ksoftirqd/0  [nf_conntrack]    [k] hash_conntrack_raw
      +   2.44%   ksoftirqd/1  [nf_conntrack]    [k] hash_conntrack_raw
      
      Keyed locks allow parallel lookup/insert/delete when entries hash to
      different slots.  With patch:
      
      +  20.95%  ksoftirqd/0  [nf_conntrack] [k] ____nf_conntrack_find
      +  20.50%  ksoftirqd/1  [nf_conntrack] [k] ____nf_conntrack_find
      +  20.27%  ksoftirqd/2  [nf_conntrack] [k] ____nf_conntrack_find
      +   5.76%  ksoftirqd/1  [nf_conntrack] [k] hash_conntrack_raw
      +   5.39%  ksoftirqd/2  [nf_conntrack] [k] hash_conntrack_raw
      +   5.35%  ksoftirqd/0  [nf_conntrack] [k] hash_conntrack_raw
      +   2.00%  ksoftirqd/1  [kernel.kallsyms] [k] __rcu_read_unlock
      
      This improves the rx processing rate from ~35 kpps to ~50 kpps.
      Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
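
      A userspace sketch of keyed locking: an array of locks indexed by a
      hash of the key, so only callers whose keys land in the same slot
      contend. pthread mutexes stand in for the kernel spinlocks, and the
      lock count, slot_for hash and per-slot counter are hypothetical.

      ```c
      #include <assert.h>
      #include <pthread.h>
      #include <stdint.h>

      #define NLOCKS 32    /* hypothetical: one lock per hash slot */

      static pthread_mutex_t slot_locks[NLOCKS];
      static unsigned int slot_counts[NLOCKS];   /* state each lock guards */

      static void locks_init(void)
      {
          for (int i = 0; i < NLOCKS; i++)
              pthread_mutex_init(&slot_locks[i], NULL);
      }

      /* The same key always maps to the same slot, hence the same lock. */
      static unsigned int slot_for(uint32_t key)
      {
          return (key * 2654435761u) % NLOCKS;
      }

      /* Update one slot's state while holding only that slot's lock. */
      static unsigned int slot_inc(uint32_t key)
      {
          unsigned int slot = slot_for(key), n;

          pthread_mutex_lock(&slot_locks[slot]);
          n = ++slot_counts[slot];
          pthread_mutex_unlock(&slot_locks[slot]);
          return n;
      }
      ```

      Compared with one global lock, two threads working on keys in
      different slots never wait for each other here.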
  2. 15 Mar, 2014 29 commits