1. 03 Feb, 2010 5 commits
    • netfilter: ctnetlink: support selective event delivery · 0cebe4b4
      Patrick McHardy authored
      Add two masks for conntrack and expectation events to struct nf_conntrack_ecache
      and use them to filter events. Their default value is "all events" when the
      event sysctl is on and "no events" when it is off. A following patch will add
      specific initializations. Expectation events depend on the ecache struct of
      their master conntrack.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
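The mask-based filtering described in this commit message can be pictured with a small user-space sketch. It is not the kernel code: the `sketch_` names, the event numbering, and the 32-bit mask width are assumptions made for illustration. It only shows the idea of one mask per event family, a default that follows the sysctl, and expectation events consulting the master connection's cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event numbers, for illustration only. */
enum sketch_event {
    SKETCH_EV_NEW = 0,
    SKETCH_EV_DESTROY = 1,
    SKETCH_EV_REPLY = 2,
    SKETCH_EV_ASSURED = 3,
};

/* Per-connection event cache carrying one mask per event family. */
struct sketch_ecache {
    unsigned int ctmask;   /* conntrack events to deliver */
    unsigned int expmask;  /* expectation events to deliver */
};

/* Default masks: "all events" when the sysctl is on, "no events" when off. */
static void sketch_ecache_init(struct sketch_ecache *e, bool events_sysctl)
{
    e->ctmask = events_sysctl ? ~0u : 0u;
    e->expmask = events_sysctl ? ~0u : 0u;
}

/* Deliver a conntrack event only if its bit is set in the mask. */
static bool sketch_deliver_ct(const struct sketch_ecache *e, enum sketch_event ev)
{
    assert((unsigned int)ev < 32); /* stay within the assumed mask width */
    return (e->ctmask & (1u << ev)) != 0;
}

/* Expectation events are filtered by the ecache of the master conntrack. */
static bool sketch_deliver_exp(const struct sketch_ecache *master, enum sketch_event ev)
{
    assert((unsigned int)ev < 32);
    return (master->expmask & (1u << ev)) != 0;
}
```

A listener interested only in a subset of events would narrow `ctmask` after initialization, for example to `1u << SKETCH_EV_DESTROY`.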
    • netfilter: nf_conntrack: split up IPCT_STATUS event · 858b3133
      Patrick McHardy authored
      Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated
      when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is
      generated when the IPS_ASSURED bit is set.
      
      In combination with a following patch to support selective event delivery,
      this can be used for "sparse" conntrack replication: replication of a
      conntrack entry starts only after it has reached the ASSURED state, which
      makes it SYN-flood resistant.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
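To make the split concrete, here is a minimal user-space sketch, assuming simplified stand-ins for the status bits and events named in the commit message. The `SK_` constants and both functions are hypothetical and do not mirror the kernel's definitions. It shows one event per newly set status bit, and a "sparse" subscriber that listens for the assured event only.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_SEEN_REPLY (1u << 0)  /* stand-in for IPS_SEEN_REPLY */
#define SK_ASSURED    (1u << 1)  /* stand-in for IPS_ASSURED */

#define SK_EV_REPLY   (1u << 0)  /* stand-in for IPCT_REPLY */
#define SK_EV_ASSURED (1u << 1)  /* stand-in for IPCT_ASSURED */

/* Return the events generated by a status change from old to new. */
static unsigned int sk_status_events(unsigned int old_status, unsigned int new_status)
{
    unsigned int newly_set = new_status & ~old_status;
    unsigned int events = 0;

    /* In this sketch status bits are only ever set, never cleared. */
    assert((old_status & ~new_status) == 0);

    if (newly_set & SK_SEEN_REPLY)
        events |= SK_EV_REPLY;
    if (newly_set & SK_ASSURED)
        events |= SK_EV_ASSURED;
    return events;
}

/* A subscriber replicates an entry only for events it has subscribed to. */
static bool sk_should_replicate(unsigned int events, unsigned int subscribed)
{
    return (events & subscribed) != 0;
}
```

With a subscription of `SK_EV_ASSURED` alone, entries that never get past the initial handshake produce nothing to replicate, which is the SYN-flood resistance the message refers to.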
    • add67461
    • netfilter: ctnetlink: only assign helpers for matching protocols · 794e6871
      Patrick McHardy authored
      Make sure not to assign a helper for a different network or transport
      layer protocol to a connection.
      
      Additionally change expectation deletion by helper to compare the name
      directly. There might be multiple helper registrations using the same
      name; currently one of them is chosen in an unpredictable manner and
      only its expectations are removed.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: xt_hashlimit: fix race condition and simplify locking · 2eff25c1
      Patrick McHardy authored
      As noticed by Shin Hong <hongshin@gmail.com>, there is a race between
      htable_find_get() and htable_put():
      
      htable_put():				htable_find_get():
      
      					spin_lock_bh(&hashlimit_lock);
      					<search entry>
      atomic_dec_and_test(&hinfo->use)
      					atomic_inc(&hinfo->use)
      					spin_unlock_bh(&hashlimit_lock)
      					return hinfo;
      spin_lock_bh(&hashlimit_lock);
      hlist_del(&hinfo->node);
      spin_unlock_bh(&hashlimit_lock);
      htable_destroy(hinfo);
      
      The entire locking concept is overly complicated: tables are only
      created/referenced and released in process context, so a single
      mutex works just fine. Remove the hashlimit_lock spinlock and the atomic
      reference count and use the mutex to protect table lookups/creation
      and reference count changes.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
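The single-mutex scheme can be sketched in user space with pthreads. This is an illustration under assumptions, not the xt_hashlimit code: `sk_table`, `sk_mutex`, `sk_table_get()` and `sk_table_put()` are hypothetical names, and a pthread mutex stands in for the kernel mutex. The point is that lookup, reference increment, reference decrement, and unlinking all happen under the same lock, so the window shown in the race diagram cannot open.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct sk_table {
    struct sk_table *next;
    unsigned int use;   /* plain counter, protected by sk_mutex */
    char name[32];      /* names are assumed to be shorter than 32 bytes */
};

static pthread_mutex_t sk_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct sk_table *sk_tables;

/* Look up a table by name, creating it if needed, and take a reference. */
static struct sk_table *sk_table_get(const char *name)
{
    struct sk_table *t;

    pthread_mutex_lock(&sk_mutex);
    for (t = sk_tables; t != NULL; t = t->next)
        if (strcmp(t->name, name) == 0)
            break;
    if (t == NULL) {
        t = calloc(1, sizeof(*t));
        if (t != NULL) {
            /* calloc zeroed the buffer, so the copy stays terminated. */
            strncpy(t->name, name, sizeof(t->name) - 1);
            t->next = sk_tables;
            sk_tables = t;
        }
    }
    if (t != NULL)
        t->use++;
    pthread_mutex_unlock(&sk_mutex);
    return t;
}

/* Drop a reference; on the last put, unlink and free under the same mutex. */
static void sk_table_put(struct sk_table *t)
{
    struct sk_table **pp;

    pthread_mutex_lock(&sk_mutex);
    assert(t->use > 0);
    if (--t->use == 0) {
        for (pp = &sk_tables; *pp != NULL; pp = &(*pp)->next) {
            if (*pp == t) {
                *pp = t->next;
                break;
            }
        }
        free(t);
    }
    pthread_mutex_unlock(&sk_mutex);
}
```

Because the decrement and the unlink are no longer separate critical sections, a concurrent lookup either finds the table with a nonzero count or does not find it at all.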
  2. 02 Feb, 2010 2 commits
  3. 22 Jan, 2010 1 commit
  4. 20 Jan, 2010 2 commits
  5. 18 Jan, 2010 8 commits
  6. 13 Jan, 2010 2 commits
  7. 11 Jan, 2010 3 commits
  8. 05 Jan, 2010 1 commit
    • IPVS: Allow boot time change of hash size · 6f7edb48
      Catalin(ux) M. BOIE authored
      I was very frustrated about the fact that I have to recompile the kernel
      to change the hash size. So, I created this patch.
      
      If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
      command line, or, if you built IPVS as modules, you can add
      options ip_vs conn_tab_bits=??.
      
      To keep everything backward compatible, you still can select the size at
      compile time, and that will be used as default.
      
      It has been about a year since this patch was originally posted
      and subsequently dropped on the basis of insufficient test data.
      
      Mark Bergsma has provided the following test results which seem
      to strongly support the need for larger hash table sizes:
      
      We do however run into the same problem with the default setting (2^12 =
      4096 entries), as most of our LVS balancers handle around a million
      connections/SLAB entries at any point in time (around 100-150 kpps
      load). With only 4096 hash table entries this implies that each entry
      consists of a linked list of 256 connections *on average*.
      
      To provide some statistics, I did an oprofile run on a 2.6.31 kernel,
      with both the default 4096 table size, and the same kernel recompiled
      with IP_VS_CONN_TAB_BITS set to 18 (2^18 = 262144 entries). I built a
      quick test setup with a part of Wikimedia/Wikipedia's live traffic
      mirrored by the switch to the test host.
      
      With the default setting, at ~ 120 kpps packet load we saw a typical %si
      CPU usage of around 30-35%, and oprofile reported a hot spot in
      ip_vs_conn_in_get:
      
      samples  %        image name  app name  symbol name
      1719761  42.3741  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
      302577    7.4554  bnx2        bnx2      /bnx2
      181984    4.4840  vmlinux     vmlinux   __ticket_spin_lock
      128636    3.1695  vmlinux     vmlinux   ip_route_input
      74345     1.8318  ip_vs.ko    ip_vs.ko  ip_vs_conn_out_get
      68482     1.6874  vmlinux     vmlinux   mwait_idle
      
      After loading the recompiled kernel with 2^18 entries, %si CPU usage
      dropped in half to around 12-18%, and oprofile looks much healthier,
      with only 7% spent in ip_vs_conn_in_get:
      
      samples  %        image name  app name  symbol name
      265641   14.4616  bnx2        bnx2      /bnx2
      143251    7.7986  vmlinux     vmlinux   __ticket_spin_lock
      140661    7.6576  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
      94364     5.1372  vmlinux     vmlinux   mwait_idle
      86267     4.6964  vmlinux     vmlinux   ip_route_input
      
      [ horms@verge.net.au: trivial up-port and minor style fixes ]
      Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
      Cc: Mark Bergsma <mark@wikimedia.org>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
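The chain-length figures quoted in this commit message follow from simple arithmetic, sketched below. The helper names are hypothetical, and treating "around a million" connections as 2^20 is an assumption chosen because it reproduces the quoted average of 256 per bucket.

```c
#include <assert.h>

/* Number of hash buckets for a given bit count (table size is 2^bits). */
static unsigned long sk_tab_size(unsigned int bits)
{
    assert(bits < 32); /* keep the shift well defined on 32-bit longs */
    return 1UL << bits;
}

/* Average chain length, assuming connections spread evenly over buckets. */
static unsigned long sk_avg_chain(unsigned long connections, unsigned int bits)
{
    return connections / sk_tab_size(bits);
}
```

Under that assumption, `sk_avg_chain(1UL << 20, 12)` gives 256 entries per bucket with the default 12 bits, while `sk_avg_chain(1UL << 20, 18)` gives 4, which is consistent with the drop in time spent in ip_vs_conn_in_get shown in the profiles.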
  9. 04 Jan, 2010 10 commits
  10. 30 Dec, 2009 6 commits