1. 05 Jan, 2015 4 commits
    • Jesse Gross's avatar
      geneve: Check family when reusing sockets. · 46b1e4f9
      Jesse Gross authored
      When searching for an existing socket to reuse, the address family
      is not taken into account - only port number. This means that an
      IPv4 socket could be used for IPv6 traffic and vice versa, which
      is sure to cause problems when passing packets.
      
      It is not possible to trigger this problem currently because the
      only user of Geneve creates just IPv4 sockets. However, that is
      likely to change in the near future.
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46b1e4f9
    • Jesse Gross's avatar
      geneve: Remove socket hash table. · df5dba8e
      Jesse Gross authored
      The hash table for open Geneve ports is used only on creation and
      deletion time. It is not performance critical and is not likely to
      grow to a large number of items. Therefore, this can be changed
      to use a simple linked list.
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df5dba8e
    • Jesse Gross's avatar
      geneve: Simplify locking. · 829a3ada
      Jesse Gross authored
      The existing Geneve locking scheme was pulled over directly from
      VXLAN. However, VXLAN has a number of built in mechanisms which make
      the locking more complex and are unlikely to be necessary with Geneve.
      This simplifies the locking to use a basic scheme of a mutex
      when doing updates plus RCU on receive.
      
      In addition to making the code easier to read, this also avoids the
      possibility of a race when creating or destroying sockets since
      UDP sockets and the list of Geneve sockets are protected by different
      locks. After this change, the entire operation is atomic.
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      829a3ada
    • Jesse Gross's avatar
      geneve: Remove workqueue. · 61f3cade
      Jesse Gross authored
      The work queue is used only to free the UDP socket upon destruction.
      This is not necessary with Geneve and generally makes the code more
      difficult to reason about. It also introduces nondeterministic
      behavior such as when a socket is rapidly deleted and recreated, which
      could fail as the the deletion happens asynchronously.
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61f3cade
  2. 03 Jan, 2015 10 commits
    • David S. Miller's avatar
      Merge branch 'rhashtable-next' · 7beceebf
      David S. Miller authored
      Thomas Graf says:
      
      ====================
      rhashtable: Per bucket locks & deferred table resizing
      
      Prepares for and introduces per bucket spinlocks and deferred table
      resizing. This allows for parallel table mutations in different hash
      buckets from atomic context. The resizing occurs in the background
      in a separate worker thread while lookups, inserts, and removals can
      continue.
      
      Also modified the chain linked list to be terminated with a special
      nulls marker to allow entries to move between multiple lists.
      
      Last but not least, reintroduces lockless netlink_lookup() with
      deferred Netlink socket destruction to avoid the side effect of
      increased netlink_release() runtime.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7beceebf
    • Thomas Graf's avatar
      netlink: Lockless lookup with RCU grace period in socket release · 21e4902a
      Thomas Graf authored
      Defers the release of the socket reference using call_rcu() to
      allow using an RCU read-side protected call to rhashtable_lookup()
      
      This restores behaviour and performance gains as previously
      introduced by e341694e ("netlink: Convert netlink_lookup() to use
      RCU protected hash table") without the side effect of severely
      delayed socket destruction.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      21e4902a
    • Thomas Graf's avatar
      rhashtable: Supports for nulls marker · f89bd6f8
      Thomas Graf authored
      In order to allow for wider usage of rhashtable, use a special nulls
      marker to terminate each chain. The reason for not using the existing
      nulls_list is that the prev pointer usage would not be valid as entries
      can be linked in two different buckets at the same time.
      
      The 4 nulls base bits can be set through the rhashtable_params structure
      like this:
      
      struct rhashtable_params params = {
              [...]
              .nulls_base = (1U << RHT_BASE_SHIFT),
      };
      
      This reduces the hash length from 32 bits to 27 bits.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f89bd6f8
    • Thomas Graf's avatar
      rhashtable: Per bucket locks & deferred expansion/shrinking · 97defe1e
      Thomas Graf authored
      Introduces an array of spinlocks to protect bucket mutations. The number
      of spinlocks per CPU is configurable and selected based on the hash of
      the bucket. This allows for parallel insertions and removals of entries
      which do not share a lock.
      
      The patch also defers expansion and shrinking to a worker queue which
      allows insertion and removal from atomic context. Insertions and
      deletions may occur in parallel to it and are only held up briefly
      while the particular bucket is linked or unzipped.
      
      Mutations of the bucket table pointer is protected by a new mutex, read
      access is RCU protected.
      
      In the event of an expansion or shrinking, the new bucket table allocated
      is exposed as a so called future table as soon as the resize process
      starts.  Lookups, deletions, and insertions will briefly use both tables.
      The future table becomes the main table after an RCU grace period and
      initial linking of the old to the new table was performed. Optimization
      of the chains to make use of the new number of buckets follows only the
      new table is in use.
      
      The side effect of this is that during that RCU grace period, a bucket
      traversal using any rht_for_each() variant on the main table will not see
      any insertions performed during the RCU grace period which would at that
      point land in the future table. The lookup will see them as it searches
      both tables if needed.
      
      Having multiple insertions and removals occur in parallel requires nelems
      to become an atomic counter.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      97defe1e
    • Thomas Graf's avatar
      spinlock: Add spin_lock_bh_nested() · 113948d8
      Thomas Graf authored
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      113948d8
    • Thomas Graf's avatar
      nft_hash: Remove rhashtable_remove_pprev() · 897362e4
      Thomas Graf authored
      The removal function of nft_hash currently stores a reference to the
      previous element during lookup which is used to optimize removal later
      on. This was possible because a lock is held throughout calling
      rhashtable_lookup() and rhashtable_remove().
      
      With the introdution of deferred table resizing in parallel to lookups
      and insertions, the nftables lock will no longer synchronize all
      table mutations and the stored pprev may become invalid.
      
      Removing this optimization makes removal slightly more expensive on
      average but allows taking the resize cost out of the insert and
      remove path.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      897362e4
    • Thomas Graf's avatar
      rhashtable: Factor out bucket_tail() function · b8e1943e
      Thomas Graf authored
      Subsequent patches will require access to the bucket tail. Access
      to the tail is relatively cheap as the automatic resizing of the
      table should keep the number of entries per bucket to no more
      than 0.75 on average.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b8e1943e
    • Thomas Graf's avatar
      rhashtable: Convert bucket iterators to take table and index · 88d6ed15
      Thomas Graf authored
      This patch is in preparation to introduce per bucket spinlocks. It
      extends all iterator macros to take the bucket table and bucket
      index. It also introduces a new rht_dereference_bucket() to
      handle protected accesses to buckets.
      
      It introduces a barrier() to the RCU iterators to the prevent
      the compiler from caching the first element.
      
      The lockdep verifier is introduced as stub which always succeeds
      and properly implement in the next patch when the locks are
      introduced.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88d6ed15
    • Thomas Graf's avatar
    • Thomas Graf's avatar
      rhashtable: Do hashing inside of rhashtable_lookup_compare() · 8d24c0b4
      Thomas Graf authored
      Hash the key inside of rhashtable_lookup_compare() like
      rhashtable_lookup() does. This allows to simplify the hashing
      functions and keep them private.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d24c0b4
  3. 02 Jan, 2015 21 commits
  4. 01 Jan, 2015 5 commits