1. 10 Oct, 2007 40 commits
    • [MAC80211]: ignore key index on pairwise key (WEP only) · 139c3a04
      Volker Braun authored
      Work-around for broken APs that use a non-zero key index for WEP
      pairwise keys. With this patch, only WEP pairwise keys are exempt
      from the zero key index requirement.
      Signed-off-by: Volker Braun <volker.braun@physik.hu-berlin.de>
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: remove TKIP mixing for hw accel again · c39e3a0d
      Johannes Berg authored
      The TKIP mixing code was added for the benefit of Intel's ipw3945
      chipset but that code ended up not using it. We have previously
      identified many problems with this code, and it became clear that
      library functions for mixing are likely to handle this far more
      generally and might allow b43 to take advantage of hardware
      acceleration for TKIP.
      
      For these reasons, remove the TKIP mixing for hardware
      accelerated crypto operations.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Buesch <mb@bu3sch.de>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: remove HW_KEY_IDX_INVALID · 6a7664d4
      Johannes Berg authored
      This patch makes the mac80211/driver interface rely only on the
      IEEE80211_TXCTL_DO_NOT_ENCRYPT flag to signal to the driver whether
      a frame should be encrypted or not. Since mac80211 internally no
      longer relies on HW_KEY_IDX_INVALID either, this patch removes it,
      changes the key index to be a u8 in all places and makes the full
      range of the value available to drivers.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: some more documentation · 7ac1bd6a
      Johannes Berg authored
      This patch formats some documentation in mac80211.h into kerneldoc
      and also adds some more explanations for hardware crypto.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: remove set_key_idx callback · c15a2050
      Johannes Berg authored
      No existing drivers use this callback, hence there's no telling
      how it might be used. In fact, it is unlikely to be of much use
      as-is because the default key index isn't something that the
      driver can do much with without knowing which interface it was
      for etc. And if it needs the key index for the transmitted frame,
      it can get it by keeping a reference to the key_conf structure
      and looking it up by hw_key_idx.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: rework hardware crypto flags · 7848ba7d
      Johannes Berg authored
      This patch reworks the various hardware crypto related
      flags to make them more local, i.e. put them with each
      key or each packet instead of into the hw struct.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: remove turbo modes · b708e610
      Johannes Berg authored
      This patch removes all mention of the atheros turbo modes that
      can't possibly work properly anyway since in some places we don't
      check for them when we should.
      
      I have no idea what the iwlwifi drivers were doing with these but
      it can't possibly have been correct.
      
      Cc: Zhu Yi <yi.zhu@intel.com>
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: fix race conditions with keys · d4e46a3d
      Johannes Berg authored
      During receive processing, we select the key long before using it and
      because there's no locking it is possible that we kfree() the key
      after having selected it but before using it for crypto operations.
      Obviously, this is bad.
      
      Secondly, during transmit processing, there are two possible races: We
      have a similar race between select_key() and using it for encryption,
      but we also have a race here between select_key() and hardware
      encryption (both when a key is removed.)
      
      This patch solves these issues by using RCU: when a key is to be freed,
      we first remove the pointer from the appropriate places (sdata->keys,
      sdata->default_key, sta->key) using rcu_assign_pointer() and then
      synchronize_rcu(). Then, we can safely kfree() the key and remove it
      from the hardware. There's a window here where the hardware may still
      be using it for decryption, but we can't work around that without having
      two hardware callbacks, one to disable the key for RX and one to disable
      it for TX; but the worst thing that will happen is that we receive a
      packet decrypted that we don't find a key for any more and then drop it.
      
      When we add a key, we first need to upload it to the hardware and then,
      using rcu_assign_pointer() again, link it into our structures.
      
      In the code using keys (TX/RX paths) we use rcu_dereference() to get the
      key and enclose the whole tx/rx section in a rcu_read_lock() ...
      rcu_read_unlock() block. Because we've uploaded the key to hardware
      before linking it into internal structures, we can guarantee that it is
      valid once we get into tx().
      
      One possible race condition remains, however: when we have hardware
      acceleration enabled and the driver shuts down the queues, we end up
      queueing the frame. If somebody now removes the key, the key will be
      removed from hwaccel and then the driver will be asked to encrypt the
      frame with a key index that has been removed. Hence, drivers will need
      to be aware that the hw_key_index they are passed might not be valid
      under all circumstances. Most drivers will, however, simply ignore that
      condition and encrypt the frame with the selected key anyway; this
      only results in a frame being encrypted with a wrong key or dropped
      (rightfully) because the key was not valid. There isn't much we can
      do about it unless we want to walk the pending frame queue every time
      a key is removed and remove all frames that used it.
      
      This race condition, however, will most likely be solved once we add
      multiqueue support to mac80211 because then frames will be queued
      further up the stack instead of after being processed.
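
      The ordering described above (upload before publishing on add,
      unpublish and wait before freeing on removal) can be sketched in
      userspace C. This is only an illustration: all toy_* names are
      hypothetical, the reader counter is a crude stand-in for
      rcu_read_lock()/synchronize_rcu() and is not how RCU is
      implemented, and the "hardware" is just a counter.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical key object; only the hardware index is modelled. */
struct toy_key { int hw_idx; };

static _Atomic(struct toy_key *) toy_slot; /* stands in for e.g. sta->key */
static atomic_int toy_readers;             /* crude read-side tracking, NOT real RCU */
static int toy_hw_keys;                    /* keys currently "uploaded"; single writer assumed */

static void toy_read_lock(void)   { atomic_fetch_add(&toy_readers, 1); }
static void toy_read_unlock(void) { atomic_fetch_sub(&toy_readers, 1); }

/* Crude grace period: spin until no read-side section is active. */
static void toy_synchronize(void)
{
    while (atomic_load(&toy_readers) != 0)
        ;
}

/* Add: upload to "hardware" first, publish the pointer last. */
static struct toy_key *toy_key_add(int hw_idx)
{
    struct toy_key *k = malloc(sizeof(*k));

    if (!k)
        return NULL;
    k->hw_idx = hw_idx;
    toy_hw_keys++;              /* upload */
    atomic_store(&toy_slot, k); /* publish */
    return k;
}

/* Free: unpublish, wait for readers, then remove from "hardware" and free. */
static void toy_key_free(struct toy_key *k)
{
    atomic_store(&toy_slot, (struct toy_key *)NULL);
    toy_synchronize();
    toy_hw_keys--;
    free(k);
}

/* TX/RX path: select and use the key inside one read-side section. */
static int toy_tx_key_idx(void)
{
    int idx = -1;

    toy_read_lock();
    struct toy_key *k = atomic_load(&toy_slot);
    if (k)
        idx = k->hw_idx;
    toy_read_unlock();
    return idx;
}
```

      A reader that enters after the pointer has been cleared simply sees
      no key, which mirrors the "drop the frame" outcome mentioned in the
      message.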
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: don't send invalid QoS frames · c29b9b9b
      Johannes Berg authored
      Kalle Valo noticed that QoS frames are sent with an invalid QoS control
      field; this is because we increase the header length but neither
      initialise the space nor actually have enough space in the header
      structure for the QoS control field.
      
      This patch fixes it by treating the QoS field specially and appending it
      explicitly, initialising it to zero.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAC80211]: remove spy wext ioctls · 5d4ecd93
      Johannes Berg authored
      mac80211 never calls wireless_spy_update so these aren't
      useful.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Michael Wu <flamingice@sourmilk.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4]: Convert rt_check_expire() from softirq processing to workqueue. · 39c90ece
      Eric Dumazet authored
      On loaded/big hosts, rt_check_expire() is of little use, because it
      generally breaks out of its main loop because of a jiffies change.
      
      It can take a long time (read : timer invocations) to actually
      scan the whole hash table, freeing unused entries.
      
      Converting it to use a workqueue instead of softirq is a nice
      move because we can allow rt_check_expire() to do the scan
      it is supposed to do, without hogging the CPU.
      
      This has an impact on the average number of entries in cache,
      reducing RAM usage. The cache is more responsive to parameter
      changes (/proc/sys/net/ipv4/route/gc_timeout and
      /proc/sys/net/ipv4/route/gc_interval).
      
      Note: Maybe the default value of gc_interval (60 seconds)
      is too high, since this means we actually need 5 (300/60)
      invocations to scan the whole table.
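
      The arithmetic in the note can be made concrete with a small
      sketch. It assumes, as the "300/60" figure suggests, that each
      invocation covers a share of the hash table proportional to
      gc_interval / gc_timeout; the function names and the example
      table size are hypothetical and not taken from the patch.

```c
/* Invocations needed to cover the whole table, rounded up.
 * Both arguments are in seconds; gc_interval must be non-zero. */
static unsigned int toy_scan_invocations(unsigned int gc_timeout,
                                         unsigned int gc_interval)
{
    return (gc_timeout + gc_interval - 1) / gc_interval;
}

/* Buckets to visit per invocation so that the table is covered once
 * per gc_timeout; never more than the table itself.
 * gc_timeout must be non-zero. */
static unsigned long toy_buckets_per_run(unsigned long hash_size,
                                         unsigned int gc_timeout,
                                         unsigned int gc_interval)
{
    unsigned long long goal =
        (unsigned long long)hash_size * gc_interval / gc_timeout;

    return goal > hash_size ? hash_size : (unsigned long)goal;
}
```

      With gc_timeout = 300 and gc_interval = 60 the first helper gives
      the 5 invocations mentioned in the note; lowering gc_interval
      shrinks the share scanned per run but runs it more often.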
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [RFKILL]: Add rfkill documentation · dac24ab3
      Ivo van Doorn authored
      Add a documentation file which contains
      a short description about rfkill with some
      notes about drivers and the userspace interface.
      
      Changes since v1 and v2:
       - Spellchecking
      Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
      Acked-by: Dmitry Torokhov <dtor@mail.ru>
      Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
    • [RFKILL]: Add support for ultrawideband · e0665486
      Ivo van Doorn authored
      This patch adds support for UWB keys to rfkill;
      support for this was requested by Inaky.
      Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [RFKILL]: Remove IRDA · 234a0ca6
      Ivo van Doorn authored
      As Dmitry pointed out earlier, rfkill-input.c
      doesn't support irda because there are no users
      and we shouldn't add unrequired KEY_ defines.
      
      However, RFKILL_TYPE_IRDA was defined in the
      rfkill.h header file and would confuse people
      about whether it is implemented or not.
      
      This patch removes IRDA support completely,
      so it can be added whenever a driver wants the
      feature.
      Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Fix race when opening a proc file while a network namespace is exiting. · 077130c0
      Eric W. Biederman authored
      The problem: proc_net files remember which network namespace they are
      against but do not hold a reference count (as that would pin
      the network namespace). So we currently have a small window where
      the reference count on a network namespace may be incremented when opening
      a /proc file when it has already gone to zero.
      
      To fix this, introduce maybe_get_net and get_proc_net.
      
      maybe_get_net increments the network namespace reference count only if it is
      greater than zero, ensuring we don't increment a reference count after it
      has gone to zero.
      
      get_proc_net handles all of the magic to go from a proc inode to the network
      namespace instance and call maybe_get_net on it.
      
      PROC_NET, the old accessor, is removed so that we don't get confused and use
      the wrong helper function.
      
      Then I fix up the callers to use get_proc_net and handle the case
      where get_proc_net returns NULL. In that case I return -ENXIO because
      effectively the network namespace has already gone away so the files
      we are trying to access don't exist anymore.
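
      The "increment only if greater than zero" rule can be sketched with
      C11 atomics. This is a userspace illustration, not the patch's
      code: struct toy_net and toy_maybe_get_net are hypothetical names,
      and the compare-and-swap loop is one common way to express the
      pattern (the kernel's atomic_inc_not_zero() helper serves the same
      purpose).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a namespace; only the refcount is modelled. */
struct toy_net {
    atomic_int count;
};

/* Take a reference only while the count is still above zero.
 * Returns the namespace on success, NULL if it is already going away. */
static struct toy_net *toy_maybe_get_net(struct toy_net *net)
{
    int old = atomic_load(&net->count);

    while (old > 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&net->count, &old, old + 1))
            return net;
    }
    return NULL;
}
```

      A caller that gets NULL back treats the namespace as gone, which
      corresponds to the -ENXIO path described above.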
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: Fix allnoconfig compilation error. · 4fabcd71
      Daniel Lezcano authored
      When CONFIG_NET=no, init_net is unresolved because net_namespace.c
      is not compiled and the include pulls in the init_net definition.
      
      This problem is very similar to the one with the ipc namespace, where
      the kernel can be compiled without SYSV ipc.
      
      This patch fixes that by defining a macro which simply removes the
      init_net initialization from the nsproxy namespace aggregator.
      
      Compiled and booted on qemu-i386 with CONFIG_NET=no and CONFIG_NET=yes.
      Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET_SCHED]: Making rate table lookups more flexible. · e08b0998
      Jesper Dangaard Brouer authored
      This is done in order to add support for changing the rate table to
      use the upper-boundary L2T (length to time) value. Currently we use the
      lower boundary, which results in under-estimating the actual bandwidth
      usage.
      
      Extend the tc_ratespec struct with two parameters: 1) "cell_align",
      which allows adjusting the alignment of the rate table, and 2) "overhead",
      which allows adding a packet overhead before the lookup.
      Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET_SCHED]: Cleanup L2T macros and handle oversized packets · e9bef55d
      Jesper Dangaard Brouer authored
      Change L2T (length to time) macros, in all rate based schedulers, to
      call a common function qdisc_l2t() that does the rate table lookup.
      This function handles the case where the packet size lookup is larger
      than the rate table, which often occurs with TSO enabled.
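
      A rate table lookup of this kind might look roughly like the
      following userspace sketch. The cell_align and overhead fields are
      named in the previous commit; the struct layout, the 256-slot
      table size, the toy_* names and the way oversized packets are
      charged (whole-table multiples plus a remainder slot) are
      assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down rate table. */
struct toy_rate_table {
    uint32_t data[256];  /* time to send a packet that falls in each slot */
    int      cell_log;   /* log2 of the slot width in bytes */
    int      cell_align; /* alignment adjustment applied before the lookup */
    int      overhead;   /* per-packet overhead added before the lookup */
};

/* Length-to-time lookup that also copes with packets beyond the table. */
static uint32_t toy_l2t(const struct toy_rate_table *rt, unsigned int pktlen)
{
    int slot = (int)pktlen + rt->cell_align + rt->overhead;

    if (slot < 0)
        slot = 0;
    slot >>= rt->cell_log;

    /* Oversized packet: charge full-table multiples plus the remainder slot. */
    if (slot > 255)
        return rt->data[255] * (uint32_t)(slot >> 8) + rt->data[slot & 0xFF];

    return rt->data[slot];
}
```

      With cell_align set to -1 and a table whose entry i holds the time
      for (i + 1) cells, a packet of exactly one cell lands in slot 0,
      which is the upper-boundary behaviour the previous commit aims for.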
      Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SCTP] net/sctp/socket.c: make 3 variables static · b6fa1a4d
      Adrian Bunk authored
      This patch makes the following needlessly global variables static:
      - sctp_memory_pressure
      - sctp_memory_allocated
      - sctp_sockets_allocated
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SCTP]: Make sctp_addto_param() static. · 5c94bf86
      Adrian Bunk authored
      sctp_addto_param() can become static.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [KERNEL]: Unexport raise_softirq_irqoff · 464771fe
      Adrian Bunk authored
      raise_softirq_irqoff no longer has any modular user.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: Fix bad macro definition. · a050c33f
      Daniel Lezcano authored
      The macro definition is bad. When calling next_net_device with
      parameter name "dev", the resulting code is:
          struct net_device *dev = dev
      and that leads to unexpected behavior. In particular, when llc_core
      is compiled in, the kernel panics at boot time.
      The patchset replaces the macro definitions with static inline
      functions, as they were defined before.
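
      The name clash can be illustrated with a miniature example. The
      exact original macro is not shown in this log, so the macro in the
      comment below is only a guess at its general shape, and struct
      toy_dev and toy_next_dev are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical miniature of a device list node. */
struct toy_dev {
    struct toy_dev *next;
    int ifindex;
};

/*
 * A macro of roughly this shape declares a local with a fixed name:
 *
 *   #define toy_next_dev(d) ({ struct toy_dev *dev = (d); dev->next; })
 *
 * Calling it as toy_next_dev(dev) expands to
 *
 *   struct toy_dev *dev = (dev);
 *
 * so the new local is initialised from itself, not from the caller's
 * variable of the same name.
 */

/* A static inline function has its own parameter scope, so the caller's
 * variable name cannot collide with anything inside it. */
static inline struct toy_dev *toy_next_dev(struct toy_dev *dev)
{
    return dev->next;
}
```

      The inline function also gets type checking on its argument, which
      the macro form does not provide.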
      Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: Fix loopback network namespace initialization. · abf07acb
      Daniel Lezcano authored
      The core patchset of the network namespace sent by
      Eric Biederman does not do dynamic loopback creation.
      So there is no call to alloc_netdev_mq which fills the
      network namespace field of the netdevice.
      
      This patch assigns the loopback device to the init network namespace.
      Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: Fix export symbols. · 36ac3135
      Daniel Lezcano authored
      Add the appropriate EXPORT_SYMBOLS for proc_net_create,
      proc_net_fops_create and proc_net_remove to fix errors when
      compiling allmodconfig
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Acked-by: Benjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETLINK]: Introduce nested and byteorder flag to netlink attribute · 8f4c1f9b
      Thomas Graf authored
      This change allows the generic attribute interface to be used within
      the netfilter subsystem where this flag was initially introduced.
      
      The byte-order flag is as yet unused; its intended use is to
      allow automatic byte-order conversions for all atomic types.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add a might_sleep() to dev_close(). · 9d5010db
      David S. Miller authored
      Requested by Johannes Berg.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue · 86bba269
      Eric Dumazet authored
      When the periodic IP route cache flush is done (every 600 seconds on
      default configuration), some hosts suffer a lot and eventually trigger
      the "soft lockup" message.
      
      dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
      eventually freeing some (less than 1%) of them, while holding the
      dst_lock spinlock for the whole scan.
      
      Then it rearms a timer to redo the full thing 1/10 s later...
      The slowdown can last one minute or so, depending on how active
      the TCP sessions are.
      
      This second version of the patch converts the processing from a softirq
      based one to a workqueue.
      
      Even if the list of entries in garbage_list is huge, host is still
      responsive to softirqs and can make progress.
      
      Instead of resetting gc timer to 0.1 second if one entry was freed in a
      gc run, we do this if more than 10% of entries were freed.
      
      Before patch :
      
      Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
      Aug 16 06:21:37 SRV1 kernel:
      Aug 16 06:21:37 SRV1 kernel: Call Trace:
      Aug 16 06:21:37 SRV1 kernel:  <IRQ>  [<ffffffff802286f0>] wake_up_process+0x10/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80251e09>] softlockup_tick+0xe9/0x110
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd380>] dst_run_gc+0x0/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802376f3>] run_local_timers+0x13/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802379c7>] update_process_times+0x57/0x90
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80216034>] smp_local_timer_interrupt+0x34/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802165cc>] smp_apic_timer_interrupt+0x5c/0x80
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8020a816>] apic_timer_interrupt+0x66/0x70
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd3d3>] dst_run_gc+0x53/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd3c6>] dst_run_gc+0x46/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80237148>] run_timer_softirq+0x148/0x1c0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8023340c>] __do_softirq+0x6c/0xe0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8020ad6c>] call_softirq+0x1c/0x30
      Aug 16 06:21:37 SRV1 kernel:  <EOI>  [<ffffffff8020cb34>] do_softirq+0x34/0x90
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802331cf>] local_bh_enable_ip+0x3f/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422913>] _spin_unlock_bh+0x13/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803dfde8>] rt_garbage_collect+0x1d8/0x320
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd4dd>] dst_alloc+0x1d/0xa0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803e1433>] __ip_route_output_key+0x573/0x800
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c02e2>] sock_common_recvmsg+0x32/0x50
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803e16dc>] ip_route_output_flow+0x1c/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80400160>] tcp_v4_connect+0x150/0x610
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803ebf07>] inet_bind_bucket_create+0x17/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8040cd16>] inet_stream_connect+0xa6/0x2c0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c0bdf>] lock_sock_nested+0xcf/0xe0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803be551>] sys_connect+0x71/0xa0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803eee3f>] tcp_setsockopt+0x1f/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c030f>] sock_common_setsockopt+0xf/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803be4bd>] sys_setsockopt+0x9d/0xc0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8028881e>] sys_ioctl+0x5e/0x80
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80209c4e>] system_call+0x7e/0x83
      
      After patch : (RT_CACHE_DEBUG set to 2 to get following traces)
      
      dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us
      dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us
      dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us
      dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us
      dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us
      dst_total: 154522 delayed: 58316 work_perf: 6395 expires: 25 elapsed: 11402 us
      dst_total: 154957 delayed: 58252 work_perf: 64 expires: 150 elapsed: 6148 us
      dst_total: 157377 delayed: 57843 work_perf: 409 expires: 400 elapsed: 6350 us
      dst_total: 163745 delayed: 56679 work_perf: 1164 expires: 775 elapsed: 7051 us
      dst_total: 176577 delayed: 53965 work_perf: 2714 expires: 1389 elapsed: 8120 us
      dst_total: 198993 delayed: 49627 work_perf: 4338 expires: 1997 elapsed: 8909 us
      dst_total: 226638 delayed: 46865 work_perf: 2762 expires: 2748 elapsed: 7351 us
      
      I successfully reduced the IP route cache of many hosts by a factor
      of four thanks to this patch. Previously, I had to disable
      "ip route flush cache" to avoid crashes.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      3c12afe7
    • [NET]: #if 0 out net_alloc() for now. · 678aa8e4
      David S. Miller authored
      We will undo this once it is actually used.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Disable netfilter sockopts when not in the initial network namespace · c48dad7e
      Eric W. Biederman authored
      Until we support multiple network namespaces with netfilter, only allow
      netfilter configuration in the initial network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: netlink support for moving devices between network namespaces. · d8a5ec67
      Eric W. Biederman authored
      The simplest thing to implement is moving network devices between
      namespaces.  With the same attribute IFLA_NET_NS_PID we could
      easily implement creating devices in the destination network
      namespace as well, but that is a little bit trickier, so this
      patch sticks to what is simple and easy.
      
      A pid is used to identify a process that happens to be a member
      of the network namespace we want to move the network device to.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Implement network device movement between namespaces · ce286d32
      Eric W. Biederman authored
      This patch introduces NETIF_F_NETNS_LOCAL, a flag to indicate that
      a network device is local to a single network namespace and
      should never be moved.  This is useful for pseudo devices for
      which we need an instance in each network namespace (like the
      loopback device) and for any device we find that cannot handle
      multiple network namespaces, so we may trap them in the initial
      network namespace.
      
      This patch introduces the function dev_change_net_namespace,
      which is used to move a network device from one network
      namespace to another.  To the network device nothing
      special appears to happen; to the components of the network
      stack it appears as if the network device was unregistered
      in the network namespace it is in, and a new device
      was registered in the network namespace the device
      was moved to.
      
      This patch sets up a namespace device destructor that
      upon the exit of a network namespace moves all of the
      movable network devices to the initial network namespace
      so they are not lost.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Factor out __dev_alloc_name from dev_alloc_name · b267b179
      Eric W. Biederman authored
      When forcibly changing the network namespace of a device
      I need something that can generate a name for the device
      in the new namespace without overwriting the old name.
      
      __dev_alloc_name provides me that functionality.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Make the device list and device lookups per namespace. · 881d966b
      Eric W. Biederman authored
      This patch makes most of the generic device layer network
      namespace safe.  This patch makes dev_base_head a
      network namespace variable, and then it picks up
      a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anything in the core of the network stack that was
      affected by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net, the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundamentally ifindex numbers are per namespace, or else
      we will have corner case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Support multiple network namespaces with netlink · b4b51029
      Eric W. Biederman authored
      Each netlink socket will live in exactly one network namespace,
      this includes the controlling kernel sockets.
      
      This patch updates all of the existing netlink protocols
      to only support the initial network namespace.  Requests
      by clients in other namespaces will get -ECONNREFUSED,
      as they would if the kernel did not have the support for
      that netlink protocol compiled in.
      
      As each netlink protocol is updated to be multiple network
      namespace safe it can register multiple kernel sockets
      to acquire a presence in the rest of the network namespaces.
      
      The implementation in af_netlink is a simple filter implementation
      at hash table insertion and hash table look up time.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4b51029
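The "simple filter at hash table insertion and look up time" described in the netlink commit above can be illustrated with a small userspace model. This is only a sketch, not the af_netlink code: `struct nl_sock`, `struct nl_table`, `nl_lookup` and `nl_insert` are hypothetical names, and the fixed-size chained table is an assumption made for brevity.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct net { int id; };

struct nl_sock {
    struct net *net;       /* namespace the socket lives in */
    unsigned int pid;      /* netlink port id */
    struct nl_sock *next;  /* hash chain link */
};

#define NL_HASH_SIZE 16

struct nl_table {
    struct nl_sock *bucket[NL_HASH_SIZE];
};

/* Look up a socket by port id, skipping sockets from other namespaces. */
struct nl_sock *nl_lookup(struct nl_table *t, struct net *net, unsigned int pid)
{
    struct nl_sock *sk;

    for (sk = t->bucket[pid % NL_HASH_SIZE]; sk != NULL; sk = sk->next) {
        if (sk->net == net && sk->pid == pid)
            return sk;
    }
    return NULL;
}

/*
 * Insert a socket. The duplicate check is namespace-filtered too, so the
 * same port id may exist once per namespace.
 * Returns 0 on success, -1 if (net, pid) is already present.
 */
int nl_insert(struct nl_table *t, struct nl_sock *sk)
{
    unsigned int h = sk->pid % NL_HASH_SIZE;

    if (nl_lookup(t, sk->net, sk->pid) != NULL)
        return -1;
    sk->next = t->bucket[h];
    t->bucket[h] = sk;
    return 0;
}
```

In this model a client in another namespace simply finds no matching socket, which is the condition the commit message maps to -ECONNREFUSED.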
    • [NET]: Make device event notification network namespace safe · e9dc8653
      Eric W. Biederman authored
      Every user of the network device notifiers is either a protocol
      stack or a pseudo device.  If a protocol stack that does not have
      support for multiple network namespaces receives an event for a
      device that is not in the initial network namespace it quite possibly
      can get confused and do the wrong thing.
      
      To avoid problems until all of the protocol stacks are converted
      this patch modifies all netdev event handlers to ignore events on
      devices that are not in the initial network namespace.
      
      As the rest of the code is made network namespace aware these
      checks can be removed.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9dc8653
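The guard that the device notification commit above adds to each netdev event handler might look roughly like the following userspace model. The types are simplified stand-ins, `example_device_event` and `handled_events` are hypothetical names, and the real handlers take a notifier block argument that is omitted here.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; assumptions for illustration. */
struct net { int id; };
struct net init_net = { 0 };          /* the initial network namespace */

struct net_device {
    struct net *nd_net;               /* namespace the device belongs to */
    const char *name;
};

#define NOTIFY_DONE 0                 /* "not interested" */
#define NOTIFY_OK   1                 /* "handled" */

int handled_events;                   /* counts events that were processed */

/* Hypothetical event handler of a protocol stack that is not yet
 * namespace aware. */
int example_device_event(unsigned long event, struct net_device *dev)
{
    /* Ignore devices outside the initial network namespace. */
    if (dev->nd_net != &init_net)
        return NOTIFY_DONE;

    (void)event;                      /* real code would switch on the event */
    handled_events++;
    return NOTIFY_OK;
}
```

Once a protocol stack is converted, the early return is the only part that needs to be removed.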
    • [NET]: Make packet reception network namespace safe · e730c155
      Eric W. Biederman authored
      This patch modifies every packet receive function
      registered with dev_add_pack() to drop packets if they
      are not from the initial network namespace.
      
      This should ensure that the various network stacks do
      not receive packets in anything but the initial network
      namespace until the code has been converted and is ready
      for them.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e730c155
    • [NET]: Initialize the network namespace of network devices. · 6d34b1c2
      Eric W. Biederman authored
      Except for carefully selected pseudo devices all network
      interfaces should start out in the initial network namespace.
      Ultimately it will be register_netdev that examines what
      dev->nd_net is set to and places a device in a network namespace.
      
      This patch modifies alloc_netdev to initialize the network
      namespace a device is in with the initial network namespace.
      This gets it right for the vast majority of devices, so their
      drivers need not be modified.  The few pseudo devices that
      need something different can change this parameter
      before calling register_netdevice.
      
      The network namespace parameter on a network device is not
      reference counted as the devices are inside of a network namespace
      and cannot remain in that namespace past the lifetime of the
      network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d34b1c2
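The allocate-then-override flow described in the commit above can be modelled in userspace as follows. This is a sketch under stated assumptions: `alloc_netdev_sketch` and `register_netdevice_sketch` are hypothetical simplifications of the kernel functions named in the message, and the real ones take more arguments and do far more work.

```c
#include <stdlib.h>

/* Simplified stand-ins for the kernel types; assumptions for illustration. */
struct net { int id; };
struct net init_net = { 0 };          /* the initial network namespace */

struct net_device {
    struct net *nd_net;               /* borrowed pointer, not reference counted */
    int registered;
};

/* Allocate a device and default its namespace to the initial one. */
struct net_device *alloc_netdev_sketch(void)
{
    struct net_device *dev = calloc(1, sizeof(*dev));

    if (dev == NULL)
        return NULL;
    dev->nd_net = &init_net;
    return dev;
}

/* Register the device in whatever namespace dev->nd_net points at and
 * return that namespace. */
struct net *register_netdevice_sketch(struct net_device *dev)
{
    dev->registered = 1;
    return dev->nd_net;
}
```

An ordinary driver calls the two functions back to back and ends up in `init_net`; a pseudo device assigns `dev->nd_net` between the two calls.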
    • [NET]: Make socket creation namespace safe. · 1b8d7ae4
      Eric W. Biederman authored
      This patch passes in the namespace a new socket should be created in
      and has the socket code do the appropriate reference counting.  By
      virtue of this all socket create methods are touched.  In addition
      the socket create methods are modified so that they will fail if
      you attempt to create a socket in a non-default network namespace.
      
      Failing if we attempt to create a socket outside of the default
      network namespace ensures that as we incrementally make the network stack
      network namespace aware we will not export functionality that someone
      has not audited and made certain is network namespace safe.
      This allows us to partially enable network namespaces before all of the
      exotic protocols are supported.
      
      Any protocol layers I have missed will fail to compile because I now
      pass an extra parameter into the socket creation code.
      
      [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b8d7ae4
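A socket create method with the extra namespace parameter and the fail-outside-the-default-namespace check from the commit above could be sketched like this. `example_create` and the cut-down `struct socket` are hypothetical, and the choice of -EAFNOSUPPORT as the error code is an assumption; the commit message only says the call fails.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; assumptions for illustration. */
struct net { int id; };
struct net init_net = { 0 };          /* the default network namespace */

struct socket {
    struct net *net;                  /* namespace the socket was created in */
    int protocol;
};

/* Hypothetical create method of a protocol that has not been audited
 * for namespace safety yet. */
int example_create(struct net *net, struct socket *sock, int protocol)
{
    /* Refuse to export unaudited functionality into other namespaces.
     * The error code is an assumption. */
    if (net != &init_net)
        return -EAFNOSUPPORT;

    sock->net = net;
    sock->protocol = protocol;
    return 0;
}
```

Because the namespace is a new parameter, any create method that was not updated no longer matches the expected signature, which is the compile-time safety net the message mentions.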
    • [NET]: Make /proc/net per network namespace · 457c4cbc
      Eric W. Biederman authored
      This patch makes /proc/net per network namespace.  It modifies the global
      variables proc_net and proc_net_stat to be per network namespace.
      The proc_net file helpers are modified to take a network namespace argument,
      and all of their callers are fixed to pass &init_net for that argument.
      This ensures that all of the /proc/net files are only visible and
      usable in the initial network namespace until the code behind them
      has been updated to handle multiple network namespaces.
      
      Making /proc/net per namespace is necessary as at least some files
      in /proc/net depend upon the set of network devices which is per
      network namespace, and even more files in /proc/net have contents
      that are relevant to a single network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      457c4cbc