1. 29 Jul, 2014 14 commits
    • sfc: Use __iowrite64_copy instead of a slightly different local function · 4984c237
      Ben Hutchings authored
      __iowrite64_copy() isn't quite the same as efx_memcpy_64(), but
      it looks close enough:
      
      - The length is in units of qwords not bytes
      - It never byte-swaps, but that doesn't make a difference now as PIO
        is only enabled for x86_64
      - It doesn't include any memory barriers, but that's OK as there is a
        barrier just before pushing the doorbell
      - mlx4_en uses it for the same purpose
      
      Compile-tested only.
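
      The unit difference noted above (qwords, not bytes) is easy to get wrong at
      a call site. The following is a minimal userspace sketch of a qword-granular
      copy, not the kernel's __iowrite64_copy(); copy_qwords() and
      bytes_to_qwords() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of a copy whose length is counted in 64-bit words.
 * Hypothetical helper, not the kernel function.
 */
static void copy_qwords(volatile uint64_t *dst, const uint64_t *src,
                        size_t count)
{
    /* One 64-bit store per element: no byte swapping, no barriers. */
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Convert a byte length to a qword count, rounding up. */
static size_t bytes_to_qwords(size_t len_bytes)
{
    return (len_bytes + 7) / 8;
}
```

      A caller that used to pass a byte length would pass
      bytes_to_qwords(len) instead, and would still need its own barrier before
      ringing the doorbell.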
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'netdev-name' · 772e7023
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net: forbid net devices named "all" "default" or "config"
      
      /proc/sys/net/ipv[46]/conf/<dev> could conflict with
      /proc/sys/net/ipv[46]/conf/(all|default). And /proc/net/vlan/<dev>
      could conflict with /proc/net/vlan/config. Besides kernel warnings,
      undefined behavior such as duplicated proc files also appears, therefore
      we should forbid these names.
      
      v2: introduce a helper function, suggested by Florian
          fix error handling for ipv6_add_dev() in addrconf_init()
      ====================
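
      The name checks described in this series can be sketched as two small
      predicates. This is an illustration under assumptions: the function names
      are hypothetical and the helper actually added by the series may look
      different.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helpers; not taken from the kernel source. */

/* True if the name would collide with /proc/sys/net/ipv[46]/conf/(all|default). */
static bool clashes_with_inet_conf(const char *name)
{
    return strcmp(name, "all") == 0 || strcmp(name, "default") == 0;
}

/* True if the name would collide with /proc/net/vlan/config. */
static bool clashes_with_vlan_proc(const char *name)
{
    return strcmp(name, "config") == 0;
}
```

      A device-creation path would call the relevant predicate before
      registering any proc entries and fail early with an error if it returns
      true.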
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vlan: fail early when creating netdev named config · 9c5ff24f
      WANG Cong authored
      Similarly, vlan will create /proc/net/vlan/<dev>, so when we
      create a dev with the name "config", it will conflict with
      /proc/net/vlan/config.
      Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fail early when creating netdev named all or default · a317a2f1
      WANG Cong authored
      We create a proc dir for each network device; this causes
      conflicts when a device is named "all" or "default".
      
      Rather than emitting an ugly kernel warning, we could just
      fail earlier by checking the device name.
      Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fail early when creating netdev named all or default · 20e61da7
      WANG Cong authored
      We create a proc dir for each network device; this causes
      conflicts when a device is named "all" or "default".
      
      Rather than emitting an ugly kernel warning, we could just
      fail earlier by checking the device name.
      Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'syststamp-removal' · 9d7e3ea7
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      net: remove deprecated syststamp
      
      The network stack can generate two kinds of hardware timestamps:
      - hwtstamp stores a hw timestamp in device-specific raw format
      - syststamp converts the raw format to system time
      
      The second is deprecated and only implemented by a single device
      driver. The suggested alternative is to communicate hwtstamp +
      directly expose the NIC PTP clock device through ptp_clock_info.
      The remaining driver (octeon) does not expose such a standard
      interface as of now. It does have its own PTP library that depends
      on its own shared memory PTP clock interface.
      
      This patchset
      1. reverts the syststamp code in the one driver (octeon)
      2. reverts an unnecessary zero initialization in another (vxge)
      3. modifies PF_PACKET to stop testing whether syststamp is != 0 (it is always == 0)
      4. modifies SCM_TIMESTAMPING in the same way
      
      For backwards compatibility, the interfaces are not removed.
      Applications can still request SOF_TIMESTAMPING_SYS_HARDWARE. The
      response field in scm_timestamping also remains. As was the case
      for hardware/drivers that did not implement the feature, the
      setsockopt succeeds, but the response field is always zero.
      ====================
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove deprecated syststamp timestamp · 4d276eb6
      Willem de Bruijn authored
      The SO_TIMESTAMPING API defines three types of timestamps: software,
      hardware in raw format (hwtstamp) and hardware converted to system
      format (syststamp). The last has been deprecated in favor of combining
      hwtstamp with a PTP clock driver. There are no active users in the
      kernel.
      
      The option was device driver dependent. If set, but without hardware
      support, the correct behavior is to return zero in the relevant field
      in the SCM_TIMESTAMPING ancillary message. Without device drivers
      implementing the option, this field is effectively always zero.
      
      Remove the internal plumbing to dissuade new drivers from implementing
      the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
      avoid breaking existing applications that request the timestamp.
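
      From an application's point of view, the practical consequence is that
      the syststamp slot can be ignored. The sketch below shows one way a
      receiver might choose a timestamp. It uses a local mirror of the
      three-slot array carried in the SCM_TIMESTAMPING message and assumes the
      slot order is software, syststamp, raw hardware; struct ts_triple and
      pick_timestamp() are hypothetical names.

```c
#include <stddef.h>
#include <time.h>

/*
 * Local stand-in for the timestamp array of an SCM_TIMESTAMPING message.
 * Assumed slot order: [0] software, [1] syststamp (deprecated, always
 * zero after this change), [2] raw hardware.
 */
struct ts_triple {
    struct timespec ts[3];
};

static int ts_is_set(const struct timespec *t)
{
    return t->tv_sec != 0 || t->tv_nsec != 0;
}

/* Prefer the raw hardware stamp, fall back to software, never use slot 1. */
static const struct timespec *pick_timestamp(const struct ts_triple *t)
{
    if (ts_is_set(&t->ts[2]))
        return &t->ts[2];
    if (ts_is_set(&t->ts[0]))
        return &t->ts[0];
    return NULL;
}
```

      Code written this way behaves the same before and after the removal,
      since it never depended on the deprecated field.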
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: remove deprecated syststamp timestamp · 68a360e8
      Willem de Bruijn authored
      No device driver will ever return an skb_shared_info structure with
      syststamp non-zero, so remove the branch that tests for this and
      optionally marks the packet timestamp as TP_STATUS_TS_SYS_HARDWARE.
      
      Do not remove the definition TP_STATUS_TS_SYS_HARDWARE, as processes
      may refer to it.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxge: remove deprecated syststamp timestamp · ce750588
      Willem de Bruijn authored
      This driver explicitly clears a field that is unused and about to be
      removed. Remove the initialization.
      
      All fields in skb_shared_info before dataref are cleared in
      __alloc_skb, so the removal is safe even while syststamp exists.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeon: remove deprecated syststamp timestamp · c6d5fefa
      Willem de Bruijn authored
      Hardware timestamps can be exposed to userspace in raw hardware format
      (hwtstamp) as well as converted to system time (syststamp). The second
      variant is deprecated and only implemented by this driver.
      
      The preferred method of hardware timestamp generation is to combine
      hwtstamp with a device PTP clock. Octeon has its own PTP library
      that relies on a shared memory interface to the PTP clock device.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make tipc_buf_append() more robust · 13e9b997
      Jon Paul Maloy authored
      As per comment from David Miller, we try to make the buffer reassembly
      function more resilient to user errors than it is today.
      
      - We check that the "*buf" parameter always is set, since this is
        mandatory input.
      
      - We ensure that *buf->next always is set to NULL before linking in
        the buffer, instead of relying on the caller to have done this.
      
      - We ensure that the "tail" pointer in the head buffer's control
        block is initialized to NULL when the first fragment arrives.
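
      The three checks listed above can be modelled on a plain linked list.
      This is a simplified sketch, not the TIPC code: struct frag and
      frag_append() are hypothetical, and clearing *buf after taking the
      fragment is an assumption of this model.

```c
#include <stddef.h>

struct frag {
    struct frag *next;
    struct frag *tail;   /* only meaningful in the head fragment */
};

/* Returns 0 on success, -1 if the mandatory *buf input is missing. */
static int frag_append(struct frag **headbuf, struct frag **buf)
{
    struct frag *frag;

    if (buf == NULL || *buf == NULL)
        return -1;            /* check 1: *buf must be set */

    frag = *buf;
    *buf = NULL;              /* model assumption: the list now owns it */
    frag->next = NULL;        /* check 2: do not trust the caller's link */

    if (*headbuf == NULL) {
        frag->tail = NULL;    /* check 3: tail starts out as NULL */
        *headbuf = frag;
        return 0;
    }

    if ((*headbuf)->tail == NULL)
        (*headbuf)->next = frag;
    else
        (*headbuf)->tail->next = frag;
    (*headbuf)->tail = frag;
    return 0;
}
```

      With these checks in place, a caller that passes a fragment with a stale
      next pointer no longer corrupts the chain.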
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'master-2014-07-25' of... · 3fd0202a
      David S. Miller authored
      Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
      
      John W. Linville says:
      
      ====================
      pull request: wireless-next 2014-07-25
      
      Please pull this batch of updates intended for the 3.17 stream!
      
      For the mac80211 bits, Johannes says:
      
      "We have a lot of TDLS patches, among them a fix that should make hwsim
      tests happy again. The rest, this time, is mostly small fixes."
      
      For the Bluetooth bits, Gustavo says:
      
      "Some more patches for 3.17. The most important change here is the move of
      the 6lowpan code to net/6lowpan. It has been agreed with Davem that this
      change will go through the bluetooth tree. The rest are mostly clean up and
      fixes."
      
      and,
      
      "Here follows some more patches for 3.17. These are mostly fixes to what
      we've sent to you before for next merge window."
      
      For the iwlwifi bits, Emmanuel says:
      
      "I have the usual amount of BT Coex stuff. Arik continues to work
      on TDLS and Ariej contributes a few things for HS2.0. I added a few
      more things to the firmware debugging infrastructure. Eran fixes a
      small bug - pretty normal content."
      
      And for the Atheros bits, Kalle says:
      
      "For ath6kl me and Jessica added support for ar6004 hw3.0, our latest
      version of ar6004.
      
      For ath10k Janusz added a printout so that it's easier to check what
      ath10k kconfig options are enabled. He also added a debugfs file to
      configure maximum amsdu and ampdu values. Also we had few fixes as
      usual."
      
      On top of that is the usual large batch of various driver updates --
      brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action.
      Rafał has also been very busy with b43 and related updates.
      
      Also, I pulled the wireless tree into this in order to resolve a
      merge conflict...
      
      P.S.  The change to fs/compat_ioctl.c reflects a name change in a
      Bluetooth header file...
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix a memory leak in bond_arp_send_all() · a67eed57
      Dan Carpenter authored
      This test is reversed so the memory is always leaked.  It's better style
      to remove the test anyway.
      
      Fixes: 3e403a77 ('bonding: make it possible to have unlimited nested upper vlans')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: Fix shadow warning on jiffies · d87de1f3
      Mark Rustad authored
      Change formal parameter name to not shadow the global jiffies.
      Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Jul, 2014 11 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · f1b714bb
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-07-25
      
      This series contains updates to e1000e, ixgbe and ixgbevf.
      
      Mark provides all the changes for ixgbe and ixgbevf.  Converts some udelay()
      calls to the preferred usleep_range().  Fixes a spurious release of the
      semaphore in several functions when there was a failure to acquire the
      semaphore in the first place.  Fixes a X540 semaphore error where an
      incorrect check was treating success as failure and vice-versa.  Fixed
      ixgbe_write_mbx() error when it was being called and there was no
      mbx->ops.write method defined, so no error code was returned.  The
      corresponding read function would explicitly return an error in such a
      case as do other functions.  Cleans up unused (dead) code by removing it.
      Finally make return values more direct, eliminating some gotos and
      otherwise unneeded conditionals, which allows the removal of some local
      variables.
      
      David provides all the changes for e1000e.  Fix CRC errors with jumbo
      traffic for 82579, i217 and i218 client parts to increase the gap
      between the read and write pointers in the transmit FIFO.  Added code
      to check and respond to previously ignored return values from NVM
      access functions.  Added support for EEE in Sx states and fixed EEE in
      S5 with runtime PM enabled.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'inet_frag_kill_lru_list' · 6ceed786
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      inet: frag: cleanup and update
      
      The end goal of this patchset is to remove the LRU list and to move the
      frag eviction to a work queue. It also does a couple of necessary cleanups
      and fixes. Brief patch descriptions:
      Patches 1 - 3 inclusive: necessary clean ups
      Patch 4 moves the eviction from the softirqs to a workqueue.
      Patch 5 removes the nqueues counter which was protected by the LRU lock
      Patch 6 removes the, by now unused, lru list.
      Patch 7 moves the rebuild timer to the workqueue and schedules the rebuilds
              only if we've hit the maximum queue length on some of the chains.
      Patch 8 migrate the rwlock to a seqlock since the rehash is usually a rare
              operation.
      Patch 9 introduces an artificial global memory limit based on the value of
              init_net's high_thresh which is used to cap the high_thresh of the
              other namespaces. Also introduces some sane limits on the other
              tunables, and makes it impossible to have low_thresh > high_thresh.
      
      Here are some numbers from running netperf before and after the patchset:
      Each test consists of the following setting: -I 95,5 -i 15,10
      
      1. Bound test (-T 4,4)
      1.1 Virtio before the patchset -
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf.  : cpu bind
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      212992   64000   30.00      722177      0    12325.1     34.55    2.025
      212992           30.00      368020            6280.9     34.05    0.752
      
      1.2 Virtio after the patchset -
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf.  : cpu bind
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      212992   64000   30.00      727030      0    12407.9     35.45    1.876
      212992           30.00      505405            8625.5     34.92    0.693
      
      2. Virtio unbound test
      2.1 Before the patchset
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf.
      Socket  Message  Elapsed      Messages
      Size    Size     Time         Okay Errors   Throughput
      bytes   bytes    secs            #      #   10^6bits/sec
      
      212992   64000   30.00      730008      0    12458.77
      212992           30.00      416721           7112.02
      
      2.2 After the patchset
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf.
      Socket  Message  Elapsed      Messages
      Size    Size     Time         Okay Errors   Throughput
      bytes   bytes    secs            #      #   10^6bits/sec
      
      212992   64000   30.00      731129      0    12477.89
      212992           30.00      487707           8323.50
      
      3. 10 gig unbound tests
      3.1 Before the patchset
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.133.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
      Socket  Message  Elapsed      Messages
      Size    Size     Time         Okay Errors   Throughput
      bytes   bytes    secs            #      #   10^6bits/sec
      
      212992   64000   30.00      417209      0    7120.33
      212992           30.00      416740           7112.33
      
      3.2 After the patchset
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.133.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
      Socket  Message  Elapsed      Messages
      Size    Size     Time         Okay Errors   Throughput
      bytes   bytes    secs            #      #   10^6bits/sec
      
      212992   64000   30.00      438009      0    7475.33
      212992           30.00      437630           7468.87
      
      Given the options each netperf ran between 10 and 15 times for 30 seconds
      to get the necessary confidence, also the tests themselves ran 3 times and
      were consistent.
      Another set of tests that I ran were parallel stress tests which consisted
      of flooding the machine with fragmented packets from different sources with
      frag timeout set to 0 (so there're lots of timeouts) and low_thresh set to
      1 byte (so evictions are happening all the time) and on top of that running
      a namespace create/destroy endless loop with network interfaces and
      addresses that got flooded (for the brief periods they were up) in parallel.
      This test ran for an hour without any issues.
      ====================
    • inet: frag: set limits and make init_net's high_thresh limit global · 1bab4c75
      Nikolay Aleksandrov authored
      This patch makes init_net's high_thresh limit to be the maximum for all
      namespaces, thus introducing a global memory limit threshold equal to the
      sum of the individual high_thresh limits which are capped.
      It also introduces some sane minimums for low_thresh as it shouldn't be
      able to drop below 0 (or > high_thresh in the unsigned case), and
      overall low_thresh should not ever be above high_thresh, so we make the
      following relations for a namespace:
      init_net:
       high_thresh: max(not capped), min(init_net low_thresh)
       low_thresh: max(init_net high_thresh), min(0)
      
      all other namespaces:
       high_thresh: max(init_net high_thresh), min(namespace's low_thresh)
       low_thresh: max(namespace's high_thresh), min(0)
      
      The major issue with having low_thresh > high_thresh is that we'll
      schedule eviction but never evict anything and thus rely only on the
      timers.
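
      The relations above amount to range checks applied when a threshold is
      written. A minimal sketch, assuming the checks are simple inclusive
      bounds; struct frag_limits and the three predicates are hypothetical
      names, not the sysctl handlers from the patch.

```c
#include <stdbool.h>

struct frag_limits {
    long high_thresh;
    long low_thresh;
};

/* init_net: high_thresh has no upper cap but must stay >= its low_thresh. */
static bool init_high_ok(const struct frag_limits *init_net, long val)
{
    return val >= init_net->low_thresh;
}

/* Other namespaces: own low_thresh <= high_thresh <= init_net high_thresh. */
static bool ns_high_ok(const struct frag_limits *ns,
                       const struct frag_limits *init_net, long val)
{
    return val >= ns->low_thresh && val <= init_net->high_thresh;
}

/* Any namespace: 0 <= low_thresh <= that namespace's high_thresh. */
static bool low_ok(const struct frag_limits *ns, long val)
{
    return val >= 0 && val <= ns->high_thresh;
}
```

      Rejecting a write that fails these checks keeps low_thresh <= high_thresh
      in every namespace, which avoids the case where eviction is scheduled but
      nothing is ever evicted.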
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag: use seqlock for hash rebuild · ab1c724f
      Florian Westphal authored
      Rehash is a rare operation, so don't force readers to take
      the read-side rwlock.
      
      Instead, we only have to detect the (rare) case where
      the secret was altered while we are trying to insert
      a new inetfrag queue into the table.
      
      If it was changed, drop the bucket lock and recompute
      the hash to get the 'new' chain bucket that we have to
      insert into.
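
      The retry idea can be illustrated with a generic sequence-counter reader
      loop. This is a userspace sketch using C11 atomics, not the kernel
      seqlock API, and it assumes a single writer; struct frag_table,
      toy_hash(), rehash() and stable_hash() are hypothetical names. The
      patch itself performs the check while inserting under the bucket lock,
      which this model leaves out.

```c
#include <stdatomic.h>
#include <stdint.h>

struct frag_table {
    atomic_uint      seq;     /* odd while a rehash is in progress */
    _Atomic uint32_t secret;  /* hash seed, replaced by a rehash */
};

/* Placeholder hash for the sketch only. */
static uint32_t toy_hash(uint32_t key, uint32_t secret)
{
    return (key ^ secret) * 2654435761u;
}

/* Writer side (single writer assumed): bump seq around the update. */
static void rehash(struct frag_table *t, uint32_t new_secret)
{
    atomic_fetch_add(&t->seq, 1);          /* seq becomes odd */
    atomic_store(&t->secret, new_secret);
    atomic_fetch_add(&t->seq, 1);          /* seq is even again */
}

/* Reader side: recompute the hash if the secret changed underneath us. */
static uint32_t stable_hash(struct frag_table *t, uint32_t key)
{
    unsigned int start;
    uint32_t h;

    do {
        /* Wait until no rehash is in progress. */
        while ((start = atomic_load(&t->seq)) & 1u)
            ;
        h = toy_hash(key, atomic_load(&t->secret));
    } while (atomic_load(&t->seq) != start);

    return h;
}
```

      Readers pay only two counter loads in the common case, which is the
      point of replacing the read-side rwlock when rehashes are rare.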
      
      Joint work with Nikolay Aleksandrov.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag: remove periodic secret rebuild timer · e3a57d18
      Florian Westphal authored
      merge functionality into the eviction workqueue.
      
      Instead of rebuilding every n seconds, take advantage of the upper
      hash chain length limit.
      
      If we hit it, mark table for rebuild and schedule workqueue.
      To prevent frequent rebuilds when we're completely overloaded,
      don't rebuild more than once every 5 seconds.
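
      The trigger and rate limit described above could look roughly like
      this. It is a sketch under assumptions: time is passed in as whole
      seconds, max_depth is a hypothetical chain-length limit supplied by the
      caller, and all names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define REBUILD_MIN_INTERVAL_SEC 5   /* "once every 5 seconds" */

struct rebuild_state {
    bool     pending;            /* table marked for rebuild */
    uint64_t last_rebuild_sec;   /* time of the previous rebuild */
};

/* Insert path: mark the table when a chain exceeds the depth limit. */
static void note_chain_depth(struct rebuild_state *st, unsigned int depth,
                             unsigned int max_depth)
{
    if (depth > max_depth)
        st->pending = true;
}

/* Worker: rebuild only if marked and the minimum interval has passed. */
static bool rebuild_due(struct rebuild_state *st, uint64_t now_sec)
{
    if (!st->pending)
        return false;
    if (now_sec - st->last_rebuild_sec < REBUILD_MIN_INTERVAL_SEC)
        return false;
    st->pending = false;
    st->last_rebuild_sec = now_sec;
    return true;
}
```

      Under sustained overload the table stays marked, but the worker rebuilds
      at most once per interval.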
      
      The ipfrag_secret_interval sysctl is now obsolete and has been marked as
      deprecated. It can still be changed, so scripts won't break, but it
      has no effect. A comment is left above each unused secret_timer
      variable to avoid confusion.
      
      Joint work with Nikolay Aleksandrov.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag: remove lru list · 3fd588eb
      Florian Westphal authored
      no longer used.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag: don't account number of fragment queues · 434d3054
      Florian Westphal authored
      The 'nqueues' counter is protected by the lru list lock;
      once that's removed it would need to be converted to an atomic
      counter.  Given this isn't used for anything except for
      reporting it to userspace via /proc, just remove it.
      
      We still report the memory currently used by fragment
      reassembly queues.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag: move eviction of queues to work queue · b13d3cbf
      Florian Westphal authored
      When the high_thresh limit is reached we try to toss the 'oldest'
      incomplete fragment queues until memory limits are below the low_thresh
      value.  This happens in softirq/packet processing context.
      
      This has two drawbacks:
      
      1) processors might evict a queue that was about to be completed
      by another cpu, because they will compete wrt. resource usage and
      resource reclaim.
      
      2) LRU list maintenance is expensive.
      
      But when constantly overloaded, even the 'least recently used' element is
      recent, so removing 'lru' queue first is not 'fairer' than removing any
      other fragment queue.
      
      This moves eviction out of the fast path:
      
      When the low threshold is reached, a work queue is scheduled
      which then iterates over the table and removes the queues that exceed
      the memory limits of the namespace. It sets a new flag called
      INET_FRAG_EVICTED on the evicted queues so the proper counters will get
      incremented when the queue is forcefully expired.
      
      When the high threshold is reached, no more fragment queues are
      created until we're below the limit again.
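
      The two thresholds therefore play different roles, which a short sketch
      makes explicit. The names are hypothetical, and whether each comparison
      is strict or inclusive is an assumption of this example.

```c
#include <stdbool.h>
#include <stddef.h>

struct frag_mem {
    size_t used;          /* memory currently held by fragment queues */
    size_t low_thresh;
    size_t high_thresh;
};

/* Above the low threshold: kick the eviction worker (off the fast path). */
static bool should_schedule_evictor(const struct frag_mem *m)
{
    return m->used > m->low_thresh;
}

/* Above the high threshold: refuse to create new fragment queues. */
static bool may_create_queue(const struct frag_mem *m)
{
    return m->used <= m->high_thresh;
}
```

      Between the two thresholds new queues are still accepted while the
      worker reclaims memory in the background.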
      
      The LRU list is now unused and will be removed in a followup patch.
      
      Joint work with Nikolay Aleksandrov.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag: move evictor calls into frag_find function · 86e93e47
      Florian Westphal authored
      First step to move eviction handling into a work queue.
      
      We lose two spots that accounted evicted fragments in MIB counters.
      
      Accounting will be restored since the upcoming work-queue evictor
      invokes the frag queue timer callbacks instead.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag: remove hash size assumptions from callers · fb3cfe6e
      Florian Westphal authored
      hide actual hash size from individual users: The _find
      function will now fold the given hash value into the required range.
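
      Folding a full-width hash into the table range is typically a mask when
      the table size is a power of two. A minimal sketch, with TABLE_SIZE and
      fold_hash() as hypothetical stand-ins for whatever the lookup function
      uses internally:

```c
#include <stdint.h>

#define TABLE_SIZE 1024u   /* hypothetical; must be a power of two */

/* Reduce an arbitrary 32-bit hash to a valid bucket index. */
static uint32_t fold_hash(uint32_t hash)
{
    return hash & (TABLE_SIZE - 1);
}
```

      Callers can then pass any 32-bit hash value and no longer need to know
      the table size.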
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 26 Jul, 2014 12 commits
  4. 25 Jul, 2014 3 commits