1. 05 Apr, 2016 40 commits
    • Michael Chan's avatar
      bnxt_en: Add basic EEE support. · 170ce013
      Michael Chan authored
      Get EEE capability and the initial EEE settings from firmware.
      Add "EEE is active | not active" to link up dmesg.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      170ce013
    • Michael Chan's avatar
      bnxt_en: Improve flow control autoneg with Firmware 1.2.1 interface. · c9ee9516
      Michael Chan authored
      Make use of the new AUTONEG_PAUSE bit in the new interface to better
      control autoneg flow control settings, independent of RX and TX
      advertisement settings.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9ee9516
    • Michael Chan's avatar
      bnxt_en: Update to Firmware 1.2.2 spec. · 11f15ed3
      Michael Chan authored
      Use new field names in API structs and stop using deprecated fields
      auto_link_speed and auto_duplex in phy_cfg/phy_qcfg structs.
      
      Update copyright year to 2016.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11f15ed3
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 04c85bfb
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2016-04-05
      
      This series contains updates to fm10k only.
      
      Bruce provides nearly half of the patches in the series, most of which do
      general cleanup of the driver.  These include semantic cleanups,
      checkpatch.pl fixes, update driver to use BIT() kernel macro, use
      BUILD_BUG_ON() where appropriate and use ether_addr_copy() instead of
      memcpy().
      
      Jake provides the remaining patches in the series, starting with a fix
      for a possible NULL pointer dereference.  Next, he delays initialization
      of the service timer and service task until late in probe().  If we do
      not wait, failures in probe do not properly clean up the service timer
      or service task items, which results in a kernel panic.  He also adds
      better reporting during error conditions, fixes another possible kernel
      panic where we were clearing the interrupt scheme before we freed the
      mailbox IRQ, adds helper functions for setting strings and data for
      ethtool stats, and fixes misspelled words in comments.
      
      v2: Dropped patch 3 from the original submission, until a better solution
          can be worked up based on feedback from Joe Perches and David Miller.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      04c85bfb
    • Jacob Keller's avatar
      fm10k: use ethtool_rxfh_indir_default for default redirection table · 0ea7fae4
      Jacob Keller authored
      The fm10k driver used its own code for generating a default indirection
      table on device load, which was not the same as the default generated by
      ethtool when indir_size of 0 is passed to SRXFH. Take advantage of
      ethtool_rxfh_indir_default() and simplify code to write the redirection
      table to reduce some code duplication.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      0ea7fae4
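As an illustration of the default this commit adopts: the kernel helper ethtool_rxfh_indir_default() spreads table entries round-robin across the RX rings (entry index modulo ring count). The userspace sketch below mirrors that rule. The names rxfh_indir_default and fill_default_reta are hypothetical stand-ins chosen for this example, not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel helper: entry `index` maps to
 * ring (index % n_rx_rings). Assumes n_rx_rings is nonzero. */
static uint32_t rxfh_indir_default(uint32_t index, uint32_t n_rx_rings)
{
    return index % n_rx_rings;
}

/* Hypothetical helper: fill a redirection table of `size` entries
 * for `n_rx_rings` active rings using the default rule. */
static void fill_default_reta(uint32_t *table, size_t size, uint32_t n_rx_rings)
{
    for (size_t i = 0; i < size; i++)
        table[i] = rxfh_indir_default((uint32_t)i, n_rx_rings);
}
```

With three rings and an eight-entry table, the entries cycle 0, 1, 2, 0, 1, 2, 0, 1.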
    • Jacob Keller's avatar
      fm10k: fix a minor typo in some comments · d8ec92f2
      Jacob Keller authored
      s/funciton/function to resolve a typo, and cleanup grammar on a few
      comments regarding processing the VF mailboxes.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      d8ec92f2
    • Jacob Keller's avatar
      fm10k: correctly clean up when init_queueing_scheme fails · 4be37c42
      Jacob Keller authored
      Fix a kernel panic that occurs during surprise removal. Clear the
      interface queue counts upon fm10k_init_msix_capability failure. This
      prevents further code (fm10k_update_stats etc.) from attempting to
      access unallocated queue vector or ring memory.
      
      [  628.692648] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
      [  628.692805] IP: [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k]
      [  628.693173] PGD 0
      [  628.693759] Oops: 0000 [#1] SMP
      [  628.699321] CPU: 10 PID: 8164 Comm: kworker/10:0 Tainted: G           OE  ------------   3.10.0-327.el7.x86_64 #1
      [  628.700096] Hardware name: Supermicro X9DAi/X9DAi, BIOS 3.2 05/09/2015
      [  628.700894] Workqueue: pciehp-1 pciehp_power_thread
      [  628.701686] task: ffff88086559c500 ti: ffff8808593c0000 task.ti: ffff8808593c0000
      [  628.702493] RIP: 0010:[<ffffffffa0475caf>]  [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k]
      [  628.703310] RSP: 0018:ffff8808593c3b00  EFLAGS: 00010282
      [  628.704132] RAX: 0000000000000000 RBX: ffff880860760000 RCX: 0000000000000000
      [  628.704963] RDX: ffff880860760b08 RSI: 0000000000000000 RDI: 0000000000000000
      [  628.705794] RBP: ffff8808593c3b40 R08: 0000000000000000 R09: 0000000000000000
      [  628.706604] R10: 0000000000000000 R11: ffff880860760c40 R12: 0000000000000080
      [  628.707420] R13: ffff8808607608c0 R14: ffff880860779ec0 R15: ffff880860779f40
      [  628.708238] FS:  0000000000000000(0000) GS:ffff88086f000000(0000) knlGS:0000000000000000
      [  628.709071] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  628.709923] CR2: 0000000000000068 CR3: 000000000194a000 CR4: 00000000001407e0
      [  628.710752] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  628.711596] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  628.712438] Stack:
      [  628.713255]  ffff880860764458 ffff8808607608c0 ffff880860760000 ffff880860760000
      [  628.714088]  0000000000000080 ffff8808607608c0 ffff880860779ec0 ffff880860779f40
      [  628.714925]  ffff8808593c3b88 ffffffffa04780c5 ffff880860764458 0000000a8163cb5b
      [  628.715752] Call Trace:
      [  628.716560]  [<ffffffffa04780c5>] fm10k_down+0x155/0x1f0 [fm10k]
      [  628.717367]  [<ffffffffa0479958>] fm10k_close+0x28/0xd0 [fm10k]
      [  628.718184]  [<ffffffff81526365>] __dev_close_many+0x85/0xd0
      [  628.718986]  [<ffffffff815264d8>] dev_close_many+0x98/0x120
      [  628.719764]  [<ffffffff81527ab8>] rollback_registered_many+0xa8/0x230
      [  628.720527]  [<ffffffff81527c80>] rollback_registered+0x40/0x70
      [  628.721294]  [<ffffffff81529198>] unregister_netdevice_queue+0x48/0x80
      [  628.722052]  [<ffffffff815291ec>] unregister_netdev+0x1c/0x30
      [  628.722816]  [<ffffffffa04762b8>] fm10k_remove+0xd8/0xe0 [fm10k]
      [  628.723581]  [<ffffffff81328c7b>] pci_device_remove+0x3b/0xb0
      [  628.724340]  [<ffffffff813f5fbf>] __device_release_driver+0x7f/0xf0
      [  628.725088]  [<ffffffff813f6053>] device_release_driver+0x23/0x30
      [  628.725814]  [<ffffffff81321fe4>] pci_stop_bus_device+0x94/0xa0
      [  628.726535]  [<ffffffff813220d2>] pci_stop_and_remove_bus_device+0x12/0x20
      [  628.727249]  [<ffffffff8133de40>] pciehp_unconfigure_device+0xb0/0x1b0
      [  628.727964]  [<ffffffff8133d822>] pciehp_disable_slot+0x52/0xd0
      [  628.728664]  [<ffffffff8133d98a>] pciehp_power_thread+0xea/0x150
      [  628.729358]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
      [  628.730036]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
      [  628.730730]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
      [  628.731385]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
      [  628.732036]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      [  628.732674]  [<ffffffff81645858>] ret_from_fork+0x58/0x90
      [  628.733289]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      [  628.733883] Code: 83 e8 01 48 8d 97 40 02 00 00 45 31 c0 4c 8d 9c c7 48 02 0
      [  628.735202] RIP  [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k]
      [  628.735732]  RSP <ffff8808593c3b00>
      [  628.736285] CR2: 0000000000000068
      [  628.736846] ---[ end trace 9156088b311aff42 ]---
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      4be37c42
    • Bruce Allan's avatar
      fm10k: prevent possibly uninitialized variable · c4114e3d
      Bruce Allan authored
      If 'attr_flag < (1 << (2 * FM10K_TEST_MSG_NESTED))' is ever false, err
      will be used uninitialized.
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      c4114e3d
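The hazard described in this commit can be sketched in userspace C: when a variable is only assigned inside a loop body, a loop whose condition is false on entry leaves it unset. The sketch below assumes the usual fix of initializing at declaration. check_attr and run_checks are hypothetical names, and the failure rule in check_attr is invented purely so the example has something to report.

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-attribute check: fails on flag 0x8. */
static int check_attr(uint32_t flag)
{
    return (flag & 0x8u) ? -1 : 0;
}

/* Check flag bits [first_bit, nbits). Assumes nbits <= 32. */
static int run_checks(unsigned int first_bit, unsigned int nbits)
{
    int err = 0; /* initialized, so an empty loop still returns a defined value */

    for (unsigned int bit = first_bit; bit < nbits; bit++) {
        err = check_attr(UINT32_C(1) << bit);
        if (err)
            break;
    }
    return err;
}
```

Without the initializer, a call such as run_checks(5, 5) would return an indeterminate value because the loop body never executes.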
    • Jacob Keller's avatar
      fm10k: add helper functions to set strings and data for ethtool stats · d2e0721b
      Jacob Keller authored
      Reduce duplicate code and the amount of indentation by adding
      fm10k_add_stat_strings and fm10k_add_ethtool_stats functions which help
      add fm10k_stat structures to the ethtool stats callbacks. This helps
      increase ease of use for future stat additions, and increases code
      readability. Skip handling of the per-queue stats as these will be
      reworked in a following patch.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      d2e0721b
    • Jacob Keller's avatar
      fm10k: free MBX IRQ before clearing interrupt scheme · c8ed563b
      Jacob Keller authored
      During fm10k_io_error_detected we were clearing the interrupt scheme
      before we freed the MBX IRQ. This causes a kernel panic because the MBX
      IRQ are assigned after MSI-X initialization. Clearing the interrupt
      scheme results in removing the MSI-X entry table. Fix this by freeing
      the MBX IRQ before we clear the interrupt scheme, as we do elsewhere in
      the driver.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      c8ed563b
    • Jacob Keller's avatar
      fm10k: print error message when stop_hw fails · 61e0217e
      Jacob Keller authored
      fm10k_stop_hw_generic calls fm10k_disable_queues_generic, which may
      return an error code indicating that the queues were not stopped within
      the time limit. Notify the user by displaying a message in the kernel
      message ring, in a similar way to how we notify the user when reset_hw
      fails. There isn't much we can do to recover from this error, so
      currently nothing else is done.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      61e0217e
    • Jacob Keller's avatar
      fm10k: base queue scheme covered by RSS · b3525696
      Jacob Keller authored
      In fm10k_set_num_queues, we previously assigned the base template. This
      would always be overwritten by either fm10k_set_qos_queues or
      fm10k_set_rss_queues. In either case, we don't need the base values, so
      we can just remove them.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      b3525696
    • Jacob Keller's avatar
      fm10k: don't initialize service task until later in probe · e72319bb
      Jacob Keller authored
      Delay initialization of the service timer and service task until late
      in probe. If we don't wait, failures in probe do not properly clean up
      the service timer or service task items, which results in the kernel panic
      below, potentially freezing the whole system. In addition, ensure that
      the SERVICE_DISABLE bit is set before we request the MBX IRQ since the
      MBX interrupt attempts to schedule the service task otherwise. This
      prevents a similar trace from occurring after this change.
      
      We didn't notice this issue before because probe almost always completes
      successfully. I discovered it due to a mis-ordered mailbox handler
      array, which resulted in the following failure when requesting mailbox
      interrupt.
      
      [  555.325619] ------------[ cut here ]------------
      [  555.325628] WARNING: CPU: 0 PID: 4941 at lib/list_debug.c:33 __list_add+0xa0/0xd0()
      [  555.325631] list_add corruption. prev->next should be next (ffffffff81f46648), but was           (null). (prev=ffff8807fad5d0e8).
      <snip>
      [  555.325722] CPU: 0 PID: 4941 Comm: insmod Tainted: G           OE   4.0.4-303.fc22.x86_64 #1
      [  555.325725] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.8x23.060520140825 06/05/2014
      [  555.325727]  0000000000000000 00000000b4f161b3 ffff88081a21f8e8 ffffffff81783124
      [  555.325734]  0000000000000000 ffff88081a21f940 ffff88081a21f928 ffffffff8109c66a
      [  555.325740]  0000000064000000 ffff8807fad5d0e8 ffff8807fad5d0e8 ffffffff81f46648
      [  555.325746] Call Trace:
      [  555.325752]  [<ffffffff81783124>] dump_stack+0x45/0x57
      [  555.325757]  [<ffffffff8109c66a>] warn_slowpath_common+0x8a/0xc0
      [  555.325759]  [<ffffffff8109c6f5>] warn_slowpath_fmt+0x55/0x70
      [  555.325763]  [<ffffffff813ba270>] __list_add+0xa0/0xd0
      [  555.325768]  [<ffffffff81102d1d>] __internal_add_timer+0x9d/0x110
      [  555.325771]  [<ffffffff81102dbf>] internal_add_timer+0x2f/0xc0
      [  555.325774]  [<ffffffff81104e5a>] mod_timer+0x12a/0x230
      [  555.325782]  [<ffffffffa03d54ca>] fm10k_probe+0x69a/0xc80 [fm10k]
      [  555.325787]  [<ffffffff813e8355>] local_pci_probe+0x45/0xa0
      [  555.325791]  [<ffffffff8129cf42>] ? sysfs_do_create_link_sd.isra.2+0x72/0xc0
      [  555.325794]  [<ffffffff813e96b9>] pci_device_probe+0xf9/0x150
      [  555.325799]  [<ffffffff814d7e73>] driver_probe_device+0xa3/0x400
      [  555.325802]  [<ffffffff814d82ab>] __driver_attach+0x9b/0xa0
      [  555.325805]  [<ffffffff814d8210>] ? __device_attach+0x40/0x40
      [  555.325808]  [<ffffffff814d5bd3>] bus_for_each_dev+0x73/0xc0
      [  555.325811]  [<ffffffff814d78ce>] driver_attach+0x1e/0x20
      [  555.325815]  [<ffffffff814d7480>] bus_add_driver+0x180/0x250
      [  555.325819]  [<ffffffffa03b2000>] ? 0xffffffffa03b2000
      [  555.325823]  [<ffffffff814d8aa4>] driver_register+0x64/0xf0
      [  555.325826]  [<ffffffff813e7bec>] __pci_register_driver+0x4c/0x50
      [  555.325832]  [<ffffffffa03d6ca3>] fm10k_register_pci_driver+0x23/0x30 [fm10k]
      [  555.325838]  [<ffffffffa03b2080>] fm10k_init_module+0x80/0x1000 [fm10k]
      [  555.325843]  [<ffffffff81002128>] do_one_initcall+0xb8/0x200
      [  555.325848]  [<ffffffff811e10d2>] ? __vunmap+0xa2/0x100
      [  555.325852]  [<ffffffff811fe239>] ? kmem_cache_alloc_trace+0x1b9/0x240
      [  555.325855]  [<ffffffff8178230e>] ? do_init_module+0x28/0x1cb
      [  555.325858]  [<ffffffff81782346>] do_init_module+0x60/0x1cb
      [  555.325862]  [<ffffffff8112168e>] load_module+0x205e/0x26b0
      [  555.325866]  [<ffffffff8111d110>] ? store_uevent+0x70/0x70
      [  555.325870]  [<ffffffff812234b0>] ? kernel_read+0x50/0x80
      [  555.325873]  [<ffffffff81121f3e>] SyS_finit_module+0xbe/0xf0
      [  555.325878]  [<ffffffff81789749>] system_call_fastpath+0x12/0x17
      [  555.325880] ---[ end trace 9e0f58d071eafd2a ]---
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      e72319bb
    • Jacob Keller's avatar
      fm10k: prevent null pointer dereference of msix_entries table · de66c610
      Jacob Keller authored
      According to the C standard dereferencing a variable before it is
      checked invokes undefined behavior, and thus compilers are free to
      assume the check for NULL isn't necessary. Prevent this by re-ordering
      the NULL check of msix_entries in fm10k_free_mbx_irq.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      de66c610
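The re-ordering this commit describes can be sketched as follows: test the table pointer first, and only then index into it. This is a userspace illustration under assumed types. iface_sketch, msix_entry_sketch and mbx_vector_or_none are hypothetical names, not the fm10k structures or functions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct msix_entry_sketch {
    unsigned int vector;
};

struct iface_sketch {
    struct msix_entry_sketch *msix_entries; /* NULL when MSI-X is not set up */
};

/* Return the vector at `idx`, or -1 when there is no MSI-X table.
 * The NULL check comes before any use of the table pointer. */
static int mbx_vector_or_none(const struct iface_sketch *iface, size_t idx)
{
    if (!iface->msix_entries)
        return -1;

    return (int)iface->msix_entries[idx].vector;
}
```

Placing the check ahead of the first access means the compiler has no earlier dereference from which it could infer that the pointer is non-NULL.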
    • Bruce Allan's avatar
      fm10k: use ether_addr_copy to copy MAC address · 11c49f79
      Bruce Allan authored
      Cleanup the remaining instances of using memcpy() instead of the preferred
      ether_addr_copy().
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      11c49f79
    • Bruce Allan's avatar
      fm10k: demote BUG_ON() to WARN_ON() where appropriate · 838e6102
      Bruce Allan authored
      We don't need to crash the kernel in this instance so just warn about the
      condition and play on.
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      838e6102
    • Bruce Allan's avatar
      fm10k: cleanup remaining right-bit-shifted 1 · fcdb0a99
      Bruce Allan authored
      Use BIT() macro instead.
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      fcdb0a99
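For readers unfamiliar with the macro, the sketch below defines a userspace stand-in for the kernel's BIT() and shows the kind of open-coded shift it replaces. The flag names and the set_flag/test_flag helpers are hypothetical and exist only for this example.

```c
/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical flag positions, for illustration only. */
enum {
    SKETCH_FLAG_LINK_UP = 0,
    SKETCH_FLAG_RESET   = 3
};

/* Before the cleanup this would typically read: flags | (1 << bit). */
static unsigned long set_flag(unsigned long flags, unsigned int bit)
{
    return flags | BIT(bit);
}

static int test_flag(unsigned long flags, unsigned int bit)
{
    return (flags & BIT(bit)) != 0;
}
```

Besides being shorter, the macro shifts an unsigned long rather than a plain int, which avoids shifting into the sign bit for bit 31.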
    • Bruce Allan's avatar
      fm10k: Move constants to the right of binary operators · 1aab144c
      Bruce Allan authored
      The semantic patch that makes this change is available
      in scripts/coccinelle/misc/compare_const_fl.cocci.
      
      More information about semantic patching is available at
      http://coccinelle.lip6.fr/
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      1aab144c
    • David S. Miller's avatar
      Merge branch 'mlxsw-next' · 265bee72
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: small driver update, including switchdev doc update
      
      Ido Schimmel (3):
        mlxsw: spectrum: Reduce number of supported 802.1D bridges
        switchdev: Use switch ID in suggested udev rule
        mlxsw: spectrum: Add support for physical port names
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      265bee72
    • Ido Schimmel's avatar
      mlxsw: spectrum: Add support for physical port names · 2bf9a586
      Ido Schimmel authored
      Export to userspace the front panel name of the port, so that udev can
      rename the ports accordingly. The convention suggested by switchdev
      documentation is used:
      
      1) Non-split: pX
      2) Split: pXsY
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2bf9a586
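The naming convention quoted in this commit (pX for a non-split port, pXsY for a split one) can be sketched with a small formatter. This is a userspace illustration; format_port_name and its parameters are hypothetical and do not reproduce the driver's callback.

```c
#include <stddef.h>
#include <stdio.h>

/* Write "pX" for a non-split port or "pXsY" for a split port into buf.
 * Returns snprintf's result: the length the full name needs, which is
 * >= len when the output was truncated. */
static int format_port_name(char *buf, size_t len, unsigned int port,
                            int is_split, unsigned int subport)
{
    if (is_split)
        return snprintf(buf, len, "p%us%u", port, subport);

    return snprintf(buf, len, "p%u", port);
}
```

A caller would treat a return value that is greater than or equal to the buffer size as an error rather than export a truncated name.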
    • Ido Schimmel's avatar
      switchdev: Use switch ID in suggested udev rule · 75f3a101
      Ido Schimmel authored
      Since there can be multiple switch ASICs on the same system we should
      use the switch ID in order to differentiate between them and set the
      switch name (e.g. swX) accordingly.
      
      Also, replace the order of the "Switch ID" and "Port Netdev Naming"
      sections following the above change.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75f3a101
    • Ido Schimmel's avatar
      mlxsw: spectrum: Reduce number of supported 802.1D bridges · b555cf4a
      Ido Schimmel authored
      Resources allocated for these bridges at init time cannot be later used
      for other purposes. While current number is supported by the device,
      it's mostly theoretical with regards to any real use case, which leads
      to poor utilization of device's resources. Solve that by reducing the
      number.
      
      The long term plan is to make this value (along with others) user
      configurable via devlink and write it to NVRAM, so that it can be used
      during the next init. Until then we must hardcode such values.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b555cf4a
    • David S. Miller's avatar
      Merge branch 'tcp-udp-misc' · 15f41e2b
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: various udp/tcp changes
      
      First round of patches for linux-4.7
      
      Add a generic facility for sockets to be freed after an RCU grace
      period, if they need to.
      
      Then UDP stack is changed to no longer use SLAB_DESTROY_BY_RCU,
      in order to speedup rx processing for traffic encapsulated in UDP.
      It gives a 17 % speedup for normal UDP reception in stress conditions.
      
      Then TCP listeners are changed to use SOCK_RCU_FREE as well
      to avoid touching sk_refcnt in synflood case :
      I got up to 30 % performance increase for a mono listener.
      
      Then three patches add SK_MEMINFO_DROPS to sock_diag
      and add per socket rx drops accounting to TCP.
      
      Last patch adds rate limiting on ACK sent on behalf of SYN_RECV
      to better resist SYNFLOOD targeting one or a few flows.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15f41e2b
    • Eric Dumazet's avatar
      tcp: rate limit ACK sent by SYN_RECV request sockets · 4ce7e93c
      Eric Dumazet authored
      Attackers like to use SYNFLOOD targeting one 5-tuple, as they
      hit a single RX queue (and cpu) on the victim.
      
      If they use random sequence numbers in their SYN, we detect
      they do not match the expected window and send back an ACK.
      
      This patch adds a rate limitation, so that the effect of such
      attacks is limited to ingress only.
      
      We roughly double our ability to absorb such attacks.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ce7e93c
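The commit message does not spell out the limiting mechanism, so the following is only a generic sketch of per-flow rate limiting: remember when the last ACK was sent and suppress another one until a minimum gap has elapsed. The ack_limiter struct, ack_allowed function and millisecond clock are all assumptions made for this example, not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-flow state: time of the last ACK we agreed to send. */
struct ack_limiter {
    uint64_t last_ack_ms;
    bool     sent_before;
};

/* Return true if an ACK may be sent at `now_ms`, allowing at most one
 * ACK per `min_gap_ms`. Updates the state only when the ACK is allowed. */
static bool ack_allowed(struct ack_limiter *lim, uint64_t now_ms,
                        uint64_t min_gap_ms)
{
    if (lim->sent_before && now_ms - lim->last_ack_ms < min_gap_ms)
        return false;

    lim->sent_before = true;
    lim->last_ack_ms = now_ms;
    return true;
}
```

Under a flood of bogus SYNs on one flow, every suppressed ACK is egress work the victim no longer performs, which is the sense in which the attack's effect is confined to ingress.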
    • Eric Dumazet's avatar
      ipv4: tcp: set SOCK_USE_WRITE_QUEUE for ip_send_unicast_reply() · a9d6532b
      Eric Dumazet authored
      TCP uses per cpu 'sockets' to send some packets :
      - RST packets (tcp_v4_send_reset())
      - ACK packets for SYN_RECV and TIMEWAIT sockets
      
      By setting SOCK_USE_WRITE_QUEUE flag, we tell sock_wfree()
      to not call sk_write_space() since these internal sockets
      do not care.
      
      This gives a small performance improvement, merely by allowing
      cpu to properly predict the sock_wfree() conditional branch,
      and avoiding one atomic operation.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9d6532b
    • Eric Dumazet's avatar
      tcp: increment sk_drops for listeners · 9caad864
      Eric Dumazet authored
      Goal: packets dropped by a listener are accounted for.
      
      This adds tcp_listendrop() helper, and clears sk_drops in sk_clone_lock()
      so that children do not inherit their parent drop count.
      
      Note that we no longer increment LINUX_MIB_LISTENDROPS counter when
      sending a SYNCOOKIE, since the SYN packet generated a SYNACK.
      We already have a separate LINUX_MIB_SYNCOOKIESSENT counter.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9caad864
    • Eric Dumazet's avatar
      tcp: increment sk_drops for dropped rx packets · 532182cd
      Eric Dumazet authored
      Now ss can report sk_drops, we can instruct TCP to increment
      this per socket counter when it drops an incoming frame, to refine
      monitoring and debugging.
      
      Following patch takes care of listeners drops.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      532182cd
    • Eric Dumazet's avatar
      sock_diag: add SK_MEMINFO_DROPS · 15239302
      Eric Dumazet authored
      Reporting sk_drops to user space was available for UDP
      sockets using /proc interface.
      
      Add this to sock_diag, so that we can have the same information
      available to ss users, and we'll be able to add sk_drops
      indications for TCP sockets as well.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15239302
    • Eric Dumazet's avatar
      tcp/dccp: do not touch listener sk_refcnt under synflood · 3b24d854
      Eric Dumazet authored
      When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
      cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.
      
      By letting listeners use SOCK_RCU_FREE infrastructure,
      we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt
      
      Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
      only listeners are impacted by this change.
      
      Peak performance under SYNFLOOD is increased by ~33% :
      
      On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps
      
      Most consuming functions are now skb_set_owner_w() and sock_wfree()
      contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b24d854
    • Eric Dumazet's avatar
      inet: reqsk_alloc() needs to take care of dead listeners · 3a5d1c0e
      Eric Dumazet authored
      We'll soon no longer take a refcount on listeners,
      so reqsk_alloc() can not assume a listener refcount is not
      zero. We need to use atomic_inc_not_zero()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a5d1c0e
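The "increment unless zero" semantics this commit relies on can be sketched with C11 atomics: take a reference only if the count has not already dropped to zero, since a zero count means the object is on its way to being freed. The function name refcount_inc_not_zero below is a userspace stand-in written for this example, not the kernel's atomic_inc_not_zero() implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take a reference. Returns false, leaving the count untouched,
 * if it was already zero (the object is dead or dying). */
static bool refcount_inc_not_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, `old` is reloaded with the current value and we
         * re-check it against zero before retrying. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}
```

A plain increment would resurrect a count that had reached zero, whereas the compare-and-swap loop lets the caller detect a dead listener and back off.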
    • Eric Dumazet's avatar
      tcp/dccp: use rcu locking in inet_diag_find_one_icsk() · 2d331915
      Eric Dumazet authored
      RX packet processing holds rcu_read_lock(), so we can remove
      pairs of rcu_read_lock()/rcu_read_unlock() in lookup functions
      if inet_diag also holds rcu before calling them.
      
      This is needed anyway as __inet_lookup_listener() and
      inet6_lookup_listener() will soon no longer increment
      refcount on the found listener.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d331915
    • Eric Dumazet's avatar
      tcp/dccp: remove BH disable/enable in lookup · ee3cf32a
      Eric Dumazet authored
      Since linux 2.6.29, lookups only use rcu locking.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee3cf32a
    • Eric Dumazet's avatar
      udp: no longer use SLAB_DESTROY_BY_RCU · ca065d0c
      Eric Dumazet authored
      Tom Herbert would like to avoid touching the UDP socket refcnt for encapsulated
      traffic. For this to happen, we need to use normal RCU rules, with a grace
      period before freeing a socket. UDP sockets are not short lived in the
      high usage case, so the added cost of call_rcu() should not be a concern.
      
      This actually removes a lot of complexity in UDP stack.
      
      Multicast receives no longer need to hold a bucket spinlock.
      
      Note that ip early demux still needs to take a reference on the socket.
      
      Same remark for functions used by xt_socket and xt_PROXY netfilter modules,
      but this might be changed later.
      
      Performance for a single UDP socket receiving flood traffic from
      many RX queues/cpus.
      
      Simple udp_rx using simple recvfrom() loop :
      438 kpps instead of 374 kpps : 17 % increase of the peak rate.
      
      v2: Addressed Willem de Bruijn feedback in multicast handling
       - keep early demux break in __udp4_lib_demux_lookup()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Tested-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca065d0c
    • Eric Dumazet's avatar
      net: add SOCK_RCU_FREE socket flag · a4298e45
      Eric Dumazet authored
      We want a generic way to insert an RCU grace period before socket
      freeing for cases where SLAB_DESTROY_BY_RCU is adding too
      much overhead.
      
      SLAB_DESTROY_BY_RCU strict rules force us to take a reference
      on the socket sk_refcnt, and it is a performance problem for UDP
      encapsulation, or TCP synflood behavior, as many CPUs might
      attempt the atomic operations on a shared sk_refcnt
      
      UDP sockets and TCP listeners can set SOCK_RCU_FREE so that their
      lookup can use traditional RCU rules, without refcount changes.
      They can set the flag only once hashed and visible by other cpus.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Tested-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4298e45
    • David S. Miller's avatar
      Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 43e2dfb2
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      10GbE Intel Wired LAN Driver Updates 2016-04-04
      
      This series contains updates to ixgbe and ixgbevf.
      
      Pavel Tikhomirov fixes a typo where we were incrementing transmit stats
      instead of receive stats on the receive side.
      
      Emil updates the ixgbevf driver to use bit operations for setting and
      checking the adapter state.
      
      Chas Williams adds the new NDO trust feature check so that the VF guest
      has the ability to set the unicast address of the interface, if it is a
      trusted VF.
      
      Alex cleans up the driver so that the only time we add a PF entry to the
      VLVF is either for VLAN 0 or if the PF has requested a VLAN that a VF
      is already using.  Also adds support for generic transmit checksums,
      with the added advantage that we can support inner checksum offloads
      for tunnels and MPLS while still being able to transparently insert
      VLAN tags.  Lastly, changes ixgbe so that we can use the ethtool
      rx-vlan-filter flag to toggle receive VLAN filtering on and off.
      
      Mark cleans up the ixgbe driver by making constant all op structures
      that do not change.  Also fixes flow control for Xeon D KR backplanes:
      since we cannot use auto-negotiation to determine the mode, we have to
      use whatever the user configured.
      
      Sowmini Varadhan updates ixgbe to use eth_platform_get_mac_address()
      instead of the arch specific solution that was added by a previous
      commit.
      
      Don fixes an issue where a system reset could occur while we were
      holding the SWFW semaphore lock, so that the next time the driver
      loaded it would incorrectly see the semaphore as locked.
      
      v2: updated patch 8 of the series to include a minor flags issue where
          we had lost NETIF_F_HW_TC and we were setting NETIF_F_SCTP_CRC in
          two different areas, when we only needed/wanted it in one spot.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43e2dfb2
    • Merge branch 'mv88e6131-hw-bridging-6185' · 6e338048
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: mv88e6131: HW bridging support for 6185
      
      All packets passing through a switch of the 6185 family are currently
      directed to the CPU port. This means that port bridging is software driven.
      
      To enable hardware bridging for this switch family, we need to implement the
      port mapping operations, the FDB operations, and optionally the VLAN operations
      (for 802.1Q and VLAN filtering aware systems).
      
      However, this family only has 256 FDBs indexed by 8-bit identifiers, as opposed to
      4096 FDBs with 12-bit identifiers for other families such as 6352. It also
      doesn't have dedicated FID registers for ATU and VTU operations.
      
      This patchset handles these differences and enables hardware bridging for 6185.
      
      Changes v1 -> v2:
       - Describe the different numbers of databases and prefer a feature-based logic
         over the current ID/family-based logic.
      ====================
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e338048
    • net: dsa: mv88e6131: enable hardware bridging · 26892ffc
      Vivien Didelot authored
      By adding support for bridge operations, FDB operations, and optionally
      VLAN operations (for 802.1Q and VLAN filtering aware systems), the
      switch bridges ports correctly, the CPU is able to populate the hardware
      address databases, and thus hardware bridging becomes functional within
      the 88E6185 family of switches.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26892ffc
    • net: dsa: mv88e6xxx: map destination addresses for 6185 · f93dd042
      Vivien Didelot authored
      The 88E6185 switch also has a MapDA bit in its Port Control 2 register.
      When this bit is cleared, all frames are sent out to the CPU port.
      
      Set this bit to rely on address database (ATU) hits and direct frames
      out of the correct ports, and thus allow hardware bridging.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f93dd042
    • net: dsa: mv88e6xxx: support 256 databases · 11ea809f
      Vivien Didelot authored
      The 6185 family of devices has only 256 address databases. Their 8-bit
      FID for ATU and VTU operations is split across the ATU Control and ATU/VTU
      Operation registers.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11ea809f
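Splitting an 8-bit FID across two 16-bit registers can be sketched as plain bit manipulation. The layout used here is an assumption for illustration (upper nibble in bits 15:12 of the control register, lower nibble in bits 3:0 of the operation register); the commit message does not state the bit positions, and the `TOY_*` macros and `toy_*` helpers are hypothetical names, not the driver's.

```c
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define TOY_ATU_CTRL_DBNUM_MASK 0xf000u /* FID[7:4] in control bits 15:12 */
#define TOY_ATU_OP_DBNUM_MASK   0x000fu /* FID[3:0] in operation bits 3:0 */

/* Place the upper nibble of the FID into the control register value,
 * preserving the other control bits. */
static uint16_t toy_atu_ctrl_set_fid(uint16_t ctrl, uint8_t fid)
{
    return (uint16_t)((ctrl & ~TOY_ATU_CTRL_DBNUM_MASK) |
                      (((uint16_t)fid << 8) & TOY_ATU_CTRL_DBNUM_MASK));
}

/* Place the lower nibble of the FID into the operation register value,
 * preserving the other operation bits. */
static uint16_t toy_atu_op_set_fid(uint16_t op, uint8_t fid)
{
    return (uint16_t)((op & ~TOY_ATU_OP_DBNUM_MASK) |
                      (fid & TOY_ATU_OP_DBNUM_MASK));
}

/* Reassemble the 8-bit FID from the two register values. */
static uint8_t toy_atu_get_fid(uint16_t ctrl, uint16_t op)
{
    return (uint8_t)(((ctrl & TOY_ATU_CTRL_DBNUM_MASK) >> 8) |
                     (op & TOY_ATU_OP_DBNUM_MASK));
}
```

Keeping the set and get helpers symmetric makes it easy to check that a FID written through both registers reads back unchanged, whatever the real bit positions turn out to be.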