Commits · 8306f99a517b91ebf8fa94d017c2c84ca62e107c · Kirill Smelkov / linux

16 Oct, 2015 22 commits

tipc: disallow packet duplicates in link deferred queue · 8306f99a

Jon Paul Maloy authored Oct 15, 2015

After the previous commits, we are guaranteed that no attempt will be
made to add packets of type LINK_PROTOCOL, or packets with illegal
sequence numbers, to the link deferred queue. This makes it possible to
simplify the sorting algorithm in the function
tipc_skb_queue_sorted().

We also alter the function so that it will drop packets if one with
the same sequence number is already present in the queue. This is
necessary because we have identified weird packet sequences, involving
duplicate packets, where a legitimate in-sequence packet may advance to
the head of the queue without being detected and de-queued.

Finally, we make this function out-of-line (non-inline), since it will
now be called only in exceptional cases.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8306f99a
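
The duplicate-dropping sorted insert described in this commit can be illustrated in user space. What follows is a minimal sketch, not the kernel code: `struct pkt`, `seq_less()` and `defq_insert_sorted()` are hypothetical stand-ins for the skb queue and tipc_skb_queue_sorted(), and the list is a plain singly linked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer carrying a 16-bit sequence number. */
struct pkt {
	uint16_t seqno;
	struct pkt *next;
};

/* Wraparound-aware "a comes before b" for 16-bit sequence numbers. */
static bool seq_less(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) < 0;
}

/*
 * Insert p into the list at *head, keeping it sorted by seqno.
 * Returns false and leaves the list untouched if a packet with the
 * same seqno is already queued; the caller is then expected to drop p.
 */
static bool defq_insert_sorted(struct pkt **head, struct pkt *p)
{
	struct pkt **pos = head;

	assert(head && p);

	/* Skip every queued packet that sorts before p. */
	while (*pos && seq_less((*pos)->seqno, p->seqno))
		pos = &(*pos)->next;

	if (*pos && (*pos)->seqno == p->seqno)
		return false;	/* duplicate: do not queue */

	p->next = *pos;
	*pos = p;
	return true;
}
```

Returning a boolean lets the caller decide how to free a rejected buffer, which keeps the queue code free of ownership decisions.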

tipc: improve sequence number checking · 81204c49

Jon Paul Maloy authored Oct 15, 2015

The sequence number of an incoming packet is currently only checked
for less than, equality to, or bigger than the next expected number,
meaning that the receive window in practice becomes one half sequence
number cycle, or U16_MAX/2. This does not make sense, and may not even
be safe if there are extreme delays in the network. Any packet sent by
the peer during the ongoing cycle must belong inside his current send
window, or should otherwise be dropped if possible.

Since a link endpoint cannot know its peer's current send window, it
has to base this sanity check on a worst-case assumption, i.e., that
the peer is using a maximum sized window of 8191 packets. Using this
assumption, we now add a check that the sequence number is not bigger
than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks
done, so that the receive window test is performed before the gap test.
This way, we are guaranteed that no packets with illegal sequence
numbers are ever added to the deferred queue.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

81204c49
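
The ordering of the two tests can be sketched as follows. This is an illustrative user-space model under stated assumptions, not the TIPC implementation: `classify()`, `seq_less()` and the `rcv_verdict` enum are hypothetical names, and only the 8191-packet worst-case window and the 16-bit sequence space are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Worst-case peer send window, per the commit message. */
#define TIPC_MAX_LINK_WIN 8191

/* The signed-difference comparison only works for windows below half a cycle. */
static_assert(TIPC_MAX_LINK_WIN < 0x8000, "window must be under half a cycle");

enum rcv_verdict { RCV_DELIVER, RCV_DEFER, RCV_DROP };

/* Wraparound-aware "a comes before b" for 16-bit sequence numbers. */
static bool seq_less(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) < 0;
}

static enum rcv_verdict classify(uint16_t seqno, uint16_t next_expected)
{
	uint16_t win_lim = (uint16_t)(next_expected + TIPC_MAX_LINK_WIN);

	/* Receive window test first: anything outside the window is dropped. */
	if (seq_less(seqno, next_expected) || seq_less(win_lim, seqno))
		return RCV_DROP;

	/* Gap test second: in-window but not next in line is deferred. */
	if (seqno != next_expected)
		return RCV_DEFER;

	return RCV_DELIVER;
}
```

Because the window test runs before the gap test, a packet can only reach the deferred path when its sequence number is already known to be legal.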

tipc: simplify tipc_link_rcv() reception loop · f9aa358a

Jon Paul Maloy authored Oct 15, 2015

Currently, all packets received in tipc_link_rcv() are unconditionally
added to the packet deferred queue, whereafter that queue is walked and
all its buffers evaluated for delivery. This is both non-optimal and
makes the queue sorting function unnecessarily complex.

This commit changes the loop so that an arrived packet is evaluated
first, and added to the deferred queue only when a sequence number gap
is discovered. A non-empty deferred queue is walked until it is empty
or until its head's sequence number doesn't fit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9aa358a
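
A rough model of the new loop shape: evaluate the arriving packet first, defer only on a gap, then drain the deferred queue while its head fits. This is a simplified sketch and not the tipc_link_rcv() code; `struct link`, `defq_add()` and `link_rcv()` are hypothetical, the deferred queue is a small sorted array of sequence numbers, and sequence number wraparound is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEFQ_CAP 16	/* arbitrary capacity for this sketch */

struct link {
	uint16_t next_expected;
	uint16_t defq[DEFQ_CAP];	/* deferred seqnos, sorted ascending */
	size_t defq_len;
	unsigned int delivered;		/* packets passed upward so far */
};

/* Sorted insert; silently drops duplicates and overflow. */
static void defq_add(struct link *l, uint16_t seqno)
{
	size_t i = 0;

	while (i < l->defq_len && l->defq[i] < seqno)
		i++;
	if (l->defq_len == DEFQ_CAP || (i < l->defq_len && l->defq[i] == seqno))
		return;
	memmove(&l->defq[i + 1], &l->defq[i],
		(l->defq_len - i) * sizeof(l->defq[0]));
	l->defq[i] = seqno;
	l->defq_len++;
}

static void link_rcv(struct link *l, uint16_t seqno)
{
	assert(l);

	if (seqno < l->next_expected)
		return;			/* already received: drop */
	if (seqno > l->next_expected) {
		defq_add(l, seqno);	/* gap discovered: defer */
		return;
	}

	/* In sequence: deliver without touching the deferred queue. */
	l->delivered++;
	l->next_expected++;

	/* Walk the deferred queue until it is empty or its head does not fit. */
	while (l->defq_len && l->defq[0] == l->next_expected) {
		memmove(&l->defq[0], &l->defq[1],
			(l->defq_len - 1) * sizeof(l->defq[0]));
		l->defq_len--;
		l->delivered++;
		l->next_expected++;
	}
}
```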

tipc: limit usage of temporary skb list during packet reception · 9945e804

Jon Paul Maloy authored Oct 15, 2015

During packet reception, the function tipc_link_rcv() adds its accepted
packets to a temporary buffer queue, before finally splicing this queue
into the lock protected input queue that will be delivered up to the
socket layer. The purpose is to reduce potential contention on the input
queue lock. However, since the vast majority of packets arrive in
sequence, they will anyway be added one by one to the input queue, and
the use of the temporary queue becomes a sub-optimization.

The only case where this queue makes sense is when unpacking buffers
from a bundle packet; here we want to avoid dozens of small buffers
to be added individually to the lock-protected input queue in a tight
loop.

In this commit, we remove the general usage of the temporary queue,
and keep it only for the packet unbundling case.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9945e804

mlx4: corretly check failed allocation · 175f8d67

Insu Yun authored Oct 15, 2015

When allocation fails, mlx4_alloc_cmd_mailbox returns ERR_PTR(-ENOMEM).
Since mlx4_alloc_cmd_mailbox never returns NULL, its result needs to
be checked with IS_ERR, not IS_ERR_OR_NULL.
Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

175f8d67
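
The error-pointer convention behind this fix can be mimicked in user space. The helpers below are simplified re-implementations of the kernel's ERR_PTR/PTR_ERR/IS_ERR idea written for illustration only, and `alloc_mailbox()` is a hypothetical allocator, not the mlx4 function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct mailbox {
	char buf[64];
};

/* Hypothetical allocator that never returns NULL, only a pointer or an error. */
static struct mailbox *alloc_mailbox(bool simulate_failure)
{
	struct mailbox *m = simulate_failure ? NULL : calloc(1, sizeof(*m));

	if (!m)
		return ERR_PTR(-ENOMEM);
	return m;
}
```

With an allocator that follows this contract, a NULL test never fires, so IS_ERR alone states the contract accurately.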

bonding: support encapsulated ipv6 TSO · e87eb405

Eric Dumazet authored Oct 15, 2015

If using a sixtofour device on top of a bonding device,
skb segmentation of TCP traffic is done right before calling
bonding xmit, because bonding only enables TSO for IPv4.

This patch improves single flow performance by about 120 % on my hosts,
because segmentation is deferred until right before calling slave xmit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e87eb405

Merge branch 'mlxsw-cleanups' · 181e4246

David S. Miller authored Oct 15, 2015

Jiri Pirko says:

====================
mlxsw: Driver update, cleanups

This patchset contains various cleanups and improvements in mlxsw driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

181e4246

mlxsw: cmd: Update CONFIG_PROFILE command documentation · 5cd16d8c

Ido Schimmel authored Oct 15, 2015

The meaning of certain parameters in the profile passed to the device
during initialization has changed, so update their documentation
accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5cd16d8c

mlxsw: Add trap group for control packets · 801bd3de

Ido Schimmel authored Oct 15, 2015

Previously, we trapped flooded and control packets using the same trap
group. This can cause flooded packets to overflow the PCI bus and
prevent control packets (e.g. STP, LACP) from getting to the CPU.

Solve this by splitting the RX trap group to RX and control, which allows
us to configure a policer on the first, thereby preventing it from
overflowing the PCI bus.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

801bd3de

mlxsw: Simplify traps creation · f24af330

Ido Schimmel authored Oct 15, 2015

The Host Trap Group Table (HTGT) register configures trap groups, which
are populated with trap IDs using the Host PacKet Trap (HPKT) register.
However, a trap ID can only be present inside one trap group (the last
configured).

Instead of passing both the trap group and ID to the function that
packs HPKT, pass only the trap ID and derive the trap group from it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f24af330
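
Deriving the group inside the pack helper might look like the sketch below. Every name here is hypothetical (`trap_id`, `trap_group`, `hpkt_payload`, `hpkt_pack()`); the only facts taken from the commit messages are that STP and LACP are control packets and that a trap ID belongs to a single group.

```c
#include <assert.h>
#include <stddef.h>

enum trap_group { TRAP_GROUP_RX, TRAP_GROUP_CTRL };

/* Hypothetical trap IDs for illustration. */
enum trap_id { TRAP_ID_FLOODED, TRAP_ID_STP, TRAP_ID_LACP };

/* Hypothetical stand-in for the packed register payload. */
struct hpkt_payload {
	enum trap_group group;
	enum trap_id id;
};

/* Single source of truth for which group a trap ID belongs to. */
static enum trap_group trap_group_for(enum trap_id id)
{
	switch (id) {
	case TRAP_ID_STP:
	case TRAP_ID_LACP:
		return TRAP_GROUP_CTRL;
	default:
		return TRAP_GROUP_RX;
	}
}

/* The caller passes only the trap ID; the group is derived here. */
static void hpkt_pack(struct hpkt_payload *payload, enum trap_id id)
{
	assert(payload);
	payload->id = id;
	payload->group = trap_group_for(id);
}
```

Since callers no longer supply the group, they cannot pass a group that disagrees with the ID.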

mlxsw: Introduce mlxsw_reg_spms_vid_pack helper and use it · ebb7963f

Jiri Pirko authored Oct 15, 2015

Introduce a separate helper for packing SPMS VIDs, as it can be used for
multiple VIDs and not only for one, as the previous SPMS pack function
allowed.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebb7963f

mlxsw: reg: Adjust definition of enum mlxsw_reg_sfgc_type · fa6ad058

Ido Schimmel authored Oct 15, 2015

Define a max value, which will be needed later on.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa6ad058

mlxsw: reg: Remove extra space in SFGC ID define · 36b78e8a

Jiri Pirko authored Oct 15, 2015

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

36b78e8a

mlxsw: reg: Uppercase letters in register IDs · 3f0effd1

Jiri Pirko authored Oct 15, 2015

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f0effd1

mlxsw: Use dev_level_ratelimited instead of net_ratelimit & dev_level · 6cf9dc8b

Jiri Pirko authored Oct 15, 2015

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cf9dc8b

mlxsw: core: Do not use EMADs in mlxsw_emad_fini · 18ea5445

Jiri Pirko authored Oct 15, 2015

Be symmetric with mlxsw_emad_init and don't use EMADs in mlxsw_emad_fini
cleanup function. Use command interface instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18ea5445

mlxsw: pci: Limit number of entries being sent in single MAP_FA cmd · 3e2206da

Jiri Pirko authored Oct 15, 2015

Firmware accepts only a limited number of mapping entries for the MAP_FA
command. In order to prevent overflow, introduce a limit and, in case the
number of entries is bigger, call MAP_FA multiple times.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e2206da
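
Splitting one large request into several bounded commands can be sketched generically. The limit value, the callback type and all names below (`MAP_FA_MAX_ENTRIES`, `map_cmd_fn`, `map_in_chunks()`, `log_cmd()`) are assumptions made for this illustration; the real limit and command interface are not given in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_FA_MAX_ENTRIES 32	/* hypothetical per-command limit */

/* Hypothetical command hook: map `count` entries starting at index `first`. */
typedef int (*map_cmd_fn)(size_t first, size_t count, void *ctx);

/* Issue as many commands as needed, never exceeding the per-command limit. */
static int map_in_chunks(size_t total, map_cmd_fn cmd, void *ctx)
{
	size_t done = 0;

	assert(cmd);

	while (done < total) {
		size_t n = total - done;
		int err;

		if (n > MAP_FA_MAX_ENTRIES)
			n = MAP_FA_MAX_ENTRIES;
		err = cmd(done, n, ctx);
		if (err)
			return err;	/* stop at the first failing command */
		done += n;
	}
	return 0;
}

/* Demo hook that only records what it was asked to do. */
struct cmd_log {
	size_t calls;
	size_t entries;
	size_t largest;
};

static int log_cmd(size_t first, size_t count, void *ctx)
{
	struct cmd_log *log = ctx;

	(void)first;
	log->calls++;
	log->entries += count;
	if (count > log->largest)
		log->largest = count;
	return 0;
}
```

For example, 70 entries with a limit of 32 result in three commands of 32, 32 and 6 entries.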

mlxsw: pci: Remove MLXSW_PCI_RDQS/SDQS defines and checks · c85c3882

Jiri Pirko authored Oct 15, 2015

Remove the strict check of the queue count, as various ASICs have
different counts.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c85c3882

mlxsw: pci: Do not use MLXSW_PCI_SDQS_COUNT define · 424e1114

Jiri Pirko authored Oct 15, 2015

Use mlxsw_pci_sdq_count helper instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

424e1114

mlxsw: pci: Use MLXSW_PCI_CQS_MAX instead of MLXSW_PCI_CQS_COUNT · e4c870b1

Jiri Pirko authored Oct 15, 2015

The count of CQs can be different for various ASICs, so just define a
maximum value and check against that.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4c870b1

mlxsw: switchx2: Use ETH_ALEN for mac address length · ffe05328

Jiri Pirko authored Oct 15, 2015

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffe05328

mlxsw: Remove multicast ID configuration · 33a704a5

Ido Schimmel authored Oct 15, 2015

Due to a firmware change, the Switch Multicast ID (SMID)
register is no longer needed, so the related configuration code can be
removed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

33a704a5

15 Oct, 2015 18 commits

amd-xgbe: Use system workqueue for device restart · 96aec911

Lendacky, Thomas authored Oct 14, 2015

A previous patch switched from using the system workqueue to the device
workqueue for various operations. During a device restart the device
workqueue is flushed so the restart cannot use this workqueue or else
a deadlock results.  Move the device restart back to using the system
workqueue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96aec911

Merge branch 'switchdev-locking' · 74661bee

David S. Miller authored Oct 15, 2015

Jiri Pirko says:

====================
switchdev: change locking

This is something which I'm currently struggling with.
Callers of attr_set and obj_add/del often hold not only RTNL, but also
spinlock (bridge). So in that case, the driver implementing the op cannot sleep.

The way rocker is dealing with this now is just to invoke driver operation
and go out, without any checking or reporting of the operation status.

Since it would be nice to at least put a warning in case the operation fails,
it makes sense to do this in delayed work directly in switchdev core
instead of implementing this in separate drivers. And that is what this patchset
is introducing.

So from now on, the locking of switchdev mod ops is consistent. Caller either
holds rtnl mutex or in case it does not, caller sets defer flag, telling
switchdev core to process the op later, in deferred queue.

A function that forces processing of switchdev deferred ops can be called
by the op caller at an appropriate location, for example after it releases
a spin lock, to make the switchdev core process pending ops.

v1->v2:
- rebased on current net-next head (including Scott's ageing patchset)
v2->v3:
- fixed comment s/of/or/ typo suggested by Nik
v3->v4:
- the actual patchset is sent instead of the different branch I sent in v3 :/
v4->v5:
- added patch to "const" attr param
- reworked deferred ops infrastructure (mainly patch number 1 and
  internal users (patch 3 and 5)) - resolves the issue pointed out
  by John
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

74661bee

switchdev: assert rtnl mutex when going over lower netdevs · 771acac2

Jiri Pirko authored Oct 14, 2015

netdev_for_each_lower_dev has to be called with rtnl mutex held. So
better enforce it in switchdev functions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

771acac2

rocker: remove nowait from switchdev callbacks. · d33eeb64

Jiri Pirko authored Oct 14, 2015

No need to avoid sleeping in switchdev callbacks now, as the switchdev
core allows it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d33eeb64

bridge: defer switchdev fdb del call in fdb_del_external_learn · 56607386

Jiri Pirko authored Oct 14, 2015

Since a spinlock is held here, defer the switchdev operation. Also, ensure
that deferred switchdev ops are processed before the port master device
is unlinked.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56607386

switchdev: introduce possibility to defer obj_add/del · 4d429c5d

Jiri Pirko authored Oct 14, 2015

Similar to the attr use case, the caller knows whether it is holding RTNL
and whether it is in an atomic section. So let the caller decide the
correct call variant.

This allows drivers to sleep inside their ops and wait for hw to get the
operation status. Then the status is propagated into switchdev core.
This avoids silent errors in drivers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d429c5d

switchdev: remove pointers from switchdev objects · 850d0cbc

Jiri Pirko authored Oct 14, 2015

When an object is used in deferred work, we cannot use pointers in
switchdev object structures because the memory they point at may already
be in use by someone else. So make a local copy of the value instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

850d0cbc

switchdev: allow caller to explicitly request attr_set as deferred · 0bc05d58

Jiri Pirko authored Oct 14, 2015

The caller should know whether it can call attr_set directly (when holding
RTNL) or whether it has to defer the attr_set processing for later.

This also allows drivers to sleep inside attr_set and report operation
status back to switchdev core. Switchdev core then warns if status is
not ok, instead of silent errors happening in drivers.

Benefit from newly introduced switchdev deferred ops infrastructure.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0bc05d58

switchdev: make struct switchdev_attr parameter const for attr_set calls · f7fadf30

Jiri Pirko authored Oct 14, 2015

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7fadf30

switchdev: introduce switchdev deferred ops infrastructure · 793f4014

Jiri Pirko authored Oct 14, 2015

Introduce infrastructure which will be used internally to defer ops.
Note that the deferred ops are queued up and are processed either by
scheduled work or explicitly, by the user calling the deferred_process
function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

793f4014
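
The general shape of such a deferral queue can be shown in user space. This is a sketch under assumptions, not the switchdev code: `dq_init()`, `dq_defer()` and `dq_process()` are hypothetical names, locking and the scheduled-work path are omitted, and the payload is copied by value so that a queued op never refers to caller memory that may have been reused.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*deferred_fn)(const void *data);

struct deferred_item {
	struct deferred_item *next;
	deferred_fn fn;
	unsigned char data[];	/* private copy of the caller's payload */
};

struct deferred_queue {
	struct deferred_item *head;
	struct deferred_item **tail;
};

static void dq_init(struct deferred_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Queue fn(data) for later. The payload is copied into the item. */
static int dq_defer(struct deferred_queue *q, deferred_fn fn,
		    const void *data, size_t len)
{
	struct deferred_item *item;

	assert(fn);
	item = malloc(sizeof(*item) + len);
	if (!item)
		return -1;
	item->next = NULL;
	item->fn = fn;
	memcpy(item->data, data, len);
	*q->tail = item;
	q->tail = &item->next;
	return 0;
}

/* Run and free every queued op in FIFO order; returns how many ran. */
static unsigned int dq_process(struct deferred_queue *q)
{
	struct deferred_item *item;
	unsigned int n = 0;

	while ((item = q->head) != NULL) {
		q->head = item->next;
		if (!q->head)
			q->tail = &q->head;
		item->fn(item->data);
		free(item);
		n++;
	}
	return n;
}

/* Demo op: adds the queued int to a running total. */
static int demo_sum;

static void demo_add(const void *data)
{
	int v;

	memcpy(&v, data, sizeof(v));
	demo_sum += v;
}
```

A caller that cannot sleep would call `dq_defer()` under its lock and `dq_process()` after releasing it; in this sketch both paths are plain function calls.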

net: hisilicon: fixes a bug when using ethtool -S · adc9048c

lipeng authored Oct 15, 2015

This patch fixes a bug in the hns driver. When statistics are read
with ethtool -S, three counters are reported incorrectly because the
strings associated with their registers are wrong. Correct the strings
that give the wrong information.
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adc9048c

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · e109a691

David S. Miller authored Oct 15, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-15

This series contains updates to i40e, i40evf and ixgbe.

Emil changes the ixgbe driver to disable LRO by default in favor of GRO.

Mark provides two changes for ixgbe. The first fixes a semaphore issue:
when a reset never completes, it is necessary to retake the semaphore
before returning.

Jesse fixes up a missing function header comment variable reference.  Then
enables ethtool priv flags to control flow director at runtime.

Neerav changes several i40e error messages to debug only since the
messages were printing when there was no functional issue and were meant
for debug only.

Catherine changes the i40e driver to make only X722 support 100M SGMII,
since it is the only device to actually support it.

Anjali modifies the i40e/i40evf driver to add writeback on ITR offload
support for X722 since the device has a way to work around the
descriptor writeback issue.

Mitch cleans up obsolete code.  Also reduces the i40evf init time by
shortening up the delays in the init task to aid in performance in
load/unload tests and mitigates DMAR errors in VF enable/disable tests.

Shannon modifies i40e to allow flow director sideband when the device
is in MFP mode and only has one partition enabled, since we still have
plenty of interrupts for managing the flow director activity.  Also
cleaned up flow director ATR control in debugfs since the priv flag
has been added to our ethtool interface.  Makes several general code
cleanups of redundant or unnecessary code for i40e.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e109a691

ixgbe: Check for setup_internal_link method · a85ce532

Mark Rustad authored Sep 09, 2015

Only call the setup_internal_link method when it is provided. This
check is required for newer version parts.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

a85ce532
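
Guarding an optional method pointer follows a common pattern, sketched here with hypothetical types. `struct hw`, `struct phy_ops`, `hw_setup_link()` and `demo_setup_internal_link()` are illustrative and do not reproduce the ixgbe structures.

```c
#include <assert.h>
#include <stddef.h>

struct hw;

struct phy_ops {
	/* Optional: parts that do not need it leave this NULL. */
	int (*setup_internal_link)(struct hw *hw);
};

struct hw {
	struct phy_ops ops;
	int internal_link_up;
};

static int demo_setup_internal_link(struct hw *hw)
{
	hw->internal_link_up = 1;
	return 0;
}

/* Call the method only when the part provides one. */
static int hw_setup_link(struct hw *hw)
{
	assert(hw);

	if (hw->ops.setup_internal_link)
		return hw->ops.setup_internal_link(hw);
	return 0;
}
```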

i40e/i40evf: Bump i40e version to 1.3.28 and i40evf to 1.3.19 · 164f7393

Catherine Sullivan authored Sep 03, 2015

Bump.

Change-ID: I8d9a99f320af43960deba8718eee2d6de50eaf46
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

164f7393

i40evf: speed up init · 5be8308b

Mitch Williams authored Sep 03, 2015

Shorten up the delays in the init task, allowing the VF driver to
initialize faster. This aids performance in load/unload tests and
mitigates DMAR errors in VF enable/disable tests with absurdly short
delays. In the real world, the VF driver will come up more quickly.

The original values were set conservatively based on what we expected
from the firmware in terms of performance. Now that the driver is in use
and we know how well firmware responds to our requests, we can shorten
these delays.

Change-ID: Ibead77d34b19e8170e667c3f58bc14748bbc5bc9
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

5be8308b

i40e: remove unnecessary string copy operations · a9165490

Shannon Nelson authored Sep 03, 2015

Save a little stack space by replacing unnecessary strncpy() calls with
a simple string pointer.

Change-ID: Id2719d34710bfc273d3bb445fec085cd04276e88
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

a9165490
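
The kind of change described can be illustrated with a before/after pair. Both functions are hypothetical examples written for this note, not the i40e code; the message text and speed values are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Before: a stack buffer plus a copy, only to hold a constant string. */
static int link_msg_copy(char *out, size_t len, int gbps)
{
	char speed[16] = "Unknown";

	assert(out);
	if (gbps == 40)
		strncpy(speed, "40 Gbps", sizeof(speed) - 1);
	else if (gbps == 10)
		strncpy(speed, "10 Gbps", sizeof(speed) - 1);
	return snprintf(out, len, "Link is Up %s", speed);
}

/* After: point at the constant string; no buffer and no copy. */
static int link_msg_ptr(char *out, size_t len, int gbps)
{
	const char *speed = "Unknown";

	assert(out);
	if (gbps == 40)
		speed = "40 Gbps";
	else if (gbps == 10)
		speed = "10 Gbps";
	return snprintf(out, len, "Link is Up %s", speed);
}
```

Both variants format the same message; the second one simply avoids the 16-byte stack buffer and the copy into it.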

i40e: X722 is on the IOSF bus and does not report the PCI bus info · 3fced535

Anjali Singhai Jain authored Sep 03, 2015

X722 will report Gen 1x1 in the PCI config space as it is on
IOSF bus, so skip the PCI bus link/speed check.

Change-ID: Icd5f5751dc7fb00dccf0d5dc5a0a644948e7062e
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

3fced535

i40e: Store off PHY capabilities · 3ac67d7b

Kevin Scott authored Sep 03, 2015

Store off reported PHY capabilities in link_info structure.

Change-ID: Ife0f037c26983ca985dbf79abf33f8f8791369e8
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

3ac67d7b