Commits · 77cf13ad57315736bcd79f3cff9b2284218e0be6 · Kirill Smelkov / linux

29 Dec, 2016 1 commit

ath10k: fix IRAM banks number for QCA9377 · 77cf13ad

Bartosz Markowski authored Dec 15, 2016

QCA9377 firmware shall alloc 4 IRAM banks
Signed-off-by: Bartosz Markowski <bartosz.markowski@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

77cf13ad

15 Dec, 2016 10 commits

ath10k: Avoid potential page alloc BUG_ON in tx free path · 02a9e08d

Mohammed Shafi Shajakhan authored Dec 07, 2016

'ath10k_htt_tx_free_cont_txbuf' and 'ath10k_htt_tx_free_cont_frag_desc'
have NULL pointer checks to avoid crash if they are called twice
but this is as of now not sufficient as these pointers are not assigned
to NULL once the contiguous DMA memory allocation is freed, fix this.
Though this may not be hit with the explicity check of state variable
'tx_mem_allocated' check, good to have this addressed as well.

Below BUG_ON is hit when the above scenario is simulated
with kernel debugging enabled

 page:f6d09a00 count:0 mapcount:-127 mapping:  (null)
index:0x0
 flags: 0x40000000()
 page dumped because: VM_BUG_ON_PAGE(page_ref_count(page)
== 0)
 ------------[ cut here ]------------
 kernel BUG at ./include/linux/mm.h:445!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
 EIP is at put_page_testzero.part.88+0xd/0xf
 Call Trace:
  [<c118a2cc>] __free_pages+0x3c/0x40
  [<c118a30e>] free_pages+0x3e/0x50
  [<c10222b4>] dma_generic_free_coherent+0x24/0x30
  [<f8c1d9a8>] ath10k_htt_tx_free_cont_txbuf+0xf8/0x140

  [<f8c1e2a9>] ath10k_htt_tx_destroy+0x29/0xa0

  [<f8c143e0>] ath10k_core_destroy+0x60/0x80 [ath10k_core]
  [<f8acd7e9>] ath10k_pci_remove+0x79/0xa0 [ath10k_pci]
  [<c13ed7a8>] pci_device_remove+0x38/0xb0
  [<c14d3492>] __device_release_driver+0x72/0x100
  [<c14d36b7>] driver_detach+0x97/0xa0
  [<c14d29c0>] bus_remove_driver+0x40/0x80
  [<c14d427a>] driver_unregister+0x2a/0x60
  [<c13ec768>] pci_unregister_driver+0x18/0x70
  [<f8aced4f>] ath10k_pci_exit+0xd/0x2be [ath10k_pci]
  [<c1101e78>] SyS_delete_module+0x158/0x210
  [<c11b34f1>] ? __might_fault+0x41/0xa0
  [<c11b353b>] ? __might_fault+0x8b/0xa0
  [<c1001a4b>] do_fast_syscall_32+0x9b/0x1c0
  [<c178da34>] sysenter_past_esp+0x45/0x74
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

02a9e08d

ath9k: Turn ath_txq_lock/unlock() into static inlines. · 5c4607eb

Toke Høiland-Jørgensen authored Dec 05, 2016

These are one-line functions that just call spin_lock/unlock_bh(); turn
them into static inlines to avoid the function call overhead.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

5c4607eb

ath9k: Introduce airtime fairness scheduling between stations · 63fefa05

Toke Høiland-Jørgensen authored Dec 05, 2016

This reworks the ath9k driver to schedule transmissions to connected
stations in a way that enforces airtime fairness between them. It
accomplishes this by measuring the time spent transmitting to or
receiving from a station at TX and RX completion, and accounting this to
a per-station, per-QoS level airtime deficit. Then, an FQ-CoDel based
deficit scheduler is employed at packet dequeue time, to control which
station gets the next transmission opportunity.

Airtime fairness can significantly improve the efficiency of the network
when station rates vary. The following throughput values are from a
simple three-station test scenario, where two stations operate at the
highest HT20 rate, and one station at the lowest, and the scheduler is
employed at the access point:

                  Before   /   After
Fast station 1:    19.17   /   25.09 Mbps
Fast station 2:    19.83   /   25.21 Mbps
Slow station:       2.58   /    1.77 Mbps
Total:             41.58   /   52.07 Mbps

The benefit of airtime fairness goes up the more stations are present.
In a 30-station test with one station artificially limited to 1 Mbps,
we have seen aggregate throughput go from 2.14 to 17.76 Mbps.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

63fefa05

ath9k: define all EEPROM fields in Little Endian format · 4bca5303

Martin Blumenstingl authored Dec 05, 2016

The ar9300_eeprom logic is already using only 8-bit (endian neutral),
__le16 and __le32 fields to state explicitly how the values should be
interpreted.
All other EEPROM implementations (4k, 9287 and def) were using u16 and
u32 fields with additional logic to swap the values (read from the
original EEPROM) so they match the current CPUs endianness.

The EEPROM format defaults to "all values are Little Endian", indicated
by the absence of the AR5416_EEPMISC_BIG_ENDIAN in the u8 EEPMISC
register. If we detect that the EEPROM indicates Big Endian mode
(AR5416_EEPMISC_BIG_ENDIAN is set in the EEPMISC register) then we'll
swap the values to convert them into Little Endian. This is done by
activating the EEPMISC based logic in ath9k_hw_nvram_swap_data even if
AH_NO_EEP_SWAP is set (this makes ath9k behave like the FreeBSD driver,
which also does not have a flag to enable swapping based on the
AR5416_EEPMISC_BIG_ENDIAN bit). Before this logic was only used to
enable swapping when "current CPU endianness != EEPROM endianness".

After changing all relevant fields to __le16 and __le32 sparse was used
to check that all code which reads any of these fields uses
le{16,32}_to_cpu.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

4bca5303

ath9k: Make the EEPROM swapping check use the eepmisc register · 68fbe792

Martin Blumenstingl authored Dec 05, 2016

There are two ways of swapping the EEPROM data in the ath9k driver:
1) swab16 based on the first two EEPROM "magic" bytes (same for all
   EEPROM formats)
2) field and EEPROM format specific swab16/swab32 (different for
   eeprom_def, eeprom_4k and eeprom_9287)

The result of the first check was used to also enable the second swap.
This behavior seems incorrect, since the data may only be byte-swapped
(afterwards the data could be in the correct endianness).
Thus we introduce a separate check based on the "eepmisc" register
(which is part of the EEPROM data). When bit 0 is set, then the EEPROM
format specific values are in "big endian". This is also done by the
FreeBSD kernel, see [0] for example.

This allows us to parse EEPROMs with the "correct" magic bytes but
swapped EEPROM format specific values. These EEPROMs (mostly found in
lantiq and broadcom based big endian MIPS based devices) only worked
due to platform specific "hacks" which swapped the EEPROM so the
magic was inverted, which also enabled the format specific swapping.
With this patch the old behavior is still supported, but neither
recommended nor needed anymore.

[0]
https://github.com/freebsd/freebsd/blob/50719b56d9ce8d7d4beb53b16e9edb2e9a4a7a18/sys/dev/ath/ath_hal/ah_eeprom_9287.c#L351Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

68fbe792

ath9k: consistently use get_eeprom_rev(ah) · 9bff7428

Martin Blumenstingl authored Dec 05, 2016

The AR5416_VER_MASK macro does the same as get_eeprom_rev, except that
one has to know the actual EEPROM type (and providing a reference to
that in a variable named "eep"). Additionally the eeprom_*.c
implementations used the same shifting logic multiple times to get the
eeprom revision which was also unnecessary duplication of
get_eeprom_rev.

Also use the AR5416_EEP_VER_MINOR_MASK macro where needed and introduce
a similar macro (AR5416_EEP_VER_MAJOR_MASK) for the major version.
Finally drop AR9287_EEP_VER_MINOR_MASK since it simply duplicates the
already defined AR5416_EEP_VER_MINOR_MASK.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

9bff7428

ath9k: replace eeprom_param EEP_MINOR_REV with get_eeprom_rev · 7d7dc538

Martin Blumenstingl authored Dec 05, 2016

get_eeprom(ah, EEP_MINOR_REV) and get_eeprom_rev(ah) are both doing the
same thing: returning the EEPROM revision (12 lowest bits). Make the
code consistent by using get_eeprom_rev(ah) everywhere.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

7d7dc538

ath9k: Add an eeprom_ops callback for retrieving the eepmisc value · d8ec2e2a

Martin Blumenstingl authored Dec 05, 2016

This allows deciding if we have to swap the EEPROM data (so it matches
the system's native endianness) even if no byte-swapping (swab16, based on
the first two bytes in the EEPROM) is needed.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

d8ec2e2a

ath9k: indicate that the AR9003 EEPROM template values are little endian · 291478b7

Martin Blumenstingl authored Dec 05, 2016

The eepMisc field was not set explicitly. The default value of 0 means
that the values in the EEPROM (template) should be interpreted as little
endian. However, this is not clear until comparing the AR9003 code with
the other EEPROM formats.
To make the code easier to understand we explicitly state that the values
are little endian - there are no functional changes with this patch.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

291478b7

ath9k: Add a #define for the EEPROM "eepmisc" endianness bit · 81a834e3

Martin Blumenstingl authored Dec 05, 2016

This replaces a magic number with a named #define. Additionally it
removes two "eeprom format" specific #defines for the "big endianness"
bit which are the same on all eeprom formats.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

81a834e3

05 Dec, 2016 2 commits

adm80211: add checks for dma mapping errors · d15697de

Alexey Khoroshilov authored Dec 03, 2016

The driver does not check if mapping dma memory succeed.
The patch adds the checks and failure handling.

Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

d15697de

mwifiex: clean up some messy indenting · e54a8c4b

Dan Carpenter authored Nov 30, 2016

These lines were indented one tab extra.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

e54a8c4b

04 Dec, 2016 26 commits

ipv6 addrconf: Implemented enhanced DAD (RFC7527) · adc176c5

Erik Nordmark authored Dec 02, 2016

Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
RFC7527 is enabled by default. Can be disabled by setting both of
conf/{all,interface}/enhanced_dad to zero.
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

adc176c5

Merge branch 'mv88e6390-batch-three' · ce84c7c6

David S. Miller authored Dec 03, 2016

Andrew Lunn says:

====================
mv88e6390 batch 3

More patches to support the MV88e6390. This is mostly refactoring
existing code and adding implementations for the mv88e6390.  This
patchset set which reserved frames are sent to the cpu, the size of
jumbo frames that will be accepted, turn off egress rate limiting, and
configuration of pause frames.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ce84c7c6

net: dsa: mv88e6xxx: Implement mv88e6390 pause control · 3ce0e65e

Andrew Lunn authored Dec 03, 2016

The mv88e6390 has a number flow control registers accessed via the
Flow Control register. Use these to set the pause control.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ce0e65e

net: dsa: mv88e6xxx: Refactor pause configuration · b35d322a

Andrew Lunn authored Dec 03, 2016

The mv88e6390 has a different mechanism for configuring pause.
Refactor the code into an ops function, and for the moment, don't add
any mv88e6390 code yet.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b35d322a

net: dsa: mv88e6xxx: Refactor egress rate limiting · ef70b111

Andrew Lunn authored Dec 03, 2016

There are two different rate limiting configurations, depending on the
switch generation. Refactor this into ops.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef70b111

net: dsa: mv88e6xxx: Refactor setting of jumbo frames · 5f436666

Andrew Lunn authored Dec 03, 2016

Some switches support jumbo frames. Refactor this code into operations
in the ops structure.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f436666

net: dsa: mv88e6xxx: Reserved Management frames to CPU · 6e55f698

Andrew Lunn authored Dec 03, 2016

Older devices have a couple of registers in global2. The mv88e6390
family has a single register in global1 behind which hides similar
configuration. Implement and op for this.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e55f698

Merge branch 'mv88e6390-batch-two' · 7a6c5cb9

David S. Miller authored Dec 03, 2016

Andrew Lunn says:

====================
MV88E6390 batch two

This is the second batch of patches adding support for the
MV88e6390. They are not sufficient to make it work properly.

The mv88e6390 has a much expanded set of priority maps. Refactor the
existing code, and implement basic support for the new device.

Similarly, the monitor control register has been reworked.

The mv88e6390 has something odd in its EDSA tagging implementation,
which means it is not possible to use it. So we need to use DSA
tagging. This is the first device with EDSA support where we need to
use DSA, and the code does not support this. So two patches refactor
the existing code. The two different register definitions are
separated out, and using DSA on an EDSA capable device is added.

v2:
Add port prefix
Add helper function for 6390
Add _IEEE_ into #defines
Split monitor_ctrl into a number of separate ops.
Remove 6390 code which is management, used in a later patch
s/EGREES/EGRESS/.
Broke up setup_port_dsa() and set_port_dsa() into a number of ops

v3:
Verify mandatory ops for port setup
Don't set ether type for DSA port.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7a6c5cb9

net: dsa: mv88e6xxx: Refactor CPU and DSA port setup · 56995cbc

Andrew Lunn authored Dec 03, 2016

Older chips only support DSA tagging. Newer chips have both DSA and
EDSA tagging. Refactor the code by adding port functions for setting the
frame mode, egress mode, and if to forward unknown frames.

This results in the helper mv88e6xxx_6065_family() becoming unused, so
remove it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v3:
Verify mandatory ops for port setup
Don't set ether type for DSA port.
Signed-off-by: David S. Miller <davem@davemloft.net>

56995cbc

net: dsa: mv88e6xxx: Move the tagging protocol into info · 443d5a1b

Andrew Lunn authored Dec 03, 2016

Older chips support a single tagging protocol, DSA. New chips support
both DSA and EDSA, an enhanced version. Having both as an option
changes the register layouts. Up until now, it has been assumed that
if EDSA is supported, it will be used. Hence the register layout has
been determined by which protocol should be used. However, mv88e6390
has a different implementation of EDSA, which requires we need to use
the DSA tagging. Hence separate the selection of the protocol from the
register layout.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

443d5a1b

net: dsa: mv88e6xxx: Monitor and Management tables · 33641994

Andrew Lunn authored Dec 03, 2016

The mv88e6390 changes the monitor control register into the Monitor
and Management control, which is an indirection register to various
registers.

Add ops to set the CPU port and the ingress/egress port for both
register layouts, to global1
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

33641994

net: dsa: mv88e6xxx: Implement mv88e6390 tag remap · ef0a7318

Andrew Lunn authored Dec 03, 2016

The mv88e6390 does not have the two registers to set the frame
priority map. Instead it has an indirection registers for setting a
number of different priority maps. Refactor the old code into an
function, implement the mv88e6390 version, and use an op to call the
right one.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef0a7318

Merge branch 'fib-notifier-event-replay' · 69248719

David S. Miller authored Dec 03, 2016

Jiri Pirko says:

====================
ipv4: fib: Replay events when registering FIB notifier

Ido says:

In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced
by a new FIB notification chain to which modules could register in order
to be notified about the addition and deletion of FIB entries. The
motivation for this change was that switchdev drivers need to be able to
reflect the entire FIB table and not only FIBs configured on top of the
port netdevs themselves. This is useful in case of in-band management.

The fundamental problem with this approach is that upon registration
listeners lose all the information previously sent in the chain and
thus have an incomplete view of the FIB tables, which can result in
packet loss. This patchset fixes that by dumping the FIB tables and
replaying notifications previously sent in the chain for the registered
notification block.

The entire dump process is done under RCU and thus the FIB notification
chain is converted to be atomic. The listeners are modified accordingly.
This is done in the first eight patches.

The ninth patch adds a change sequence counter to ensure the integrity
of the FIB dump. The last patch adds the dump itself to the FIB chain
registration function and modifies existing listeners to pass a callback
to be executed in case dump was inconsistent.

---
v3->v4:
- Register the notification block after the dump and protect it using
  the change sequence counter (Hannes Frederic Sowa).
- Since we now integrate the dump into the registration function, drop
  the sysctl to set maximum number of retries and instead set it to a
  fixed number. Lets see if it's really a problem before adding something
  we can never remove.
- For the same reason, dump FIB tables for all net namespaces.
- Add a comment regarding guarantees provided by mutex semantics.

v2->v3:
- Add sysctl to set the number of FIB dump retries (Hannes Frederic Sowa).
- Read the sequence counter under RTNL to ensure synchronization
  between the dump process and other processes changing the routing
  tables (Hannes Frederic Sowa).
- Pass a callback to the dump function to be executed prior to a retry.
- Limit the dump to a single net namespace.

v1->v2:
- Add a sequence counter to ensure the integrity of the FIB dump
  (David S. Miller, Hannes Frederic Sowa).
- Protect notifications from re-ordering in listeners by using an
  ordered workqueue (Hannes Frederic Sowa).
- Introduce fib_info_hold() (Jiri Pirko).
- Relieve rocker from the need to invoke the FIB dump by registering
  to the FIB notification chain prior to ports creation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

69248719

ipv4: fib: Replay events when registering FIB notifier · c3852ef7

Ido Schimmel authored Dec 03, 2016

Commit b90eb754 ("fib: introduce FIB notification infrastructure")
introduced a new notification chain to notify listeners (f.e., switchdev
drivers) about addition and deletion of routes.

However, upon registration to the chain the FIB tables can already be
populated, which means potential listeners will have an incomplete view
of the tables.

Solve that by dumping the FIB tables and replaying the events to the
passed notification block. The dump itself is done using RCU in order
not to starve consumers that need RTNL to make progress.

The integrity of the dump is ensured by reading the FIB change sequence
counter before and after the dump under RTNL. This allows us to avoid
the problematic situation in which the dumping process sends a ENTRY_ADD
notification following ENTRY_DEL generated by another process holding
RTNL.

Callers of the registration function may pass a callback that is
executed in case the dump was inconsistent with current FIB tables.

The number of retries until a consistent dump is achieved is set to a
fixed number to prevent callers from looping for long periods of time.
In case current limit proves to be problematic in the future, it can be
easily converted to be configurable using a sysctl.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3852ef7

ipv4: fib: Allow for consistent FIB dumping · cacaad11

Ido Schimmel authored Dec 03, 2016

The next patch will enable listeners of the FIB notification chain to
request a dump of the FIB tables. However, since RTNL isn't taken during
the dump, it's possible for the FIB tables to change mid-dump, which
will result in inconsistency between the listener's table and the
kernel's.

Allow listeners to know about changes that occurred mid-dump, by adding
a change sequence counter to each net namespace. The counter is
incremented just before a notification is sent in the FIB chain.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cacaad11

ipv4: fib: Convert FIB notification chain to be atomic · d3f706f6

Ido Schimmel authored Dec 03, 2016

In order not to hold RTNL for long periods of time we're going to dump
the FIB tables using RCU.

Convert the FIB notification chain to be atomic, as we can't block in
RCU critical sections.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3f706f6

rocker: Register FIB notifier before creating ports · 17f8be7d

Ido Schimmel authored Dec 03, 2016

We can miss FIB notifications sent between the time the ports were
created and the FIB notification block registered.

Instead of receiving these notifications only when they are replayed for
the FIB notification block during registration, just register the
notification block before the ports are created.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17f8be7d

rocker: Implement FIB offload in deferred work · db701955

Ido Schimmel authored Dec 03, 2016

Convert rocker to offload FIBs in deferred work in a similar fashion to
mlxsw, which was converted in the previous commits.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db701955

rocker: Create an ordered workqueue for FIB offload · c1bb279c

Ido Schimmel authored Dec 03, 2016

As explained in the previous commits, we need to process FIB entries
addition / deletion events in FIFO order or otherwise we can have a
mismatch between the kernel's FIB table and the device's.

Create an ordered workqueue for rocker to which these work items will be
submitted to.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1bb279c

mlxsw: spectrum_router: Implement FIB offload in deferred work · 3057224e

Ido Schimmel authored Dec 03, 2016

FIB offload is currently done in process context with RTNL held, but
we're about to dump the FIB tables in RCU critical section, so we can no
longer sleep.

Instead, defer the operation to process context using deferred work. Make
sure fib info isn't freed while the work is queued by taking a reference
on it and releasing it after the operation is done.

Deferring the operation is valid because the upper layers always assume
the operation was successful. If it's not, then the driver-specific
abort mechanism is called and all routed traffic is directed to slow
path.

The work items are submitted to an ordered workqueue to prevent a
mismatch between the kernel's FIB table and the device's.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3057224e

mlxsw: core: Create an ordered workqueue for FIB offload · a3832b31

Ido Schimmel authored Dec 03, 2016

We're going to start processing FIB entries addition / deletion events
in deferred work. These work items must be processed in the order they
were submitted or otherwise we can have differences between the kernel's
FIB table and the device's.

Solve this by creating an ordered workqueue to which these work items
will be submitted to. Note that we can't simply convert the current
workqueue to be ordered, as EMADs re-transmissions are also processed in
deferred work.

Later on, we can migrate other work items to this workqueue, such as FDB
notification processing and nexthop resolution, since they all take the
same lock anyway.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3832b31

ipv4: fib: Add fib_info_hold() helper · 1c677b3d

Ido Schimmel authored Dec 03, 2016

As explained in the previous commit, modules are going to need to take a
reference on fib info and then drop it using fib_info_put().

Add the fib_info_hold() helper to make the code more readable and also
symmetric with fib_info_put().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c677b3d

ipv4: fib: Export free_fib_info() · b423cb10

Ido Schimmel authored Dec 03, 2016

The FIB notification chain is going to be converted to an atomic chain,
which means switchdev drivers will have to offload FIB entries in
deferred work, as hardware operations entail sleeping.

However, while the work is queued fib info might be freed, so a
reference must be taken. To release the reference (and potentially free
the fib info) fib_info_put() will be called, which in turn calls
free_fib_info().

Export free_fib_info() so that modules will be able to invoke
fib_info_put().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b423cb10

act_mirred: fix a typo in get_dev · 548ed722

WANG Cong authored Dec 03, 2016

Fixes: 255cb304 ("net/sched: act_mirred: Add new tc_action_ops get_dev()")
Cc: Hadar Hen Zion <hadarh@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

548ed722

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · db7e9f7c

David S. Miller authored Dec 03, 2016

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-12-02

This series contains updates to i40e and i40evf only.

Alex provides changes so that we are much more robust about defining what
we can and cannot offload in i40e and i40evf by doing additional checks
other than L4 tunnel header length.

Jake provides several fixes/changes, first cleaning up a label that is
unnecessary, as well as cleaned up the use of a "magic number".  Clarified
the code by separating the global private flags and the regular private
flags per interface into two arrays, so that future additions will not
produce duplication and buggy code.  Adds additional checks to protect
against NULL values for msix_entries and q_vectors pointers.

Michal adds Clause22 method for accessing registers for some external
PHYs.

Piotr adds additional protocol support for the admin queue discover
capabilities function.

Tushar Dave fixes a panic seen on SPARC, where writel() should not be
used to write directly to a memory address but only to a memory mapped
I/O address otherwise it causes data access exceptions.

Joe Perches separates out a section of code into its own function, to
help reduce i40evf_reset_task() a bit.

Alan fixes an issue by checking for NULL before dereferencing msix_entries
and returning early in the case where it is NULL within the i40evf_close()
code path.

Henry provides code cleanup to remove unreachable and redundant sections
of code.  Fixed up an issue where new NICs were not identifying "unknown
PHYs" correctly.

Harshitha fixes a issue where the ethtool "Supported Link" modes list
backplane interfaces on X722 devices for 10 GbE with SFP+ and Cortina
retimer, where these interfaces should not be visible to the user since
they cannot use them.

Carolyn changes an X722 informational message so that it only appears
when extra messages are desired.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

db7e9f7c

tcp: fix the missing avr32 SOF_TIMESTAMPING_OPT_STATS · 2bb14878

Yuchung Cheng authored Dec 03, 2016

The commit of SOF_TIMESTAMPING_OPT_STATS didn't include the
new header for avr32, causing build to break. The patch fixes it.

Fixes: 1c885808 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2bb14878

03 Dec, 2016 1 commit

udp: be less conservative with sock rmem accounting · 363dc73a

Paolo Abeni authored Dec 02, 2016

Before commit 850cbadd ("udp: use it's own memory accounting
schema"), the udp protocol allowed sk_rmem_alloc to grow beyond
the rcvbuf by the whole current packet's truesize. After said commit
we allow sk_rmem_alloc to exceed the rcvbuf only if the receive queue
is empty. As reported by Jesper this cause a performance regression
for some (small) values of rcvbuf.

This commit is intended to fix the regression restoring the old
handling of the rcvbuf limit.
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Fixes: 850cbadd ("udp: use it's own memory accounting schema")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

363dc73a