Commits · 3b7ad5ece49fe72498b50d28253fd39daead1e14 · nexedi / linux

20 Nov, 2015 4 commits

mlxsw: spectrum: Unify setting of HW VLAN filters · 3b7ad5ec

Ido Schimmel authored Nov 19, 2015

When adding or deleting VLANs from a bridged port, HW VLAN filters must be
set accordingly. Instead of having the same code in both add and delete
functions, just wrap it in a function and call it with the appropriate
parameters.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b7ad5ec

mlxsw: spectrum: Use correct PVID value when removing VLANs · 06c071f6

Ido Schimmel authored Nov 19, 2015

When removing a range of VLANs in which PVID is a member we should use
the correct PVID value instead of some VLAN in the range.

Also, change two print statements to use 'dev' instead of
'mlxsw_sp_port->dev', as it's already used in other print statements in
the function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06c071f6

bpf: add show_fdinfo handler for maps · f99bf205

Daniel Borkmann authored Nov 19, 2015

Add a handler for show_fdinfo() to be used by the anon-inodes
backend for eBPF maps, and dump the map specification there. Not
only useful for admins, but also it provides a minimal way to
compare specs from ELF vs pinned object.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f99bf205

net: encx24j600: move rev announcement to probe function · 7b5dc0dd

Jon Ringle authored Nov 18, 2015

When encx24j600 is open and closed many times due to userspace polling the
interface, the log gets noise with this log message.

Moving this to encx24j600_spi_probe function where it belongs.
Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b5dc0dd

18 Nov, 2015 20 commits

Merge branch 'net-generic-busy-polling' · 85c72ba1

David S. Miller authored Nov 18, 2015

Eric Dumazet says:

====================
net: extend busy polling support

This patch series extends busy polling range to tunnels devices,
and adds busy polling generic support to all NAPI drivers.

No need to provide ndo_busy_poll() method and extra synchronization
between ndo_busy_poll() and normal napi->poll() method.
This was proven very difficult and bug prone.

mlx5 driver is changed to support busy polling using this new method,
and a second mlx5 patch adds napi_complete_done() support and proper
SNMP accounting.

bnx2x and mlx4 drivers are converted to new infrastructure,
reducing kernel bloat and improving performance.

Latest patch, adding generic support, adds a new requirement :

 -free_netdev() and netif_napi_del() must be called from process context.

Since this might not be the case in some drivers, we might have to
either : fix the non conformant drivers (by disabling busy polling on them)
or revert this last patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

85c72ba1

net: provide generic busy polling to all NAPI drivers · 93d05d4a

Eric Dumazet authored Nov 18, 2015

NAPI drivers no longer need to observe a particular protocol
to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)

napi_hash_add() and napi_hash_del() are automatically called
from core networking stack, respectively from
netif_napi_add() and netif_napi_del()

This patch depends on free_netdev() and netif_napi_del() being
called from process context, which seems to be the norm.

Drivers might still prefer to call napi_hash_del() on their
own, since they might combine all the rcu grace periods into
a single one, knowing their NAPI structures lifetime, while
core networking stack has no idea of a possible combining.

Once this patch proves to not bring serious regressions,
we will cleanup drivers to either remove napi_hash_del()
or provide appropriate rcu grace periods combining.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93d05d4a

net: napi_hash_del() returns a boolean status · 34cbe27e

Eric Dumazet authored Nov 18, 2015

napi_hash_del() will soon be used from both drivers (if they want)
or core networking stack.

Callers are responsibles to ensure an RCU grace period is respected
before freeing napi structure : napi_hash_del() can signal if
this RCU grace period is needed or not.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34cbe27e

net: move napi_hash[] into read mostly section · 6180d9de

Eric Dumazet authored Nov 18, 2015

We do not often add/delete a napi context.
Moving napi_hash[] into read mostly section avoids potential false sharing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6180d9de

net: add netif_tx_napi_add() · d64b5e85

Eric Dumazet authored Nov 18, 2015

netif_tx_napi_add() is a variant of netif_napi_add()

It should be used by drivers that use a napi structure
to exclusively poll TX.

We do not want to add this kind of napi in napi_hash[] in following
patches, adding generic busy polling to all NAPI drivers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d64b5e85

net: move skb_mark_napi_id() into core networking stack · 93f93a44

Eric Dumazet authored Nov 18, 2015

We would like to automatically provide busy polling support
to all NAPI drivers, without them having to implement anything.

skb_mark_napi_id() can be called from napi_gro_receive() and
napi_get_frags().

Few drivers are still calling skb_mark_napi_id() because
they use netif_receive_skb(). They should eventually call
napi_gro_receive() instead. I will leave this to drivers
maintainers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93f93a44

mlx4: remove mlx4_en_low_latency_recv() · 868fdb06

Eric Dumazet authored Nov 18, 2015

Busy polling can now be handled in generic NAPI poll infrastructure.
This removes complexity and fast path overhead :

mlx4 used two spin_lock()/spin_unlock() pair per napi->poll() call
in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi()

Tested:

Without busy polling :

lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    47330.78

With busy polling :

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97643.55
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

868fdb06

bnx2x: remove bnx2x_low_latency_recv() support · b59768c6

Eric Dumazet authored Nov 18, 2015

Switch to native NAPI polling, as this reduces overhead and complexity.

Normal path is faster, since one cmpxchg() is not anymore requested,
and busy polling with the NAPI polling has same performance.

Tested:
lpk50:~# cat /proc/sys/net/core/busy_read
70
lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    40095.07
16384  87380
IpInReceives                    401062             0.0
IpInDelivers                    401062             0.0
IpOutRequests                   401079             0.0
TcpActiveOpens                  7                  0.0
TcpPassiveOpens                 3                  0.0
TcpAttemptFails                 3                  0.0
TcpEstabResets                  5                  0.0
TcpInSegs                       401036             0.0
TcpOutSegs                      401052             0.0
TcpOutRsts                      38                 0.0
UdpInDatagrams                  26                 0.0
UdpOutDatagrams                 27                 0.0
Ip6OutNoRoutes                  1                  0.0
TcpExtDelayedACKs               1                  0.0
TcpExtTCPPrequeued              98                 0.0
TcpExtTCPDirectCopyFromPrequeue 98                 0.0
TcpExtTCPHPHits                 4                  0.0
TcpExtTCPHPHitsToUser           98                 0.0
TcpExtTCPPureAcks               5                  0.0
TcpExtTCPHPAcks                 101                0.0
TcpExtTCPAbortOnData            6                  0.0
TcpExtBusyPollRxPackets         400832             0.0
TcpExtTCPOrigDataSent           400983             0.0
IpExtInOctets                   21273867           0.0
IpExtOutOctets                  21261254           0.0
IpExtInNoECTPkts                401064             0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b59768c6

mlx5: support napi_complete_done() · 44fb6fbb

Eric Dumazet authored Nov 18, 2015

A NAPI poll handler should return number of RX packets processed,
instead of 0 / budget.

This allows proper busy poll accounting through LINUX_MIB_BUSYPOLLRXPACKETS
SNMP counter.

napi_complete_done() allows /sys/class/net/ethX/gro_flush_timeout
to be used for finer GRO aggregation control.

Tested:

Enabled busy polling, and checked TcpExtBusyPollRxPackets counter is increasing.

echo 70 >/proc/sys/net/core/busy_read
nstat >/dev/null
netperf -H target -t TCP_RR >/dev/null
nstat | grep TcpExtBusyPollRxPackets
TcpExtBusyPollRxPackets         490958             0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44fb6fbb

mlx5: add busy polling support · 7ae92ae5

Eric Dumazet authored Nov 18, 2015

It is now easy to add busy polling support to a NAPI driver,
with very little impact on normal input path.

This patch serves as a reference implementation.

Note:

A followup patch will add proper napi_complete_done() in mlx5,
so that LINUX_MIB_BUSYPOLLRXPACKETS snmp counter is properly handled.

Tested:

Normal TCP_RR results without busy polling :

lpk51:~# echo 0 >/proc/sys/net/core/busy_read
lpk52:~# echo 0 >/proc/sys/net/core/busy_read

lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    53509.49
16384  87380

Now enable busy polling :

lpk51:~# echo 70 >/proc/sys/net/core/busy_read
lpk52:~# echo 70 >/proc/sys/net/core/busy_read

lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97530.92
16384  87380
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ae92ae5

net: network drivers no longer need to implement ndo_busy_poll() · ce6aea93

Eric Dumazet authored Nov 18, 2015

Instead of having to implement complex ndo_busy_poll() method,
drivers can simply rely on NAPI poll logic.

Busy polling gains are mainly coming from polling itself,
not on exact details on how we poll the device.

ndo_busy_poll() if implemented can avoid touching
napi state, but it adds extra synchronization between
normal napi->poll() and busy poll handler, slowing down
the common path (non busy polling) with extra atomic operations.
In practice few drivers ever got busy poll because of the complexity.

We could go one step further, and make busy polling
available for all NAPI drivers, but this would require
that all netif_napi_del() calls are done in process context
so that we can call synchronize_rcu().
Full audit would be required.

Before this is done, a driver still needs to call :

- skb_mark_napi_id() for each skb provided to the stack.
- napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
- Make sure RCU grace period is respected after napi_hash_del() before
  memory containing napi structure is freed.

Followup patch implements busy poll for mlx5 driver as an example.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce6aea93

net: allow BH servicing in sk_busy_loop() · 2a028ecb

Eric Dumazet authored Nov 18, 2015

Instead of blocking BH in whole sk_busy_loop(), block them
only around ->ndo_busy_poll() calls.

This has many benefits.

1) allow tunneled traffic to use busy poll as well as native traffic.
   Tunnels handlers usually call netif_rx() and depend on net_rx_action()
   being run (from sofirq handler)

2) allow RFS/RPS being used (sending IPI to other cpus if needed)

3) use the 'lets burn cpu cycles' budget to do useful work
   (like TX completions, timers, RCU callbacks...)

4) reduce BH latencies, making busy poll a better citizen.

Tested:

Tested with SIT tunnel

lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    37373.93
16384  87380

Now enable busy poll on both hosts

lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
lpaa6:~# echo 70 >/proc/sys/net/core/busy_read

lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    58314.77
16384  87380
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a028ecb

net: un-inline sk_busy_loop() · 02d62e86

Eric Dumazet authored Nov 18, 2015

There is really little gain from inlining this big function.
We'll soon make it even bigger in following patches.

This means we no longer need to export napi_by_id()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02d62e86

mlx4: mlx4_en_low_latency_recv() called with BH disabled · 5865316c

Eric Dumazet authored Nov 18, 2015

mlx4_en_low_latency_recv() is called with BH disabled,
as other ndo_busy_poll() methods.

No need for spin_lock_bh()/spin_unlock_bh()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5865316c

net: better skb->sender_cpu and skb->napi_id cohabitation · 52bd2d62

Eric Dumazet authored Nov 18, 2015

skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.

We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()

As suggested by Alexei, we could split the space so that
these errors can not happen.

0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.

This will allow proper busy polling support over tunnels.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

52bd2d62

be2net: remove local variable 'status' · d37b4c0a

Ivan Vecera authored Nov 18, 2015

The lancer_cmd_get_file_len() uses lancer_cmd_read_object() to get
the current size of registers for ethtool registers dump. Returned status
value is stored but not checked. The check itself is not necessary as
the data_read output variable is initialized to 0 and status variable
can be removed.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d37b4c0a

net: hisilicon: fix binding document of mdio · 6fbaa570

huangdaode authored Nov 18, 2015

This patch explains the occasion of "hisilcon,mdio" and
"hisilicon,hns-mdio" according to Arnd's comments.
and reformat it according to comments from Rob<robh@kernel.org>.
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fbaa570

net ipv4: use preferred log methods · 09605cc1

Bastian Stender authored Nov 13, 2015

Replace printk calls with preferred unconditional log method calls to keep
kernel messages clean.

Added newline to "too small MTU" message.
Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

09605cc1

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 34258a32

Linus Torvalds authored Nov 18, 2015

Pull s390 fixes from Martin Schwidefsky:
 "Assorted bug fixes, the mlock2 system call gets added, and one
  improvement.  The boot from dasd devices is now possible from a wider
  range of devices"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: remove SALIPL loader
  s390: wire up mlock2 system call
  s390: remove g5 elf platform support
  s390: avoid cache aliasing under z/VM and KVM
  s390/sclp: _sclp_wait_int(): retain full PSW mask
  s390/zcrypt: Fix initialisation when zcrypt is built-in
  s390/zcrypt: Fix kernel crash on systems without AP bus support
  s390: add support for ipl devices in subchannel sets > 0
  s390/ipl: fix out of bounds access in scpdata_write
  s390/pci_dma: improve debugging of errors during dma map
  s390/pci_dma: handle dma table failures
  s390/pci_dma: unify label of invalid translation table entries
  s390/syscalls: remove system call number calculation
  s390/cio: simplify css_generate_pgid
  s390/diag: add a s390 prefix to the diagnose trace point
  s390/head: fix error message on unsupported hardware

34258a32

Merge tag 'hwmon-for-linus-v4.4-rc2' of... · 0d77a123

Linus Torvalds authored Nov 18, 2015

Merge tag 'hwmon-for-linus-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fix build issues in scpi and ina2xx drivers, update scpi driver to
  support recent firmware, and fix an uninitialized variable warning in
  applesmc driver"

* tag 'hwmon-for-linus-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (scpi) skip unsupported sensors properly
  hwmon: (scpi) add thermal-of dependency
  hwmon : (applesmc) Fix uninitialized variables warnings
  hwmon: (ina2xx) Fix build issue by selecting REGMAP_I2C

0d77a123

17 Nov, 2015 16 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7f151f1d

Linus Torvalds authored Nov 17, 2015

Pull networking fixes from David Miller:

 1) Fix list tests in netfilter ingress support, from Florian Westphal.

 2) Fix reversal of input and output interfaces in ingress hook
    invocation, from Pablo Neira Ayuso.

 3) We have a use after free in r8169, caught by Dave Jones, fixed by
    Francois Romieu.

 4) Splice use-after-free fix in AF_UNIX frmo Hannes Frederic Sowa.

 5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
    a) Don't create clone routes not managed by the fib6 tree
    b) Don't forget to check expiration of DST_NOCACHE routes.
    c) Handle rt->dst.from == NULL properly.

 6) Several AF_PACKET fixes wrt transport header setting and SKB
    protocol setting, from Daniel Borkmann.

 7) Fix thunder driver crash on shutdown, from Pavel Fedin.

 8) Several Mellanox driver fixes (max MTU calculations, use of correct
    DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
    Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.

 9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
    certain registers, etc.) from Neil Armstrong.

10) Make sure to disable preemption while updating per-cpu stats of ip
    tunnels, from Jason A.  Donenfeld.

11) Various ARM64 bpf JIT fixes, from Yang Shi.

12) Flush icache properly in ARM JITs, from Daniel Borkmann.

13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
    Nagai.

14) Fix netdev feature propagation for devices not implementing
    ->ndo_set_features().  From Nikolay Aleksandrov.

15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.

16) RAW socket code increments incorrect SNMP counters, fix from Ben
    Cartwright-Cox.

17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.

18) Fix handling of VLAN headers on stacked devices when REORDER is
    disabled.  From Vlad Yasevich.

19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
    Sabrina Dubroca.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
  MAINTAINERS: Update Mellanox's Eth NIC driver entries
  net/core: revert "net: fix __netdev_update_features return.." and add comment
  af_unix: take receive queue lock while appending new skb
  rtnetlink: fix frame size warning in rtnl_fill_ifinfo
  net: use skb_clone to avoid alloc_pages failure.
  packet: Use PAGE_ALIGNED macro
  packet: Don't check frames_per_block against negative values
  net: phy: Use interrupts when available in NOLINK state
  phy: marvell: Add support for 88E1540 PHY
  arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
  macvlan: fix leak in macvlan_handle_frame
  ipvlan: fix use after free of skb
  ipvlan: fix leak in ipvlan_rcv_frame
  vlan: Do not put vlan headers back on bridge and macvlan ports
  vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
  via-velocity: unconditionally drop frames with bad l2 length
  ipg: Remove ipg driver
  dl2k: Add support for IP1000A-based cards
  snmp: Remove duplicate OUTMCAST stat increment
  net: thunder: Check for driver data in nicvf_remove()
  ...

7f151f1d

MAINTAINERS: Update Mellanox's Eth NIC driver entries · e7523a49

Or Gerlitz authored Nov 17, 2015

Eugenia (Jenny) Emantayev is replacing Amir Vadai as the
mlx4 Ethernet driver maintainer.

Saeed Mahameed is assigned to maintain mlx5 Eth functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7523a49

net/core: revert "net: fix __netdev_update_features return.." and add comment · 17b85d29

Nikolay Aleksandrov authored Nov 17, 2015

This reverts commit 00ee5927 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.

CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17b85d29

af_unix: take receive queue lock while appending new skb · a3a116e0

Hannes Frederic Sowa authored Nov 17, 2015

While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.

For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.

Fixes: 869e7c62 ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3a116e0

rtnetlink: fix frame size warning in rtnl_fill_ifinfo · b22b941b

Hannes Frederic Sowa authored Nov 17, 2015

Fix the following warning:

  CC      net/core/rtnetlink.o
net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^
by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we
don't have the huge frame allocations at the same time.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b22b941b

net: use skb_clone to avoid alloc_pages failure. · 19125c1a

Martin Zhang authored Nov 17, 2015

1. new skb only need dst and ip address(v4 or v6).
2. skb_copy may need high order pages, which is very rare on long running server.
Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com>
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19125c1a

packet: Use PAGE_ALIGNED macro · 90836b67

Tobias Klauser authored Nov 17, 2015

Use PAGE_ALIGNED(...) instead of open-coding it.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

90836b67

packet: Don't check frames_per_block against negative values · 4194b491

Tobias Klauser authored Nov 17, 2015

rb->frames_per_block is an unsigned int, thus can never be negative.

Also fix spacing in the calculation of frames_per_block.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

4194b491

net: phy: Use interrupts when available in NOLINK state · 321beec5

Andrew Lunn authored Nov 16, 2015

The NOLINK state will poll the phy once a second to see if the link
has come up. If the phy has an interrupt line, this polling can be
skipped, since the phy should interrupt when the link returns.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

321beec5

phy: marvell: Add support for 88E1540 PHY · 819ec8e1

Andrew Lunn authored Nov 16, 2015

The 88E1540 can be found embedded in the Marvell 88E6352 switch.  It
is compatible with the 88E1510, so add support for it, using the
88E1510 specific functions.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

819ec8e1

arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS · ec0738db

Yang Shi authored Nov 16, 2015

Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
in prologue in order to get the correct stack backtrace.

However, ARM64 JIT used FP (x29) as eBPF fp register, FP is subjected to
change during function call so it may cause the BPF prog stack base address
change too.

Use x25 to replace FP as BPF stack base register (fp). Since x25 is callee
saved register, so it will keep intact during function call.
It is initialized in BPF prog prologue when BPF prog is started to run
everytime. Save and restore x25/x26 in BPF prologue and epilogue to keep
them intact for the outside of BPF. Actually, x26 is unnecessary, but SP
requires 16 bytes alignment.

So, the BPF stack layout looks like:

                                 high
         original A64_SP =>   0:+-----+ BPF prologue
                                |FP/LR|
         current A64_FP =>  -16:+-----+
                                | ... | callee saved registers
                                +-----+
                                |     | x25/x26
         BPF fp register => -80:+-----+
                                |     |
                                | ... | BPF prog stack
                                |     |
                                |     |
         current A64_SP =>      +-----+
                                |     |
                                | ... | Function call stack
                                |     |
                                +-----+
                                  low

CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec0738db

macvlan: fix leak in macvlan_handle_frame · e639b8d8

Sabrina Dubroca authored Nov 16, 2015

Reset pskb in macvlan_handle_frame in case skb_share_check returned a
clone.

Fixes: 8a4eb573 ("net: introduce rx_handler results and logic around that")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e639b8d8

ipvlan: fix use after free of skb · a534dc52

Sabrina Dubroca authored Nov 16, 2015

ipvlan_handle_frame is a rx_handler, and when it returns a value other
than RX_HANDLER_CONSUMED (here, NET_RX_DROP aka RX_HANDLER_ANOTHER),
__netif_receive_skb_core expects that the skb still exists and will
process it further, but we just freed it.

Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a534dc52

ipvlan: fix leak in ipvlan_rcv_frame · cf554ada

Sabrina Dubroca authored Nov 16, 2015

Pass a **skb to ipvlan_rcv_frame so that if skb_share_check returns a
new skb, we actually use it during further processing.

It's safe to ignore the new skb in the ipvlan_xmit_* functions, because
they call ipvlan_rcv_frame with local == true, so that dev_forward_skb
is called and always takes ownership of the skb.

Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf554ada

Merge branch 'vlan-reorder' · eb3f8b42

David S. Miller authored Nov 17, 2015

Vladislav Yasevich says:

====================
Fix issues with vlans without REORDER_HEADER

A while ago Phil Sutter brought up an issue with vlans without
REORDER_HEADER and bridges.  The problem was that if a vlan
without REORDER_HEADER was a port in the bridge, the bridge ended
up forwarding corrupted packets that still contained the vlan header.
The same issue exists for bridge mode macvlan/macvtap devices.

An additional issue with vlans without REORDER_HEADER is that stacking
them also doesn't work.  The reason here is that skb_reorder_vlan_header()
function assumes that it on ETH_HLEN bytes deep into the packet.  That
is not the case, when you a vlan without REORRDER_HEADER flag set.

This series attempts to correct these 2 issues.

1) To solve the stacked vlans problem, the patch simply use
skb->mac_len as an offset to start copying mac addresses that
is part of header reordering.

2) To fix the issue with bridge/macvlan/macvtap, the second patch
simply doesn't write the vlan header back to the packet if the
vlan device is either a bridge or a macvlan port.  This ends up
being the simplest and least performance intrussive solution.

I've considered extending patch 2 to all stacked devices (essentially
checked for the presense of rx_handler), but that feels like a broader
restriction and _may_ break existing uses.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

eb3f8b42

vlan: Do not put vlan headers back on bridge and macvlan ports · 28f9ee22

Vlad Yasevich authored Nov 16, 2015

When a vlan is configured with REORDER_HEADER set to 0, the vlan
header is put back into the packet and makes it appear that
the vlan header is still there even after it's been processed.
This posses a problem for bridge and macvlan ports.  The packets
passed to those device may be forwarded and at the time of the
forward, vlan headers end up being unexpectedly present.

With the patch, we make sure that we do not put the vlan header
back (when REORDER_HEADER is 0) if a bridge or macvlan has
been configured on top of the vlan device.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28f9ee22