Commits · 2344ef3c86a7fe41f97bf66c7936001b6132860b · nexedi / linux

30 Dec, 2016 1 commit

sh_eth: fix branch prediction in sh_eth_interrupt() · 2344ef3c

Sergei Shtylyov authored Dec 30, 2016

IIUC, likely()/unlikely() should apply to the whole *if* statement's
expression, not a part of it -- fix such expression in sh_eth_interrupt()
accordingly...

Fixes: 283e38db ("sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2344ef3c

29 Dec, 2016 10 commits

Merge branch 'mlx4-misc-fixes' · e400b797

David S. Miller authored Dec 29, 2016

Tariq Toukan says:

====================
mlx4 misc fixes

This patchset contains several bug fixes from the team to the
mlx4 Eth and Core drivers.

Series generated against net commit:
60133867 'net: wan: slic_ds26522: fix spelling mistake: "configurated" -> "configured"'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e400b797

net/mlx4_core: Fix raw qp flow steering rules under SRIOV · 10b1c04e

Jack Morgenstein authored Dec 29, 2016

Demoting simple flow steering rule priority (for DPDK) was achieved by
wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF
as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper
function for the PF and all VFs.

In function mlx4_ib_create_flow(), this change caused the main rule
creation for the PF to be wrapped, while it left the associated
tunnel steering rule creation unwrapped for the PF.

This mismatch caused rule deletion failures in mlx4_ib_destroy_flow()
for the PF when the detach wrapper function did not find the associated
tunnel-steering rule (since creation of that rule for the PF did not
go through the wrapper function).

Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native"
(so that the PF invocation does not go through the wrapper), and perform
the required priority demotion for the PF in the mlx4_ib_create_flow()
code path.

Fixes: 48564135 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10b1c04e

net/mlx4_en: Fix type mismatch for 32-bit systems · 61b6034c

Slava Shwartsman authored Dec 29, 2016

is_power_of_2 expects unsigned long and we pass u64 max_val_cycles,
this will be truncated on 32 bit systems, and the result is not what we
were expecting.
div_u64 expects u32 as a second argument and we pass
max_val_cycles_rounded which is u64 hence it will always be truncated.
Fix was tested on both 64 and 32 bit systems and got same results for
max_val_cycles and max_val_cycles_rounded.

Fixes: 4850cf45 ("net/mlx4_en: Resolve dividing by zero in 32-bit system")
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61b6034c

net/mlx4: Remove BUG_ON from ICM allocation routine · c1d5f8ff

Leon Romanovsky authored Dec 29, 2016

This patch removes BUG_ON() macro from mlx4_alloc_icm_coherent()
by checking DMA address alignment in advance and performing proper
folding in case of error.

Fixes: 5b0bf5e2 ("mlx4_core: Support ICM tables in coherent memory")
Reported-by: Ozgur Karatas <okaratas@member.fsf.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1d5f8ff

net/mlx4_en: Fix bad WQE issue · 6496bbf0

Eugenia Emantayev authored Dec 29, 2016

Single send WQE in RX buffer should be stamped with software
ownership in order to prevent the flow of QP in error in FW
once UPDATE_QP is called.

Fixes: 9f519f68 ('mlx4_en: Not using Shared Receive Queues')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6496bbf0

net/mlx4_core: Use-after-free causes a resource leak in flow-steering detach · 3b01fe7f

Jack Morgenstein authored Dec 29, 2016

mlx4_QP_FLOW_STEERING_DETACH_wrapper first removes the steering
rule (which results in freeing the rule structure), and then
references a field in this struct (the qp number) when releasing the
busy-status on the rule's qp.

Since this memory was freed, it could reallocated and changed.
Therefore, the qp number in the struct may be incorrect,
so that we are releasing the incorrect qp. This leaves the rule's qp
in the busy state (and could possibly release an incorrect qp as well).

Fix this by saving the qp number in a local variable, for use after
removing the steering rule.

Fixes: 2c473ae7 ("net/mlx4_core: Disallow releasing VF QPs which have steering rules")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b01fe7f

net: fix incorrect original ingress device index in PKTINFO · f0c16ba8

Wei Zhang authored Dec 29, 2016

When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Fixes: 0b922b7a ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Wei Zhang <asuka.com@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0c16ba8

rtnl: stats - add missing netlink message size checks · 4775cc1f

Mathias Krause authored Dec 28, 2016

We miss to check if the netlink message is actually big enough to contain
a struct if_stats_msg.

Add a check to prevent userland from sending us short messages that would
make us access memory beyond the end of the message.

Fixes: 10c9ead9 ("rtnetlink: add new RTM_GETSTATS message to dump...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4775cc1f

net: stmmac: Fix error path after register_netdev move · b2eb09af

Florian Fainelli authored Dec 28, 2016

Commit 57016590 ("net: stmmac: Fix race between stmmac_drv_probe and
stmmac_open") re-ordered how the MDIO bus registration and the network
device are registered, but missed to unwind the MDIO bus registration in
case we fail to register the network device.

Fixes: 57016590 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b2eb09af

ipv6: Should use consistent conditional judgement for ip6 fragment between... · e4c5e13a

Zheng Li authored Dec 28, 2016

ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output

There is an inconsistent conditional judgement between __ip6_append_data
and ip6_finish_output functions, the variable length in __ip6_append_data
just include the length of application's payload and udp6 header, don't
include the length of ipv6 header, but in ip6_finish_output use
(skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ipv6 header.

That causes some particular application's udp6 payloads whose length are
between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even
though the rst->dev support UFO feature.

Add the length of ipv6 header to length in __ip6_append_data to keep
consistent conditional judgement as ip6_finish_output for ip6 fragment.
Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4c5e13a

28 Dec, 2016 18 commits

net: wan: slic_ds26522: fix spelling mistake: "configurated" -> "configured" · 60133867

Colin Ian King authored Dec 28, 2016

trivial fix to spelling mistake in pr_info message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60133867

net: atm: Fix warnings in net/atm/lec.c when !CONFIG_PROC_FS · 9dd0f896

Augusto Mecking Caringi authored Dec 28, 2016

This patch fixes the following warnings when CONFIG_PROC_FS is not set:

linux/net/atm/lec.c: In function ‘lane_module_cleanup’:
linux/net/atm/lec.c:1062:27: error: ‘atm_proc_root’ undeclared (first
use in this function)
remove_proc_entry("lec", atm_proc_root);
                           ^
linux/net/atm/lec.c:1062:27: note: each undeclared identifier is
reported only once for each function it appears in
Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9dd0f896

Merge branch 'mlx5-fixes' · 4b52416a

David S. Miller authored Dec 28, 2016

Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes 28-12-2016

Some fixes for mlx5 core and ethernet driver.

for -stable:
    net/mlx5: Check FW limitations on log_max_qp before setting it
    net/mlx5: Cancel recovery work in remove flow
    net/mlx5: Avoid shadowing numa_node
    net/mlx5: Mask destination mac value in ethtool steering rules
    net/mlx5: Prevent setting multicast macs for VFs
    net/mlx5e: Don't sync netdev state when not registered
    net/mlx5e: Disable netdev after close
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4b52416a

net/mlx5e: Disable netdev after close · 37f304d1

Saeed Mahameed authored Dec 28, 2016

Disable netdev should come after it was closed, although no harm of doing it
before -hence the MLX5E_STATE_DESTROYING bit- but it is more natural this way.

Fixes: 26e59d80 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37f304d1

net/mlx5e: Don't sync netdev state when not registered · 610e89e0

Saeed Mahameed authored Dec 28, 2016

Skip setting netdev vxlan ports and netdev rx_mode on driver load
when netdev is not yet registered.

Synchronizing with netdev state is needed only on reset flow where the
netdev remains registered for the whole reset period.

This also fixes an access before initialization of net_device.addr_list_lock
- which for some reason initialized on register_netdev - where we queued
set_rx_mode work on driver load before netdev registration.

Fixes: 26e59d80 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

610e89e0

net/mlx5e: Check ets capability before initializing ets settings · 4525a45b

Huy Nguyen authored Dec 28, 2016

During the initial setup, the ets command is sent to firmware
without checking if the HCA supports ets. This causes the invalid
command error. Add the ets capiblity check before sending firmware
command to initialize ets settings.

Fixes: e207b7e9 ("net/mlx5e: ConnectX-4 firmware support for DCBX")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4525a45b

Revert "net/mlx5: Add MPCNT register infrastructure" · 1efbd205

Gal Pressman authored Dec 28, 2016

This reverts commit 7f503169.

Fixes: 7f503169 ("net/mlx5: Add MPCNT register infrastructure")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1efbd205

Revert "net/mlx5e: Expose PCIe statistics to ethtool" · 465db5da

Gal Pressman authored Dec 28, 2016

This reverts commit 9c726239.
PCIe counters were introduced in a new firmware version, as a result users
with old firmware encountered a syndrome every 200ms due to update stats
work. This feature will be re-introduced later with appropriate capabilities
infrastructure.

Fixes: 9c726239 ("net/mlx5e: Expose PCIe statistics to ethtool")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

465db5da

net/mlx5: Prevent setting multicast macs for VFs · ccce1700

Mohamad Haj Yahia authored Dec 28, 2016

Need to check that VF mac address entered by the admin user is either
zero or unicast mac.
Multicast mac addresses are prohibited.

Fixes: 77256579 ('net/mlx5: E-Switch, Introduce Vport administration functions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ccce1700

net/mlx5: Release FTE lock in error flow · 9b8c5142

Maor Gottlieb authored Dec 28, 2016

Release the FTE lock when adding rule to the FTE has failed.

Fixes: 0fd758d6 ('net/mlx5: Don't unlock fte while still using it')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b8c5142

net/mlx5: Mask destination mac value in ethtool steering rules · 077b1e80

Maor Gottlieb authored Dec 28, 2016

We need to mask the destination mac value with the destination mac
mask when adding steering rule via ethtool.

Fixes: 1174fce8 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

077b1e80

net/mlx5: Avoid shadowing numa_node · d151d73d

Eli Cohen authored Dec 28, 2016

Avoid using a local variable named numa_node to avoid shadowing a public
one.

Fixes: db058a18 ('net/mlx5_core: Set irq affinity hints')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d151d73d

net/mlx5: Cancel recovery work in remove flow · 689a248d

Daniel Jurgens authored Dec 28, 2016

If there is pending delayed work for health recovery it must be canceled
if the device is being unloaded.

Fixes: 05ac2c0b ("net/mlx5: Fix race between PCI error handlers and health work")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

689a248d

net/mlx5: Check FW limitations on log_max_qp before setting it · 883371c4

Noa Osherovich authored Dec 28, 2016

When setting HCA capabilities, set log_max_qp to be the minimum
between the selected profile's value and the HCA limitation.

Fixes: 938fe83c ('net/mlx5_core: New device capabilities...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

883371c4

net/mlx5: Disable RoCE on the e-switch management port under switchdev mode · 9da34cd3

Or Gerlitz authored Dec 28, 2016

Under the switchdev/offloads mode, packets that don't match any
e-switch steering rule are sent towards the e-switch management
port. We use a NIC HW steering rule set per vport (uplink and VFs)
to make them be received into the host OS through the respective
vport representor netdevice.

Currnetly such missed RoCE packets will not get to this NIC steering
rule, and hence VF RoCE will not work over the slow path of the offloads
mode. This is b/c these packets will be matched by a steering rule added
by the firmware that serves RoCE traffic set on the PF NIC vport which
is also the e-switch management port under SRIOV.

Disabling RoCE on the e-switch management vport when we are in the offloads
mode, will signal to the firmware to remove their RoCE rule, and then the
missed RoCE packets will be matched by the representor NIC steering rule
as any other missed packets.

To achieve that, we disable RoCE on the PF vport. We do that by removing
(hot-unplugging) the IB device instance associated with the PF. This is
also required by our current model where the PF serves as the uplink
representor and hence only SW switching (TC, bridge, OVS) applications
and slow path vport mlx5e net-device should be running over that vport.

Fixes: c930a3ad ('net/mlx5e: Add devlink based SRIOV mode changes')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9da34cd3

net/sched: cls_flower: Fix missing addr_type in classify · 0df0f207

Paul Blakey authored Dec 28, 2016

Since we now use a non zero mask on addr_type, we are matching on its
value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
failed in SW/classify path since its value was zero.
This patch sets the proper value of addr_type for encapsulated packets.

Fixes: 970bfcd0 ('net/sched: cls_flower: Use mask for addr_type')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0df0f207

net: stmmac: Fix race between stmmac_drv_probe and stmmac_open · 57016590

Florian Fainelli authored Dec 27, 2016

There is currently a small window during which the network device registered by
stmmac can be made visible, yet all resources, including and clock and MDIO bus
have not had a chance to be set up, this can lead to the following error to
occur:

[  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                stmmac_dvr_probe: warning: cannot get CSR clock
[  473.919382] stmmaceth 0000:01:00.0: no reset control found
[  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
[  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
[  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
[  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
[  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                Enable RX Mitigation via HW Watchdog Timer
[  473.921395] libphy: PHY stmmac-1:00 not found
[  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
[  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
                PHY (error: -19)
[  473.959710] libphy: stmmac: probed
[  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
                (stmmac-1:00) active
[  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
                (stmmac-1:01)
[  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
                (stmmac-1:02)
[  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
                (stmmac-1:03)

Fix this by making sure that register_netdev() is the last thing being done,
which guarantees that the clock and the MDIO bus are available.

Fixes: 4bfcbd7a ("stmmac: Move the mdio_register/_unregister in probe/remove")
Reported-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57016590

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8f18e4d0

Linus Torvalds authored Dec 27, 2016

Pull networking fixes from David Miller:

 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.

    The most important is to not assume the packet is RX just because
    the destination address matches that of the device. Such an
    assumption causes problems when an interface is put into loopback
    mode.

 2) If we retry when creating a new tc entry (because we dropped the
    RTNL mutex in order to load a module, for example) we end up with
    -EAGAIN and then loop trying to replay the request. But we didn't
    reset some state when looping back to the top like this, and if
    another thread meanwhile inserted the same tc entry we were trying
    to, we re-link it creating an enless loop in the tc chain. Fix from
    Daniel Borkmann.

 3) There are two different WRITE bits in the MDIO address register for
    the stmmac chip, depending upon the chip variant. Due to a bug we
    could set them both, fix from Hock Leong Kweh.

 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: stmmac: fix incorrect bit set in gmac4 mdio addr register
  r8169: add support for RTL8168 series add-on card.
  net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
  openvswitch: upcall: Fix vlan handling.
  ipv4: Namespaceify tcp_tw_reuse knob
  net: korina: Fix NAPI versus resources freeing
  net, sched: fix soft lockup in tc_classify
  net/mlx4_en: Fix user prio field in XDP forward
  tipc: don't send FIN message from connectionless socket
  ipvlan: fix multicast processing
  ipvlan: fix various issues in ipvlan_process_multicast()

8f18e4d0

27 Dec, 2016 7 commits

net: stmmac: fix incorrect bit set in gmac4 mdio addr register · 5799fc90

Kweh, Hock Leong authored Dec 28, 2016

Fixing the gmac4 mdio write access to use MII_GMAC4_WRITE only instead of
OR together with MII_WRITE.
Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5799fc90

r8169: add support for RTL8168 series add-on card. · 610c9087

Chun-Hao Lin authored Dec 27, 2016

This chip is the same as RTL8168, but its device id is 0x8161.
Signed-off-by: Chun-Hao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

610c9087

net: xdp: remove unused bfp_warn_invalid_xdp_buffer() · be267277

Jason Wang authored Dec 27, 2016

After commit 73b62bd0 ("virtio-net:
remove the warning before XDP linearizing"), there's no users for
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for
commit f23bc46c.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

be267277

openvswitch: upcall: Fix vlan handling. · df30f740

pravin shelar authored Dec 26, 2016

Networking stack accelerate vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.

Fixes: 5108bbad ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme <jarno@ovn.org>
CC: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

df30f740

ipv4: Namespaceify tcp_tw_reuse knob · 56ab6b93

Haishuang Yan authored Dec 25, 2016

Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56ab6b93

x86/mce/AMD: Make the init code more robust · 0dad3a30

Thomas Gleixner authored Dec 26, 2016

If mce_device_init() fails then the mce device pointer is NULL and the
AMD mce code happily dereferences it.

Add a sanity check.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0dad3a30

smp/hotplug: Undo tglxs brainfart · b9d9d691

Thomas Gleixner authored Dec 26, 2016

The attempt to prevent overwriting an active state resulted in a
disaster which effectively disables all dynamically allocated hotplug
states.

Cleanup the mess.

Fixes: dc280d93 ("cpu/hotplug: Prevent overwriting of callbacks")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b9d9d691

26 Dec, 2016 4 commits

arm64: don't pull uaccess.h into *.S · b4b8664d

Al Viro authored Dec 26, 2016

Split asm-only parts of arm64 uaccess.h into a new header and use that
from *.S.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b4b8664d

net: korina: Fix NAPI versus resources freeing · e6afb1ad

Florian Fainelli authored Dec 23, 2016

Commit beb0babf ("korina: disable napi on close and restart")
introduced calls to napi_disable() that were missing before,
unfortunately this leaves a small window during which NAPI has a chance
to run, yet we just freed resources since korina_free_ring() has been
called:

Fix this by disabling NAPI first then freeing resource, and make sure
that we also cancel the restart task before doing the resource freeing.

Fixes: beb0babf ("korina: disable napi on close and restart")
Reported-by: Alexandros C. Couloumbis <alex@ozo.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6afb1ad

net, sched: fix soft lockup in tc_classify · 628185cf

Daniel Borkmann authored Dec 21, 2016

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

628185cf

Linux 4.10-rc1 · 7ce7d89f
Linus Torvalds authored Dec 25, 2016

7ce7d89f