Commits · a788859649b75a6e4514337a9a5d4a60fd709904 · Kirill Smelkov / linux

14 Nov, 2016 10 commits

net: atheros: atl2: use new api ethtool_{get|set}_link_ksettings · a7888596

Philippe Reynes authored Nov 13, 2016

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

The previous implementation of set_settings was modifying
the value of advertising, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7888596

net: atheros: atl1: use new api ethtool_{get|set}_link_ksettings · 1dae02b3

Philippe Reynes authored Nov 13, 2016

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

The previous implementation of set_settings was modifying
the value of advertising, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1dae02b3

net: atheros: atl1c: use new api ethtool_{get|set}_link_ksettings · 58046c70

Philippe Reynes authored Nov 12, 2016

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

58046c70

net: alx: use new api ethtool_{get|set}_link_ksettings · 36a4e690

Philippe Reynes authored Nov 11, 2016

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

36a4e690

net: fix sleeping for sk_wait_event() · d9dc8b0f

WANG Cong authored Nov 11, 2016

Similar to commit 14135f30 ("inet: fix sleeping inside inet_wait_for_connect()"),
sk_wait_event() needs to fix too, because release_sock() is blocking,
it changes the process state back to running after sleep, which breaks
the previous prepare_to_wait().

Switch to the new wait API.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9dc8b0f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 7d384846

David S. Miller authored Nov 13, 2016

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.

Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all packets ruleset from ingress:

        nft add table netdev x
        nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
        nft add rule netdev x y drop

And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using -i
option. perf report shows nf_tables calls in its top 10:

    17.30%  kpktgend_0   [nf_tables]            [k] nft_do_chain
    15.75%  kpktgend_0   [kernel.vmlinux]       [k] __netif_receive_skb_core
    10.39%  kpktgend_0   [nf_tables_netdev]     [k] nft_do_chain_netdev

I'm measuring here an improvement of ~15% in performance with this
patchset, so we got +2.5Mpps more. I have used my old laptop Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.

This rework contains more specifically, in strict order, these patches:

1) Remove compile-time debugging from core.

2) Remove obsolete comments that predate the rcu era. These days it is
   well known that a Netfilter hook always runs under rcu_read_lock().

3) Remove threshold handling, this is only used by br_netfilter too.
   We already have specific code to handle this from br_netfilter,
   so remove this code from the core path.

4) Deprecate NF_STOP, as this is only used by br_netfilter.

5) Place nf_state_hook pointer into xt_action_param structure, so
   this structure fits into one single cacheline according to pahole.
   This also implicit affects nftables since it also relies on the
   xt_action_param structure.

6) Move state->hook_entries into nf_queue entry. The hook_entries
   pointer is only required by nf_queue(), so we can store this in the
   queue entry instead.

7) use switch() statement to handle verdict cases.

8) Remove hook_entries field from nf_hook_state structure, this is only
   required by nf_queue, so store it in nf_queue_entry structure.

9) Merge nf_iterate() into nf_hook_slow() that results in a much more
   simple and readable function.

10) Handle NF_REPEAT away from the core, so far the only client is
    nf_conntrack_in() and we can restart the packet processing using a
    simple goto to jump back there when the TCP requires it.
    This update required a second pass to fix fallout, fix from
    Arnd Bergmann.

11) Set random seed from nft_hash when no seed is specified from
    userspace.

12) Simplify nf_tables expression registration, in a much smarter way
    to save lots of boiler plate code, by Liping Zhang.

13) Simplify layer 4 protocol conntrack tracker registration, from
    Davide Caratti.

14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
    to recent generalization of the socket infrastructure, from Arnd
    Bergmann.

15) Then, the ipset batch from Jozsef, he describes it as it follows:

* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields
  in skbinfo get/init helper funcions.
* Use kmalloc() in comment extension helper instead of kzalloc()
  because it is unnecessary to zero out the area just before
  explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
  together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output
  across all set types.
* Count non-static extension memory into memsize calculation for
  userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because
  they can be get from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing
  one level of intendation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set
  types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size
  and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure
  was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
  by Muhammad Falak R Wani, individually for the set type families.

16) Remove useless connlabel field in struct netns_ct, patch from
    Florian Westphal.

17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
    {ip,ip6,arp}tables code that uses this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7d384846

mlxsw: spectrum_router: Add FIB abort warning · 8d419324

Jiri Pirko authored Nov 10, 2016

Add a warning that the abort mechanism was triggered for device.
Also avoid going through the procedure if abort was already done.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d419324

Merge branch 'dsa-mv88e6xxx-post-refactor-fixes' · 5aad5b42

David S. Miller authored Nov 13, 2016

Andrew Lunn says:

====================
dsa: mv88e6xxx: Fixes for port refactoring

The patches which refactored setting up the switch MACs introduced a
couple of regressions. The RGMII delays for a port can be set using
other mechanism than just phy-mode. Don't overwrite the delays unless
explicitly asked to. This broke my Armada 370 RD. Also, the mv88e6351
family supports setting RGMII delays, but is missing the necessary
entries in the ops structures to allow this.

These fixes are to patches currently in net-next. No need for stable
etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5aad5b42

net: dsa: mv88e6xxx: 6351 family also has RGMII delays · 94d66ae6

Andrew Lunn authored Nov 10, 2016

The recent refactoring of setting the MAC configuration broke setting
of RGMII delays, via the phy-mode, on the 6351 family. Add the missing
ops to the structure.

Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

94d66ae6

net: dsa: mv88e6xxx: Don't modify RGMII delays when not RGMII mode · fedf1865

Andrew Lunn authored Nov 10, 2016

The RGMII modes delays can be set via strapping pings or EEPROM.
Don't change them unless explicitly asked to change them. The recent
refactoring of setting the MAC configuration changed this behaviours,
in that CPU and DSA ports have any pre-configured RGMII delays
removed. This breaks the Armada 370RD board. Restore the previous
behaviour, in that RGMII delays are only applied/removed when
explicitly asked for via an phy-mode being PHY_INTERFACE_MODE_RGMII*

Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

fedf1865

13 Nov, 2016 30 commits

netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test · eb1a6bdc

Julia Lawall authored Nov 11, 2016

Since commit 7926dbfa ("netfilter: don't use
mutex_lock_interruptible()"), the function xt_find_table_lock can only
return NULL on an error.  Simplify the call sites and update the
comment before the function.

The semantic patch that change the code is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,e;
@@

t = \(xt_find_table_lock(...)\|
      try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- ! IS_ERR_OR_NULL(t)
+ t

@@
expression t,e;
@@

t = \(xt_find_table_lock(...)\|
      try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- IS_ERR_OR_NULL(t)
+ !t

@@
expression t,e,e1;
@@

t = \(xt_find_table_lock(...)\|
      try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
?- t ? PTR_ERR(t) : e1
+ e1
... when any

// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

eb1a6bdc

netfilter: conntrack: remove unused netns_ct member · 7e416ad7

Florian Westphal authored Nov 10, 2016

since 23014011 ('netfilter: conntrack: support a fixed size of 128 distinct labels')
this isn't needed anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7e416ad7

net: atheros: atl1e: use new api ethtool_{get|set}_link_ksettings · 63fb571e

Philippe Reynes authored Nov 12, 2016

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

The previous implementation of set_settings was modifying
the value of advertising, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63fb571e

net: ethernet: ti: davinci_cpdma: don't stop ctlr if it was stopped · b993eec0

Ivan Khoronzhuk authored Nov 11, 2016

No need to stop ctlr if it was already stopped. It can cause timeout
warns. Steps:
- ifconfig eth0 down
- ethtool -l eth0 rx 8 tx 8
- ethtool -l eth0 rx 1 tx 1
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b993eec0

net: ethernet: ti: davinci_cpdma: fix fixed prio cpdma ctlr configuration · 991ddb1f

Ivan Khoronzhuk authored Nov 11, 2016

The dma ctlr is reseted to 0 while cpdma soft reset, thus cpdma ctlr
cannot be configured after cpdma is stopped. So restoring content
of cpdma ctlr while off/on procedure is needed. The cpdma ctlr off/on
procedure is present while interface down/up and while changing number
of channels with ethtool. In order to not restore content in many
places, move it to cpdma_ctlr_start().
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

991ddb1f

mlxsw: reg: Fix pwm_frequency field size in MFCR register · f7ad3d4b

Jiri Pirko authored Nov 11, 2016

The field is 7bit long. Fix it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7ad3d4b

genetlink: Make family a signed integer. · 98e4321b

David S. Miller authored Nov 13, 2016

The idr_alloc(), idr_remove(), et al. routines all expect IDs to be
signed integers.  Therefore make the genl_family member 'id' signed
too.
Signed-off-by: David S. Miller <davem@davemloft.net>

98e4321b

net: phy: marvell: optimize logic for page changing during init · b5718b5a

Uwe Kleine-König authored Nov 10, 2016

Instead of remembering if the page was changed, just compare the current
page to the saved one. This is easier and has the advantage to save a
register write if the page was already restored.
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5718b5a

Merge branch 'amd-xgbe-updates' · 849dcce3

David S. Miller authored Nov 13, 2016

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2016-11-10

This patch series is targeted at adding support for a new PCI version
of the hardware. As part of the new PCI device, there is a new PCS/PHY
interaction, ECC support, I2C sideband communication, SFP+ support and
more.

The following updates and fixes are included in this driver update series:

- Hardware workaround for possible incorrectly generated interrupts
  during software reset
- Hardware workaround for Tx timestamp register access order
- Add support for a PCI version of the device
- Increase the Rx queue limit to take advantage of the increased number
  of DMA channels that might be available
- Add support for a new DMA channel interrupt mode
- Add ECC support for the device memory
- Add support for using the integrated I2C controller for sideband
  communication
- Expose the phylib phy_aneg_done() function so it can be called by the
  driver
- Add support for SFP+ modules
- Add support for MDIO attached PHYs
- Add support for KR re-driver between the PCS/SerDes and an external
  PHY

This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

849dcce3

amd-xgbe: Add support for a KR redriver · d7445d1f

Lendacky, Thomas authored Nov 10, 2016

This patch provides support for the presence of a KR redriver chip in
between the device PCS and an external PHY.  When a redriver chip is
present the device must perform clause 73 auto-negotiation in order to
set the redriver chip for the downstream connection.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7445d1f

amd-xgbe: Add support for MDIO attached PHYs · 732f2ab7

Lendacky, Thomas authored Nov 10, 2016

Use the phylib support in the kernel to communicate with and control an
MDIO attached PHY. Use the hardware's MDIO communication mechanism to
communicate with the PHY.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

732f2ab7

amd-xgbe: Add support for SFP+ modules · abf0a1c2

Lendacky, Thomas authored Nov 10, 2016

Add support for recognizing and using SFP+ modules directly. This includes
using the I2C support to read and interpret the information returned from
an SFP+ module and configuring things properly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abf0a1c2

net: phy: expose phy_aneg_done API for use by drivers · 372788f9

Lendacky, Thomas authored Nov 10, 2016

Make phy_aneg_done() available to drivers so that the result of the
auto-negotiation initiated by phy_start_aneg() can be determined.

Remove the local implementation of phy_aneg_done() from the Aeroflex
driver and use the phy library version.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

372788f9

amd-xgbe: Add I2C support for sideband communication · 5ab1dcd5

Lendacky, Thomas authored Nov 10, 2016

Add support to initialize and use the I2C controller within the hardware
in order to perform sideband communication, e.g. determine the SFP media
type that is installed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ab1dcd5

amd-xgbe: Add ECC status support for the device memory · e78332b2

Lendacky, Thomas authored Nov 10, 2016

Some versions of the amd-xgbe device are capable of reporting ECC error
information back to the driver. Add support to process, track and report
on this information.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e78332b2

amd-xgbe: Add support for new DMA interrupt mode · 4c70dd8a

Lendacky, Thomas authored Nov 10, 2016

The current per channel DMA interrupt support is based on an edge
triggered interrupt that is not maskable. This results in having to call
the disable_irq/enable_irq functions in order to prevent interrupts
during napi processing. The hardware now has a way to configure the per
channel DMA interrupt that will allow for masking the interrupt which
prevents calling disable_irq/enable_irq now. This patch makes use of
this support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c70dd8a

amd-xgbe: Allow for a greater number of Rx queues · 8d6b2e92

Lendacky, Thomas authored Nov 10, 2016

Remove the call to netif_get_num_default_rss_queues() and replace it
with num_online_cpus() to allow for the possibility of using all of
the hardware DMA channels available.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d6b2e92

amd-xgbe: Add PCI device support · 47f164de

Lendacky, Thomas authored Nov 10, 2016

Add support for new PCI devices to the driver.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47f164de

amd-xgbe: Add a workaround for Tx timestamp issue · aba9777a

Lendacky, Thomas authored Nov 10, 2016

Update the reading of the Tx timestamp to account for a hardware issue
on how the fields and interrupt are cleared. The "seconds" portion of
the timestamp should be read first, followed by the "nanoseconds" portion.
Reading the "nanoseconds" portion should clear the timestamp data and the
interrupt. Because of an issue with the hardware this order is reversed
and reading the "seconds" portion actually clears the timestamp. The code
currently follows this workaround, but to guard against future versions
where this is fixed add a field to the version data to indicate if the
workaround is required or not.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aba9777a

amd-xgbe: Guard against incorrectly generated interrupts · 5ffc0335

Lendacky, Thomas authored Nov 10, 2016

Due to a hardware issue, it is possible for interrupt events to be
incorrectly generated when performing a soft reset.  To guard against
this, perform the soft reset twice.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ffc0335

Merge branch 'ovs-L3-encap' · f0a40400

David S. Miller authored Nov 13, 2016

Jiri Benc says:

====================
openvswitch: support for layer 3 encapsulated packets

At the core of this patch set is removing the assumption in Open vSwitch
datapath that all packets have Ethernet header.

The implementation relies on the presence of pop_eth and push_eth actions
in datapath flows to facilitate adding and removing Ethernet headers as
appropriate. The construction of such flows is left up to user-space.

This series is based on work by Simon Horman, Lorand Jakab, Thomas Morin and
others. I kept Lorand's and Simon's s-o-b in the patches that are derived
from v11 to record their authorship of parts of the code.

Changes from v12 to v13:

* Addressed Pravin's feedback.
* Removed the GRE vport conversion patch; L3 GRE ports should be created by
  rtnetlink instead.

Main changes from v11 to v12:

* The patches were restructured and split differently for easier review.
* They were rebased and adjusted to the current net-next. Especially MPLS
  handling is different (and easier) thanks to the recent MPLS GSO rework.
* Several bugs were discovered and fixed. The most notable is fragment
  handling: header adjustment for ARPHRD_NONE devices on tx needs to be done
  after refragmentation, not before it. This required significant changes in
  the patchset. Another one is stricter checking of attributes (match on L2
  vs. L3 packet) at the kernel level.
* Instead of is_layer3 bool, a mac_proto field is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f0a40400

openvswitch: allow L3 netdev ports · 217ac77a

Jiri Benc authored Nov 10, 2016

Allow ARPHRD_NONE interfaces to be added to ovs bridge.

Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

217ac77a

openvswitch: add Ethernet push and pop actions · 91820da6

Jiri Benc authored Nov 10, 2016

It's not allowed to push Ethernet header in front of another Ethernet
header.

It's not allowed to pop Ethernet header if there's a vlan tag. This
preserves the invariant that L3 packet never has a vlan tag.

Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

91820da6

openvswitch: netlink: support L3 packets · 0a6410fb

Jiri Benc authored Nov 10, 2016

Extend the ovs flow netlink protocol to support L3 packets. Packets without
OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the
OVS_KEY_ATTR_ETHERTYPE attribute is mandatory.

Push/pop vlan actions are only supported for Ethernet packets.

Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a6410fb

openvswitch: add processing of L3 packets · 5108bbad

Jiri Benc authored Nov 10, 2016

Support receiving, extracting flow key and sending of L3 packets (packets
without an Ethernet header).

Note that even after this patch, non-Ethernet interfaces are still not
allowed to be added to bridges. Similarly, netlink interface for sending and
receiving L3 packets to/from user space is not in place yet.

Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5108bbad

openvswitch: support MPLS push and pop for L3 packets · 1560a074

Jiri Benc authored Nov 10, 2016

Update Ethernet header only if there is one.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1560a074

openvswitch: pass mac_proto to ovs_vport_send · e2d9d835

Jiri Benc authored Nov 10, 2016

We'll need it to alter packets sent to ARPHRD_NONE interfaces.

Change do_output() to use the actual L2 header size of the packet when
deciding on the minimum cutlen. The assumption here is that what matters is
not the output interface hard_header_len but rather the L2 header of the
particular packet. For example, ARPHRD_NONE tunnels that encapsulate
Ethernet should get at least the Ethernet header.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2d9d835

openvswitch: add mac_proto field to the flow key · 329f45bc

Jiri Benc authored Nov 10, 2016

Use a hole in the structure. We support only Ethernet so far and will add
a support for L2-less packets shortly. We could use a bool to indicate
whether the Ethernet header is present or not but the approach with the
mac_proto field is more generic and occupies the same number of bytes in the
struct, while allowing later extensibility. It also makes the code in the
next patches more self explaining.

It would be nice to use ARPHRD_ constants but those are u16 which would be
waste. Thus define our own constants.

Another upside of this is that we can overload this new field to also denote
whether the flow key is valid. This has the advantage that on
refragmentation, we don't have to reparse the packet but can rely on the
stored eth.type. This is especially important for the next patches in this
series - instead of adding another branch for L2-less packets before calling
ovs_fragment, we can just remove all those branches completely.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

329f45bc

openvswitch: use hard_header_len instead of hardcoded ETH_HLEN · 738314a0

Jiri Benc authored Nov 10, 2016

On tx, use hard_header_len while deciding whether to refragment or drop the
packet. That way, all combinations are calculated correctly:

* L2 packet going to L2 interface (the L2 header len is subtracted),
* L2 packet going to L3 interface (the L2 header is included in the packet
  lenght),
* L3 packet going to L3 interface.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

738314a0

bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path · c540594f

Daniel Borkmann authored Nov 09, 2016

Commit 67f8b1dc ("net/mlx4_en: Refactor the XDP forwarding rings
scheme") added a bug in that the prog's reference count is not dropped
in the error path when mlx4_en_try_alloc_resources() is failing from
mlx4_xdp_set().

We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we
need to release again. Earlier in the call path, dev_change_xdp_fd()
itself holds a reference to the prog as well (hence the '- 1' in the
bpf_prog_add()), so a simple atomic_sub() is safe to use here. When
an error is propagated, then bpf_prog_put() is called eventually from
dev_change_xdp_fd()

Fixes: 67f8b1dc ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c540594f