Commits · 38222b1aacf39c92e72d42a809609c725f2e9f83 · Kirill Smelkov / linux

23 May, 2018 40 commits

net: dsa: qca8k: Remove redundant parentheses · 38222b1a

Michal Vokáč authored May 23, 2018

Fix warning reported by checkpatch.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

38222b1a

net: dsa: qca8k: Replace GPL boilerplate by SPDX · 63a786a3

Michal Vokáč authored May 23, 2018

Replace the GPLv2 license boilerplate with the SPDX license identifier.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63a786a3

net: dsa: qca8k: Allow overwriting CPU port setting · 9bb2289f

Michal Vokáč authored May 23, 2018

Implement adjust_link function that allows to overwrite default CPU port
setting using fixed-link device tree subnode.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bb2289f

net: dsa: qca8k: Force CPU port to its highest bandwidth · 79a4ed4f

Michal Vokáč authored May 23, 2018

By default autonegotiation is enabled to configure MAC on all ports.
For the CPU port autonegotiation can not be used so we need to set
some sensible defaults manually.

This patch forces the default setting of the CPU port to 1000Mbps/full
duplex which is the chip maximum capability.

Also correct size of the bit field used to configure link speed.

Fixes: 6b93fb46 ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79a4ed4f

net: dsa: qca8k: Enable RXMAC when bringing up a port · eee1fe64

Michal Vokáč authored May 23, 2018

When a port is brought up/down do not enable/disable only the TXMAC
but the RXMAC as well. This is essential for the CPU port to work.

Fixes: 6b93fb46 ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eee1fe64

net: dsa: qca8k: Add support for QCA8334 switch · 64cf8167

Michal Vokáč authored May 23, 2018

Add support for the four-port variant of the Qualcomm QCA833x switch.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64cf8167

net: dsa: qca8k: Add QCA8334 binding documentation · 218bbea1

Michal Vokáč authored May 23, 2018

Add support for the four-port variant of the Qualcomm QCA833x switch.

The CPU port default link settings can be reconfigured using
a fixed-link sub-node.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

218bbea1

cxgb4: Add new T6 device ids · 038ed45b

Ganesh Goudar authored May 23, 2018

Add 0x6088 and 0x6089 device ids for new T6 cards.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

038ed45b

selftests: uevent filtering · 9d3df886

Christian Brauner authored May 22, 2018

Recent discussions around uevent filtering (cf. net-next commit [1], [2],
and [3] and discussions in [4], [5], and [6]) have shown that the semantics
around uevent filtering where not well understood.
Now that we have settled - at least for the moment - how uevent filtering
should look like let's add some selftests to ensure we don't regress
anything in the future.
Note, the semantics of uevent filtering are described in detail in my
commit message to [2] so I won't repeat them here.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=90d52d4fd82007005125d9a8d2d560a1ca059b9d
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a3498436b3a0f8ec289e6847e1de40b4123e1639
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=26045a7b14bc7a5455e411d820110f66557d6589
[4]: https://lkml.org/lkml/2018/4/4/739
[5]: https://lkml.org/lkml/2018/4/26/767
[6]: https://lkml.org/lkml/2018/4/26/738Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d3df886

Merge branch 'fib-rule-selftest' · 8fcb0972

David S. Miller authored May 23, 2018

Roopa Prabhu says:

====================
fib rule selftest

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get,
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
     reply (as pointed out by DaveM). I now try to reset the skb after the route
     lookup and before the netlink send (testing shows this is ok. More eyes and
     any feedback here will be helpful)

v5 - dropped RTA_TABLE ipv4_rtm_policy update from this series and posted
     it separately for net (feedback from Eric)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8fcb0972

selftests: net: initial fib rule tests · 65b2b493

Roopa Prabhu authored May 22, 2018

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65b2b493

ipv6: support sport, dport and ip_proto in RTM_GETROUTE · eacb9384

Roopa Prabhu authored May 22, 2018

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eacb9384

ipv4: support sport, dport and ip_proto in RTM_GETROUTE · 404eb77e

Roopa Prabhu authored May 22, 2018

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

404eb77e

hv_netvsc: Add handlers for ethtool get/set msg level · 273de02a

Haiyang Zhang authored May 22, 2018

The handlers for ethtool get/set msg level are missing from netvsc.
This patch adds them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

273de02a

net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION · 7c6f9747

Colin Ian King authored May 22, 2018

Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILEGED_OPERATION
to fix spelling mistake.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c6f9747

Merge branch 'udp-gso-fixes' · 75a839c3

David S. Miller authored May 23, 2018

Willem de Bruijn says:

====================
udp gso fixes

A few small fixes:
- disallow segmentation with XFRM
- do not leak gso packets into the ingress path

Changes
  v1 -> v2
  - fix build failure in team.c
  - drop scatter-gather fix:
      this is now fixed by commit 113f99c3 ("net: test tailroom
      before appending to linear skb"). After this patch gso skbs are
      built non-linear regardless of NETIF_F_SG and skb_segment builds
      linear segs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

75a839c3

gso: limit udp gso to egress-only virtual devices · 8eea1ca8

Willem de Bruijn authored May 22, 2018

Until the udp receive stack supports large packets (UDP GRO), GSO
packets must not loop from the egress to the ingress path.

Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
may loop packets, such as veth and macvlan.

Instead add it to specific devices that forward to another device's
egress path, bonding and team.

Fixes: 83aa025f ("udp: add gso support to virtual devices")
CC: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8eea1ca8

udp: exclude gso from xfrm paths · ff06342c

Willem de Bruijn authored May 22, 2018

UDP GSO delays final datagram construction to the GSO layer. This
conflicts with protocol transformations.

Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff06342c

Merge branch 'net-sfp-small-improvements' · e89e59c0

David S. Miller authored May 23, 2018

Antoine Tenart says:

====================
net: sfp: small improvements

A small series of patches improving the SFP support by adding a warning
when no Tx disable pin is available, and making the i2c-bus property
mandatory.

Thanks!
Antoine

Since v1:
  - Removed the patch fixing the sfp driver when no i2c bus was described.
  - Made two new patches to make the i2c-bus property mandatory for sfp modules.

Since the phylink series:
  - s/-EOPNOTSUPP/-ENODEV/ in patch 1/2.
  - I added the acked-by tag in patch 2/2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e89e59c0

Documentation/bindings: net: the sfp i2c-bus property is now mandatory · 3e484393

Antoine Tenart authored May 22, 2018

The i2c-bus property for sfp modules was made mandatory. Update the
documentation to keep it in sync with the driver's behaviour.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e484393

net: phy: sfp: make the i2c-bus dt property mandatory · 66ede1f9

Antoine Tenart authored May 22, 2018

This patch makes the i2c-bus property mandatory when using a device
tree. If the sfp i2c bus isn't described it's impossible to guess the
protocol to use for a given module, and the sfp module would then not
work in most cases.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66ede1f9

net: phy: sfp: warn the user when no tx_disable pin is available · a1f5d1f0

Antoine Tenart authored May 22, 2018

In case no Tx disable pin is available the SFP modules will always be
emitting. This could be an issue when using modules using laser as their
light source as we would have no way to disable it when the fiber is
removed. This patch adds a warning when registering an SFP cage which do
not have its tx_disable pin wired or available.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1f5d1f0

Merge branch 'nfp-abm-add-basic-support-for-advanced-buffering-NIC' · 47de868b

David S. Miller authored May 23, 2018

Jakub Kicinski says:

====================
nfp: abm: add basic support for advanced buffering NIC

This series lays groundwork for advanced buffer management NIC feature.
It makes necessary NFP core changes, spawns representors and adds devlink
glue.  Following series will add the actual buffering configuration (patch
series size limit).

First three patches add support for configuring NFP buffer pools via a
mailbox.  The existing devlink APIs are used for the purpose.

Third patch allows us to perform small reads from the NFP memory.

The rest of the patch set adds eswitch mode change support and makes
the driver spawn appropriate representors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

47de868b

nfp: assign vNIC id as phys_port_name of vNICs which are not ports · 51c1df83

Jakub Kicinski authored May 21, 2018

When NFP is modelled as a switch we assign phys_port_name to respective
port(representor )s:

 vNIC0 - | - PF port (pf%d)     MAC/PHY (p%d[s%d]) - |E==

In most cases there is only one vNIC for communication with the switch.
If there is more than one we need to be able to identify them.  Use %d
as phys_port_name of the vNICs.

We don't have to pass ID to nfp_net_debugfs_vnic_add() separately any
more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51c1df83

nfp: use split in naming of PCIe PF ports · 290f54db

Jakub Kicinski authored May 21, 2018

PCI PFs can host more than one logical endpoint.  In NFP terms
this means having more than one vNIC for PCIe PF.  The vNICs
are usually corresponding 1:1 to Ethernet ports.  In core NIC
we use the legacy idea of vNIC *being* the Ethernet port,
hence netdevs put pX(sY) in their phys_port_name, like Ethernet
ports would.  When ASIC ports are fully represented we need to
be able to name different PCIe PF ports, too.  Use a scheme
similar to Ethernet ports - pfXsY, for PCIe PF number X,
sub-port Y.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

290f54db

nfp: abm: force Ethternet port up · 1f700367

Jakub Kicinski authored May 21, 2018

Current control firmware does not cater too well to multi-host
applications.  There is no way to check which hosts are up or
otherwise negotiate what the state of the external port (the
Ethernet port) should be.  Make sure the link is up when driver
loads, and don't take it down when Ethernet port netdev is
closed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f700367

nfp: abm: spawn port netdevs · d05d902e

Jakub Kicinski authored May 21, 2018

To configure buffering points we need full set of netdevs:

                              ASIC

 user netdev  -- | -- PCIe port   MAC port -- | --

Configuring egrees qdiscs on user netdev configures standard
Linux TC software qdiscs, configuring PCIe port qdiscs will
provide a way of setting ASIC queuing parameters for PCIe block.
MAC port netdev egress qdiscs correspond to ASIC MAC Traffic
Manager block.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d05d902e

nfp: add devlink_eswitch_mode_set callback · 4afa3af4

Jakub Kicinski authored May 21, 2018

Our previous apps all assumed to use only one eswitch mode (legacy
or switchdev) without the ability to change it.  ABM NIC will
want to support the switch so plumb devlink_eswitch_mode_set through.
The devlink_eswitch_mode_set is expected to spawn representors and
potentially devlink ports so it's called under big devlink lock and
pf->lock.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4afa3af4

devlink: don't take instance lock around eswitch mode set · 7ac1cc9a

Jakub Kicinski authored May 21, 2018

Changing switch mode may want to register and unregister devlink
ports.  Therefore similarly to DEVLINK_CMD_PORT_SPLIT/UNSPLIT it
should not take the instance lock.  Drivers don't depend on existing
locking since it's a very recent addition.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ac1cc9a

nfp: add app pointer to port representors · 634c6b7a

Jakub Kicinski authored May 21, 2018

nfp_apps can currently associate their structures with vNICs but
not representors.  Add app priv pointer to representors as well.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

634c6b7a

nfp: abm: create project-specific vNIC structure · cc54dc28

Jakub Kicinski authored May 21, 2018

ABM NIC requires more complex vNIC handling, allocate
per-vNIC structure.  Find out RX queue base and PCI PF id.
There will be multiple PFs sharing the same MAC port, therefore
the MAC address assigned to the vNIC must be looked up in the
HWInfo database.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc54dc28

nfp: abm: add initial active buffer management NIC skeleton · c4c8f39a

Jakub Kicinski authored May 21, 2018

Add a very rudimentary active buffer management NIC support.
For now it's like a core NIC without SR-IOV support.  Next
commits will extend its functionality.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4c8f39a

nfp: core: allow 4-byte aligned accesses to Memory Units · b586c77b

Jakub Kicinski authored May 21, 2018

Current code doesn't enforce length requirements on 32bit accesses
with action NFP_CPP_ACTION_RW to memory units, but if the access
is only aligned to 4 bytes as well we will fall into the explicit
access case and error out.  Such accesses are correct, allow them
by lowering the width earlier.

While at it use a switch statement to improve readability.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b586c77b

nfp: add shared buffer configuration · a0d163f4

Jakub Kicinski authored May 21, 2018

Allow app FW to advertise its shared buffer pool information.
Use the per-PF mailbox to configure them from devlink.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0d163f4

nfp: add support for per-PCI PF mailbox · 0c693323

Jakub Kicinski authored May 21, 2018

When working with devlink-related functionality for locking reasons
it's easier to create a new mailbox per-PCI PF device than try to
use one of the netdev/vNIC mailboxes.

Define new mailbox structure and resolve its symbol during probe.
For forward compatibility allow silent truncation of mailbox command
data.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c693323

nfp: move rtsym helpers to pf code · 8f6196f6

Jakub Kicinski authored May 21, 2018

nfp_net_pf_rtsym_read_optional() and nfp_net_pf_map_rtsym() are not
really related to networking code.  Move them to the PF code and
remove the net from their names.  They will soon be needed by code
outside of nfp_net_main.c anyway.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8f6196f6

Merge branch 'bpfilter' · e95a5f54

David S. Miller authored May 23, 2018

Alexei Starovoitov says:

====================
bpfilter

v2->v3:
- followed Luis's suggestion and significantly simplied first patch
  with shmem_kernel_file_setup+kernel_write. Added kdoc for new helper
- fixed typos and race to access pipes with mutex
- tested with bpfilter being 'builtin'. CONFIG_BPFILTER_UMH=y|m both work.
  Interesting to see a usermode executable being embedded inside vmlinux.
- it doesn't hurt to enable bpfilter in .config.
  ip_setsockopt commands sent to usermode via pipes and -ENOPROTOOPT is
  returned from userspace, so kernel falls back to original iptables code

v1->v2:
this patch set is almost a full rewrite of the earlier umh modules approach
The v1 of patches and follow up discussion was covered by LWN:
https://lwn.net/Articles/749108/

I believe the v2 addresses all issues brought up by Andy and others.
Mainly there are zero changes to kernel/module.c
Instead of teaching module loading logic to recognize special
umh module, let normal kernel modules execute part of its own
.init.rodata as a new user space process (Andy's idea)
Patch 1 introduces this new helper:
int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
Input:
  data + len == executable file
Output:
  struct umh_info {
       struct file *pipe_to_umh;
       struct file *pipe_from_umh;
       pid_t pid;
  };

Advantages vs v1:
- the embedded user mode executable is stored as .init.rodata inside
  normal kernel module. These pages are freed when .ko finishes loading
- the elf file is copied into tmpfs file. The user mode process is swappable.
- the communication between user mode process and 'parent' kernel module
  is done via two unix pipes, hence protocol is not exposed to
  user space
- impossible to launch umh on its own (that was the main issue of v1)
  and impossible to be man-in-the-middle due to pipes
- bpfilter.ko consists of tiny kernel part that passes the data
  between kernel and umh via pipes and much bigger umh part that
  doing all the work
- 'lsmod' shows bpfilter.ko as usual.
  'rmmod bpfilter' removes kernel module and kills corresponding umh
- signed bpfilter.ko covers the whole image including umh code

Few issues:
- the user can still attach to the process and debug it with
  'gdb /proc/pid/exe pid', but 'gdb -p pid' doesn't work.
  (a bit worse comparing to v1)
- tinyconfig will notice a small increase in .text
  +766 | TEXT | 7c8b94806bec umh: introduce fork_usermode_blob() helper
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e95a5f54

net: add skeleton of bpfilter kernel module · d2ba09c1

Alexei Starovoitov authored May 21, 2018

bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
and user mode helper code that is embedded into bpfilter.ko

The steps to build bpfilter.ko are the following:
- main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
- with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
  is converted into bpfilter_umh.o object file
  with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
  Example:
  $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
  0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
  0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
  0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
- bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko

bpfilter_kern.c is a normal kernel module code that calls
the fork_usermode_blob() helper to execute part of its own data
as a user mode process.

Notice that _binary_net_bpfilter_bpfilter_umh_start - end
is placed into .init.rodata section, so it's freed as soon as __init
function of bpfilter.ko is finished.
As part of __init the bpfilter.ko does first request/reply action
via two unix pipe provided by fork_usermode_blob() helper to
make sure that umh is healthy. If not it will kill it via pid.

Later bpfilter_process_sockopt() will be called from bpfilter hooks
in get/setsockopt() to pass iptable commands into umh via bpfilter.ko

If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
kill umh as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2ba09c1

umh: introduce fork_usermode_blob() helper · 449325b5

Alexei Starovoitov authored May 21, 2018

Introduce helper:
int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
struct umh_info {
       struct file *pipe_to_umh;
       struct file *pipe_from_umh;
       pid_t pid;
};

that GPLed kernel modules (signed or unsigned) can use it to execute part
of its own data as swappable user mode process.

The kernel will do:
- allocate a unique file in tmpfs
- populate that file with [data, data + len] bytes
- user-mode-helper code will do_execve that file and, before the process
  starts, the kernel will create two unix pipes for bidirectional
  communication between kernel module and umh
- close tmpfs file, effectively deleting it
- the fork_usermode_blob will return zero on success and populate
  'struct umh_info' with two unix pipes and the pid of the user process

As the first step in the development of the bpfilter project
the fork_usermode_blob() helper is introduced to allow user mode code
to be invoked from a kernel module. The idea is that user mode code plus
normal kernel module code are built as part of the kernel build
and installed as traditional kernel module into distro specified location,
such that from a distribution point of view, there is
no difference between regular kernel modules and kernel modules + umh code.
Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
by a kernel module doesn't make it any special from kernel and user space
tooling point of view.

Such approach enables kernel to delegate functionality traditionally done
by the kernel modules into the user space processes (either root or !root) and
reduces security attack surface of the new code. The buggy umh code would crash
the user process, but not the kernel. Another advantage is that umh code
of the kernel module can be debugged and tested out of user space
(e.g. opening the possibility to run clang sanitizers, fuzzers or
user space test suites on the umh code).
In case of the bpfilter project such architecture allows complex control plane
to be done in the user space while bpf based data plane stays in the kernel.

Since umh can crash, can be oom-ed by the kernel, killed by the admin,
the kernel module that uses them (like bpfilter) needs to manage life
time of umh on its own via two unix pipes and the pid of umh.

The exit code of such kernel module should kill the umh it started,
so that rmmod of the kernel module will cleanup the corresponding umh.
Just like if the kernel module does kmalloc() it should kfree() it
in the exit code.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

449325b5

Merge branch 'qed-firmware-TLV' · 1fe8c06c

David S. Miller authored May 22, 2018

Sudarsana Reddy Kalluru says:

====================
qed*: Add support for management firmware TLV request.

Management firmware (MFW) requires config and state information from
the driver. It queries this via TLV (type-length-value) request wherein
mfw specificies the list of required TLVs. Driver fills the TLV data
and responds back to MFW.
This patch series adds qed/qede/qedf/qedi driver implementation for
supporting the TLV queries from MFW.

Changes from previous versions:
-------------------------------
v2: Split patch (2) into multiple simpler patches.
v2: Update qed_tlv_parsed_buf->p_val datatype to void pointer to avoid
    bunch of unnecessary typecasts.

Please consider applying this series to "net-next".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1fe8c06c