Commits · 9bcc86064bb5006257e3367fc4439f4072d82442 · Kirill Smelkov / linux

28 Nov, 2016 23 commits

net/mlx5e: Add CQE compression user control · 9bcc8606

Shaker Daibes authored Nov 27, 2016

The user can now override the automatic driver decision using the
rx_cqe_compress flag, which is the preference for CQE compression.
The flag is initialized with the automatic driver decision.
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bcc8606

net/mlx5e: Moves pflags to priv->params · 59ece1c9

Shaker Daibes authored Nov 27, 2016

pflags is a configuration parameter for the netdev, naturally it belongs
to priv->params.
Also introduce MLX5E_GET_PFLAG
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59ece1c9

net/mlx5e: Add support for loopback selftest · 0952da79

Saeed Mahameed authored Nov 27, 2016

Extend the self diagnostic tests to support loopback test.

The loopback test doesn't require the offline flag, it will use the
generic dev_queue_xmit and a dedicated packet_type to capture and verify
mlx5e selftest loopback packets.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0952da79

net/mlx5e: Add support for ethtool self diagnostics test · d605d668

Kamal Heib authored Nov 27, 2016

The self diagnostics test implementaion include the following features:
1. Link Test: Check that link is in up state.
2. Speed Test: Check that link was negotiated correctly.
3. Health Test: Check the device health.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d605d668

net/mlx5e: Add DCBX control interface · 0eca995f

Huy Nguyen authored Nov 27, 2016

Use setdcbx interface to set the DCBX mode to firmware or os.
If setdcbx is called with mode value of zero, the DCBX mode
is set to firmware.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0eca995f

net/mlx5e: ConnectX-4 firmware support for DCBX · e207b7e9

Huy Nguyen authored Nov 27, 2016

DBCX by default is controlled by firmware where dcbx capability bit
is set. In this mode, firmware is responsible for reading/sending the
TLV packets from/to the remote partner.

This patch sets up the infrastructure to move between HOST/FW DCBX
control mode.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e207b7e9

net/mlx5: Add DCBX firmware commands support · 341c5ee2

Huy Nguyen authored Nov 27, 2016

Add set/query commands for DCBX_PARAM register
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

341c5ee2

net/mlx5e: Read ETS settings directly from firmware · 820c2c5e

Huy Nguyen authored Nov 27, 2016

Issue description:
Current implementation saves the ETS settings from user in
a temporal soft copy and returns this settings when user
queries the ETS settings.

With the new DCBX firmware, the ETS settings can be changed
by firmware when the DCBX is in firmware controlled mode. Therefore,
user will obtain wrong values from the temporal soft copy.

Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
   a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
      creation.
   b. When reading ETS setting from FW, if the traffic class bandwidth
      is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
      implementation solves the scenarios when the DCBX is in FW control
      and willing bit is on which means the ETS setting is dictated
      by remote switch.

Also check ETS capability where needed.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

820c2c5e

net/mlx5e: Support DCBX CEE API · 3a6a931d

Huy Nguyen authored Nov 27, 2016

Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.

Note:
  priority group in CEE is equivalent to traffic class in ConnectX-4
  hardware spec.

  bw allocation per priority in CEE is not supported because ConnectX-4
  only supports bw allocation per traffic class.

  user priority in CEE does not have an equivalent term in ConnectX-4.
  Therefore, user priority to priority mapping in CEE is not supported.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a6a931d

net/mlx5e: Add qos capability check · 80653f73

Huy Nguyen authored Nov 27, 2016

Make sure firmware supports qos before exposing the DCB API.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80653f73

virtio-net: enable multiqueue by default · 44900010

Jason Wang authored Nov 25, 2016

We use single queue even if multiqueue is enabled and let admin to
enable it through ethtool later. This is used to avoid possible
regression (small packet TCP stream transmission). But looks like an
overkill since:

- single queue user can disable multiqueue when launching qemu
- brings extra troubles for the management since it needs extra admin
  tool in guest to enable multiqueue
- multiqueue performs much better than single queue in most of the
  cases

So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable as much as queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.

Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44900010

Merge branch 'MV88E6097-fixes' · 06b3ccfb

David S. Miller authored Nov 28, 2016

Stefan Eichenberger says:

====================
Fix support for the MV88E6097

This patchset fixes the following two issues for the MV88E6097:
- Add missing definition of g1_irqs
- Add missing comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

06b3ccfb

net: dsa: mv88e6xxx: add missing comment for MV88E6097 · 15da3cc8

Stefan Eichenberger authored Nov 25, 2016

Add a missing comment for the MV88E6097 because of unification.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15da3cc8

net: dsa: mv88e6xxx: add g1_irqs definition for MV88E6097 · c534178b

Stefan Eichenberger authored Nov 25, 2016

Add the missing definition of g1_irqs for MV88E6097.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c534178b

Merge branch 'bpf-misc-next' · 53c4ce02

David S. Miller authored Nov 27, 2016

Daniel Borkmann says:

====================
BPF cleanups and misc updates

This patch set adds couple of cleanups in first few patches,
exposes owner_prog_type for array maps as well as mlocked mem
for maps in fdinfo, allows for mount permissions in fs and
fixes various outstanding issues in selftests and samples.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

53c4ce02

bpf: fix multiple issues in selftest suite and samples · e00c7b21

Daniel Borkmann authored Nov 26, 2016

1) The test_lru_map and test_lru_dist fails building on my machine since
   the sys/resource.h header is not included.

2) test_verifier fails in one test case where we try to call an invalid
   function, since the verifier log output changed wrt printing function
   names.

3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
   retrieving the number of possible CPUs. This is broken at least in our
   scenario and really just doesn't work.

   glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
   First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
   if that fails, depending on the config, it either tries to count CPUs
   in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
   If /proc/cpuinfo has some issue, it returns just 1 worst case. This
   oddity is nothing new [1], but semantics/behaviour seems to be settled.
   _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
   that fails it looks into /proc/stat for cpuX entries, and if also that
   fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
   unlikely all breaks down).

   While that might match num_possible_cpus() from the kernel in some
   cases, it's really not guaranteed with CPU hotplugging, and can result
   in a buffer overflow since the array in user space could have too few
   number of slots, and on perpcu map lookup, the kernel will write beyond
   that memory of the value buffer.

   William Tu reported such mismatches:

     [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
     happens when CPU hotadd is enabled. For example, in Fusion when
     setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
     -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
     will be 2 [2]. [...]

   Documentation/cputopology.txt says /sys/devices/system/cpu/possible
   outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
   so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
   own implementation. Later, we could add support to bpf(2) for passing
   a mask via CPU_SET(3), for example, to just select a subset of CPUs.

   BPF samples code needs this fix as well (at least so that people stop
   copying this). Thus, define bpf_num_possible_cpus() once in selftests
   and import it from there for the sample code to avoid duplicating it.
   The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.

After all three issues are fixed, the test suite runs fine again:

  # make run_tests | grep self
  selftests: test_verifier [PASS]
  selftests: test_maps [PASS]
  selftests: test_lru_map [PASS]
  selftests: test_kmod.sh [PASS]

  [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
  [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html

Fixes: 3059303f ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b41 ("Add sample for adding simple drop program to link")
Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e1559671 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676da ("bpf: Print function name in addition to function id")
Fixes: 5db58faf ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e00c7b21

bpf: allow for mount options to specify permissions · a3af5f80

Daniel Borkmann authored Nov 26, 2016

Since we recently converted the BPF filesystem over to use mount_nodev(),
we now have the possibility to also hold mount options in sb's s_fs_info.
This work implements mount options support for specifying permissions on
the sb's inode, which will be used by tc when it manually needs to mount
the fs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3af5f80

bpf: add owner_prog_type and accounted mem to array map's fdinfo · 21116b70

Daniel Borkmann authored Nov 26, 2016

Allow for checking the owner_prog_type of a program array map. In some
cases bpf(2) can return -EINVAL /after/ the verifier passed and did all
the rewrites of the bpf program.

The reason that lets us fail at this late stage is that program array
maps are incompatible. Allow users to inspect this earlier after they
got the map fd through BPF_OBJ_GET command. tc will get support for this.

Also, display how much we charged the map with regards to RLIMIT_MEMLOCK.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

21116b70

bpf: reuse dev_is_mac_header_xmit for redirect · c491680f

Daniel Borkmann authored Nov 26, 2016

Commit dcf80034 ("net/sched: act_mirred: Refactor detection whether
dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's
also useful elsewhere, move it to if_arp.h and reuse it for BPF.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c491680f

bpf: drop useless bpf_fd member from cls/act · 55556dd5

Daniel Borkmann authored Nov 26, 2016

After setup we don't need to keep user space fd number around anymore, as
it also has no useful meaning for anyone, just remove it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

55556dd5

bpf: drop unnecessary context cast from BPF_PROG_RUN · 88575199

Daniel Borkmann authored Nov 26, 2016

Since long already bpf_func is not only about struct sk_buff * as
input anymore. Make it generic as void *, so that callers don't
need to cast for it each time they call BPF_PROG_RUN().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

88575199

sfc: remove unneeded variable · e3739099

Dan Carpenter authored Nov 25, 2016

We don't use ->heap_buf after commit 46d1efd8 ("sfc: remove Software
TSO") so let's remove the last traces.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3739099

Merge tag 'wireless-drivers-next-for-davem-2016-11-25' of... · 33f8a045

David S. Miller authored Nov 27, 2016

Merge tag 'wireless-drivers-next-for-davem-2016-11-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.10

Major changes:

iwlwifi

* finalize and enable dynamic queue allocation
* use dev_coredumpmsg() to prevent locking the driver
* small fix to pass the AID to the FW
* use FW PS decisions with multi-queue

ath9k

* add device tree bindings
* switch to use mac80211 intermediate software queues to reduce
  latency and fix bufferbloat

wl18xx

* allow scanning in AP mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

33f8a045

27 Nov, 2016 6 commits

netdevice: fix sparse warning for HARD_TX_LOCK · 5a717f4f

Michael S. Tsirkin authored Nov 24, 2016

sparse warns about context imbalance in any code
that uses HARD_TX_LOCK/UNLOCK - this is because it's
unable to determine that flags don't change so
lock and unlock are paired.

Seems easy enough to fix by adding __acquire/__release
calls.

With this patch af_packet.c is now sparse-clean,
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a717f4f

ptp: gianfar: Use high resolution frequency method. · 42895116

Ulrik De Bie authored Nov 23, 2016

This patch depends on commit d8d26354 ("ptp: Introduce a high
resolution frequency adjustment method.")

The gianfar devices offer a frequency resolution of about 0.46 ppb
(depends on actual value of tmr_add, for the calculation assumed
0x80000000). This patch lets users of the device benefit from the increased
frequency resolution when tuning the clock. Thanks to the rounding the
maximum error between the requested frequency and the applied frequency
will then be about 0.23 ppb.

Tested on a v3.3.8 kernel on a real gianfar device. Verified compilation
on net-next (currently at v4.9-rc5).
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

42895116

mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation() · b9972d22

Eric Dumazet authored Nov 23, 2016

Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.

Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.

Interrupt moderation is best effort, and we do not really care of
ultra precise counters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9972d22

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0b42f25d

David S. Miller authored Nov 26, 2016

udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>

0b42f25d

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d8e435f3

Linus Torvalds authored Nov 26, 2016

Pull vfs splice fix from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix default_file_splice_read()

d8e435f3

fix default_file_splice_read() · 8e54cada

Al Viro authored Nov 26, 2016

Botched calculation of number of pages.  As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8e54cada

26 Nov, 2016 11 commits

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · e3480312

Linus Torvalds authored Nov 26, 2016

Pull i2c fixes from Wolfram Sang:
 "Here is a revert and two bugfixes for the I2C designware driver.

  Please note that we are still hunting down a regression for the
  i2c-octeon driver. While there is a fix pending, we have unclear
  feedback from the testers currently. An rc8 would be quite helpful
  for this case"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Revert "i2c: designware: do not disable adapter after transfer"
  i2c: designware: fix rx fifo depth tracking
  i2c: designware: report short transfers

e3480312

Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · a56f3eb2

Linus Torvalds authored Nov 26, 2016

Pull ARM fix from Russell King:
 "This resolves the ksyms issues by reverting the commit which
  introduced the breakage"

There was what I consider to be a better fix, but it's late in the rc
game, so I'll take the revert.

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  Revert "arm: move exports to definitions"

a56f3eb2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a0d60e62

Linus Torvalds authored Nov 26, 2016

Pull networking fixes from David Miller:

 1) Fix leak in fsl/fman driver, from Dan Carpenter.

 2) Call flow dissector initcall earlier than any networking driver can
    register and start to use it, from Eric Dumazet.

 3) Some dup header fixes from Geliang Tang.

 4) TIPC link monitoring compat fix from Jon Paul Maloy.

 5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
    Florian Fainelli.

 6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
    Mashak.

 7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  tipc: resolve connection flow control compatibility problem
  mvpp2: use correct size for memset
  net/mlx5: drop duplicate header delay.h
  net: ieee802154: drop duplicate header delay.h
  ibmvnic: drop duplicate header seq_file.h
  fsl/fman: fix a leak in tgec_free()
  net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
  tipc: improve sanity check for received domain records
  tipc: fix compatibility bug in link monitoring
  net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
  dwc_eth_qos: drop duplicate headers
  net sched filters: fix filter handle ID in tfilter_notify_chain()
  net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
  bnxt: do not busy-poll when link is down
  udplite: call proper backlog handlers
  ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
  net/mlx4_en: Free netdev resources under state lock
  net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
  rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
  bnxt_en: Fix a VXLAN vs GENEVE issue
  ...

a0d60e62

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 30e2b7cf

Linus Torvalds authored Nov 26, 2016

Pull libnvdimm fixes from Dan Williams:

 - Fix a crash that occurs at driver initialization if the memory region
   is already busy (request_mem_region() fails).

 - Fix a vma validation check that mistakenly allows a private device-
   dax mapping to be established. Device-dax explicitly forbids private
   mappings so it can guarantee a given fault granularity and backing
   memory type.

 Both of these fixes have soaked in -next and are tagged for -stable.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: fail all private mapping attempts
  device-dax: check devm_nsio_enable() return value

30e2b7cf

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · fc13ca19

Linus Torvalds authored Nov 26, 2016

Pull KVM fixes from Radim Krčmář:
 "Four fixes for bugs found by syzkaller on x86, all for stable"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: check for pic and ioapic presence before use
  KVM: x86: fix out-of-bounds accesses of rtc_eoi map
  KVM: x86: drop error recovery in em_jmp_far and em_ret_far
  KVM: x86: fix out-of-bounds access in lapic

fc13ca19

Merge tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 39c15737

Linus Torvalds authored Nov 26, 2016

Pull powerpc fixes from Michael Ellerman:
 "Fixes marked for stable:
   - Set missing wakeup bit in LPCR on POWER9
   - Fix the early OPAL console wrappers
   - Fixup kernel read only mapping

  Fixes for code merged this cycle:
   - Fix missing CRCs, add more asm-prototypes.h declarations"

* tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Fixup kernel read only mapping
  powerpc/boot: Fix the early OPAL console wrappers
  powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
  powerpc: Set missing wakeup bit in LPCR on POWER9

39c15737

tipc: resolve connection flow control compatibility problem · 6998cc6e

Jon Paul Maloy authored Nov 24, 2016

In commit 10724cc7 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using message as base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement) is 512. This was tested against
the previous version, which uses an acknowledge frequency of on ack per
256 received message, and found to work fine.

However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also is counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not trigged to send an acknowledge, with a permanently
blocked sender as result.

We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by a making minimal change to the
condition used for determining connection congestion.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6998cc6e

Merge branch 'mlxsw-trap-groups-and-policers' · e5f12b3f

David S. Miller authored Nov 25, 2016

Jiri Pirko says:

====================
mlxsw: traps, trap groups and policers

Nogah says:

For a packet to be sent from the HW to the cpu, it needs to be trapped.
For a trap to be activate it should be assigned to a trap group.
Those trap groups can have policers, to limit the packet rate (the max
number of packets that can be sent to the cpu in a time slot, the rest
will be discarded) or the data rate (the same, but the count is not by the
number of packets but by their total length in bytes).

This patchset rearrange the trap setting API, re-write the traps and the
trap groups list in spectrum and assign them policers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e5f12b3f

mlxsw: spectrum: Add policers for trap groups · 9148e7cf

Nogah Frankel authored Nov 25, 2016

Configure policers and connect them to trap groups.

While many trap groups share policer's configuration they don't share
the actual policer because each trap group represents a different
flow / protocol and we don't want one of them to be able to exceed its
rate on behalf of another.
For example, if STP and LLDP gets to send 128 packets/sec each, if we
put them in one 256 packets/sec policer, one can send 200 packets while
the other only 50.

Note that IP2ME covers lots of flows, so it's limit is set to match the
cpu ability to handle data.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9148e7cf

mlxsw: reg: Add QoS Policer Configuration Register · 76a4c7d3

Nogah Frankel authored Nov 25, 2016

The QPCR register is used to create and control policers.
A policer can discard or change the color of packets that are
trapped by a specific trap.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76a4c7d3

mlxsw: resources: Add max cpu policers resource · 2b77958b

Nogah Frankel authored Nov 25, 2016

Add a new resource to resources query: max cpu policers which tells us how
many policers can be used to limit the data rate to the cpu port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b77958b