Commits · 12f084120e573ba5de21879a6c491179d2f94d4e · Kirill Smelkov / linux

10 Feb, 2016 3 commits

Merge branch 'renesas-bit-twiddling' · 12f08412

David S. Miller authored Feb 10, 2016

Sergei Shtylyov says:

====================
Factor out register bit twiddling in the Renesas Ethernet drivers

   Here's a set of 2 patches against DaveM's 'net-next.git' repo. We factor out
the often repeated pattern of reading a register, AND'ing and/or OR'ing some
bits, and then writing the value back.

[1/2] ravb: factor out register bit twiddling code
[2/2] sh_eth: factor out register bit twiddling code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

12f08412

sh_eth: factor out register bit twiddling code · b2b14d2f

Sergei Shtylyov authored Feb 10, 2016

The driver has often repeated pattern of reading a register, AND'ing and/or
OR'ing some bits and writing the value back. Factor the pattern out into
sh_eth_modify() -- this saves 84 bytes of code with ARM gcc 4.7.3.

While at it, update Cogent Embedded's copyright.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b2b14d2f

ravb: factor out register bit twiddling code · 568b3ce7

Sergei Shtylyov authored Feb 10, 2016

The driver has often repeated pattern of reading a register, AND'ing and/or
OR'ing some bits and writing the value back. Factor the pattern out into
ravb_modify() -- this saves 260 bytes of code with ARM gcc 4.7.3.

While at it, update Cogent Embedded's copyrights.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

568b3ce7

09 Feb, 2016 10 commits

Merge branch 'tpacket-gso-csum-offload' · ef5c0e25

David S. Miller authored Feb 09, 2016

Willem de Bruijn says:

====================
packet: tpacket gso and csum offload

Extend PACKET_VNET_HDR socket option support to packet sockets with
memory mapped rings.

Patches 2 and 4 add support to tpacket_rcv and tpacket_snd.

Patch 1 prepares for this by moving the relevant virtio_net_hdr
logic out of packet_snd and packet_rcv into helper functions.

GSO transmission requires all headers in the skb linear section.
Patch 3 moves parsing of tx_ring slot headers before skb allocation
to enable allocation with sufficient linear size.

Changes
  v1->v2:
    - fix bounds checks:
      - subtract sizeof(vnet_hdr) before comparing tp_len to size_max
      - compare tp_len to size_max also with GSO, just do not truncate to MTU
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ef5c0e25

packet: tpacket_snd gso and checksum offload · 1d036d25

Willem de Bruijn authored Feb 03, 2016

Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.

When enabled, a struct virtio_net_hdr is expected to precede the data
in the ring. The vnet option must be set before the ring is created.

The implementation reuses the existing skb_copy_bits code that is used
when dev->hard_header_len is non-zero. Move this ll_header check to
before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
Allocate and copy the max of the two.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d036d25

packet: parse tpacket header before skb alloc · 8d39b4a6

Willem de Bruijn authored Feb 03, 2016

GSO packet headers must be stored in the linear skb segment.
Move tpacket header parsing before sock_alloc_send_skb. The GSO
follow-on patch will later increase the skb linear argument to
sock_alloc_send_skb if needed for large packets.

The header parsing code does not require an allocated skb, so is
safe to move. Later pass to tpacket_fill_skb the computed data
start and length.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d39b4a6

packet: vnet_hdr support for tpacket_rcv · 58d19b19

Willem de Bruijn authored Feb 03, 2016

Support socket option PACKET_VNET_HDR together with PACKET_RX_RING.
When enabled, a struct virtio_net_hdr will precede the data in the
packet ring slots.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.c

  pkt: 1454269209.798420 len=5066
  vnet: gso_type=tcpv4 gso_size=1448 hlen=66 ecn=off
  csum: start=34 off=16
  eth: proto=0x800
  ip: src=<masked> dst=<masked> proto=6 len=5052
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

58d19b19

packet: move vnet_hdr code to helper functions · 16cc1400

Willem de Bruijn authored Feb 03, 2016

packet_snd and packet_rcv support virtio net headers for GSO.
Move this logic into helper functions to be able to reuse it in
tpacket_snd and tpacket_rcv.

This is a straighforward code move with one exception. Instead of
creating and passing a separate gso_type variable, reuse
vnet_hdr.gso_type after conversion from virtio to kernel gso type.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16cc1400

bonding: 3ad: apply ad_actor settings changes immediately · 5ee14e6d

Nikolay Aleksandrov authored Feb 03, 2016

Currently the bonding allows to set ad_actor_system and prio while the
bond device is down, but these are actually applied only if there aren't
any slaves yet (applied to bond device when first slave shows up, and to
slaves at 3ad bind time). After this patch changes are applied immediately
and the new values can be used/seen after the bond's upped so it's not
necessary anymore to release all and enslave again to see the changes.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ee14e6d

Merge branch 'bridge-mdb-entry-offload-flag' · a1b486ae

David S. Miller authored Feb 09, 2016

Jiri Pirko says:

====================
bridge: mdb: flag offloaded mdb entries

This patchset extends uapi to let the user know if an mdb entry is offloaded.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a1b486ae

bridge: mdb: Passing the port-group pointer to br_mdb module · 9e8430f8

Elad Raz authored Feb 03, 2016

Passing the port-group to br_mdb in order to allow direct access to the
structure. br_mdb will later use the structure to reflect HW reflection
status via "state" variable.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e8430f8

bridge: mdb: Separate br_mdb_entry->state from net_bridge_port_group->state · 9d06b6d8

Elad Raz authored Feb 03, 2016

Change net_bridge_port_group 'state' member to 'flags' and define new set
of flags internal to the kernel.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d06b6d8

bridge: mdb: add support for offloaded mdb entries · 157ede67

Elad Raz authored Feb 03, 2016

Add new bitmask member 'flags' to br_mdb_entry structure. Adding
MDB_FLAGS_OFFLOAD bit which indicates MDB entries is offloaded to hardware.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

157ede67

08 Feb, 2016 2 commits

bonding: trivial: style fixes · d66bd905

Zhang Shengju authored Feb 03, 2016

remove some redudant brackets, use sizeof(*) instead of sizeof(struct x).
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d66bd905

tcp: Fix syncookies sysctl default. · 0aca737d

David S. Miller authored Feb 08, 2016

Unintentionally the default was changed to zero, fix
that.

Fixes: 12ed8244 ("ipv4: Namespaceify tcp syncookies sysctl knob")
Signed-off-by: David S. Miller <davem@davemloft.net>

0aca737d

07 Feb, 2016 25 commits

Merge branch 'ns-tcp-sysctls' · 7158ce80

David S. Miller authored Feb 07, 2016

Nikolay Borisov says:

====================
Namespaceify more of the tcp sysctl knobs

This patch series continues making more of the tcp-related
sysctl knobs be per net-namespace. Most of these apply per
socket and have global defaults so should be safe and I
don't expect any breakages.

Having those per net-namespace is useful when multiple
containers are hosted and it is required to tune the
tcp settings for each independently of the host node.

I've split the patches to be per-sysctl but after
the review if the outcome is positive I'm happy
to either send it in one big blob or just.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7158ce80

ipv4: Namespaceify tcp_notsent_lowat sysctl knob · 4979f2d9

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4979f2d9

ipv4: Namespaceify tcp_fin_timeout sysctl knob · 1e579caa

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e579caa

ipv4: Namespaceify tcp_orphan_retries sysctl knob · c402d9be

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c402d9be

ipv4: Namespaceify tcp_retries2 sysctl knob · c6214a97

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6214a97

ipv4: Namespaceify tcp_retries1 sysctl knob · ae5c3f40

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae5c3f40

ipv4: Namespaceify tcp reordering sysctl knob · 1043e25f

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1043e25f

ipv4: Namespaceify tcp syncookies sysctl knob · 12ed8244

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12ed8244

ipv4: Namespaceify tcp synack retries sysctl knob · 7c083ecb

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c083ecb

ipv4: Namespaceify tcp syn retries sysctl knob · 6fa25166

Nikolay Borisov authored Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fa25166

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 9d1eb21b

David S. Miller authored Feb 07, 2016

Antonio Quartulli says:

====================
This batch of patches includes a number of corrections and
improvements for our kernel-doc. These changes also make sure
that our doc is now properly processed by the kernel-doc
parsing tool.

Other than that you have a patch updating all the copyright
lines to 2016 and another patch switching the URLs in our
readme, Kconfig and MAINTAINERS file from "http" to "https".
Both by Sven Eckelmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9d1eb21b

Merge branch 'virtio_net_ethtool_settings' · e6359193

David S. Miller authored Feb 07, 2016

Nikolay Aleksandrov says:

====================
virtio_net: add ethtool get/set settings support

Patch 1 adds ethtool speed/duplex validation functions which check if the
value is defined. Patch 2 adds support for ethtool (get|set)_settings and
uses the validation functions to check the user-supplied values.

v2: split in 2 patches to allow everyone to make use of the validation
functions and allow virtio_net devices to be half duplex
v3: added a check to return error if the user tries to change anything else
besides duplex/speed as per Michael's comment
v4: Set port type to PORT_OTHER
v5: clear diff1.port (ignore port) when checking for changes since we set
it now and ethtool uses it in the set request

Sorry about the pointless iterations, should've all covered now.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e6359193

virtio_net: add ethtool support for set and get of settings · 16032be5

Nikolay Aleksandrov authored Feb 03, 2016

This patch allows the user to set and retrieve speed and duplex of the
virtio_net device via ethtool. Having this functionality is very helpful
for simulating different environments and also enables the virtio_net
device to participate in operations where proper speed and duplex are
required (e.g. currently bonding lacp mode requires full duplex). Custom
speed and duplex are not allowed, the user-supplied settings are validated
before applying.

Example:
$ ethtool eth1
Settings for eth1:
...
	Speed: Unknown!
	Duplex: Unknown! (255)
$ ethtool -s eth1 speed 1000 duplex full
$ ethtool eth1
Settings for eth1:
...
	Speed: 1000Mb/s
	Duplex: Full

Based on a patch by Roopa Prabhu.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16032be5

ethtool: add speed/duplex validation functions · 103a8ad1

Nikolay Aleksandrov authored Feb 03, 2016

Add functions which check if the speed/duplex are defined.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

103a8ad1

Merge branch 'sunvnet-tracepoints' · 13340a0a

David S. Miller authored Feb 07, 2016

Sowmini Varadhan says:

====================
sunvnet: perf tracepoint hooks

Added some perf tracepoints to help track and debug sunvnet
descriptor state at run-time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

13340a0a

sunvnet: perf tracepoint invocations to trace LDC state machine · 365a1028

Sowmini Varadhan authored Feb 02, 2016

Use sunvnet perf trace macros to monitor LDC message exchange state.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

365a1028

sunvnet: Add support for perf LDC event tracing · 46fcc6ef

Sowmini Varadhan authored Feb 02, 2016

Add perf event macros for support of tracing and instrumentation
of LDC state machine
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46fcc6ef

Merge branch 'tcp_cong_ctrl_refactoring' · ffeb6437

David S. Miller authored Feb 07, 2016

Yuchung Cheng says:

====================
tcp: congestion control refactoring

This patch set refactors the sequence of congestion control,
loss recovery, and transmission logic in TCP ack processing.

The design goal is to decouple and sequence them in the following order:

  0. ACK accounting: free or tag sent packets [unchanged]

  1. loss recovery: identify lost/ecn packets and update congestion state

  2. congestion control: up/down cwnd and pacing rate based on (1)

  3. transmission: send new or retransmit old based on (1) and (2)

This refactoring makes the cwnd changes more clear because it's done
in one place. The packet accounting is also more robust especially
for connections that do not support SACK. Patch 1-4 and 6 are
refactoring and patch 5 improves TCP performance under reordering.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ffeb6437

tcp: tcp_cong_control helper · d452e6ca

Yuchung Cheng authored Feb 02, 2016

Refactor and consolidate cwnd and rate updates into a new function
tcp_cong_control().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d452e6ca

tcp: make congestion control more robust against reordering · 2d14a4de

Yuchung Cheng authored Feb 02, 2016

This change enables congestion control to update cwnd based on
not only packet cumulatively acked but also packets delivered
out-of-order. This makes congestion control robust against packet
reordering because it may raise cwnd as long as packets are being
delivered once reordering has been detected (i.e., it only cares
the amount of packets delivered, not the ordering among them).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d14a4de

tcp: refactor pkts acked accounting · 3ebd8871

Yuchung Cheng authored Feb 02, 2016

A small refactoring that gets number of packets cumulatively acked
from tcp_clean_rtx_queue() directly.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ebd8871

tcp: new delivery accounting · ddf1af6f

Yuchung Cheng authored Feb 02, 2016

This patch changes the accounting of how many packets are
newly acked or sacked when the sender receives an ACK.

The current approach basically computes

   newly_acked_sacked = (prior_packets - prior_sacked) -
                        (tp->packets_out - tp->sacked_out)

   where prior_packets and prior_sacked out are snapshot
   at the beginning of the ACK processing.

The new approach tracks the delivery information via a new
TCP state variable "delivered" which monotically increases
as new packets are delivered in order or out-of-order.

The reason for this change is that the current approach is
brittle that produces negative or inaccurate estimate.

   1) For non-SACK connections, an ACK that advances the SND.UNA
   could reset the DUPACK counters (tp->sacked_out) in
   tcp_process_loss() or tcp_fastretrans_alert(). This inflates
   the inflight suddenly and causes under-estimate or even
   negative estimate. Here is a real example:

                   before   after (processing ACK)
   packets_out     75       73
   sacked_out      23        0
   ca state        Loss     Open

   The old approach computes (75-23) - (73 - 0) = -21 delivered
   while the new approach computes 1 delivered since it
   considers the 2nd-24th packets are delivered OOO.

   2) MSS change would re-count packets_out and sacked_out so
   the estimate is in-accurate and can even become negative.
   E.g., the inflight is doubled when MSS is halved.

   3) Spurious retransmission signaled by DSACK is not accounted

The new approach is simpler and more robust. For SACK connections,
tp->delivered increments as packets are being acked or sacked in
SACK and ACK processing.

For non-sack connections, it's done in tcp_remove_reno_sacks() and
tcp_add_reno_sack(). When an ACK advances the SND.UNA, tp->delivered
is incremented by the number of packets ACKed (less the current
number of DUPACKs received plus one packet hole).  Upon receiving
a DUPACK, tp->delivered is incremented assuming one out-of-order
packet is delivered.

Upon receiving a DSACK, tp->delivered is incremtened assuming one
retransmission is delivered in tcp_sacktag_write_queue().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ddf1af6f

tcp: move cwnd reduction after recovery state procesing · 31ba0c10

Yuchung Cheng authored Feb 02, 2016

Currently the cwnd is reduced and increased in various different
places. The reduction happens in various places in the recovery
state processing (tcp_fastretrans_alert) while the increase
happens afterward.

A better sequence is to identify lost packets and update
the congestion control state (icsk_ca_state) first. Then base
on the new state, up/down the cwnd in one central place. It's
more clear to reason cwnd changes.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31ba0c10

tcp: retransmit after recovery processing and congestion control · e662ca40

Yuchung Cheng authored Feb 02, 2016

The retransmission and F-RTO transmission currently happen inside
recovery state processing (tcp_fastretrans_alert) but before
congestion control.  This refactoring moves the logic after both
s.t. we can determine how much to send (cwnd) before deciding what to
send.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e662ca40

net: drop write-only stack variable · 3575dbf2

David Herrmann authored Feb 02, 2016

Remove a write-only stack variable from unix_attach_fds(). This is a
left-over from the security fix in:

    commit 712f4aad
    Author: willy tarreau <w@1wt.eu>
    Date:   Sun Jan 10 07:54:56 2016 +0100

        unix: properly account for FDs passed over unix sockets
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3575dbf2