Commits · 43962dd7ee192299c6e0c6cd7f0a65997308f1f4 · Kirill Smelkov / linux

25 Aug, 2015 5 commits

RDS: always free recv frag as we free its ring entry · 43962dd7

santosh.shilimkar@oracle.com authored Aug 22, 2015

We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which
indicates that the recv refill path was finding allocated frags in ring
entries that were marked free. These were usually followed by OOM crashes.
They only seem to be occurring in the presence of completion errors and
connection resets.

This patch ensures that we free the frag as we mark the ring entry free.
This should stop the refill path from finding allocated frags in ring
entries that were marked free.
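A minimal sketch of the invariant this fix enforces, assuming a simplified ring in which each entry owns at most one fragment. The types `struct recv_entry` and `struct recv_ring` and the helpers `ring_entry_free()` and `ring_refill()` are hypothetical, not the RDS code. The idea is that freeing the fragment in the same step that marks the entry free means the refill path never finds a stale fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified receive ring: not the RDS data structures. */
struct recv_entry {
	void *frag;   /* fragment owned by this entry, or NULL */
	bool  in_use; /* false once the entry is marked free */
};

#define RING_SIZE 4

struct recv_ring {
	struct recv_entry entries[RING_SIZE];
};

/* Mark an entry free and release its fragment in the same step, so a
 * free entry can never still own a fragment (the WARN_ON condition). */
static void ring_entry_free(struct recv_entry *e)
{
	free(e->frag);
	e->frag = NULL;
	e->in_use = false;
}

/* Refill every free entry with a new fragment. Returns how many entries
 * were refilled, or -1 if a free entry still held a fragment. */
static int ring_refill(struct recv_ring *ring, size_t frag_size)
{
	int refilled = 0;

	for (size_t i = 0; i < RING_SIZE; i++) {
		struct recv_entry *e = &ring->entries[i];

		if (e->in_use)
			continue;
		if (e->frag != NULL)
			return -1; /* what WARN_ON(recv->r_frag) used to catch */
		e->frag = malloc(frag_size);
		if (e->frag == NULL)
			break;
		e->in_use = true;
		refilled++;
	}
	return refilled;
}
```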
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: restore return value in rds_cmsg_rdma_args() · 1d2e3f39

santosh.shilimkar@oracle.com authored Aug 22, 2015

In rds_cmsg_rdma_args(), 'ret' holds the result of rds_pin_pages(), which
returns the number of pinned pages on success. That same value is then
returned to the caller of rds_cmsg_rdma_args() on success, which is not
intended.

Commit f4a3fc03 ("RDS: Clean up error handling in rds_cmsg_rdma_args")
removed the 'ret = 0' line which broke RDS RDMA mode.

Fix it by restoring the 'ret = 0' assignment after rds_pin_pages()
succeeds, while keeping the clean-up in place.
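The bug pattern can be reproduced in isolation. In this sketch, `pin_pages()` stands in for `rds_pin_pages()` (it returns the number of pages pinned, or a negative errno), and `cmsg_rdma_args()` is a hypothetical, heavily simplified stand-in for `rds_cmsg_rdma_args()`. The only point is that a positive page count must not leak out as the function's return value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for rds_pin_pages(): returns the number of
 * pages pinned on success or a negative errno on failure. */
static int pin_pages(int nr_pages)
{
	if (nr_pages <= 0)
		return -EINVAL;
	return nr_pages;
}

/* Hypothetical, simplified stand-in for rds_cmsg_rdma_args(): the
 * caller expects 0 on success and a negative errno on failure. */
static int cmsg_rdma_args(int nr_pages)
{
	int ret;

	ret = pin_pages(nr_pages);
	if (ret < 0)
		goto out;
	/* ... use the pinned pages ... */

	ret = 0; /* the restored line: a page count is not a status code */
out:
	return ret;
}
```

Without the `ret = 0;` line, `cmsg_rdma_args(3)` would return 3, which a caller testing for non-zero would treat as a failure.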
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: refine pacing rate determination · 43e122b0

Eric Dumazet authored Aug 21, 2015

When TCP pacing was added back in linux-3.12, we chose
to apply a fixed ratio of 200 % against the current rate,
to allow probing for optimal throughput even during the
slow start phase, where cwnd can be doubled every other RTT.

At Google, we found it was better to apply a different ratio
while in the Congestion Avoidance phase.
This ratio was set to 120 %.

We used the normal tcp_in_slow_start() helper for a while,
then tuned the condition to select the conservative ratio
as soon as cwnd >= ssthresh/2:

- After cwnd reduction, it is safer to ramp up more slowly,
  as we approach optimal cwnd.
- Initial ramp up (ssthresh == INFINITY) still allows doubling
  cwnd every other RTT.
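A rough sketch of the ratio selection described above. It is a simplified model, not `tcp_update_pacing_rate()`. It assumes the pacing rate is (ratio/100) x cwnd x mss / srtt. The helper name `pacing_rate()` is hypothetical, and `UINT32_MAX` stands in for the "infinite" initial ssthresh.

```c
#include <assert.h>
#include <stdint.h>

/* Ratios quoted in the commit message, in percent. */
#define PACING_SS_RATIO 200 /* slow start */
#define PACING_CA_RATIO 120 /* congestion avoidance */

/* Hypothetical helper: pacing rate in bytes per second for a flow that
 * sends cwnd segments of mss bytes per smoothed RTT (srtt_us, in us). */
static uint64_t pacing_rate(uint32_t cwnd, uint32_t ssthresh,
			    uint32_t mss, uint32_t srtt_us)
{
	/* Use the aggressive ratio only while cwnd < ssthresh/2. */
	uint64_t ratio = (cwnd < ssthresh / 2) ? PACING_SS_RATIO
					       : PACING_CA_RATIO;
	uint64_t rate = (uint64_t)mss * cwnd * 1000000u * ratio / 100;

	return srtt_us ? rate / srtt_us : rate;
}
```

With ssthresh still at its initial value, the 200 % ratio applies, so cwnd can keep doubling. After a reduction, ssthresh is finite, and the 120 % ratio takes over once cwnd reaches ssthresh/2.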
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Use VRF master index if output device is enslaved · 4ec3b28c

David Ahern authored Aug 20, 2015

Direct route lookups to the VRF table. The code compiles out if NET_VRF
is not enabled. With this patch, IPsec tunnels can be brought up
successfully in VRFs, even with duplicate network configurations.
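The idea can be sketched as choosing the interface index that keys the route lookup. This is a simplified model under stated assumptions. `struct dev` and `lookup_oif()` are hypothetical and do not reflect the kernel's VRF helpers. The `#if` block shows how the VRF branch disappears when `CONFIG_NET_VRF` is not defined.

```c
#include <assert.h>
#include <stdbool.h>

/* For this sketch, pretend the kernel was built with VRF support;
 * remove this line to see the VRF branch compile out. */
#define CONFIG_NET_VRF 1

/* Hypothetical, simplified device: not struct net_device. */
struct dev {
	int ifindex;
	bool vrf_slave;     /* enslaved to a VRF master device */
	int master_ifindex; /* ifindex of that VRF master */
};

/* Use the VRF master's index when the output device is enslaved, so the
 * lookup goes to the VRF's table; otherwise use the device's own index. */
static int lookup_oif(const struct dev *d)
{
#if defined(CONFIG_NET_VRF)
	if (d->vrf_slave)
		return d->master_ifindex;
#endif
	return d->ifindex;
}
```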
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix slow start after idle vs TSO/GSO · 6f021c62

Eric Dumazet authored Aug 21, 2015

Slow start after idle (SSAI) might reduce cwnd, but we perform this
check after the first packet has been cooked and sent.

With TSO/GSO, this means that we might send a full TSO packet
even if cwnd should have been reduced to IW10.

Moving the SSAI check into skb_entail() makes sense, because
we slightly reduce the number of times this check is done,
especially for large send() calls and TCP Small Queues callbacks from
softirq context.

As Neal pointed out, we also need to perform the check
if/when receive window opens.
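A simplified model of the check and of the cwnd restart it triggers. It is loosely based on RFC 2861-style cwnd validation: halve cwnd once per RTO of idle time, but never go below the initial window. The struct fields, the millisecond units and the helper names `ssai_check()` and `cwnd_restart()` are assumptions for illustration. The check should run when data is queued, before the first segment is built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (times in ms). */
struct conn {
	uint32_t snd_cwnd;
	uint32_t init_cwnd;   /* e.g. IW10 */
	uint32_t packets_out; /* segments in flight */
	uint32_t last_send;   /* time of the last transmission */
	uint32_t rto;         /* retransmission timeout */
	bool ssai_enabled;    /* net.ipv4.tcp_slow_start_after_idle */
};

/* Halve cwnd once per RTO of idle time, never below the initial window. */
static void cwnd_restart(struct conn *c, int32_t idle)
{
	uint32_t restart = c->init_cwnd < c->snd_cwnd ? c->init_cwnd
						      : c->snd_cwnd;
	uint32_t cwnd = c->snd_cwnd;

	while ((idle -= (int32_t)c->rto) > 0 && cwnd > restart)
		cwnd >>= 1;
	c->snd_cwnd = cwnd > restart ? cwnd : restart;
}

/* Run when new data is queued, so that a TSO/GSO burst built afterwards
 * is already sized by the reduced cwnd. */
static void ssai_check(struct conn *c, uint32_t now)
{
	if (!c->ssai_enabled || c->packets_out)
		return;

	int32_t idle = (int32_t)(now - c->last_send);

	if (idle > (int32_t)c->rto)
		cwnd_restart(c, idle);
}
```

This mirrors the packetdrill scenario in the commit. With cwnd at 36, an initial window of 10, an assumed 200 ms RTO and 4 s of idle time, cwnd drops from 36 to 18 and then to 9. It is clamped back to 10 before the next write is split into segments.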

Tested:

The following packetdrill test demonstrates the problem:
// Test of slow start after idle

`sysctl -q net.ipv4.tcp_slow_start_after_idle=1`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.100 < . 1:1(0) ack 1 win 511
+0    accept(3, ..., ...) = 4
+0    setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0

+0    write(4, ..., 26000) = 26000
+0    > . 1:5001(5000) ack 1
+0    > . 5001:10001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

+.100 < . 1:1(0) ack 10001 win 511
+0    %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }%
+0    > . 10001:20001(10000) ack 1
+0    > P. 20001:26001(6000) ack 1

+.100 < . 1:1(0) ack 26001 win 511
+0    %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }%

+4 write(4, ..., 20000) = 20000
// If slow start after idle works properly, we should send 5 MSS here (cwnd/2)
+0    > . 26001:31001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
+0    > . 31001:36001(5000) ack 1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Aug, 2015 29 commits

Merge branch 'fjes' · 56fff0a0

David S. Miller authored Aug 24, 2015

Taku Izumi says:

====================
FUJITSU Extended Socket network device driver

This patchset adds the FUJITSU Extended Socket network device driver.
Extended Socket network device is a shared memory based high-speed
network interface between Extended Partitions of PRIMEQUEST 2000 E2
series.

You can get some information about Extended Partition and Extended
Socket by referring to the following manual.

http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf
    3.2.1 Extended Partitioning
    3.2.2 Extended Socket

v2.2 -> v3:
   - Fix up according to David's comment (No functional change)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: ethtool support · 786eec27

Taku Izumi authored Aug 21, 2015

This patch adds implementation for ethtool support.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: handle receive cancellation request interrupt · cb79eaae

Taku Izumi authored Aug 21, 2015

This patch adds handling of the IRQ raised for
another receiver's receive cancellation request.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: epstop_task · b5a9152d

Taku Izumi authored Aug 21, 2015

This patch adds epstop_task.
This task is used to process other receivers'
cancellation requests.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: update_zone_task · 785f28e0

Taku Izumi authored Aug 21, 2015

This patch adds update_zone_task.
Zoning information can be changed by the user.
This task is used to monitor whether the zoning
information has changed.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: unshare_watch_task · 8fc4cadb

Taku Izumi authored Aug 21, 2015

This patch adds unshare_watch_task.
A shared buffer's status can change to unshared.
This task is used to monitor the shared buffer's status.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: force_close_task · ff5b4210

Taku Izumi authored Aug 21, 2015

This patch adds force_close_task.
This task is used to forcibly close the network device.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: interrupt_watch_task · 8edb62a8

Taku Izumi authored Aug 21, 2015

This patch adds interrupt_watch_task.
This task is used to prevent interrupts from being delayed.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: net_device_ops.ndo_vlan_rx_add/kill_vid · 3e3fedda

Taku Izumi authored Aug 21, 2015

This patch adds the net_device_ops.ndo_vlan_rx_add_vid and
net_device_ops.ndo_vlan_rx_kill_vid callbacks.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: net_device_ops.ndo_tx_timeout · 4393e767

Taku Izumi authored Aug 21, 2015

This patch adds net_device_ops.ndo_tx_timeout callback.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: net_device_ops.ndo_change_mtu · b9e23a67

Taku Izumi authored Aug 21, 2015

This patch adds net_device_ops.ndo_change_mtu.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: net_device_ops.ndo_get_stats64 · 879bc9a3

Taku Izumi authored Aug 21, 2015

This patch adds net_device_ops.ndo_get_stats64 callback.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: NAPI polling function · 26585930

Taku Izumi authored Aug 21, 2015

This patch adds NAPI polling function and receive related work.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: tx_stall_task · ac63b947

Taku Izumi authored Aug 21, 2015

This patch adds tx_stall_task.
When the receiver's buffer is full, the sender stops
its tx queue. This task is used to monitor the
receiver's status and, when the receiver's buffer
is available again, to resume the tx queue.
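The stop/resume cycle can be modeled with a small ring. This is a hypothetical sketch, not the fjes code. `struct peer_buf`, `struct sender`, `try_xmit()` and `tx_stall_check()` are invented names. The ring keeps one slot empty to tell "full" from "empty".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receive buffer of one peer, seen from the sender. */
struct peer_buf {
	unsigned int head, tail, size; /* ring indices and capacity */
};

/* Hypothetical sender-side state. */
struct sender {
	bool txq_stopped;
};

static bool peer_buf_full(const struct peer_buf *b)
{
	return (b->tail + 1) % b->size == b->head;
}

/* Transmit path: stop the tx queue when the receiver's buffer is full. */
static bool try_xmit(struct sender *s, struct peer_buf *b)
{
	if (peer_buf_full(b)) {
		s->txq_stopped = true;
		return false;
	}
	b->tail = (b->tail + 1) % b->size;
	return true;
}

/* Body of a periodic "tx stall" task: resume the queue once the
 * receiver has consumed entries and space is available again. */
static void tx_stall_check(struct sender *s, const struct peer_buf *b)
{
	if (s->txq_stopped && !peer_buf_full(b))
		s->txq_stopped = false;
}
```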
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: raise_intr_rxdata_task · b772b9dc

Taku Izumi authored Aug 21, 2015

This patch adds raise_intr_rxdata_task.
The Extended Socket Network Device is shared memory
based, so one side's transmission is the other side's
reception. In order to notify receivers, the sender
has to raise an interrupt on the receivers.
raise_intr_rxdata_task does this work.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: net_device_ops.ndo_start_xmit · 9acf51cb

Taku Izumi authored Aug 21, 2015

This patch adds net_device_ops.ndo_start_xmit callback,
which is called when sending packets.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: net_device_ops.ndo_open and .ndo_stop · e5d486dc

Taku Izumi authored Aug 21, 2015

This patch adds the net_device_ops.ndo_open and .ndo_stop
callbacks. These functions are called when the network device
is activated and deactivated.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: buffer address regist/unregistration routine · 7950e6c5

Taku Izumi authored Aug 21, 2015

This patch adds the buffer address registration/unregistration routine.

This function is mainly invoked on network device
activation (open) and deactivation (close)
in order to register/unregister the shared buffer address.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: ES information acquisition routine · 3bb025d4

Taku Izumi authored Aug 21, 2015

This patch adds ES information acquisition routine.
ES information can be retrieved by issuing an information
request command. ES information includes which
receivers are in the same zone.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: platform_driver's .probe and .remove routine · 2fcbca68

Taku Izumi authored Aug 21, 2015

This patch implements platform_driver's .probe and .remove
routine, and also adds board specific private data structure.

This driver registers net_device at platform_driver's .probe
routine and unregisters net_device at its .remove routine.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: Hardware cleanup routine · a18aaec2

Taku Izumi authored Aug 21, 2015

This patch adds hardware cleanup routine to be
invoked at driver's .remove routine.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: Hardware initialization routine · 8cdc3f6c

Taku Izumi authored Aug 21, 2015

This patch adds hardware initialization routine to be
invoked at driver's .probe routine.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fjes: Introduce FUJITSU Extended Socket Network Device driver · 658d439b

Taku Izumi authored Aug 21, 2015

This patch adds the basic code of FUJITSU Extended Socket
Network Device driver.

When "PNP0C02" is found in the ACPI DSDT, the driver evaluates "_STR"
to check whether this "PNP0C02" is for the Extended Socket device driver,
and retrieves the ACPI resource information. It then creates a
platform_device.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c59x: Add BQL support for 3c59x ethernet driver. · 4a89ba04

Loganaden Velvindron authored Aug 20, 2015

This BQL patch is based on work done by Tino Reichardt.

Tested on 0000:05:00.0: 3Com PCI 3c905C Tornado at ffffc90000e6e000 by running
Flent several times.
Signed-off-by: Loganaden Velvindron <logan@elandsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ila-precompute' · b17f2964

David S. Miller authored Aug 24, 2015

Tom Herbert says:

====================
ila: Precompute checksums

This patch set:
 - Adds an argument to LWT build_state that holds a pointer to the fib
   configuration being applied to the new route
 - Adds support in ILA to precompute checksum difference for
   performance optimization

v2:
 - Move return argument in build_state to end of arguments

v3:
 - Update the signature for ip6_tun_build_state()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: Precompute checksum difference for translations · 92b78aff

Tom Herbert authored Aug 24, 2015

In the ILA build state for LWT, compute the checksum difference to apply
to transport checksums that include the IPv6 pseudo header. The
difference is between the route destination (from fib6_config) and the
locator to write.
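Why precomputing helps can be shown with one's-complement arithmetic. A checksum difference between the old bytes and the new bytes is computed once, when the route is built. Each packet's checksum is then patched with a single add instead of a full recomputation (RFC 1624, eqn. 3). This is a generic sketch, not the ILA code. The helper names are hypothetical. The "locator" here simply overwrites the first 8 bytes of a 16-byte address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Internet checksum (RFC 1071) over an even-length buffer. */
static uint16_t inet_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);
	return (uint16_t)~csum_fold16(sum);
}

/* Computed once per route: the one's-complement difference between the
 * old bytes (route destination) and the new bytes (locator). */
static uint16_t csum_diff(const uint8_t *from, const uint8_t *to, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2) {
		sum += (uint16_t)~((from[i] << 8) | from[i + 1]);
		sum += (uint32_t)((to[i] << 8) | to[i + 1]);
	}
	return csum_fold16(sum);
}

/* Per packet: patch an existing checksum with the precomputed diff. */
static uint16_t csum_apply(uint16_t csum, uint16_t diff)
{
	return (uint16_t)~csum_fold16((uint32_t)(uint16_t)~csum + diff);
}
```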
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwt: Add cfg argument to build_state · 127eb7cd

Tom Herbert authored Aug 24, 2015

Add cfg and family arguments to lwt build state functions. cfg is a void
pointer and will either be a pointer to a fib_config or fib6_config
structure. The family parameter indicates which one (either AF_INET
or AF_INET6).

LWT encapsulation implementations may use the fib configuration to build
the LWT state.
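The untyped-pointer-plus-family pattern can be sketched as follows. The `*_sketch` structs and `build_state()` are hypothetical, trimmed-down stand-ins, not the kernel's fib_config/fib6_config or the real LWT callback signature. The family argument tells the callback which struct `cfg` really points to.

```c
#include <assert.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical, trimmed-down route configs: not the kernel structs. */
struct fib_config_sketch  { unsigned int dst;       int dst_len; };
struct fib6_config_sketch { unsigned char dst[16];  int dst_len; };

/* Hypothetical encapsulation state built from the route being added. */
struct lwt_state_sketch { int family; int dst_len; };

/* cfg is untyped; family says which config struct it points to.
 * Returns 0 on success, -1 for an unsupported family. */
static int build_state(const void *cfg, int family,
		       struct lwt_state_sketch *out)
{
	switch (family) {
	case AF_INET: {
		const struct fib_config_sketch *c4 = cfg;
		out->dst_len = c4->dst_len;
		break;
	}
	case AF_INET6: {
		const struct fib6_config_sketch *c6 = cfg;
		out->dst_len = c6->dst_len;
		break;
	}
	default:
		return -1;
	}
	out->family = family;
	return 0;
}
```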
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add interrupt support for aquantia phy · 54cf7be9

Shaohui Xie authored Aug 21, 2015

By implementing config_intr and ack_interrupt, the PHY now supports
link connect/disconnect interrupts.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'nfc-next-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next · d9893d13

David S. Miller authored Aug 23, 2015

Samuel Ortiz says:

====================
NFC 4.3 pull request

This is the NFC pull request for 4.3.
With this one we have:

- A new driver for Samsung's S3FWRN5 NFC chipset. In order to
  properly support this driver, a few NCI core routines needed
  to be exported. Future drivers like Intel's Fields Peak will
  benefit from this.

- SPI support as a physical transport for STM st21nfcb.

- An additional netlink API for sending replies back to userspace
  from vendor commands.

- 2 small fixes for TI's trf7970a

- A few st-nci fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Aug, 2015 6 commits

route: fix breakage after moving lwtunnel state · 751a587a

Jiri Benc authored Aug 21, 2015

__refcnt and related fields need to be in their own cacheline for
performance reasons. Commit 61adedf3 ("route: move lwtunnel state to
dst_entry") broke that on 32bit archs, triggering the BUILD_BUG_ON in
dst_hold().

This patch fixes the breakage by moving the lwtunnel state to the end of
dst_entry on 32bit archs. Unfortunately, this makes it share the cacheline
with __refcnt and may affect performance, thus further patches may be
needed.
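The kind of compile-time layout check involved can be sketched with C11 `_Static_assert` and `offsetof`. `struct dst_sketch` is hypothetical and far smaller than `struct dst_entry`. Hand-sized padding puts `refcnt` on a 64-byte boundary on both 32-bit and 64-bit builds. Adding a pointer above the padding would break the assertion, while appending it after the hot fields keeps the assertion valid. The trade-off is that it may then share `refcnt`'s cache line, which is the performance caveat the commit mentions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64

/* Hypothetical, simplified layout: not struct dst_entry. */
struct dst_sketch {
	void *dev;
	void *ops;
	void *input;
	/* Hand-sized padding so refcnt starts a cache line; a new field
	 * added above this point would shift refcnt off the boundary. */
	char pad[CACHELINE - 3 * sizeof(void *)];
	int refcnt;     /* hot, frequently written */
	int use;
	void *lwtstate; /* appended after the hot fields */
};

/* Compile-time check, similar in spirit to the BUILD_BUG_ON in dst_hold(). */
_Static_assert(offsetof(struct dst_sketch, refcnt) % CACHELINE == 0,
	       "refcnt must start a cache line");
```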
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 61adedf3 ("route: move lwtunnel state to dst_entry")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-4.3-20150820' of... · 31fbde99

David S. Miller authored Aug 23, 2015

Merge tag 'linux-can-next-for-4.3-20150820' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull request of two patches for net-next.

The first patch is by Nik Nyby and fixes a typo in a function name. The
second patch by Lucas Stach demotes register output to debug level.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-failover-fixes' · c5f98b56

David S. Miller authored Aug 23, 2015

Jon Maloy says:

====================
tipc: fix link failover/synch problems

We fix three problems with the new link failover/synch implementation,
which was introduced earlier in this release cycle. They are all related
to situations where there is a very short interval between the disabling
and enabling of interfaces.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix stale link problem during synchronization · 2be80c2d

Jon Paul Maloy authored Aug 20, 2015

Recent changes to the link synchronization mean that we can now just
drop packets arriving on the synchronizing link before the synch point
is reached. This has led to significant simplifications to the
implementation, but also turns out to have a flip side that we need
to consider.

Under unlucky circumstances, the two endpoints may end up
repeatedly dropping each other's packets, while immediately
asking for retransmission of the same packets, just to drop
them once more. This pattern will eventually be broken when
the synch point is reached on the other link, but before that,
the endpoints may have arrived at the retransmission limit
(stale counter) that indicates that the link should be broken.
We see this happen on rare occasions.

The fix for this is to not ask for retransmissions when a link is in
state LINK_SYNCHING. The fact that the link has reached this state
means that it has already received the first SYNCH packet, and that it
knows the synch point. Hence, it doesn't need any more packets until the
other link has reached the synch point, whereafter it can go ahead and
ask for the missing packets.

However, because of the reduced traffic on the synching link that
follows this change, it may now take longer to discover that the
synch point has been reached. We compensate for this by letting all
packets, on any of the links, trigger a check for synchronization
termination. This is possible because the packets themselves don't
contain any information that is needed for discovering this condition.
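Both halves of the fix can be modeled in a few lines. This is a hypothetical sketch, not TIPC's link code. The state names are borrowed from the text, the struct and helpers are invented, and sequence-number wraparound is ignored. While synching, a gap does not trigger a retransmission request, so the stale counter does not grow. Packets seen on any link can end synchronization.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced link states (names borrowed from the text). */
enum link_state { LINK_ESTABLISHED, LINK_SYNCHING };

struct link_sketch {
	enum link_state state;
	unsigned int synch_point; /* known once the first SYNCH arrived */
	unsigned int stale_cnt;   /* retransmission requests sent */
};

/* A gap was detected: decide whether to ask for a retransmission. */
static bool should_request_retransmit(struct link_sketch *l)
{
	/* While synching, the missing packets are requested only after the
	 * synch point is reached, so the stale counter is left alone. */
	if (l->state == LINK_SYNCHING)
		return false;
	l->stale_cnt++;
	return true;
}

/* Called for packets arriving on any link: end synchronization once the
 * other link has delivered up to the synch point (no wraparound here). */
static void check_synch_end(struct link_sketch *synching,
			    unsigned int other_delivered)
{
	if (synching->state == LINK_SYNCHING &&
	    other_delivered >= synching->synch_point)
		synching->state = LINK_ESTABLISHED;
}
```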
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: interrupt link synchronization when a link goes down · 5ae2f8e6

Jon Paul Maloy authored Aug 20, 2015

When we introduced the new link failover/synch mechanism
in commit 6e498158
("tipc: move link synch and failover to link aggregation level"),
we missed the case when the non-tunnel link goes down during the link
synchronization period. In this case the tunnel link will remain in
state LINK_SYNCHING, which leads to unpredictable behavior when
the failover procedure is initiated.

In this commit, we ensure that the node and the remaining link go
back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
when one of the parallel links goes down. We also ensure that we don't
re-enter synch mode if subsequent SYNCH packets arrive on the remaining
link.
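A hypothetical state-machine sketch of the two rules above. It is not the TIPC implementation. The enum values reuse names from the text (plus an invented `LINK_RESET` and `NODE_SYNCHING`), and the node is reduced to two parallel links. When one link goes down, the survivor leaves LINK_SYNCHING. A later SYNCH packet with no parallel link up is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced states; LINK_RESET and NODE_SYNCHING are
 * assumptions for this sketch. */
enum link_state { LINK_ESTABLISHED, LINK_SYNCHING, LINK_RESET };
enum node_state { SELF_UP_PEER_UP, NODE_SYNCHING };

struct node_sketch {
	enum node_state state;
	enum link_state links[2]; /* two parallel links */
};

/* Link `down` went down: return the node and the surviving link to
 * regular communication state instead of leaving it in LINK_SYNCHING. */
static void link_down(struct node_sketch *n, int down)
{
	int other = !down;

	n->links[down] = LINK_RESET;
	if (n->links[other] == LINK_SYNCHING)
		n->links[other] = LINK_ESTABLISHED;
	n->state = SELF_UP_PEER_UP;
}

/* A SYNCH packet arrived on link `idx`: only enter synch mode while a
 * parallel link is actually established. */
static void rcv_synch(struct node_sketch *n, int idx)
{
	int other = !idx;

	if (n->links[other] != LINK_ESTABLISHED)
		return; /* no parallel link: do not re-enter synch mode */
	n->links[idx] = LINK_SYNCHING;
	n->state = NODE_SYNCHING;
}
```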
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate risk of premature link setup during failover · 17b20630

Jon Paul Maloy authored Aug 20, 2015

When a link goes down, and there is still a working link towards its
destination node, a failover is initiated, and the failed link is not
allowed to re-establish until that procedure is finished. To ensure
this, the concerned link endpoints are set to state LINK_FAILINGOVER,
and the node endpoints to NODE_FAILINGOVER during the failover period.

However, if the link reset is due to a disabled bearer, the corresponding
link endpoint is deleted, and only the node endpoint knows
about the ongoing failover. Now, if the disabled bearer is re-enabled
during the failover period, the discovery mechanism may create a new
link endpoint that is ready to be established, even though this is not
permitted. This situation may cause both the ongoing failover and any
subsequent link synchronization to fail.

In this commit, we ensure that a newly created link goes directly to
state LINK_FAILINGOVER if the corresponding node state is
NODE_FAILINGOVER. This eliminates the problem described above.

Furthermore, we tighten the criteria for which packets are allowed
to end a failover state in the function tipc_node_check_state().
By checking that the receiving link is up and running, instead of just
checking that it is not in failover mode, we eliminate the risk that
protocol packets from the re-created link may cause the failover to
be prematurely terminated.
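The two rules can be expressed as small predicates. This is a hypothetical sketch with invented names (`link_create()`, `may_end_failover()`, and a `LINK_RESET` state). It is not `tipc_node_check_state()`. A link created while the node is failing over starts in LINK_FAILINGOVER. Only a packet on a link that is actually up may end the failover, so being merely "not failing over" is no longer enough.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced states; LINK_RESET is an assumption. */
enum node_state { SELF_UP_PEER_UP, NODE_FAILINGOVER };
enum link_state { LINK_RESET, LINK_ESTABLISHED, LINK_FAILINGOVER };

struct link_sketch {
	enum link_state state;
};

/* Creating a link (e.g. after a bearer is re-enabled): if the node is
 * still failing over, start the link in LINK_FAILINGOVER so that it
 * cannot be established prematurely. */
static void link_create(struct link_sketch *l, enum node_state ns)
{
	l->state = (ns == NODE_FAILINGOVER) ? LINK_FAILINGOVER : LINK_RESET;
}

/* Tightened criterion: the receiving link must be up and running. */
static bool may_end_failover(const struct link_sketch *rcv_link)
{
	return rcv_link->state == LINK_ESTABLISHED;
}
```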
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>