Commits · c637c1035534867b85b78b453c38c495b58e2c5a · nexedi / linux

06 Feb, 2015 6 commits

tipc: resolve race problem at unicast message reception · c637c103

Jon Paul Maloy authored Feb 05, 2015

TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.

We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.

This solves the sequentiality problem, and seems to cause no measurable
performance degradation.

A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c637c103

tipc: use existing sk_write_queue for outgoing packet chain · 94153e36

Jon Paul Maloy authored Feb 05, 2015

The list for outgoing traffic buffers from a socket is currently
allocated on the stack. This forces us to initialize the queue for
each sent message, something costing extra CPU cycles in the most
critical data path. Later in this series we will introduce a new
safe input buffer queue, something that would force us to initialize
even the spinlock of the outgoing queue. A closer analysis reveals
that the queue always is filled and emptied within the same lock_sock()
session. It is therefore safe to use a queue aggregated in the socket
itself for this purpose. Since there already exists a queue for this
in struct sock, sk_write_queue, we introduce use of that queue in
this commit.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

94153e36

tipc: split up function tipc_msg_eval() · e3a77561

Jon Paul Maloy authored Feb 05, 2015

The function tipc_msg_eval() is in reality doing two related, but
different tasks. First it tries to find a new destination for named
messages, in case there was no first lookup, or if the first lookup
failed. Second, it does what its name suggests, evaluating the validity
of the message and its destination, and returning an appropriate error
code depending on the result.

This is confusing, and in this commit we choose to break it up into two
functions. A new function, tipc_msg_lookup_dest(), first attempts to find
a new destination, if the message is of the right type. If this lookup
fails, or if the message should not be subject to a second lookup, the
already existing tipc_msg_reverse() is called. This function performs
prepares the message for rejection, if applicable.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3a77561

tipc: enqueue arrived buffers in socket in separate function · d570d864

Jon Paul Maloy authored Feb 05, 2015

The code for enqueuing arriving buffers in the function tipc_sk_rcv()
contains long code lines and currently goes to two indentation levels.
As a cosmetic preparaton for the next commits, we break it out into
a separate function.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d570d864

tipc: simplify message forwarding and rejection in socket layer · 1186adf7

Jon Paul Maloy authored Feb 05, 2015

Despite recent improvements, the handling of error codes and return
values at reception of messages in the socket layer is still confusing.

In this commit, we try to make it more comprehensible. First, we
separate between the return values coming from the functions called
by tipc_sk_rcv(), -those are TIPC specific error codes, and the
return values returned by tipc_sk_rcv() itself. Second, we don't
use the returned TIPC error code as indication for whether a buffer
should be forwarded/rejected or not; instead we use the buffer pointer
passed along with filter_msg(). This separation is necessary because
we sometimes want to forward messages even when there is no error
(i.e., protocol messages and successfully secondary looked up data
messages).
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1186adf7

tipc: reduce usage of context info in socket and link · c5898636

Jon Paul Maloy authored Feb 05, 2015

The most common usage of namespace information is when we fetch the
own node addess from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.

However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.

In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.

In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namspace unaware.

Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5898636

05 Feb, 2015 34 commits

hso: Use static attribute groups for sysfs entry · 4134069f

Takashi Iwai authored Feb 05, 2015

Pass the static attribute groups and the driver data via
tty_port_register_device_attr() instead of manual device_create_file()
and device_remove_file() calls.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

4134069f

Merge branch 'isdnloop_checkpatch' · b0ebfaea

David S. Miller authored Feb 05, 2015

Bas Peters says:

====================
Fix checkpatch errors in drivers/isdn/isdnloop

This patchset adresses various checkpatch errors in the abovementioned driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b0ebfaea

drivers: isdn: isdnloop: isdnloop.c: Remove parenthesis around return values,... · 3581ec58

Bas Peters authored Feb 04, 2015

drivers: isdn: isdnloop: isdnloop.c: Remove parenthesis around return values, as specified in CodingStyle.
Signed-off-by: Bas Peters <baspeters93@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3581ec58

drivers: isdn: isdnloop: isdnloop.c: Fix brace positions according to CodingStyle specifications. · cacccb06
Bas Peters authored Feb 04, 2015
```
Signed-off-by: Bas Peters <baspeters93@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
cacccb06

drivers: isdn: isdnloop: isdnloop.c: remove assignment of variables in if... · 50a58b6b

Bas Peters authored Feb 04, 2015

drivers: isdn: isdnloop: isdnloop.c: remove assignment of variables in if conditions, in accordance with the CodingStyle.
Signed-off-by: Bas Peters <baspeters93@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50a58b6b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6e03f896

David S. Miller authored Feb 05, 2015

Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>

6e03f896

MMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9d82f5eb

Linus Torvalds authored Feb 05, 2015

Pull networking fixes from David Miller:

 1) Stretch ACKs can kill performance with Reno and CUBIC congestion
    control, largely due to LRO and GRO.  Fix from Neal Cardwell.

 2) Fix userland breakage because we accidently emit zero length netlink
    messages from the bridging code.  From Roopa Prabhu.

 3) Carry handling in generic csum_tcpudp_nofold is broken, fix from
    Karl Beldan.

 4) Remove bogus dev_set_net() calls from CAIF driver, from Nicolas
    Dichtel.

 5) Make sure PPP deflation never returns a length greater then the
    output buffer, otherwise we overflow and trigger skb_over_panic().
    Fix from Florian Westphal.

 6) COSA driver needs VIRT_TO_BUS Kconfig dependencies, from Arnd
    Bergmann.

 7) Don't increase route cached MTU on datagram too big ICMPs.  From Li
    Wei.

 8) Fix error path leaks in nf_tables, from Pablo Neira Ayuso.

 9) Fix bitmask handling regression in netlink that broke things like
    acpi userland tools.  From Pablo Neira Ayuso.

10) Wrong header pointer passed to param_type2af() in SCTP code, from
    Saran Maruti Ramanara.

11) Stacked vlans not handled correctly by vlan_get_protocol(), from
    Toshiaki Makita.

12) Add missing DMA memory barrier to xgene driver, from Iyappan
    Subramanian.

13) Fix crash in rate estimators, from Eric Dumazet.

14) We've been adding various workarounds, one after another, for the
    change which added the per-net tcp_sock.  It was meant to reduce
    socket contention but added lots of problems.

    Reduce this instead to a proper per-cpu socket and that rids us of
    all the daemons.

    From Eric Dumazet.

15) Fix memory corruption and OOPS in mlx4 driver, from Jack
    Morgenstein.

16) When we disabled UFO in the virtio_net device, it introduces some
    serious performance regressions.  The orignal problem was IPV6
    fragment ID generation, so fix that properly instead.  From Vlad
    Yasevich.

17) sr9700 driver build breaks on xtensa because it defines macros with
    the same name as those used by the arch code.  Use more unique
    names.  From Chen Gang.

18) Fix endianness in new virio 1.0 mode of the vhost net driver, from
    Michael S Tsirkin.

19) Several sysctls were setting the maxlen attribute incorrectly, from
    Sasha Levin.

20) Don't accept an FQ scheduler quantum of zero, that leads to crashes.
    From Kenneth Klette Jonassen.

21) Fix dumping of non-existing actions in the packet scheduler
    classifier.  From Ignacy Gawędzki.

22) Return the write work_done value when doing TX work in the qlcnic
    driver.

23) ip6gre_err accesses the info field with the wrong endianness, from
    Sabrina Dubroca.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
  sit: fix some __be16/u16 mismatches
  ipv6: fix sparse errors in ip6_make_flowlabel()
  net: remove some sparse warnings
  flow_keys: n_proto type should be __be16
  ip6_gre: fix endianness errors in ip6gre_err
  qlcnic: Fix NAPI poll routine for Tx completion
  amd-xgbe: Set RSS enablement based on hardware features
  amd-xgbe: Adjust for zero-based traffic class count
  cls_api.c: Fix dumping of non-existing actions' stats.
  pkt_sched: fq: avoid hang when quantum 0
  net: rds: use correct size for max unacked packets and bytes
  vhost/net: fix up num_buffers endian-ness
  gianfar: correct the bad expression while writing bit-pattern
  net: usb: sr9700: Use 'SR_' prefix for the common register macros
  Revert "drivers/net: Disable UFO through virtio"
  Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
  ipv6: Select fragment id during UFO segmentation if not set.
  xen-netback: stop the guest rx thread after a fatal error
  net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs
  isdn: off by one in connect_res()
  ...

9d82f5eb

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 14365ea2

Linus Torvalds authored Feb 05, 2015

Pull SCSI fixes from James Bottomley:
 "This patch set is fixing two serious problems which have turned up
  late in the release cycle.

  The first fixes a problem with 4k sector disks where the transfer
  length (amount of data sent to the disk) was getting increased every
  time the disk was revalidated leading to potential for overflows.

  The other is a regression oops fix for some of our last merge window
  code"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  sd: Fix max transfer length for 4k disks
  scsi: fix device handler detach oops

14365ea2

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 42345d63

Linus Torvalds authored Feb 05, 2015

Pull drm fixes from Dave Airlie:
 "Radeon and amdkfd fixes.

  Radeon ones mostly for oops in some test/benchmark functions since
  fencing changes, and one regression fix for old GPUs,

  There is one cirrus regression fix, the 32bpp broke userspace, so this
  hides it behind a module option for the few users who care.

  I'm off for a few days, so this is probably the final pull I have, if
  I see fixes from Intel I'll forward the pull as I should have email"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/cirrus: Limit modes depending on bpp option
  drm/radeon: fix the crash in test functions
  drm/radeon: fix the crash in benchmark functions
  drm/radeon: properly set vm fragment size for TN/RL
  drm/radeon: don't init gpuvm if accel is disabled (v3)
  drm/radeon: fix PLLs on RS880 and older v2
  drm/amdkfd: Don't create BUG due to incorrect user parameter
  drm/amdkfd: max num of queues can't be 0
  drm/amdkfd: Fix bug in accounting of queues

42345d63

Merge tag 'spi-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · d445d46d

Linus Torvalds authored Feb 05, 2015

Pull spi fixes from Mark Brown:
 "A couple of driver specific fixes:

   - Disable DMA mode for i.MX6DL chips due to a hardware bug.

   - Don't use devm_kzalloc() outside of bind/unbind paths in the
     fsl-dspi driver, fixing memory leaks"

* tag 'spi-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: imx: use pio mode for i.mx6dl
  spi: spi-fsl-dspi: Remove usage of devm_kzalloc

d445d46d

Merge tag 'pm+acpi-3.19-fin' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e7394c77

Linus Torvalds authored Feb 05, 2015

Pull ACPI power management fix from Rafael  Wysocki:
 "This is a revert of an ACPI Low-power Subsystem (LPSS) driver change
  that was supposed to improve power management of the LPSS DMA
  controller, but introduced more serious problems.

  Since fixing them turns out to be non-trivial, it is better to revert
  the commit in question at this point and try to fix the original issue
  differently in the next cycle"

* tag 'pm+acpi-3.19-fin' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI / LPSS: introduce a 'proxy' device to power on LPSS for DMA"

e7394c77

Merge tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · f3c2352d

Linus Torvalds authored Feb 05, 2015

Pull PCI fixes from Bjorn Helgaas:
 "Enumeration
    - Scan all device numbers on NEC as well as Stratus (Charlotte Richardson)

  Resource management
    - Handle read-only BARs on AMD CS553x devices (Myron Stowe)

  Synopsys DesignWare
    - Reject MSI-X IRQs (Lucas Stach)"

* tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Handle read-only BARs on AMD CS553x devices
  PCI: Add NEC variants to Stratus ftServer PCIe DMI check
  PCI: designware: Reject MSI-X IRQs

f3c2352d

sit: fix some __be16/u16 mismatches · a409caec

Eric Dumazet authored Feb 04, 2015

Fixes following sparse warnings :

net/ipv6/sit.c:1509:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1509:32: expected restricted __be16 [usertype] sport
net/ipv6/sit.c:1509:32: got unsigned short
net/ipv6/sit.c:1514:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1514:32: expected restricted __be16 [usertype] dport
net/ipv6/sit.c:1514:32: got unsigned short
net/ipv6/sit.c:1711:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1711:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1711:38: got restricted __be16 [usertype] sport
net/ipv6/sit.c:1713:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1713:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1713:38: got restricted __be16 [usertype] dport
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a409caec

ipv6: fix sparse errors in ip6_make_flowlabel() · 67765146

Eric Dumazet authored Feb 04, 2015

include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash
include/net/ipv6.h:713:22: got unsigned int
include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
include/net/ipv6.h:719:22: warning: invalid assignment: ^=
include/net/ipv6.h:719:22: left side has type restricted __be32
include/net/ipv6.h:719:22: right side has type unsigned int
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67765146

net: remove some sparse warnings · 2ce1ee17

Eric Dumazet authored Feb 04, 2015

netdev_adjacent_add_links() and netdev_adjacent_del_links()
are static.

queue->qdisc has __rcu annotation, need to use RCU_INIT_POINTER()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ce1ee17

flow_keys: n_proto type should be __be16 · f4575d35

Eric Dumazet authored Feb 04, 2015

(struct flow_keys)->n_proto is in network order, use
proper type for this.

Fixes following sparse errors :

net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:139:39:    expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:139:39:    got restricted __be16 [assigned] [usertype] proto
net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:237:23:    expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:237:23:    got restricted __be16 [assigned] [usertype] proto
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d84 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>

f4575d35

vxlan: Only set has-GBP bit in header if any other bits would be set · db79a621

Thomas Graf authored Feb 04, 2015

This allows for a VXLAN-GBP socket to talk to a Linux VXLAN socket by
not setting any of the bits.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

db79a621

net: ep93xx_eth: Delete unnecessary checks before the function call "kfree" · 7af348be

Markus Elfring authored Feb 04, 2015

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7af348be

ip6_gre: fix endianness errors in ip6gre_err · d1e158e2

Sabrina Dubroca authored Feb 04, 2015

info is in network byte order, change it back to host byte order
before use. In particular, the current code sets the MTU of the tunnel
to a wrong (too big) value.

Fixes: c12b395a ("gre: Support GRE over IPv6")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1e158e2

xen-netfront: Use static attribute groups for sysfs entries · 27b917e5

Takashi Iwai authored Feb 04, 2015

Instead of manual calls of device_create_file() and
device_remove_files(), assign the static attribute groups to netdev
groups array.  This simplifies the code and avoids the possible
races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27b917e5

tun: Use static attribute groups for sysfs entries · c4d33e24

Takashi Iwai authored Feb 04, 2015

Instead of manual calls of device_create_file() and
device_remove_files(), assign the static attribute groups to netdev
groups array.  This simplifies the code and avoids the possible
races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4d33e24

qlogic: Deletion of unnecessary checks before two function calls · 7061b2bd

Markus Elfring authored Feb 04, 2015

The functions kfree() and vfree() perform also input parameter validation.
Thus the test around their calls is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

7061b2bd

Merge tag 'linux-can-next-for-3.20-20150204' of... · adfde051

David S. Miller authored Feb 05, 2015

Merge tag 'linux-can-next-for-3.20-20150204' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-02-04

this is a pull request of 2 patches for net-next/master.

Nicholas Mc Guire contributes a patch for the janz-ican3 driver to fix
a mismatch in an assignment. Ahmed S. Darwish contributes a patch for
the kvaser_usb driver, to make the driver more robust during the
bus-off handling.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

adfde051

net: ethernet: ti/cpsw-common.c: fix sparse warning · f86b4ae6

Lad, Prabhakar authored Feb 04, 2015

this patch fixes following sparse warning:

cpsw-common.c:23:5: warning: symbol 'cpsw_am33xx_cm_get_macid' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f86b4ae6

netxen: Delete an unnecessary check before the function call "kfree" · de701762

Markus Elfring authored Feb 04, 2015

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

de701762

net: fec: Delete unnecessary checks before the function call "kfree" · 1b4b32c6

Markus Elfring authored Feb 04, 2015

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b4b32c6

myri10ge: Delete an unnecessary check before the function call "kfree" · 2d4ad4f6

Markus Elfring authored Feb 04, 2015

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d4ad4f6

qlcnic: Fix NAPI poll routine for Tx completion · f31ec95f

Shahed Shaikh authored Feb 04, 2015

After d75b1ade ("net: less interrupt masking in NAPI")
driver's NAPI poll routine is expected to return
exact budget value if it wants to be re-called.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Fixes: d75b1ade ("net: less interrupt masking in NAPI")
Signed-off-by: David S. Miller <davem@davemloft.net>

f31ec95f

cxgb4: Delete an unnecessary check before the function call "release_firmware" · 0b5b6bee

Markus Elfring authored Feb 04, 2015

The release_firmware() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b5b6bee

cxgb4: Add low latency socket busy_poll support · 3a336cb1

Hariprasad Shenai authored Feb 04, 2015

cxgb_busy_poll, corresponding to ndo_busy_poll, gets called by the socket
waiting for data.

With busy_poll enabled, improvement is seen in latency numbers as observed by
collecting netperf TCP_RR numbers.
Below are latency number, with and without busy-poll, in a switched environment
for a particular msg size:
netperf command: netperf -4 -H <ip> -l 30 -t TCP_RR -- -r1,1
Latency without busy-poll: ~16.25 us
Latency with busy-poll   : ~08.79 us

Based on original work by Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a336cb1

Revert "bridge: Let bridge not age 'externally' learnt FDB entries, they are... · 3fcf9011

David S. Miller authored Feb 04, 2015

Revert "bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging"

This reverts commit 9a05dde5.

Requested by Scott Feldman.
Signed-off-by: David S. Miller <davem@davemloft.net>

3fcf9011

pkt_sched: fq: better control of DDOS traffic · 06eb395f

Eric Dumazet authored Feb 04, 2015

FQ has a fast path for skb attached to a socket, as it does not
have to compute a flow hash. But for other packets, FQ being non
stochastic means that hosts exposed to random Internet traffic
can allocate million of flows structure (104 bytes each) pretty
easily. Not only host can OOM, but lookup in RB trees can take
too much cpu and memory resources.

This patch adds a new attribute, orphan_mask, that is adding
possibility of having a stochastic hash for orphaned skb.

Its default value is 1024 slots, to mimic SFQ behavior.

Note: This does not apply to locally generated TCP traffic,
and no locally generated traffic will share a flow structure
with another perfect or stochastic flow.

This patch also handles the specific case of SYNACK messages:

They are attached to the listener socket, and therefore all map
to a single hash bucket. If listener have set SO_MAX_PACING_RATE,
hoping to have new accepted socket inherit this rate, SYNACK
might be paced and even dropped.

This is very similar to an internal patch Google have used more
than one year.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06eb395f

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f2683b74
David S. Miller authored Feb 04, 2015
```
More iov_iter work from Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
```
f2683b74

tcp: do not pace pure ack packets · 98781965

Eric Dumazet authored Feb 03, 2015

When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.

All TCP had to do was to compute sk->pacing_rate using simple formula:

sk->pacing_rate = 2 * cwnd * mss / rtt

It works well for senders (bulk flows), but not very well for receivers
or even RPC :

cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.

Really, only the sender should pace, according to its own logic.

Instead of adding a new bit in skb, or call yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use new helper and not pace pure ack.

Note this also helps TCP small queue, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload)

This helps to reduce tx completion overhead, ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.

This has no impact in the case packets are sent to loopback interface,
as we do not coalesce ack packets (were we would detect skb->truesize
lie)

In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.

This patch is a combination of two patches we used for about one year at
Google.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98781965