Commits · 209cf2a751f9ff2a516102339e54fcac0176fa78 · Kirill Smelkov / linux

21 Jun, 2013 12 commits

RDMA/ucma: Allow user space to pass AF_IB into resolve · 209cf2a7

Sean Hefty authored May 29, 2013

Allow user space applications to call resolve_addr using AF_IB.  To
support sockaddr_ib, we need to define a new structure capable of
handling the larger address size.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

209cf2a7

RDMA/ucma: Allow user space to bind to AF_IB · eebe4c3a

Sean Hefty authored May 29, 2013

Support user space binding to addresses using AF_IB.  Since
sockaddr_ib is larger than sockaddr_in6, we need to define a larger
structure when binding using AF_IB.  This time we use sockaddr_storage
to cover future cases.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

eebe4c3a
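
The size argument in the eebe4c3a message can be checked from user space with a small sketch. `struct sockaddr_ib_sketch` below is a hand-written approximation of the kernel's `struct sockaddr_ib` (an assumption for illustration, not copied from a header), so the exact byte count is indicative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the kernel's struct sockaddr_ib; the field
 * names and widths are an approximation used only for this sketch. */
struct sockaddr_ib_sketch {
	uint16_t sib_family;	/* address family (AF_IB) */
	uint16_t sib_pkey;	/* partition key */
	uint32_t sib_flowinfo;
	uint8_t  sib_addr[16];	/* 128-bit GID */
	uint64_t sib_sid;	/* service ID */
	uint64_t sib_sid_mask;
	uint64_t sib_scope_id;
};

/* Returns 1 if an address of addr_len bytes fits in a buf_len-byte field. */
static int addr_fits(size_t addr_len, size_t buf_len)
{
	return addr_len <= buf_len;
}

/* The IB address overflows a sockaddr_in6-sized field but fits in
 * sockaddr_storage, which is why the new commands use the latter. */
static void size_demo(void)
{
	assert(!addr_fits(sizeof(struct sockaddr_ib_sketch),
			  sizeof(struct sockaddr_in6)));
	assert(addr_fits(sizeof(struct sockaddr_ib_sketch),
			 sizeof(struct sockaddr_storage)));
}
```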

RDMA/ucma: Name changes to indicate only IP addresses supported · 05ad9457

Sean Hefty authored May 29, 2013

Several commands into the RDMA CM from user space are restricted to
supporting addresses which fit into a sockaddr_in6 structure: bind
address, resolve address, and join multicast.

With the addition of AF_IB, we need to support addresses which are
larger than sockaddr_in6.  This will be done by adding new commands
that exchange address information using sockaddr_storage.  However, to
support existing applications, we maintain the current commands and
structures, but rename them to indicate that they only support IPv4
and v6 addresses.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

05ad9457

RDMA/ucma: Add ability to query GID addresses · edaa7a55

Sean Hefty authored May 29, 2013

Part of address resolution is mapping IP addresses to IB GIDs.  With
the changes to support querying larger addresses and more path records,
also provide a way to query IB GIDs after resolution completes.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

edaa7a55

RDMA/cma: Export cma_get_service_id() · cf53936f

Sean Hefty authored May 29, 2013

Allow the rdma_ucm to query the IB service ID formed or allocated by
the rdma_cm by exporting the cma_get_service_id() functionality.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

cf53936f

RDMA/ucma: Support querying when IB paths are not reversible · ac53b264

Sean Hefty authored May 29, 2013

The current query_route call can return up to two path records.  The
assumption is that one is the primary path, with optional support
for an alternate path.  In both cases, the paths are assumed to be
reversible and are used to send CM MADs.

With the ability to manually set IB path data, the rdma cm can
eventually be capable of using up to 6 paths per connection:

	forward primary, reverse primary,
	forward alternate, reverse alternate,
	reversible primary path for CM MADs,
	reversible alternate path for CM MADs.

(It is unclear at this time if IB routing will complicate this.)  In
order to handle more flexible routing topologies, add a new command to
report any number of paths.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

ac53b264

IB/sa: Export function to pack a path record into wire format · 2e08b587

Sean Hefty authored May 29, 2013

Allow converting from struct ib_sa_path_rec to the IB defined SA path
record wire format.  This will be used to report path data from the
rdma cm into user space.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

2e08b587

RDMA/ucma: Support querying for AF_IB addresses · ee7aed45

Sean Hefty authored May 29, 2013

The sockaddr structure for AF_IB is larger than sockaddr_in6.  The
rdma cm user space ABI uses the latter to exchange address information
between user space and the kernel.

To support querying for larger addresses, define a new query command
that exchanges data using sockaddr_storage, rather than sockaddr_in6.
Unlike the existing query_route command, the new command only returns
address information.  Route (i.e. path record) data is separated.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

ee7aed45

RDMA/cma: Only listen on IB devices when using AF_IB · 94d0c939

Sean Hefty authored May 29, 2013

If an rdma_cm_id is bound to AF_IB with a wildcard address, only
listen on IB devices.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

94d0c939

RDMA/cma: Set qkey for AF_IB · 5c438135

Sean Hefty authored May 29, 2013

Allow the user to specify the qkey when using AF_IB.  The qkey is
added to struct rdma_ucm_conn_param in place of a reserved field, but
for backwards compatibility, is only accessed if the associated
rdma_cm_id is using AF_IB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

5c438135

RDMA/cma: Expose private data when using AF_IB · e8160e15

Sean Hefty authored May 29, 2013

If the source or destination address is AF_IB, then do not reserve a
portion of the private data in the IB CM REQ or SIDR REQ messages for
the cma header. Instead, all private data should be exported to the
user. When AF_IB is used, the rdma cm does not have sufficient
information to fill in the cma header. Additionally, this will be
necessary to support any IB connection through the rdma cm interface.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

e8160e15

RDMA/cma: Merge cma_get/save_net_info · fbaa1a6d

Sean Hefty authored May 29, 2013

With the removal of SDP related code, we can merge cma_get_net_info()
with cma_save_net_info(), since we're only ever dealing with a single
header format.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

fbaa1a6d

20 Jun, 2013 28 commits

RDMA/cma: Remove unused SDP related code · 01602f11

Sean Hefty authored May 29, 2013

The SDP protocol was never merged upstream.  Remove unused SDP related
code from the RDMA CM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

01602f11

RDMA/cma: Add support for AF_IB to cma_get_service_id() · 496ce3ce

Sean Hefty authored May 29, 2013

cma_get_service_id() forms the service ID based on the port space and
port number of the rdma_cm_id.  Extend the call to support AF_IB,
which contains the service ID directly.  This will be needed to
support any arbitrary SID.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

496ce3ce

RDMA/cma: Add support for AF_IB to rdma_resolve_route() · f68194ca

Sean Hefty authored May 29, 2013

Allow rdma_resolve_route() to handle the case where the user specified
the source and destination addresses using AF_IB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

f68194ca

RDMA/cma: Add support for AF_IB to rdma_resolve_addr() · f17df3b0

Sean Hefty authored May 29, 2013

Allow the user to specify the remote address using AF_IB format.  When
AF_IB is used, the remote address simply needs to be recorded, and no
resolution using ARP is done.  The local address may still need to be
matched with a local IB device.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

f17df3b0

RDMA/cma: Verify that source and dest sa_family are the same · 4ae7152e

Sean Hefty authored May 29, 2013

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

4ae7152e
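
The 4ae7152e entry has no body, so here is a minimal sketch of the kind of check its subject line describes. `check_same_family()` is a hypothetical helper, and accepting a NULL or AF_UNSPEC source as "not yet bound" is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Returns 0 when src and dst agree on the address family, -EINVAL otherwise.
 * A missing or AF_UNSPEC source is accepted (assumption of this sketch). */
static int check_same_family(const struct sockaddr *src,
			     const struct sockaddr *dst)
{
	assert(dst != NULL);
	if (!src || src->sa_family == AF_UNSPEC)
		return 0;
	return src->sa_family == dst->sa_family ? 0 : -EINVAL;
}
```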

RDMA/cma: Restrict AF_IB loopback to binding to IB devices only · b0569e40

Sean Hefty authored May 29, 2013

If a user specifies AF_IB as the source address for a loopback
connection, limit the resolution to IB devices only.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

b0569e40

RDMA/cma: Add helper functions to return id address information · f4753834

Sean Hefty authored May 29, 2013

Provide inline helpers to extract source and destination address data
from the rdma_cm_id.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

f4753834

RDMA/cma: Do not modify sa_family when setting loopback address · 6a3e362d

Sean Hefty authored May 29, 2013

cma_resolve_loopback is called after an rdma_cm_id has been
bound to a specific sa_family and port.  Once the
source sa_family for the id has been set, do not modify it.
Only the actual IP address portion of the source address
needs to be set.

As part of this fix, we can simplify setting the source address
by moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback.  cma_bind_loopback is only invoked when
the source address is the loopback address.

Finally, add loopback support for AF_IB as part of the change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

6a3e362d

RDMA/cma: Allow user to specify AF_IB when binding · 680f920a

Sean Hefty authored May 29, 2013

Modify rdma_bind_addr to allow the user to specify AF_IB when binding
to a device.  AF_IB indicates that the user is not mapping an IP
address to the native IB addressing.  (The mapping may have already
been done, or may not be needed.)
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

680f920a

RDMA/cma: Update port reservation to support AF_IB · 58afdcb7

Sean Hefty authored May 29, 2013

AF_IB uses a 64-bit service ID (SID), which the user can control
through the use of a mask.  The rdma_cm will assign values to the
unmasked portions of the SID based on the selected port space and port
number.

Because the IB spec divides the SID range into several regions, a
SID/mask combination may fall into one of the existing port space
ranges as defined by the RDMA CM IP Annex.  Map the AF_IB SID to the
correct RDMA port space.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

58afdcb7
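
The SID/mask relationship described in the 58afdcb7 message can be pictured as a bitwise merge: bits covered by the user's mask come from the user's SID, and the remaining bits are filled in by the CM. `merge_sid()` is a hypothetical helper that shows only the masking idea, not the kernel's port reservation logic.

```c
#include <assert.h>
#include <stdint.h>

/* Bits set in user_mask are taken from user_sid; all other bits are
 * taken from the value chosen by the CM (port space and port number). */
static uint64_t merge_sid(uint64_t user_sid, uint64_t user_mask,
			  uint64_t cm_sid)
{
	return (user_sid & user_mask) | (cm_sid & ~user_mask);
}

/* Example: the user fixes the top 16 bits, the CM supplies the rest. */
static void merge_sid_demo(void)
{
	assert(merge_sid(0xAAAA000000000000ULL, 0xFFFF000000000000ULL,
			 0x0000000001061D2FULL) == 0xAAAA000001061D2FULL);
}
```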

IB/addr: Add AF_IB support to ip_addr_size · ef560861

Sean Hefty authored May 29, 2013

Add support for AF_IB to ip_addr_size, and rename the function to
account for the change.  Give the compiler more control over whether
the call should be inline or not by moving the definition into the .c
file, removing the static inline, and exporting it.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

ef560861

RDMA/cma: Include AF_IB in loopback and any address checks · 2e2d190c

Sean Hefty authored May 29, 2013

Enhance checks for loopback and any address to support AF_IB in
addition to AF_INET and AF_INET6.  This will allow future patches to
use AF_IB when binding and resolving addresses.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

2e2d190c

RDMA/cma: Allow enabling reuseaddr in any state · c8dea2f9

Sean Hefty authored May 29, 2013

The rdma_cm only allows setting reuseaddr if the corresponding
rdma_cm_id is in the idle state.  Allow setting this value in other
states.  This brings the behavior more in line with sockets.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c8dea2f9

RDMA/cma: Define native IB address · 8d36eb01

Sean Hefty authored May 29, 2013

Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB
addressing.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

8d36eb01

ndisc: Convert use of typedef ctl_table to struct ctl_table · fedaf4ff

Joe Perches authored Jun 13, 2013

This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fedaf4ff

ipv6: Convert use of typedef ctl_table to struct ctl_table · 9e8cda3b

Joe Perches authored Jun 13, 2013

This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e8cda3b

inet: frag , remove an empty ifdef. · af92e542

Rami Rosen authored Jun 15, 2013

This patch removes an empty ifdef from inet_frag_intern()
in net/ipv4/inet_fragment.c.

Commit b67bfe0d (hlist: drop the node parameter from iterators)
removed hlist from net/ipv4/inet_fragment.c, but did not remove the
enclosing ifdef, which is now empty.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af92e542

htb: refactor struct htb_sched fields for performance · c9364636

Eric Dumazet authored Jun 15, 2013

htb_sched structures are big, and a source of false sharing on SMP.

Every time a packet is queued or dequeued, many cache lines must be
touched because the structures are not laid out properly.

By carefully splitting htb_sched in two parts, and defining sub structures
to increase data locality, we can improve performance dramatically on
SMP.

New htb_prio structure can also be used in htb_class to increase data
locality.

I got a 26% performance increase on a 24-thread machine, with 200
concurrent netperf instances in TCP_RR mode, using a HTB hierarchy of 4 classes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9364636
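
The layout idea in the c9364636 message can be sketched with a toy structure: fields touched on every enqueue/dequeue are grouped together, and rarely touched fields start on their own cache line. `struct sched_sketch`, its field names, and the 64-byte line size are assumptions for illustration, not the real `struct htb_sched`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* Hypothetical scheduler state, not the real struct htb_sched. */
struct sched_sketch {
	/* Hot part: read and written for every packet. */
	uint64_t now;
	uint32_t row_mask;
	uint32_t qlen;

	/* Cold part: configuration and statistics, pushed to a new line
	 * so that updates here do not bounce the hot line between CPUs. */
	_Alignas(CACHE_LINE) uint32_t rate2quantum;
	uint32_t default_class;
	uint64_t overlimits;
};

/* Compile-time checks that the cold part starts on its own cache line. */
static_assert(offsetof(struct sched_sketch, rate2quantum) % CACHE_LINE == 0,
	      "cold fields must be cache-line aligned");
static_assert(offsetof(struct sched_sketch, rate2quantum) >= CACHE_LINE,
	      "cold fields must not share the hot line");
```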

tcp: introduce a per-route knob for quick ack · bcefe17c

Cong Wang authored Jun 15, 2013

In previous discussions, I tried to find some reasonable heuristics
for delayed ACK, but this does not seem possible, according to Eric:

	"ACKS might also be delayed because of bidirectional
	traffic, and is more controlled by the application
	response time. TCP stack can not easily estimate it."

	"ACK can be incredibly useful to recover from losses in
	a short time.

	The vast majority of TCP sessions are small lived, and we
	send one ACK per received segment anyway at beginning or
	retransmits to let the sender smoothly increase its cwnd,
	so an auto-tuning facility wont help them that much."

and according to David:

	"ACKs are the only information we have to detect loss.

	And, for the same reasons that TCP VEGAS is fundamentally
	broken, we cannot measure the pipe or some other
	receiver-side-visible piece of information to determine
	when it's "safe" to stretch ACK.

	And even if it's "safe", we should not do it so that losses are
	accurately detected and we don't spuriously retransmit.

	The only way to know when the bandwidth increases is to
	"test" it, by sending more and more packets until drops happen.
	That's why all successful congestion control algorithms must
	operate on explicited tested pieces of information.

	Similarly, it's not really possible to universally know if
	it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode, as
TCP_QUICKACK does.

This knob is similar to the TCP_QUICKACK option, but is meant for people
who can't modify the source code and still want to control TCP delayed
ACK behavior. As David suggested, this should belong to per-path scope,
since different paths may want different behaviors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcefe17c
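
For comparison with the per-route knob introduced in bcefe17c, this is roughly what an application does when it can change its own source: it sets the Linux `TCP_QUICKACK` socket option itself. `set_quickack()` is a hypothetical wrapper name. The per-route setting is configured through the routing table and is not shown here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (on != 0) or disable quick-ack mode on a TCP socket.
 * Returns 0 on success, or -1 with errno set on failure. */
static int set_quickack(int fd, int on)
{
	int val = on ? 1 : 0;

	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &val, sizeof(val));
}

/* Example: an invalid descriptor is rejected. */
static void quickack_demo(void)
{
	assert(set_quickack(-1, 1) == -1);
}
```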

sctp: Convert __list_for_each use to list_for_each · 2c0740e4

Dave Jones authored Jun 17, 2013

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c0740e4

bnx2: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM) · 85768271

Yijing Wang authored Jun 18, 2013

The PCI core already saves the PM capability register offset in pdev->pm_cap
during pci_pm_init() in the init path, so we can use pdev->pm_cap instead of
pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simpler code.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

85768271

amd8111e: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM) · f9c7da5e

Yijing Wang authored Jun 18, 2013

The PCI core already saves the PM capability register offset in pdev->pm_cap
during pci_pm_init() in the init path, so we can use pdev->pm_cap instead of
pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simpler code.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
Signed-off-by: David S. Miller <davem@davemloft.net>

f9c7da5e

Bnx2x: remove redundant D0 power state set · b8a39dd2

Yijing Wang authored Jun 18, 2013

pci_enable_device() sets the device power state to D0,
so there is no need to do it again in bnx2x_init_dev().
Also remove the redundant PM cap lookup, because the PCI core
has already saved the PCI device's PM cap value.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8a39dd2

net: Add missing dependencies on NETDEVICES · 2206209e

Ben Hutchings authored Jun 18, 2013

ETRAX_ETHERNET selects ETHERNET and MII, which depend on NETDEVICES.
I don't think anything should select NETDEVICES, so make it a
dependency.  It also doesn't need to select or depend on ETHERNET,
which has nothing to do with the Ethernet library functions.

BPCTL selects MII, which depends on NETDEVICES.  But everything in the
drivers/staging/silicom directory is related to net devices, so make
NET_VENDOR_SILICOM depend on NETDEVICES and remove the now-redundant
dependencies on NET.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

2206209e

at91_ether: Do not select NET_CORE · d6cf7a86

Ben Hutchings authored Jun 18, 2013

This has no dependency on any of the drivers under NET_CORE.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6cf7a86

net: Move MII out from under NET_CORE and hide it · a1606c7d

Ben Hutchings authored Jun 18, 2013

All drivers that select MII also need to select NET_CORE because MII
depends on it.  This is a bit ridiculous because NET_CORE is just a
menu option that doesn't enable any code by itself.

There is also no need for it to be a visible option, since its users
all select it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1606c7d

tcp:typo unset should be unsent · 9ef71e0c

Weiping Pan authored Jun 18, 2013

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ef71e0c

bonding: trivial: make alb use bond_slave_has_mac() · b88ec38d

Veaceslav Falico authored Jun 18, 2013

Also, clean up two identical ifs in bond_alb_handle_active_change().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b88ec38d