Commits · bd99abdd5b876406c34b872956b3237e18613566 · Kirill Smelkov / linux

25 Sep, 2013 4 commits

Bluetooth: Move mgmt response convenience functions to a better location · bd99abdd

Johan Hedberg authored 11 years ago

The settings_rsp and cmd_status_rsp functions can be useful for all mgmt
command handlers when asynchronous request callbacks are used. They will
e.g. be used by subsequent patches to change set_le to use an async
request as well as a new set_advertising command. Therefore, move them
higher up in the mgmt.c file to avoid unnecessary forward declarations
or mixing this trivial change with other patches.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

bd99abdd

Bluetooth: Fix busy return for mgmt_set_powered in some cases · 87b95ba6

Johan Hedberg authored 11 years ago


We should return a "busy" error always when there is another
mgmt_set_powered operation in progress. Previously when powering on
while the auto off timer was still set the code could have let two or
more pending power on commands to be queued. This patch fixes the issue
by moving the check for duplicate commands to an earlier point in the
set_powered handler.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

87b95ba6

Bluetooth: Clean up socket locking in l2cap_sock_recvmsg · 970871bc

Johan Hedberg authored 11 years ago


This patch cleans up the locking login in l2cap_sock_recvmsg by pairing
up each lock_sock call with a release_sock call. The function already
has a "done" label that handles releasing the socket and returning from
the function so the fix is rather simple.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

970871bc

Bluetooth: Add clarifying comment to bt_sock_wait_state() · 0fba96f9

Johan Hedberg authored 11 years ago


The bt_sock_wait_state requires the sk lock to be held (through
lock_sock) so document it clearly in the code.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

0fba96f9

22 Sep, 2013 1 commit

Bluetooth: Fix assignment of 0/1 to bool variables · 941247f9

Peter Senna Tschudin authored 11 years ago

Convert 0 to false and 1 to true when assigning values to bool
variables. Inspired by commit 3db1cd5c.

The simplified semantic patch that find this problem is as
follows (http://coccinelle.lip6.fr/

):

@@
bool b;
@@
(
-b = 0
+b = false
|
-b = 1
+b = true
)
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

941247f9

19 Sep, 2013 2 commits

Bluetooth: Add event mask page 2 setting support · d62e6d67

Johan Hedberg authored 11 years ago


For those controller that support the HCI_Set_Event_Mask_Page_2 command
we should include it in the init sequence. This patch implements sending
of the command and enables the events in it based on supported features
(currently only CSB is checked).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

d62e6d67

Bluetooth: Add synchronization train parameters reading support · 5d4e7e8d

Johan Hedberg authored 11 years ago


This patch adds support for reading the synchronization train parameters
for controllers that support the feature. Since the feature is
detectable through the local features page 2, which is retreived only in
stage 3 of the HCI init sequence, there is no other option than to add a
fourth stage to the init sequence.

For now the patch doesn't yet add storing of the parameters, but it is
nevertheless convenient to have around to see what kind of parameters
various controllers use by default (analyzable e.g. with the btmon user
space tool).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

5d4e7e8d

18 Sep, 2013 7 commits

Bluetooth: Fix waiting for clearing of BT_SK_SUSPEND flag · e793dcf0

Johan Hedberg authored 11 years ago

In the case of blocking sockets we should not proceed with sendmsg() if
the socket has the BT_SK_SUSPEND flag set. So far the code was only
ensuring that POLLOUT doesn't get set for non-blocking sockets using
poll() but there was no code in place to ensure that blocking sockets do
the right thing when writing to them.

This patch adds a new bt_sock_wait_ready helper function to sleep in the
sendmsg call if the BT_SK_SUSPEND flag is set, and wake up as soon as it
is unset. It also updates the L2CAP and RFCOMM sendmsg callbacks to take
advantage of this new helper function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

e793dcf0

Bluetooth: Fix responding to invalid L2CAP signaling commands · 69c4e4e8

Johan Hedberg authored 11 years ago


When we have an LE link we should not respond to any data on the BR/EDR
L2CAP signaling channel (0x0001) and vice-versa when we have a BR/EDR
link we should not respond to LE L2CAP (CID 0x0005) signaling commands.
This patch fixes this issue by checking for a valid link type and
ignores data if it is wrong.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

69c4e4e8

Bluetooth: Fix sending responses to identified L2CAP response packets · 9245e737

Johan Hedberg authored 11 years ago


When L2CAP packets return a non-zero error and the value is passed
onwards by l2cap_bredr_sig_cmd this will trigger a command reject packet
to be sent. However, the core specification (page 1416 in core 4.0) says
the following: "Command Reject packets should not be sent in response to
an identified Response packet.".

This patch ensures that a command reject packet is not sent for any
identified response packet by ignoring the error return value from the
response handler functions.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

9245e737

Bluetooth: Fix L2CAP command reject reason · 7c2005d6

Johan Hedberg authored 11 years ago


There are several possible reason codes that can be sent in the command
reject L2CAP packet. Before this patch the code has used a hard-coded
single response code ("command not understood"). This patch adds a
helper function to map the return value of an L2CAP handler function to
the correct command reject reason.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

7c2005d6

Bluetooth: Fix L2CAP Disconnect response for unknown CID · c4ea249f

Johan Hedberg authored 11 years ago


If we receive an L2CAP Disconnect Request for an unknown CID we should
not just silently drop it but reply with a proper Command Reject
response. This patch fixes this by ensuring that the disconnect handler
returns a proper error instead of 0 and will cause the function caller
to send the right response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

c4ea249f

Bluetooth: Fix L2CAP error return used for failed channel lookups · 21870b52

Johan Hedberg authored 11 years ago


The EFAULT error should only be used for memory address related errors
and ENOENT might be needed for other purposes than invalid CID errors.
This patch fixes the l2cap_config_req, l2cap_connect_create_rsp and
l2cap_create_channel_req handlers to use the unique EBADSLT error to
indicate failed lookups on a given CID.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

21870b52

Bluetooth: Fix double error response for l2cap_create_chan_req · dc280801

Johan Hedberg authored 11 years ago

When an L2CAP request handler returns non-zero the calling code will
send a command reject response. The l2cap_create_chan_req function will
in some cases send its own response but then still return a -EFAULT
error which would cause two responses to be sent. This patch fixes this
by making the function return 0 after sending its own response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

dc280801

16 Sep, 2013 11 commits

Bluetooth: Only schedule raw queue when user channel is active · 52de599e

Marcel Holtmann authored 11 years ago


When the user channel is set and an user application has full control
over the device, do not bother trying to schedule any queues except
the raw queue.

This is an optimization since with user channel, only the raw queue
is in use.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

52de599e

Bluetooth: Use GFP_KERNEL when cloning SKB in a workqueue · a675d7f1

Marcel Holtmann authored 11 years ago


There is no need to use GFP_ATOMIC with skb_clone() when the code is
executed in a workqueue.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

a675d7f1

Bluetooth: Disable upper layer connections when user channel is active · af750e94

Marcel Holtmann authored 11 years ago


When the device has the user channel flag set, it means it is driven by
an user application. In that case do not allow any connections from
L2CAP or SCO sockets.

This is the same situation as when the device has the raw flag set and
it will then return EHOSTUNREACH.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

af750e94

Bluetooth: Introduce new HCI socket channel for user operation · 23500189

Marcel Holtmann authored 11 years ago


This patch introcuces a new HCI socket channel that allows user
applications to take control over a specific HCI device. The application
gains exclusive access to this device and forces the kernel to stay away
and not manage it. In case of the management interface it will actually
hide the device.

Such operation is useful for security testing tools that need to operate
underneath the Bluetooth stack and need full control over a device. The
advantage here is that the kernel still provides the service of hardware
abstraction and HCI level access. The use of Bluetooth drivers for
hardware access also means that sniffing tools like btmon or hcidump
are still working and the whole set of transaction can be traced with
existing tools.

With the new channel it is possible to send HCI commands, ACL and SCO
data packets and receive HCI events, ACL and SCO packets from the
device. The format follows the well established H:4 protocol.

The new HCI user channel can only be established when a device has been
through its setup routine and is currently powered down. This is
enforced to not cause any problems with current operations. In addition
only one user channel per HCI device is allowed. It is exclusive access
for one user application. Access to this channel is limited to process
with CAP_NET_RAW capability.

Using this new facility does not require any external library or special
ioctl or socket filters. Just create the socket and bind it. After that
the file descriptor is ready to speak H:4 protocol.

        struct sockaddr_hci addr;
        int fd;

        fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);

        memset(&addr, 0, sizeof(addr));
        addr.hci_family = AF_BLUETOOTH;
        addr.hci_dev = 0;
        addr.hci_channel = HCI_CHANNEL_USER;

        bind(fd, (struct sockaddr *) &addr, sizeof(addr));

The example shows on how to create a user channel for hci0 device. Error
handling has been left out of the example. However with the limitations
mentioned above it is advised to handle errors. Binding of the user
cahnnel socket can fail for various reasons. Specifically if the device
is currently activated by BlueZ or if the access permissions are not
present.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

23500189

Bluetooth: Introduce user channel flag for HCI devices · 0736cfa8

Marcel Holtmann authored 11 years ago


This patch introduces a new user channel flag that allows to give full
control of a HCI device to a user application. The kernel will stay away
from the device and does not allow any further modifications of the
device states.

The existing raw flag is not used since it has a bit of unclear meaning
due to its legacy. Using a new flag makes the code clearer.

A device with the user channel flag set can still be enumerate using the
legacy API, but it does not longer enumerate using the new management
interface used by BlueZ 5 and beyond. This is intentional to not confuse
users of modern systems.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

0736cfa8

Bluetooth: Restrict ioctls to HCI raw channel sockets · c1c4f956

Marcel Holtmann authored 11 years ago

The various legacy ioctls used with HCI sockets are limited to raw
channel only. They are not used on the other channels and also have
no meaning there. So return an error if tried to use them.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

c1c4f956

Bluetooth: Fix error handling for HCI socket options · c2371e80

Marcel Holtmann authored 11 years ago


The HCI sockets for monitor and control do not support any HCI specific
socket options and if tried, an error will be returned. However the
error used is EINVAL and that is not really descriptive. To make it
clear that these sockets are not handling HCI socket options, return
EBADFD instead.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

c2371e80

Bluetooth: Report error for HCI reset ioctl when device is down · 808a049e

Marcel Holtmann authored 11 years ago


Even if this is legacy API, there is no reason to not report a proper
error when trying to reset a HCI device that is down.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

808a049e

Bluetooth: Fix handling of getsockname() for HCI sockets · 9d4b68b2

Marcel Holtmann authored 11 years ago

The hci_dev check is not protected and so move it into the socket lock. In
addition return the HCI channel identifier instead of always 0 channel.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

9d4b68b2

Bluetooth: Fix handling of getpeername() for HCI sockets · 06f43cbc

Marcel Holtmann authored 11 years ago


The HCI sockets do not have a peer associated with it and so make sure
that getpeername() returns EOPNOTSUPP since this operation is actually
not supported on HCI sockets.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

06f43cbc

Bluetooth: Refactor raw socket filter into more readable code · f81fe64f

Marcel Holtmann authored 11 years ago

The handling of the raw socket filter is rather obscure code and it gets
in the way of future extensions. Instead of inline filtering in the raw
socket packet routine, refactor it into its own function.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

f81fe64f

04 Sep, 2013 15 commits

net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions · b4af8def

Daniel Borkmann authored 11 years ago


We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4af8def

net: ipv6: mld: refactor query processing into v1/v2 functions · 2b7c121f

Daniel Borkmann authored 11 years ago


Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b7c121f

net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 · cc7f7ab7

Daniel Borkmann authored 11 years ago


Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc7f7ab7

net: ipv6: mld: implement RFC3810 MLDv2 mode only · 58c0ecfd

Daniel Borkmann authored 11 years ago


RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:

  A forged Version 1 Query message will put MLDv2 listeners on that
  link in MLDv1 Host Compatibility Mode. This scenario can be avoided
  by providing MLDv2 hosts with a configuration option to ignore
  Version 1 messages completely.

Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:

  echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version  or
  echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version

Note that <all> device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on <all> case.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

58c0ecfd

net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation · e3f5b170

Daniel Borkmann authored 11 years ago


Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3f5b170

net: ipv6: mld: clean up MLD_V1_SEEN macro · 6c567b78

Daniel Borkmann authored 11 years ago


Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c567b78

net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. · 89225d1c

Daniel Borkmann authored 11 years ago

i) RFC3810, 9.2. Query Interval [QI] says:

   The Query Interval variable denotes the interval between General
   Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

  The Maximum Response Delay used to calculate the Maximum Response
  Code inserted into the periodic General Queries. Default value:
  10000 (10 seconds) [...] The number of seconds represented by the
  [Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

  The Older Version Querier Present Timeout is the time-out for
  transitioning a host back to MLDv2 Host Compatibility Mode. When an
  MLDv1 query is received, MLDv2 hosts set their Older Version Querier
  Present Timer to [Older Version Querier Present Timeout].

  This value MUST be ([Robustness Variable] times (the [Query Interval]
  in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

  [RV] = 2, [QI] = 125sec, [QRI] = 10sec
  [OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

  switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

  This section is meant to provide advice to network administrators on
  how to tune these settings to their network. Ambitious router
  implementations might tune these settings dynamically based upon
  changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

  The overall level of periodic MLD traffic is inversely proportional
  to the Query Interval. A longer Query Interval results in a lower
  overall level of MLD traffic. The value of the Query Interval MUST
  be equal to or greater than the Maximum Response Delay used to
  calculate the Maximum Response Code inserted in General Query
  messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da9228

 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

89225d1c

tcp: better comments for RTO initiallization · 52f20e65

Yuchung Cheng authored 11 years ago

Commit 1b7fdd2a

("tcp: do not use cached RTT for RTT estimation")
removes important comments on how RTO is initialized and updated.
Hopefully this patch puts those information back.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52f20e65

net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 · c08751c8

Alexander Sverdlin authored 11 years ago

net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4

Initially the problem was observed with ipsec, but later it became clear that
SCTP data chunk fragmentation algorithm has problems with MTU values which are
not multiple of 4. Test program was used which just transmits 2000 bytes long
packets to other host. tcpdump was used to observe re-fragmentation in IP layer
after SCTP already fragmented data chunks.

With MTU 1500:
12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

With MTU 1499:
13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
10.151.38.153 > 10.151.24.91: ip-proto-132
13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

Here problem in data portion limit calculation leads to re-fragmentation in IP,
which is sub-optimal. The problem is max_data initial value, which doesn't take
into account the fact, that data chunk must be padded to 4-bytes boundary.
It's enough to correct max_data, because all later adjustments are correctly
aligned to 4-bytes boundary.

After the fix is applied, everything is fragmented correctly for uneven MTUs:
15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

The bug was there for years already, but
- is a performance issue, the packets are still transmitted
- doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c08751c8

netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet · 1205e1fa

Phil Oester authored 11 years ago

In commit b396966c

 (netfilter: xt_TCPMSS: Fix missing fragmentation handling),
I attempted to add safe fragment handling to xt_TCPMSS.  However, Andy Padavan
of Project N56U correctly points out that returning XT_CONTINUE in this
function does not work.  The callers (tcpmss_tg[46]) expect to receive a value
of 0 in order to return XT_CONTINUE.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1205e1fa

netfilter: SYNPROXY: let unrelated packets continue · 7cc9eb6e

Jesper Dangaard Brouer authored 11 years ago


Packets reaching SYNPROXY were default dropped, as they were most
likely invalid (given the recommended state matching).  This
patch, changes SYNPROXY target to let packets, not consumed,
continue being processed by the stack.

This will be more in line other target modules. As it will allow
more flexible configurations of handling, logging or matching on
packets in INVALID states.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7cc9eb6e

netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length() · f4de4c89

Patrick McHardy authored 11 years ago


With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init:

[   80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]()

The reason is that the conntrack template is set to confirmed before adding
the extension and it is invalid to add extensions to already confirmed
conntracks. Fix by adding the extensions before setting the conntrack to
confirmed.
Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f4de4c89

netfilter: more strict TCP flag matching in SYNPROXY · 775ada6d

Jesper Dangaard Brouer authored 11 years ago


Its seems Patrick missed to incoorporate some of my requested changes
during review v2 of SYNPROXY netfilter module.

Which were, to avoid SYN+ACK packets to enter the path, meant for the
ACK packet from the client (from the 3WHS).

Further there were a bug in ip6t_SYNPROXY.c, for matching SYN packets
that didn't exclude the ACK flag.

Go a step further with SYN packet/flag matching by excluding flags
ACK+FIN+RST, in both IPv4 and IPv6 modules.

The intented usage of SYNPROXY is as follows:
(gracefully describing usage in commit)

 iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
 iptables -A INPUT -i eth0 -p tcp --dport 80 -m state UNTRACKED,INVALID \
         -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn

 echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

This does filter SYN flags early, for packets in the UNTRACKED state,
but packets in the INVALID state with other TCP flags could still
reach the module, thus this stricter flag matching is still needed.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

775ada6d

tcp: Change return value of tcp_rcv_established() · c995ae22

Vijay Subramanian authored 11 years ago

tcp_rcv_established() returns only one value namely 0. We change the return
value to void (as suggested by David Miller).

After commit 0c24604b

 (tcp: implement RFC 5961 4.2), we no longer send RSTs in
response to SYNs. We can remove the check and processing on the return value of
tcp_rcv_established().

We also fix jtcp_rcv_established() in tcp_probe.c to match that of
tcp_rcv_established().
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c995ae22

net: tcp_probe: adapt tbuf size for recent changes · cc8c6c1b

Daniel Borkmann authored 11 years ago

With recent changes in tcp_probe module (e.g. f925d0a6

 ("net: tcp_probe:
add IPv6 support")) we also need to take into account that tbuf needs to
be updated as format string will be further expanded. tbuf sits on the stack
in tcpprobe_read() function that is invoked when user space reads procfs
file /proc/net/tcpprobe, hence not fast path as in jtcp_rcv_established().
Having a size similarly as in sctp_probe module of 256 bytes is fully
sufficient for that, we need theoretical maximum of 252 bytes otherwise we
could get truncated.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc8c6c1b