1. 25 Sep, 2013 · 4 commits
2. 22 Sep, 2013 · 1 commit
3. 19 Sep, 2013 · 2 commits
4. 18 Sep, 2013 · 7 commits
5. 16 Sep, 2013 · 11 commits
6. 04 Sep, 2013 · 15 commits
    • net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions · b4af8def
      Daniel Borkmann authored
      
      We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
      mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
      more readable.
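      The following is a minimal user-space sketch of the start/stop pairing. It is an illustration, not the actual patch. It assumes that arming an idle timer takes a device reference, and that the stop helper drops that reference only when it actually cancelled a pending timer. struct mock_idev, mock_del_timer() and the *_sketch helpers are hypothetical stand-ins for inet6_dev, del_timer() and the mld_gq_*_timer() functions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for timer_list and inet6_dev; only the
 * fields needed to show the pattern are modelled. */
struct mock_timer {
	bool pending;
};

struct mock_idev {
	int refcnt;                 /* stands in for the device refcount */
	int mc_gq_running;
	struct mock_timer mc_gq_timer;
};

/* Like del_timer(): returns true if the timer was still pending. */
static bool mock_del_timer(struct mock_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* Start helper: arming an idle timer takes a reference on the device. */
static void mld_gq_start_timer_sketch(struct mock_idev *idev)
{
	idev->mc_gq_running = 1;
	if (!idev->mc_gq_timer.pending) {
		idev->mc_gq_timer.pending = true;
		idev->refcnt++;
	}
}

/* Stop helper: clear the state, then drop the reference only if a
 * pending timer was actually cancelled. */
static void mld_gq_stop_timer_sketch(struct mock_idev *idev)
{
	idev->mc_gq_running = 0;
	if (mock_del_timer(&idev->mc_gq_timer))
		idev->refcnt--;
}
```

      Because the stop helper only drops a reference after a successful cancel, calling it twice leaves the reference count balanced.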
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: mld: refactor query processing into v1/v2 functions · 2b7c121f
      Daniel Borkmann authored
      
      Make igmp6_event_query() a bit easier to read by refactoring code
      parts into mld_process_v1() and mld_process_v2().
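      The sketch below shows the length check that decides which of the two functions runs. It follows RFC 3810, 8.1: an MLDv1 Query is exactly 24 octets, an MLDv2 Query is at least 28 octets, and any other length is ignored. The constant and function names are chosen for illustration only.

```c
#include <stddef.h>

/* Query lengths from RFC 3810, 8.1 (names are illustrative). */
#define MLD_V1_QUERY_LEN     24
#define MLD_V2_QUERY_LEN_MIN 28

enum query_kind { QUERY_INVALID, QUERY_V1, QUERY_V2 };

/* Hypothetical classifier: picks the v1 or v2 processing path. */
static enum query_kind classify_mld_query(size_t len)
{
	if (len == MLD_V1_QUERY_LEN)
		return QUERY_V1;       /* would go to mld_process_v1() */
	if (len >= MLD_V2_QUERY_LEN_MIN)
		return QUERY_V2;       /* would go to mld_process_v2() */
	return QUERY_INVALID;      /* e.g. 26 octets: silently ignored */
}
```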
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 · cc7f7ab7
      Daniel Borkmann authored
      
      As we already do for MLDv2 queries, clamp the max_delay of a forged
      MLDv1 query that carries an mld_maxdelay of 0 ms to the minimum timer
      expiry of 1 jiffy. This is ultimately done in igmp6_group_queried()
      anyway, so we can simplify a check there.
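      Here is a sketch of the clamping. It assumes HZ = 250 (SKETCH_HZ, hypothetical) and a round-up millisecond-to-jiffies conversion similar to msecs_to_jiffies() for such HZ values. It is not the kernel code itself.

```c
/* Assumed tick rate for this sketch; in the kernel HZ is a build
 * option. 1000 / SKETCH_HZ must divide evenly for this helper. */
#define SKETCH_HZ 250

/* Round milliseconds up to whole jiffies. */
static unsigned long ms_to_jiffies_sketch(unsigned long ms)
{
	const unsigned long ms_per_jiffy = 1000 / SKETCH_HZ;

	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

/* Clamp a v1 query's max_delay so the timer fires at least 1 jiffy out. */
static unsigned long mld_v1_max_delay_sketch(unsigned long mld_maxdelay_ms)
{
	unsigned long max_delay = ms_to_jiffies_sketch(mld_maxdelay_ms);

	return max_delay ? max_delay : 1;
}
```

      With these assumptions, a query carrying 0 ms yields 1 jiffy and a query carrying 10000 ms yields 2500 jiffies.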
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: mld: implement RFC3810 MLDv2 mode only · 58c0ecfd
      Daniel Borkmann authored
      
      RFC3810, section 10 (Security Considerations), subsection 10.1
      (Query Message) says:
      
        A forged Version 1 Query message will put MLDv2 listeners on that
        link in MLDv1 Host Compatibility Mode. This scenario can be avoided
        by providing MLDv2 hosts with a configuration option to ignore
        Version 1 messages completely.
      
      Hence, implement an MLDv2-only mode that ignores MLDv1 traffic:
      
        echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version  or
        echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version
      
      Note that the <all> device takes precedence, as was previously also
      the case with the MLD_V1_SEEN() macro, whose if condition would
      "short-circuit" on the <all> case.
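      The precedence rule can be sketched with a hypothetical struct standing in for the two sysctls. A non-zero <all> value wins, and a resulting value of 2 means MLDv1 queries are ignored. The names are illustrative, not the kernel's.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two force_mld_version sysctls. */
struct mld_cfg_sketch {
	int all_force_mld_version;  /* /proc/sys/net/ipv6/conf/all/... */
	int dev_force_mld_version;  /* /proc/sys/net/ipv6/conf/ethX/... */
};

/* A non-zero <all> setting takes precedence over the device setting. */
static int mld_force_version_sketch(const struct mld_cfg_sketch *cfg)
{
	if (cfg->all_force_mld_version != 0)
		return cfg->all_force_mld_version;
	return cfg->dev_force_mld_version;
}

/* MLDv2-only mode: MLDv1 queries are dropped instead of switching the
 * host into MLDv1 Host Compatibility Mode. */
static bool mld_ignore_v1_query_sketch(const struct mld_cfg_sketch *cfg)
{
	return mld_force_version_sketch(cfg) == 2;
}
```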
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation · e3f5b170
      Daniel Borkmann authored
      
      Get rid of MLDV2_MRC and use our new mantissa and exponent macros to
      calculate the Maximum Response Delay from the Maximum Response Code.
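      The decoding defined in RFC 3810, 5.1.3 can be sketched as follows. Codes below 32768 are the delay in milliseconds. Otherwise the code is laid out as 1 | exp (3 bits) | mant (12 bits). The macro and function names are illustrative, not the kernel's.

```c
#include <stdint.h>

/* Field extraction for the floating-point form of the code. */
#define MRC_EXP(code) (((code) >> 12) & 0x0007)
#define MRC_MAN(code) ((code) & 0x0fff)

/* Maximum Response Delay in milliseconds from the Maximum Response Code. */
static unsigned long mldv2_mrd_ms_sketch(uint16_t mrc)
{
	if (mrc < 32768)
		return mrc;  /* small values are used directly */
	return ((unsigned long)MRC_MAN(mrc) | 0x1000) << (MRC_EXP(mrc) + 3);
}
```

      A code of 10000 decodes to 10000 ms, 0x8000 decodes to 32768 ms, and the largest code, 0xFFFF, decodes to 8387584 ms.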
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: mld: clean up MLD_V1_SEEN macro · 6c567b78
      Daniel Borkmann authored
      
      Replace the macro with a function to make it more readable. GCC will
      decide whether or not to inline it (this is not a fast path anyway).
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. · 89225d1c
      Daniel Borkmann authored
      i) RFC3810, 9.2. Query Interval [QI] says:
      
         The Query Interval variable denotes the interval between General
         Queries sent by the Querier. Default value: 125 seconds. [...]
      
      ii) RFC3810, 9.3. Query Response Interval [QRI] says:
      
        The Maximum Response Delay used to calculate the Maximum Response
        Code inserted into the periodic General Queries. Default value:
        10000 (10 seconds) [...] The number of seconds represented by the
        [Query Response Interval] must be less than the [Query Interval].
      
      iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:
      
        The Older Version Querier Present Timeout is the time-out for
        transitioning a host back to MLDv2 Host Compatibility Mode. When an
        MLDv1 query is received, MLDv2 hosts set their Older Version Querier
        Present Timer to [Older Version Querier Present Timeout].
      
        This value MUST be ([Robustness Variable] times (the [Query Interval]
        in the last Query received)) plus ([Query Response Interval]).
      
      Hence, on *default* the timeout results in:
      
        [RV] = 2, [QI] = 125sec, [QRI] = 10sec
        [OVQPT] = [RV] * [QI] + [QRI] = 260sec
      
      That said, we currently calculate [OVQPT] (here given as the
      'switchback' variable) as ...
      
        switchback = (idev->mc_qrv + 1) * max_delay
      
      RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
      section "9.14. Configuring timers", it is said:
      
        This section is meant to provide advice to network administrators on
        how to tune these settings to their network. Ambitious router
        implementations might tune these settings dynamically based upon
        changing characteristics of the network. [...]
      
      iv) RFC3810, 9.14.2. Query Interval:
      
        The overall level of periodic MLD traffic is inversely proportional
        to the Query Interval. A longer Query Interval results in a lower
        overall level of MLD traffic. The value of the Query Interval MUST
        be equal to or greater than the Maximum Response Delay used to
        calculate the Maximum Response Code inserted in General Query
        messages.
      
      I assume this is why switchback is calculated as it is (3 * max_delay
      with the default [RV] of 2), although this setting seems to be meant
      only for routers that configure a non-default [QI]. Using it like this
      here is therefore clearly wrong.
      
      Concluding, the current behaviour of IPv6's multicast code does not
      conform to the RFC, as the switchback is calculated incorrectly. Its
      value is too small, so hosts switch back to MLDv2 far too early, i.e.
      after ~30secs instead of ~260secs by default.
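      The two formulas can be compared with the default values quoted in this commit ([RV] = 2, [QI] = 125sec, [QRI] = 10sec). The sketch works in seconds, whereas the kernel itself works in jiffies, and the function names are hypothetical.

```c
/* RFC 3810, 9.12: [OVQPT] = [RV] * [QI] + [QRI]. */
static unsigned long ovqpt_rfc3810(unsigned long rv, unsigned long qi,
				   unsigned long qri)
{
	return rv * qi + qri;
}

/* Old calculation: switchback = (mc_qrv + 1) * max_delay, with
 * mc_qrv = [RV] and max_delay = [QRI]. */
static unsigned long switchback_old(unsigned long mc_qrv,
				    unsigned long max_delay)
{
	return (mc_qrv + 1) * max_delay;
}
```

      With the defaults, ovqpt_rfc3810(2, 125, 10) gives 260 and switchback_old(2, 10) gives 30, matching the ~260secs and ~30secs stated above.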
      
      Hence, introduce the necessary helper functions and fix this properly.
      
      Introduced in 06da9228 ("[IPV6]: Add MLDv2 support."). Credits to
      Hannes Frederic Sowa, who also had a hand in this, and thanks to
      Hangbin Liu, who did the initial testing.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: David Stevens <dlstevens@us.ibm.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: better comments for RTO initiallization · 52f20e65
      Yuchung Cheng authored
      Commit 1b7fdd2a ("tcp: do not use cached RTT for RTT estimation")
      removed important comments on how the RTO is initialized and updated.
      Hopefully this patch puts that information back.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 · c08751c8
      Alexander Sverdlin authored
      
      Initially the problem was observed with ipsec, but later it became clear
      that the SCTP data chunk fragmentation algorithm has problems with MTU
      values which are not a multiple of 4. A test program was used which
      simply transmits 2000-byte packets to another host, and tcpdump was used
      to observe re-fragmentation in the IP layer after SCTP had already
      fragmented the data chunks.
      
      With MTU 1500:
      12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
          10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
      12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
          10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
      12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
          10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
      
      With MTU 1499:
      13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
          10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
      13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
          10.151.38.153 > 10.151.24.91: ip-proto-132
      13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
          10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
      13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
          10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
      
      Here, a problem in the calculation of the data portion limit leads to
      re-fragmentation in IP, which is sub-optimal. The problem is the initial
      value of max_data, which doesn't take into account that a data chunk
      must be padded to a 4-byte boundary. It's enough to correct max_data,
      because all later adjustments are correctly aligned to a 4-byte
      boundary.
      
      After the fix is applied, everything is fragmented correctly for MTUs that are not a multiple of 4:
      15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
          10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
      15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
          10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
      15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
          10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
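      The traces above can be reproduced with a little arithmetic. The sketch assumes IPv4 without options (20 bytes), the 12-byte SCTP common header, a 16-byte DATA chunk header, and one chunk per packet. The helpers are hypothetical. The "fixed" variant rounds the initial max_data down to a multiple of 4, as this commit describes; it is not the literal patch.

```c
/* Header sizes assumed for this sketch. */
#define IPV4_HDR_LEN        20
#define SCTP_COMMON_HDR_LEN 12
#define SCTP_DATA_HDR_LEN   16

/* Chunks are padded to a 4-byte boundary on the wire. */
static unsigned int pad4(unsigned int len)
{
	return (len + 3) & ~3u;
}

/* Initial per-chunk payload limit; 'fixed' truncates it to a multiple of 4. */
static unsigned int max_data_sketch(unsigned int mtu, int fixed)
{
	unsigned int max_data = mtu - IPV4_HDR_LEN - SCTP_COMMON_HDR_LEN -
				SCTP_DATA_HDR_LEN;

	return fixed ? (max_data & ~3u) : max_data;
}

/* IP total length of a packet carrying one padded DATA chunk. */
static unsigned int ip_len_sketch(unsigned int data_len)
{
	return IPV4_HDR_LEN + SCTP_COMMON_HDR_LEN +
	       pad4(SCTP_DATA_HDR_LEN + data_len);
}
```

      With MTU 1499 the old limit is 1451 bytes. After padding, the packet is 1500 bytes, one byte over the MTU, so IP fragments it (the 1492 + 28 pair in the trace). With the fixed limit of 1448 bytes, the packets are 1496 and 600 bytes long. With MTU 1500 both variants give 1452 bytes, hence the 1500 and 596 byte packets in the first trace.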
      
      The bug has been there for years, but it
       - is only a performance issue; the packets are still transmitted
       - doesn't show up with the default MTU of 1500, but may with ipsec (MTU 1438)
      Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet · 1205e1fa
      Phil Oester authored
      In commit b396966c (netfilter: xt_TCPMSS: Fix missing fragmentation
      handling), I attempted to add safe fragment handling to xt_TCPMSS.
      However, Andy Padavan of Project N56U correctly points out that
      returning XT_CONTINUE in this function does not work. The callers
      (tcpmss_tg[46]) expect to receive a value of 0 in order to return
      XT_CONTINUE.
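      The sketch below shows why this breaks. It assumes, as the sketch does, that the caller treats any negative return as an error and drops the packet. XT_CONTINUE is 0xFFFFFFFF, which becomes -1 once stored in an int. The *_sketch names and the simplified mangle helper are hypothetical.

```c
#include <stdint.h>

/* Verdict values as in the netfilter headers. */
#define XT_CONTINUE_SKETCH 0xFFFFFFFFu
#define NF_DROP_SKETCH     0u

/* Hypothetical mangle helper: >0 = bytes added, 0 = nothing to do,
 * <0 = error. 'buggy' reproduces returning XT_CONTINUE for fragments;
 * the int conversion yields -1 on common two's-complement targets. */
static int mangle_sketch(int is_fragment, int buggy)
{
	if (is_fragment)
		return buggy ? (int)XT_CONTINUE_SKETCH : 0;
	return 0;
}

/* Caller pattern: a negative result means drop, otherwise continue. */
static uint32_t tcpmss_tg_sketch(int is_fragment, int buggy)
{
	int ret = mangle_sketch(is_fragment, buggy);

	if (ret < 0)
		return NF_DROP_SKETCH;
	return XT_CONTINUE_SKETCH;
}
```

      In this model, a fragment is dropped when the helper returns XT_CONTINUE, and it passes on when the helper returns 0 as the callers expect.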
      Signed-off-by: Phil Oester <kernel@linuxace.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: SYNPROXY: let unrelated packets continue · 7cc9eb6e
      Jesper Dangaard Brouer authored
      
      Packets reaching SYNPROXY were dropped by default, as they were most
      likely invalid (given the recommended state matching). This patch
      changes the SYNPROXY target to let packets that it does not consume
      continue to be processed by the stack.
      
      This is more in line with other target modules, as it allows more
      flexible configurations for handling, logging or matching packets in
      the INVALID state.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length() · f4de4c89
      Patrick McHardy authored
      
      With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init:
      
      [   80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]()
      
      The reason is that the conntrack template is set to confirmed before adding
      the extension and it is invalid to add extensions to already confirmed
      conntracks. Fix by adding the extensions before setting the conntrack to
      confirmed.
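      Here is a sketch of the ordering constraint. It uses a hypothetical template type in place of struct nf_conn. ct_ext_add_sketch() refuses, at the point where the kernel would warn, once the entry is confirmed. The number of extensions added is illustrative.

```c
#include <stdbool.h>

/* Hypothetical conntrack template: extensions may only be added while
 * the entry is still unconfirmed. */
struct ct_sketch {
	bool confirmed;
	int n_ext;
};

/* Returns false (where the kernel would WARN) if already confirmed. */
static bool ct_ext_add_sketch(struct ct_sketch *ct)
{
	if (ct->confirmed)
		return false;
	ct->n_ext++;
	return true;
}

/* Fixed init order: add the extensions first, then mark confirmed. */
static bool synproxy_tmpl_init_sketch(struct ct_sketch *ct)
{
	bool ok = ct_ext_add_sketch(ct) && ct_ext_add_sketch(ct);

	ct->confirmed = true;
	return ok;
}
```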
      Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: more strict TCP flag matching in SYNPROXY · 775ada6d
      Jesper Dangaard Brouer authored
      
      It seems Patrick missed incorporating some of my requested changes
      during review v2 of the SYNPROXY netfilter module, namely preventing
      SYN+ACK packets from entering the path meant for the ACK packet from
      the client (from the 3WHS).
      
      Further, there was a bug in ip6t_SYNPROXY.c: its matching of SYN
      packets didn't exclude the ACK flag.
      
      Go a step further with SYN packet/flag matching by excluding the
      ACK+FIN+RST flags in both the IPv4 and IPv6 modules.
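      The stricter flag matching can be sketched with a hypothetical flags struct in place of struct tcphdr. Packets matching neither path are classified as PATH_OTHER, which is where the "let unrelated packets continue" change applies. This is an illustration, not the module code.

```c
#include <stdbool.h>

/* Hypothetical subset of the TCP header flags. */
struct tcp_flags_sketch {
	bool syn, ack, fin, rst;
};

enum synproxy_path { PATH_CLIENT_SYN, PATH_CLIENT_ACK, PATH_OTHER };

/* A client SYN must carry none of ACK/FIN/RST; the client ACK of the
 * 3WHS must carry none of SYN/FIN/RST, so SYN+ACK takes neither path. */
static enum synproxy_path
synproxy_classify_sketch(const struct tcp_flags_sketch *th)
{
	if (th->syn && !(th->ack || th->fin || th->rst))
		return PATH_CLIENT_SYN;
	if (th->ack && !(th->syn || th->fin || th->rst))
		return PATH_CLIENT_ACK;
	return PATH_OTHER;  /* not consumed by SYNPROXY */
}
```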
      
      The intended usage of SYNPROXY is as follows (kindly described here in
      the commit):
      
       iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
       iptables -A INPUT -i eth0 -p tcp --dport 80 -m state --state UNTRACKED,INVALID \
               -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn
      
       echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
      
      This filters on the SYN flag early for packets in the UNTRACKED state,
      but packets in the INVALID state with other TCP flags can still reach
      the module, so this stricter flag matching is still needed.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • tcp: Change return value of tcp_rcv_established() · c995ae22
      Vijay Subramanian authored
      tcp_rcv_established() only ever returns 0, so change its return type
      to void (as suggested by David Miller).
      
      After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send
      RSTs in response to SYNs, so we can remove the check and processing of
      the return value of tcp_rcv_established().
      
      We also fix jtcp_rcv_established() in tcp_probe.c to match that of
      tcp_rcv_established().
      Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp_probe: adapt tbuf size for recent changes · cc8c6c1b
      Daniel Borkmann authored
      With recent changes in the tcp_probe module (e.g. f925d0a6 ("net:
      tcp_probe: add IPv6 support")), we also need to take into account that
      tbuf must be enlarged, as the format string is expanded further. tbuf
      sits on the stack in tcpprobe_read(), which is invoked when user space
      reads the procfs file /proc/net/tcpprobe, hence it is not a fast path
      like jtcp_rcv_established(). A size of 256 bytes, as in the sctp_probe
      module, is fully sufficient: the theoretical maximum needed is 252
      bytes, and anything smaller could truncate the output.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>