1. 26 Oct, 2013 14 commits
    • tipc: fix lockdep warning during bearer initialization · e7b1664d
      Ying Xue authored
      [ Upstream commit 4225a398 ]
      
      When the lockdep validator is enabled, it will report the below
      warning when we enable a TIPC bearer:
      
      [ INFO: possible irq lock inversion dependency detected ]
      ---------------------------------------------------------
      Possible interrupt unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(ptype_lock);
                                      local_irq_disable();
                                      lock(tipc_net_lock);
                                      lock(ptype_lock);
         <Interrupt>
         lock(tipc_net_lock);
      
        *** DEADLOCK ***
      
      the shortest dependencies between 2nd lock and 1st lock:
        -> (ptype_lock){+.+...} ops: 10 {
      [...]
      SOFTIRQ-ON-W at:
                            [<c1089418>] __lock_acquire+0x528/0x13e0
                            [<c108a360>] lock_acquire+0x90/0x100
                            [<c1553c38>] _raw_spin_lock+0x38/0x50
                            [<c14651ca>] dev_add_pack+0x3a/0x60
                            [<c182da75>] arp_init+0x1a/0x48
                            [<c182dce5>] inet_init+0x181/0x27e
                            [<c1001114>] do_one_initcall+0x34/0x170
                            [<c17f7329>] kernel_init+0x110/0x1b2
                            [<c155b6a2>] kernel_thread_helper+0x6/0x10
      [...]
         ... key      at: [<c17e4b10>] ptype_lock+0x10/0x20
         ... acquired at:
          [<c108a360>] lock_acquire+0x90/0x100
          [<c1553c38>] _raw_spin_lock+0x38/0x50
          [<c14651ca>] dev_add_pack+0x3a/0x60
          [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
          [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
          [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
          [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
          [<c148e802>] genl_rcv_msg+0x232/0x280
          [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
          [<c148e5bc>] genl_rcv+0x1c/0x30
          [<c148d144>] netlink_unicast+0x174/0x1f0
          [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
          [<c1456bc1>] sock_aio_write+0x161/0x170
          [<c1135a7c>] do_sync_write+0xac/0xf0
          [<c11360f6>] vfs_write+0x156/0x170
          [<c11361e2>] sys_write+0x42/0x70
          [<c155b0df>] sysenter_do_call+0x12/0x38
      [...]
      }
        -> (tipc_net_lock){+..-..} ops: 4 {
      [...]
          IN-SOFTIRQ-R at:
                           [<c108953a>] __lock_acquire+0x64a/0x13e0
                           [<c108a360>] lock_acquire+0x90/0x100
                           [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
                           [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
                           [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
                           [<c146a5fa>] __netif_receive_skb+0x22a/0x590
                           [<c146ab0b>] netif_receive_skb+0x2b/0xf0
                           [<c13c43d2>] pcnet32_poll+0x292/0x780
                           [<c146b00a>] net_rx_action+0xfa/0x1e0
                           [<c103a4be>] __do_softirq+0xae/0x1e0
      [...]
      }
      
      From the log, we can see three different call chains between
      CPU0 and CPU1:
      
      Time 0 on CPU0:
      
        kernel_init()->inet_init()->dev_add_pack()
      
      At time 0, the ptype_lock is held by CPU0 in dev_add_pack();
      
      Time 1 on CPU1:
      
        tipc_enable_bearer()->enable_bearer()->dev_add_pack()
      
      At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
      wants to take ptype_lock to register TIPC protocol handler into the
      networking stack.  But the ptype_lock has been taken by dev_add_pack()
      on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
      busy looping.
      
      Time 2 on CPU0:
      
        netif_receive_skb()->recv_msg()->tipc_recv_msg()
      
      At time 2, an incoming TIPC packet arrives at CPU0, so
      tipc_recv_msg() is invoked. tipc_recv_msg() first wants to take
      tipc_net_lock. At this point the following scenario arises:
      
      On CPU0, the locks are taken in this order:
      
        lock(ptype_lock)->lock(tipc_net_lock)
      
      On CPU1, the locks are taken in this order:
      
        lock(tipc_net_lock)->lock(ptype_lock)
      
      Obviously a deadlock may happen in this case.
      
      Note, however, that the deadlock cannot occur while the first TIPC
      bearer is being enabled. Until enable_bearer() running on CPU1 has
      taken ptype_lock, the TIPC receive handler (i.e. recv_msg()) has not
      been registered via dev_add_pack(), so tipc_recv_msg() cannot be
      called by recv_msg() even if a TIPC message arrives at CPU0. When a
      second TIPC bearer is registered, however, the deadlock can really
      happen.
      
      To fix it, we push the work of registering the TIPC protocol
      handler into workqueue context. After the change, both paths taking
      ptype_lock always run in process context, so the deadlock should
      never occur.

      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO · 6b16097d
      Jiri Bohac authored
      [ Upstream commit 61e76b17 ]
      
      RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination
      unreachable) messages:
              5 - Source address failed ingress/egress policy
              6 - Reject route to destination
      
      Currently they are treated as protocol errors, and icmpv6_err_convert()
      converts them to EPROTO.
      
      RFC 4443 says:
      	"Codes 5 and 6 are more informative subsets of code 1."
      
      Treat codes 5 and 6 as code 1 (EACCES).
      
      Btw, connect() returning -EPROTO confuses Firefox, so that fallback to
      other/IPv4 addresses does not work:
      https://bugzilla.mozilla.org/show_bug.cgi?id=910773

      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
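The mapping described in the ICMPv6 commit above can be sketched as a small lookup table. This is a simplified userspace illustration, not the kernel's icmpv6_err_convert(): the function name `dest_unreach_to_errno` and the table layout are hypothetical, and the errno values for codes 0 to 4 are an assumption about the existing kernel table. Only the handling of codes 1, 5 and 6 (EACCES) and the EPROTO fallback come from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of a destination-unreachable table.
 * Entries for codes 0-4 are assumptions; codes 5 and 6 follow the
 * commit message and are treated like code 1 (EACCES). */
static const int unreach_errno[] = {
    ENETUNREACH,  /* 0: no route to destination */
    EACCES,       /* 1: communication administratively prohibited */
    EHOSTUNREACH, /* 2: beyond scope of source address */
    EHOSTUNREACH, /* 3: address unreachable */
    ECONNREFUSED, /* 4: port unreachable */
    EACCES,       /* 5: source address failed ingress/egress policy */
    EACCES,       /* 6: reject route to destination */
};

/* Codes outside the table fall back to EPROTO, which is what codes 5
 * and 6 were mapped to before the fix. */
int dest_unreach_to_errno(unsigned int code)
{
    size_t n = sizeof(unreach_errno) / sizeof(unreach_errno[0]);

    return code < n ? unreach_errno[code] : EPROTO;
}
```

With a table like this, a connect() that hits a reject route would see EACCES rather than EPROTO, which is the behaviour the commit message says lets address fallback work.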
    • net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay · f99d61dc
      Daniel Borkmann authored
      [ Upstream commit 2d98c29b ]
      
      While looking into MLDv1/v2 code, I noticed that bridging code does
      not convert its max delay into jiffies for MLDv2 messages as we do
      in the core IPv6 multicast code.
      
      RFC3810, 5.1.3. Maximum Response Code says:
      
        The Maximum Response Code field specifies the maximum time allowed
        before sending a responding Report. The actual time allowed, called
        the Maximum Response Delay, is represented in units of milliseconds,
        and is derived from the Maximum Response Code as follows: [...]
      
      As we update timers that work with jiffies, we need to convert it.

      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Linus Lüssing <linus.luessing@web.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
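As a rough illustration of the conversion the MLDv2 commit above is about, the sketch below first decodes a Maximum Response Code into milliseconds using the RFC 3810 section 5.1.3 encoding (values below 32768 are taken literally, larger values use a 3-bit exponent and 12-bit mantissa), then turns milliseconds into timer ticks. The names `mldv2_mrc_to_msecs`, `sketch_msecs_to_jiffies` and `SKETCH_HZ` are hypothetical, and the tick conversion is a simplified stand-in for the kernel's msecs_to_jiffies(), not a copy of it.

```c
#include <stdint.h>

#define SKETCH_HZ 250u /* hypothetical tick rate, ticks per second */

/* Decode an MLDv2 Maximum Response Code into the Maximum Response
 * Delay in milliseconds (RFC 3810, section 5.1.3). */
uint32_t mldv2_mrc_to_msecs(uint16_t mrc)
{
    if (mrc < 32768)
        return mrc;

    uint32_t exp = (mrc >> 12) & 0x7; /* 3-bit exponent */
    uint32_t mant = mrc & 0x0fff;     /* 12-bit mantissa */

    return (mant | 0x1000) << (exp + 3);
}

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
uint64_t sketch_msecs_to_jiffies(uint32_t msecs)
{
    return ((uint64_t)msecs * SKETCH_HZ + 999) / 1000;
}
```

Feeding the millisecond value straight into a jiffies-based timer, without the second step, would make the delay wrong by a factor of 1000/HZ whenever HZ is not 1000.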
    • ipv6: Don't depend on per socket memory for neighbour discovery messages · 7676306f
      Thomas Graf authored
      [ Upstream commit 25a6e6b8 ]
      
      Allocating skbs when sending out neighbour discovery messages
      currently uses sock_alloc_send_skb() based on a per net namespace
      socket and thus shares a socket wmem buffer space.
      
      If a netdevice is temporarily unable to transmit due to carrier
      loss or for other reasons, the queued up ndisc messages will consume
      all of the wmem space and will thus prevent any more skbs from
      being allocated, even for netdevices that are able to transmit packets.
      
      The number of neighbour discovery messages sent is very limited.
      Using alloc_skb() bypasses the socket wmem buffer size enforcement,
      while the manual call to skb_set_owner_w() maintains the socket
      reference needed for the IPv6 output path.
      
      This patch was originally posted by Eric Dumazet in a modified
      form.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Fabio Estevam <festevam@gmail.com>
      Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv6: drop packets with multiple fragmentation headers · 60c98d1e
      Hannes Frederic Sowa authored
      [ Upstream commit f46078cf ]
      
      It is not allowed for an ipv6 packet to contain multiple fragmentation
      headers. So discard packets which were already reassembled by
      fragmentation logic and send back a parameter problem icmp.
      
      The updates for RFC 6980 will come in later, I have to do a bit more
      research here.
      
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv6: remove max_addresses check from ipv6_create_tempaddr · 73199017
      Hannes Frederic Sowa authored
      [ Upstream commit 4b08a8f1 ]
      
      Because of the max_addresses check attackers were able to disable privacy
      extensions on an interface by creating enough autoconfigured addresses:
      
      <http://seclists.org/oss-sec/2012/q4/292>
      
      But the check is not actually needed: max_addresses protects the
      kernel from installing too many ipv6 addresses on an interface and
      prevents addrconf_prefix_rcv from installing further addresses as soon
      as this limit is reached. We only generate temporary addresses in
      direct response to a new address showing up. As soon as we have filled
      up the maximum number of addresses of an interface, we stop installing
      more addresses and thus also stop generating more temp addresses.
      
      Even if the attacker tries to generate a lot of temporary addresses
      by announcing a prefix and removing it again (lifetime == 0) we won't
      install more temp addresses, because the temporary addresses do count
      to the maximum number of addresses, thus we would stop installing new
      autoconfigured addresses when the limit is reached.
      
      This patch fixes CVE-2013-0343 (but other layer-2 attacks are still
      possible).
      
      Thanks to Ding Tianhong for bringing this topic up again.
      
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: George Kargiotakis <kargig@void.gr>
      Cc: P J P <ppandit@redhat.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • tun: signedness bug in tun_get_user() · 60be6501
      Dan Carpenter authored
      [ Upstream commit 15718ea0 ]
      
      The recent fix d9bf5f13 "tun: compare with 0 instead of total_len" is
      not totally correct.  Because "len" and "sizeof()" are size_t type, that
      means they are never less than zero.

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
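The signedness problem in the tun commit above can be illustrated outside the kernel. In this minimal sketch the struct `sketch_hdr` and both function names are hypothetical; it only shows why subtracting a sizeof() from a size_t and then testing the result for "less than zero" cannot detect a short buffer, and what a check that does work looks like.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical 4-byte header standing in for a packet-info header. */
struct sketch_hdr {
    unsigned short flags;
    unsigned short proto;
};

/* Unchecked subtraction: both operands are size_t, so a short buffer
 * does not produce a negative number. The result wraps to a huge
 * value, and a later "< 0" test on it can never be true. */
size_t remaining_unchecked(size_t len)
{
    return len - sizeof(struct sketch_hdr);
}

/* Checked variant: compare before subtracting. Returns 0 and updates
 * *len on success, -EINVAL if the buffer is too short. */
int strip_hdr_checked(size_t *len)
{
    if (*len < sizeof(struct sketch_hdr))
        return -EINVAL;

    *len -= sizeof(struct sketch_hdr);
    return 0;
}
```

Calling `remaining_unchecked(2)` yields a value far larger than 2, which is exactly the case a "less than zero" comparison was meant to catch but cannot.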
    • 8139cp: Add dma_mapping_error checking · 5cacde1a
      Neil Horman authored
      [ Upstream commits cf3c4c03 and
        d06f5187 (the latter is a fixup
        from Dave Jones) ]
      
      Self-explanatory dma_mapping_error addition to the 8139 driver, based on this:
      https://bugzilla.redhat.com/show_bug.cgi?id=947250
      
      It showed several backtraces arising from dma_map_* usage without checking the
      return code on the mapping.  Add the check and abort the rx/tx operation if it
      has failed.  Untested as I have no hardware and the reporter has wandered off, but
      seems pretty straightforward.

      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Francois Romieu <romieu@fr.zoreil.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match · 729d6632
      Hannes Frederic Sowa authored
      [ Upstream commit 3e3be275 ]
      
      In case a subtree did not match we currently stop backtracking and return
      NULL (root table from fib_lookup). This could result in invalid routing
      table lookups when using subtrees.
      
      Instead continue to backtrack until a valid subtree or node is found
      and return this match.
      
      Also remove unneeded NULL check.

      Reported-by: Teco Boot <teco@inf-net.nl>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: David Lamparter <equinox@diac24.net>
      Cc: <boutier@pps.univ-paris-diderot.fr>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • tcp: cubic: fix bug in bictcp_acked() · 1a6c214e
      Eric Dumazet authored
      [ Upstream commit cd6b423a ]
      
      While investigating a strange increase in retransmit rates
      on hosts ~24 days after boot, Van found that hystart was disabled
      if ca->epoch_start was 0, as the following condition is true
      when the high order bit of tcp_time_stamp is set.
      
      (s32)(tcp_time_stamp - ca->epoch_start) < HZ
      
      Quoting Van :
      
       At initialization & after every loss ca->epoch_start is set to zero so
       I believe that the above line will turn off hystart as soon as the 2^31
       bit is set in tcp_time_stamp & hystart will stay off for 24 days.
       I think we've observed that cubic's restart is too aggressive without
       hystart so this might account for the higher drop rate we observe.

      Diagnosed-by: Van Jacobson <vanj@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
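The condition quoted in the bictcp_acked() commit above can be reproduced in isolation. The sketch below is a userspace model with hypothetical names (`recent_epoch_buggy`, `recent_epoch_guarded`, `SKETCH_HZ`); the guarded variant, which treats epoch_start == 0 as "no epoch yet", is an assumption about the shape of the fix rather than a quote of the kernel code. With a 1000 Hz clock, bit 31 of a 32-bit tick counter becomes set after 2^31 ms, which is roughly 24.9 days and matches the timing in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HZ 1000 /* hypothetical tick rate */

/* Models the quoted condition. With epoch_start == 0 and bit 31 of
 * "now" set, the difference reinterpreted as a signed 32-bit value is
 * negative (on common two's-complement platforms), so the test is
 * true although no epoch has started. */
bool recent_epoch_buggy(uint32_t now, uint32_t epoch_start)
{
    return (int32_t)(now - epoch_start) < SKETCH_HZ;
}

/* Assumed shape of a fix: only compare when an epoch is running. */
bool recent_epoch_guarded(uint32_t now, uint32_t epoch_start)
{
    return epoch_start != 0 &&
           (int32_t)(now - epoch_start) < SKETCH_HZ;
}
```

Before the high bit is set, both variants agree for epoch_start == 0, which is why the problem only shows up weeks after boot.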
    • tcp: cubic: fix overflow error in bictcp_update() · 9588edd0
      Eric Dumazet authored
      [ Upstream commit 2ed0edf9 ]
      
      commit 17a6e9f1 ("tcp_cubic: fix clock dependency") added an
      overflow error in bictcp_update() in following code :
      
      /* change the unit from HZ to bictcp_HZ */
      t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) -
            ca->epoch_start) << BICTCP_HZ) / HZ;
      
      Because msecs_to_jiffies() returns unsigned long, the compiler
      performs implicit type promotion.
      
      We really want to constrain (tcp_time_stamp - ca->epoch_start)
      to a signed 32bit value, or else 't' has unexpected high values.
      
      This bug triggers an increase of retransmit rates ~24 days after
      boot [1], as the high order bit of tcp_time_stamp flips.
      
      [1] for hosts with HZ=1000
      
      Big thanks to Van Jacobson for spotting this problem.

      Diagnosed-by: Van Jacobson <vanj@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
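The promotion hazard described in the bictcp_update() commit above can be sketched in userspace. Here uint64_t stands in for a 64-bit unsigned long, the function names and `SKETCH_*` constants are hypothetical, and the wrap-around input is just one easy way to show the effect; the exact conditions that trigger the kernel bug may differ. The point is only that a 32-bit time difference must be narrowed to a signed 32-bit value before it is mixed with wider operands.

```c
#include <stdint.h>

#define SKETCH_BICTCP_HZ 10 /* shift used for the unit change */
#define SKETCH_HZ 1000      /* hypothetical tick rate */

/* The whole expression is evaluated in 64 bits because one operand is
 * 64-bit, so a 32-bit wrap between epoch_start and now is not undone
 * by modular arithmetic and the result becomes huge. */
uint64_t elapsed_promoted(uint32_t now, uint32_t epoch_start,
                          uint64_t delay_jiffies)
{
    return ((now + delay_jiffies - epoch_start) << SKETCH_BICTCP_HZ) /
           SKETCH_HZ;
}

/* The difference is first constrained to a signed 32-bit value, then
 * widened and combined with the delay. */
uint64_t elapsed_constrained(uint32_t now, uint32_t epoch_start,
                             uint64_t delay_jiffies)
{
    int32_t delta = (int32_t)(now - epoch_start);
    uint64_t t = (uint64_t)((int64_t)delta + (int64_t)delay_jiffies);

    return (t << SKETCH_BICTCP_HZ) / SKETCH_HZ;
}
```

For ordinary inputs the two functions return the same value; they diverge only when the 32-bit counter has wrapped relative to epoch_start.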
    • fib_trie: remove potential out of bound access · c521d041
      Eric Dumazet authored
      [ Upstream commit aab515d7 ]
      
      The AddressSanitizer [1] dynamic checker pointed out a potential
      out-of-bounds access in leaf_walk_rcu().
      
      We could allocate one more slot in tnode_new() to leave the prefetch()
      in place, but it does not look worth the pain.
      
      Bug added in commit 82cfbb00 ("[IPV4] fib_trie: iterator recode")
      
      [1] :
      https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: check net.core.somaxconn sysctl values · 243e49a5
      Roman Gushchin authored
      [ Upstream commit 5f671d6b ]
      
      It's possible to assign an invalid value to the net.core.somaxconn
      sysctl variable, because there are no checks at all.
      
      The sk_max_ack_backlog field of the sock structure is defined as
      unsigned short. Therefore, the backlog argument in inet_listen()
      shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
      is truncated to the somaxconn value. So, the somaxconn value shouldn't
      exceed 65535 (USHRT_MAX).
      Also, negative values of somaxconn are meaningless.
      
      before:
      $ sysctl -w net.core.somaxconn=256
      net.core.somaxconn = 256
      $ sysctl -w net.core.somaxconn=65536
      net.core.somaxconn = 65536
      $ sysctl -w net.core.somaxconn=-100
      net.core.somaxconn = -100
      
      after:
      $ sysctl -w net.core.somaxconn=256
      net.core.somaxconn = 256
      $ sysctl -w net.core.somaxconn=65536
      error: "Invalid argument" setting key "net.core.somaxconn"
      $ sysctl -w net.core.somaxconn=-100
      error: "Invalid argument" setting key "net.core.somaxconn"
      
      Based on a prior patch from Changli Gao.

      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Reported-by: Changli Gao <xiaosuo@gmail.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
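The before/after behaviour shown in the somaxconn commit above amounts to a range check of 0 to USHRT_MAX on writes. The following is a userspace sketch of such a check, not the kernel's sysctl handler: `sketch_set_somaxconn`, `sketch_get_somaxconn` and the initial value of 128 are hypothetical.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for the sysctl-backed variable. */
static int sketch_somaxconn = 128;

/* Accept only values that fit in an unsigned short, mirroring the
 * 0..65535 range described in the commit message. Out-of-range writes
 * are rejected with -EINVAL and leave the stored value untouched. */
int sketch_set_somaxconn(long value)
{
    if (value < 0 || value > USHRT_MAX)
        return -EINVAL;

    sketch_somaxconn = (int)value;
    return 0;
}

int sketch_get_somaxconn(void)
{
    return sketch_somaxconn;
}
```

The three writes from the "after" transcript (256, 65536, -100) correspond to one accepted and two rejected calls here.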
    • htb: fix sign extension bug · 746db946
      stephen hemminger authored
      [ Upstream commit cbd37556 ]
      
      When userspace passes a large priority value,
      the assignment of the unsigned value hopt->prio
      to the signed int cl->prio causes cl->prio to become negative, and the
      comparison with TC_HTB_NUMPRIO is always false.
      
      The result is that HTB crashes by referencing outside
      the array when processing packets. With this patch the large value
      wraps around like other values outside the normal range.
      
      See: https://bugzilla.kernel.org/show_bug.cgi?id=60669

      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
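The sign problem in the htb commit above can be shown with a small clamp function. This is a userspace sketch with hypothetical names; `SKETCH_NUMPRIO` is set to 8 as an assumed value for the number of priority levels. Converting an out-of-range unsigned value to int is implementation-defined in C; on common two's-complement platforms it produces a negative number, which is the case the commit describes.

```c
#include <stdint.h>

#define SKETCH_NUMPRIO 8 /* assumed number of priority levels */

/* Signed storage: a large unsigned request turns negative, so the
 * upper-bound check does not fire and the negative value escapes. */
int clamp_prio_signed(uint32_t requested)
{
    int prio = (int)requested;

    if (prio >= SKETCH_NUMPRIO)
        prio = SKETCH_NUMPRIO - 1;
    return prio;
}

/* Unsigned storage: the same request is caught by the bound check and
 * clamped to the highest valid index. */
uint32_t clamp_prio_unsigned(uint32_t requested)
{
    uint32_t prio = requested;

    if (prio >= SKETCH_NUMPRIO)
        prio = SKETCH_NUMPRIO - 1;
    return prio;
}
```

A negative result from the signed variant used as an array index is the out-of-bounds reference the commit message attributes the crash to.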
  2. 10 Sep, 2013 26 commits