1. 05 Feb, 2014 6 commits
    • netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name() · 53b70287
      Patrick McHardy authored
      The map that is used to allocate anonymous sets is indeed
      BITS_PER_BYTE * PAGE_SIZE long.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
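
      The commit message gives the map length as BITS_PER_BYTE * PAGE_SIZE,
      that is, a count of bits rather than bytes. A minimal userspace sketch
      of a page-sized bitmap with a bounds-checked scan follows. The 4096-byte
      page and all sketch_* names are assumptions made for illustration; this
      is not the nf_tables code.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption for illustration: a 4096-byte page. */
#define SKETCH_PAGE_SIZE 4096u
/* A page-sized bitmap holds this many bits, one per candidate slot. */
#define SKETCH_MAP_BITS (CHAR_BIT * SKETCH_PAGE_SIZE)

/* Return the index of the first clear bit, or SKETCH_MAP_BITS if none. */
static unsigned int sketch_first_free(const unsigned char *map)
{
    for (unsigned int i = 0; i < SKETCH_MAP_BITS; i++) {
        if (!(map[i / CHAR_BIT] & (1u << (i % CHAR_BIT))))
            return i;
    }
    return SKETCH_MAP_BITS;
}

/* Mark slot i as used; indexes past the end of the map are ignored,
 * so a caller can never write beyond the page. */
static void sketch_mark_used(unsigned char *map, unsigned int i)
{
    if (i < SKETCH_MAP_BITS)
        map[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}
```

      The point of the sketch is only that every index is compared against
      the bit count of the map, never against a larger bound.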
    • netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt · e53376be
      Pablo Neira Ayuso authored
      With this patch, the conntrack refcount is initially set to zero and
      it is bumped once it is added to any of the lists, so we fulfill
      Eric's golden rule, which is that all released objects always have a
      refcount that equals zero.
      
      Andrey Vagin reports that nf_conntrack_free can't be called for a
      conntrack with non-zero ref-counter, because it can race with
      nf_conntrack_find_get().
      
      A conntrack slab is created with SLAB_DESTROY_BY_RCU. A non-zero
      ref-counter means that this conntrack is in use. So when we release
      a conntrack with a non-zero counter, we break this assumption.
      
      CPU1                                    CPU2
      ____nf_conntrack_find()
                                              nf_ct_put()
                                               destroy_conntrack()
                                              ...
                                              init_conntrack
                                               __nf_conntrack_alloc (set use = 1)
      atomic_inc_not_zero(&ct->use) (use = 2)
                                               if (!l4proto->new(ct, skb, dataoff, timeouts))
                                                nf_conntrack_free(ct); (use = 2 !!!)
                                              ...
                                              __nf_conntrack_alloc (set use = 1)
       if (!nf_ct_key_equal(h, tuple, zone))
        nf_ct_put(ct); (use = 0)
         destroy_conntrack()
                                              /* continue to work with CT */
      
      After applying the patch "[PATCH] netfilter: nf_conntrack: fix RCU
      race in nf_conntrack_find_get" another bug was triggered in
      destroy_conntrack():
      
      <4>[67096.759334] ------------[ cut here ]------------
      <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
      ...
      <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
      <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
      <4>[67096.760255] Call Trace:
      <4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
      <4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
      <4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
      <4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
      <4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
      <4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
      <4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
      <4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
      <4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
      <4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
      <4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
      <4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
      <4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
      <4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      
      I have reused the original title for the RFC patch that Andrey posted and
      most of the original patch description.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Vagin <avagin@parallels.com>
      Cc: Florian Westphal <fw@strlen.de>
      Reported-by: Andrew Vagin <avagin@parallels.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Andrew Vagin <avagin@parallels.com>
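
      The rule described above (a released object always has a refcount of
      zero, and readers only take a reference if the count is non-zero) can
      be sketched in userspace with C11 atomics. The struct and function
      names are hypothetical; the kernel uses atomic_inc_not_zero() on
      ct->use rather than this code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_obj {
    atomic_int use;   /* 0 means "not published yet" or "released" */
};

/* Take a reference only if the object is live (use > 0).
 * This mirrors the idea behind atomic_inc_not_zero(). */
static bool sketch_get_unless_zero(struct sketch_obj *o)
{
    int old = atomic_load(&o->use);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->use, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference. Returns true when the caller dropped the last
 * one and is therefore the only party allowed to release the object. */
static bool sketch_put(struct sketch_obj *o)
{
    return atomic_fetch_sub(&o->use, 1) == 1;
}
```

      With the count starting at zero, a reader that races with allocation
      simply fails to get a reference, so the allocator can still free the
      object on an error path without anyone else holding it.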
    • netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() · 829d9315
      Alexey Dobriyan authored
      A similar bug was fixed in the SIP module in 3f509c68 ("netfilter:
      nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation").
      
      BUG: unable to handle kernel paging request at 00100104
      IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
      ...
      Call Trace:
        [<c0244bd8>] ? del_timer+0x48/0x70
        [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack]
        [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack]
        [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack]
        [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
        [<c024442d>] call_timer_fn+0x1d/0x80
        [<c024461e>] run_timer_softirq+0x18e/0x1a0
        [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
        [<c023e6f3>] __do_softirq+0xa3/0x170
        [<c023e650>] ? __local_bh_enable+0x70/0x70
        <IRQ>
        [<c023e587>] ? irq_exit+0x67/0xa0
        [<c0202af6>] ? do_IRQ+0x46/0xb0
        [<c027ad05>] ? clockevents_notify+0x35/0x110
        [<c066ac6c>] ? common_interrupt+0x2c/0x40
        [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0
        [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100
        [<c02085f8>] ? arch_cpu_idle+0x8/0x30
        [<c027314b>] ? cpu_idle_loop+0x4b/0x140
        [<c0273258>] ? cpu_startup_entry+0x18/0x20
        [<c066056d>] ? rest_init+0x5d/0x70
        [<c0813ac8>] ? start_kernel+0x2ec/0x2f2
        [<c081364f>] ? repair_env_string+0x5b/0x5b
        [<c0813269>] ? i386_start_kernel+0x33/0x35
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get · c6825c09
      Andrey Vagin authored
      Let's look at destroy_conntrack():
      
      hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
      ...
      nf_conntrack_free(ct)
      	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
      
      net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
      
      The hash is protected by RCU, so readers look up conntracks without
      locks.
      A conntrack is removed from the hash, but at this moment a few readers
      can still be using it. Then this conntrack is released and another
      thread creates a conntrack at the same address with an equal tuple.
      After this, a reader starts to validate the conntrack:
      * It's not dying, because a new conntrack was created
      * nf_ct_tuple_equal() returns true.
      
      But this conntrack is not initialized yet, so it cannot be used by two
      threads concurrently. In this case BUG_ON may be triggered from
      nf_nat_setup_info().
      
      Florian Westphal suggested checking the confirm bit too. I think that's
      right.
      
      task 1			task 2			task 3
      			nf_conntrack_find_get
      			 ____nf_conntrack_find
      destroy_conntrack
       hlist_nulls_del_rcu
       nf_conntrack_free
       kmem_cache_free
      						__nf_conntrack_alloc
      						 kmem_cache_alloc
      						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
      			 if (nf_ct_is_dying(ct))
      			 if (!nf_ct_tuple_equal()
      
      I'm not sure that I have ever seen this race condition in real life.
      We are currently investigating a bug that is reproduced on a few nodes.
      In our case one conntrack is initialized from a few tasks concurrently;
      we don't have any other explanation for this.
      
      <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
      ...
      <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
      ...
      <4>[46267.085549] Call Trace:
      <4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
      <4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
      <4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
      <4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
      <4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
      <4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
      <4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
      <4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
      <4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
      <4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
      <4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
      <4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
      <4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
      <4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
      <4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
      <4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
      <4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
      <4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
      <4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
      <4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
      <4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
      <4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
      <4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
      <4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
      <1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
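
      The validation step discussed above can be sketched as a predicate
      that a reader runs after it has obtained a reference to an entry found
      in the hash. Because the slab may hand the same address out again, the
      reader re-checks the dying state, the tuple, and (per Florian's
      suggestion) the confirmed state. The types and names are hypothetical
      simplifications, not the kernel's structures.

```c
#include <stdbool.h>

struct sketch_tuple {
    unsigned int src;
    unsigned int dst;
};

struct sketch_ct {
    struct sketch_tuple tuple;
    bool dying;       /* entry is being torn down */
    bool confirmed;   /* entry is fully initialized and in the hash */
};

/* Re-validate an entry after taking a reference to it. An entry that
 * was freed and reallocated with an equal tuple is not dying, but it
 * is not confirmed yet either, so it is rejected. */
static bool sketch_entry_valid(const struct sketch_ct *ct,
                               const struct sketch_tuple *want)
{
    if (ct->dying)
        return false;
    if (!ct->confirmed)
        return false;
    return ct->tuple.src == want->src && ct->tuple.dst == want->dst;
}
```

      Without the confirmed check, the recycled entry in the timeline above
      would pass both remaining tests and be handed to a second thread while
      the first is still initializing it.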
    • netfilter: nf_tables: fix oops when deleting a chain with references · 3dd7279f
      Patrick McHardy authored
      The following commands trigger an oops:
      
       # nft -i
       nft> add table filter
       nft> add chain filter input { type filter hook input priority 0; }
       nft> add chain filter test
       nft> add rule filter input jump test
       nft> delete chain filter test
      
      We need to check the chain use counter before allowing destruction since
      we might have references from sets or jump rules.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341
      Reported-by: Matthew Ife <deleriux1@gmail.com>
      Tested-by: Matthew Ife <deleriux1@gmail.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
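
      The check described above amounts to refusing destruction while the
      use counter is non-zero. A minimal sketch, assuming a hypothetical
      chain structure and assuming -EBUSY as the error code (the commit text
      does not say which error is returned):

```c
#include <errno.h>

struct sketch_chain {
    unsigned int use;   /* references from jump rules or sets */
    int destroyed;      /* set once the chain has been torn down */
};

/* Refuse to delete a chain that is still referenced. */
static int sketch_delete_chain(struct sketch_chain *chain)
{
    if (chain->use > 0)
        return -EBUSY;   /* assumed error code, for illustration */

    chain->destroyed = 1;
    return 0;
}
```

      In the nft session above, "add rule filter input jump test" would
      raise the counter of "test", so the final delete would be rejected
      instead of leaving a dangling jump target.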
    • netfilter: nft_ct: fix unconditional dump of 'dir' attr · 2a53bfb3
      Arturo Borrero authored
      We want to make sure that the information that we get from the kernel can
      be reinjected without trouble. The kernel shouldn't return an attribute
      that is not required, or is even prohibited.
      
      Unconditionally dumping NFTA_CT_DIRECTION could lead a userspace
      application to conclude that the attribute was originally set when it
      was not.
      Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
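
      The round-trip property described above can be illustrated with a dump
      routine that emits the direction attribute only when it was set. The
      attribute ids, the struct, and the "direction_set" flag are
      hypothetical; how the kernel actually decides whether the attribute
      applies is not shown in the commit text.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_attr {
    SKETCH_ATTR_KEY = 1,
    SKETCH_ATTR_DIRECTION = 2
};

struct sketch_ct_expr {
    int key;
    bool direction_set;   /* true only if a direction was supplied */
    int direction;
};

/* Write the ids of the dumped attributes into out (capacity cap)
 * and return how many were written. */
static size_t sketch_dump(const struct sketch_ct_expr *e,
                          enum sketch_attr *out, size_t cap)
{
    size_t n = 0;

    if (n < cap)
        out[n++] = SKETCH_ATTR_KEY;
    /* Only report the direction if it was part of the original request. */
    if (e->direction_set && n < cap)
        out[n++] = SKETCH_ATTR_DIRECTION;
    return n;
}
```

      Feeding the dumped attributes back in then reproduces the original
      expression, which is the property the commit message asks for.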
  2. 04 Feb, 2014 1 commit
  3. 28 Jan, 2014 8 commits
    • net: Document promote_secondaries · d922e1cb
      Martin Schwenke authored
      From 038a821667f62c496f2bbae27081b1b612122a97 Mon Sep 17 00:00:00 2001
      From: Martin Schwenke <martin@meltin.net>
      Date: Tue, 28 Jan 2014 15:16:49 +1100
      Subject: [PATCH] net: Document promote_secondaries
      
      This option was added a long time ago...
      
        commit 8f937c60
        Author: Harald Welte <laforge@gnumonks.org>
        Date:   Sun May 29 20:23:46 2005 -0700
      
          [IPV4]: Primary and secondary addresses
      Signed-off-by: Martin Schwenke <martin@meltin.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: gre: use icmp_hdr() to get inner ip header · c0c0c50f
      Duan Jiong authored
      When dealing with icmp messages, skb->data points to the
      ip header that triggered the sending of the icmp message.
      
      In gre_cisco_err(), parse_gre_header() is called, and at the end of
      parse_gre_header() iptunnel_pull_header() is called to pull the skb,
      so skb->data no longer points to the inner ip header.
      
      Unfortunately, ipgre_err() still needs the ip addresses in the
      inner ip header to look up the tunnel with ip_tunnel_lookup().
      
      So just use icmp_hdr() to get the inner ip header instead of skb->data.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
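
      The reasoning above is that a moving data cursor is a poor anchor once
      headers have been pulled, while the ICMP header position stays fixed.
      A userspace sketch under that assumption, with a hypothetical packet
      struct standing in for the skb (the 8-byte ICMP header length is the
      only protocol fact used):

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ICMP_HDR_LEN 8u   /* fixed ICMP header size in bytes */

struct sketch_pkt {
    const uint8_t *icmp;   /* start of the ICMP header, never moves */
    const uint8_t *data;   /* cursor, advances when headers are pulled */
};

/* The inner IP header of an ICMP error starts right after the
 * ICMP header, independent of where the data cursor is now. */
static const uint8_t *sketch_inner_ip(const struct sketch_pkt *p)
{
    return p->icmp + SKETCH_ICMP_HDR_LEN;
}

/* Advance the cursor past n bytes, as a header pull would. */
static void sketch_pull(struct sketch_pkt *p, size_t n)
{
    p->data += n;
}
```

      Before any pull the two pointers agree; after a pull only the pointer
      derived from the ICMP header still addresses the inner IP header.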
    • i40e: Add missing braces to i40e_dcb_need_reconfig() · 3d9667a9
      Dave Jones authored
      Indentation mismatch spotted with Coverity.
      Introduced in 4e3b35b0 ("i40e: add DCB and DCBNL support")
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netfront: fix resource leak in netfront · cefe0078
      Annie Li authored
      This patch removes grant transfer releasing code from netfront, and uses
      gnttab_end_foreign_access to end grant access since
      gnttab_end_foreign_access_ref may fail when the grant entry is
      currently used for reading or writing.
      
      * clean up grant transfer code kept from old netfront(2.6.18) which grants
      pages for access/map and transfer. But grant transfer is deprecated in current
      netfront, so remove corresponding release code for transfer.
      
      * fix resource leak, release grant access (through gnttab_end_foreign_access)
      and skb for tx/rx path, use get_page to ensure page is released when grant
      access is completed successfully.
      
      Xen-blkfront/xen-tpmfront/xen-pcifront have a similar issue, but patches
      for them will be created separately.
      
      V6: Correct subject line and commit message.
      
      V5: Remove unnecessary change in xennet_end_access.
      
      V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
      single patch.
      
      V3: Changes as suggested by David Vrabel, ensure pages are not freed until
      grant access is ended.
      
      V2: Improve patch comments.
      Signed-off-by: Annie Li <annie.li@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Stephen Rothwell's avatar
      ce60e0c4
    • hyperv: Add support for physically discontinuous receive buffer · b679ef73
      Haiyang Zhang authored
      This will allow us to use a bigger receive buffer, and prevent allocation
      failures due to fragmented memory.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sky2: initialize napi before registering device · 731073b9
      Stanislaw Gruszka authored
      There is a race condition when netif_napi_add() is called after
      register_netdevice(), as ->open() can be called before napi is initialized
      and trigger BUG_ON() in napi_enable(), as in the messages below:
      
      [    9.699863] sky2: driver version 1.30
      [    9.699960] sky2 0000:02:00.0: Yukon-2 EC Ultra chip revision 2
      [    9.700020] sky2 0000:02:00.0: irq 45 for MSI/MSI-X
      [    9.700498] ------------[ cut here ]------------
      [    9.703391] kernel BUG at include/linux/netdevice.h:501!
      [    9.703391] invalid opcode: 0000 [#1] PREEMPT SMP
      <snip>
      [    9.830018] Call Trace:
      [    9.830018]  [<fa996169>] sky2_open+0x309/0x360 [sky2]
      [    9.830018]  [<c1007210>] ? via_no_dac+0x40/0x40
      [    9.830018]  [<c1007210>] ? via_no_dac+0x40/0x40
      [    9.830018]  [<c135ed4b>] __dev_open+0x9b/0x120
      [    9.830018]  [<c1431cbe>] ? _raw_spin_unlock_bh+0x1e/0x20
      [    9.830018]  [<c135efd9>] __dev_change_flags+0x89/0x150
      [    9.830018]  [<c135f148>] dev_change_flags+0x18/0x50
      [    9.830018]  [<c13bb8e0>] devinet_ioctl+0x5d0/0x6e0
      [    9.830018]  [<c13bcced>] inet_ioctl+0x6d/0xa0
      
      To fix the problem, this patch changes the order of initialization.
      
      Bug report:
      https://bugzilla.kernel.org/show_bug.cgi?id=67151
      
      Reported-and-tested-by: ebrahim.azarisooreh@gmail.com
      Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
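
      The ordering problem above can be modelled with a few flags: once a
      device is registered, open may be called at any time, so everything
      open depends on has to be set up first. The sketch_* names and the
      return codes are hypothetical; in the kernel the failing case is the
      BUG_ON() shown in the log, not an error return.

```c
#include <stdbool.h>

struct sketch_dev {
    bool napi_ready;   /* set by the netif_napi_add() stand-in */
    bool registered;   /* set by the register_netdevice() stand-in */
    bool up;
};

static void sketch_napi_add(struct sketch_dev *d)
{
    d->napi_ready = true;
}

static void sketch_register(struct sketch_dev *d)
{
    d->registered = true;
}

/* Stand-in for ->open(): returns 0 on success, -1 if the device is
 * not registered, -2 if napi was not initialized beforehand. */
static int sketch_open(struct sketch_dev *d)
{
    if (!d->registered)
        return -1;
    if (!d->napi_ready)
        return -2;   /* the kernel would hit BUG_ON() here */
    d->up = true;
    return 0;
}
```

      Calling sketch_napi_add() before sketch_register() closes the window
      in which open could observe an uninitialized napi context.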
    • net: Fix memory leak if TPROXY used with TCP early demux · a452ce34
      Holger Eitzenberger authored
      I see a memory leak when using a transparent HTTP proxy using TPROXY
      together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
      
      unreferenced object 0xffff88008cba4a40 (size 1696):
        comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
        hex dump (first 32 bytes):
          0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
          02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
          [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
          [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
          [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
          [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
          [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
          [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
          [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
          [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
          [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
          [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
          [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
          [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
          [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
          [<ffffffff8127c949>] net_rx_action+0xa7/0x200
          [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
      
      But there are many more, resulting in the machine going OOM after some
      days.
      
      From looking at the TPROXY code, and with help from Florian, I see
      that the memory leak is introduced in tcp_v4_early_demux():
      
        void tcp_v4_early_demux(struct sk_buff *skb)
        {
          /* ... */
      
          iph = ip_hdr(skb);
          th = tcp_hdr(skb);
      
          if (th->doff < sizeof(struct tcphdr) / 4)
              return;
      
          sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                             iph->saddr, th->source,
                             iph->daddr, ntohs(th->dest),
                             skb->skb_iif);
          if (sk) {
              skb->sk = sk;
      
      where the socket is assigned unconditionally to skb->sk, also bumping
      the refcnt on it.  This is problematic, because in our case the skb
      already has a socket assigned by the TPROXY target.  This then results
      in the leak I see.
      
      The very same issue seems to exist with IPv6, but I haven't tested it.
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
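
      The leak described above comes from overwriting a socket pointer that
      already carries a reference. One way to avoid it, sketched here with
      hypothetical types, is to skip the early-demux assignment whenever the
      packet already has a socket attached. This illustrates the idea only
      and is not necessarily where or how the actual patch places the check.

```c
#include <stddef.h>

struct sketch_sock {
    int refcnt;
};

struct sketch_skb {
    struct sketch_sock *sk;   /* may already be set, e.g. by TPROXY */
};

/* Attach the socket found by the lookup, unless the packet already
 * has one. Overwriting skb->sk would drop the only pointer to the
 * earlier socket without releasing its reference. */
static void sketch_early_demux(struct sketch_skb *skb,
                               struct sketch_sock *found)
{
    if (skb->sk != NULL)
        return;
    if (found != NULL) {
        found->refcnt++;
        skb->sk = found;
    }
}
```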
  4. 27 Jan, 2014 10 commits
  5. 26 Jan, 2014 6 commits
  6. 25 Jan, 2014 9 commits
    • Merge branch 'ipmi' (ipmi patches from Corey Minyard) · b2e448ec
      Linus Torvalds authored
      Merge ipmi fixes from Corey Minyard:
       "Just some collected fixes for 3.14.  Nothing huge"
      
      * emailed patches from Corey Minyard <minyard@acm.org>:
        ipmi: Cleanup error return
        ipmi: fix timeout calculation when bmc is disconnected
        ipmi: use USEC_PER_SEC instead of 1000000 for more meaningful
        ipmi: remove deprecated IRQF_DISABLED
    • ipmi: Cleanup error return · d02b3709
      Corey Minyard authored
      Return proper errors for a lot of IPMI failure cases.  Also call
      pci_disable_device when IPMI PCI devices are removed.
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipmi: fix timeout calculation when bmc is disconnected · e21404dc
      Xie XiuQi authored
      When loading the ipmi_si module while the BMC is disconnected, we found
      that the timeout is longer than 5 secs.  It actually takes about 3 mins
      and 20 secs (HZ=250).
      
      The error messages are as below:
        Dec 12 19:08:59 linux kernel: IPMI BT: timeout in RD_WAIT [ ] 1 retries left
        Dec 12 19:08:59 linux kernel: BT: write 4 bytes seq=0x01 03 18 00 01
        [...]
        Dec 12 19:12:19 linux kernel: IPMI BT: timeout in RD_WAIT [ ]
        Dec 12 19:12:19 linux kernel: failed 2 retries, sending error response
        Dec 12 19:12:19 linux kernel: IPMI: BT reset (takes 5 secs)
        Dec 12 19:12:19 linux kernel: IPMI BT: flag reset [ ]
      
      Function wait_for_msg_done() uses schedule_timeout_uninterruptible(1) to
      sleep 1 tick, so we should subtract jiffies_to_usecs(1) instead of 100
      usecs from the timeout.
      Reported-by: Hu Shiyuan <hushiyuan@huawei.com>
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
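
      The numbers in this message can be checked with a little arithmetic.
      With HZ=250 one tick is 4000 usecs, so a loop that sleeps one tick but
      only subtracts 100 usecs from a 5 second budget runs 50000 times and
      takes 200 seconds, which is the reported 3 mins and 20 secs. The
      helper names below are hypothetical, and the 5 second budget is read
      from the message text rather than from the driver source.

```c
#define SKETCH_USEC_PER_SEC 1000000L

/* Microseconds covered by the given number of ticks at rate hz. */
static long sketch_jiffies_to_usecs(long jiffies, long hz)
{
    return jiffies * (SKETCH_USEC_PER_SEC / hz);
}

/* Wall-clock seconds spent by a loop that sleeps one tick per
 * iteration while subtracting step_usecs from budget_usecs. */
static long sketch_elapsed_secs(long budget_usecs, long step_usecs, long hz)
{
    long iterations = budget_usecs / step_usecs;

    return iterations * sketch_jiffies_to_usecs(1, hz) / SKETCH_USEC_PER_SEC;
}
```

      Subtracting the real sleep time per iteration instead (4000 usecs at
      HZ=250) brings the loop back to the intended 5 seconds.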
    • ipmi: use USEC_PER_SEC instead of 1000000 for more meaningful · ccb3368c
      Xie XiuQi authored
      Use USEC_PER_SEC instead of 1000000, which makes the later bugfix
      clearer.
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipmi: remove deprecated IRQF_DISABLED · aa5b2bab
      Michael Opdenacker authored
      This patch removes the use of the IRQF_DISABLED flag.
      
      It has been a no-op since 2.6.35 and it will be removed one day.
      Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'spi-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 2d2e7d19
      Linus Torvalds authored
      Pull spi updates from Mark Brown:
       "A respun version of the merges for the pull request previously sent
        with a few additional fixes.  The last two merges were fixed up by
        hand since the branches have moved on and currently have the prior
        merge in them.
      
        Quite a busy release for the SPI subsystem, mostly in cleanups big and
        small scattered through the stack rather than anything else:
      
         - New driver for the Broadcom BC63xx HSSPI controller
         - Fix duplicate device registration for ACPI
         - Conversion of s3c64xx to DMAEngine (this pulls in platform and DMA
           changes upon which the transition depends)
         - Some small optimisations to reduce the amount of time we hold locks
           in the datapath, eliminate some redundant checks and the size of a
           spi_transfer
         - Lots of fixes, cleanups and general enhancements to drivers,
           especially the rspi and Atmel drivers"
      
      * tag 'spi-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (112 commits)
        spi: core: Fix transfer failure when master->transfer_one returns positive value
        spi: Correct set_cs() documentation
        spi: Clarify transfer_one() w.r.t. spi_finalize_current_transfer()
        spi: Spelling s/finised/finished/
        spi: sc18is602: Convert to use bits_per_word_mask
        spi: Remove duplicate code to set default bits_per_word setting
        spi/pxa2xx: fix compilation warning when !CONFIG_PM_SLEEP
        spi: clps711x: Add MODULE_ALIAS to support module auto-loading
        spi: rspi: Add missing clk_disable() calls in error and cleanup paths
        spi: rspi: Spelling s/transmition/transmission/
        spi: rspi: Add support for specifying CPHA/CPOL
        spi/pxa2xx: initialize DMA channels to -1 to prevent inadvertent match
        spi: rspi: Add more QSPI register documentation
        spi: rspi: Add more RSPI register documentation
        spi: rspi: Remove dependency on DMAE for SHMOBILE
        spi/s3c64xx: Correct indentation
        spi: sh: Use spi_sh_clear_bit() instead of open-coded
        spi: bitbang: Grammar s/make to make/to make/
        spi: sh-hspi: Spelling s/recive/receive/
        spi: core: Improve tx/rx_nbits check comments
        ...
    • Merge tag 'regulator-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 15333539
      Linus Torvalds authored
      Pull regulator updates from Mark Brown:
       "A respin of the merges in the previous pull request with one extra
        fix.
      
        A quiet release for the regulator API, quite a large number of small
        improvements all over but other than the addition of new drivers for
        the AS3722 and MAX14577 there is nothing of substantial non-local
        impact"
      
      * tag 'regulator-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (47 commits)
        regulator: pfuze100-regulator: Improve dev_info() message
        regulator: pfuze100-regulator: Fix some checkpatch complaints
        regulator: twl: Fix checkpatch issue
        regulator: core: Fix checkpatch issue
        regulator: anatop-regulator: Remove unneeded memset()
        regulator: s5m8767: Update LDO index in s5m8767-regulator.txt
        regulator: as3722: set enable time for SD0/1/6
        regulator: as3722: detect SD0 low-voltage mode
        regulator: tps62360: Fix up a pointer-integer size mismatch warning
        regulator: anatop-regulator: Remove unneeded kstrdup()
        regulator: act8865: Fix build error when !OF
        regulator: act8865: register all regulators regardless of how many are used
        regulator: wm831x-dcdc: Remove unneeded 'err' label
        regulator: anatop-regulator: Add MODULE_ALIAS()
        regulator: act8865: fix incorrect devm_kzalloc for act8865
        regulator: act8865: Remove set_suspend_[en|dis]able implementation
        regulator: act8865: Remove unneeded regulator_unregister() calls
        regulator: s2mps11: Clean up redundant code
        regulator: tps65910: Simplify setting enable_mask for regulators
        regulator: act8865: add device tree binding doc
        ...
    • Merge tag 'regmap-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · bb1b6490
      Linus Torvalds authored
      Pull regmap updates from Mark Brown:
       "Nothing terribly exciting with regmap this release, mainly a few small
        extensions to allow more devices to be supported:
      
         - Allow the bulk I/O APIs to be used with no-bus regmaps
         - Support interrupt controllers with zero ack base
         - Warning and spelling fixes"
      
      * tag 'regmap-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: fix a couple of typos
        regmap: Allow regmap_bulk_write() to work for "no-bus" regmaps
        regmap: Allow regmap_bulk_read() to work for "no-bus" regmaps
        regmap: irq: Allow using zero value for ack_base
        regmap: Fix 'ret' would return an uninitialized value
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · 4ba9920e
      Linus Torvalds authored
      Pull networking updates from David Miller:
      
       1) BPF debugger and asm tool by Daniel Borkmann.
      
       2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
      
       3) Correct reciprocal_divide and update users, from Hannes Frederic
          Sowa and Daniel Borkmann.
      
       4) Currently we only have a "set" operation for the hw timestamp socket
          ioctl, add a "get" operation to match.  From Ben Hutchings.
      
       5) Add better trace events for debugging driver datapath problems, also
          from Ben Hutchings.
      
       6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
          have a small send and a previous packet is already in the qdisc or
          device queue, defer until TX completion or we get more data.
      
       7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
      
       8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
          Borkmann.
      
       9) Share IP header compression code between Bluetooth and IEEE802154
          layers, from Jukka Rissanen.
      
      10) Fix ipv6 router reachability probing, from Jiri Benc.
      
      11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
      
      12) Support tunneling in GRO layer, from Jerry Chu.
      
      13) Allow bonding to be configured fully using netlink, from Scott
          Feldman.
      
      14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
          already get the TCI.  From Atzm Watanabe.
      
      15) New "Heavy Hitter" qdisc, from Terry Lam.
      
      16) Significantly improve the IPSEC support in pktgen, from Fan Du.
      
      17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
          Herbert.
      
      18) Add Proportional Integral Enhanced packet scheduler, from Vijay
          Subramanian.
      
      19) Allow openvswitch to use mmap'd netlink, from Thomas Graf.
      
      20) Key TCP metrics blobs also by source address, not just destination
          address.  From Christoph Paasch.
      
      21) Support 10G in generic phylib.  From Andy Fleming.
      
      22) Try to short-circuit GRO flow compares using device provided RX
          hash, if provided.  From Tom Herbert.
      
      The wireless and netfilter folks have been busy little bees too.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
        net/cxgb4: Fix referencing freed adapter
        ipv6: reallocate addrconf router for ipv6 address when lo device up
        fib_frontend: fix possible NULL pointer dereference
        rtnetlink: remove IFLA_BOND_SLAVE definition
        rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
        qlcnic: update version to 5.3.55
        qlcnic: Enhance logic to calculate msix vectors.
        qlcnic: Refactor interrupt coalescing code for all adapters.
        qlcnic: Update poll controller code path
        qlcnic: Interrupt code cleanup
        qlcnic: Enhance Tx timeout debugging.
        qlcnic: Use bool for rx_mac_learn.
        bonding: fix u64 division
        rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
        sfc: Use the correct maximum TX DMA ring size for SFC9100
        Add Shradha Shah as the sfc driver maintainer.
        net/vxlan: Share RX skb de-marking and checksum checks with ovs
        tulip: cleanup by using ARRAY_SIZE()
        ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
        net/cxgb4: Don't retrieve stats during recovery
        ...