1. 23 Aug, 2016 1 commit
    • net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset · c0451fe1
      Shmulik Ladkani authored
      In b8247f09,
      
         "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"
      
      gso skbs arriving from an ingress interface that go through UDP
      tunneling are allowed to be fragmented if the resulting encapsulated
      segments exceed the dst mtu of the egress interface.
      
      This aligned the behavior of gso skbs with that of non-gso skbs going
      through the udp encapsulation path.
      
      However, the non-gso vs gso anomaly is also present in the following
      cases of a GRE tunnel:
       - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
         (e.g. OvS vport-gre with df_default=false)
       - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set
      
      In both of the above cases, the non-gso skbs get fragmented, whereas the
      gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
      as they don't go through the segment+fragment code path.
      
      Fix: Set IPSKB_FRAG_SEGS if the tunnel-specified IP_DF bit is NOT set.
      
      Tunnels that do set IP_DF will not have their segments fragmented.
      This preserves the behavior of ip_gre in (the default) pmtudisc mode.
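
The decision described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct sketch_pkt`, the `SKETCH_*` constants and both functions are hypothetical stand-ins (for the skb, `IP_DF`, `IPSKB_FRAG_SEGS` and the gso output path), and the DF bit is kept in host byte order for simplicity.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_IP_DF     0x4000u /* "don't fragment" bit, host order here */
#define SKETCH_FRAG_SEGS 0x1u    /* stands in for IPSKB_FRAG_SEGS */

struct sketch_pkt {
    uint16_t frag_off;   /* outer IP header flags as set by the tunnel */
    unsigned int seglen; /* stands in for skb_gso_network_seglen() */
    unsigned int flags;  /* per-packet control flags */
};

enum sketch_verdict {
    SKETCH_SEND,                 /* segments fit, nothing special to do */
    SKETCH_SEGMENT_AND_FRAGMENT, /* segment, then fragment each segment */
    SKETCH_DROP                  /* oversized and fragmentation not allowed */
};

/* Tunnel side: allow fragmenting segments only if the tunnel left DF clear. */
static void sketch_mark_frag_segs(struct sketch_pkt *p)
{
    if (!(p->frag_off & SKETCH_IP_DF))
        p->flags |= SKETCH_FRAG_SEGS;
}

/* Output side: decide what happens to a gso packet for a given dst mtu. */
static enum sketch_verdict sketch_gso_output(const struct sketch_pkt *p,
                                             unsigned int mtu)
{
    if (p->seglen <= mtu)
        return SKETCH_SEND;
    if (p->flags & SKETCH_FRAG_SEGS)
        return SKETCH_SEGMENT_AND_FRAGMENT;
    return SKETCH_DROP;
}
```

In this sketch a packet whose tunnel sets DF keeps the pmtudisc behavior (oversized segments are dropped), while a packet with DF clear takes the segment+fragment path.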
      
      Fixes: b8247f09 ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
      Reported-by: wenxu <wenxu@ucloud.cn>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Tested-by: wenxu <wenxu@ucloud.cn>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Aug, 2016 9 commits
  3. 21 Aug, 2016 2 commits
  4. 20 Aug, 2016 5 commits
  5. 19 Aug, 2016 21 commits
  6. 18 Aug, 2016 2 commits
    • netfilter: cttimeout: fix use after free error when delete netns · b75911b6
      Liping Zhang authored
      In general, when we want to delete a netns, cttimeout_net_exit will
      be called before ipt_unregister_table, i.e. before ctnl_timeout_put.
      
      But after calling kfree_rcu in cttimeout_net_exit, we still decrement
      the timeout object's refcnt in ctnl_timeout_put. This is incorrect
      and causes a use-after-free error.
      
      It is easy to reproduce this problem:
        # while : ; do
        ip netns add xxx
        ip netns exec xxx nfct add timeout testx inet icmp timeout 200
        ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
        ip netns del xxx
        done
      
        =======================================================================
        BUG kmalloc-96 (Tainted: G    B       E  ): Poison overwritten
        -----------------------------------------------------------------------
        INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
        0x6b
        INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
        age=104 cpu=0 pid=3330
        ___slab_alloc+0x4da/0x540
        __slab_alloc+0x20/0x40
        __kmalloc+0x1c8/0x240
        cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
        nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
        [ ... ]
      
      So call kfree_rcu to free the timeout object only when the refcnt has
      dropped to 0. And, as nfnetlink_acct does, use atomic_cmpxchg to
      avoid a race between ctnl_timeout_try_del and ctnl_timeout_put.
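
The "free only when the refcnt reaches 0" rule can be illustrated with C11 atomics in userspace. This is a minimal sketch, not the nfnetlink_cttimeout code: `struct sketch_timeout` and the `sketch_*` helpers are hypothetical, `free()` stands in for kfree_rcu, and the counter exists only so the effect of each put is observable.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the timeout object; not the kernel struct. */
struct sketch_timeout {
    atomic_int refcnt;
};

/* Demo-only counter so the effect of each put can be checked. */
static int sketch_free_count;

static struct sketch_timeout *sketch_timeout_new(void)
{
    struct sketch_timeout *t = malloc(sizeof(*t));

    if (t)
        atomic_init(&t->refcnt, 1); /* reference held by the per-netns list */
    return t;
}

/* Taken when a rule (e.g. a CT target) starts using the timeout. */
static void sketch_timeout_get(struct sketch_timeout *t)
{
    atomic_fetch_add(&t->refcnt, 1);
}

/* Drop one reference; free only if this was the last one. */
static void sketch_timeout_put(struct sketch_timeout *t)
{
    /* atomic_fetch_sub returns the previous value: 1 means we hit zero. */
    if (atomic_fetch_sub(&t->refcnt, 1) == 1) {
        free(t); /* the kernel code would use kfree_rcu here */
        sketch_free_count++;
    }
}
```

With this shape, the netns exit path drops the list's reference through the same put helper instead of freeing unconditionally, so a later put from the rule teardown touches a still-valid object and performs the actual free.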
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink_acct: fix race between nfacct del and xt_nfacct destroy · 12be15dd
      Liping Zhang authored
      Suppose that we first run the following commands:
        # nfacct add test
        # iptables -A INPUT -m nfacct --nfacct-name test
      
      Now the "test" acct's refcnt is 2, but later, when we try to delete the
      "test" nfacct and the related iptables rule at the same time, a race may
      happen:
            CPU0                                    CPU1
        nfnl_acct_try_del                      nfnl_acct_put
        atomic_dec_and_test //ref=1,testfail          -
             -                                 atomic_dec_and_test //ref=0,testok
             -                                 kfree_rcu
        atomic_inc //ref=1                            -
      
      So after the rcu grace period, nf_acct will be freed while it is still
      linked in the nfnl_acct_list, and a later access to it will cause an oops.
      
      Convert the atomic_dec_and_test and atomic_inc combination into a single
      atomic operation, atomic_cmpxchg, to fix this problem.
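
The single-step compare-and-exchange can be sketched with C11 atomics in userspace. This is an illustration under assumptions, not the nfnetlink_acct code: `struct sketch_acct`, its `linked` flag (standing in for list membership) and the `sketch_*` helpers are hypothetical, and `atomic_compare_exchange_strong` plays the role of the kernel's atomic_cmpxchg.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an accounting object; not the kernel struct. */
struct sketch_acct {
    atomic_int refcnt;
    bool linked; /* stands in for membership in nfnl_acct_list */
};

static void sketch_acct_init(struct sketch_acct *a, int refs)
{
    atomic_init(&a->refcnt, refs);
    a->linked = true;
}

/*
 * Try to delete the object. The refcnt moves from 1 to 0 in one atomic
 * step, so there is no window in which a concurrent put can see an
 * intermediate value and free an object that is still on the list.
 * Returns false (busy) if anyone besides the list holds a reference.
 */
static bool sketch_acct_try_del(struct sketch_acct *a)
{
    int expected = 1;

    if (atomic_compare_exchange_strong(&a->refcnt, &expected, 0)) {
        a->linked = false; /* the kernel code would unlink and kfree_rcu */
        return true;
    }
    return false; /* refcnt untouched; caller would report busy */
}
```

Unlike a decrement followed by an increment on failure, a failed compare-and-exchange leaves the refcnt unchanged, so the interleaving shown in the CPU0/CPU1 table cannot drive the count to zero behind the deleter's back.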
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>