1. 14 Dec, 2015 26 commits
    • Daniel Borkmann's avatar
      net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds · 306c5357
      Daniel Borkmann authored
      [ Upstream commit 6900317f ]
      
      David and HacKurx reported a following/similar size overflow triggered
      in a grsecurity kernel, thanks to PaX's gcc size overflow plugin:
      
      (Already fixed in later grsecurity versions by Brad and PaX Team.)
      
      [ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
                     cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
      [ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
      [ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
      [ 1002.296153]  ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
      [ 1002.296162]  ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
      [ 1002.296169]  ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
      [ 1002.296176] Call Trace:
      [ 1002.296190]  [<ffffffff818129ba>] dump_stack+0x45/0x57
      [ 1002.296200]  [<ffffffff8121f838>] report_size_overflow+0x38/0x60
      [ 1002.296209]  [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
      [ 1002.296220]  [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
      [ 1002.296228]  [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
      [ 1002.296236]  [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
      [ 1002.296243]  [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
      [ 1002.296248]  [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
      [ 1002.296257]  [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
      [ 1002.296263]  [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
      [ 1002.296271]  [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85
      
      Further investigation showed that this can happen when an *odd* number of
      fds are being passed over AF_UNIX sockets.
      
      In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
      where i is the number of successfully passed fds, differ by 4 bytes due
      to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
      on 64 bit. The padding is used to align subsequent cmsg headers in the
      control buffer.
      
      When the control buffer passed in from the receiver side *lacks* these 4
      bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
      overflow in scm_detach_fds():
      
        int cmlen = CMSG_LEN(i * sizeof(int));  <--- cmlen w/o tail-padding
        err = put_user(SOL_SOCKET, &cm->cmsg_level);
        if (!err)
          err = put_user(SCM_RIGHTS, &cm->cmsg_type);
        if (!err)
          err = put_user(cmlen, &cm->cmsg_len);
        if (!err) {
          cmlen = CMSG_SPACE(i * sizeof(int));  <--- cmlen w/ 4 byte extra tail-padding
          msg->msg_control += cmlen;
          msg->msg_controllen -= cmlen;         <--- iff no tail-padding space here ...
        }                                            ... wrap-around
      
      F.e. it will wrap to a length of 18446744073709551612 bytes in case the
      receiver passed in msg->msg_controllen of 20 bytes, and the sender
      properly transferred 1 fd to the receiver, so that its CMSG_LEN results
      in 20 bytes and CMSG_SPACE in 24 bytes.
      
      In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
      issue in my tests as alignment seems always on 4 byte boundary. Same
      should be in case of native 32 bit, where we end up with 4 byte boundaries
      as well.
      
      In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
      a single fd would mean that on successful return, msg->msg_controllen is
      being set by the kernel to 24 bytes instead, thus more than the input
      buffer advertised. It could f.e. become an issue if such application later
      on zeroes or copies the control buffer based on the returned msg->msg_controllen
      elsewhere.
      
      Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).
      
      Going over the code, it seems like msg->msg_controllen is not being read
      after scm_detach_fds() in scm_recv() anymore by the kernel, good!
      
      Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
      and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
      and ___sys_recvmsg() places the updated length, that is, new msg_control -
      old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
      in the example).
      
      Long time ago, Wei Yongjun fixed something related in commit 1ac70e7a
      ("[NET]: Fix function put_cmsg() which may cause usr application memory
      overflow").
      
      RFC3542, section 20.2. says:
      
        The fields shown as "XX" are possible padding, between the cmsghdr
        structure and the data, and between the data and the next cmsghdr
        structure, if required by the implementation. While sending an
        application may or may not include padding at the end of last
        ancillary data in msg_controllen and implementations must accept both
        as valid. On receiving a portable application must provide space for
        padding at the end of the last ancillary data as implementations may
        copy out the padding at the end of the control message buffer and
        include it in the received msg_controllen. When recvmsg() is called
        if msg_controllen is too small for all the ancillary data items
        including any trailing padding after the last item an implementation
        may set MSG_CTRUNC.
      
      Since we didn't place MSG_CTRUNC for already quite a long time, just do
      the same as in 1ac70e7a to avoid an overflow.
      
      Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
      error in SCM_RIGHTS code sample"). Some people must have copied this (?),
      thus it got triggered in the wild (reported several times during boot by
      David and HacKurx).
      
      No Fixes tag this time as pre 2002 (that is, pre history tree).
      Reported-by: default avatarDavid Sterba <dave@jikos.cz>
      Reported-by: default avatarHacKurx <hackurx@gmail.com>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      306c5357
    • Eric Dumazet's avatar
      tcp: initialize tp->copied_seq in case of cross SYN connection · 7d76c6f7
      Eric Dumazet authored
      [ Upstream commit 142a2e7e ]
      
      Dmitry provided a syzkaller (http://github.com/google/syzkaller)
      generated program that triggers the WARNING at
      net/ipv4/tcp.c:1729 in tcp_recvmsg() :
      
      WARN_ON(tp->copied_seq != tp->rcv_nxt &&
              !(flags & (MSG_PEEK | MSG_TRUNC)));
      
      His program is specifically attempting a Cross SYN TCP exchange,
      that we support (for the pleasure of hackers ?), but it looks we
      lack proper tcp->copied_seq initialization.
      
      Thanks again Dmitry for your report and testings.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7d76c6f7
    • Eric Dumazet's avatar
      tcp: fix potential huge kmalloc() calls in TCP_REPAIR · ea5ef9fc
      Eric Dumazet authored
      [ Upstream commit 5d4c9bfb ]
      
      tcp_send_rcvq() is used for re-injecting data into tcp receive queue.
      
      Problems :
      
      - No check against size is performed, allowed user to fool kernel in
        attempting very large memory allocations, eventually triggering
        OOM when memory is fragmented.
      
      - In case of fault during the copy we do not return correct errno.
      
      Lets use alloc_skb_with_frags() to cook optimal skbs.
      
      Fixes: 292e8d8c ("tcp: Move rcvq sending to tcp_input.c")
      Fixes: c0e88ff0 ("tcp: Repair socket queues")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ea5ef9fc
    • Eric Dumazet's avatar
      tcp: md5: fix lockdep annotation · 7af83230
      Eric Dumazet authored
      [ Upstream commit 1b8e6a01 ]
      
      When a passive TCP is created, we eventually call tcp_md5_do_add()
      with sk pointing to the child. It is not owner by the user yet (we
      will add this socket into listener accept queue a bit later anyway)
      
      But we do own the spinlock, so amend the lockdep annotation to avoid
      following splat :
      
      [ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
      [ 8451.090932]
      [ 8451.090932] other info that might help us debug this:
      [ 8451.090932]
      [ 8451.090934]
      [ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
      [ 8451.090936] 3 locks held by socket_sockopt_/214795:
      [ 8451.090936]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90
      [ 8451.090947]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0
      [ 8451.090952]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500
      [ 8451.090958]
      [ 8451.090958] stack backtrace:
      [ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_
      
      [ 8451.091215] Call Trace:
      [ 8451.091216]  <IRQ>  [<ffffffff856fb29c>] dump_stack+0x55/0x76
      [ 8451.091229]  [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110
      [ 8451.091235]  [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0
      [ 8451.091239]  [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0
      [ 8451.091242]  [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190
      [ 8451.091246]  [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500
      [ 8451.091249]  [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190
      [ 8451.091253]  [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0
      [ 8451.091256]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
      [ 8451.091260]  [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0
      [ 8451.091263]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
      [ 8451.091267]  [<ffffffff85618d38>] ip_local_deliver+0x48/0x80
      [ 8451.091270]  [<ffffffff85618510>] ip_rcv_finish+0x160/0x700
      [ 8451.091273]  [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0
      [ 8451.091277]  [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90
      
      Fixes: a8afca03 ("tcp: md5: protects md5sig_info with RCU")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7af83230
    • Bjørn Mork's avatar
      net: qmi_wwan: add XS Stick W100-2 from 4G Systems · 922e1fd3
      Bjørn Mork authored
      [ Upstream commit 68242a5a ]
      
      Thomas reports
      "
      4gsystems sells two total different LTE-surfsticks under the same name.
      ..
      The newer version of XS Stick W100 is from "omega"
      ..
      Under windows the driver switches to the same ID, and uses MI03\6 for
      network and MI01\6 for modem.
      ..
      echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id
      echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id
      
      T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1c9e ProdID=9b01 Rev=02.32
      S:  Manufacturer=USB Modem
      S:  Product=USB Modem
      S:  SerialNumber=
      C:  #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
      
      Now all important things are there:
      
      wwp0s29f7u2i3 (net), ttyUSB2 (at), cdc-wdm0 (qmi), ttyUSB1 (at)
      
      There is also ttyUSB0, but it is not usable, at least not for at.
      
      The device works well with qmi and ModemManager-NetworkManager.
      "
      Reported-by: default avatarThomas Schäfer <tschaefer@t-online.de>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      922e1fd3
    • Neil Horman's avatar
      snmp: Remove duplicate OUTMCAST stat increment · a3fd7012
      Neil Horman authored
      [ Upstream commit 41033f02 ]
      
      the OUTMCAST stat is double incremented, getting bumped once in the mcast code
      itself, and again in the common ip output path.  Remove the mcast bump, as its
      not needed
      
      Validated by the reporter, with good results
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarClaus Jensen <claus.jensen@microsemi.com>
      CC: Claus Jensen <claus.jensen@microsemi.com>
      CC: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a3fd7012
    • Jason A. Donenfeld's avatar
      ip_tunnel: disable preemption when updating per-cpu tstats · 6cc44e69
      Jason A. Donenfeld authored
      [ Upstream commit b4fe85f9 ]
      
      Drivers like vxlan use the recently introduced
      udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
      makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
      packet, updates the struct stats using the usual
      u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
      udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
      tstats, so drivers like vxlan, immediately after, call
      iptunnel_xmit_stats, which does the same thing - calls
      u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).
      
      While vxlan is probably fine (I don't know?), calling a similar function
      from, say, an unbound workqueue, on a fully preemptable kernel causes
      real issues:
      
      [  188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
      [  188.435579] caller is debug_smp_processor_id+0x17/0x20
      [  188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
      [  188.435607] Call Trace:
      [  188.435611]  [<ffffffff8234e936>] dump_stack+0x4f/0x7b
      [  188.435615]  [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0
      [  188.435619]  [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20
      
      The solution would be to protect the whole
      this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
      disabling preemption and then reenabling it.
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6cc44e69
    • lucien's avatar
      sctp: translate host order to network order when setting a hmacid · 84564ff4
      lucien authored
      [ Upstream commit ed5a377d ]
      
      now sctp auth cannot work well when setting a hmacid manually, which
      is caused by that we didn't use the network order for hmacid, so fix
      it by adding the transformation in sctp_auth_ep_set_hmacs.
      
      even we set hmacid with the network order in userspace, it still
      can't work, because of this condition in sctp_auth_ep_set_hmacs():
      
      		if (id > SCTP_AUTH_HMAC_ID_MAX)
      			return -EOPNOTSUPP;
      
      so this wasn't working before and thus it won't break compatibility.
      
      Fixes: 65b07e5d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      84564ff4
    • Daniel Borkmann's avatar
      packet: fix tpacket_snd max frame len · 29ac1e1e
      Daniel Borkmann authored
      [ Upstream commit 5cfb4c8d ]
      
      Since it's introduction in commit 69e3c75f ("net: TX_RING and
      packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
      side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
      reserve check should have reserve as 0, but currently, this is
      unconditionally set (in it's original form as dev->hard_header_len).
      
      I think this is not correct since tpacket_fill_skb() would then
      take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
      the extra VLAN_HLEN could be possible in both cases. Presumably, the
      reserve code was copied from packet_snd(), but later on missed the
      check. Make it similar as we have it in packet_snd().
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      29ac1e1e
    • Daniel Borkmann's avatar
      packet: infer protocol from ethernet header if unset · e0bc2c24
      Daniel Borkmann authored
      [ Upstream commit c72219b7 ]
      
      In case no struct sockaddr_ll has been passed to packet
      socket's sendmsg() when doing a TX_RING flush run, then
      skb->protocol is set to po->num instead, which is the protocol
      passed via socket(2)/bind(2).
      
      Applications only xmitting can go the path of allocating the
      socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
      TX_RING with sll_protocol of 0. That way, register_prot_hook()
      is neither called on creation nor on bind time, which saves
      cycles when there's no interest in capturing anyway.
      
      That leaves us however with po->num 0 instead and therefore
      the TX_RING flush run sets skb->protocol to 0 as well. Eric
      reported that this leads to problems when using tools like
      trafgen over bonding device. I.e. the bonding's hash function
      could invoke the kernel's flow dissector, which depends on
      skb->protocol being properly set. In the current situation, all
      the traffic is then directed to a single slave.
      
      Fix it up by inferring skb->protocol from the Ethernet header
      when not set and we have ARPHRD_ETHER device type. This is only
      done in case of SOCK_RAW and where we have a dev->hard_header_len
      length. In case of ARPHRD_ETHER devices, this is guaranteed to
      cover ETH_HLEN, and therefore being accessed on the skb after
      the skb_store_bits().
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e0bc2c24
    • Daniel Borkmann's avatar
      packet: only allow extra vlan len on ethernet devices · 9293664a
      Daniel Borkmann authored
      [ Upstream commit 3c70c132 ]
      
      Packet sockets can be used by various net devices and are not
      really restricted to ARPHRD_ETHER device types. However, when
      currently checking for the extra 4 bytes that can be transmitted
      in VLAN case, our assumption is that we generally probe on
      ARPHRD_ETHER devices. Therefore, before looking into Ethernet
      header, check the device type first.
      
      This also fixes the issue where non-ARPHRD_ETHER devices could
      have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
      the check would test unfilled linear part of the skb (instead
      of non-linear).
      
      Fixes: 57f89bfa ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
      Fixes: 52f1454f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9293664a
    • Alexander Drozdov's avatar
      packet: tpacket_snd(): fix signed/unsigned comparison · e4f3c003
      Alexander Drozdov authored
      [ Upstream commit dbd46ab4 ]
      
      tpacket_fill_skb() can return a negative value (-errno) which
      is stored in tp_len variable. In that case the following
      condition will be (but shouldn't be) true:
      
      tp_len > dev->mtu + dev->hard_header_len
      
      as dev->mtu and dev->hard_header_len are both unsigned.
      
      That may lead to just returning an incorrect EMSGSIZE errno
      to the user.
      
      Fixes: 52f1454f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
      Signed-off-by: default avatarAlexander Drozdov <al.drozdov@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e4f3c003
    • Daniel Borkmann's avatar
      packet: always probe for transport header · 302c6356
      Daniel Borkmann authored
      [ Upstream commit 8fd6c80d ]
      
      We concluded that the skb_probe_transport_header() should better be
      called unconditionally. Avoiding the call into the flow dissector has
      also not really much to do with the direct xmit mode.
      
      While it seems that only virtio_net code makes use of GSO from non
      RX/TX ring packet socket paths, we should probe for a transport header
      nevertheless before they hit devices.
      
      Reference: http://thread.gmane.org/gmane.linux.network/386173/Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      302c6356
    • Daniel Borkmann's avatar
      packet: do skb_probe_transport_header when we actually have data · 22fb967d
      Daniel Borkmann authored
      [ Upstream commit efdfa2f7 ]
      
      In tpacket_fill_skb() commit c1aad275 ("packet: set transport
      header before doing xmit") and later on 40893fd0 ("net: switch
      to use skb_probe_transport_header()") was probing for a transport
      header on the skb from a ring buffer slot, but at a time, where
      the skb has _not even_ been filled with data yet. So that call into
      the flow dissector is pretty useless. Lets do it after we've set
      up the skb frags.
      
      Fixes: c1aad275 ("packet: set transport header before doing xmit")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      22fb967d
    • Kamal Mostafa's avatar
      tools/net: Use include/uapi with __EXPORTED_HEADERS__ · e5d4fd3e
      Kamal Mostafa authored
      [ Upstream commit d7475de5 ]
      
      Use the local uapi headers to keep in sync with "recently" added #define's
      (e.g. SKF_AD_VLAN_TPID).  Refactored CFLAGS, and bpf_asm doesn't need -I.
      
      Fixes: 3f356385 ("filter: bpf_asm: add minimal bpf asm tool")
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e5d4fd3e
    • Sasha Levin's avatar
      Revert "net: Fix skb_set_peeked use-after-free bug" · 3653158c
      Sasha Levin authored
      This reverts commit d9a11334.
      3653158c
    • Marcelo Leitner's avatar
      ipv6: addrconf: validate new MTU before applying it · a9ff3cb5
      Marcelo Leitner authored
      [ Upstream commit 77751427 ]
      
      Currently we don't check if the new MTU is valid or not and this allows
      one to configure a smaller than minimum allowed by RFCs or even bigger
      than interface own MTU, which is a problem as it may lead to packet
      drops.
      
      If you have a daemon like NetworkManager running, this may be exploited
      by remote attackers by forging RA packets with an invalid MTU, possibly
      leading to a DoS. (NetworkManager currently only validates for values
      too small, but not for too big ones.)
      
      The fix is just to make sure the new value is valid. That is, between
      IPV6_MIN_MTU and interface's MTU.
      
      Note that similar check is already performed at
      ndisc_router_discovery(), for when kernel itself parses the RA.
      Signed-off-by: default avatarMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a9ff3cb5
    • Nadav Amit's avatar
      KVM: x86: Use new is_noncanonical_address in _linearize · 5b4df464
      Nadav Amit authored
      [ Upstream commit 4be4de7e ]
      
      Replace the current canonical address check with the new function which is
      identical.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5b4df464
    • Eric Northup's avatar
      KVM: x86: work around infinite loop in microcode when #AC is delivered · 79e62de2
      Eric Northup authored
      [ Upstream commit 54a20552 ]
      
      It was found that a guest can DoS a host by triggering an infinite
      stream of "alignment check" (#AC) exceptions.  This causes the
      microcode to enter an infinite loop where the core never receives
      another interrupt.  The host kernel panics pretty quickly due to the
      effects (CVE-2015-5307).
      Signed-off-by: default avatarEric Northup <digitaleric@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      79e62de2
    • Florian Fainelli's avatar
      ARM: orion: Fix DSA platform device after mvmdio conversion · a62b833d
      Florian Fainelli authored
      [ Upstream commit d836ace6 ]
      
      DSA expects the host_dev pointer to be the device structure associated
      with the MDIO bus controller driver. First commit breaking that was
      c3a07134 ("mv643xx_eth: convert to use the Marvell Orion MDIO
      driver"), and then, it got completely under the radar for a while.
      Reported-by: default avatarFrans van de Wiel <fvdw@fvdw.eu>
      Fixes: c3a07134 ("mv643xx_eth: convert to use the Marvell Orion MDIO driver")
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a62b833d
    • Sasha Levin's avatar
      RDS: verify the underlying transport exists before creating a connection · cf6580ef
      Sasha Levin authored
      [ Upstream commit 74e98eb0 ]
      
      There was no verification that an underlying transport exists when creating
      a connection, this would cause dereferencing a NULL ptr.
      
      It might happen on sockets that weren't properly bound before attempting to
      send a message, which will cause a NULL ptr deref:
      
      [135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [135546.051270] Modules linked in:
      [135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
      [135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
      [135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
      [135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
      [135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
      [135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
      [135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
      [135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
      [135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
      [135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
      [135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
      [135546.064723] Stack:
      [135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
      [135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
      [135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
      [135546.068629] Call Trace:
      [135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
      [135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
      [135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
      [135546.071981] rds_sendmsg (net/rds/send.c:1058)
      [135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
      [135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
      [135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
      [135546.076349] ? __might_fault (mm/memory.c:3795)
      [135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
      [135546.078856] SYSC_sendto (net/socket.c:1657)
      [135546.079596] ? SYSC_connect (net/socket.c:1628)
      [135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
      [135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
      [135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
      [135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
      [135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cf6580ef
    • Francesco Ruggeri's avatar
      packet: race condition in packet_bind · f236cda1
      Francesco Ruggeri authored
      [ Upstream commit 30f7ea1c ]
      
      There is a race conditions between packet_notifier and packet_bind{_spkt}.
      
      It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
      time packet_bind{_spkt} takes a reference on the new netdevice and the
      time packet_do_bind sets po->ifindex.
      In this case the notification can be missed.
      If this happens during a dev_change_net_namespace this can result in the
      netdevice to be moved to the new namespace while the packet_sock in the
      old namespace still holds a reference on it. When the netdevice is later
      deleted in the new namespace the deletion hangs since the packet_sock
      is not found in the new namespace' &net->packet.sklist.
      It can be reproduced with the script below.
      
      This patch makes packet_do_bind check again for the presence of the
      netdevice in the packet_sock's namespace after the synchronize_net
      in unregister_prot_hook.
      More in general it also uses the rcu lock for the duration of the bind
      to stop dev_change_net_namespace/rollback_registered_many from
      going past the synchronize_net following unlist_netdevice, so that
      no NETDEV_UNREGISTER notifications can happen on the new netdevice
      while the bind is executing. In order to do this some code from
      packet_bind{_spkt} is consolidated into packet_do_dev.
      
      import socket, os, time, sys
      proto=7
      realDev='em1'
      vlanId=400
      if len(sys.argv) > 1:
         vlanId=int(sys.argv[1])
      dev='vlan%d' % vlanId
      
      os.system('taskset -p 0x10 %d' % os.getpid())
      
      s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
      os.system('ip link add link %s name %s type vlan id %d' %
                (realDev, dev, vlanId))
      os.system('ip netns add dummy')
      
      pid=os.fork()
      
      if pid == 0:
         # dev should be moved while packet_do_bind is in synchronize net
         os.system('taskset -p 0x20000 %d' % os.getpid())
         os.system('ip link set %s netns dummy' % dev)
         os.system('ip netns exec dummy ip link del %s' % dev)
         s.close()
         sys.exit(0)
      
      time.sleep(.004)
      try:
         s.bind(('%s' % dev, proto+1))
      except:
         print 'Could not bind socket'
         s.close()
         os.system('ip netns del dummy')
         sys.exit(0)
      
      os.waitpid(pid, 0)
      s.close()
      os.system('ip netns del dummy')
      sys.exit(0)
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f236cda1
    • WANG Cong's avatar
      ipv4: disable BH when changing ip local port range · 59fed94a
      WANG Cong authored
      [ Upstream commit 4ee3bd4a ]
      
      This fixes the following lockdep warning:
      
       [ INFO: inconsistent lock state ]
       4.3.0-rc7+ #1197 Not tainted
       ---------------------------------
       inconsistent {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} usage.
       sysctl/1019 [HC0[0]:SC0[0]:HE1:SE1] takes:
        (&(&net->ipv4.ip_local_ports.lock)->seqcount){+.+-..}, at: [<ffffffff81921de7>] ipv4_local_port_range+0xb4/0x12a
       {IN-SOFTIRQ-R} state was registered at:
         [<ffffffff810bd682>] __lock_acquire+0x2f6/0xdf0
         [<ffffffff810be6d5>] lock_acquire+0x11c/0x1a4
         [<ffffffff818e599c>] inet_get_local_port_range+0x4e/0xae
         [<ffffffff8166e8e3>] udp_flow_src_port.constprop.40+0x23/0x116
         [<ffffffff81671cb9>] vxlan_xmit_one+0x219/0xa6a
         [<ffffffff81672f75>] vxlan_xmit+0xa6b/0xaa5
         [<ffffffff817f2deb>] dev_hard_start_xmit+0x2ae/0x465
         [<ffffffff817f35ed>] __dev_queue_xmit+0x531/0x633
         [<ffffffff817f3702>] dev_queue_xmit_sk+0x13/0x15
         [<ffffffff818004a5>] neigh_resolve_output+0x12f/0x14d
         [<ffffffff81959cfa>] ip6_finish_output2+0x344/0x39f
         [<ffffffff8195bf58>] ip6_finish_output+0x88/0x8e
         [<ffffffff8195bfef>] ip6_output+0x91/0xe5
         [<ffffffff819792ae>] dst_output_sk+0x47/0x4c
         [<ffffffff81979392>] NF_HOOK_THRESH.constprop.30+0x38/0x82
         [<ffffffff8197981e>] mld_sendpack+0x189/0x266
         [<ffffffff8197b28b>] mld_ifc_timer_expire+0x1ef/0x223
         [<ffffffff810de581>] call_timer_fn+0xfb/0x28c
         [<ffffffff810ded1e>] run_timer_softirq+0x1c7/0x1f1
      
      Fixes: b8f1a556 ("udp: Add function to make source port for UDP tunnels")
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      59fed94a
    • Sabrina Dubroca's avatar
      ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev · 8d135fed
      Sabrina Dubroca authored
      [ Upstream commit 2a189f9e ]
      
      In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
      the dev_snmp6 entry that we have already registered for this device.
      Call snmp6_unregister_dev in this case.
      
      Fixes: a317a2f1 ("ipv6: fail early when creating netdev named all or default")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8d135fed
    • Martin Habets's avatar
      sfc: push partner queue for skb->xmit_more · 725242ca
      Martin Habets authored
      [ Upstream commit b2663a4f ]
      
      When the IP stack passes SKBs the sfc driver puts them in 2 different TX
      queues (called partners), one for checksummed and one for not checksummed.
      If the SKB has xmit_more set the driver will delay pushing the work to the
      NIC.
      
      When later it does decide to push the buffers this patch ensures it also
      pushes the partner queue, if that also has any delayed work. Before this
      fix the work in the partner queue would be left for a long time and cause
      a netdev watchdog.
      
      Fixes: 70b33fb0 ("sfc: add support for skb->xmit_more")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarMartin Habets <mhabets@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      725242ca
    • Eric Dumazet's avatar
      sit: fix sit0 percpu double allocations · 624fe175
      Eric Dumazet authored
      [ Upstream commit 4ece9009 ]
      
      sit0 device allocates its percpu storage twice :
      - One time in ipip6_tunnel_init()
      - One time in ipip6_fb_tunnel_init()
      
      Thus we leak 48 bytes per possible cpu per network namespace dismantle.
      
      ipip6_fb_tunnel_init() can be much simpler and does not
      return an error, and should be called after register_netdev()
      
      Note that ipip6_tunnel_clone_6rd() also needs to be called
      after register_netdev() (calling ipip6_tunnel_init())
      
      Fixes: ebe084aa ("sit: Use ipip6_tunnel_init as the ndo_init function.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      624fe175
  2. 03 Dec, 2015 14 commits