1. 26 Aug, 2014 1 commit
  2. 19 Aug, 2014 39 commits
    • Kirill Tkhai's avatar
      sparc64: Make itc_sync_lock raw · 47b75261
      Kirill Tkhai authored
      [ Upstream commit 49b6c01f ]
      
      One more place where we must not be able
      to be preempted or to be interrupted in RT.
      
      Always actually disable interrupts during
      synchronization cycle.
      Signed-off-by: default avatarKirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      47b75261
    • David S. Miller's avatar
      sparc64: Fix argument sign extension for compat_sys_futex(). · b87c3db2
      David S. Miller authored
      [ Upstream commit aa3449ee ]
      
      Only the second argument, 'op', is signed.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b87c3db2
    • Eric Dumazet's avatar
      sctp: fix possible seqlock seadlock in sctp_packet_transmit() · 76ae4fcb
      Eric Dumazet authored
      [ Upstream commit 757efd32 ]
      
      Dave reported following splat, caused by improper use of
      IP_INC_STATS_BH() in process context.
      
      BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551
      caller is __this_cpu_preempt_check+0x13/0x20
      CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33
       ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207
       0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40
       ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3
      Call Trace:
       [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a
       [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100
       [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20
       [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp]
       [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp]
       [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0
       [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp]
       [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp]
       [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp]
       [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100
       [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp]
       [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp]
       [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp]
       [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160
       [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30
       [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220
       [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220
       [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0
       [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0
       [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0
       [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0
       [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330
       [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10
       [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2
      
      This is a followup of commits f1d8cba6 ("inet: fix possible
      seqlock deadlocks") and 7f88c6b2 ("ipv6: fix possible seqlock
      deadlock in ip6_finish_output2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      76ae4fcb
    • Sasha Levin's avatar
      iovec: make sure the caller actually wants anything in memcpy_fromiovecend · cb3770c1
      Sasha Levin authored
      [ Upstream commit 06ebb06d ]
      
      Check for cases when the caller requests 0 bytes instead of running off
      and dereferencing potentially invalid iovecs.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cb3770c1
    • Vlad Yasevich's avatar
      net: Correctly set segment mac_len in skb_segment(). · 4b29e9c0
      Vlad Yasevich authored
      [ Upstream commit fcdfe3a7 ]
      
      When performing segmentation, the mac_len value is copied right
      out of the original skb.  However, this value is not always set correctly
      (like when the packet is VLAN-tagged) and we'll end up copying a bad
      value.
      
      One way to demonstrate this is to configure a VM which tags
      packets internally and turn off VLAN acceleration on the forwarding
      bridge port.  The packets show up corrupt like this:
      16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q
      (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0,
              0x0000:  8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d.
              0x0010:  0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0..
              0x0020:  29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3
              0x0030:  000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp
              0x0040:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              0x0050:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              0x0060:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              ...
      
      This also leads to awful throughput as GSO packets are dropped and
      cause retransmissions.
      
      The solution is to set the mac_len using the values already available
      in then new skb.  We've already adjusted all of the header offset, so we
      might as well correctly figure out the mac_len using skb_reset_mac_len().
      After this change, packets are segmented correctly and performance
      is restored.
      
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b29e9c0
    • Vlad Yasevich's avatar
      macvlan: Initialize vlan_features to turn on offload support. · 6a25e8f7
      Vlad Yasevich authored
      [ Upstream commit 081e83a7 ]
      
      Macvlan devices do not initialize vlan_features.  As a result,
      any vlan devices configured on top of macvlans perform very poorly.
      Initialize vlan_features based on the vlan features of the lower-level
      device.
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a25e8f7
    • Daniel Borkmann's avatar
      net: sctp: inherit auth_capable on INIT collisions · 4a07c786
      Daniel Borkmann authored
      [ Upstream commit 1be9a950 ]
      
      Jason reported an oops caused by SCTP on his ARM machine with
      SCTP authentication enabled:
      
      Internal error: Oops: 17 [#1] ARM
      CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
      task: c6eefa40 ti: c6f52000 task.ti: c6f52000
      PC is at sctp_auth_calculate_hmac+0xc4/0x10c
      LR is at sg_init_table+0x20/0x38
      pc : [<c024bb80>]    lr : [<c00f32dc>]    psr: 40000013
      sp : c6f538e8  ip : 00000000  fp : c6f53924
      r10: c6f50d80  r9 : 00000000  r8 : 00010000
      r7 : 00000000  r6 : c7be4000  r5 : 00000000  r4 : c6f56254
      r3 : c00c8170  r2 : 00000001  r1 : 00000008  r0 : c6f1e660
      Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0005397f  Table: 06f28000  DAC: 00000015
      Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
      Stack: (0xc6f538e8 to 0xc6f54000)
      [...]
      Backtrace:
      [<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8)
      [<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844)
      [<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28)
      [<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220)
      [<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4)
      [<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160)
      [<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74)
      [<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888)
      
      While we already had various kind of bugs in that area
      ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
      we/peer is AUTH capable") and b14878cc ("net: sctp: cache
      auth_enable per endpoint"), this one is a bit of a different
      kind.
      
      Giving a bit more background on why SCTP authentication is
      needed can be found in RFC4895:
      
        SCTP uses 32-bit verification tags to protect itself against
        blind attackers. These values are not changed during the
        lifetime of an SCTP association.
      
        Looking at new SCTP extensions, there is the need to have a
        method of proving that an SCTP chunk(s) was really sent by
        the original peer that started the association and not by a
        malicious attacker.
      
      To cause this bug, we're triggering an INIT collision between
      peers; normal SCTP handshake where both sides intent to
      authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO
      parameters that are being negotiated among peers:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
      
      RFC4895 says that each endpoint therefore knows its own random
      number and the peer's random number *after* the association
      has been established. The local and peer's random number along
      with the shared key are then part of the secret used for
      calculating the HMAC in the AUTH chunk.
      
      Now, in our scenario, we have 2 threads with 1 non-blocking
      SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY
      and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling
      sctp_bindx(3), listen(2) and connect(2) against each other,
      thus the handshake looks similar to this, e.g.:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] -----------
        -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] -------->
        ...
      
      Since such collisions can also happen with verification tags,
      the RFC4895 for AUTH rather vaguely says under section 6.1:
      
        In case of INIT collision, the rules governing the handling
        of this Random Number follow the same pattern as those for
        the Verification Tag, as explained in Section 5.2.4 of
        RFC 2960 [5]. Therefore, each endpoint knows its own Random
        Number and the peer's Random Number after the association
        has been established.
      
      In RFC2960, section 5.2.4, we're eventually hitting Action B:
      
        B) In this case, both sides may be attempting to start an
           association at about the same time but the peer endpoint
           started its INIT after responding to the local endpoint's
           INIT. Thus it may have picked a new Verification Tag not
           being aware of the previous Tag it had sent this endpoint.
           The endpoint should stay in or enter the ESTABLISHED
           state but it MUST update its peer's Verification Tag from
           the State Cookie, stop any init or cookie timers that may
           running and send a COOKIE ACK.
      
      In other words, the handling of the Random parameter is the
      same as behavior for the Verification Tag as described in
      Action B of section 5.2.4.
      
      Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
      case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
      side effect interpreter, and in fact it properly copies over
      peer_{random, hmacs, chunks} parameters from the newly created
      association to update the existing one.
      
      Also, the old asoc_shared_key is being released and based on
      the new params, sctp_auth_asoc_init_active_key() updated.
      However, the issue observed in this case is that the previous
      asoc->peer.auth_capable was 0, and has *not* been updated, so
      that instead of creating a new secret, we're doing an early
      return from the function sctp_auth_asoc_init_active_key()
      leaving asoc->asoc_shared_key as NULL. However, we now have to
      authenticate chunks from the updated chunk list (e.g. COOKIE-ACK).
      
      That in fact causes the server side when responding with ...
      
        <------------------ AUTH; COOKIE-ACK -----------------
      
      ... to trigger a NULL pointer dereference, since in
      sctp_packet_transmit(), it discovers that an AUTH chunk is
      being queued for xmit, and thus it calls sctp_auth_calculate_hmac().
      
      Since the asoc->active_key_id is still inherited from the
      endpoint, and the same as encoded into the chunk, it uses
      asoc->asoc_shared_key, which is still NULL, as an asoc_key
      and dereferences it in ...
      
        crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)
      
      ... causing an oops. All this happens because sctp_make_cookie_ack()
      called with the *new* association has the peer.auth_capable=1
      and therefore marks the chunk with auth=1 after checking
      sctp_auth_send_cid(), but it is *actually* sent later on over
      the then *updated* association's transport that didn't initialize
      its shared key due to peer.auth_capable=0. Since control chunks
      in that case are not sent by the temporary association which
      are scheduled for deletion, they are issued for xmit via
      SCTP_CMD_REPLY in the interpreter with the context of the
      *updated* association. peer.auth_capable was 0 in the updated
      association (which went from COOKIE_WAIT into ESTABLISHED state),
      since all previous processing that performed sctp_process_init()
      was being done on temporary associations, that we eventually
      throw away each time.
      
      The correct fix is to update to the new peer.auth_capable
      value as well in the collision case via sctp_assoc_update(),
      so that in case the collision migrated from 0 -> 1,
      sctp_auth_asoc_init_active_key() can properly recalculate
      the secret. This therefore fixes the observed server panic.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Reported-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Tested-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4a07c786
    • Christoph Paasch's avatar
      tcp: Fix integer-overflow in TCP vegas · 3ef99737
      Christoph Paasch authored
      [ Upstream commit 1f74e613 ]
      
      In vegas we do a multiplication of the cwnd and the rtt. This
      may overflow and thus their result is stored in a u64. However, we first
      need to cast the cwnd so that actually 64-bit arithmetic is done.
      
      Then, we need to do do_div to allow this to be used on 32-bit arches.
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Doug Leith <doug.leith@nuim.ie>
      Fixes: 8d3a564d (tcp: tcp_vegas cong avoid fix)
      Signed-off-by: default avatarChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3ef99737
    • Christoph Paasch's avatar
      tcp: Fix integer-overflows in TCP veno · 1ba9d8a3
      Christoph Paasch authored
      [ Upstream commit 45a07695 ]
      
      In veno we do a multiplication of the cwnd and the rtt. This
      may overflow and thus their result is stored in a u64. However, we first
      need to cast the cwnd so that actually 64-bit arithmetic is done.
      
      A first attempt at fixing 76f10177 ([TCP]: TCP Veno congestion
      control) was made by 15913114 (tcp: Overflow bug in Vegas), but it
      failed to add the required cast in tcp_veno_cong_avoid().
      
      Fixes: 76f10177 ([TCP]: TCP Veno congestion control)
      Signed-off-by: default avatarChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1ba9d8a3
    • Andrey Ryabinin's avatar
      net: sendmsg: fix NULL pointer dereference · 2cf6a6ae
      Andrey Ryabinin authored
      [ Upstream commit 40eea803 ]
      
      Sasha's report:
      	> While fuzzing with trinity inside a KVM tools guest running the latest -next
      	> kernel with the KASAN patchset, I've stumbled on the following spew:
      	>
      	> [ 4448.949424] ==================================================================
      	> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
      	> [ 4448.952988] Read of size 2 by thread T19638:
      	> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
      	> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
      	> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
      	> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
      	> [ 4448.961266] Call Trace:
      	> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
      	> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
      	> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
      	> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
      	> [ 4448.970103] sock_sendmsg (net/socket.c:654)
      	> [ 4448.971584] ? might_fault (mm/memory.c:3741)
      	> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
      	> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
      	> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
      	> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
      	> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
      	> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
      	> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
      	> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
      	> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
      	> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
      	> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
      	> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
      	> [ 4448.988929] ==================================================================
      
      This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
      
      After this report there was no usual "Unable to handle kernel NULL pointer dereference"
      and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.
      
      This bug was introduced in f3d33426
      (net: rework recvmsg handler msg_name and msg_namelen logic).
      Commit message states that:
      	"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      	 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      	 affect sendto as it would bail out earlier while trying to copy-in the
      	 address."
      But in fact this affects sendto when address 0 is mapped and contains
      socket address structure in it. In such case copy-in address will succeed,
      verify_iovec() function will successfully exit with msg->msg_namelen > 0
      and msg->msg_name == NULL.
      
      This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2cf6a6ae
    • Eric Dumazet's avatar
      ip: make IP identifiers less predictable · 2b03bce8
      Eric Dumazet authored
      [ Upstream commit 04ca6973 ]
      
      In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
      Jedidiah describe ways exploiting linux IP identifier generation to
      infer whether two machines are exchanging packets.
      
      With commit 73f156a6 ("inetpeer: get rid of ip_id_count"), we
      changed IP id generation, but this does not really prevent this
      side-channel technique.
      
      This patch adds a random amount of perturbation so that IP identifiers
      for a given destination [1] are no longer monotonically increasing after
      an idle period.
      
      Note that prandom_u32_max(1) returns 0, so if generator is used at most
      once per jiffy, this patch inserts no hole in the ID suite and do not
      increase collision probability.
      
      This is jiffies based, so in the worst case (HZ=1000), the id can
      rollover after ~65 seconds of idle time, which should be fine.
      
      We also change the hash used in __ip_select_ident() to not only hash
      on daddr, but also saddr and protocol, so that ICMP probes can not be
      used to infer information for other protocols.
      
      For IPv6, adds saddr into the hash as well, but not nexthdr.
      
      If I ping the patched target, we can see ID are now hard to predict.
      
      21:57:11.008086 IP (...)
          A > target: ICMP echo request, seq 1, length 64
      21:57:11.010752 IP (... id 2081 ...)
          target > A: ICMP echo reply, seq 1, length 64
      
      21:57:12.013133 IP (...)
          A > target: ICMP echo request, seq 2, length 64
      21:57:12.015737 IP (... id 3039 ...)
          target > A: ICMP echo reply, seq 2, length 64
      
      21:57:13.016580 IP (...)
          A > target: ICMP echo request, seq 3, length 64
      21:57:13.019251 IP (... id 3437 ...)
          target > A: ICMP echo reply, seq 3, length 64
      
      [1] TCP sessions uses a per flow ID generator not changed by this patch.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarJeffrey Knockel <jeffk@cs.unm.edu>
      Reported-by: default avatarJedidiah R. Crandall <crandall@cs.unm.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Hannes Frederic Sowa <hannes@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2b03bce8
    • Eric Dumazet's avatar
      inetpeer: get rid of ip_id_count · 16cc7c2f
      Eric Dumazet authored
      [ Upstream commit 73f156a6 ]
      
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      linux kernels used inet_peer cache for this purpose, but this had a huge
      cost on servers disabling MTU discovery.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If server deals with many tcp flows, we have a high probability of
         not finding the inet_peer, allocating a fresh one, inserting it in
         the tree with same initial ip_id_count, (cf secure_ip_id())
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation do not have to be 'perfect'
      
      Goal is trying to avoid duplicates in a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators, and a Jenkin hash using the dst IP
      as a key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
      belongs (it is only used from this file)
      
      secure_ip_id() and secure_ipv6_id() no longer are needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      16cc7c2f
    • Dmitry Kravkov's avatar
      bnx2x: fix crash during TSO tunneling · 9ac1f1f2
      Dmitry Kravkov authored
      [ Upstream commit fe26566d ]
      
      When TSO packet is transmitted additional BD w/o mapping is used
      to describe the packed. The BD needs special handling in tx
      completion.
      
      kernel: Call Trace:
      kernel: <IRQ>  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
      kernel: [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
      kernel: [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
      kernel: [<ffffffff814a8c0d>] ? find_iova+0x4d/0x90
      kernel: [<ffffffff814ab0e2>] intel_unmap_page.part.36+0x142/0x160
      kernel: [<ffffffff814ad0e6>] intel_unmap_page+0x26/0x30
      kernel: [<ffffffffa01f55d7>] bnx2x_free_tx_pkt+0x157/0x2b0 [bnx2x]
      kernel: [<ffffffffa01f8dac>] bnx2x_tx_int+0xac/0x220 [bnx2x]
      kernel: [<ffffffff8101a0d9>] ? read_tsc+0x9/0x20
      kernel: [<ffffffffa01f8fdb>] bnx2x_poll+0xbb/0x3c0 [bnx2x]
      kernel: [<ffffffff814d041a>] net_rx_action+0x15a/0x250
      kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
      kernel: [<ffffffff815f3a5c>] call_softirq+0x1c/0x30
      kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
      kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
      kernel: [<ffffffff815f4358>] do_IRQ+0x58/0xf0
      kernel: [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
      kernel: <EOI>  [<ffffffff810bbff7>] ? clockevents_notify+0x127/0x140
      kernel: [<ffffffff814834df>] ? cpuidle_enter_state+0x4f/0xc0
      kernel: [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
      kernel: [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
      kernel: [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
      kernel: [<ffffffff815cfee1>] start_secondary+0x265/0x27b
      kernel: ---[ end trace 11aa7726f18d7e80 ]---
      
      Fixes: a848ade4 ("bnx2x: add CSUM and TSO support for encapsulation protocols")
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Signed-off-by: default avatarDmitry Kravkov <Dmitry.Kravkov@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9ac1f1f2
    • Boris Ostrovsky's avatar
      x86/espfix/xen: Fix allocation of pages for paravirt page tables · 9377a0c1
      Boris Ostrovsky authored
      commit 8762e509 upstream.
      
      init_espfix_ap() is currently off by one level when informing hypervisor
      that allocated pages will be used for ministacks' page tables.
      
      The most immediate effect of this on a PV guest is that if
      'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
      will refuse to use it for a page table (which it shouldn't be anyway). This will
      result in warnings by both Xen and Linux.
      
      More importantly, a subsequent write to that page (again, by a PV guest) is
      likely to result in fatal page fault.
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.comReviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9377a0c1
    • Minfei Huang's avatar
      lib/btree.c: fix leak of whole btree nodes · 834ed96b
      Minfei Huang authored
      commit c75b53af upstream.
      
      I use btree from 3.14-rc2 in my own module.  When the btree module is
      removed, a warning arises:
      
       kmem_cache_destroy btree_node: Slab cache still has objects
       CPU: 13 PID: 9150 Comm: rmmod Tainted: GF          O 3.14.0-rc2 #1
       Hardware name: Inspur NF5270M3/NF5270M3, BIOS CHEETAH_2.1.3 09/10/2013
       Call Trace:
         dump_stack+0x49/0x5d
         kmem_cache_destroy+0xcf/0xe0
         btree_module_exit+0x10/0x12 [btree]
         SyS_delete_module+0x198/0x1f0
         system_call_fastpath+0x16/0x1b
      
      The cause is that it doesn't release the last btree node, when height = 1
      and fill = 1.
      
      [akpm@linux-foundation.org: remove unneeded test of NULL]
      Signed-off-by: default avatarMinfei Huang <huangminfei@ucloud.cn>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      834ed96b
    • Sasha Levin's avatar
      net/l2tp: don't fall back on UDP [get|set]sockopt · c6f5709b
      Sasha Levin authored
      commit 3cf521f7 upstream.
      
      The l2tp [get|set]sockopt() code has fallen back to the UDP functions
      for socket option levels != SOL_PPPOL2TP since day one, but that has
      never actually worked, since the l2tp socket isn't an inet socket.
      
      As David Miller points out:
      
        "If we wanted this to work, it'd have to look up the tunnel and then
         use tunnel->sk, but I wonder how useful that would be"
      
      Since this can never have worked so nobody could possibly have depended
      on that functionality, just remove the broken code and return -EINVAL.
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarJames Chapman <jchapman@katalix.com>
      Acked-by: default avatarDavid Miller <davem@davemloft.net>
      Cc: Phil Turnbull <phil.turnbull@oracle.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c6f5709b
    • Max Filippov's avatar
      xtensa: add fixup for double exception raised in window overflow · e0a37035
      Max Filippov authored
      commit 17290231 upstream.
      
      There are two FIXMEs in the double exception handler 'for the extremely
      unlikely case'. This case gets hit by gcc during kernel build once in
      a few hours, resulting in an unrecoverable exception condition.
      
      Provide missing fixup routine to handle this case. Double exception
      literals now need 8 more bytes, add them to the linker script.
      
      Also replace bbsi instructions with bbsi.l as we're branching depending
      on 8th and 7th LSB-based bits of exception address.
      
      This may be tested by adding the explicit DTLB invalidation to window
      overflow handlers, like the following:
      
      #    --- a/arch/xtensa/kernel/vectors.S
      #    +++ b/arch/xtensa/kernel/vectors.S
      #    @@ -592,6 +592,14 @@ ENDPROC(_WindowUnderflow4)
      #     ENTRY_ALIGN64(_WindowOverflow8)
      #
      #    	s32e	a0, a9, -16
      #    +	bbsi.l	a9, 31, 1f
      #    +	rsr	a0, ccount
      #    +	bbsi.l	a0, 4, 1f
      #    +	pdtlb	a0, a9
      #    +	idtlb	a0
      #    +	movi	a0, 9
      #    +	idtlb	a0
      #    +1:
      #    	l32e    a0, a1, -12
      #    	s32e    a2, a9,  -8
      #    	s32e    a1, a9, -12
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e0a37035
    • Johannes Berg's avatar
      Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan" · 3516cca6
      Johannes Berg authored
      commit 08b99399 upstream.
      
      This reverts commit 277d916f as it was
      at least breaking iwlwifi by setting the IEEE80211_TX_CTL_NO_PS_BUFFER
      flag in all kinds of interface modes, not only for AP mode where it is
      appropriate.
      
      To avoid reintroducing the original problem, explicitly check for probe
      request frames in the multicast buffering code.
      
      Fixes: 277d916f ("mac80211: move "bufferable MMPDU" check to fix AP mode scan")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      
      3516cca6
    • Malcolm Priestley's avatar
      staging: vt6655: Fix Warning on boot handle_irq_event_percpu. · 6e1af056
      Malcolm Priestley authored
      commit 6cff1f6a upstream.
      
      WARNING: CPU: 0 PID: 929 at /home/apw/COD/linux/kernel/irq/handle.c:147 handle_irq_event_percpu+0x1d1/0x1e0()
      irq 17 handler device_intr+0x0/0xa80 [vt6655_stage] enabled interrupts
      
      Using spin_lock_irqsave appears to fix this.
      Signed-off-by: default avatarMalcolm Priestley <tvboxspy@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      
      6e1af056
    • Andy Lutomirski's avatar
      x86_64/entry/xen: Do not invoke espfix64 on Xen · ad878a9e
      Andy Lutomirski authored
      commit 7209a75d upstream.
      
      This moves the espfix64 logic into native_iret.  To make this work,
      it gets rid of the native patch for INTERRUPT_RETURN:
      INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.
      
      This changes the 16-bit SS behavior on Xen from OOPSing to leaking
      some bits of the Xen hypervisor's RSP (I think).
      
      [ hpa: this is a nonzero cost on native, but probably not enough to
        measure. Xen needs to fix this in their own code, probably doing
        something equivalent to espfix64. ]
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.netSigned-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ad878a9e
    • H. Peter Anvin's avatar
      x86, espfix: Make it possible to disable 16-bit support · da629b7d
      H. Peter Anvin authored
      commit 34273f41 upstream.
      
      Embedded systems, which may be very memory-size-sensitive, are
      extremely unlikely to ever encounter any 16-bit software, so make it
      a CONFIG_EXPERT option to turn off support for any 16-bit software
      whatsoever.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.comSigned-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      da629b7d
    • H. Peter Anvin's avatar
      x86, espfix: Make espfix64 a Kconfig option, fix UML · 9217746a
      H. Peter Anvin authored
      commit 197725de upstream.
      
      Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
      build which had broken due to the non-existence of init_espfix_bsp()
      in UML: since UML uses its own Kconfig, this option does not appear in
      the UML build.
      
      This also makes it possible to make support for 16-bit segments a
      configuration option, for the people who want to minimize the size of
      the kernel.
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Richard Weinberger <richard@nod.at>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.comSigned-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9217746a
    • H. Peter Anvin's avatar
      x86, espfix: Fix broken header guard · 134b7223
      H. Peter Anvin authored
      commit 20b68535 upstream.
      
      Header guard is #ifndef, not #ifdef...
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      134b7223
    • H. Peter Anvin's avatar
      x86, espfix: Move espfix definitions into a separate header file · 24ebf77f
      H. Peter Anvin authored
      commit e1fe9ed8 upstream.
      
      Sparse warns that the percpu variables aren't declared before they are
      defined.  Rather than hacking around it, move espfix definitions into
      a proper header file.
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      24ebf77f
    • H. Peter Anvin's avatar
      x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 7e329475
      H. Peter Anvin authored
      commit 3891a04a upstream.
      
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  This
      causes some 16-bit software to break, but it also leaks kernel state
      to user space.  We have a software workaround for that ("espfix") for
      the 32-bit kernel, but it relies on a nonzero stack segment base which
      is not available in 64-bit mode.
      
      In checkin:
      
          b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
      the logic that 16-bit support is crippled on 64-bit kernels anyway (no
      V86 support), but it turns out that people are doing stuff like
      running old Win16 binaries under Wine and expect it to work.
      
      This works around this by creating percpu "ministacks", each of which
      is mapped 2^16 times 64K apart.  When we detect that the return SS is
      on the LDT, we copy the IRET frame to the ministack and use the
      relevant alias to return to userspace.  The ministacks are mapped
      readonly, so if IRET faults we promote #GP to #DF which is an IST
      vector and thus has its own stack; we then do the fixup in the #DF
      handler.
      
      (Making #GP an IST exception would make the msr_safe functions unsafe
      in NMI/MC context, and quite possibly have other effects.)
      
      Special thanks to:
      
      - Andy Lutomirski, for the suggestion of using very small stack slots
        and copy (as opposed to map) the IRET frame there, and for the
        suggestion to mark them readonly and let the fault promote to #DF.
      - Konrad Wilk for paravirt fixup and testing.
      - Borislav Petkov for testing help and useful comments.
      Reported-by: default avatarBrian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Lutomriski <amluto@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
      Cc: comex <comexk@gmail.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # consider after upstream merge
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7e329475
    • H. Peter Anvin's avatar
      Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" · a7b4794c
      H. Peter Anvin authored
      commit 7ed6fb9b upstream.
      
      This reverts commit fa81511b in
      preparation of merging in the proper fix (espfix64).
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a7b4794c
    • Jan Kara's avatar
      timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks · 62c54cb1
      Jan Kara authored
      commit 504d5874 upstream.
      
      clockevents_increase_min_delta() calls printk() from under
      hrtimer_bases.lock. That causes lock inversion on scheduler locks because
      printk() can call into the scheduler. Lockdep puts it as:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.15.0-rc8-06195-g939f04be #2 Not tainted
      -------------------------------------------------------
      trinity-main/74 is trying to acquire lock:
       (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c
      
      but task is already holding lock:
       (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #5 (hrtimer_bases.lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
             [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
             [<81080792>] task_clock_event_start+0x3a/0x3f
             [<810807a4>] task_clock_event_add+0xd/0x14
             [<8108259a>] event_sched_in+0xb6/0x17a
             [<810826a2>] group_sched_in+0x44/0x122
             [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
             [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
             [<81082bf6>] __perf_install_in_context+0x8b/0xa3
             [<8107eb8e>] remote_function+0x12/0x2a
             [<8105f5af>] smp_call_function_single+0x2d/0x53
             [<8107e17d>] task_function_call+0x30/0x36
             [<8107fb82>] perf_install_in_context+0x87/0xbb
             [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
             [<810856f9>] SyS_perf_event_open+0x17/0x19
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #4 (&ctx->lock){......}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      -> #3 (&rq->lock){-.-.-.}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81040873>] __task_rq_lock+0x33/0x3a
             [<8104184c>] wake_up_new_task+0x25/0xc2
             [<8102474b>] do_fork+0x15c/0x2a0
             [<810248a9>] kernel_thread+0x1a/0x1f
             [<814232a2>] rest_init+0x1a/0x10e
             [<817af949>] start_kernel+0x303/0x308
             [<817af2ab>] i386_start_kernel+0x79/0x7d
      
      -> #2 (&p->pi_lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<810413dd>] try_to_wake_up+0x1d/0xd6
             [<810414cd>] default_wake_function+0xb/0xd
             [<810461f3>] __wake_up_common+0x39/0x59
             [<81046346>] __wake_up+0x29/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #1 (&tty->write_wait){-.....}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<81046332>] __wake_up+0x15/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #0 (&port_lock_key){-.....}:
             [<8104a62d>] __lock_acquire+0x9ea/0xc6d
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<811c60be>] serial8250_console_write+0x8c/0x10c
             [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
             [<8104f5d5>] console_unlock+0x1d7/0x398
             [<8104fb70>] vprintk_emit+0x3da/0x3e4
             [<81425f76>] printk+0x17/0x19
             [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
             [<8105c548>] clockevents_program_event+0xe7/0xf3
             [<8105cc1c>] tick_program_event+0x1e/0x23
             [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
             [<8103c49e>] __remove_hrtimer+0x5b/0x79
             [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
             [<8103cb4b>] hrtimer_cancel+0xd/0x18
             [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
             [<81080705>] task_clock_event_stop+0x20/0x64
             [<81080756>] task_clock_event_del+0xd/0xf
             [<81081350>] event_sched_out+0xab/0x11e
             [<810813e0>] group_sched_out+0x1d/0x66
             [<81081682>] ctx_sched_out+0xaf/0xbf
             [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      other info that might help us debug this:
      
      Chain exists of:
        &port_lock_key --> &ctx->lock --> hrtimer_bases.lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(hrtimer_bases.lock);
                                     lock(&ctx->lock);
                                     lock(hrtimer_bases.lock);
        lock(&port_lock_key);
      
       *** DEADLOCK ***
      
      4 locks held by trinity-main/74:
       #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
       #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
       #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4
      
      stack backtrace:
      CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04be #2
       00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
       8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
       8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
      Call Trace:
       [<81426f69>] dump_stack+0x16/0x18
       [<81425a99>] print_circular_bug+0x18f/0x19c
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104af87>] ? lock_release+0x191/0x223
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8104416d>] ? __dequeue_entity+0x23/0x27
       [<81044505>] ? pick_next_task_fair+0xb1/0x120
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
       [<810475b0>] ? trace_hardirqs_off+0xb/0xd
       [<81056346>] ? rcu_irq_exit+0x64/0x77
      
      Fix the problem by using printk_deferred() which does not call into the
      scheduler.
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      62c54cb1
    • John Stultz's avatar
      printk: rename printk_sched to printk_deferred · d6a1cfb5
      John Stultz authored
      commit aac74dc4 upstream.
      
      After learning we'll need some sort of deferred printk functionality in
      the timekeeping core, Peter suggested we rename the printk_sched function
      so it can be reused by needed subsystems.
      
      This only changes the function name. No logic changes.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d6a1cfb5
    • Anssi Hannula's avatar
      dm cache: fix race affecting dirty block count · 49cbb95e
      Anssi Hannula authored
      commit 44fa816b upstream.
      
      nr_dirty is updated without locking, causing it to drift so that it is
      non-zero (either a small positive integer, or a very large one when an
      underflow occurs) even when there are no actual dirty blocks.  This was
      due to a race between the workqueue and map function accessing nr_dirty
      in parallel without proper protection.
      
      People were seeing under runs due to a race on increment/decrement of
      nr_dirty, see: https://lkml.org/lkml/2014/6/3/648
      
      Fix this by using an atomic_t for nr_dirty.
      
      Reported-by: roma1390@gmail.com
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@iki.fi>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      49cbb95e
    • Greg Thelen's avatar
      dm bufio: fully initialize shrinker · 9ee4fa03
      Greg Thelen authored
      commit d8c712ea upstream.
      
      1d3d4437 ("vmscan: per-node deferred work") added a flags field to
      struct shrinker assuming that all shrinkers were zero filled.  The dm
      bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
      in flags.  So far the only defined flags bit is SHRINKER_NUMA_AWARE.
      But there are proposed patches which add other bits to shrinker.flags
      (e.g. memcg awareness).
      
      Rather than simply initializing the shrinker, this patch uses kzalloc()
      when allocating the dm_bufio_client to ensure that the embedded shrinker
      and any other similar structures are zeroed.
      
      This fixes theoretical over aggressive shrinking of dm bufio objects.
      If the uninitialized dm_bufio_client.shrinker.flags contains
      SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
      each numa node rather than just once.  This has been broken since 3.12.
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9ee4fa03
    • Lars-Peter Clausen's avatar
      iio: buffer: Fix demux table creation · 8504485d
      Lars-Peter Clausen authored
      commit 61bd55ce upstream.
      
      When creating the demux table we need to iterate over the selected scan mask for
      the buffer to get the samples which should be copied to destination buffer.
      Right now the code uses the mask which contains all active channels, which means
      the demux table contains entries which causes it to copy all the samples from
      source to destination buffer one by one without doing any demuxing.
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8504485d
    • Peter Meerwald's avatar
      iio:bma180: Missing check for frequency fractional part · e167bebf
      Peter Meerwald authored
      commit 9b2a4d35 upstream.
      
      val2 should be zero
      
      This will make no difference for correct inputs but will reject
      incorrect ones with a decimal part in the value written to the sysfs
      interface.
      Signed-off-by: default avatarPeter Meerwald <pmeerw@pmeerw.net>
      Cc: Oleksandr Kravchenko <o.v.kravchenko@globallogic.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e167bebf
    • Peter Meerwald's avatar
      iio:bma180: Fix scale factors to report correct acceleration units · 357249fb
      Peter Meerwald authored
      commit 381676d5 upstream.
      
      The userspace interface for acceleration sensors is documented as using
      m/s^2 units [Documentation/ABI/testing/sysfs-bus-iio]
      
      The fullscale raw values for the BMA80 corresponds to -/+ 1, 1.5, 2, etc G
      depending on the selected mode.
      
      The scale table was converting to G rather than m/s^2.
      Change the scaling table to match the documented interface.
      
      See commit 71702e6e, iio: mma8452: Use correct acceleration units,
      for a related fix.
      Signed-off-by: default avatarPeter Meerwald <pmeerw@pmeerw.net>
      Cc: Oleksandr Kravchenko <o.v.kravchenko@globallogic.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      357249fb
    • Malcolm Priestley's avatar
      staging: vt6655: Fix disassociated messages every 10 seconds · 0a5d8d71
      Malcolm Priestley authored
      commit 4aa0abed upstream.
      
      byReAssocCount is incremented every second resulting in
      disassociated message being send every 10 seconds whether
      connection or not.
      
      byReAssocCount should only advance while eCommandState
      is in WLAN_ASSOCIATE_WAIT
      
      Change existing scope to if condition.
      Signed-off-by: default avatarMalcolm Priestley <tvboxspy@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0a5d8d71
    • Michal Hocko's avatar
      memcg: oom_notify use-after-free fix · 5f24f2a6
      Michal Hocko authored
      commit 2bcf2e92 upstream.
      
      Paul Furtado has reported the following GPF:
      
        general protection fault: 0000 [#1] SMP
        Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront
        CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1
        task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000
        RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240
        RSP: e02b:ffff8801d2ec7d48  EFLAGS: 00010283
        RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e
        RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200
        RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800
        R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30
        FS:  00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000
        CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660
        Call Trace:
          pagefault_out_of_memory+0x18/0x90
          mm_fault_error+0xa9/0x1a0
          __do_page_fault+0x478/0x4c0
          do_page_fault+0x2c/0x40
          page_fault+0x28/0x30
        Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75
        RIP  mem_cgroup_oom_synchronize+0x140/0x240
      
      Commit fb2a6fc5 ("mm: memcg: rework and document OOM waiting and
      wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock
      assuming it is protected by the hierarchical OOM-lock.
      
      Although this is true for the notification part the protection doesn't
      cover unregistration of event which can happen in parallel now so
      mem_cgroup_oom_notify can see already unlinked and/or freed
      mem_cgroup_eventfd_list.
      
      Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=80881
      
      Fixes: fb2a6fc5 (mm: memcg: rework and document OOM waiting and wakeup)
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Reported-by: default avatarPaul Furtado <paulfurtado91@gmail.com>
      Tested-by: default avatarPaul Furtado <paulfurtado91@gmail.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5f24f2a6
    • David Rientjes's avatar
      mm, thp: do not allow thp faults to avoid cpuset restrictions · 06c18813
      David Rientjes authored
      commit b104a35d upstream.
      
      The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET
      should be set in allocflags.  ALLOC_CPUSET controls if a page allocation
      should be restricted only to the set of allowed cpuset mems.
      
      Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent
      the fault path from using memory compaction or direct reclaim.  Thus, it
      is unfairly able to allocate outside of its cpuset mems restriction as a
      side-effect.
      
      This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is
      truly GFP_ATOMIC by verifying it is also not a thp allocation.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reported-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Tested-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      06c18813
    • Maxim Patlasov's avatar
      mm/page-writeback.c: fix divide by zero in bdi_dirty_limits() · c5fec566
      Maxim Patlasov authored
      commit f6789593 upstream.
      
      Under memory pressure, it is possible for dirty_thresh, calculated by
      global_dirty_limits() in balance_dirty_pages(), to equal zero.  Then, if
      strictlimit is true, bdi_dirty_limits() tries to resolve the proportion:
      
        bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh
      
      by dividing by zero.
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@parallels.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c5fec566
    • James Bottomley's avatar
      scsi: handle flush errors properly · ca519c15
      James Bottomley authored
      commit 89fb4cd1 upstream.
      
      Flush commands don't transfer data and thus need to be special cased
      in the I/O completion handler so that we can propagate errors to
      the block layer and filesystem.
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Reported-by: default avatarSteven Haber <steven@qumulo.com>
      Tested-by: default avatarSteven Haber <steven@qumulo.com>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ca519c15
    • Alexandre Bounine's avatar
      rapidio/tsi721_dma: fix failure to obtain transaction descriptor · 55415f0d
      Alexandre Bounine authored
      commit 0193ed82 upstream.
      
      This is a bug fix for the situation when function tsi721_desc_get() fails
      to obtain a free transaction descriptor.
      
      The bug usually results in a memory access crash dump when data transfer
      scatter-gather list has more entries than size of hardware buffer
      descriptors ring.  This fix ensures that error is properly returned to a
      caller instead of an invalid entry.
      
      This patch is applicable to kernel versions starting from v3.5.
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      55415f0d