1. 31 Oct, 2014 11 commits
    • Guillaume Nault's avatar
      l2tp: fix race while getting PMTU on PPP pseudo-wire · 15c5ee37
      Guillaume Nault authored
      [ Upstream commit eed4d839 ]
      
      Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.
      
      The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
      could return NULL if tunnel->sock->sk_dst_cache was reset just before the
      call, thus making dst_mtu() dereference a NULL pointer:
      
      [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
      [ 1937.664005] Oops: 0000 [#1] SMP
      [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
      [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
      [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
      [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
      [ 1937.664005] RIP: 0010:[<ffffffffa049db88>]  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
      [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
      [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
      [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
      [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
      [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
      [ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
      [ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
      [ 1937.664005] Stack:
      [ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
      [ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
      [ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
      [ 1937.664005] Call Trace:
      [ 1937.664005]  [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
      [ 1937.664005]  [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
      [ 1937.664005]  [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
      [ 1937.664005]  [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
      [ 1937.664005]  [<ffffffff81309196>] SYSC_connect+0x87/0xb1
      [ 1937.664005]  [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
      [ 1937.664005]  [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
      [ 1937.664005]  [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 1937.664005]  [<ffffffff8114c262>] ? spin_lock+0x9/0xb
      [ 1937.664005]  [<ffffffff813092b4>] SyS_connect+0x9/0xb
      [ 1937.664005]  [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
      [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
      [ 1937.664005] RIP  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005]  RSP <ffff8800c43c7de8>
      [ 1937.664005] CR2: 0000000000000020
      [ 1939.559375] ---[ end trace 82d44500f28f8708 ]---
      
      Fixes: f34c4a35 ("l2tp: take PMTU from tunnel UDP socket")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      15c5ee37
    • Gerhard Stenzel's avatar
      vxlan: fix incorrect initializer in union vxlan_addr · a1cc6e72
      Gerhard Stenzel authored
      [ Upstream commit a45e92a5 ]
      
      The first initializer in the following
      
              union vxlan_addr ipa = {
                  .sin.sin_addr.s_addr = tip,
                  .sa.sa_family = AF_INET,
              };
      
      is optimised away by the compiler, due to the second initializer,
      therefore initialising .sin.sin_addr.s_addr always to 0.
      This results in netlink messages indicating a L3 miss never contain the
      missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions.
      The problem affects user space programs relying on an IP address being
      sent as part of a netlink message indicating a L3 miss.
      
      Changing
                  .sa.sa_family = AF_INET,
      to
                  .sin.sin_family = AF_INET,
      fixes the problem.
      Signed-off-by: default avatarGerhard Stenzel <gerhard.stenzel@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a1cc6e72
    • Jiri Benc's avatar
      openvswitch: fix panic with multiple vlan headers · 05cd3928
      Jiri Benc authored
      [ Upstream commit 2ba5af42 ]
      
      When there are multiple vlan headers present in a received frame, the first
      one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
      skb beyond the VLAN TPID may be still non-linear, including the inner TCI
      and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
      does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
      executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
      
      This leads to two things:
      
      1. Part of the resulting ethernet header is in the non-linear part of the
         skb. When eth_type_trans is called later as the result of
         OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
         is in fact accessing random data when it reads past the TPID.
      
      2. network_header points into the ethernet header instead of behind it.
         mac_len is set to a wrong value (10), too.
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      05cd3928
    • Eric Dumazet's avatar
      packet: handle too big packets for PACKET_V3 · 81070199
      Eric Dumazet authored
      [ Upstream commit dc808110 ]
      
      af_packet can currently overwrite kernel memory by out of bound
      accesses, because it assumed a [new] block can always hold one frame.
      
      This is not generally the case, even if most existing tools do it right.
      
      This patch clamps too long frames as API permits, and issue a one time
      error on syslog.
      
      [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
      
      In this example, packet header tp_snaplen was set to 3966,
      and tp_len was set to 5042 (skb->len)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      81070199
    • Neal Cardwell's avatar
      tcp: fix ssthresh and undo for consecutive short FRTO episodes · 7e3f24de
      Neal Cardwell authored
      [ Upstream commit 0c9ab092 ]
      
      Fix TCP FRTO logic so that it always notices when snd_una advances,
      indicating that any RTO after that point will be a new and distinct
      loss episode.
      
      Previously there was a very specific sequence that could cause FRTO to
      fail to notice a new loss episode had started:
      
      (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
      (2) receiver ACKs packet 1
      (3) FRTO sends 2 more packets
      (4) RTO timer fires again (should start a new loss episode)
      
      The problem was in step (3) above, where tcp_process_loss() returned
      early (in the spot marked "Step 2.b"), so that it never got to the
      logic to clear icsk_retransmits. Thus icsk_retransmits stayed
      non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
      icsk_retransmits, decide that this RTO is not a new episode, and
      decide not to cut ssthresh and remember the current cwnd and ssthresh
      for undo.
      
      There were two main consequences to the bug that we have
      observed. First, ssthresh was not decreased in step (4). Second, when
      there was a series of such FRTO (1-4) sequences that happened to be
      followed by an FRTO undo, we would restore the cwnd and ssthresh from
      before the entire series started (instead of the cwnd and ssthresh
      from before the most recent RTO). This could result in cwnd and
      ssthresh being restored to values much bigger than the proper values.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7e3f24de
    • Neal Cardwell's avatar
      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · e86507f2
      Neal Cardwell authored
      [ Upstream commit 4fab9071 ]
      
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Diagnosed-by: default avatarWillem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e86507f2
    • Shmulik Ladkani's avatar
      sit: Fix ipip6_tunnel_lookup device matching criteria · 4ba9870f
      Shmulik Ladkani authored
      [ Upstream commit bc8fc7b8 ]
      
      As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"),
      when looking up a tunnel, tunnel's underlying interface (t->parms.link)
      is verified to match incoming traffic's ingress device.
      
      However the comparison was incorrectly based on skb->dev->iflink.
      
      Instead, dev->ifindex should be used, which correctly represents the
      interface from which the IP stack hands the ipip6 packets.
      
      This allows setting up sit tunnels bound to vlan interfaces (otherwise
      incoming ipip6 traffic on the vlan interface was dropped due to
      ipip6_tunnel_lookup match failure).
      Signed-off-by: default avatarShmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4ba9870f
    • Andrey Vagin's avatar
      tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 1430b731
      Andrey Vagin authored
      [ Upstream commit 9d186cac ]
      
      We don't know right timestamp for repaired skb-s. Wrong RTT estimations
      isn't good, because some congestion modules heavily depends on it.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
          where it's set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1430b731
    • Stanislaw Gruszka's avatar
      myri10ge: check for DMA mapping errors · 5f2447e2
      Stanislaw Gruszka authored
      [ Upstream commit 10545937 ]
      
      On IOMMU systems DMA mapping can fail, we need to check for
      that possibility.
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5f2447e2
    • Jiri Benc's avatar
      rtnetlink: fix VF info size · 1b50b995
      Jiri Benc authored
      [ Upstream commit 945a3676 ]
      
      Commit 1d8faf48 ("net/core: Add VF link state control") added new
      attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size
      of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we
      may trigger warnings in rtnl_getlink and similar functions when many VF
      links are enabled, as the information does not fit into the allocated skb.
      
      Fixes: 1d8faf48 ("net/core: Add VF link state control")
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1b50b995
    • Daniel Borkmann's avatar
      netlink: reset network header before passing to taps · 18c2c1df
      Daniel Borkmann authored
      [ Upstream commit 4e48ed88 ]
      
      netlink doesn't set any network header offset thus when the skb is
      being passed to tap devices via dev_queue_xmit_nit(), it emits klog
      false positives due to it being unset like:
      
        ...
        [  124.990397] protocol 0000 is buggy, dev nlmon0
        [  124.990411] protocol 0000 is buggy, dev nlmon0
        ...
      
      So just reset the network header before passing to the device; for
      packet sockets that just means nothing will change - mac and net
      offset hold the same value just as before.
      Reported-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      18c2c1df
  2. 30 Oct, 2014 1 commit
  3. 28 Oct, 2014 5 commits
    • Cristian Stoica's avatar
      crypto: caam - remove duplicated sg copy functions · eacbc065
      Cristian Stoica authored
      commit 307fd543 upstream.
      
      Replace equivalent (and partially incorrect) scatter-gather functions
      with ones from crypto-API.
      
      The replacement is motivated by page-faults in sg_copy_part triggered
      by successive calls to crypto_hash_update. The following fault appears
      after calling crypto_ahash_update twice, first with 13 and then
      with 285 bytes:
      
      Unable to handle kernel paging request for data at address 0x00000008
      Faulting instruction address: 0xf9bf9a8c
      Oops: Kernel access of bad area, sig: 11 [#1]
      SMP NR_CPUS=8 CoreNet Generic
      Modules linked in: tcrypt(+) caamhash caam_jr caam tls
      CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted
      3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75
      task: e9308530 ti: e700e000 task.ti: e700e000
      NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0
      REGS: e700fb80 TRAP: 0300   Not tainted
      (3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2)
      MSR: 00029002 <CE,EE,ME>  CR: 44f92024  XER: 20000000
      DEAR: 00000008, ESR: 00000000
      
      GPR00: f9bfcf28 e700fc30 e9308530 e70b1e55 00000000 ffffffdd e70b1e54 0bebf888
      GPR08: 902c7ef5 c0e771e2 00000002 00000888 c0019ea0 00000000 00000000 c07a4154
      GPR16: c08d0000 e91a8f9c 00000001 e98fb400 00000100 e9c83028 e70b1e08 e70b1d48
      GPR24: e992ce10 e70b1dc8 f9bfe4f4 e70b1e55 ffffffdd e70b1ce0 00000000 00000000
      NIP [f9bf9a8c] sg_copy+0x1c/0x100 [caamhash]
      LR [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
      Call Trace:
      [e700fc30] [f9bf9c50] sg_copy_part+0xe0/0x160 [caamhash] (unreliable)
      [e700fc50] [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
      [e700fcb0] [f954e19c] crypto_tls_genicv+0x13c/0x300 [tls]
      [e700fd10] [f954e65c] crypto_tls_encrypt+0x5c/0x260 [tls]
      [e700fd40] [c02250ec] __test_aead.constprop.9+0x2bc/0xb70
      [e700fe40] [c02259f0] alg_test_aead+0x50/0xc0
      [e700fe60] [c02241e4] alg_test+0x114/0x2e0
      [e700fee0] [c022276c] cryptomgr_test+0x4c/0x60
      [e700fef0] [c004f658] kthread+0x98/0xa0
      [e700ff40] [c000fd04] ret_from_kernel_thread+0x5c/0x64
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      eacbc065
    • Andy Lutomirski's avatar
      x86,kvm,vmx: Preserve CR4 across VM entry · 24d6e925
      Andy Lutomirski authored
      commit d974baa3 upstream.
      
      CR4 isn't constant; at least the TSD and PCE bits can vary.
      
      TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
      like it's correct.
      
      This adds a branch and a read from cr4 to each vm entry.  Because it is
      extremely likely that consecutive entries into the same vcpu will have
      the same host cr4 value, this fixes up the vmcs instead of restoring cr4
      after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
      reducing the overhead in the common case to just two memory reads and a
      branch.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      24d6e925
    • Catalin Marinas's avatar
      futex: Ensure get_futex_key_refs() always implies a barrier · 5bb2394a
      Catalin Marinas authored
      commit 76835b0e upstream.
      
      Commit b0c29f79 (futexes: Avoid taking the hb->lock if there's
      nothing to wake up) changes the futex code to avoid taking a lock when
      there are no waiters. This code has been subsequently fixed in commit
      11d4616b (futex: revert back to the explicit waiter counting code).
      Both the original commit and the fix-up rely on get_futex_key_refs() to
      always imply a barrier.
      
      However, for private futexes, none of the cases in the switch statement
      of get_futex_key_refs() would be hit and the function completes without
      a memory barrier as required before checking the "waiters" in
      futex_wake() -> hb_waiters_pending(). The consequence is a race with a
      thread waiting on a futex on another CPU, allowing the waker thread to
      read "waiters == 0" while the waiter thread to have read "futex_val ==
      locked" (in kernel).
      
      Without this fix, the problem (user space deadlocks) can be seen with
      Android bionic's mutex implementation on an arm64 multi-cluster system.
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarMatteo Franchin <Matteo.Franchin@arm.com>
      Fixes: b0c29f79 (futexes: Avoid taking the hb->lock if there's nothing to wake up)
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5bb2394a
    • Valdis Kletnieks's avatar
      pstore: Fix duplicate {console,ftrace}-efi entries · 42dde344
      Valdis Kletnieks authored
      commit d4bf205d upstream.
      
      The pstore filesystem still creates duplicate filename/inode pairs for
      some pstore types.  Add the id to the filename to prevent that.
      
      Before patch:
      
      [/sys/fs/pstore] ls -li
      total 0
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
      
      After:
      
      [/sys/fs/pstore] ls -li
      total 0
      1232 -r--r--r--. 1 root root 148 Sep 29 17:09 console-efi-141202499100000
      1231 -r--r--r--. 1 root root  67 Sep 29 17:09 console-efi-141202499200000
      1230 -r--r--r--. 1 root root 148 Sep 29 17:44 console-efi-141202705400000
      1229 -r--r--r--. 1 root root  67 Sep 29 17:44 console-efi-141202705500000
      1228 -r--r--r--. 1 root root  67 Sep 29 20:42 console-efi-141203772600000
      1227 -r--r--r--. 1 root root 148 Sep 29 23:42 console-efi-141204854900000
      1226 -r--r--r--. 1 root root  67 Sep 29 23:42 console-efi-141204855000000
      1225 -r--r--r--. 1 root root 148 Sep 29 23:59 console-efi-141204954200000
      1224 -r--r--r--. 1 root root  67 Sep 29 23:59 console-efi-141204954400000
      Signed-off-by: default avatarValdis Kletnieks <valdis.kletnieks@vt.edu>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      42dde344
    • Stephen Smalley's avatar
      selinux: fix inode security list corruption · ffc51c96
      Stephen Smalley authored
      commit 923190d3 upstream.
      
      sb_finish_set_opts() can race with inode_free_security()
      when initializing inode security structures for inodes
      created prior to initial policy load or by the filesystem
      during ->mount().   This appears to have always been
      a possible race, but commit 3dc91d43 ("SELinux:  Fix possible
      NULL pointer dereference in selinux_inode_permission()")
      made it more evident by immediately reusing the unioned
      list/rcu element  of the inode security structure for call_rcu()
      upon an inode_free_security().  But the underlying issue
      was already present before that commit as a possible use-after-free
      of isec.
      
      Shivnandan Kumar reported the list corruption and proposed
      a patch to split the list and rcu elements out of the union
      as separate fields of the inode_security_struct so that setting
      the rcu element would not affect the list element.  However,
      this would merely hide the issue and not truly fix the code.
      
      This patch instead moves up the deletion of the list entry
      prior to dropping the sbsec->isec_lock initially.  Then,
      if the inode is dropped subsequently, there will be no further
      references to the isec.
      Reported-by: default avatarShivnandan Kumar <shivnandan.k@samsung.com>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ffc51c96
  4. 20 Oct, 2014 23 commits
    • Michael S. Tsirkin's avatar
      virtio_pci: fix virtio spec compliance on restore · 8dab8969
      Michael S. Tsirkin authored
      commit 6fbc198c upstream.
      
      On restore, virtio pci does the following:
      + set features
      + init vqs etc - device can be used at this point!
      + set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits
      
      This is in violation of the virtio spec, which
      requires the following order:
      - ACKNOWLEDGE
      - DRIVER
      - init vqs
      - DRIVER_OK
      
      This behaviour will break with hypervisors that assume spec compliant
      behaviour.  It seems like a good idea to have this patch applied to
      stable branches to reduce the support butden for the hypervisors.
      
      Cc: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8dab8969
    • Prarit Bhargava's avatar
      modules, lock around setting of MODULE_STATE_UNFORMED · 8cff1403
      Prarit Bhargava authored
      commit d3051b48 upstream.
      
      A panic was seen in the following sitation.
      
      There are two threads running on the system. The first thread is a system
      monitoring thread that is reading /proc/modules. The second thread is
      loading and unloading a module (in this example I'm using my simple
      dummy-module.ko).  Note, in the "real world" this occurred with the qlogic
      driver module.
      
      When doing this, the following panic occurred:
      
       ------------[ cut here ]------------
       kernel BUG at kernel/module.c:3739!
       invalid opcode: 0000 [#1] SMP
       Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
       CPU: 37 PID: 186343 Comm: cat Tainted: GF          O--------------   3.10.0+ #7
       Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
       task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
       RIP: 0010:[<ffffffff810d64c5>]  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
       RSP: 0018:ffff88080fa7fe18  EFLAGS: 00010246
       RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
       RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
       RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
       R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
       R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
       FS:  00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
        ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
        ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
       Call Trace:
        [<ffffffff810d666c>] m_show+0x19c/0x1e0
        [<ffffffff811e4d7e>] seq_read+0x16e/0x3b0
        [<ffffffff812281ed>] proc_reg_read+0x3d/0x80
        [<ffffffff811c0f2c>] vfs_read+0x9c/0x170
        [<ffffffff811c1a58>] SyS_read+0x58/0xb0
        [<ffffffff81605829>] system_call_fastpath+0x16/0x1b
       Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
       RIP  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
        RSP <ffff88080fa7fe18>
      
          Consider the two processes running on the system.
      
          CPU 0 (/proc/modules reader)
          CPU 1 (loading/unloading module)
      
          CPU 0 opens /proc/modules, and starts displaying data for each module by
          traversing the modules list via fs/seq_file.c:seq_open() and
          fs/seq_file.c:seq_read().  For each module in the modules list, seq_read
          does
      
                  op->start()  <-- this is a pointer to m_start()
                  op->show()   <- this is a pointer to m_show()
                  op->stop()   <-- this is a pointer to m_stop()
      
          The m_start(), m_show(), and m_stop() module functions are defined in
          kernel/module.c. The m_start() and m_stop() functions acquire and release
          the module_mutex respectively.
      
          ie) When reading /proc/modules, the module_mutex is acquired and released
          for each module.
      
          m_show() is called with the module_mutex held.  It accesses the module
          struct data and attempts to write out module data.  It is in this code
          path that the above BUG_ON() warning is encountered, specifically m_show()
          calls
      
          static char *module_flags(struct module *mod, char *buf)
          {
                  int bx = 0;
      
                  BUG_ON(mod->state == MODULE_STATE_UNFORMED);
          ...
      
          The other thread, CPU 1, in unloading the module calls the syscall
          delete_module() defined in kernel/module.c.  The module_mutex is acquired
          for a short time, and then released.  free_module() is called without the
          module_mutex.  free_module() then sets mod->state = MODULE_STATE_UNFORMED,
          also without the module_mutex.  Some additional code is called and then the
          module_mutex is reacquired to remove the module from the modules list:
      
              /* Now we can delete it from the lists */
              mutex_lock(&module_mutex);
              stop_machine(__unlink_module, mod, NULL);
              mutex_unlock(&module_mutex);
      
      This is the sequence of events that leads to the panic.
      
      CPU 1 is removing dummy_module via delete_module().  It acquires the
      module_mutex, and then releases it.  CPU 1 has NOT set dummy_module->state to
      MODULE_STATE_UNFORMED yet.
      
      CPU 0, which is reading the /proc/modules, acquires the module_mutex and
      acquires a pointer to the dummy_module which is still in the modules list.
      CPU 0 calls m_show for dummy_module.  The check in m_show() for
      MODULE_STATE_UNFORMED passed for dummy_module even though it is being
      torn down.
      
      Meanwhile CPU 1, which has been continuing to remove dummy_module without
      holding the module_mutex, now calls free_module() and sets
      dummy_module->state to MODULE_STATE_UNFORMED.
      
      CPU 0 now calls module_flags() with dummy_module and ...
      
      static char *module_flags(struct module *mod, char *buf)
      {
              int bx = 0;
      
              BUG_ON(mod->state == MODULE_STATE_UNFORMED);
      
      and BOOM.
      
      Acquire and release the module_mutex lock around the setting of
      MODULE_STATE_UNFORMED in the teardown path, which should resolve the
      problem.
      
      Testing: In the unpatched kernel I can panic the system within 1 minute by
      doing
      
      while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done
      
      and
      
      while (true) do cat /proc/modules; done
      
      in separate terminals.
      
      In the patched kernel I was able to run just over one hour without seeing
      any issues.  I also verified the output of panic via sysrq-c and the output
      of /proc/modules looks correct for all three states for the dummy_module.
      
              dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
              dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
              dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8cff1403
    • Eric W. Biederman's avatar
      mnt: Prevent pivot_root from creating a loop in the mount tree · 21a9498d
      Eric W. Biederman authored
      commit 0d082601 upstream.
      
      Andy Lutomirski recently demonstrated that when chroot is used to set
      the root path below the path for the new ``root'' passed to pivot_root
      the pivot_root system call succeeds and leaks mounts.
      
      In examining the code I see that starting with a new root that is
      below the current root in the mount tree will result in a loop in the
      mount tree after the mounts are detached and then reattached to one
      another.  Resulting in all kinds of ugliness including a leak of that
      mounts involved in the leak of the mount loop.
      
      Prevent this problem by ensuring that the new mount is reachable from
      the current root of the mount tree.
      
      [Added stable cc.  Fixes CVE-2014-7970.  --Andy]
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Reviewed-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.orgSigned-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      21a9498d
    • Ilya Dryomov's avatar
      libceph: ceph-msgr workqueue needs a resque worker · 64982ef8
      Ilya Dryomov authored
      commit f9865f06 upstream.
      
      Commit f363e45f ("net/ceph: make ceph_msgr_wq non-reentrant")
      effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq.  This is
      wrong - libceph is very much a memory reclaim path, so restore it.
      Signed-off-by: default avatarIlya Dryomov <idryomov@redhat.com>
      Tested-by: default avatarMicha Krause <micha@krausam.de>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      64982ef8
    • Takashi Iwai's avatar
      ALSA: emu10k1: Fix deadlock in synth voice lookup · cfec8447
      Takashi Iwai authored
      commit 95926035 upstream.
      
      The emu10k1 voice allocator takes voice_lock spinlock.  When there is
      no empty stream available, it tries to release a voice used by synth,
      and calls get_synth_voice.  The callback function,
      snd_emu10k1_synth_get_voice(), however, also takes the voice_lock,
      thus it deadlocks.
      
      The fix is simply removing the voice_lock holds in
      snd_emu10k1_synth_get_voice(), as this is always called in the
      spinlock context.
      Reported-and-tested-by: default avatarArthur Marsh <arthur.marsh@internode.on.net>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cfec8447
    • Sasha Levin's avatar
      kernel: add support for gcc 5 · fbffc5ce
      Sasha Levin authored
      commit 71458cfc upstream.
      
      We're missing include/linux/compiler-gcc5.h which is required now
      because gcc branched off to v5 in trunk.
      
      Just copy the relevant bits out of include/linux/compiler-gcc4.h,
      no new code is added as of now.
      
      This fixes a build error when using gcc 5.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fbffc5ce
    • Thorsten Knabe's avatar
      um: ubd: Fix for processes stuck in D state forever · 103ed905
      Thorsten Knabe authored
      commit 2a236122 upstream.
      
      Starting with Linux 3.12 processes get stuck in D state forever in
      UserModeLinux under sync heavy workloads. This bug was introduced by
      commit 805f11a0 (um: ubd: Add REQ_FLUSH suppport).
      Fix bug by adding a check if FLUSH request was successfully submitted to
      the I/O thread and keeping the FLUSH request on the request queue on
      submission failures.
      
      Fixes: 805f11a0 (um: ubd: Add REQ_FLUSH suppport)
      Signed-off-by: default avatarThorsten Knabe <linux@thorsten-knabe.de>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      103ed905
    • Hans de Goede's avatar
      Input: i8042 - add noloop quirk for Asus X750LN · 1c1275de
      Hans de Goede authored
      commit 9ff84a17 upstream.
      
      Without this the aux port does not get detected, and consequently the
      touchpad will not work.
      
      https://bugzilla.redhat.com/show_bug.cgi?id=1110011Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1c1275de
    • Dmitry Torokhov's avatar
      Input: synaptics - gate forcepad support by DMI check · 5a4f7318
      Dmitry Torokhov authored
      commit aa972409 upstream.
      
      Unfortunately, ForcePad capability is not actually exported over PS/2, so
      we have to resort to DMI checks.
      Reported-by: default avatarNicole Faerber <nicole.faerber@kernelconcepts.de>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5a4f7318
    • Junxiao Bi's avatar
      mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set · 32304d12
      Junxiao Bi authored
      commit 934f3072 upstream.
      
      commit 21caf2fc ("mm: teach mm by current context info to not do I/O
      during memory allocation") introduces PF_MEMALLOC_NOIO flag to avoid doing
      I/O inside memory allocation, __GFP_IO is cleared when this flag is set,
      but __GFP_FS implies __GFP_IO, it should also be cleared.  Or it may still
      run into I/O, like in superblock shrinker.  And this will make the kernel
      run into the deadlock case described in that commit.
      
      See Dave Chinner's comment about io in superblock shrinker:
      
      Filesystem shrinkers do indeed perform IO from the superblock shrinker and
      have for years.  Even clean inodes can require IO before they can be freed
      - e.g.  on an orphan list, need truncation of post-eof blocks, need to
      wait for ordered operations to complete before it can be freed, etc.
      
      IOWs, Ext4, btrfs and XFS all can issue and/or block on arbitrary amounts
      of IO in the superblock shrinker context.  XFS, in particular, has been
      doing transactions and IO from the VFS inode cache shrinker since it was
      first introduced....
      
      Fix this by clearing __GFP_FS in memalloc_noio_flags(), this function has
      masked all the gfp_mask that will be passed into fs for the processes
      setting PF_MEMALLOC_NOIO in the direct reclaim path.
      
      v1 thread at: https://lkml.org/lkml/2014/9/3/32Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: joyce.xue <xuejiufei@huawei.com>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      32304d12
    • Yann Droneaud's avatar
      fanotify: enable close-on-exec on events' fd when requested in fanotify_init() · 1319526d
      Yann Droneaud authored
      commit 0b37e097 upstream.
      
      According to commit 80af2588 ("fanotify: groups can specify their
      f_flags for new fd"), file descriptors created as part of file access
      notification events inherit flags from the event_f_flags argument passed
      to syscall fanotify_init(2)[1].
      
      Unfortunately O_CLOEXEC is currently silently ignored.
      
      Indeed, event_f_flags are only given to dentry_open(), which only seems to
      care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
      open_check_o_direct() and O_LARGEFILE in generic_file_open().
      
      It's a pity, since, according to some lookup on various search engines and
      http://codesearch.debian.net/, there's already some userspace code which
      use O_CLOEXEC:
      
      - in systemd's readahead[2]:
      
          fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
      
      - in clsync[3]:
      
          #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
      
          int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
      
      - in examples [4] from "Filesystem monitoring in the Linux
        kernel" article[5] by Aleksander Morgado:
      
          if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                            O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
      
      Additionally, since commit 48149e9d ("fanotify: check file flags
      passed in fanotify_init").  having O_CLOEXEC as part of fanotify_init()
      second argument is expressly allowed.
      
      So it seems expected to set close-on-exec flag on the file descriptors if
      userspace is allowed to request it with O_CLOEXEC.
      
      But Andrew Morton raised[6] the concern that enabling now close-on-exec
      might break existing applications which ask for O_CLOEXEC but expect the
      file descriptor to be inherited across exec().
      
      In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
      descriptor returned as part of file access notify can break applications
      due to deadlock.  So close-on-exec is needed for most applications.
      
      More, applications asking for close-on-exec are likely expecting it to be
      enabled, relying on O_CLOEXEC being effective.  If not, it might weaken
      their security, as noted by Jan Kara[8].
      
      So this patch replaces call to macro get_unused_fd() by a call to function
      get_unused_fd_flags() with event_f_flags value as argument.  This way
      O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
      interpreted and close-on-exec get enabled when requested.
      
      [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
      [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
      [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
          https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
      [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
      [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
      [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
      [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
      [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz
      
      Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.comSigned-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Tested-by: default avatarHeinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Mihai Don\u021bu <mihai.dontu@gmail.com>
      Cc: Pádraig Brady <P@draigBrady.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
      Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1319526d
    • Mike Snitzer's avatar
      block: fix alignment_offset math that assumes io_min is a power-of-2 · db78229e
      Mike Snitzer authored
      commit b8839b8c upstream.
      
      The math in both blk_stack_limits() and queue_limit_alignment_offset()
      assume that a block device's io_min (aka minimum_io_size) is always a
      power-of-2.  Fix the math such that it works for non-power-of-2 io_min.
      
      This issue (of alignment_offset != 0) became apparent when testing
      dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
      1280K.  Commit fdfb4c8c ("dm thin: set minimum_io_size to pool's data
      block size") unlocked the potential for alignment_offset != 0 due to
      the dm-thin-pool's io_min possibly being a non-power-of-2.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      db78229e
    • Al Viro's avatar
      fix misuses of f_count() in ppp and netlink · f86b6802
      Al Viro authored
      commit 24dff96a upstream.
      
      we used to check for "nobody else could start doing anything with
      that opened file" by checking that refcount was 2 or less - one
      for descriptor table and one we'd acquired in fget() on the way to
      wherever we are.  That was race-prone (somebody else might have
      had a reference to descriptor table and do fget() just as we'd
      been checking) and it had become flat-out incorrect back when
      we switched to fget_light() on those codepaths - unlike fget(),
      it doesn't grab an extra reference unless the descriptor table
      is shared.  The same change allowed a race-free check, though -
      we are safe exactly when refcount is less than 2.
      
      It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
      to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
      netlink hadn't grown that check until 3.9 and ppp used to live
      in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
      well before that, though, and the same fix used to apply in old
      location of file.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f86b6802
    • Mikulas Patocka's avatar
      fs: make cont_expand_zero interruptible · 343952a7
      Mikulas Patocka authored
      commit c2ca0fcd upstream.
      
      This patch makes it possible to kill a process looping in
      cont_expand_zero. A process may spend a lot of time in this function, so
      it is desirable to be able to kill it.
      
      It happened to me that I wanted to copy a piece data from the disk to a
      file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
      to the "seek" parameter, dd attempted to extend the file and became stuck
      doing so - the only possibility was to reset the machine or wait many
      hours until the filesystem runs out of space and cont_expand_zero fails.
      We need this patch to be able to terminate the process.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      343952a7
    • Tetsuo Handa's avatar
      fs: Fix theoretical division by 0 in super_cache_scan(). · b92a2fc3
      Tetsuo Handa authored
      commit 475d0db7 upstream.
      
      total_objects could be 0 and is used as a denom.
      
      While total_objects is a "long", total_objects == 0 unlikely happens for
      3.12 and later kernels because 32-bit architectures would not be able to
      hold (1 << 32) objects. However, total_objects == 0 may happen for kernels
      between 3.1 and 3.11 because total_objects in prune_super() was an "int"
      and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b92a2fc3
    • Al Viro's avatar
      [jffs2] kill wbuf_queued/wbuf_dwork_lock · 383aa9d3
      Al Viro authored
      commit 99358a1c upstream.
      
      schedule_delayed_work() happening when the work is already pending is
      a cheap no-op.  Don't bother with ->wbuf_queued logics - it's both
      broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris)
      and pointless.  It's cheaper to let schedule_delayed_work() handle that
      case.
      Reported-by: default avatarJeff Harris <jefftharris@gmail.com>
      Tested-by: default avatarJeff Harris <jefftharris@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      383aa9d3
    • Ben Hutchings's avatar
      x86: Reject x32 executables if x32 ABI not supported · 86e384a4
      Ben Hutchings authored
      commit 0e6d3112 upstream.
      
      It is currently possible to execve() an x32 executable on an x86_64
      kernel that has only ia32 compat enabled.  However all its syscalls
      will fail, even _exit().  This usually causes it to segfault.
      
      Change the ELF compat architecture check so that x32 executables are
      rejected if we don't support the x32 ABI.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.ukSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      86e384a4
    • Benjamin Herrenschmidt's avatar
      drm/ast: Fix HW cursor image · 082046b1
      Benjamin Herrenschmidt authored
      commit 1e99cfa8 upstream.
      
      The translation from the X driver to the KMS one typo'ed a couple
      of array indices, causing the HW cursor to look weird (blocky with
      leaking edge colors). This fixes it.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      082046b1
    • Scott Carter's avatar
      pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller · fb7c68e9
      Scott Carter authored
      commit 37017ac6 upstream.
      
      The Broadcom OSB4 IDE Controller (vendor and device IDs: 1166:0211)
      does not support 64-KB DMA transfers.
      Whenever a 64-KB DMA transfer is attempted,
      the transfer fails and messages similar to the following
      are written to the console log:
      
         [ 2431.851125] sr 0:0:0:0: [sr0] Unhandled sense code
         [ 2431.851139] sr 0:0:0:0: [sr0]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
         [ 2431.851152] sr 0:0:0:0: [sr0]  Sense Key : Hardware Error [current]
         [ 2431.851166] sr 0:0:0:0: [sr0]  Add. Sense: Logical unit communication time-out
         [ 2431.851182] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 76 f4 00 00 40 00
         [ 2431.851210] end_request: I/O error, dev sr0, sector 121808
      
      When the libata and pata_serverworks modules
      are recompiled with ATA_DEBUG and ATA_VERBOSE_DEBUG defined in libata.h,
      the 64-KB transfer size in the scatter-gather list can be seen
      in the console log:
      
         [ 2664.897267] sr 9:0:0:0: [sr0] Send:
         [ 2664.897274] 0xf63d85e0
         [ 2664.897283] sr 9:0:0:0: [sr0] CDB:
         [ 2664.897288] Read(10): 28 00 00 00 7f b4 00 00 40 00
         [ 2664.897319] buffer = 0xf6d6fbc0, bufflen = 131072, queuecommand 0xf81b7700
         [ 2664.897331] ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 00 7f b4 00 00 40
         [ 2664.897338] ata_scsi_translate: ENTER
         [ 2664.897345] ata_sg_setup: ENTER, ata1
         [ 2664.897356] ata_sg_setup: 3 sg elements mapped
         [ 2664.897364] ata_bmdma_fill_sg: PRD[0] = (0x66FD2000, 0xE000)
         [ 2664.897371] ata_bmdma_fill_sg: PRD[1] = (0x65000000, 0x10000)
         ------------------------------------------------------> =======
         [ 2664.897378] ata_bmdma_fill_sg: PRD[2] = (0x66A10000, 0x2000)
         [ 2664.897386] ata1: ata_dev_select: ENTER, device 0, wait 1
         [ 2664.897422] ata_sff_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0xFC
         [ 2664.897428] ata_sff_tf_load: device 0xA0
         [ 2664.897448] ata_sff_exec_command: ata1: cmd 0xA0
         [ 2664.897457] ata_scsi_translate: EXIT
         [ 2664.897462] leaving scsi_dispatch_cmnd()
         [ 2664.897497] Doing sr request, dev = sr0, block = 0
         [ 2664.897507] sr0 : reading 64/256 512 byte blocks.
         [ 2664.897553] ata_sff_hsm_move: ata1: protocol 7 task_state 1 (dev_stat 0x58)
         [ 2664.897560] atapi_send_cdb: send cdb
         [ 2666.910058] ata_bmdma_port_intr: ata1: host_stat 0x64
         [ 2666.910079] __ata_sff_port_intr: ata1: protocol 7 task_state 3
         [ 2666.910093] ata_sff_hsm_move: ata1: protocol 7 task_state 3 (dev_stat 0x51)
         [ 2666.910101] ata_sff_hsm_move: ata1: protocol 7 task_state 4 (dev_stat 0x51)
         [ 2666.910129] sr 9:0:0:0: [sr0] Done:
         [ 2666.910136] 0xf63d85e0 TIMEOUT
      
      lspci shows that the driver used for the Broadcom OSB4 IDE Controller is
      pata_serverworks:
      
         00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8e [Master SecP SecO PriP])
                 Flags: bus master, medium devsel, latency 64
                 [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
                 [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
                 I/O ports at 0170 [size=8]
                 I/O ports at 0374 [size=4]
                 I/O ports at 1440 [size=16]
                 Kernel driver in use: pata_serverworks
      
      The pata_serverworks driver supports five distinct device IDs,
      one being the OSB4 and the other four belonging to the CSB series.
      The CSB series appears to support 64-KB DMA transfers,
      as tests on a machine with an SAI2 motherboard
      containing a Broadcom CSB5 IDE Controller (vendor and device IDs: 1166:0212)
      showed no problems with 64-KB DMA transfers.
      
      This problem was first discovered when attempting to install openSUSE
      from a DVD on a machine with an STL2 motherboard.
      Using the pata_serverworks module,
      older releases of openSUSE will not install at all due to the timeouts.
      Releases of openSUSE prior to 11.3 can be installed by disabling
      the pata_serverworks module using the brokenmodules boot parameter,
      which causes the serverworks module to be used instead.
      Recent releases of openSUSE (12.2 and later) include better error recovery and
      will install, though very slowly.
      On all openSUSE releases, the problem can be recreated
      on a machine containing a Broadcom OSB4 IDE Controller
      by mounting an install DVD and running a command similar to the following:
      
         find /mnt -type f -print | xargs cat > /dev/null
      
      The patch below corrects the problem.
      Similar to the other ATA drivers that do not support 64-KB DMA transfers,
      the patch changes the ata_port_operations qc_prep vector to point to a routine
      that breaks any 64-KB segment into two 32-KB segments and
      changes the scsi_host_template sg_tablesize element to reduce by half
      the number of scatter/gather elements allowed.
      These two changes affect only the OSB4.
      Signed-off-by: default avatarScott Carter <ccscott@funsoft.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fb7c68e9
    • Daniel Mack's avatar
      ASoC: soc-dapm: fix use after free · a7c4cee0
      Daniel Mack authored
      commit e5092c96 upstream.
      
      Coverity spotted the following possible use-after-free condition in
      dapm_create_or_share_mixmux_kcontrol():
      
      If kcontrol is NULL, and (wname_in_long_name && kcname_in_long_name)
      validates to true, 'name' will be set to an allocated string, and be
      freed a few lines later via the 'long_name' alias. 'name', however,
      is used by dev_err() in case snd_ctl_add() fails.
      
      Fix this by adding a jump label that frees 'long_name' at the end of
      the function.
      Signed-off-by: default avatarDaniel Mack <daniel@zonque.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a7c4cee0
    • Andy Lutomirski's avatar
      x86_64, entry: Filter RFLAGS.NT on entry from userspace · f4e0b275
      Andy Lutomirski authored
      commit 8c7aa698 upstream.
      
      The NT flag doesn't do anything in long mode other than causing IRET
      to #GP.  Oddly, CPL3 code can still set NT using popf.
      
      Entry via hardware or software interrupt clears NT automatically, so
      the only relevant entries are fast syscalls.
      
      If user code causes kernel code to run with NT set, then there's at
      least some (small) chance that it could cause trouble.  For example,
      user code could cause a call to EFI code with NT set, and who knows
      what would happen?  Apparently some games on Wine sometimes do
      this (!), and, if an IRET return happens, they will segfault.  That
      segfault cannot be handled, because signal delivery fails, too.
      
      This patch programs the CPU to clear NT on entry via SYSCALL (both
      32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
      in software on entry via SYSENTER.
      
      To save a few cycles, this borrows a trick from Jan Beulich in Xen:
      it checks whether NT is set before trying to clear it.  As a result,
      it seems to have very little effect on SYSENTER performance on my
      machine.
      
      There's another minor bug fix in here: it looks like the CFI
      annotations were wrong if CONFIG_AUDITSYSCALL=n.
      
      Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
      
      I haven't touched anything on 32-bit kernels.
      
      The syscall mask change comes from a variant of this patch by Anish
      Bhatt.
      
      Note to stable maintainers: there is no known security issue here.
      A misguided program can set NT and cause the kernel to try and fail
      to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
      on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275Reported-by: default avatarAnish Bhatt <anish@chelsio.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.netSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f4e0b275
    • Chao Yu's avatar
      ecryptfs: avoid to access NULL pointer when write metadata in xattr · 94a5a180
      Chao Yu authored
      commit 35425ea2 upstream.
      
      Christopher Head 2014-06-28 05:26:20 UTC described:
      "I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
      in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      PGD d7840067 PUD b2c3c067 PMD 0
      Oops: 0002 [#1] SMP
      Modules linked in: nvidia(PO)
      CPU: 3 PID: 3566 Comm: bash Tainted: P           O 3.12.21-gentoo-r1 #2
      Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
      task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
      RIP: 0010:[<ffffffff8110eb39>]  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP: 0018:ffff8800bad71c10  EFLAGS: 00010246
      RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
      RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
      RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
      R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
      FS:  00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
      Stack:
      ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
      ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
      00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
      Call Trace:
      [<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
      [<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
      [<ffffffff81082c2c>] ? should_resched+0x5/0x23
      [<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
      [<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
      [<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
      [<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
      [<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
      [<ffffffff810fa2a3>] ? path_openat+0x224/0x472
      [<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
      [<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
      [<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
      [<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
      RIP  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP <ffff8800bad71c10>
      CR2: 0000000000000000
      ---[ end trace df9dba5f1ddb8565 ]---"
      
      If we create a file when we mount with ecryptfs_xattr_metadata option, we will
      encounter a crash in this path:
      ->ecryptfs_create
        ->ecryptfs_initialize_file
          ->ecryptfs_write_metadata
            ->ecryptfs_write_metadata_to_xattr
              ->ecryptfs_setxattr
                ->fsstack_copy_attr_all
      It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
      will be initialized when ecryptfs_initialize_file finish.
      
      So we should skip copying attr from lower inode when the value of ->d_inode is
      invalid.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      94a5a180
    • Alexey Khoroshilov's avatar
      dm log userspace: fix memory leak in dm_ulog_tfr_init failure path · 4cf4433f
      Alexey Khoroshilov authored
      commit 56ec16cb upstream.
      
      If cn_add_callback() fails in dm_ulog_tfr_init(), it does not
      deallocate prealloced memory but calls cn_del_callback().
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Reviewed-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4cf4433f