1. 01 Dec, 2014 40 commits
    • Zefan Li's avatar
      Linux 3.4.105 · 7fd7a446
      Zefan Li authored
      7fd7a446
    • Guillaume Nault's avatar
      l2tp: fix race while getting PMTU on PPP pseudo-wire · 4290973d
      Guillaume Nault authored
      commit eed4d839 upstream.
      
      Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.
      
      The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
      could return NULL if tunnel->sock->sk_dst_cache was reset just before the
      call, thus making dst_mtu() dereference a NULL pointer:
      
      [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
      [ 1937.664005] Oops: 0000 [#1] SMP
      [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
      [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
      [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
      [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
      [ 1937.664005] RIP: 0010:[<ffffffffa049db88>]  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
      [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
      [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
      [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
      [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
      [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
      [ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
      [ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
      [ 1937.664005] Stack:
      [ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
      [ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
      [ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
      [ 1937.664005] Call Trace:
      [ 1937.664005]  [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
      [ 1937.664005]  [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
      [ 1937.664005]  [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
      [ 1937.664005]  [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
      [ 1937.664005]  [<ffffffff81309196>] SYSC_connect+0x87/0xb1
      [ 1937.664005]  [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
      [ 1937.664005]  [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
      [ 1937.664005]  [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 1937.664005]  [<ffffffff8114c262>] ? spin_lock+0x9/0xb
      [ 1937.664005]  [<ffffffff813092b4>] SyS_connect+0x9/0xb
      [ 1937.664005]  [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
      [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
      [ 1937.664005] RIP  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005]  RSP <ffff8800c43c7de8>
      [ 1937.664005] CR2: 0000000000000020
      [ 1939.559375] ---[ end trace 82d44500f28f8708 ]---
      
      Fixes: f34c4a35 ("l2tp: take PMTU from tunnel UDP socket")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      4290973d
    • Narendra K's avatar
      ixgbevf: Prevent RX/TX statistics getting reset to zero · 9024225c
      Narendra K authored
      commit 93659763 upstream.
      
      The commit 4197aa7b implements 64 bit
      per ring statistics. But the driver resets the 'total_bytes' and
      'total_packets' from RX and TX rings in the RX and TX interrupt
      handlers to zero. This results in statistics being lost and user space
      reporting RX and TX statistics as zero. This patch addresses the
      issue by preventing the resetting of RX and TX ring statistics to
      zero.
      Signed-off-by: default avatarNarendra K <narendra_k@dell.com>
      Tested-by: default avatarSibai Li <sibai.li@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9024225c
    • Benjamin Poirier's avatar
      net: Do not enable tx-nocache-copy by default · 10d5d534
      Benjamin Poirier authored
      commit cdb3f4a3 upstream.
      
      There are many cases where this feature does not improve performance or even
      reduces it.
      
      For example, here are the results from tests that I've run using 3.12.6 on one
      Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
      from the Xeon, but they're similar on the i7. All numbers report the
      mean±stddev over 10 runs of 10s.
      
      1) latency tests similar to what is described in "c6e1a0d1 net: Allow no-cache
      copy from user on transmit"
      There is no statistically significant difference between tx-nocache-copy
      on/off.
      nic irqs spread out (one queue per cpu)
      
      200x netperf -r 1400,1
      tx-nocache-copy off
              692000±1000 tps
              50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
      tx-nocache-copy on
              693000±1000 tps
              50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7
      
      200x netperf -r 14000,14000
      tx-nocache-copy off
              86450±80 tps
              50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
      tx-nocache-copy on
              86110±60 tps
              50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
      
      2) single stream throughput tests
      tx-nocache-copy leads to higher service demand
      
                              throughput  cpu0        cpu1        demand
                              (Gb/s)      (Gcycle)    (Gcycle)    (cycle/B)
      
      nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)
      
      tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
      tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004
      
      nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)
      
      tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1     0.923±0.007
      tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009 0.958±0.002
      
      As a second example, here are some results from Eric Dumazet with latest
      net-next.
      tx-nocache-copy also leads to higher service demand
      
      (cpu is Intel(R) Xeon(R) CPU X5660  @ 2.80GHz)
      
      lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
      lpq83:~# perf stat ./netperf -H lpq84 -c
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
      
       87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000
      
       Performance counter stats for './netperf -H lpq84 -c':
      
             4282.648396 task-clock                #    0.423 CPUs utilized
                   9,348 context-switches          #    0.002 M/sec
                      88 CPU-migrations            #    0.021 K/sec
                     355 page-faults               #    0.083 K/sec
          11,812,797,651 cycles                    #    2.758 GHz                     [82.79%]
           9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle    [82.54%]
           4,579,889,681 stalled-cycles-backend    #   38.77% backend  cycles idle    [67.33%]
           6,053,172,792 instructions              #    0.51  insns per cycle
                                                   #    1.49  stalled cycles per insn [83.64%]
             597,275,583 branches                  #  139.464 M/sec                   [83.70%]
               8,960,541 branch-misses             #    1.50% of all branches         [83.65%]
      
            10.128990264 seconds time elapsed
      
      lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
      lpq83:~# perf stat ./netperf -H lpq84 -c
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
      
       87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000
      
       Performance counter stats for './netperf -H lpq84 -c':
      
             2847.375441 task-clock                #    0.281 CPUs utilized
                  11,632 context-switches          #    0.004 M/sec
                      49 CPU-migrations            #    0.017 K/sec
                     354 page-faults               #    0.124 K/sec
           7,646,889,749 cycles                    #    2.686 GHz                     [83.34%]
           6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle    [83.31%]
           1,726,460,071 stalled-cycles-backend    #   22.58% backend  cycles idle    [66.55%]
           2,079,702,453 instructions              #    0.27  insns per cycle
                                                   #    2.94  stalled cycles per insn [83.22%]
             363,773,213 branches                  #  127.757 M/sec                   [83.29%]
               4,242,732 branch-misses             #    1.17% of all branches         [83.51%]
      
            10.128449949 seconds time elapsed
      
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: default avatarBenjamin Poirier <bpoirier@suse.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      10d5d534
    • Hannes Frederic Sowa's avatar
      ipv6: reuse ip6_frag_id from ip6_ufo_append_data · c58038a1
      Hannes Frederic Sowa authored
      commit 916e4cf4 upstream.
      
      Currently we generate a new fragmentation id on UFO segmentation. It
      is pretty hairy to identify the correct net namespace and dst there.
      Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
      available at all.
      
      This causes unreliable or very predictable ipv6 fragmentation id
      generation while segmentation.
      
      Luckily we already have pregenerated the ip6_frag_id in
      ip6_ufo_append_data and can use it here.
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust filename, indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c58038a1
    • Ben Hutchings's avatar
      rtl8192ce: Fix null dereference in watchdog · 4b7f15c2
      Ben Hutchings authored
      Dmitry Semyonov reported that after upgrading from 3.2.54 to
      3.2.57 the rtl8192ce driver will crash when its interface is brought
      up.  The oops message shows:
      
      [ 1833.611397] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [ 1833.611455] IP: [<ffffffffa0410c6a>] rtl92ce_update_hal_rate_tbl+0x29/0x4db [rtl8192ce]
      ...
      [ 1833.613326] Call Trace:
      [ 1833.613346]  [<ffffffffa02ad9c6>] ? rtl92c_dm_watchdog+0xd0b/0xec9 [rtl8192c_common]
      [ 1833.613391]  [<ffffffff8105b5cf>] ? process_one_work+0x161/0x269
      [ 1833.613425]  [<ffffffff8105c598>] ? worker_thread+0xc2/0x145
      [ 1833.613458]  [<ffffffff8105c4d6>] ? manage_workers.isra.25+0x15b/0x15b
      [ 1833.613496]  [<ffffffff8105f6d9>] ? kthread+0x76/0x7e
      [ 1833.613527]  [<ffffffff81356b74>] ? kernel_thread_helper+0x4/0x10
      [ 1833.613563]  [<ffffffff8105f663>] ? kthread_worker_fn+0x139/0x139
      [ 1833.613598]  [<ffffffff81356b70>] ? gs_change+0x13/0x13
      
      Disassembly of rtl92ce_update_hal_rate_tbl() shows that the 'sta'
      parameter was null.  None of the changes to the rtlwifi family between
      3.2.54 and 3.2.57 seem to directly cause this, and reverting commit
      f78bccd7 ('rtlwifi: rtl8192ce: Fix too long disable of IRQs')
      doesn't fix it.
      
      rtl92c_dm_watchdog() calls rtl92ce_update_hal_rate_tbl() via
      rtl92c_dm_refresh_rate_adaptive_mask(), which does not appear in the
      call trace as it was inlined.  That function has been completely
      removed upstream which may explain why this crash wasn't seen there.
      
      I'm not sure that it is sensible to completely remove
      rtl92c_dm_refresh_rate_adaptive_mask() without making other
      compensating changes elsewhere, so try to work around this for 3.2 by
      checking for a null pointer in rtl92c_dm_refresh_rate_adaptive_mask()
      and then skipping the call to rtl92ce_update_hal_rate_tbl().
      
      References: https://bugs.debian.org/745137
      References: https://bugs.debian.org/745462Reported-by: default avatarDmitry Semyonov <linulin@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Chaoming Li <chaoming_li@realsil.com.cn>
      Cc: Satoshi IWAMOTO <satoshi.iwamoto@nifty.ne.jp>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      4b7f15c2
    • Marcelo Ricardo Leitner's avatar
      ipv4: disable bh while doing route gc · 50f1b3d5
      Marcelo Ricardo Leitner authored
      Further tests revealed that after moving the garbage collector to a work
      queue and protecting it with a spinlock may leave the system prone to
      soft lockups if bottom half gets very busy.
      
      It was reproced with a set of firewall rules that REJECTed packets. If
      the NIC bottom half handler ends up running on the same CPU that is
      running the garbage collector on a very large cache, the garbage
      collector will not be able to do its job due to the amount of work
      needed for handling the REJECTs and also won't reschedule.
      
      The fix is to disable bottom half during the garbage collecting, as it
      already was in the first place (most calls to it came from softirqs).
      Signed-off-by: default avatarMarcelo Ricardo Leitner <mleitner@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      50f1b3d5
    • Marcelo Ricardo Leitner's avatar
      ipv4: avoid parallel route cache gc executions · b54ca608
      Marcelo Ricardo Leitner authored
      When rt_intern_hash() has to deal with neighbour cache overflowing,
      it triggers the route cache garbage collector in an attempt to free
      some references on neighbour entries.
      
      Such call cannot be done async but should also not run in parallel with
      an already-running one, so that they don't collapse fighting over the
      hash lock entries.
      
      This patch thus blocks parallel executions with spinlocks:
      - A call from worker and from rt_intern_hash() are not the same, and
      cannot be merged, thus they will wait each other on rt_gc_lock.
      - Calls to gc from rt_intern_hash() may happen in parallel but we must
      wait for it to finish in order to try again. This dedup and
      synchrinozation is then performed by the locking just before calling
      __do_rt_garbage_collect().
      Signed-off-by: default avatarMarcelo Ricardo Leitner <mleitner@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b54ca608
    • Marcelo Ricardo Leitner's avatar
      ipv4: move route garbage collector to work queue · b6153ead
      Marcelo Ricardo Leitner authored
      Currently the route garbage collector gets called by dst_alloc() if it
      have more entries than the threshold. But it's an expensive call, that
      don't really need to be done by then.
      
      Another issue with current way is that it allows running the garbage
      collector with the same start parameters on multiple CPUs at once, which
      is not optimal. A system may even soft lockup if the cache is big enough
      as the garbage collectors will be fighting over the hash lock entries.
      
      This patch thus moves the garbage collector to run asynchronously on a
      work queue, much similar to how rt_expire_check runs.
      
      There is one condition left that allows multiple executions, which is
      handled by the next patch.
      Signed-off-by: default avatarMarcelo Ricardo Leitner <mleitner@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b6153ead
    • James Bottomley's avatar
      Fix spurious request sense in error handling · 23dd72fe
      James Bottomley authored
      commit d555a2ab upstream.
      
      We unconditionally execute scsi_eh_get_sense() to make sure all failed
      commands that should have sense attached, do.  However, the routine forgets
      that some commands, because of the way they fail, will not have any sense code
      ... we should not bother them with a REQUEST_SENSE command.  Fix this by
      testing to see if we actually got a CHECK_CONDITION return and skip asking for
      sense if we don't.
      Tested-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      23dd72fe
    • Mikulas Patocka's avatar
      dm crypt: fix access beyond the end of allocated space · be73621b
      Mikulas Patocka authored
      commit d49ec52f upstream.
      
      The DM crypt target accesses memory beyond allocated space resulting in
      a crash on 32 bit x86 systems.
      
      This bug is very old (it dates back to 2.6.25 commit 3a7f6c99 "dm
      crypt: use async crypto").  However, this bug was masked by the fact
      that kmalloc rounds the size up to the next power of two.  This bug
      wasn't exposed until 3.17-rc1 commit 298a9fa0 ("dm crypt: use per-bio
      data").  By switching to using per-bio data there was no longer any
      padding beyond the end of a dm-crypt allocated memory block.
      
      To minimize allocation overhead dm-crypt puts several structures into one
      block allocated with kmalloc.  The block holds struct ablkcipher_request,
      cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
      struct dm_crypt_request and an initialization vector.
      
      The variable dmreq_start is set to offset of struct dm_crypt_request
      within this memory block.  dm-crypt allocates the block with this size:
      cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.
      
      When accessing the initialization vector, dm-crypt uses the function
      iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
      + 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).
      
      dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
      structure.  However, when dm-crypt accesses the initialization vector, it
      takes a pointer to the end of dm_crypt_request, aligns it, and then uses
      it as the initialization vector.  If the end of dm_crypt_request is not
      aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
      alignment causes the initialization vector to point beyond the allocated
      space.
      
      Fix this bug by calculating the variable iv_size_padding and adding it
      to the allocated size.
      
      Also correct the alignment of dm_crypt_request.  struct dm_crypt_request
      is specific to dm-crypt (it isn't used by the crypto subsystem at all),
      so it is aligned on __alignof__(struct dm_crypt_request).
      
      Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is
      aligned as if the block was allocated with kmalloc.
      Reported-by: default avatarKrzysztof Kolasa <kkolasa@winsoft.pl>
      Tested-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      [lizf: Backported by Mikulas]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      be73621b
    • Eric W. Biederman's avatar
      mnt: Only change user settable mount flags in remount · b47d65db
      Eric W. Biederman authored
      commit a6138db8 upstream.
      
      Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
      read-only bind mount read-only in a user namespace the
      MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
      to the remount a read-only mount read-write.
      
      Correct this by replacing the mask of mount flags to preserve
      with a mask of mount flags that may be changed, and preserve
      all others.   This ensures that any future bugs with this mask and
      remount will fail in an easy to detect way where new mount flags
      simply won't change.
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Francis Moreau <francis.moro@gmail.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b47d65db
    • Felipe Balbi's avatar
      bluetooth: hci_ldisc: fix deadlock condition · ae552f6b
      Felipe Balbi authored
      commit da64c27d upstream.
      
      LDISCs shouldn't call tty->ops->write() from within
      ->write_wakeup().
      
      ->write_wakeup() is called with port lock taken and
      IRQs disabled, tty->ops->write() will try to acquire
      the same port lock and we will deadlock.
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Reviewed-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Reported-by: default avatarHuang Shijie <b32955@freescale.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Tested-by: default avatarAndreas Bießmann <andreas@biessmann.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [tim.niemeyer@corscience.de: rebased on 3.4.103]
      Signed-off-by: default avatarTim Niemeyer <tim.niemeyer@corscience.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ae552f6b
    • Pawel Moll's avatar
      perf: Handle compat ioctl · 2f1eef27
      Pawel Moll authored
      commit b3f20785 upstream.
      
      When running a 32-bit userspace on a 64-bit kernel (eg. i386
      application on x86_64 kernel or 32-bit arm userspace on arm64
      kernel) some of the perf ioctls must be treated with special
      care, as they have a pointer size encoded in the command.
      
      For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded
      as 0x80042407, but 64-bit kernel will expect 0x80082407. In
      result the ioctl will fail returning -ENOTTY.
      
      This patch solves the problem by adding code fixing up the
      size as compat_ioctl file operation.
      Reported-by: default avatarDrew Richardson <drew.richardson@arm.com>
      Signed-off-by: default avatarPawel Moll <pawel.moll@arm.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      [lizf: Backported to 3.4 by David Ahern]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2f1eef27
    • Sergio Gelato's avatar
      NFS: fix stable regression · 69724a60
      Sergio Gelato authored
      BugLink: http://bugs.launchpad.net/bugs/1348670
      
      Fix regression introduced in pre-3.14 kernels by cherry-picking
      aa07c713
      (NFSD: Call ->set_acl with a NULL ACL structure if no entries).
      
      The affected code was removed in 3.14 by commit
      4ac7249e
      (nfsd: use get_acl and ->set_acl).
      The ->set_acl methods are already able to cope with a NULL argument.
      Signed-off-by: default avatarSergio Gelato <Sergio.Gelato@astro.su.se>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      69724a60
    • Theodore Ts'o's avatar
      ext4: avoid trying to kfree an ERR_PTR pointer · efdbbff6
      Theodore Ts'o authored
      commit a9cfcd63 upstream.
      
      Thanks to Dan Carpenter for extending smatch to find bugs like this.
      (This was found using a development version of smatch.)
      
      Fixes: 36de9286
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4:
      - s/new.bh/new_bh/
      - drop the change to ext4_cross_rename()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      efdbbff6
    • Theodore Ts'o's avatar
      ext4: propagate errors up to ext4_find_entry()'s callers · df22b9eb
      Theodore Ts'o authored
      commit 36de9286 upstream.
      
      If we run into some kind of error, such as ENOMEM, while calling
      ext4_getblk() or ext4_dx_find_entry(), we need to make sure this error
      gets propagated up to ext4_find_entry() and then to its callers.  This
      way, transient errors such as ENOMEM can get propagated to the VFS.
      This is important so that the system calls return the appropriate
      error, and also so that in the case of ext4_lookup(), we return an
      error instead of a NULL inode, since that will result in a negative
      dentry cache entry that will stick around long past the OOM condition
      which caused a transient ENOMEM error.
      
      Google-Bug-Id: #17142205
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4:
      - adjust context
      - s/old.bh/old_bh/g
      - s/new.bh/new_bh/g
      - drop the changes to ext4_find_delete_entry() and ext4_cross_rename()
      - add return value check for one more exr4_find_entry() in ext4_rename()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      df22b9eb
    • Johannes Berg's avatar
      nl80211: clear skb cb before passing to netlink · 50d6b91a
      Johannes Berg authored
      commit bd8c78e7 upstream.
      
      In testmode and vendor command reply/event SKBs we use the
      skb cb data to store nl80211 parameters between allocation
      and sending. This causes the code for CONFIG_NETLINK_MMAP
      to get confused, because it takes ownership of the skb cb
      data when the SKB is handed off to netlink, and it doesn't
      explicitly clear it.
      
      Clear the skb cb explicitly when we're done and before it
      gets passed to netlink to avoid this issue.
      Reported-by: default avatarAssaf Azulay <assaf.azulay@intel.com>
      Reported-by: default avatarDavid Spinadel <david.spinadel@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      50d6b91a
    • Jens Axboe's avatar
      genhd: fix leftover might_sleep() in blk_free_devt() · 62db31b5
      Jens Axboe authored
      commit 46f341ff upstream.
      
      Commit 2da78092 changed the locking from a mutex to a spinlock,
      so we now longer sleep in this context. But there was a leftover
      might_sleep() in there, which now triggers since we do the final
      free from an RCU callback. Get rid of it.
      Reported-by: default avatarPontus Fuchs <pontus.fuchs@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      62db31b5
    • Josh Triplett's avatar
      init/Kconfig: Hide printk log config if CONFIG_PRINTK=n · 48dd9ce7
      Josh Triplett authored
      commit 361e9dfb upstream.
      
      The buffers sized by CONFIG_LOG_BUF_SHIFT and
      CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't
      ask about their size at all.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Acked-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      [lizf: Backported to 3.4:
       - drop the change to CONFIG_LOG_CPU_MAX_BUF_SHIFT as it doesn't exist in 3.4]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      48dd9ce7
    • Peter Zijlstra's avatar
      perf: fix perf bug in fork() · 232beb6e
      Peter Zijlstra authored
      commit 6c72e350 upstream.
      
      Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
      calling perf_event_free_task() when failing sched_fork() we will not yet
      have done the memset() on ->perf_event_ctxp[] and will therefore try and
      'free' the inherited contexts, which are still in use by the parent
      process.  This is bad..
      Suggested-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarSylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      232beb6e
    • Mel Gorman's avatar
      mm: migrate: Close race between migration completion and mprotect · ea38cd41
      Mel Gorman authored
      commit d3cb8bf6 upstream.
      
      A migration entry is marked as write if pte_write was true at the time the
      entry was created. The VMA protections are not double checked when migration
      entries are being removed as mprotect marks write-migration-entries as
      read. It means that potentially we take a spurious fault to mark PTEs write
      again but it's straight-forward. However, there is a race between write
      migrations being marked read and migrations finishing. This potentially
      allows a PTE to be write that should have been read. Close this race by
      double checking the VMA permissions using maybe_mkwrite when migration
      completes.
      
      [torvalds@linux-foundation.org: use maybe_mkwrite]
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ea38cd41
    • Xiubo Li's avatar
      ASoC: core: fix possible ZERO_SIZE_PTR pointer dereferencing error. · e5b74135
      Xiubo Li authored
      commit 6596aa04 upstream.
      
      Since we cannot make sure the 'params->num_regs' will always be none
      zero here, and then if it equals to zero, the kmemdup() will return
      ZERO_SIZE_PTR, which equals to ((void *)16).
      
      So this patch fix this with just doing the zero check before calling
      kmemdup().
      Signed-off-by: default avatarXiubo Li <Li.Xiubo@freescale.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e5b74135
    • Robin Murphy's avatar
      ARM: 8165/1: alignment: don't break misaligned NEON load/store · 3efcfa54
      Robin Murphy authored
      commit 5ca918e5 upstream.
      
      The alignment fixup incorrectly decodes faulting ARM VLDn/VSTn
      instructions (where the optional alignment hint is given but incorrect)
      as LDR/STR, leading to register corruption. Detect these and correctly
      treat them as unhandled, so that userspace gets the fault it expects.
      Reported-by: default avatarSimon Hosie <simon.hosie@arm.com>
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      3efcfa54
    • Miklos Szeredi's avatar
      shmem: fix nlink for rename overwrite directory · 8c3c6b9e
      Miklos Szeredi authored
      commit b928095b upstream.
      
      If overwriting an empty directory with rename, then need to drop the extra
      nlink.
      
      Test prog:
      
      #include <stdio.h>
      #include <fcntl.h>
      #include <err.h>
      #include <sys/stat.h>
      
      int main(void)
      {
      	const char *test_dir1 = "test-dir1";
      	const char *test_dir2 = "test-dir2";
      	int res;
      	int fd;
      	struct stat statbuf;
      
      	res = mkdir(test_dir1, 0777);
      	if (res == -1)
      		err(1, "mkdir(\"%s\")", test_dir1);
      
      	res = mkdir(test_dir2, 0777);
      	if (res == -1)
      		err(1, "mkdir(\"%s\")", test_dir2);
      
      	fd = open(test_dir2, O_RDONLY);
      	if (fd == -1)
      		err(1, "open(\"%s\")", test_dir2);
      
      	res = rename(test_dir1, test_dir2);
      	if (res == -1)
      		err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2);
      
      	res = fstat(fd, &statbuf);
      	if (res == -1)
      		err(1, "fstat(%i)", fd);
      
      	if (statbuf.st_nlink != 0) {
      		fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink);
      		return 1;
      	}
      
      	return 0;
      }
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      8c3c6b9e
    • Joseph Qi's avatar
      ocfs2/dlm: do not get resource spinlock if lockres is new · 8b467510
      Joseph Qi authored
      commit 5760a97c upstream.
      
      There is a deadlock case which reported by Guozhonghua:
        https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html
      
      This case is caused by &res->spinlock and &dlm->master_lock
      misordering in different threads.
      
      It was introduced by commit 8d400b81 ("ocfs2/dlm: Clean up refmap
      helpers").  Since lockres is new, it doesn't not require the
      &res->spinlock.  So remove it.
      
      Fixes: 8d400b81 ("ocfs2/dlm: Clean up refmap helpers")
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Reviewed-by: default avatarjoyce.xue <xuejiufei@huawei.com>
      Reported-by: default avatarGuozhonghua <guozhonghua@h3c.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      8b467510
    • Andreas Rohner's avatar
      nilfs2: fix data loss with mmap() · 78d8eefd
      Andreas Rohner authored
      commit 56d7acc7 upstream.
      
      This bug leads to reproducible silent data loss, despite the use of
      msync(), sync() and a clean unmount of the file system.  It is easily
      reproducible with the following script:
      
        ----------------[BEGIN SCRIPT]--------------------
        mkfs.nilfs2 -f /dev/sdb
        mount /dev/sdb /mnt
      
        dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
      
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
      
        /root/mmaptest/mmaptest /mnt/testfile 30 10 5
      
        sync
        CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
        umount /mnt
      
        echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
        echo "AFTER MMAP:\t$CHECKSUM_AFTER"
        echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
        ----------------[END SCRIPT]--------------------
      
      The mmaptest tool looks something like this (very simplified, with
      error checking removed):
      
        ----------------[BEGIN mmaptest]--------------------
        data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, file_offset);
      
        for (i = 0; i < write_count; ++i) {
              memcpy(data + i * 4096, buf, sizeof(buf));
              msync(data, file_size - file_offset, MS_SYNC))
        }
        ----------------[END mmaptest]--------------------
      
      The output of the script looks something like this:
      
        BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
        AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
        AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
      
      So it is clear, that the changes done using mmap() do not survive a
      remount.  This can be reproduced a 100% of the time.  The problem was
      introduced in commit 136e8770 ("nilfs2: fix issue of
      nilfs_set_page_dirty() for page at EOF boundary").
      
      If the page was read with mpage_readpage() or mpage_readpages() for
      example, then it has no buffers attached to it.  In that case
      page_has_buffers(page) in nilfs_set_page_dirty() will be false.
      Therefore nilfs_set_file_dirty() is never called and the pages are never
      collected and never written to disk.
      
      This patch fixes the problem by also calling nilfs_set_file_dirty() if the
      page has no buffers attached to it.
      
      [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
      Signed-off-by: default avatarAndreas Rohner <andreas.rohner@gmx.net>
      Tested-by: default avatarAndreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      78d8eefd
    • Markos Chandras's avatar
      MIPS: mcount: Adjust stack pointer for static trace in MIPS32 · 504c4611
      Markos Chandras authored
      commit 8a574cfa upstream.
      
      Every mcount() call in the MIPS 32-bit kernel is done as follows:
      
      [...]
      move at, ra
      jal _mcount
      addiu sp, sp, -8
      [...]
      
      but upon returning from the mcount() function, the stack pointer
      is not adjusted properly. This is explained in details in 58b69401
      (MIPS: Function tracer: Fix broken function tracing).
      
      Commit ad8c3969 ("MIPS: Unbreak function tracer for 64-bit kernel.)
      fixed the stack manipulation for 64-bit but it didn't fix it completely
      for MIPS32.
      Signed-off-by: default avatarMarkos Chandras <markos.chandras@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/7792/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      504c4611
    • Zefan Li's avatar
      cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags · 4fae6cca
      Zefan Li authored
      commit 2ad654bc upstream.
      
      When we change cpuset.memory_spread_{page,slab}, cpuset will flip
      PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
      This should be done using atomic bitops, but currently we don't,
      which is broken.
      
      Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
      when one thread tried to clear PF_USED_MATH while at the same time another
      thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
      the same task.
      
      Here's the full report:
      https://lkml.org/lkml/2014/9/19/230
      
      To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
      
      v4:
      - updated mm/slab.c. (Fengguang Wu)
      - updated Documentation.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Fixes: 950592f7 ("cpusets: update tasks' page/slab spread flags in time")
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      [lizf: Backported to 3.4:
       - adjust context
       - check current->flags & PF_MEMPOLICY rather than current->mempolicy]
      4fae6cca
    • Zefan Li's avatar
      sched: add macros to define bitops for task atomic flags · 699c06b3
      Zefan Li authored
      commit e0e5070b upstream.
      
      This will simplify code when we add new flags.
      
      v3:
      - Kees pointed out that no_new_privs should never be cleared, so we
      shouldn't define task_clear_no_new_privs(). we define 3 macros instead
      of a single one.
      
      v2:
      - updated scripts/tags.sh, suggested by Peter
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      [lizf: Backported to 3.4:
       - adjust context
       - remove no_new_priv code
       - add atomic_flags to struct task_struct]
      699c06b3
    • Wanpeng Li's avatar
      sched: Fix unreleased llc_shared_mask bit during CPU hotplug · f4d8504c
      Wanpeng Li authored
      commit 03bd4e1f upstream.
      
      The following bug can be triggered by hot adding and removing a large number of
      xen domain0's vcpus repeatedly:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
      	PGD 5a9d5067 PUD 13067 PMD 0
      	Oops: 0000 [#3] SMP
      	[...]
      	Call Trace:
      	load_balance
      	? _raw_spin_unlock_irqrestore
      	idle_balance
      	__schedule
      	schedule
      	schedule_timeout
      	? lock_timer_base
      	schedule_timeout_uninterruptible
      	msleep
      	lock_device_hotplug_sysfs
      	online_store
      	dev_attr_store
      	sysfs_write_file
      	vfs_write
      	SyS_write
      	system_call_fastpath
      
      Last level cache shared mask is built during CPU up and the
      build_sched_domain() routine takes advantage of it to setup
      the sched domain CPU topology.
      
      However, llc_shared_mask is not released during CPU disable,
      which leads to an invalid sched domainCPU topology.
      
      This patch fix it by releasing the llc_shared_mask correctly
      during CPU disable.
      
      Yasuaki also reported that this can happen on real hardware:
      
        https://lkml.org/lkml/2014/7/22/1018
      
      His case is here:
      
      	==
      	Here is an example on my system.
      	My system has 4 sockets and each socket has 15 cores and HT is
      	enabled. In this case, each core of sockes is numbered as
      	follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-44, 90-104
      	Socket#3 | 45-59, 105-119
      
      	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
      
      	It means that last level cache of Socket#2 is shared with
      	CPU#30-44 and 90-104.
      
      	When hot-removing socket#2 and #3, each core of sockets is
      	numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      
      	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
      	remains having 0x3fff80000001fffc0000000.
      
      	After that, when hot-adding socket#2 and #3, each core of
      	sockets is numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-59
      	Socket#3 | 90-119
      
      	Then llc_shared_mask of CPU#30 becomes
      	0x3fff8000fffffffc0000000. It means that last level cache of
      	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
      	the wrong value.
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: default avatarLinn Crosetto <linn@hp.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarToshi Kani <toshi.kani@hp.com>
      Reviewed-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f4d8504c
    • John David Anglin's avatar
      parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds · 6a77b9b1
      John David Anglin authored
      commit d26a7730 upstream.
      
      In spite of what the GCC manual says, the -mfast-indirect-calls has
      never been supported in the 64-bit parisc compiler. Indirect calls have
      always been done using function descriptors irrespective of the
      -mfast-indirect-calls option.
      
      Recently, it was noticed that a function descriptor was always requested
      when the -mfast-indirect-calls option was specified. This caused
      problems when the option was used in  application code and doesn't make
      any sense because the whole point of the option is to avoid using a
      function descriptor for indirect calls.
      
      Fixing this broke 64-bit kernel builds.
      
      I will fix GCC but for now we need the attached change. This results in
      the same kernel code as before.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6a77b9b1
    • Anton Altaparmakov's avatar
      Fix nasty 32-bit overflow bug in buffer i/o code. · 7433a2a7
      Anton Altaparmakov authored
      commit f2d5a944 upstream.
      
      On 32-bit architectures, the legacy buffer_head functions are not always
      handling the sector number with the proper 64-bit types, and will thus
      fail on 4TB+ disks.
      
      Any code that uses __getblk() (and thus bread(), breadahead(),
      sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
      block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop
      in __getblk_slow() with an infinite stream of errors logged to dmesg
      like this:
      
        __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
        b_state=0x00000020, b_size=512
        device sda1 blocksize: 512
      
      Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
      top 32-bits are missing (in this case the 0x1 at the top).
      
      This is because grow_dev_page() is broken and has a 32-bit overflow due
      to shifting the page index value (a pgoff_t - which is just 32 bits on
      32-bit architectures) left-shifted as the block number.  But the top
      bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit
      before the shift.
      
      This patch fixes this issue by type casting "index" to sector_t before
      doing the left shift.
      
      Note this is not a theoretical bug but has been seen in the field on a
      4TiB hard drive with logical sector size 512 bytes.
      
      This patch has been verified to fix the infinite loop problem on 3.17-rc5
      kernel using a 4TB disk image mounted using "-o loop".  Without this patch
      doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop
      100% reproducibly whilst with the patch it works fine as expected.
      Signed-off-by: default avatarAnton Altaparmakov <aia21@cantab.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7433a2a7
    • Clemens Ladisch's avatar
      ALSA: pcm: fix fifo_size frame calculation · d7bbf15e
      Clemens Ladisch authored
      commit a9960e6a upstream.
      
      The calculated frame size was wrong because snd_pcm_format_physical_width()
      actually returns the number of bits, not bytes.
      
      Use snd_pcm_format_size() instead, which not only returns bytes, but also
      simplifies the calculation.
      
      Fixes: 8bea869c ("ALSA: PCM midlevel: improve fifo_size handling")
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      d7bbf15e
    • David Dueck's avatar
      can: at91_can: add missing prepare and unprepare of the clock · 8fcf8e69
      David Dueck authored
      commit e77980e5 upstream.
      
      In order to make the driver work with the common clock framework, this patch
      converts the clk_enable()/clk_disable() to
      clk_prepare_enable()/clk_disable_unprepare(). While there, add the missing
      error handling.
      Signed-off-by: default avatarDavid Dueck <davidcdueck@googlemail.com>
      Signed-off-by: default avatarAnthony Harivel <anthony.harivel@emtrion.de>
      Acked-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      8fcf8e69
    • Marc Kleine-Budde's avatar
      can: flexcan: put TX mailbox into TX_INACTIVE mode after tx-complete · a125f73d
      Marc Kleine-Budde authored
      commit de594488 upstream.
      
      After sending a RTR frame the TX mailbox becomes a RX_EMPTY mailbox. To avoid
      side effects when the RX-FIFO is full, this patch puts the TX mailbox into
      TX_INACTIVE mode in the transmission complete interrupt handler. This, of
      course, leaves a race window between the actual completion of the transmission
      and the handling of tx-complete interrupt. However this is the best we can do
      without busy polling the tx complete interrupt.
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a125f73d
    • David Jander's avatar
      can: flexcan: implement workaround for errata ERR005829 · a6cdbaec
      David Jander authored
      commit 25e92445 upstream.
      
      This patch implements the workaround mentioned in ERR005829:
      
          ERR005829: FlexCAN: FlexCAN does not transmit a message that is enabled to
          be transmitted in a specific moment during the arbitration process.
      
      Workaround: The workaround consists of two extra steps after setting up a
      message for transmission:
      
      Step 8: Reserve the first valid mailbox as an inactive mailbox (CODE=0b1000).
      If RX FIFO is disabled, this mailbox must be message buffer 0. Otherwise, the
      first valid mailbox can be found using the "RX FIFO filters" table in the
      FlexCAN chapter of the chip reference manual.
      
      Step 9: Write twice INACTIVE code (0b1000) into the first valid mailbox.
      Signed-off-by: default avatarDavid Jander <david@protonic.nl>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a6cdbaec
    • David Jander's avatar
      can: flexcan: correctly initialize mailboxes · 12c6d3d6
      David Jander authored
      commit fc05b884 upstream.
      
      Apparently mailboxes may contain random data at startup, causing some of them
      being prepared for message reception. This causes overruns being missed or even
      confusing the IRQ check for trasmitted messages, increasing the transmit
      counter instead of the error counter.
      
      This patch initializes all mailboxes after the FIFO as RX_INACTIVE.
      Signed-off-by: default avatarDavid Jander <david@protonic.nl>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      12c6d3d6
    • Marc Kleine-Budde's avatar
      can: flexcan: mark TX mailbox as TX_INACTIVE · b06c98e5
      Marc Kleine-Budde authored
      commit c32fe4ad upstream.
      
      This patch fixes the initialization of the TX mailbox. It is now correctly
      initialized as TX_INACTIVE not RX_EMPTY.
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b06c98e5
    • Mark's avatar
      USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters · c228332a
      Mark authored
      commit c80b4495 upstream.
      
      This patch adds quirks for Entrega Technologies (later Xircom PortGear) USB-
      SCSI converters. They use Shuttle Technology EUSB-01/EUSB-S1 chips. The
      US_FL_SCM_MULT_TARG quirk is needed to allow multiple devices on the SCSI
      chain to be accessed. Without it only the (single) device with SCSI ID 0
      can be used.
      
      The standalone converter sold by Entrega had model number U1-SC25. Xircom
      acquired Entrega and re-branded the product line PortGear. The PortGear USB
      to SCSI Converter (model PGSCSI) is internally identical to the Entrega
      product, but later models may use a different USB ID. The Entrega-branded
      units have USB ID 1645:0007, as does my Xircom PGSCSI, but the Windows and
      Macintosh drivers also support 085A:0028.
      
      Entrega also sold the "Mac USB Dock", which provides two USB ports, a Mac
      (8-pin mini-DIN) serial port and a SCSI port. It appears to the computer as
      a four-port hub, USB-serial, and USB-SCSI converters. The USB-SCSI part may
      have initially used the same ID as the standalone U1-SC25 (1645:0007), but
      later production used 085A:0026.
      
      My Xircom PortGear PGSCSI has bcdDevice=0x0100. Units with bcdDevice=0x0133
      probably also exist.
      
      This patch adds quirks for 1645:0007, 085A:0026 and 085A:0028. The Windows
      driver INF file also mentions 085A:0032 "PortStation SCSI Module", but I
      couldn't find any mention of that actually existing in the wild; perhaps it
      was cancelled before release?
      Signed-off-by: default avatarMark Knibbs <markk@clara.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c228332a