1. 06 Mar, 2013 40 commits
    • usb hid quirks for Masterkit MA901 usb radio · 91668663
      Alexey Klimov authored
      commit 0322bd39 upstream.
      
      Don't let the Masterkit MA901 USB radio be handled by the USB HID drivers.
      This device will be handled by the radio-ma901.c driver.
      Signed-off-by: Alexey Klimov <klimov.linux@gmail.com>
      Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ata_piix: Add Device IDs for Intel Wellsburg PCH · 2a1b9ce3
      James Ralston authored
      commit 3aee8bc5 upstream.
      
      This patch adds the IDE-mode SATA Device IDs for the Intel Wellsburg PCH.
      Signed-off-by: James Ralston <james.d.ralston@intel.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ata_piix: IDE-mode SATA patch for Intel Avoton DeviceIDs · 117d2e32
      Seth Heasley authored
      commit aaa51527 upstream.
      
      This patch adds the IDE-mode SATA DeviceIDs for the Intel Avoton SOC.
      Signed-off-by: Seth Heasley <seth.heasley@intel.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ata_piix: Add Device IDs for Intel Lynx Point-LP PCH · d3493114
      James Ralston authored
      commit 389cd784 upstream.
      
      This patch adds the IDE-mode SATA Device IDs for the Intel Lynx Point-LP PCH.
      Signed-off-by: James Ralston <james.d.ralston@intel.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ata_piix: IDE-mode SATA patch for Intel DH89xxCC DeviceIDs · 745544cb
      Seth Heasley authored
      commit 96d5d96a upstream.
      
      This patch adds the IDE-mode SATA DeviceIDs for the Intel DH89xxCC PCH.
      Signed-off-by: Seth Heasley <seth.heasley@intel.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ata_piix: IDE-mode SATA patch for Intel Lynx Point DeviceIDs · 18e7222c
      Seth Heasley authored
      commit 78140cfe upstream.
      
      This patch adds the IDE-mode SATA DeviceIDs for the Intel Lynx Point PCH.
      Signed-off-by: Seth Heasley <seth.heasley@intel.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • pstore: Avoid deadlock in panic and emergency-restart path · 019c74a9
      Seiji Aguchi authored
      commit 9f244e9c upstream.
      
      [Issue]
      
      When pstore is in panic and emergency-restart paths, it may be blocked
      in those paths because it simply takes spin_lock.
      
      This is an example scenario in which pstore may hang in a panic path:
      
       - cpuA grabs psinfo->buf_lock
       - cpuB panics and calls smp_send_stop
       - smp_send_stop sends IRQ to cpuA
       - after 1 second, cpuB gives up on cpuA and sends an NMI instead
       - cpuA is now in an NMI handler while still holding buf_lock
       - cpuB is deadlocked
      
      This case may happen if the firmware has a bug and
      cpuA is stuck talking to it for more than one second.

      A similar scenario exists in an emergency-restart path:

       - cpuA grabs psinfo->buf_lock and gets stuck in firmware
       - cpuB kicks emergency-restart via either sysrq-b or the hangcheck timer.
         cpuB then deadlocks trying to take psinfo->buf_lock again.
      
      [Solution]
      
      This patch avoids the deadlocking issues in both panic and emergency_restart
      paths by introducing a function, is_non_blocking_path(), to check if a cpu
      can be blocked in current path.
      
      With this patch, pstore is no longer blocked in those paths even if
      another cpu has taken the spin_lock, because spin_lock_irqsave is
      replaced with spin_trylock_irqsave there.
      
      In addition, according to a comment of emergency_restart() in kernel/sys.c,
      spin_lock shouldn't be taken in an emergency_restart path to avoid
      deadlock. This patch fits the comment below.
      
      <snip>
      /**
       *      emergency_restart - reboot the system
       *
       *      Without shutting down any hardware or taking any locks
       *      reboot the system.  This is called when we know we are in
       *      trouble so this is our best effort to reboot.  This is
       *      safe to call in interrupt context.
       */
      void emergency_restart(void)
      <snip>
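
      The locking change described above can be sketched in user space. This is
      a minimal illustration under stated assumptions, not the pstore code:
      toy_spinlock, toy_dump() and the cannot_block flag are hypothetical names,
      and this sketch simply skips the write when the lock is busy, which may
      differ from what the real patch does after a failed trylock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for a kernel spinlock (hypothetical, user-space only). */
typedef struct { atomic_flag held; } toy_spinlock;

static bool toy_trylock(toy_spinlock *l)
{
    /* test_and_set returns the previous value: false means we took the lock. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void toy_lock(toy_spinlock *l)
{
    while (!toy_trylock(l))
        ; /* spin until the holder releases the lock */
}

static void toy_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static toy_spinlock buf_lock = { ATOMIC_FLAG_INIT };
static char buf[64];

/*
 * Write msg into the shared buffer. When cannot_block is true (think of a
 * panic or emergency-restart path), never spin: give up if the lock is held.
 * Returns true if the record was written under the lock.
 */
static bool toy_dump(const char *msg, bool cannot_block)
{
    if (cannot_block) {
        if (!toy_trylock(&buf_lock))
            return false; /* holder may never run again; do not deadlock */
    } else {
        toy_lock(&buf_lock);
    }
    strncpy(buf, msg, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    toy_unlock(&buf_lock);
    return true;
}
```

      With the lock already held by another context, toy_dump("panic", true)
      returns false immediately, whereas toy_dump("panic", false) would spin
      forever, which is the hang the commit message describes.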
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      [bwh: Backported to 3.2:
       - Adjust context
       - Add #include <linux/kmsg_dump.h>]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • staging: comedi: ni_labpc: set up command4 register *after* command3 · 56546c8c
      Ian Abbott authored
      commit 22056e2b upstream.
      
      Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
      meaningful output from a Lab-PC+ in differential mode for AI cmds, but
      AI insn reads gave correct readings.  He tracked it down to two
      problems, one of which is addressed by this patch.
      
      It seems that writing to the command3 register after writing to the
      command4 register in `labpc_ai_cmd()` messes up the differential
      reference bit setting in the command4 register.  Set up the command4
      register after the command3 register (as in `labpc_ai_rinsn()`) to avoid
      the problem.
      
      Thanks to Tuomas for suggesting the fix.
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • staging: comedi: ni_labpc: correct differential channel sequence for AI commands · 8e53d294
      Ian Abbott authored
      commit 4c4bc25d upstream.
      
      Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
      meaningful output from a Lab-PC+ in differential mode for AI cmds, but
      AI insn reads gave correct readings.  He tracked it down to two
      problems, one of which is addressed by this patch.
      
      It seems the setting of the channel bits for particular scanning modes
      was incorrect for differential mode.  (Only half the number of channels
      are available in differential mode; comedi refers to them as channels 0,
      1, 2 and 3, but the hardware documentation refers to them as channels 0,
      2, 4 and 6.)  In differential mode, the setting of the channel enable
      bits in the command1 register should depend on whether the scan enable
      bit is set.  Effectively, we need to double the comedi channel number
      when the scan enable bit is not set in differential mode.  The scan
      enable bit gets set when the AI scan mode is `MODE_MULT_CHAN_UP` or
      `MODE_MULT_CHAN_DOWN`, and gets cleared when the AI scan mode is
      `MODE_SINGLE_CHAN` or `MODE_SINGLE_CHAN_INTERVAL`.  The existing test
      for whether the comedi channel number needs to be doubled in
      differential mode is incorrect in `labpc_ai_cmd()`.  This patch corrects
      the test.
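
      The channel-numbering rule above can be sketched as follows. This is an
      illustration only: the enum reuses the mode names from the message, but
      scan_enabled() and hw_channel() are hypothetical helpers, and the actual
      command1 register encoding is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Scan modes named in the commit message; the numeric values are arbitrary here. */
enum scan_mode {
    MODE_SINGLE_CHAN,
    MODE_SINGLE_CHAN_INTERVAL,
    MODE_MULT_CHAN_UP,
    MODE_MULT_CHAN_DOWN
};

/* The scan enable bit is set only for the multi-channel scan modes. */
static bool scan_enabled(enum scan_mode mode)
{
    return mode == MODE_MULT_CHAN_UP || mode == MODE_MULT_CHAN_DOWN;
}

/*
 * Map a comedi channel number to the hardware channel number.
 * In differential mode comedi channels 0..3 correspond to hardware
 * channels 0, 2, 4, 6, so the number is doubled, but only when the
 * scan enable bit is not set.
 */
static unsigned int hw_channel(unsigned int comedi_chan, bool differential,
                               enum scan_mode mode)
{
    if (differential && !scan_enabled(mode))
        return comedi_chan * 2;
    return comedi_chan;
}
```

      For example, comedi channel 3 in differential MODE_SINGLE_CHAN maps to
      hardware channel 6, while the same channel in MODE_MULT_CHAN_UP stays 3.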
      
      Thanks to Tuomas for suggesting the fix.
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv6: use a stronger hash for tcp · 0fd0ff7e
      Eric Dumazet authored
      [ Upstream commit 08dcdbf6 ]
      
      It looks like it's possible to open thousands of TCP IPv6
      sessions on a server, all landing in a single slot of the TCP hash
      table. Incoming packets then have to look up sockets in a very
      long list.
      
      We should hash all bits from foreign IPv6 addresses, using
      a salt and hash mix, not a simple XOR.
      
      inet6_ehashfn() can also separately use the ports, instead
      of xoring them.
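
      To see why a plain XOR is weak, consider the sketch below. It is not the
      kernel's hash: xor_fold() and salted_mix() are hypothetical helpers, and
      the multiply-and-XOR mix is a simple FNV-style stand-in chosen only to
      show the effect of mixing every word with a salt.

```c
#include <assert.h>
#include <stdint.h>

/* An IPv6 address viewed as four 32-bit words. */
typedef struct { uint32_t w[4]; } addr6;

/* Weak hash: XOR of all words. Any reordering of the words collides. */
static uint32_t xor_fold(const addr6 *a)
{
    return a->w[0] ^ a->w[1] ^ a->w[2] ^ a->w[3];
}

/*
 * Illustrative salted mix (FNV-1a style over 32-bit words).
 * Each word changes the state before a multiply, so word order matters.
 */
static uint32_t salted_mix(const addr6 *a, uint32_t salt)
{
    uint32_t h = salt;
    for (int i = 0; i < 4; i++) {
        h ^= a->w[i];
        h *= 16777619u; /* FNV prime; odd, so the multiply is invertible */
    }
    return h;
}
```

      Two addresses whose words are merely swapped, such as {1, 0, 0, 0} and
      {0, 1, 0, 0}, land in the same slot under xor_fold() but get different
      values from salted_mix(), so an attacker cannot trivially pile sessions
      into one hash chain.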
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv4: fix a bug in ping_err(). · 52430c06
      Li Wei authored
      [ Upstream commit b531ed61 ]
      
      We should get 'type' and 'code' from the outer ICMP header.
      Signed-off-by: Li Wei <lw@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • xen-netback: cancel the credit timer when taking the vif down · db3521fa
      David Vrabel authored
      [ Upstream commit 3e55f8b3 ]
      
      If the credit timer is left armed after calling
      xen_netbk_remove_xenvif(), then it may fire and attempt to schedule
      the vif which will then oops as vif->netbk == NULL.
      
      This may happen both in the fatal error path and during normal
      disconnection from the front end.
      
      The sequencing during shutdown is critical to ensure that: a)
      vif->netbk doesn't become unexpectedly NULL; and b) the net device/vif
      is not freed.
      
      1. Mark as unschedulable (netif_carrier_off()).
      2. Synchronously cancel the timer.
      3. Remove the vif from the schedule list.
      4. Remove it from its netback thread group.
      5. Wait for vif->refcnt to become 0.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Reported-by: Christopher S. Aker <caker@theshore.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • xen-netback: correctly return errors from netbk_count_requests() · aa067cfc
      David Vrabel authored
      [ Upstream commit 35876b5f ]
      
      netbk_count_requests() could detect an error, call
      netbk_fatal_tx_error() but return 0.  The vif may then be used
      afterwards (e.g., in a call to netbk_tx_error()).
      
      Since netbk_fatal_tx_error() could set vif->refcnt to 1, the vif may
      be freed immediately after the call to netbk_fatal_tx_error() (e.g.,
      if the vif is also removed).
      
      Netback thread              Xenwatch thread
      -------------------------------------------
      netbk_fatal_tx_err()        netback_remove()
                                    xenvif_disconnect()
                                      ...
                                      free_netdev()
      netbk_tx_err() Oops!
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reported-by: Christopher S. Aker <caker@theshore.net>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • bridge: set priority of STP packets · 1642974c
      Stephen Hemminger authored
      [ Upstream commit 547b4e71 ]
      
      Spanning Tree Protocol packets should always have been marked as
      control packets; this causes them to get queued in the high-priority
      FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if the bridge
      gets overloaded and can't communicate. This is a long-standing bug dating
      back to the first versions of the Linux bridge.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • xen-pciback: rate limit error messages from xen_pcibk_enable_msi{,x}() · ecb1d58c
      Jan Beulich authored
      commit 51ac8893 upstream.
      
      ... as being guest triggerable (e.g. by invoking
      XEN_PCI_OP_enable_msi{,x} on a device not being MSI/MSI-X capable).
      
      This is CVE-2013-0231 / XSA-43.
      
      Also make the two messages uniform in both their wording and severity.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [bwh: Backported to 3.2: add #include <linux/ratelimited.h>, needed by
       printk_ratelimited()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • unbreak automounter support on 64-bit kernel with 32-bit userspace (v2) · f20a819c
      Helge Deller authored
      commit 4f4ffc3a upstream.
      
      automount-support is broken on the parisc architecture, because the existing
      #if list does not include a check for defined(__hppa__). The HPPA (parisc)
      architecture is similar to other 64bit Linux targets where we have to define
      autofs_wqt_t (which is passed back and forth to user space) as int type which
      has a size of 32bit across 32 and 64bit kernels.
      
      During the discussion on the mailing list, H. Peter Anvin suggested inverting
      the #if list since only specific platforms (specifically those which do not have
      a 32bit userspace, like IA64 and Alpha) should have autofs_wqt_t as unsigned
      long type.

      This suggestion is probably the best way to go, since Arm64 (and maybe others?)
      seems to have a non-working automounter. So in the long run even for other new
      upcoming architectures this inverted check seems to be the best solution, since
      it will not require them to change this #if again (unless they are 64bit only).
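
      The inverted check can be pictured roughly like this. It is a sketch under
      stated assumptions: example_wqt_t is a hypothetical stand-in for
      autofs_wqt_t, and the two platform macros are taken from the IA64 and
      Alpha examples named above rather than copied from the patch.

```c
#include <assert.h>

/*
 * Inverted form: list only the platforms without a 32-bit userspace and
 * give every other architecture, present or future, the 32-bit type.
 */
#if defined(__ia64__) || defined(__alpha__)
typedef unsigned long example_wqt_t;  /* 64-bit-only platforms */
#else
typedef unsigned int example_wqt_t;   /* same size for 32- and 64-bit kernels */
#endif

/* Size of the token type as seen by this build. */
static unsigned long example_wqt_size(void)
{
    return (unsigned long)sizeof(example_wqt_t);
}
```

      On any architecture not named in the #if, a 64-bit kernel and a 32-bit
      automount daemon then agree on the size of the token without a new
      entry having to be added to the list.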
      Signed-off-by: Helge Deller <deller@gmx.de>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Ian Kent <raven@themaw.net>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      CC: James Bottomley <James.Bottomley@HansenPartnership.com>
      CC: Rolf Eike Beer <eike-kernel@sf-tec.de>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • s390/timer: avoid overflow when programming clock comparator · 903fba3b
      Heiko Carstens authored
      commit d911e03d upstream.
      
      Since ed4f2094 "s390/time: fix sched_clock() overflow" a new helper function
      is used to avoid overflows when converting TOD format values to nanosecond
      values.
      The kvm interrupt code, however, previously worked only by accident because
      of an overflow. It tried to program a timer that would expire in more than
      ~29 years. Because of the old TOD-to-nanoseconds overflow bug, the real
      expiry value was much smaller, but now it no longer is.
      This in turn triggers yet another bug in s390_next_ktime(), the function
      that programs the clock comparator: if the absolute "expires" value is after
      2042, the conversion overflows and the programmed value is lower than the
      current TOD value, which immediately triggers a clock comparator (= timer)
      interrupt.
      Since the timer hasn't expired, it is immediately programmed again, and so
      on... the result is a dead system.
      To fix this simply program the maximum possible value if an overflow is
      detected.
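
      The idea of clamping on overflow can be sketched generically. This is
      not s390_next_ktime(): saturating_expiry() is a hypothetical helper that
      only shows how an unsigned wrap-around can be detected and replaced by
      the maximum value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Add delta to base. Unsigned addition wraps on overflow, and a wrapped
 * sum is always smaller than base, so that case is mapped to the largest
 * representable value instead of a value in the past.
 */
static uint64_t saturating_expiry(uint64_t base, uint64_t delta)
{
    uint64_t sum = base + delta;

    return (sum < base) ? UINT64_MAX : sum;
}
```

      A far-future expiry then yields "as late as possible" rather than a
      comparator value below the current time, which is what caused the
      interrupt loop described above.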
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • drm/radeon/evergreen+: wait for the MC to settle after MC blackout · 39ca2020
      Alex Deucher authored
      commit ed39fadd upstream.
      
      Some chips seem to need a little delay after blacking out
      the MC before the requests actually stop.
      
      May fix:
      https://bugs.freedesktop.org/show_bug.cgi?id=56139
      https://bugs.freedesktop.org/show_bug.cgi?id=57567
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • igb: Remove artificial restriction on RQDPC stat reading · 74d789d8
      Alexander Duyck authored
      commit ae1c07a6 upstream.
      
      For some reason the reading of the RQDPC register was being artificially
      limited to 4K.  Instead of limiting the value we should read the value and
      add the full amount.  Otherwise this can lead to a misleading number of
      dropped packets when the actual value is in fact much higher.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • nbd: fsync and kill block device on shutdown · 4189aa4c
      Paolo Bonzini authored
      commit 3a2d63f8 upstream.
      
      There are two problems with shutdown in the NBD driver.
      
      1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem.
      
         This patch adds the sync operation into __nbd_ioctl()'s
         NBD_DISCONNECT handler.  This is useful because BLKFLSBUF is restricted
         to processes that have CAP_SYS_ADMIN, and the NBD client may not
         possess it (fsync of the block device does not sync the filesystem,
         either).
      
      2: Once we clear the socket we have no guarantee that later reads will
         come from the same backing storage.
      
         The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket
         clearing code so the page cache is cleaned; otherwise reads that hit
         the page cache would return stale data from the previously-accessible disk.
      
      Example:
      
          # qemu-nbd -r -c/dev/nbd0 /dev/sr0
          # file -s /dev/nbd0
          /dev/stdin: # UDF filesystem data (version 1.5) etc.
          # qemu-nbd -d /dev/nbd0
          # qemu-nbd -r -c/dev/nbd0 /dev/sda
          # file -s /dev/nbd0
          /dev/stdin: # UDF filesystem data (version 1.5) etc.
      
      While /dev/sda has:
      
          # file -s /dev/sda
          /dev/sda: x86 boot sector; etc.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Paul Clements <Paul.Clements@steeleye.com>
      Cc: Alex Bligh <alex@alex.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Adjusted context
       - s/\bnbd\b/lo/
       - Incorporate export of kill_bdev() from commit ff01bb48
         ('fs: move code out of buffer.c')]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sysctl: fix null checking in bin_dn_node_address() · 0ba15a92
      Xi Wang authored
      commit df1778be upstream.
      
      The null check of `strchr() + 1` is broken: the result is always non-null,
      leading to an OOB read.  Instead, check the result of strchr().
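
      The corrected pattern looks roughly like the sketch below. text_after()
      is a hypothetical helper, not the function touched by the patch; it only
      shows that the pointer returned by strchr() must be tested before
      anything is added to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the text following the first occurrence of c in s,
 * or NULL when c does not occur.
 */
static const char *text_after(const char *s, int c)
{
    const char *p = strchr(s, c);

    /*
     * Check p itself. Testing "strchr(s, c) + 1" against NULL never
     * catches the missing-character case, so later reads would start
     * from an invalid address.
     */
    return p ? p + 1 : NULL;
}
```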
      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • idr: fix top layer handling · a2a9af6c
      Tejun Heo authored
      commit 326cf0f0 upstream.
      
      Most functions in idr fail to deal with the high bits when the idr
      tree grows to the maximum height.
      
      * idr_get_empty_slot() stops growing idr tree once the depth reaches
        MAX_IDR_LEVEL - 1, which is one depth shallower than necessary to
        cover the whole range.  The function doesn't even notice that it
        didn't grow the tree enough and ends up allocating the wrong ID
        given sufficiently high @starting_id.
      
        For example, on 64 bit, if the starting id is 0x7fffff01,
        idr_get_empty_slot() will grow the tree 5 layers deep, which only
        covers 30 bits, and then proceed to allocate as if bit 30
        wasn't specified.  It ends up allocating 0x3fffff01 without bit
        30 but still returns 0x7fffff01.
      
      * __idr_remove_all() will not remove anything if the tree is fully
        grown.
      
      * idr_find() can't find anything if the tree is fully grown.
      
      * idr_for_each() and idr_get_next() can't iterate anything if the tree
        is fully grown.
      
      Fix it by introducing idr_max() which returns the maximum possible ID
      given the depth of tree and replacing the id limit checks in all
      affected places.
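
      A helper of this kind might look as follows. This is a sketch, not the
      patched code: ex_idr_max() is a hypothetical name, and the two constants
      are assumptions inferred from the 64-bit example above (5 layers cover
      30 bits, so 6 bits per layer) and from IDs being non-negative ints.

```c
#include <assert.h>

#define EX_IDR_BITS      6   /* assumed bits per layer, from the 64-bit example */
#define EX_MAX_IDR_SHIFT 31  /* assumed: IDs fit in a non-negative int */

/* Largest ID that a tree with the given number of layers can hold. */
static unsigned int ex_idr_max(int layers)
{
    int bits = layers * EX_IDR_BITS;

    if (bits > EX_MAX_IDR_SHIFT)
        bits = EX_MAX_IDR_SHIFT;  /* the top layer is only partially used */
    return (1U << bits) - 1;
}
```

      With these assumptions a 5-layer tree tops out at 0x3fffffff, so the
      starting id 0x7fffff01 from the example does not fit and a sixth layer
      is needed, whose limit is capped at 0x7fffffff.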
      
      As the idr_layer pointer array pa[] needs to be 1 larger than the
      maximum depth, enlarge pa[] arrays by one.
      
      While this plugs the discovered issues, the whole code base is
      horrible and in desperate need of a rewrite.  It's fragile like hell,
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Adjust context
       - s/MAX_IDR_LEVEL/MAX_LEVEL/; s/MAX_IDR_SHIFT/MAX_ID_SHIFT/
       - Drop change to idr_alloc()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • idr: make idr_get_next() good for rcu_read_lock() · 6195dff3
      Hugh Dickins authored
      commit 9f7de827 upstream.
      
      Make one small adjustment to idr_get_next(): take the height from the top
      layer (stable under RCU) instead of from the root (unprotected by RCU), as
      idr_find() does: so that it can be used with RCU locking.  Copied comment
      on RCU locking from idr_find().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • firewire: add minor number range check to fw_device_init() · 507b2ac9
      Tejun Heo authored
      commit 3bec60d5 upstream.
      
      fw_device_init() didn't check whether the allocated minor number is
      too large.  Fail if it overflows MINORBITS.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • block: fix synchronization and limit check in blk_alloc_devt() · a51db52c
      Tejun Heo authored
      commit ce23bba8 upstream.
      
      idr allocation in blk_alloc_devt() wasn't synchronized against lookup
      and removal, and its limit check was off by one - 1 << MINORBITS is
      the number of minors allowed, not the maximum allowed minor.
      
      Add locking and rename MAX_EXT_DEVT to NR_EXT_DEVT and fix limit
      checking.
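
      The off-by-one can be made concrete with a small sketch. The names are
      hypothetical (ex_minor_in_range() is not a kernel function), and
      EX_MINORBITS is assumed to be 20 for the purpose of the example.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MINORBITS   20                    /* assumed width of the minor field */
#define EX_NR_EXT_DEVT (1U << EX_MINORBITS)  /* a count of minors, not the largest minor */

/* Valid minors are 0 .. EX_NR_EXT_DEVT - 1, so the comparison must be strict. */
static bool ex_minor_in_range(unsigned int idx)
{
    return idx < EX_NR_EXT_DEVT;
}
```

      Treating the constant as a maximum and comparing with "<=" would accept
      idx == 1 << MINORBITS, one past the last valid minor; renaming the
      constant to NR_* makes the intended strict comparison easier to read.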
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • idr: fix a subtle bug in idr_get_next() · 18536d61
      Tejun Heo authored
      commit 6cdae741 upstream.
      
      The iteration logic of idr_get_next() is borrowed mostly verbatim from
      idr_for_each().  It walks down the tree looking for the slot matching
      the current ID.  If the matching slot is not found, the ID is
      incremented by the distance of a single slot at the given level and
      the walk repeats.
      
      The implementation assumes that during the whole iteration id is aligned
      to the layer boundaries of the level closest to the leaf, which is true
      for all iterations starting from zero or an existing element and thus is
      fine for idr_for_each().
      
      However, idr_get_next() may be given any point and if the starting id
      hits in the middle of a non-existent layer, increment to the next layer
      will end up skipping the same offset into it.  For example, an IDR with
      IDs filled between [64, 127] would look like the following.
      
                [  0  64 ... ]
             /----/   |
             |        |
            NULL    [ 64 ... 127 ]
      
      If idr_get_next() is called with 63 as the starting point, it will try
      to follow down the pointer from 0.  As it is NULL, it will then try to
      proceed to the next slot in the same level by adding the slot distance
      at that level which is 64 - making the next try 127.  It goes around the
      loop and finds and returns 127 skipping [64, 126].
      
      Note that this bug also triggers in idr_for_each_entry() loop which
      deletes during iteration as deletions can make layers go away leaving
      the iteration with unaligned ID into missing layers.
      
      Fix it by ensuring that proceeding to the next slot doesn't carry over the
      unaligned offset, i.e. use round_up(id + 1, slot_distance) instead of
      id += slot_distance.
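
      The difference between the two ways of advancing can be checked with
      the numbers from the example above. This is a stand-alone sketch:
      ex_round_up(), next_id_buggy() and next_id_fixed() are hypothetical
      helpers, and ex_round_up() assumes the slot distance is a power of two.

```c
#include <assert.h>

/* Round x up to the next multiple of d; d must be a power of two. */
static unsigned int ex_round_up(unsigned int x, unsigned int d)
{
    return (x + d - 1) & ~(d - 1);
}

/* Old behaviour: keeps the unaligned offset of id within the slot. */
static unsigned int next_id_buggy(unsigned int id, unsigned int slot_distance)
{
    return id + slot_distance;
}

/* Fixed behaviour: always lands on the first ID of the next slot. */
static unsigned int next_id_fixed(unsigned int id, unsigned int slot_distance)
{
    return ex_round_up(id + 1, slot_distance);
}
```

      Starting from 63 with a slot distance of 64, the old step jumps to 127
      and skips [64, 126], while the rounded step continues at 64.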
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: David Teigland <teigland@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • block: fix ext_devt_idr handling · 2abb7d3a
      Tomas Henzl authored
      commit 7b74e912 upstream.
      
      While adding and removing a lot of disks and partitions this
      sometimes shows up:
      
        WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
        Hardware name:
        sysfs: cannot create duplicate filename '/dev/block/259:751'
        Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan]
        Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1
        Call Trace:
          warn_slowpath_common+0x87/0xc0
          warn_slowpath_fmt+0x46/0x50
          sysfs_add_one+0xc9/0x130
          sysfs_do_create_link+0x12b/0x170
          sysfs_create_link+0x13/0x20
          device_add+0x317/0x650
          idr_get_new+0x13/0x50
          add_partition+0x21c/0x390
          rescan_partitions+0x32b/0x470
          sd_open+0x81/0x1f0 [sd_mod]
          __blkdev_get+0x1b6/0x3c0
          blkdev_get+0x10/0x20
          register_disk+0x155/0x170
          add_disk+0xa6/0x160
          sd_probe_async+0x13b/0x210 [sd_mod]
          add_wait_queue+0x46/0x60
          async_thread+0x102/0x250
          default_wake_function+0x0/0x20
          async_thread+0x0/0x250
          kthread+0x96/0xa0
          child_rip+0xa/0x20
          kthread+0x0/0xa0
          child_rip+0x0/0x20
      
      This most likely happens because dev_t is freed while the number is
      still used and idr_get_new() is not protected on every use.  The fix
      adds a mutex where it wasn't before and moves the dev_t free function so
      it is called after device del.
      Signed-off-by: Tomas Henzl <thenzl@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ocfs2: ac->ac_allow_chain_relink=0 won't disable group relink · 0ebab908
      Xiaowei.Hu authored
      commit 309a85b6 upstream.
      
      ocfs2_block_group_alloc_discontig() disables chain relink by setting
      ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple
      cluster groups.
      
      It doesn't keep the credits for all chain relink, but
      ocfs2_claim_suballoc_bits() overrides this in this call trace:
      ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()->
      __ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits()
      ocfs2_claim_suballoc_bits() sets ac->ac_allow_chain_relink = 1, then calls
      ocfs2_search_chain() one time and disables it again, and then we run out
      of credits.
      
      The fix is to allow relink by default and disable it in
      ocfs2_block_group_alloc_discontig().

      Without this patch, end users will run into a crash caused by running out
      of credits, with a backtrace like this:
      
        RIP: 0010:[<ffffffffa0808b14>]  [<ffffffffa0808b14>]
        jbd2_journal_dirty_metadata+0x164/0x170 [jbd2]
        RSP: 0018:ffff8801b919b5b8  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0
        RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8
        RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0
        R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800
        FS:  00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0)
        Call Trace:
          ocfs2_journal_dirty+0x2f/0x70 [ocfs2]
          ocfs2_relink_block_group+0x111/0x480 [ocfs2]
          ocfs2_search_chain+0x455/0x9a0 [ocfs2]
          ...
      Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      0ebab908
    • Jeff Liu's avatar
      ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctly · 98e4a531
      Jeff Liu authored
      commit 32918dd9 upstream.
      
      We need to re-initialize the security for a new reflinked inode with its
      parent dirs if it isn't specified to be preserved for ocfs2_reflink().
      However, the code logic is broken at ocfs2_init_security_and_acl()
      although ocfs2_init_security_get() succeeds.  As a result,
      ocfs2_acl_init() is not invoked and therefore the default ACL of the
      parent dir is missing on the new inode.
      
      Note this was introduced by 9d8f13ba ("security: new
      security_inode_init_security API adds function callback")
      
      To reproduce:
      
          Set the default ACL for the parent dir (ocfs2 in this case):
          $ setfacl -m default:user:jeff:rwx ../ocfs2/
          $ getfacl ../ocfs2/
          # file: ../ocfs2/
          # owner: jeff
          # group: jeff
          user::rwx
          group::r-x
          other::r-x
          default:user::rwx
          default:user:jeff:rwx
          default:group::r-x
          default:mask::rwx
          default:other::r-x
      
          $ touch a
          $ getfacl a
          # file: a
          # owner: jeff
          # group: jeff
          user::rw-
          group::rw-
          other::r--
      
      Before patching, create reflink file b from a; the user
      default ACL entry (user:jeff:rwx) is missing:
      
          $ ./ocfs2_reflink a b
          $ getfacl b
          # file: b
          # owner: jeff
          # group: jeff
          user::rw-
          group::rw-
          other::r--
      
      In this case, the end user can also observe an error message in syslog:
      
        (ocfs2_reflink,3229,2):ocfs2_init_security_and_acl:7193 ERROR: status = 0
      
      After applying this patch, create reflink file c from a:
      
          $ ./ocfs2_reflink a c
          $ getfacl c
          # file: c
          # owner: jeff
          # group: jeff
          user::rw-
          user:jeff:rwx			#effective:rw-
          group::r-x			#effective:r--
          mask::rw-
          other::r--
      
      Test program:
      /* Usage: reflink <source> <dest> */
      #include <stdio.h>
      #include <stdint.h>
      #include <stdbool.h>
      #include <string.h>
      #include <errno.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <sys/ioctl.h>
      
      static int
      reflink_file(char const *src_name, char const *dst_name,
      	     bool preserve_attrs)
      {
      	int fd;
      
      #ifndef REFLINK_ATTR_NONE
      #  define REFLINK_ATTR_NONE 0
      #endif
      #ifndef REFLINK_ATTR_PRESERVE
      #  define REFLINK_ATTR_PRESERVE 1
      #endif
      #ifndef OCFS2_IOC_REFLINK
      	struct reflink_arguments {
      		uint64_t old_path;
      		uint64_t new_path;
      		uint64_t preserve;
      	};
      
      #  define OCFS2_IOC_REFLINK _IOW ('o', 4, struct reflink_arguments)
      #endif
      	struct reflink_arguments args = {
      		.old_path = (unsigned long) src_name,
      		.new_path = (unsigned long) dst_name,
      		.preserve = preserve_attrs ? REFLINK_ATTR_PRESERVE :
      					     REFLINK_ATTR_NONE,
      	};
      
      	fd = open(src_name, O_RDONLY);
      	if (fd < 0) {
      		fprintf(stderr, "Failed to open %s: %s\n",
      			src_name, strerror(errno));
      		return -1;
      	}
      
      	if (ioctl(fd, OCFS2_IOC_REFLINK, &args) < 0) {
      		fprintf(stderr, "Failed to reflink %s to %s: %s\n",
      			src_name, dst_name, strerror(errno));
      		return -1;
      	}
      
      	/* Report success explicitly; main() returns this value */
      	return 0;
      }
      
      int
      main(int argc, char *argv[])
      {
      	if (argc != 3) {
      		fprintf(stdout, "Usage: %s source dest\n", argv[0]);
      		return 1;
      	}
      
      	return reflink_file(argv[1], argv[2], 0);
      }
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Tao Ma <boyu.mt@taobao.com>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      98e4a531
    • H. Peter Anvin's avatar
      x86: Make sure we can boot in the case the BDA contains pure garbage · c2581d3d
      H. Peter Anvin authored
      commit 7c100936 upstream.
      
      On non-BIOS platforms it is possible that the BIOS data area contains
      garbage instead of being zeroed or something equivalent (firmware
      people: we are talking of 1.5K here, so please do the sane thing.)
      
      We need on the order of 20-30K of low memory in order to boot, which
      may grow up to < 64K in the future.  We probably want to avoid the
      lowest of the low memory.  At the same time, it seems extremely
      unlikely that a legitimate EBDA would ever reach down to the 128K
      (which would require it to be over half a megabyte in size.)  Thus,
      pick 128K as the cutoff for "this is insane, ignore."  We may still
      end up reserving a bunch of extra memory on the low megabyte, but that
      is not really a major issue these days.  In the worst case we lose
      512K of RAM.
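
The cutoff rule described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the names `INSANE_CUTOFF`, `LOWMEM_CAP` and `lowmem_reserve_start()`, and the 0x9f000 cap, are hypothetical and are not taken from this message; the real code reads both inputs from the BIOS data area.

```c
#include <stdint.h>

#define INSANE_CUTOFF 0x20000u	/* 128K: anything below this is treated as garbage */
#define LOWMEM_CAP    0x9f000u	/* assumed upper bound on usable low memory */

/*
 * Return the address from which low memory would be reserved (up to 1M),
 * given the low-memory size reported in the BDA and the EBDA address,
 * both in bytes.
 */
static uint32_t lowmem_reserve_start(uint32_t bda_lowmem, uint32_t ebda_addr)
{
	uint32_t start;

	/* Values below the cutoff are "insane"; ignore them. */
	if (ebda_addr < INSANE_CUTOFF)
		ebda_addr = LOWMEM_CAP;
	if (bda_lowmem < INSANE_CUTOFF)
		bda_lowmem = LOWMEM_CAP;

	/* Use the lower of the two plausible values, never above the cap. */
	start = bda_lowmem < ebda_addr ? bda_lowmem : ebda_addr;
	if (start > LOWMEM_CAP)
		start = LOWMEM_CAP;

	return start;
}
```

With these assumptions the lowest accepted start is 128K, which matches the stated worst case of losing 512K (the span from 128K to 640K).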
      
      This code really should be merged with trim_bios_range() in
      arch/x86/kernel/setup.c, but that is a bigger patch for a later merge
      window.
      Reported-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.org
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c2581d3d
    • Jan Kara's avatar
      ocfs2: fix possible use-after-free with AIO · d7763498
      Jan Kara authored
      commit 9b171e0c upstream.
      
      A running AIO pins the inode in memory through a file reference. Once
      the AIO is completed using aio_complete(), the file reference is put and
      the inode can be freed from memory. So we have to be sure that calling
      aio_complete() is the last thing we do with the inode.
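
The ordering rule can be illustrated with a small userspace model. This is a sketch, not the ocfs2 code: `struct inode_model`, `complete_io()` and `end_io()` are hypothetical names, and `complete_io()` merely stands in for aio_complete() by dropping the last reference.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an inode pinned by a file reference. */
struct inode_model {
	int refs;
	int needs_unlock;
};

static int freed_count;

/* Drop one reference; the object is freed when the last one goes away. */
static void put_ref(struct inode_model *inode)
{
	if (--inode->refs == 0) {
		free(inode);
		freed_count++;
	}
}

/* Stands in for aio_complete(): completing the I/O puts the reference. */
static void complete_io(struct inode_model *inode)
{
	put_ref(inode);
}

/* End-of-I/O handler: do all inode work first, complete the I/O last. */
static int end_io(struct inode_model *inode)
{
	int unlocked = 0;

	if (inode->needs_unlock) {
		inode->needs_unlock = 0;
		unlocked = 1;
	}

	/* After this call the inode may be gone; do not touch it again. */
	complete_io(inode);

	return unlocked;
}
```

Swapping the two steps in `end_io()` would read `inode->needs_unlock` after the object may already have been freed, which is the kind of use-after-free the message describes.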
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      d7763498
    • Konrad Rzeszutek Wilk's avatar
      doc, kernel-parameters: Document 'console=hvc<n>' · 7479d5dd
      Konrad Rzeszutek Wilk authored
      commit a2fd6419 upstream.
      
      Both the PowerPC hypervisor and Xen hypervisor can utilize the
      hvc driver.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1361825650-14031-3-git-send-email-konrad.wilk@oracle.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      7479d5dd
    • Konrad Rzeszutek Wilk's avatar
      doc, xen: Mention 'earlyprintk=xen' in the documentation. · 56070188
      Konrad Rzeszutek Wilk authored
      commit 2482a92e upstream.
      
      The earlyprintk for Xen PV guests utilizes a simple hypercall
      (console_io) to provide output to Xen emergency console.
      
      Note that the Xen hypervisor should be booted with 'loglevel=all'
      to output said information.
      Reported-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1361825650-14031-2-git-send-email-konrad.wilk@oracle.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      56070188
    • Shawn Guo's avatar
      mmc: sdhci-esdhc-imx: fix host version read · e3efb0af
      Shawn Guo authored
      commit ef4d0888 upstream.
      
      When commit 95a2482a (mmc: sdhci-esdhc-imx: add basic imx6q usdhc
      support) worked around the host version issue on imx6q, it dropped the
      register address fixup "reg ^= 2" for imx25/35/51/53 esdhc.
      Thus, the controller version on these SoCs is wrongly identified
      as v1 while it's actually v2.
      
      Add the address fixup back and take a different approach to correct
      the imx6q host version, so that the host version read works again
      for all SoCs.
      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: Chris Ball <cjb@laptop.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      e3efb0af
    • Greg Thelen's avatar
      tmpfs: fix use-after-free of mempolicy object · 2b82b58d
      Greg Thelen authored
      commit 5f00110f upstream.
      
      The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
      option is not specified in the remount request.  A new policy can be
      specified if mpol=M is given.
      
      Before this patch remounting an mpol bound tmpfs without specifying
      mpol= mount option in the remount request would set the filesystem's
      mempolicy object to a freed mempolicy object.
      
      To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
          # mkdir /tmp/x
      
          # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
      
          # mount -o remount,size=200M nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
              # note ? garbage in mpol=... output above
      
          # dd if=/dev/zero of=/tmp/x/f count=1
              # panic here
      
      Panic:
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          [...]
          Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
          Call Trace:
            mpol_shared_policy_init+0xa5/0x160
            shmem_get_inode+0x209/0x270
            shmem_mknod+0x3e/0xf0
            shmem_create+0x18/0x20
            vfs_create+0xb5/0x130
            do_last+0x9a1/0xea0
            path_openat+0xb3/0x4d0
            do_filp_open+0x42/0xa0
            do_sys_open+0xfe/0x1e0
            compat_sys_open+0x1b/0x20
            cstar_dispatch+0x7/0x1f
      
      Non-debug kernels will not crash immediately because referencing the
      dangling mpol will not cause a fault.  Instead the filesystem will
      reference a freed mempolicy object, which will cause unpredictable
      behavior.
      
      The problem boils down to a dropped mpol reference below if
      shmem_parse_options() does not allocate a new mpol:
      
          config = *sbinfo
          shmem_parse_options(data, &config, true)
          mpol_put(sbinfo->mpol)
          sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */
      
      This patch avoids the crash by not releasing the mempolicy if
      shmem_parse_options() doesn't create a new mpol.
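
The reference handling can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the shmem code: `struct policy`, `struct sbinfo`, `policy_new()`, `policy_put()` and `remount()` are hypothetical names, and the `want_new_policy` flag stands in for an mpol= option being present in the remount request.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a mempolicy and the superblock info. */
struct policy {
	int refs;
};

struct sbinfo {
	struct policy *mpol;
};

static int live_policies;

/* Allocate a policy holding one reference. */
static struct policy *policy_new(void)
{
	struct policy *p = malloc(sizeof(*p));

	if (p) {
		p->refs = 1;
		live_policies++;
	}
	return p;
}

/* Drop one reference and free the policy when it was the last one. */
static void policy_put(struct policy *p)
{
	if (p && --p->refs == 0) {
		free(p);
		live_policies--;
	}
}

/* Remount: replace the policy only if the request created a new one. */
static int remount(struct sbinfo *sb, int want_new_policy)
{
	struct policy *new_mpol = NULL;

	if (want_new_policy) {
		new_mpol = policy_new();
		if (!new_mpol)
			return -1;
	}

	/* No new policy: keep the old one and, crucially, keep its reference. */
	if (new_mpol) {
		policy_put(sb->mpol);
		sb->mpol = new_mpol;
	}
	return 0;
}
```

The buggy sequence quoted above corresponds to calling `policy_put(sb->mpol)` unconditionally and then storing the same, now freed, pointer back into `sb->mpol`.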
      
      How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
      not look back further.
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2b82b58d
    • Mel Gorman's avatar
      mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages · 9fb42c9a
      Mel Gorman authored
      commit 67d46b29 upstream.
      
      Rob van der Heij reported the following (paraphrased) in private mail.
      
      	The scenario is that I want to avoid backups to fill up the page
      	cache and purge stuff that is more likely to be used again (this is
      	with s390x Linux on z/VM, so I don't give it as much memory that
      	we don't care anymore). So I have something with LD_PRELOAD that
      	intercepts the close() call (from tar, in this case) and issues
      	a posix_fadvise() just before closing the file.
      
      	This mostly works, except for small files (less than 14 pages)
      	that remain in page cache after the fact.
      
      Unfortunately Rob has not had a chance to test this exact patch but the
      test program below should be reproducing the problem he described.
      
      The issue is the per-cpu pagevecs for LRU additions.  If the pages are
      added by one CPU but fadvise() is called on another then the pages
      remain resident as the invalidate_mapping_pages() only drains the local
      pagevecs via its call to pagevec_release().  The user-visible effect is
      that a program that uses fadvise() properly is not obeyed.
      
      A possible fix for this is to put the necessary smarts into
      invalidate_mapping_pages() to globally drain the LRU pagevecs if a
      pagevec page could not be discarded.  The downside with this is that an
      inode cache shrink would send a global IPI and memory pressure
      potentially causing global IPI storms is very undesirable.
      
      Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
      check if invalidate_mapping_pages() discarded all the requested pages.
      If a subset of pages are discarded it drains the LRU pagevecs and tries
      again.  If the second attempt fails, it assumes it is due to the pages
      being mapped, locked or dirty and does not care.  With this patch, an
      application using fadvise() correctly will be obeyed but there is a
      downside that a malicious application can force the kernel to send
      global IPIs and increase overhead.
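
The retry logic can be sketched with a toy model of the page cache. Everything here is hypothetical and only illustrates the control flow: the `page_state` values and the functions `invalidate_pages()`, `drain_all_pagevecs()` and `dontneed()` are stand-ins, not kernel interfaces.

```c
#include <stddef.h>

/* Hypothetical page states for the model. */
enum page_state {
	PAGE_ON_LRU,		/* can be discarded */
	PAGE_IN_PAGEVEC,	/* sits in a per-CPU pagevec, not yet on the LRU */
	PAGE_BUSY,		/* mapped, locked or dirty: cannot be discarded */
	PAGE_GONE		/* already discarded */
};

static int drain_calls;

/* Discard every page that is on the LRU; return how many were discarded. */
static size_t invalidate_pages(enum page_state *pages, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (pages[i] == PAGE_ON_LRU) {
			pages[i] = PAGE_GONE;
			count++;
		}
	}
	return count;
}

/* Model of a global drain: move pagevec pages onto the LRU. */
static void drain_all_pagevecs(enum page_state *pages, size_t n)
{
	size_t i;

	drain_calls++;
	for (i = 0; i < n; i++) {
		if (pages[i] == PAGE_IN_PAGEVEC)
			pages[i] = PAGE_ON_LRU;
	}
}

/* DONTNEED: try once, drain and retry only if some pages survived. */
static size_t dontneed(enum page_state *pages, size_t n)
{
	size_t count = invalidate_pages(pages, n);

	if (count < n) {
		drain_all_pagevecs(pages, n);
		count += invalidate_pages(pages, n);
	}
	/* Anything still left is assumed mapped, locked or dirty. */
	return count;
}
```

In this model the expensive drain runs only when the first pass falls short, which mirrors the trade-off described above: ordinary callers rarely pay for it, but a caller that keeps requesting undiscardable pages can trigger it on every call.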
      
      If accepted, I would like this to be considered as a -stable candidate.
      It's not an urgent issue but it's a system call that is not working as
      advertised which is weak.
      
      The following test program demonstrates the problem.  It should never
      report that pages are still resident but will without this patch.  It
      assumes that CPU 0 and 1 exist.
      
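      #define _GNU_SOURCE	/* for sched_setaffinity() and the CPU_* macros */
      #include <errno.h>
      #include <fcntl.h>
      #include <sched.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #include <sys/types.h>
      
      /* Assumed value, not given in the message; the report concerns files under 14 pages */
      #define FILESIZE_PAGES 8
      
      int main() {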
      	int fd;
      	int pagesize = getpagesize();
      	ssize_t written = 0, expected;
      	char *buf;
      	unsigned char *vec;
      	int resident, i;
      	cpu_set_t set;
      
      	/* Prepare a buffer for writing */
      	expected = FILESIZE_PAGES * pagesize;
      	buf = malloc(expected + 1);
      	if (buf == NULL) {
      		printf("ENOMEM\n");
      		exit(EXIT_FAILURE);
      	}
      	buf[expected] = 0;
      	memset(buf, 'a', expected);
      
      	/* Prepare the mincore vec */
      	vec = malloc(FILESIZE_PAGES);
      	if (vec == NULL) {
      		printf("ENOMEM\n");
      		exit(EXIT_FAILURE);
      	}
      
      	/* Bind ourselves to CPU 0 */
      	CPU_ZERO(&set);
      	CPU_SET(0, &set);
      	if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
      		perror("sched_setaffinity");
      		exit(EXIT_FAILURE);
      	}
      
      	/* open file, unlink and write buffer */
      	fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR, 0600);
      	if (fd == -1) {
      		perror("open");
      		exit(EXIT_FAILURE);
      	}
      	unlink("fadvise-test-file");
      	while (written < expected) {
      		ssize_t this_write;
      		this_write = write(fd, buf + written, expected - written);
      
      		if (this_write == -1) {
      			perror("write");
      			exit(EXIT_FAILURE);
      		}
      
      		written += this_write;
      	}
      	free(buf);
      
      	/*
      	 * Force ourselves to another CPU. If fadvise only flushes the local
      	 * CPUs pagevecs then the fadvise will fail to discard all file pages
      	 */
      	CPU_ZERO(&set);
      	CPU_SET(1, &set);
      	if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
      		perror("sched_setaffinity");
      		exit(EXIT_FAILURE);
      	}
      
      	/* sync and fadvise to discard the page cache */
      	fsync(fd);
      	/* posix_fadvise() returns the error number directly; it does not set errno */
      	errno = posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED);
      	if (errno != 0) {
      		perror("posix_fadvise");
      		exit(EXIT_FAILURE);
      	}
      
      	/* map the file and use mincore to see which parts of it are resident */
      	buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
      	if (buf == MAP_FAILED) {
      		perror("mmap");
      		exit(EXIT_FAILURE);
      	}
      	if (mincore(buf, expected, vec) == -1) {
      		perror("mincore");
      		exit(EXIT_FAILURE);
      	}
      
      	/* Check residency */
      	for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
      		if (vec[i])
      			resident++;
      	}
      	if (resident != 0) {
      		printf("Nr unexpected pages resident: %d\n", resident);
      		exit(EXIT_FAILURE);
      	}
      
      	munmap(buf, expected);
      	close(fd);
      	free(vec);
      	exit(EXIT_SUCCESS);
      }
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Rob van der Heij <rvdheij@gmail.com>
      Tested-by: Rob van der Heij <rvdheij@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      9fb42c9a
    • Robin Holt's avatar
      mmu_notifier_unregister NULL Pointer deref and multiple ->release() callouts · 09454abb
      Robin Holt authored
      commit 751efd86 upstream.
      
      There is a race condition between mmu_notifier_unregister() and
      __mmu_notifier_release().
      
      Assume two tasks, one calling mmu_notifier_unregister() as a result of a
      filp_close() ->flush() callout (task A), and the other calling
      mmu_notifier_release() from an mmput() (task B).
      
                      A                               B
      t1                                              srcu_read_lock()
      t2              if (!hlist_unhashed())
      t3                                              srcu_read_unlock()
      t4              srcu_read_lock()
      t5                                              hlist_del_init_rcu()
      t6                                              synchronize_srcu()
      t7              srcu_read_unlock()
      t8              hlist_del_rcu()  <--- NULL pointer deref.
      
      Additionally, the list traversal in __mmu_notifier_release() is not
      protected by the mmu_notifier_mm->hlist_lock, which can result in
      callouts to the ->release() notifier from both mmu_notifier_unregister()
      and __mmu_notifier_release().
      
      -stable suggestions:
      
      The stable trees prior to 3.7.y need commits 21a92735 and
      70400303 cherry-picked in that order prior to cherry-picking this
      commit.  The 3.7.y tree already has those two commits.
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Sagi Grimberg <sagig@mellanox.co.il>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      09454abb
    • Andrea Arcangeli's avatar
      mm: mmu_notifier: make the mmu_notifier srcu static · 3d2b066c
      Andrea Arcangeli authored
      commit 70400303 upstream.
      
      The variable must be static especially given the variable name.
      
      s/RCU/SRCU/ over a few comments.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      3d2b066c
    • Sagi Grimberg's avatar
      mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may safely schedule · 80da2799
      Sagi Grimberg authored
      commit 21a92735 upstream.
      
      With an RCU based mmu_notifier implementation, any callout to
      mmu_notifier_invalidate_range_{start,end}() or
      mmu_notifier_invalidate_page() would not be allowed to call schedule()
      as that could potentially allow a modification to the mmu_notifier
      structure while it is currently being used.
      
      Since srcu allocs 4 machine words per instance per cpu, we may end up
      with memory exhaustion if we use srcu per mm.  So all mms share a global
      srcu.  Note that during large mmu_notifier activity exit & unregister
      paths might hang for longer periods, but it is tolerable for current
      mmu_notifier clients.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      80da2799
    • Phileas Fogg's avatar
      powerpc/kexec: Disable hard IRQ before kexec · 819a56e8
      Phileas Fogg authored
      commit 8520e443 upstream.
      
      Disable hard IRQs before kexec'ing a new kernel image.
      Not doing so can result in corrupted data in the memory segments
      reserved for the new kernel.
      Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      819a56e8