1. 18 Sep, 2015 40 commits
    • Naoya Horiguchi's avatar
      mm/memory-failure: call shake_page() when error hits thp tail page · 350b59e3
      Naoya Horiguchi authored
      commit 09789e5d upstream.
      
      Currently memory_failure() calls shake_page() to sweep pages out from
      pcplists only when the victim page is 4kB LRU page or thp head page.
      But we should do this for a thp tail page too.
      
      Consider that a memory error hits a thp tail page whose head page is on
      a pcplist when memory_failure() runs.  Then, the current kernel skips
      shake_pages() part, so hwpoison_user_mappings() returns without calling
      split_huge_page() nor try_to_unmap() because PageLRU of the thp head is
      still cleared due to the skip of shake_page().
      
      As a result, me_huge_page() runs for the thp, which is broken behavior.
      
      One effect is a leak of the thp.  And another is to fail to isolate the
      memory error, so later access to the error address causes another MCE,
      which kills the processes which used the thp.
      
      This patch fixes this problem by calling shake_page() for thp tail case.
      
      Fixes: 385de357 ("thp: allow a hwpoisoned head page to be put back to LRU")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarDean Nelson <dnelson@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      350b59e3
    • Boris Ostrovsky's avatar
      xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq() · 95f92efe
      Boris Ostrovsky authored
      commit 16e6bd59 upstream.
      
      .. because bind_evtchn_to_cpu(evtchn, cpu) will map evtchn to
      'info' and pass 'info' down to xen_evtchn_port_bind_to_cpu().
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Tested-by: default avatarAnnie Li <annie.li@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      [lizf: Backported to 3.4: adjust filename and context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      95f92efe
    • Boris Ostrovsky's avatar
      xen/console: Update console event channel on resume · da911401
      Boris Ostrovsky authored
      commit b9d934f2 upstream.
      
      After a resume the hypervisor/tools may change console event
      channel number. We should re-query it.
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      da911401
    • Jason Gunthorpe's avatar
      RDMA/CMA: Canonize IPv4 on IPV6 sockets properly · bbb55274
      Jason Gunthorpe authored
      commit 28521440 upstream.
      
      When accepting a new IPv4 connect to an IPv6 socket, the CMA tries to
      canonize the address family to IPv4, but does not properly process
      the listening sockaddr to get the listening port, and does not properly
      set the address family of the canonized sockaddr.
      
      Fixes: e51060f0 ("IB: IP address based RDMA connection manager")
      Reported-By: default avatarYotam Kenneth <yotamke@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Tested-by: default avatarHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      [lizf: Backported to 3.4:
       - there's no cma_save_ip4_info() and cma_save_ip6_info(), and instead
         we apply the changes to cma_save_net_info()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      bbb55274
    • Grygorii Strashko's avatar
      mmc: core: add missing pm event in mmc_pm_notify to fix hib restore · df48a548
      Grygorii Strashko authored
      commit 184af16b upstream.
      
      The PM_RESTORE_PREPARE is not handled now in mmc_pm_notify(),
      as result mmc_rescan() could be scheduled and executed at
      late hibernation restore stages when MMC device is suspended
      already - which, in turn, will lead to system crash on TI dra7-evm board:
      
      WARNING: CPU: 0 PID: 3188 at drivers/bus/omap_l3_noc.c:148 l3_interrupt_handler+0x258/0x374()
      44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4_PER1_P3 (Idle): Data Access in User mode during Functional access
      
      Hence, add missed PM_RESTORE_PREPARE PM event in mmc_pm_notify().
      
      Fixes: 4c2ef25f (mmc: fix all hangs related to mmc/sd card...)
      Signed-off-by: default avatarGrygorii Strashko <Grygorii.Strashko@linaro.org>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      df48a548
    • Robert Jarzmik's avatar
      ARM: pxa: lubbock: use new pxa_cplds driver · 5a16f737
      Robert Jarzmik authored
      commit fc9e38c0 upstream.
      
      As the interrupt handling was transferred to the pxa_cplds driver,
      make the switch in lubbock platform code.
      
      Fixes: 157d2644 ("ARM: pxa: change gpio to platform device")
      Signed-off-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      5a16f737
    • Robert Jarzmik's avatar
      ARM: pxa: mainstone: use new pxa_cplds driver · 7d5e6688
      Robert Jarzmik authored
      commit 27768863 upstream.
      
      As the interrupt handling was transferred to the pxa_cplds driver,
      make the switch in mainstone platform code.
      
      Fixes: 157d2644 ("ARM: pxa: change gpio to platform device")
      Signed-off-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7d5e6688
    • Robert Jarzmik's avatar
      ARM: pxa: pxa_cplds: add lubbock and mainstone IO · f6922538
      Robert Jarzmik authored
      commit aa8d6b73 upstream.
      
      Historically, this support was in arch/arm/mach-pxa/lubbock.c and
      arch/arm/mach-pxa/mainstone.c. When gpio-pxa was moved to drivers/pxa,
      it became a driver, and its initialization and probing happened at
      postcore initcall. The lubbock code used to install the chained lubbock
      interrupt handler at init_irq() time.
      
      The consequence of the gpio-pxa change is that the installed chained irq
      handler lubbock_irq_handler() was overwritten in pxa_gpio_probe(_dt)(),
      removing :
       - the handler
       - the falling edge detection setting of GPIO0, which revealed the
         interrupt request from the lubbock IO board.
      
      As a fix, move the gpio0 chained handler setup to a place where we have
      the guarantee that pxa_gpio_probe() was called before, so that lubbock
      handler becomes the true IRQ chained handler of GPIO0, demuxing the
      lubbock IO board interrupts.
      
      This patch moves all that handling to a mfd driver. It's only purpose
      for the time being is the interrupt handling, but in the future it
      should encompass all the motherboard CPLDs handling :
       - leds
       - switches
       - hexleds
      
      The same logic applies to mainstone board.
      
      Fixes: 157d2644 ("ARM: pxa: change gpio to platform device")
      Signed-off-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f6922538
    • Davide Italiano's avatar
      ext4: move check under lock scope to close a race. · 0c797892
      Davide Italiano authored
      commit 280227a7 upstream.
      
      fallocate() checks that the file is extent-based and returns
      EOPNOTSUPP in case is not. Other tasks can convert from and to
      indirect and extent so it's safe to check only after grabbing
      the inode mutex.
      Signed-off-by: default avatarDavide Italiano <dccitaliano@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4:
       - adjust context
       - return -EOPNOTSUPP instead of jumping to the "out" label]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      0c797892
    • Nathan Fontenot's avatar
      powerpc/pseries: Correct cpu affinity for dlpar added cpus · c79a5426
      Nathan Fontenot authored
      commit f32393c9 upstream.
      
      The incorrect ordering of operations during cpu dlpar add results in invalid
      affinity for the cpu being added. The ibm,associativity property in the
      device tree is populated with all zeroes for the added cpu which results in
      invalid affinity mappings and all cpus appear to belong to node 0.
      
      This occurs because rtas configure-connector is called prior to making the
      rtas set-indicator calls. Phyp does not assign affinity information
      for a cpu until the rtas set-indicator calls are made to set the isolation
      and allocation state.
      
      Correct the order of operations to make the rtas set-indicator
      calls (done in dlpar_acquire_drc) before calling rtas configure-connector.
      
      Fixes: 1a8061c4 ("powerpc/pseries: Add kernel based CPU DLPAR handling")
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      [lizf: Backported to 3.4:
       - adjust context
       - jump to the "out" lable instead of returning -EINVAL]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c79a5426
    • Peter Zubaj's avatar
      ALSA: emu10k1: Emu10k2 32 bit DMA mode · 9ae05197
      Peter Zubaj authored
      commit 7241ea55 upstream.
      
      Looks like audigy emu10k2 (probably emu10k1 - sb live too) support two
      modes for DMA. Second mode is useful for 64 bit os with more then 2 GB
      of ram (fixes problems with big soundfont loading)
      
      1) 32MB from 2 GB address space using 8192 pages (used now as default)
      2) 16MB from 4 GB address space using 4096 pages
      
      Mode is set using HCFG_EXPANDED_MEM flag in HCFG register.
      Also format of emu10k2 page table is then different.
      Signed-off-by: default avatarPeter Zubaj <pzubaj@marticonet.sk>
      Tested-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9ae05197
    • Takashi Iwai's avatar
      ALSA: emux: Fix mutex deadlock in OSS emulation · dbaa20e3
      Takashi Iwai authored
      commit 1c94e65c upstream.
      
      The OSS emulation in synth-emux helper has a potential AB/BA deadlock
      at the simultaneous closing and opening:
      
        close ->
          snd_seq_release() ->
            sne_seq_free_client() ->
              snd_seq_delete_all_ports(): takes client->ports_mutex ->
      	  port_delete() ->
      	    snd_emux_unuse(): takes emux->register_mutex
      
        open ->
          snd_seq_oss_open() ->
            snd_emux_open_seq_oss(): takes emux->register_mutex ->
              snd_seq_event_port_attach() ->
      	  snd_seq_create_port(): takes client->ports_mutex
      
      This patch addresses the deadlock by reducing the rance taking
      emux->register_mutex in snd_emux_open_seq_oss().  The lock is needed
      for the refcount handling, so move it locally.  The calls in
      emux_seq.c are already with the mutex, thus they are replaced with the
      version without mutex lock/unlock.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      dbaa20e3
    • Michal Simek's avatar
      serial: of-serial: Remove device_type = "serial" registration · 475aba9a
      Michal Simek authored
      commit 6befa9d8 upstream.
      
      Do not probe all serial drivers by of_serial.c which are using
      device_type = "serial"; property. Only drivers which have valid
      compatible strings listed in the driver should be probed.
      
      When PORT_UNKNOWN is setup probe will fail anyway.
      
      Arnd quotation about driver historical background:
      "when I wrote that driver initially, the idea was that it would
      get used as a stub to hook up all other serial drivers but after
      that, the common code learned to create platform devices from DT"
      
      This patch fix the problem with on the system with xilinx_uartps and
      16550a where of_serial failed to register for xilinx_uartps and because
      of irq_dispose_mapping() removed irq_desc. Then when xilinx_uartps was asking
      for irq with request_irq() EINVAL is returned.
      Signed-off-by: default avatarMichal Simek <michal.simek@xilinx.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      475aba9a
    • Michal Simek's avatar
      serial: xilinx: Use platform_get_irq to get irq description structure · 9aabcbc0
      Michal Simek authored
      commit 5c90c07b upstream.
      
      For systems with CONFIG_SERIAL_OF_PLATFORM=y and device_type =
      "serial"; property in DT of_serial.c driver maps and unmaps IRQ (because
      driver probe fails). Then a driver is called but irq mapping is not
      created that's why driver is failing again in again on request_irq().
      Based on this use platform_get_irq() instead of platform_get_resource()
      which is doing irq_desc allocation and driver itself can request IRQ.
      
      Fix both xilinx serial drivers in the tree.
      Signed-off-by: default avatarMichal Simek <michal.simek@xilinx.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9aabcbc0
    • Christoph Hellwig's avatar
      3w-9xxx: fix command completion race · c2f1b991
      Christoph Hellwig authored
      commit 118c855b upstream.
      
      The 3w-9xxx driver needs to tear down the dma mappings before returning
      the command to the midlayer, as there is no guarantee the sglist and
      count are valid after that point.  Also remove the dma mapping helpers
      which have another inherent race due to the request_id index.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarAdam Radford <aradford@gmail.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c2f1b991
    • Christoph Hellwig's avatar
      3w-xxxx: fix command completion race · 4d5e62fd
      Christoph Hellwig authored
      commit 9cd95546 upstream.
      
      The 3w-xxxx driver needs to tear down the dma mappings before returning
      the command to the midlayer, as there is no guarantee the sglist and
      count are valid after that point.  Also remove the dma mapping helpers
      which have another inherent race due to the request_id index.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarAdam Radford <aradford@gmail.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      4d5e62fd
    • Christoph Hellwig's avatar
      3w-sas: fix command completion race · e456c24e
      Christoph Hellwig authored
      commit 579d69bc upstream.
      
      The 3w-sas driver needs to tear down the dma mappings before returning
      the command to the midlayer, as there is no guarantee the sglist and
      count are valid after that point.  Also remove the dma mapping helpers
      which have another inherent race due to the request_id index.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarTorsten Luettgert <ml-lkml@enda.eu>
      Tested-by: default avatarBernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de>
      Acked-by: default avatarAdam Radford <aradford@gmail.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e456c24e
    • Mike Christie's avatar
      SCSI: add 1024 max sectors black list flag · 3fe48d03
      Mike Christie authored
      commit 35e9a9f9 upstream.
      
      This works around a issue with qnap iscsi targets not handling large IOs
      very well.
      
      The target returns:
      
      VPD INQUIRY: Block limits page (SBC)
        Maximum compare and write length: 1 blocks
        Optimal transfer length granularity: 1 blocks
        Maximum transfer length: 4294967295 blocks
        Optimal transfer length: 4294967295 blocks
        Maximum prefetch, xdread, xdwrite transfer length: 0 blocks
        Maximum unmap LBA count: 8388607
        Maximum unmap block descriptor count: 1
        Optimal unmap granularity: 16383
        Unmap granularity alignment valid: 0
        Unmap granularity alignment: 0
        Maximum write same length: 0xffffffff blocks
        Maximum atomic transfer length: 0
        Atomic alignment: 0
        Atomic transfer length granularity: 0
      
      and it is *sometimes* able to handle at least one IO of size up to 8 MB. We
      have seen in traces where it will sometimes work, but other times it
      looks like it fails and it looks like it returns failures if we send
      multiple large IOs sometimes. Also it looks like it can return 2 different
      errors. It will sometimes send iscsi reject errors indicating out of
      resources or it will send invalid cdb illegal requests check conditions.
      And then when it sends iscsi rejects it does not seem to handle retries
      when there are command sequence holes, so I could not just add code to
      try and gracefully handle that error code.
      
      The problem is that we do not have a good contact for the company,
      so we are not able to determine under what conditions it returns
      which error and why it sometimes works.
      
      So, this patch just adds a new black list flag to set targets like this to
      the old max safe sectors of 1024. The max_hw_sectors changes added in 3.19
      caused this regression, so I also ccing stable.
      Reported-by: default avatarChristian Hesse <list@eworm.de>
      Signed-off-by: default avatarMike Christie <michaelc@cs.wisc.edu>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      3fe48d03
    • Michel Dänzer's avatar
      drm/radeon: Use drm_calloc_ab for CS relocs · 961bd135
      Michel Dänzer authored
      commit b421ed15 upstream.
      
      The number of relocs is passed in by userspace and can be large. It has
      been observed to cause kcalloc failures in the wild.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      961bd135
    • Takashi Iwai's avatar
      ALSA: emux: Fix mutex deadlock at unloading · 0243c10d
      Takashi Iwai authored
      commit 07b0e5d4 upstream.
      
      The emux-synth driver has a possible AB/BA mutex deadlock at unloading
      the emu10k1 driver:
      
        snd_emux_free() ->
          snd_emux_detach_seq(): mutex_lock(&emu->register_mutex) ->
            snd_seq_delete_kernel_client() ->
              snd_seq_free_client(): mutex_lock(&register_mutex)
      
        snd_seq_release() ->
          snd_seq_free_client(): mutex_lock(&register_mutex) ->
            snd_seq_delete_all_ports() ->
              snd_emux_unuse(): mutex_lock(&emu->register_mutex)
      
      Basically snd_emux_detach_seq() doesn't need a protection of
      emu->register_mutex as it's already being unregistered.  So, we can
      get rid of this for avoiding the deadlock.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      0243c10d
    • Takashi Iwai's avatar
      ALSA: emu10k1: Fix card shortname string buffer overflow · 96a987df
      Takashi Iwai authored
      commit d0226082 upstream.
      
      Some models provide too long string for the shortname that has 32bytes
      including the terminator, and it results in a non-terminated string
      exposed to the user-space.  This isn't too critical, though, as the
      string is stopped at the succeeding longname string.
      
      This patch fixes such entries by dropping "SB" prefix (it's enough to
      fit within 32 bytes, so far).  Meanwhile, it also changes strcpy()
      with strlcpy() to make sure that this kind of problem won't happen in
      future, too.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      96a987df
    • Takashi Iwai's avatar
      ALSA: hda - Fix mute-LED fixed mode · e35facb7
      Takashi Iwai authored
      commit ee52e56e upstream.
      
      The mute-LED mode control has the fixed on/off states that are
      supposed to remain on/off regardless of the master switch.  However,
      this doesn't work actually because the vmaster hook is called in the
      vmaster code itself.
      
      This patch fixes it by calling the hook indirectly after checking the
      mute LED mode.
      Reported-and-tested-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e35facb7
    • Al Viro's avatar
      RCU pathwalk breakage when running into a symlink overmounting something · a26f33c5
      Al Viro authored
      commit 3cab989a upstream.
      
      Calling unlazy_walk() in walk_component() and do_last() when we find
      a symlink that needs to be followed doesn't acquire a reference to vfsmount.
      That's fine when the symlink is on the same vfsmount as the parent directory
      (which is almost always the case), but it's not always true - one _can_
      manage to bind a symlink on top of something.  And in such cases we end up
      with excessive mntput().
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [lizf: Backported to 3.4: drop the changes to do_last()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a26f33c5
    • Jeff Layton's avatar
      nfs: fix high load average due to callback thread sleeping · 20db5788
      Jeff Layton authored
      commit 5d05e54a upstream.
      
      Chuck pointed out a problem that crept in with commit 6ffa30d3 (nfs:
      don't call blocking operations while !TASK_RUNNING). Linux counts tasks
      in uninterruptible sleep against the load average, so this caused the
      system's load average to be pinned at at least 1 when there was a
      NFSv4.1+ mount active.
      
      Not a huge problem, but it's probably worth fixing before we get too
      many complaints about it. This patch converts the code back to use
      TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop
      iteration. In practice no one should really be signalling this thread at
      all, so I think this is reasonably safe.
      
      With this change, there's also no need to game the hung task watchdog so
      we can also convert the schedule_timeout call back to a normal schedule.
      Reported-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Tested-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Fixes: commit 6ffa30d3 (“nfs: don't call blocking . . .”)
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      20db5788
    • Jeff Layton's avatar
      nfs: don't call blocking operations while !TASK_RUNNING · c9e5b3b7
      Jeff Layton authored
      commit 6ffa30d3 upstream.
      
      Bruce reported seeing this warning pop when mounting using v4.1:
      
           ------------[ cut here ]------------
           WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
          do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
          Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
          CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
           0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
           0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
           ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
          Call Trace:
           [<ffffffff8186ac78>] dump_stack+0x4c/0x65
           [<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
           [<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
           [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
           [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
           [<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
           [<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
           [<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
           [<ffffffff810d941e>] groups_alloc+0x3e/0x130
           [<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
           [<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
           [<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
           [<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
           [<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
           [<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
           [<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
           [<ffffffff810d45bf>] kthread+0x11f/0x140
           [<ffffffff810ea815>] ? local_clock+0x15/0x30
           [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
           [<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
           [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
          ---[ end trace 675220a11e30f4f2 ]---
      
      nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
      which is just wrong. Fix that by finishing the wait immediately if we've
      found that the list has something on it.
      
      Also, we don't expect this kthread to accept signals, so we should be
      using a TASK_UNINTERRUPTIBLE sleep instead. That however, opens us up
      hung task warnings from the watchdog, so have the schedule_timeout
      wake up every 60s if there's no callback activity.
      Reported-by: default avatar"J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: default avatarJeff Layton <jlayton@primarydata.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c9e5b3b7
    • Giuseppe Cantavenera's avatar
      nfsd: fix nsfd startup race triggering BUG_ON · c1515815
      Giuseppe Cantavenera authored
      commit bb7ffbf2 upstream.
      
      nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...)
      in fs/nfsd/nfs4recover.c was called before assigning ntfsd_net_id.
      The following was observed on a MIPS 32-core processor:
      kernel: Call Trace:
      kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd]
      kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8
      kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70
      kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0
      kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0
      kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8
      kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0
      kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28
      kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8
      kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68
      kernel:
      kernel:
              Code: 0040f809  00000000  2e020001 <00020336> 3c12c00d
                      3c02801a  de100000 6442eb98  0040f809
      kernel: ---[ end trace 7471374335809536 ]---
      
      Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before
      registering rpc_pipefs_event(...) with the notifier chain.
      Signed-off-by: default avatarGiuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
      Signed-off-by: default avatarLorenzo Restelli <lorenzo.restelli.ext@nokia.com>
      Reviewed-by: default avatarKinlong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c1515815
    • Dan Carpenter's avatar
      memstick: mspro_block: add missing curly braces · 4897576f
      Dan Carpenter authored
      commit 13f6b191 upstream.
      
      Using the indenting we can see the curly braces were obviously intended.
      This is a static checker fix, but my guess is that we don't read enough
      bytes, because we don't calculate "t_len" correctly.
      
      Fixes: f1d82698 ('memstick: use fully asynchronous request processing')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      4897576f
    • Oleg Nesterov's avatar
      ptrace: fix race between ptrace_resume() and wait_task_stopped() · a12cb100
      Oleg Nesterov authored
      commit b72c1869 upstream.
      
      ptrace_resume() is called when the tracee is still __TASK_TRACED.  We set
      tracee->exit_code and then wake_up_state() changes tracee->state.  If the
      tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
      wrongly looks like another report from tracee.
      
      This confuses debugger, and since wait_task_stopped() clears ->exit_code
      the tracee can miss a signal.
      
      Test-case:
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/wait.h>
      	#include <sys/ptrace.h>
      	#include <pthread.h>
      	#include <assert.h>
      
      	int pid;
      
      	void *waiter(void *arg)
      	{
      		int stat;
      
      		for (;;) {
      			assert(pid == wait(&stat));
      			assert(WIFSTOPPED(stat));
      			if (WSTOPSIG(stat) == SIGHUP)
      				continue;
      
      			assert(WSTOPSIG(stat) == SIGCONT);
      			printf("ERR! extra/wrong report:%x\n", stat);
      		}
      	}
      
      	int main(void)
      	{
      		pthread_t thread;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			for (;;)
      				kill(getpid(), SIGHUP);
      		}
      
      		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
      
      		for (;;)
      			ptrace(PTRACE_CONT, pid, 0, SIGCONT);
      
      		return 0;
      	}
      
      Note for stable: the bug is very old, but without 9899d11f "ptrace:
      ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
      should use lock_task_sighand(child).
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarPavel Labath <labath@google.com>
      Tested-by: default avatarPavel Labath <labath@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a12cb100
    • Nicolas Iooss's avatar
      firmware/ihex2fw.c: restore missing default in switch statement · 241cb823
      Nicolas Iooss authored
      commit d43698e8 upstream.
      
      Commit 2473238e ("ihex: add support for CS:IP/EIP records") removes
      the "default:" statement in the switch block, making the "return
      usage();" line dead code and ihex2fw silently ignoring unknown options.
      Restore this statement.
      
      This bug was found by building with HOSTCC=clang and adding
      -Wunreachable-code-return to HOSTCFLAGS.
      
      Fixes: 2473238e ("ihex: add support for CS:IP/EIP records")
      Signed-off-by: default avatarNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      241cb823
    • Christoph Hellwig's avatar
      megaraid_sas: use raw_smp_processor_id() · b7cecd38
      Christoph Hellwig authored
      commit 16b8528d upstream.
      
      We only want to steer the I/O completion towards a queue, but don't
      actually access any per-CPU data, so the raw_ version is fine to use
      and avoids the warnings when using smp_processor_id().
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarAndy Lutomirski <luto@kernel.org>
      Tested-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarSumit Saxena <sumit.saxena@avagotech.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      [lizf: Backported to 3.4: drop the changes to megasas_build_dcdb_fusion()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b7cecd38
    • Erez Shitrit's avatar
      IB/mlx4: Fix WQE LSO segment calculation · b0772ad6
      Erez Shitrit authored
      commit ca9b590c upstream.
      
      The current code decreases from the mss size (which is the gso_size
      from the kernel skb) the size of the packet headers.
      
      It shouldn't do that because the mss that comes from the stack
      (e.g IPoIB) includes only the tcp payload without the headers.
      
      The result is indication to the HW that each packet that the HW sends
      is smaller than what it could be, and too many packets will be sent
      for big messages.
      
      An easy way to demonstrate one more aspect of the problem is by
      configuring the ipoib mtu to be less than 2*hlen (2*56) and then
      run app sending big TCP messages. This will tell the HW to send packets
      with giant (negative value which under unsigned arithmetics becomes
      a huge positive one) length and the QP moves to SQE state.
      
      Fixes: b832be1e ('IB/mlx4: Add IPoIB LSO support')
      Reported-by: default avatarMatthew Finlay <matt@mellanox.com>
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b0772ad6
    • Yann Droneaud's avatar
      IB/core: disallow registering 0-sized memory region · c5498948
      Yann Droneaud authored
      commit 8abaae62 upstream.
      
      If ib_umem_get() is called with a size equal to 0 and an
      non-page aligned address, one page will be pinned and a
      0-sized umem will be returned to the caller.
      
      This should not be allowed: it's not expected for a memory
      region to have a size equal to 0.
      
      This patch adds a check to explicitly refuse to register
      a 0-sized region.
      
      Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Jack Morgenstein <jackm@mellanox.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c5498948
    • Ben Collins's avatar
      dm crypt: fix deadlock when async crypto algorithm returns -EBUSY · ebcf545d
      Ben Collins authored
      commit 0618764c upstream.
      
      I suspect this doesn't show up for most anyone because software
      algorithms typically don't have a sense of being too busy.  However,
      when working with the Freescale CAAM driver it will return -EBUSY on
      occasion under heavy -- which resulted in dm-crypt deadlock.
      
      After checking the logic in some other drivers, the scheme for
      crypt_convert() and it's callback, kcryptd_async_done(), were not
      correctly laid out to properly handle -EBUSY or -EINPROGRESS.
      
      Fix this by using the completion for both -EBUSY and -EINPROGRESS.  Now
      crypt_convert()'s use of completion is comparable to
      af_alg_wait_for_completion().  Similarly, kcryptd_async_done() follows
      the pattern used in af_alg_complete().
      
      Before this fix dm-crypt would lockup within 1-2 minutes running with
      the CAAM driver.  Fix was regression tested against software algorithms
      on PPC32 and x86_64, and things seem perfectly happy there as well.
      Signed-off-by: default avatarBen Collins <ben.c@servergy.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ebcf545d
    • Michael Davidson's avatar
      fs/binfmt_elf.c: fix bug in loading of PIE binaries · a425d56c
      Michael Davidson authored
      commit a87938b2 upstream.
      
      With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
      address allocation strategy, load_elf_binary() will attempt to map a PIE
      binary into an address range immediately below mm->mmap_base.
      
      Unfortunately, load_elf_ binary() does not take account of the need to
      allocate sufficient space for the entire binary which means that, while
      the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
      PT_LOAD segment(s) end up being mapped above mm->mmap_base into the are
      that is supposed to be the "gap" between the stack and the binary.
      
      Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
      means that binaries with large data segments > 128MB can end up mapping
      part of their data segment over their stack resulting in corruption of the
      stack (and the data segment once the binary starts to run).
      
      Any PIE binary with a data segment > 128MB is vulnerable to this although
      address randomization means that the actual gap between the stack and the
      end of the binary is normally greater than 128MB.  The larger the data
      segment of the binary the higher the probability of failure.
      
      Fix this by calculating the total size of the binary in the same way as
      load_elf_interp().
      Signed-off-by: default avatarMichael Davidson <md@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a425d56c
    • Lv Zheng's avatar
      ACPICA: Utilities: split IO address types from data type models. · c77676b7
      Lv Zheng authored
      commit 2b876010 upstream.
      
      ACPICA commit aacf863cfffd46338e268b7415f7435cae93b451
      
      It is reported that on a physically 64-bit addressed machine, 32-bit kernel
      can trigger crashes in accessing the memory regions that are beyond the
      32-bit boundary. The region field's start address should still be 32-bit
      compliant, but after a calculation (adding some offsets), it may exceed the
      32-bit boundary. This case is rare and buggy, but there are real BIOSes
      leaked with such issues (see References below).
      
      This patch fixes this gap by always defining IO addresses as 64-bit, and
      allows OSPMs to optimize it for a real 32-bit machine to reduce the size of
      the internal objects.
      
      Internal acpi_physical_address usages in the structures that can be fixed
      by this change include:
       1. struct acpi_object_region:
          acpi_physical_address		address;
       2. struct acpi_address_range:
          acpi_physical_address		start_address;
          acpi_physical_address		end_address;
       3. struct acpi_mem_space_context;
          acpi_physical_address		address;
       4. struct acpi_table_desc
          acpi_physical_address		address;
      See known issues 1 for other usages.
      
      Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer
      from same problem, so this patch changes it accordingly.
      
      For iasl, it will enforce acpi_physical_address as 32-bit to generate
      32-bit OSPM compatible tables on 32-bit platforms, we need to define
      ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h.
      
      Known issues:
       1. Cleanup of mapped virtual address
         In struct acpi_mem_space_context, acpi_physical_address is used as a virtual
         address:
          acpi_physical_address                   mapped_physical_address;
         It is better to introduce acpi_virtual_address or use acpi_size instead.
         This patch doesn't make such a change. Because this should be done along
         with a change to acpi_os_map_memory()/acpi_os_unmap_memory().
         There should be no functional problem to leave this unchanged except
         that only this structure is enlarged unexpectedly.
      
      Link: https://github.com/acpica/acpica/commit/aacf863c
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501Reported-and-tested-by: default avatarPaul Menzel <paulepanter@users.sourceforge.net>
      Reported-and-tested-by: default avatarSial Nije <sialnije@gmail.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarBob Moore <robert.moore@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c77676b7
    • Anton Blanchard's avatar
      powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH · 3e6102f9
      Anton Blanchard authored
      commit 9a5cbce4 upstream.
      
      We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH
      (currently 127), but we forgot to do the same for 64bit backtraces.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      3e6102f9
    • Filipe Manana's avatar
      Btrfs: fix inode eviction infinite loop after cloning into it · c9ff0e39
      Filipe Manana authored
      commit ccccf3d6 upstream.
      
      If we attempt to clone a 0 length region into a file we can end up
      inserting a range in the inode's extent_io tree with a start offset
      that is greater then the end offset, which triggers immediately the
      following warning:
      
      [ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
      [ 3914.620886] BTRFS: end < start 4095 4096
      (...)
      [ 3914.638093] Call Trace:
      [ 3914.638636]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
      [ 3914.639620]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
      [ 3914.640789]  [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
      [ 3914.642041]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
      [ 3914.643236]  [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
      [ 3914.644441]  [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
      [ 3914.645711]  [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
      [ 3914.646914]  [<ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33
      [ 3914.648058]  [<ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs]
      [ 3914.650105]  [<ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs]
      [ 3914.651361]  [<ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs]
      [ 3914.652761]  [<ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs]
      [ 3914.654128]  [<ffffffff811226dd>] ? might_fault+0x58/0xb5
      [ 3914.655320]  [<ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs]
      (...)
      [ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]---
      
      This later makes the inode eviction handler enter an infinite loop that
      keeps dumping the following warning over and over:
      
      [ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
      [ 3915.119913] BTRFS: end < start 4095 4096
      (...)
      [ 3915.137394] Call Trace:
      [ 3915.137913]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
      [ 3915.139154]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
      [ 3915.140316]  [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
      [ 3915.141505]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
      [ 3915.142709]  [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
      [ 3915.143849]  [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
      [ 3915.145120]  [<ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs]
      [ 3915.146352]  [<ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50
      [ 3915.147565]  [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
      [ 3915.148785]  [<ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33
      [ 3915.149931]  [<ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs]
      [ 3915.151154]  [<ffffffff81168904>] evict+0xa0/0x148
      [ 3915.152094]  [<ffffffff811689e5>] dispose_list+0x39/0x43
      [ 3915.153081]  [<ffffffff81169564>] evict_inodes+0xdc/0xeb
      [ 3915.154062]  [<ffffffff81154418>] generic_shutdown_super+0x49/0xef
      [ 3915.155193]  [<ffffffff811546d1>] kill_anon_super+0x13/0x1e
      [ 3915.156274]  [<ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
      (...)
      [ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]---
      
      So just bail out of the clone ioctl if the length of the region to clone
      is zero, without locking any extent range, in order to prevent this issue
      (same behaviour as a pwrite with a 0 length for example).
      
      This is trivial to reproduce. For example, the steps for the test I just
      made for fstests:
      
        mkfs.btrfs -f SCRATCH_DEV
        mount SCRATCH_DEV $SCRATCH_MNT
      
        touch $SCRATCH_MNT/foo
        touch $SCRATCH_MNT/bar
      
        $CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
        umount $SCRATCH_MNT
      
      A test case for fstests follows soon.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarOmar Sandoval <osandov@osandov.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c9ff0e39
    • Heiko Carstens's avatar
      s390/hibernate: fix save and restore of kernel text section · c773eee8
      Heiko Carstens authored
      commit d7441949 upstream.
      
      Sebastian reported a crash caused by a jump label mismatch after resume.
      This happens because we do not save the kernel text section during suspend
      and therefore also do not restore it during resume, but use the kernel image
      that restores the old system.
      
      This means that after a suspend/resume cycle we lost all modifications done
      to the kernel text section.
      The reason for this is the pfn_is_nosave() function, which incorrectly
      returns that read-only pages don't need to be saved. This is incorrect since
      we mark the kernel text section read-only.
      We still need to make sure to not save and restore pages contained within
      NSS and DCSS segment.
      To fix this add an extra case for the kernel text section and only save
      those pages if they are not contained within an NSS segment.
      
      Fixes the following crash (and the above bugs as well):
      
      Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0
      Found:    c0 04 00 00 00 00
      Expected: c0 f4 00 00 00 11
      New:      c0 04 00 00 00 00
      Kernel panic - not syncing: Corrupted kernel text
      CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4
      Call Trace:
        [<0000000000113972>] show_stack+0x72/0xf0
        [<000000000081f15e>] dump_stack+0x6e/0x90
        [<000000000081c4e8>] panic+0x108/0x2b0
        [<000000000081be64>] jump_label_bug.isra.2+0x104/0x108
        [<0000000000112176>] __jump_label_transform+0x9e/0xd0
        [<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50
        [<00000000001d1136>] multi_cpu_stop+0x12e/0x170
        [<00000000001d1472>] cpu_stopper_thread+0xb2/0x168
        [<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0
        [<0000000000158baa>] kthread+0x10a/0x110
        [<0000000000824a86>] kernel_thread_starter+0x6/0xc
      Reported-and-tested-by: default avatarSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      [lizf: Backported to 3.4: add necessary includes]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c773eee8
    • Nicolas Dichtel's avatar
      selinux/nlmsg: add XFRM_MSG_MAPPING · 6ebf0333
      Nicolas Dichtel authored
      commit bd2cba07 upstream.
      
      This command is missing.
      
      Fixes: 3a2dfbe8 ("xfrm: Notify changes in UDP encapsulation via netlink")
      CC: Martin Willi <martin@strongswan.org>
      Reported-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6ebf0333
    • Nicolas Dichtel's avatar
      selinux/nlmsg: add XFRM_MSG_MIGRATE · fa0ed226
      Nicolas Dichtel authored
      commit 8d465bb7 upstream.
      
      This command is missing.
      
      Fixes: 5c79de6e ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
      Reported-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      fa0ed226