1. 12 Feb, 2011 28 commits
    • [SCSI] target: Fix SCF_SCSI_CONTROL_SG_IO_CDB breakage · e63af958
      Nicholas Bellinger authored
      This patch fixes a bug introduced during the v4 control CDB emulation
      refactoring that broke SCF_SCSI_CONTROL_SG_IO_CDB operation within
      transport_map_control_cmd_to_task().  It moves the BUG_ON() into
      transport_do_se_mem_map() after the TRANSPORT(dev)->do_se_mem_map()
      RAMDISK_DR special case, and adds the proper struct se_mem assignment
      when !list_empty() for normal non RAMDISK_DR backend device cases.
      Reported-by: Kai-Thorsten Hambrecht <kai@hambrecht.org>
      Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] target: Fix top-level configfs_subsystem default_group shutdown breakage · 7c2bf6e9
      Nicholas Bellinger authored
      This patch fixes two bugs uncovered while testing with
      slub_debug=FPUZ during module_exit() -> target_core_exit_configfs().
      Both concern how the release of configfs subsystem consumer default
      groups should work with the
      fs/configfs/dir.c:configfs_unregister_subsystem() release logic for
      struct config_group->default_groups.
      
      The first issue involves configfs_unregister_subsystem() expecting to
      walk+drain the top-level subsys->su_group.default_groups directly in
      unlink_group(), and not directly from the configfs subsystem consumer
      for the top level struct config_group->default_groups.  This patch
      drops the walk+drain of subsys->su_group.default_groups from TCM
      configfs subsystem consumer code, and moves the top-level
      ->default_groups kfree() after configfs_unregister_subsystem() has
      been called.
      
      The second issue involves calling
      core_alua_free_lu_gp(se_global->default_lu_gp) to release the
      default_lu_gp->lu_gp_group before configfs_unregister_subsystem() has
      been called.  This patch also moves the core_alua_free_lu_gp() call
      to release default_lu_gp->lu_gp_group after the subsys has been
      unregistered.
      
      Finally, this patch explicitly clears the
      [lu_gp,alua,hba]_cg->default_groups pointers after kfree() to ensure
      that no stale memory is picked up from child struct
      config_group->default_groups[] while configfs_unregister_subsystem() is
      called.
      Reported-by: Fubo Chen <fubo.chen@gmail.com>
      Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] target: fixed missing lock drop in error path · 85dc98d9
      Fubo Chen authored
      The struct se_node_acl->device_list_lock needs to be released if either
      sanity check for struct se_dev_entry->se_lun_acl or deve->se_lun fails.
      Signed-off-by: Fubo Chen <fubo.chen@gmail.com>
      Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
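The lock-drop fix in 85dc98d9 follows a common pattern: every exit from a locked region, including failed sanity checks, goes through a single unlock label. The block below is a minimal userspace sketch of that pattern, not the kernel code. The `toy_lock`, `toy_dev_entry`, `toy_node_acl`, and `toy_check_entry()` names are hypothetical stand-ins for device_list_lock, struct se_dev_entry, struct se_node_acl, and the patched function.

```c
#include <stddef.h>

/* Hypothetical toy lock: held is 1 while locked. Not a real spinlock. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { l->held = 0; }

/* Hypothetical stand-ins for struct se_dev_entry and struct se_node_acl. */
struct toy_dev_entry {
    void *lun_acl;  /* plays the role of deve->se_lun_acl */
    void *lun;      /* plays the role of deve->se_lun */
};

struct toy_node_acl {
    struct toy_lock device_list_lock;
    struct toy_dev_entry deve;
};

/* Returns 0 on success, -1 if a sanity check fails. Every path unlocks. */
int toy_check_entry(struct toy_node_acl *nacl)
{
    int ret = 0;

    toy_lock_acquire(&nacl->device_list_lock);
    if (nacl->deve.lun_acl == NULL || nacl->deve.lun == NULL) {
        ret = -1;
        goto out_unlock;  /* the error path still releases the lock */
    }
    /* ... work that needs the lock would go here ... */
out_unlock:
    toy_lock_release(&nacl->device_list_lock);
    return ret;
}
```

Routing the error path through the same label as the success path makes it harder to add a new early return that forgets the unlock.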
    • [SCSI] target: Fix demo-mode MappedLUN shutdown UA/PR breakage · 29fe609d
      Nicholas Bellinger authored
      This patch fixes a bug in core_update_device_list_for_node() where
      the UA + Persistent Reservations metadata of individual demo-mode
      generated MappedLUNs was being leaked, instead of falling through
      and calling the existing core_scsi3_ua_release_all() and
      core_scsi3_free_pr_reg_from_nacl() at the end of
      core_update_device_list_for_node().
      
      This bug would manifest itself with the following OOPS with TPG
      demo-mode endpoints (tfo->tpg_check_demo_mode()=1), and PROUT
      REGISTER+RESERVE -> explicit struct se_session logout -> struct
      se_device shutdown:
      
      [  697.021139] LIO_iblock used greatest stack depth: 2704 bytes left
      [  702.235017] general protection fault: 0000 [#1] SMP
      [  702.235074] last sysfs file: /sys/devices/virtual/net/lo/operstate
      [  704.372695] CPU 0
      [  704.372725] Modules linked in: crc32c target_core_stgt scsi_tgt target_core_pscsi target_core_file target_core_iblock target_core_mod configfs sr_mod cdrom sd_mod ata_piix mptspi mptscsih libata mptbase [last unloaded: iscsi_target_mod]
      [  704.375442]
      [  704.375563] Pid: 4964, comm: tcm_node Not tainted 2.6.37+ #1 440BX Desktop Reference Platform/VMware Virtual Platform
      [  704.375912] RIP: 0010:[<ffffffffa00aaa16>]  [<ffffffffa00aaa16>] __core_scsi3_complete_pro_release+0x31/0x133 [target_core_mod]
      [  704.376017] RSP: 0018:ffff88001e5ffcb8  EFLAGS: 00010296
      [  704.376017] RAX: 6d32335b1b0a0d0a RBX: ffff88001d952cb0 RCX: 0000000000000015
      [  704.376017] RDX: ffff88001b428000 RSI: ffff88001da5a4c0 RDI: ffff88001e5ffcd8
      [  704.376017] RBP: ffff88001e5ffd28 R08: ffff88001e5ffcd8 R09: ffff88001d952080
      [  704.377116] R10: ffff88001dfc5480 R11: ffff88001df8abb0 R12: ffff88001d952cb0
      [  704.377319] R13: 0000000000000000 R14: ffff88001df8abb0 R15: ffff88001b428000
      [  704.377521] FS:  00007f033d15c6e0(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
      [  704.377861] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  704.378043] CR2: 00007fff09281510 CR3: 000000001e5db000 CR4: 00000000000006f0
      [  704.378110] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  704.378110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  704.378110] Process tcm_node (pid: 4964, threadinfo ffff88001e5fe000, task ffff88001d99c260)
      [  704.378110] Stack:
      [  704.378110]  ffffea0000678980 ffff88001da5a4c0 ffffea0000678980 ffff88001f402b00
      [  704.378110]  ffff88001e5ffd08 ffffffff810ea236 ffff88001e5ffd18 0000000000000282
      [  704.379772]  ffff88001d952080 ffff88001d952cb0 ffff88001d952cb0 ffff88001dc79010
      [  704.380082] Call Trace:
      [  704.380220]  [<ffffffff810ea236>] ? __slab_free+0x89/0x11c
      [  704.380403]  [<ffffffffa00ab781>] core_scsi3_free_all_registrations+0x3e/0x157 [target_core_mod]
      [  704.380479]  [<ffffffffa00a752b>] se_release_device_for_hba+0xa6/0xd8 [target_core_mod]
      [  704.380479]  [<ffffffffa00a7598>] se_free_virtual_device+0x3b/0x45 [target_core_mod]
      [  704.383750]  [<ffffffffa00a3177>] target_core_drop_subdev+0x13a/0x18d [target_core_mod]
      [  704.384068]  [<ffffffffa00960db>] client_drop_item+0x25/0x31 [configfs]
      [  704.384263]  [<ffffffffa00967b5>] configfs_rmdir+0x1a1/0x223 [configfs]
      [  704.384459]  [<ffffffff810fa8cd>] vfs_rmdir+0x7e/0xd3
      [  704.384631]  [<ffffffff810fc3be>] do_rmdir+0xa3/0xf4
      [  704.384895]  [<ffffffff810eed15>] ? filp_close+0x67/0x72
      [  704.386485]  [<ffffffff810fc446>] sys_rmdir+0x11/0x13
      [  704.387893]  [<ffffffff81002a92>] system_call_fastpath+0x16/0x1b
      [  704.388083] Code: 4c 8d 45 b0 41 56 49 89 d7 41 55 41 89 cd 41 54 b9 15 00 00 00 53 48 89 fb 48 83 ec 48 4c 89 c7 48 89 75 98 48 8b 86 28 01 00 00 <48> 8b 80 90 01 00 00 48 89 45 a0 31 c0 f3 aa c7 45 ac 00 00 00
      [  704.388763] RIP  [<ffffffffa00aaa16>] __core_scsi3_complete_pro_release+0x31/0x133 [target_core_mod]
      [  704.389142]  RSP <ffff88001e5ffcb8>
      [  704.389572] ---[ end trace 2a3614f3cd6261a5 ]---
      Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] target/iblock: Fix failed bd claim NULL pointer dereference · bc665524
      Nicholas Bellinger authored
      This patch adds an explicit check for struct iblock_dev->ibd_bd in
      iblock_free_device() before calling blkdev_put(), which will otherwise hit
      the following NULL pointer dereference @ ib_dev->ibd_bd when iblock_create_virtdevice()
      fails to claim an already in-use struct block_device via blkdev_get_by_path().
      
      [  112.528578] Target_Core_ConfigFS: Allocated struct se_subsystem_dev: ffff88001e750000 se_dev_su_ptr: ffff88001dd05d70
      [  112.534681] Target_Core_ConfigFS: Calling t->free_device() for se_dev_su_ptr: ffff88001dd05d70
      [  112.535029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [  112.535029] IP: [<ffffffff814987a3>] mutex_lock+0x14/0x35
      [  112.535029] PGD 1e5d0067 PUD 1e274067 PMD 0
      [  112.535029] Oops: 0002 [#1] SMP
      [  112.535029] last sysfs file: /sys/devices/pci0000:00/0000:00:07.1/host2/target2:0:0/2:0:0:0/type
      [  112.535029] CPU 0
      [  112.535029] Modules linked in: iscsi_target_mod target_core_stgt scsi_tgt target_core_pscsi target_core_file target_core_iblock target_core_mod configfs sr_mod cdrom sd_mod ata_piix mptspi mptscsih libata mptbase [last unloaded: scsi_wait_scan]
      [  112.535029]
      [  112.535029] Pid: 3345, comm: python2.5 Not tainted 2.6.37+ #1 440BX Desktop Reference Platform/VMware Virtual Platform
      [  112.535029] RIP: 0010:[<ffffffff814987a3>]  [<ffffffff814987a3>] mutex_lock+0x14/0x35
      [  112.535029] RSP: 0018:ffff88001e6d7d58  EFLAGS: 00010246
      [  112.535029] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000082
      [  112.535029] RDX: ffff88001e6d7fd8 RSI: 0000000000000083 RDI: 0000000000000020
      [  112.535029] RBP: ffff88001e6d7d68 R08: 0000000000000000 R09: 0000000000000000
      [  112.535029] R10: ffff8800000be860 R11: ffff88001f420000 R12: 0000000000000020
      [  112.535029] R13: 0000000000000083 R14: ffff88001d809430 R15: ffff88001d8094f8
      [  112.535029] FS:  00007ff17ca7d6e0(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
      [  112.535029] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  112.535029] CR2: 0000000000000020 CR3: 000000001e5d2000 CR4: 00000000000006f0
      [  112.535029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  112.535029] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  112.535029] Process python2.5 (pid: 3345, threadinfo ffff88001e6d6000, task ffff88001e2d0760)
      [  112.535029] Stack:
      [  112.535029]  ffff88001e6d7d88 0000000000000000 ffff88001e6d7d98 ffffffff811187fc
      [  112.535029]  ffff88001d809430 ffff88001dd05d70 ffff88001e750860 ffff88001e750000
      [  112.535029]  ffff88001e6d7db8 ffffffffa00e3757 ffff88001e6d7db8 0000000000000004
      [  112.535029] Call Trace:
      [  112.535029]  [<ffffffff811187fc>] blkdev_put+0x28/0x107
      [  112.535029]  [<ffffffffa00e3757>] iblock_free_device+0x1d/0x36 [target_core_iblock]
      [  112.535029]  [<ffffffffa00a319c>] target_core_drop_subdev+0x15f/0x18d [target_core_mod]
      [  112.535029]  [<ffffffffa00960db>] client_drop_item+0x25/0x31 [configfs]
      [  112.535029]  [<ffffffffa00967b5>] configfs_rmdir+0x1a1/0x223 [configfs]
      [  112.535029]  [<ffffffff810fa8cd>] vfs_rmdir+0x7e/0xd3
      [  112.535029]  [<ffffffff810fc3be>] do_rmdir+0xa3/0xf4
      [  112.535029]  [<ffffffff810fc446>] sys_rmdir+0x11/0x13
      [  112.535029]  [<ffffffff81002a92>] system_call_fastpath+0x16/0x1b
      [  112.535029] Code: 8b 04 25 88 b5 00 00 48 2d d8 1f 00 00 48 89 43 18 31 c0 5e 5b c9 c3 55 48 89 e5 53 48 89 fb 48 83 ec 08 e8 c4 f7 ff ff 48 89 df <3e> ff 0f 79 05 e8 1e ff ff ff 65 48 8b 04 25 88 b5 00 00 48 2d
      [  112.535029] RIP  [<ffffffff814987a3>] mutex_lock+0x14/0x35
      [  112.535029]  RSP <ffff88001e6d7d58>
      [  112.535029] CR2: 0000000000000020
      [  132.679636] ---[ end trace 05754bb48eb828f0 ]---
      
      Note it also adds a second explicit check for ib_dev->ibd_bio_set before calling
      bioset_free() to fix the same possible NULL pointer dereference during an early
      iblock_create_virtdevice() failure.
      Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] target: iblock/pscsi claim checking for NULL instead of IS_ERR · 3ae279d2
      Dan Carpenter authored
      blkdev_get_by_path() returns an ERR_PTR() on error; it never returns
      NULL.  It looks like this bug would be easy to trigger by mistake.
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
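To see why a NULL check misses an ERR_PTR() failure, it may help to look at how error pointers are encoded. The block below is a userspace sketch that re-creates the ERR_PTR/PTR_ERR/IS_ERR helpers from the kernel's include/linux/err.h in simplified form. `toy_open_device()` is a hypothetical function that reports failure the same way blkdev_get_by_path() is described above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace copies of the kernel helpers (assumption: the
 * top 4095 addresses are never valid pointers, as in the kernel). */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical opener: returns a valid pointer, or ERR_PTR(-EBUSY)
 * when the device is already claimed. It never returns NULL. */
void *toy_open_device(int busy)
{
    static int device;

    return busy ? ERR_PTR(-EBUSY) : (void *)&device;
}
```

A caller that only tests `if (!ptr)` would treat the `ERR_PTR(-EBUSY)` result as a valid device, because the encoded error is a non-NULL address near the top of the address space.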
    • [SCSI] scsi_debug: Fix 32-bit overflow in do_device_access causing memory corruption · a361cc00
      Darrick J. Wong authored
      If I create a scsi_debug device that is larger than 4GB, the multiplication of
      (block * scsi_debug_sector_size) can produce a 64-bit value.  Unfortunately,
      the compiler sees two 32-bit quantities and performs a 32-bit multiplication,
      thus truncating the bits above 2^32.  This causes the wrong memory location to
      be read or written.  Change block and rest to be unsigned long long.
      Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
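The truncation described in a361cc00 can be reproduced in plain C. The sketch below assumes a platform where `unsigned int` is 32 bits wide. The function names are hypothetical and only illustrate the arithmetic, not the scsi_debug code itself.

```c
#include <stdint.h>

/* Both operands are 32-bit, so the product is computed modulo 2^32 and
 * only then widened to 64 bits for the return value. */
uint64_t toy_offset_32bit(uint32_t block, uint32_t sector_size)
{
    return block * sector_size;
}

/* Widening one operand first makes the whole multiplication 64-bit,
 * which mirrors changing the block variable to unsigned long long. */
uint64_t toy_offset_64bit(uint64_t block, uint32_t sector_size)
{
    return block * sector_size;
}
```

With block 8388608 (2^23) and a 512-byte sector, the true offset is exactly 4 GiB. The 32-bit version wraps to 0, so an access meant for the 4 GiB mark would land at the start of the backing store instead.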
    • [SCSI] qla2xxx: Change from irq to irqsave with host_lock · 044d78e1
      Madhuranath Iyengar authored
      Make the driver safer by using irqsave/irqrestore with host_lock.
      Signed-off-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] qla2xxx: Fix race that could hang kthread_stop() · 563585ec
      James Bottomley authored
      There is a small race window in qla2x00_do_dpc() between
      checking for kthread_should_stop() and going to sleep after
      setting TASK_INTERRUPTIBLE. If qla2x00_free_device() is called
      in this window, kthread_stop will wait forever because there
      will be no one to wake up the process.
      
      Fix by making sure we set TASK_INTERRUPTIBLE before checking
      kthread_should_stop().
      Reported-by: Bandan Das <bandan.das@stratus.com>
      Acked-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3c6c0d6c
      Linus Torvalds authored
      * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index
    • Linus Torvalds's avatar
      5b49378e
    • Linus Torvalds's avatar
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 · 3aec46c1
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
        cifs: don't always drop malformed replies on the floor (try #3)
        cifs: clean up checks in cifs_echo_request
        [CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate
    • Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging · 68c3d4b2
      Linus Torvalds authored
      * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
        hwmon: (emc1403) Fix I2C address range
        hwmon: (lm63) Consider LM64 temperature offset
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 · f7909fb8
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
        pci: use security_capable() when checking capablities during config space read
        security: add cred argument to security_capable()
        tpm_tis: Use timeouts returned from TPM
    • Merge branch 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung · c41d40b5
      Linus Torvalds authored
      * 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
        ARM: SAMSUNG: Ensure struct sys_device is declared in plat/pm.h
        ARM: S5PV310: Cleanup System MMU
        ARM: S5PV310: Add support System MMU on SMDKV310
    • Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze · a288465f
      Linus Torvalds authored
      * 'next' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Fix msr instruction detection
        microblaze: Fix pte_update function
        microblaze: Fix asm compilation warning
        microblaze: Fix IRQ flag handling for MSR=0
    • drivers/w1/masters/omap_hdq.c: add missing clk_put · 80d02d27
      Julia Lawall authored
      This code makes two calls to clk_get, then tests both return values and
      fails if either failed.
      
      The problem is that in the first inner if, where the first call to
      clk_get has failed, it doesn't know if the second call has failed as well.
      So it doesn't know whether clk_put should be called on the result of the
      second call.  Of course, it would be possible to test that value again.
      A simpler solution is just to test the result of calling clk_get
      directly after each call.
      
      The semantic match that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      position p1,p2;
      expression e;
      statement S;
      @@
      
      e = clk_get@p1(...)
      ...
      if@p2 (IS_ERR(e)) S
      
      @@
      expression e;
      statement S;
      identifier l;
      position r.p1, p2 != r.p2;
      @@
      
      *e = clk_get@p1(...)
      ... when != clk_put(e)
      *if@p2 (...)
      {
        ... when != clk_put(e)
      * return ...;
      }
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Acked-by: Tony Lindgren <tony@atomide.com>
      Acked-by: Amit Kucheria <amit.kucheria@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
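The "test directly after each call" approach from 80d02d27 can be sketched in userspace with a toy reference-counted resource. Everything here is hypothetical: `toy_clk_get()` and `toy_clk_put()` only imitate the acquire/release pairing, and failure is signalled with NULL for brevity, whereas the real clk_get() reports errors through ERR_PTR().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct clk with a visible reference count. */
struct toy_clk { int refs; };

static struct toy_clk toy_iclk, toy_fclk;

static struct toy_clk *toy_clk_get(struct toy_clk *c, int fail)
{
    if (fail)
        return NULL;
    c->refs++;
    return c;
}

static void toy_clk_put(struct toy_clk *c)
{
    c->refs--;
}

/* Check each acquisition immediately, so every error path knows exactly
 * which resources are held and must be released. */
int toy_probe(int fail_first, int fail_second)
{
    struct toy_clk *a, *b;

    a = toy_clk_get(&toy_iclk, fail_first);
    if (!a)
        return -1;               /* nothing held yet */

    b = toy_clk_get(&toy_fclk, fail_second);
    if (!b) {
        toy_clk_put(a);          /* only the first clock is held */
        return -1;
    }

    /* A real driver would keep both clocks; the demo drops them again
     * so the reference counts can be checked afterwards. */
    toy_clk_put(b);
    toy_clk_put(a);
    return 0;
}

int toy_outstanding_refs(void)
{
    return toy_iclk.refs + toy_fclk.refs;
}
```

Because each failure is handled right where it occurs, no branch has to guess whether the other acquisition succeeded.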
    • memcg: fix leak of accounting at failure path of hugepage collapsing · 678ff896
      KAMEZAWA Hiroyuki authored
      mem_cgroup_uncharge_page() should be called in all failure cases after
      mem_cgroup_newpage_charge() is called in huge_memory.c::collapse_huge_page().
      
       [ 4209.076861] BUG: Bad page state in process khugepaged  pfn:1e9800
       [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping:          (null) index:0x2800
       [ 4209.078674] page flags: 0x40000000004000(head)
       [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
       [ 4209.082177] (/A)
       [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
       [ 4209.083412] Call Trace:
       [ 4209.083678]  [<ffffffff810f4454>] ? bad_page+0xe4/0x140
       [ 4209.084240]  [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
       [ 4209.084837]  [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
       [ 4209.085509]  [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
       [ 4209.086110]  [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
       [ 4209.086699]  [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
       [ 4209.087333]  [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
       [ 4209.087935]  [<ffffffff810fb015>] ? put_page+0x45/0x50
       [ 4209.097361]  [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
       [ 4209.098364]  [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
       [ 4209.099121]  [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
       [ 4209.099780]  [<ffffffff8107c236>] ? kthread+0x96/0xa0
       [ 4209.100452]  [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
       [ 4209.101214]  [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
       [ 4209.101842]  [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmscan: fix zone shrinking exit when scan work is done · f0fdc5e8
      Johannes Weiner authored
      Commit 3e7d3449 ("mm: vmscan: reclaim order-0 and use compaction
      instead of lumpy reclaim") introduced an indefinite loop in
      shrink_zone().
      
      It meant to break out of this loop when no pages had been reclaimed and
      not a single page was even scanned.  The way it would detect the latter
      is by taking a snapshot of sc->nr_scanned at the beginning of the
      function and comparing it against the new sc->nr_scanned after the scan
      loop.  But it would re-iterate without updating that snapshot, looping
      forever if sc->nr_scanned changed at least once since shrink_zone() was
      invoked.
      
      This is not the sole condition that would exit that loop, but it
      requires other processes to change the zone state, as the reclaimer that
      is stuck obviously can not anymore.
      
      This is only happening for higher-order allocations, where reclaim is
      run back to back with compaction.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
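The snapshot problem in f0fdc5e8 can be modelled with a small loop. This is a simplified userspace sketch, not shrink_zone(): it ignores the reclaimed-pages condition, and `toy_scan_control`, `toy_shrink_passes()`, and the `work` array are hypothetical. The point is only that the snapshot must be refreshed on every iteration for the "nothing was scanned" test to compare against the current pass.

```c
/* Hypothetical scan control, loosely modelled on sc->nr_scanned. */
struct toy_scan_control {
    unsigned long nr_scanned;
};

/* work[i] is the number of pages scanned in pass i (0 once n is
 * exhausted). Returns how many passes ran before the loop exited. */
int toy_shrink_passes(struct toy_scan_control *sc,
                      const unsigned long *work, int n, int max_passes)
{
    int pass = 0;

    for (;;) {
        /* Snapshot taken inside the loop, so it reflects this pass. */
        unsigned long snapshot = sc->nr_scanned;

        if (pass < n)
            sc->nr_scanned += work[pass];
        pass++;

        if (sc->nr_scanned == snapshot)
            break;              /* not a single page scanned: done */
        if (pass >= max_passes)
            break;              /* safety cap for the demo only */
    }
    return pass;
}
```

If the snapshot were taken once before the loop, the equality test could never succeed after the first pass that scanned anything, which matches the endless re-iteration described in the commit message.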
    • mlock: do not munlock pages in __do_fault() · 419d8c96
      Michel Lespinasse authored
      If the page is going to be written to, __do_fault() needs to break COW.
      
      However, the old page (before breaking COW) was never mapped into
      the current pte (__do_fault is only called when the pte is not present),
      so vmscan can't have marked the old page as PageMlocked due to being
      mapped in __do_fault's VMA.  Therefore, __do_fault() does not need to
      worry about clearing PageMlocked() on the old page.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mlock: fix race when munlocking pages in do_wp_page() · e15f8c01
      Michel Lespinasse authored
      vmscan can lazily find pages that are mapped within VM_LOCKED vmas, and
      set the PageMlocked bit on these pages, transferring them onto the
      unevictable list.  When do_wp_page() breaks COW within a VM_LOCKED vma,
      it may need to clear PageMlocked on the old page and set it on the new
      page instead.
      
      This change fixes an issue where do_wp_page() was clearing PageMlocked
      on the old page while the pte was still pointing to it (as well as
      rmap).  Therefore, we were not protected against vmscan immediately
      transferring the old page back onto the unevictable list.  This could
      cause pages to get stranded there forever.
      
      I propose to move the corresponding code to the end of do_wp_page(),
      after the pte (and rmap) have been pointed to the new page.
      Additionally, we can use munlock_vma_page() instead of
      clear_page_mlock(), so that the old page stays mlocked if there are
      still other VM_LOCKED vmas mapping it.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: don't adjust size in memblock_find_base() · e6d2e2b2
      Yinghai Lu authored
      While applying a patch to use memblock to find the aperture for 64-bit
      x86, Ingo found that a system with 1g + force_iommu reports:
      
      > No AGP bridge found
      > Node 0: aperture @ 38000000 size 32 MB
      > Aperture pointing to e820 RAM. Ignoring.
      > Your BIOS doesn't leave a aperture memory hole
      > Please enable the IOMMU option in the BIOS setup
      > This costs you 64 MB of RAM
      > Cannot allocate aperture memory hole (0,65536K)
      
      the corresponding code:
      
      	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
      	if (addr == MEMBLOCK_ERROR || addr + aper_size > 0xffffffff) {
      		printk(KERN_ERR
      			"Cannot allocate aperture memory hole (%lx,%uK)\n",
      				addr, aper_size>>10);
      		return 0;
      	}
      	memblock_x86_reserve_range(addr, addr + aper_size, "aperture64");
      
      This fails because the memblock core code aligns the size to 512M.  That
      could make the size way too big.
      
      So don't align the size in that case.
      
      Actually __memblock_alloc_base(), the other caller, already aligns the
      size before calling that function.
      
      BTW, x86 does not use __memblock_alloc_base()...
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
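The size inflation described in e6d2e2b2 is plain rounding arithmetic. The sketch below uses a hypothetical `toy_round_up()` helper for power-of-two alignments. It is not the memblock code, only an illustration of what happens when a 64 MB request (the 65536K in the log above) has its size rounded up to the 512 MB alignment.

```c
#include <stdint.h>

/* Round x up to the next multiple of align. Assumes align is a
 * nonzero power of two. */
uint64_t toy_round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}
```

Rounding the size this way turns the 64 MB aperture request into a 512 MB one, which is much harder to place below 4 GB on a small machine. Aligning only the start address, and leaving the size alone, avoids that.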
    • nbd: remove module-level ioctl mutex · de1f016f
      Soren Hansen authored
      Commit 2a48fc0a ("block: autoconvert trivial BKL users to private
      mutex") replaced uses of the BKL in the nbd driver with mutex
      operations.  Since then, I've been seeing these lockups:
      
       INFO: task qemu-nbd:16115 blocked for more than 120 seconds.
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       qemu-nbd      D 0000000000000001     0 16115  16114 0x00000004
        ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000
        0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80
        ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020
       Call Trace:
        [<ffffffff815b9997>] __mutex_lock_slowpath+0xf7/0x180
        [<ffffffff815b93eb>] mutex_lock+0x2b/0x50
        [<ffffffffa071a21c>] nbd_ioctl+0x6c/0x1c0 [nbd]
        [<ffffffff812cb970>] blkdev_ioctl+0x230/0x730
        [<ffffffff811967a1>] block_ioctl+0x41/0x50
        [<ffffffff81175c03>] do_vfs_ioctl+0x93/0x370
        [<ffffffff81175f61>] sys_ioctl+0x81/0xa0
        [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b
      
      Instrumenting the nbd module's ioctl handler with some extra logging
      clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived
      ioctl in the sense that it doesn't return until another ioctl asks the
      driver to disconnect.  However, that other ioctl blocks, waiting for the
      module-level mutex that replaced the BKL, and then we're stuck.
      
      This patch removes the module-level mutex altogether.  It's clearly
      wrong, and as far as I can see, it's entirely unnecessary, since the nbd
      driver maintains per-device mutexes, and I don't see anything that would
      require a module-level (or kernel-level, for that matter) mutex.
      Signed-off-by: Soren Hansen <soren@linux2go.dk>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Paul Clements <paul.clements@steeleye.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@kernel.org>		[2.6.37.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/rtc/rtc-proc.c: add module_put on error path in rtc_proc_open() · 24a6f5b8
      Alexander Strakh authored
      In drivers/rtc/rtc-proc.c, seq_open() (reached through single_open()) can
      return -ENOMEM.
      
       86        if (!try_module_get(THIS_MODULE))
       87                return -ENODEV;
       88
       89        return single_open(file, rtc_proc_show, rtc);
      
      In this case, module_put(THIS_MODULE) must be called before returning
      (line 89) from rtc_proc_open().
      
      Found by Linux Device Drivers Verification Project.
      Signed-off-by: Alexander Strakh <strakh@ispras.ru>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
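The error path from 24a6f5b8 comes down to undoing a successful reference grab when a later step fails. The block below is a userspace sketch under stated assumptions: the `toy_` functions and the counter are hypothetical stand-ins for try_module_get(), module_put(), and single_open(), and the real patch may be structured differently.

```c
#include <errno.h>

/* Hypothetical module reference count, visible for the demo. */
static int toy_module_refcount;

static int toy_try_module_get(void)
{
    toy_module_refcount++;
    return 1;                    /* always succeeds in this sketch */
}

static void toy_module_put(void)
{
    toy_module_refcount--;
}

/* Stand-in for single_open(): fails with -ENOMEM on request. */
static int toy_single_open(int fail)
{
    return fail ? -ENOMEM : 0;
}

int toy_rtc_proc_open(int fail)
{
    int ret;

    if (!toy_try_module_get())
        return -ENODEV;

    ret = toy_single_open(fail);
    if (ret)
        toy_module_put();        /* drop the reference on failure */

    return ret;
}

int toy_module_refs(void)
{
    return toy_module_refcount;
}
```

Without the put on the failure branch, each failed open would leave the count one too high and the module could never be unloaded.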
    • drivers/gpio/pca953x.c: add a mutex to fix race condition · 6e20fb18
      Roland Stigge authored
      Add a mutex to register communication and handling.  Without the mutex,
      GPIOs didn't switch as expected when toggled in a fast sequence of
      status changes of multiple outputs.
      Signed-off-by: Roland Stigge <stigge@antcom.de>
      Acked-by: Eric Miao <eric.y.miao@gmail.com>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Marc Zyngier <maz@misterjones.org>
      Cc: Ben Gardner <bgardner@wabtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: use safer wake up on ptrace_detach() · 01e05e9a
      Tejun Heo authored
      The wake_up_process() call in ptrace_detach() is spurious and not
      interlocked with the tracee state.  IOW, the tracee could be running or
      sleeping in any place in the kernel by the time wake_up_process() is
      called.  This can lead to the tracee waking up unexpectedly which can be
      dangerous.
      
      The wake_up is spurious and should be removed but for now reduce its
      toxicity by only waking up if the tracee is in TRACED or STOPPED state.
      
      This bug can possibly be used as an attack vector.  I don't think it
      will take too much effort to come up with an attack which triggers oops
      somewhere.  Most sleeps are wrapped in condition test loops and should
      be safe but we have quite a number of places where sleep and wakeup
      conditions are expected to be interlocked.  Although the window of
      opportunity is tiny, ptrace can be used by non-privileged users and with
      some loading the window can definitely be extended and exploited.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Roland McGrath <roland@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: call rcu_barrier after ->kill_sb() · d863b50a
      Boaz Harrosh authored
      In commit fa0d7e3d ("fs: icache RCU free inodes"), we use RCU to free
      the inode instead of freeing it directly.  It causes a crash when we
      rmmod immediately after we umount the volume[1].
      
      So we need to call rcu_barrier after we kill_sb so that the inode is
      freed before we do rmmod.  The idea is inspired by Aneesh Kumar.
      rcu_barrier will wait for all callbacks to end before proceeding.  The
      original patch was done by Tao Ma, but synchronize_rcu() is not enough
      here.
      
      [1] http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2
      Tested-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 11 Feb, 2011 12 commits