1. 30 Jun, 2011 2 commits
  2. 24 May, 2011 16 commits
    • Namhyung Kim's avatar
      loop: handle on-demand devices correctly · a1c15c59
      Namhyung Kim authored
      When finding or allocating a loop device, loop_probe() did not take
      partition numbers into account so that it can result to a different
      device. Consider following example:
      
      $ sudo modprobe loop max_part=15
      $ ls -l /dev/loop*
      brw-rw---- 1 root disk 7,   0 2011-05-24 22:16 /dev/loop0
      brw-rw---- 1 root disk 7,  16 2011-05-24 22:16 /dev/loop1
      brw-rw---- 1 root disk 7,  32 2011-05-24 22:16 /dev/loop2
      brw-rw---- 1 root disk 7,  48 2011-05-24 22:16 /dev/loop3
      brw-rw---- 1 root disk 7,  64 2011-05-24 22:16 /dev/loop4
      brw-rw---- 1 root disk 7,  80 2011-05-24 22:16 /dev/loop5
      brw-rw---- 1 root disk 7,  96 2011-05-24 22:16 /dev/loop6
      brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
      $ sudo mknod /dev/loop8 b 7 128
      $ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
      $ sudo losetup -a
      /dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
      $ ls -l /dev/loop*
      brw-rw---- 1 root disk 7,    0 2011-05-24 22:16 /dev/loop0
      brw-rw---- 1 root disk 7,   16 2011-05-24 22:16 /dev/loop1
      brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
      brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
      brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
      brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
      brw-rw---- 1 root disk 7,   32 2011-05-24 22:16 /dev/loop2
      brw-rw---- 1 root disk 7,   48 2011-05-24 22:16 /dev/loop3
      brw-rw---- 1 root disk 7,   64 2011-05-24 22:16 /dev/loop4
      brw-rw---- 1 root disk 7,   80 2011-05-24 22:16 /dev/loop5
      brw-rw---- 1 root disk 7,   96 2011-05-24 22:16 /dev/loop6
      brw-rw---- 1 root disk 7,  112 2011-05-24 22:16 /dev/loop7
      brw-r--r-- 1 root root 7,  128 2011-05-24 22:17 /dev/loop8
      
      After this patch, /dev/loop8 - instead of /dev/loop128 - was
      accessed correctly.
      
      In addition, 'range' passed to blk_register_region() should
      include all range of dev_t that LOOP_MAJOR can address. It does
      not need to be limited by partition numbers unless 'max_loop'
      param was specified.
      Signed-off-by: default avatarNamhyung Kim <namhyung@gmail.com>
      Cc: Laurent Vivier <Laurent.Vivier@bull.net>
      Cc: stable@kernel.org
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      a1c15c59
    • Namhyung Kim's avatar
      loop: limit 'max_part' module param to DISK_MAX_PARTS · 78f4bb36
      Namhyung Kim authored
      The 'max_part' parameter controls the number of maximum partition
      a loop block device can have. However if a user specifies very
      large value it would exceed the limitation of device minor number
      and can cause a kernel panic (or, at least, produce invalid
      device nodes in some cases).
      
      On my desktop system, following command kills the kernel. On qemu,
      it triggers similar oops but the kernel was alive:
      
      $ sudo modprobe loop max_part0000
       ------------[ cut here ]------------
       kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
       invalid opcode: 0000 [#1] SMP
       last sysfs file:
       CPU 0
       Modules linked in: loop(+)
      
       Pid: 43, comm: insmod Tainted: G        W   2.6.39-qemu+ #155 Bochs Bochs
       RIP: 0010:[<ffffffff8113ce61>]  [<ffffffff8113ce61>] internal_create_group=
      +0x2a/0x170
       RSP: 0018:ffff880007b3fde8  EFLAGS: 00000246
       RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
       RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
       RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
       R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
       R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
       FS:  0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:00000000000000=
      00
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
       Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c=
      0)
       Stack:
        ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
        0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
        0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
       Call Trace:
        [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
        [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
        [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
        [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
        [<ffffffff8116a090>] blk_register_queue+0x47/0xf7
        [<ffffffff8116f527>] add_disk+0xdf/0x290
        [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
        [<ffffffffa0006000>] ? 0xffffffffa0005fff
        [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
        [<ffffffff81096804>] sys_init_module+0x9c/0x1e0
        [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
       Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb=
       48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe =
      48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
       RIP  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
        RSP <ffff880007b3fde8>
       ---[ end trace a123eb592043acad ]---
      Signed-off-by: default avatarNamhyung Kim <namhyung@gmail.com>
      Cc: Laurent Vivier <Laurent.Vivier@bull.net>
      Cc: stable@kernel.org
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      78f4bb36
    • Andrew Morton's avatar
      drbd: fix warning · 0ddf72be
      Andrew Morton authored
      In file included from drivers/block/drbd/drbd_main.c:54:                        drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type
      
      Forward declarations of enums do not work.
      
      Fix it unpleasantly by moving the prototype.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarLars Ellenberg <drbd-dev@lists.linbit.com>
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0ddf72be
    • Philipp Reisner's avatar
      drbd: fix warning · 9b2f61ae
      Philipp Reisner authored
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      9b2f61ae
    • Bart Van Assche's avatar
      drbd: Fix spelling · 24c4830c
      Bart Van Assche authored
      Found these with the help of ispell -l.
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      24c4830c
    • Lars Ellenberg's avatar
      drbd: fix schedule in atomic · 9a0d9d03
      Lars Ellenberg authored
      An administrative detach used to request a state change directly to D_DISKLESS,
      first suspending IO to avoid the last put_ldev() occuring from an endio handler,
      potentially in irq context.
      
      This is not enough on the receiving side (typically secondary), we may miss
      some peer_req on the way to local disk, which then may do the last put_ldev()
      from their drbd_peer_request_endio().
      
      This patch makes the detach always go through the intermediate D_FAILED state.
      We may consider to rename it D_DETACHING.
      
      Alternative approach would be to create yet an other work item to be scheduled
      on the worker, do the destructor work from there, and get the timing right.
      
      manually picked commit 564040f from the drbd 8.4 branch.
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      9a0d9d03
    • Philipp Reisner's avatar
      drbd: Take a more conservative approach when deciding max_bio_size · 99432fcc
      Philipp Reisner authored
      The old (optimistic) implementation could shrink the bio size
      on an primary device.
      
      Shrinking the bio size on a primary device is bad. Since there
      we might get BIOs with the old (bigger) size shortly after
      we published the new size.
      
      The new implementation is more conservative, and eventually
      increases the max_bio_size on a primary device (which is valid).
      It does so, when it knows the local limit AND the remote limit.
      
       We cache the last seen max_bio_size of the peer in the meta
       data, and rely on that, to make the operation of single
       nodes more efficient.
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      99432fcc
    • Philipp Reisner's avatar
    • Philipp Reisner's avatar
    • Philipp Reisner's avatar
      drbd: Fix for the connection problems on high latency links · a8e40792
      Philipp Reisner authored
      It seems that the real cause of all the issues where that
      we did not noticed in drbd_try_connect() when the other
      guy closes one socket if the round trip time gets higher
      than 100ms. There were that 100ms hard coded!
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      a8e40792
    • Lars Ellenberg's avatar
      drbd: fix potential activity log refcount imbalance in error path · 76727f68
      Lars Ellenberg authored
      It is no longer sufficient to trigger on local WRITE,
      we need to check on (rq_state & RQ_IN_ACT_LOG)
      before calling drbd_al_complete_io also in the error path.
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      76727f68
    • Philipp Reisner's avatar
    • Lars Ellenberg's avatar
      drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int · f36af18c
      Lars Ellenberg authored
      If there is no replication traffic within the idle timeout
      (ping-int seconds), DRBD will send a P_PING,
      and adjust the timeout to ping-timeout.
      
      If there is no P_PING_ACK received within this ping-timeout,
      DRBD finally drops the connection, and tries to re-establish it.
      
      To decide which timeout was active, we compared the current timeout
      with the ping-timeout, and dropped the connection, if that was the case.
      
      By default, ping-int is 10 seconds, ping-timeout is 500 ms.
      
      Unfortunately, if you configure ping-timeout to be the same as ping-int,
      expiry of the idle-timeout had been mistaken for a missing ping ack,
      and caused an immediate reconnection attempt.
      
      Fix:
      Allow both timeouts to be equal, use a local variable
      to store which timeout is active.
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      f36af18c
    • Lars Ellenberg's avatar
      drbd: fix potential distributed deadlock · 53ea4331
      Lars Ellenberg authored
      We limit ourselves to a configurable maximum number of pages used as
      temporary bio pages.
      
      If the configured "max_buffers" is not big enough to match the bandwidth
      of the respective deployment, a distributed deadlock could be triggered
      by e.g. fast online verify and heavy application IO.
      
      TCP connections would block on congestion, because both receivers
      would wait on pages to become available.
      
      Fortunately the respective senders in this case would be able to give
      back some pages already. So do that.
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      53ea4331
    • Lars Ellenberg's avatar
      lru_cache.h: fix comments referring to ts_ instead of lc_ · 600942e0
      Lars Ellenberg authored
      For some time we contemplated calling the "struct lru_cache"
      a "struct tracked_set", and some comments kept the ts_ prefix.
      
      Fix those to match the member field names.
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      600942e0
    • Philipp Reisner's avatar
      drbd: Fix for application IO with the on-io-error=pass-on policy · 738a84b2
      Philipp Reisner authored
      In case a write failes on the local disk, go into D_INCONSISTENT
      disk state. That causes future reads of that block to be shipped
      to the peer.
      
      Read retry remote was already in place.
      
      Actually the documentation needs to get fixed now. Since the
      application is still shielded from the error. (as long as we have
      only a single disk failing) The difference to detach is that
      we keep the disk. And therefore might keep all the other, still
      working sectors up to date.
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      738a84b2
  3. 19 May, 2011 1 commit
  4. 18 May, 2011 3 commits
    • Konrad Rzeszutek Wilk's avatar
      xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. · c9ce9e43
      Konrad Rzeszutek Wilk authored
      If the backends, which use these two functions, are compiled as
      a module we need these two functions to be exported.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c9ce9e43
    • Konrad Rzeszutek Wilk's avatar
      xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · d5431d52
      Konrad Rzeszutek Wilk authored
      We only supported the M2P (and P2M) override only for the
      GNTMAP_contains_pte type mappings. Meaning that we grants
      operations would "contain the machine address of the PTE to update"
      If the flag is unset, then the grant operation is
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user-space, we can safely assume
      that it operates only on PTE's. Hence the implementation for
      it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
      we can implement the support for this. It will require some extra
      accounting structure to keep track of the page[i], and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d5431d52
    • Jan Beulich's avatar
      xen/blkback: don't fail empty barrier requests · 8ab52150
      Jan Beulich authored
      The sector number on empty barrier requests may (will?) be -1, which,
      given that it's being treated as unsigned 64-bit quantity, will almost
      always exceed the actual (virtual) disk's size.
      
      Inspired by Konrad's "When writting barriers set the sector number to
      zero...".
      
      While at it also add overflow checking to the math in vbd_translate().
      Signed-off-by: default avatarJan Beulich <jbeulich@novell.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8ab52150
  5. 13 May, 2011 1 commit
  6. 12 May, 2011 17 commits