1. 10 Aug, 2013 7 commits
  2. 09 Jul, 2013 1 commit
    • libceph: fix invalid unsigned->signed conversion for timespec encoding · 8b8cf891
      Josh Durgin authored
      __kernel_time_t is a long, which cannot hold a U32_MAX on 32-bit
      architectures.  Just drop this check as it has limited value.
      
      This fixes a crash like:
      
      [  957.905812] kernel BUG at /srv/autobuild-ceph/gitbuilder.git/build/include/linux/ceph/decode.h:164!
      [  957.914849] Internal error: Oops - BUG: 0 [#1] SMP ARM
      [  957.919978] Modules linked in: rbd libceph libcrc32c ipmi_devintf ipmi_si ipmi_msghandler nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc
      [  957.932547] CPU: 1    Tainted: G        W     (3.9.0-ceph-19bb6a83-highbank #1)
      [  957.939881] PC is at ceph_osdc_build_request+0x8c/0x4f8 [libceph]
      [  957.945967] LR is at 0xec520904
      [  957.949103] pc : [<bf13e76c>]    lr : [<ec520904>]    psr: 20000153
      [  957.949103] sp : ec753df8  ip : 00000001  fp : ec53e100
      [  957.960571] r10: ebef25c0  r9 : ec5fa400  r8 : ecbcc000
      [  957.965788] r7 : 00000000  r6 : 00000000  r5 : ffffffff  r4 : 00000020
      [  957.972307] r3 : 51cc8143  r2 : ec520900  r1 : ec753e58  r0 : ec520908
      [  957.978827] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
      [  957.986039] Control: 10c5387d  Table: 2c59c04a  DAC: 00000015
      [  957.991777] Process rbd (pid: 2138, stack limit = 0xec752238)
      [  957.997514] Stack: (0xec753df8 to 0xec754000)
      [  958.001864] 3de0:                                                       00000001 00000001
      [  958.010032] 3e00: 00000001 bf139744 ecbcc000 ec55a0a0 00000024 00000000 ebef25c0 fffffffe
      [  958.018204] 3e20: ffffffff 00000000 00000000 00000001 ec5fa400 ebef25c0 ec53e100 bf166b68
      [  958.026377] 3e40: 00000000 0000220f fffffffe ffffffff ec753e58 bf13ff24 51cc8143 05b25ed2
      [  958.034548] 3e60: 00000001 00000000 00000000 bf1688d4 00000001 00000000 00000000 00000000
      [  958.042720] 3e80: 00000001 00000060 ec5fa400 ed53d200 ed439600 ed439300 00000001 00000060
      [  958.050888] 3ea0: ec5fa400 ed53d200 00000000 bf16a320 00000000 ec53e100 00000040 ec753eb8
      [  958.059059] 3ec0: ec51df00 ed53d7c0 ed53d200 ed53d7c0 00000000 ed53d7c0 ec5fa400 bf16ed70
      [  958.067230] 3ee0: 00000000 00000060 00000002 ed53d200 00000000 bf16acf4 ed53d7c0 ec752000
      [  958.075402] 3f00: ed980e50 e954f5d8 00000000 00000060 ed53d240 ed53d258 ec753f80 c04f44a8
      [  958.083574] 3f20: edb7910c ec664700 01ade920 c02e4c44 00000060 c016b3dc ec51de40 01adfb84
      [  958.091745] 3f40: 00000060 ec752000 ec753f80 ec752000 00000060 c0108444 00000007 ec51de48
      [  958.099914] 3f60: ed0eb8c0 00000000 00000000 ec51de40 01adfb84 00000001 00000060 c0108858
      [  958.108085] 3f80: 00000000 00000000 51cc8143 00000060 01adfb84 00000007 00000004 c000dd68
      [  958.116257] 3fa0: 00000000 c000dbc0 00000060 01adfb84 00000007 01adfb84 00000060 01adfb80
      [  958.124429] 3fc0: 00000060 01adfb84 00000007 00000004 beded1a8 00000000 01adf2f0 01ade920
      [  958.132599] 3fe0: 00000000 beded180 b6811324 b6811334 800f0010 00000007 2e7f5821 2e7f5c21
      [  958.140815] [<bf13e76c>] (ceph_osdc_build_request+0x8c/0x4f8 [libceph]) from [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd])
      [  958.152739] [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd]) from [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd])
      [  958.164486] [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd]) from [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd])
      [  958.175967] [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd]) from [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd])
      [  958.185975] [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd]) from [<c02e4c44>] (bus_attr_store+0x20/0x2c)
      [  958.194850] [<c02e4c44>] (bus_attr_store+0x20/0x2c) from [<c016b3dc>] (sysfs_write_file+0x168/0x198)
      [  958.203984] [<c016b3dc>] (sysfs_write_file+0x168/0x198) from [<c0108444>] (vfs_write+0x9c/0x170)
      [  958.212768] [<c0108444>] (vfs_write+0x9c/0x170) from [<c0108858>] (sys_write+0x3c/0x70)
      [  958.220768] [<c0108858>] (sys_write+0x3c/0x70) from [<c000dbc0>] (ret_fast_syscall+0x0/0x30)
      [  958.229199] Code: e59d1058 e5913000 e3530000 ba000114 (e7f001f2)
      
      CC: stable@vger.kernel.org  # 3.4+
      Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
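A minimal sketch of why such a check misfires, assuming the dropped check compared tv_sec against U32_MAX cast to the signed time type. The names time32_t, upper_bound_check_trips() and encode_sec() are hypothetical and only model a 32-bit long in user space; this is not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for __kernel_time_t on a 32-bit architecture (an assumption
 * made so the sketch behaves the same on 64-bit build hosts). */
typedef int32_t time32_t;

/* Hypothetical reconstruction of the dropped upper-bound check.
 * Converting UINT32_MAX to a signed 32-bit type is implementation-defined
 * before C23; on the usual two's complement compilers it yields -1. */
static bool upper_bound_check_trips(time32_t tv_sec)
{
	time32_t limit = (time32_t)UINT32_MAX;

	return tv_sec > limit;
}

/* With the check dropped, the seconds value is simply narrowed to an
 * unsigned 32-bit field. */
static uint32_t encode_sec(time32_t tv_sec)
{
	return (uint32_t)tv_sec;
}
```

Under these assumptions every non-negative timestamp compares greater than the limit, so the check would fire on any valid time.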
  3. 03 Jul, 2013 30 commits
    • libceph: call r_unsafe_callback when unsafe reply is received · 61c5d6bf
      Yan, Zheng authored
      We can't use !req->r_sent to check whether an OSD request is being sent
      for the first time, because __cancel_request() zeros req->r_sent when
      the OSD map changes. Rather than adding a new variable to struct
      ceph_osd_request to indicate whether it's sent for the first time, we
      can call the unsafe callback only when an unsafe OSD reply is received.
      If the OSD's first reply is safe, just skip calling the unsafe callback.
      
      The purpose of the unsafe callback is to add the unsafe request to a
      list, so that fsync(2) can wait for the safe reply. fsync(2) doesn't
      need to wait for a write(2) that hasn't returned yet, so it's OK to add
      the request to the unsafe list when the first OSD reply is received.
      (ceph_sync_write() returns after receiving the first OSD reply.)
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: fix race between cap issue and revoke · 6ee6b953
      Yan, Zheng authored
      If we receive new caps from the auth MDS and the non-auth MDS is
      revoking the newly issued caps, we should release the caps from
      the non-auth MDS. The scenario is that the filelock's state changes
      from SYNC to LOCK: the non-auth MDS revokes the Fc cap while the
      client gets the Fc cap from the auth MDS at the same time.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: fix cap revoke race · b1530f57
      Yan, Zheng authored
      If caps are being revoked by the auth MDS, don't consider them as
      issued even if they are still issued by a non-auth MDS. The non-auth
      MDS should also be revoking/exporting these caps; the client just
      hasn't received the cap revoke/export message yet.
      
      The race I encountered is: when caps are being exported to a new MDS,
      the client receives the cap import message and the cap revoke message
      from the new MDS, then receives the cap export message from the old
      MDS. When the client receives the cap revoke message from the new MDS,
      the caps being revoked are still issued by the old MDS, so the client
      does nothing. Later, when the cap export message is received, the
      client removes the caps issued by the old MDS. (Another way to fix the
      race is to call ceph_check_caps() in handle_cap_export().)
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: fix pending vmtruncate race · b415bf4f
      Yan, Zheng authored
      The locking order for pending vmtruncate is wrong; it can lead to the
      following race:
      
              write                  vmtruncate work
      ------------------------    ----------------------
      lock i_mutex
      check i_truncate_pending   check i_truncate_pending
      truncate_inode_pages()     lock i_mutex (blocked)
      copy data to page cache
      unlock i_mutex
                                 truncate_inode_pages()
      
      The fix is to take i_mutex before calling __ceph_do_pending_vmtruncate().
      
      Fixes: http://tracker.ceph.com/issues/5453
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
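The corrected ordering can be sketched in user space as follows. This is only an illustration under stated assumptions: struct inode_sketch, do_pending_vmtruncate_locked(), vmtruncate_work() and write_path() are hypothetical names, and a pthread mutex stands in for i_mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state involved in the race. */
struct inode_sketch {
	pthread_mutex_t i_mutex;
	bool truncate_pending;
	int truncate_calls;	/* how many times pages were truncated */
};

/* Caller must hold i_mutex, so the check and the truncate are one unit. */
static void do_pending_vmtruncate_locked(struct inode_sketch *ino)
{
	if (ino->truncate_pending) {
		ino->truncate_pending = false;
		ino->truncate_calls++;	/* truncate_inode_pages() would go here */
	}
}

/* Worker: take the lock first, then check and truncate. */
static void vmtruncate_work(struct inode_sketch *ino)
{
	pthread_mutex_lock(&ino->i_mutex);
	do_pending_vmtruncate_locked(ino);
	pthread_mutex_unlock(&ino->i_mutex);
}

/* Write path: serialized by the same mutex around the same helper. */
static void write_path(struct inode_sketch *ino)
{
	pthread_mutex_lock(&ino->i_mutex);
	do_pending_vmtruncate_locked(ino);
	/* ... copy data to the page cache ... */
	pthread_mutex_unlock(&ino->i_mutex);
}
```

Because the pending flag is only tested while the mutex is held, a worker that was blocked on the lock re-checks the flag and no longer truncates pages that the writer has just filled.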
    • ceph: avoid accessing invalid memory · 54464296
      Sasha Levin authored
      When mounting ceph with a dev name that starts with a slash, ceph
      would attempt to access the character before that slash. Since we
      don't actually own that byte of memory, we would trigger an
      invalid access:
      
      [   43.499934] BUG: unable to handle kernel paging request at ffff880fa3a97fff
      [   43.500984] IP: [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300
      [   43.501491] PGD 743b067 PUD 10283c4067 PMD 10282a6067 PTE 8000000fa3a97060
      [   43.502301] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [   43.503006] Dumping ftrace buffer:
      [   43.503596]    (ftrace buffer empty)
      [   43.504046] CPU: 0 PID: 10879 Comm: mount Tainted: G        W    3.10.0-sasha #1129
      [   43.504851] task: ffff880fa625b000 ti: ffff880fa3412000 task.ti: ffff880fa3412000
      [   43.505608] RIP: 0010:[<ffffffff818f3884>]  [<ffffffff818f3884>] parse_mount_options$
      [   43.506552] RSP: 0018:ffff880fa3413d08  EFLAGS: 00010286
      [   43.507133] RAX: ffff880fa3a98000 RBX: ffff880fa3a98000 RCX: 0000000000000000
      [   43.507893] RDX: ffff880fa3a98001 RSI: 000000000000002f RDI: ffff880fa3a98000
      [   43.508610] RBP: ffff880fa3413d58 R08: 0000000000001f99 R09: ffff880fa3fe64c0
      [   43.509426] R10: ffff880fa3413d98 R11: ffff880fa38710d8 R12: ffff880fa3413da0
      [   43.509792] R13: ffff880fa3a97fff R14: 0000000000000000 R15: ffff880fa3413d90
      [   43.509792] FS:  00007fa9c48757e0(0000) GS:ffff880fd2600000(0000) knlGS:000000000000$
      [   43.509792] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   43.509792] CR2: ffff880fa3a97fff CR3: 0000000fa3bb9000 CR4: 00000000000006b0
      [   43.509792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   43.509792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   43.509792] Stack:
      [   43.509792]  0000e5180000000e ffffffff85ca1900 ffff880fa38710d8 ffff880fa3413d98
      [   43.509792]  0000000000000120 0000000000000000 ffff880fa3a98000 0000000000000000
      [   43.509792]  ffffffff85cf32a0 0000000000000000 ffff880fa3413dc8 ffffffff818f3c72
      [   43.509792] Call Trace:
      [   43.509792]  [<ffffffff818f3c72>] ceph_mount+0xa2/0x390
      [   43.509792]  [<ffffffff81226314>] ? pcpu_alloc+0x334/0x3c0
      [   43.509792]  [<ffffffff81282f8d>] mount_fs+0x8d/0x1a0
      [   43.509792]  [<ffffffff812263d0>] ? __alloc_percpu+0x10/0x20
      [   43.509792]  [<ffffffff8129f799>] vfs_kern_mount+0x79/0x100
      [   43.509792]  [<ffffffff812a224d>] do_new_mount+0xcd/0x1c0
      [   43.509792]  [<ffffffff812a2e8d>] do_mount+0x15d/0x210
      [   43.509792]  [<ffffffff81220e55>] ? strndup_user+0x45/0x60
      [   43.509792]  [<ffffffff812a2fdd>] SyS_mount+0x9d/0xe0
      [   43.509792]  [<ffffffff83fd816c>] tracesys+0xdd/0xe2
      [   43.509792] Code: 4c 8b 5d c0 74 0a 48 8d 50 01 49 89 14 24 eb 17 31 c0 48 83 c9 ff $
      [   43.509792] RIP  [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300
      [   43.509792]  RSP <ffff880fa3413d08>
      [   43.509792] CR2: ffff880fa3a97fff
      [   43.509792] ---[ end trace 22469cd81e93af51 ]---
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
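A minimal sketch of the kind of bounds check that avoids this access, assuming a device name of the form "monitors:/path". The helper find_mon_separator() is hypothetical and is not the actual parse_mount_options() code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer to the ':' that separates the monitor list from the
 * path in a "monitors:/path" style device name, or NULL if the name is
 * malformed. The name format is an assumption made for this sketch.
 */
static const char *find_mon_separator(const char *dev_name)
{
	const char *slash = strchr(dev_name, '/');
	const char *end = slash ? slash : dev_name + strlen(dev_name);

	/* Nothing precedes the end position, so there is no byte to inspect. */
	if (end == dev_name)
		return NULL;

	end--;			/* back up to the expected ':' */
	return *end == ':' ? end : NULL;
}
```

A name such as "/mnt" is rejected before the pointer is decremented, so the byte in front of the string is never read.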
    • libceph: Fix NULL pointer dereference in auth client code · 2cb33cac
      Tyler Hicks authored
      A malicious monitor can craft an auth reply message that could cause a
      NULL function pointer dereference in the client's kernel.
      
      To prevent this, the auth_none protocol handler needs an empty
      ceph_auth_client_ops->build_request() function.
      
      CVE-2013-1059
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Reported-by: Chanam Park <chanam.park@hkpco.kr>
      Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      Cc: stable@vger.kernel.org
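The idea of an empty handler can be sketched like this. The types and names (auth_ops_sketch, auth_client_sketch, none_build_request, auth_build_request) are hypothetical simplifications, not the real ceph_auth_client_ops layout.

```c
#include <stddef.h>

struct auth_client_sketch;

/* Simplified ops table; the real structure has more members. */
struct auth_ops_sketch {
	int (*build_request)(struct auth_client_sketch *ac, void *buf, void *end);
};

struct auth_client_sketch {
	const struct auth_ops_sketch *ops;
};

/* Empty handler: "none" has nothing to build, but the slot is non-NULL. */
static int none_build_request(struct auth_client_sketch *ac, void *buf, void *end)
{
	(void)ac;
	(void)buf;
	(void)end;
	return 0;
}

static const struct auth_ops_sketch none_ops = {
	.build_request = none_build_request,
};

/* Generic caller that dispatches through the ops table unconditionally. */
static int auth_build_request(struct auth_client_sketch *ac, void *buf, void *end)
{
	return ac->ops->build_request(ac, buf, end);
}
```

With the stub in place, a reply that steers the client into the build_request path for the "none" protocol reaches a harmless function instead of a NULL pointer.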
    • ceph: Reconstruct the func ceph_reserve_caps. · 93faca6e
      majianpeng authored
      Drop ignored return value.  Fix allocation failure case to not leak.
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • fb3101b6
    • ceph: remove sb_start/end_write in ceph_aio_write. · 0405a149
      Jianpeng Ma authored
      Both vfs_write and io_submit already call file_start/end_write.
      The difference between file_start/end_write and sb_start/end_write is
      that the file_ variants only handle regular files. But I think
      ceph_aio_write is only used for regular files.
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Acked-by: Yan, Zheng <zheng.z.yan@intel.com>
    • ceph: fix sleeping function called from invalid context. · a1dc1937
      majianpeng authored
      [ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
      [ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv
      [ 1121.231971] 1 lock held by mv/9831:
      [ 1121.231973]  #0:  (&(&ci->i_ceph_lock)->rlock){+.+...},at:[<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph]
      [ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215
      [ 1121.232000] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
      [ 1121.232027]  ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8
      [ 1121.232045]  ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba
      [ 1121.232052]  0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68
      [ 1121.232056] Call Trace:
      [ 1121.232062]  [<ffffffff8168348c>] dump_stack+0x19/0x1b
      [ 1121.232067]  [<ffffffff81070435>] __might_sleep+0xe5/0x110
      [ 1121.232071]  [<ffffffff816899ba>] down_read+0x2a/0x98
      [ 1121.232080]  [<ffffffffa02baf70>] ceph_vxattrcb_layout+0x60/0xf0 [ceph]
      [ 1121.232088]  [<ffffffffa02bbd7f>] ceph_getxattr+0x9f/0x1d0 [ceph]
      [ 1121.232093]  [<ffffffff81188d28>] vfs_getxattr+0xa8/0xd0
      [ 1121.232097]  [<ffffffff8118900b>] getxattr+0xab/0x1c0
      [ 1121.232100]  [<ffffffff811704f2>] ? final_putname+0x22/0x50
      [ 1121.232104]  [<ffffffff81155f80>] ? kmem_cache_free+0xb0/0x260
      [ 1121.232107]  [<ffffffff811704f2>] ? final_putname+0x22/0x50
      [ 1121.232110]  [<ffffffff8109e63d>] ? trace_hardirqs_on+0xd/0x10
      [ 1121.232114]  [<ffffffff816957a7>] ? sysret_check+0x1b/0x56
      [ 1121.232120]  [<ffffffff81189c9c>] SyS_fgetxattr+0x6c/0xc0
      [ 1121.232125]  [<ffffffff81695782>] system_call_fastpath+0x16/0x1b
      [ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002
      [ 1121.232154] 1 lock held by mv/9831:
      [ 1121.232156]  #0:  (&(&ci->i_ceph_lock)->rlock){+.+...}, at: [<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph]
      
      I think moving the ci->i_ceph_lock acquisition down is safe because
      ceph_inode_info can't be freed at that point.
      
      CC: stable@vger.kernel.org  # 3.8+
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • rbd: fix a couple warnings · e976cad0
      Sage Weil authored
      gcc isn't quite smart enough and generates these warnings:
      
      drivers/block/rbd.c: In function 'rbd_img_request_fill':
      drivers/block/rbd.c:1266:22: warning: 'bio_list' may be used uninitialized in this function [-Wmaybe-uninitialized]
      drivers/block/rbd.c:2186:14: note: 'bio_list' was declared here
      drivers/block/rbd.c:2247:10: warning: 'pages' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      even though they are initialized for their respective code paths.
      Signed-off-by: Sage Weil <sage@inktank.com>
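One common way to quiet this class of warning is an explicit initializer, sketched below. The function first_value() and the enum data_kind_sketch are hypothetical; the sketch only mirrors the pattern of two variables that are each assigned on their own code path.

```c
#include <stddef.h>

enum data_kind_sketch { KIND_BIO, KIND_PAGES };

static int first_value(enum data_kind_sketch kind,
		       const int *bio_src, const int *page_src)
{
	/* Explicit NULL initializers: each pointer is only assigned on its
	 * own path, which the compiler may fail to prove. */
	const int *bio_list = NULL;
	const int *pages = NULL;

	if (kind == KIND_BIO)
		bio_list = bio_src;
	else
		pages = page_src;

	/* ... unrelated work between assignment and use ... */

	if (kind == KIND_BIO)
		return *bio_list;
	return *pages;
}
```

The initializers do not change behavior on either path; they only give the compiler a definite value to reason about.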
    • ceph: clear migrate seq when MDS restarts · 667ca05c
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: check migrate seq before changing auth cap · b8c2f3ae
      Yan, Zheng authored
      We may receive an old request reply from the exporter MDS after
      receiving the importer MDS's cap import message.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: fix race between page writeback and truncate · fc2744aa
      Yan, Zheng authored
      The client can receive a truncate request from the MDS at any time,
      so the page writeback code needs to get i_size, truncate_seq and
      truncate_size atomically.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • 3803da49
    • ceph: fix cap release race · bb137f84
      Yan, Zheng authored
      ceph_encode_inode_release() can race with ceph_open() and release
      caps wanted by open files. So it should call __ceph_caps_wanted()
      to get the wanted caps.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: fix truncate size calculation · ccca4e37
      Yan, Zheng authored
      Check the "not truncated yet" case.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: fix safe completion · eb845ff1
      Yan, Zheng authored
      handle_reply() calls complete_request() only if the first OSD reply
      has ONDISK flag.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • rbd: take a little credit · d552c619
      Alex Elder authored
      Add a name to the list of authors.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: use rwsem to protect header updates · cfbf6377
      Alex Elder authored
      Updating an image header needs to be protected to ensure it's
      done consistently.  However distinct headers can be updated
      concurrently without a problem.  Instead of using the global
      control lock to serialize header updates, just rely on the header
      semaphore.  (It's already used, this just moves it out to cover
      a broader section of the code.)
      
      That leaves the control mutex protecting only the creation of rbd
      clients, so rename it.
      
      This resolves:
          http://tracker.ceph.com/issues/5222
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: don't hold ctl_mutex to get/put device · 1ba0f1e7
      Alex Elder authored
      When an rbd device is first getting mapped, its device registration
      is protected by the control mutex.  There is no need to do that though,
      because the device has already been assigned an id that's guaranteed
      to be unique.
      
      An unmap of an rbd device won't proceed if the device has a non-zero
      open count or is already being unmapped.  So there's no need to hold
      the control mutex in that case either.
      
      Finally, an rbd device can't be opened if it is being removed, and
      it won't go away if there is a non-zero open count.  So here too
      there's no need to hold the control mutex while getting or putting a
      reference to an rbd device's Linux device structure.
      
      Drop the mutex calls in these cases.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: protect against concurrent unmaps · 82a442d2
      Alex Elder authored
      Make sure two concurrent unmap operations on the same rbd device
      won't collide, by only proceeding with the removal and cleanup of a
      device if it is not already underway.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: set removing flag while holding list lock · 751cc0e3
      Alex Elder authored
      When unmapping a device, its id is supplied, and that is used to
      look up which rbd device should be unmapped.  Looking up the
      device involves searching the rbd device list while holding
      a spinlock that protects access to that list.
      
      Currently all of this is done under protection of the control lock,
      but that protection is going away soon.  To ensure the rbd_dev is
      still valid (still on the list) while setting its REMOVING flag, do
      so while still holding the list lock.  To do so, get rid of
      __rbd_get_dev(), and open code what it did in the one place it
      was used.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
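A rough user-space sketch of looking up a device and marking it as being removed under a single lock, so that a second concurrent unmap backs off. All names here (dev_sketch, dev_list_lock, claim_for_removal) are hypothetical, and a pthread mutex stands in for the list spinlock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct dev_sketch {
	int id;
	bool removing;		/* models the REMOVING flag */
	struct dev_sketch *next;
};

static pthread_mutex_t dev_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev_sketch *dev_list;

/*
 * Returns the device if the caller now owns its removal, or NULL if the
 * id is unknown or another unmap already claimed it. The flag is set
 * while the list lock is still held, so the device cannot leave the list
 * between the lookup and the flag update.
 */
static struct dev_sketch *claim_for_removal(int id)
{
	struct dev_sketch *dev, *claimed = NULL;

	pthread_mutex_lock(&dev_list_lock);
	for (dev = dev_list; dev; dev = dev->next) {
		if (dev->id != id)
			continue;
		if (!dev->removing) {
			dev->removing = true;
			claimed = dev;
		}
		break;
	}
	pthread_mutex_unlock(&dev_list_lock);
	return claimed;
}
```

Only the first caller for a given id gets a non-NULL result; later callers see the flag already set and do nothing.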
    • libceph: print more info for short message header · 4974341e
      Alex Elder authored
      If an osd client response message arrives that has a front section
      that's too big for the buffer set aside to receive it, a warning
      gets reported and a new buffer is allocated.
      
      The warning says nothing about which connection had the problem.
      Add the peer type and number to what gets reported, to be a bit more
      informative.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: protect against duplicate client creation · 08f75463
      Alex Elder authored
      If more than one rbd image has the same ceph cluster configuration
      (same options, same set of monitors, same keys) they normally share
      a single rbd client.
      
      When an image is getting mapped, rbd looks to see if an existing
      client can be used, and creates a new one if not.
      
      The lookup and creation are not done under a common lock though, so
      mapping two images concurrently could lead to duplicate clients
      getting set up needlessly.  This isn't a major problem, but it's
      wasteful and different from what's intended.
      
      This patch fixes that by using the control mutex to protect
      both the lookup and (if needed) creation of the client.  It
      was previously used just when creating.
      
      This resolves:
          http://tracker.ceph.com/issues/3094
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
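The lookup-or-create pattern under one lock might look roughly like this in user space. The names client_sketch, client_mutex, client_list and get_client() are hypothetical, and a plain string stands in for the cluster configuration.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct client_sketch {
	char config[64];	/* assumes configs shorter than 64 bytes */
	int refs;
	struct client_sketch *next;
};

static pthread_mutex_t client_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct client_sketch *client_list;

/* Find a client with a matching config, or create one, under one lock. */
static struct client_sketch *get_client(const char *config)
{
	struct client_sketch *c;

	pthread_mutex_lock(&client_mutex);
	for (c = client_list; c; c = c->next)
		if (strcmp(c->config, config) == 0)
			break;
	if (!c) {
		c = calloc(1, sizeof(*c));
		if (c) {
			snprintf(c->config, sizeof(c->config), "%s", config);
			c->next = client_list;
			client_list = c;
		}
	}
	if (c)
		c->refs++;
	pthread_mutex_unlock(&client_mutex);
	return c;	/* NULL only if allocation failed */
}
```

Because the search and the insertion share one critical section, two concurrent callers with the same configuration cannot both miss the lookup and both create a client.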
    • rbd: clean up a few things in the refresh path · 3b5cf2a2
      Alex Elder authored
      This includes a few relatively small fixes I found while examining
      the code that refreshes image information.
      
      This resolves:
          http://tracker.ceph.com/issues/5040
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: flush dcache after zeroing page data · e2156054
      Alex Elder authored
      Neither zero_bio_chain() nor zero_pages() contains a call to flush
      caches after zeroing a portion of a page.  This can cause problems
      on architectures that have caches that allow virtual address
      aliasing.
      
      This resolves:
          http://tracker.ceph.com/issues/4777
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: add lingering request reference when registered · 96e4dac6
      Alex Elder authored
      When an osd request is set to linger, the osd client holds onto the
      request so it can be re-submitted following certain osd map changes.
      The osd client holds a reference to the request until it is
      unregistered.  This is used by rbd for watch requests.
      
      Currently, the reference is taken when the request is marked with
      the linger flag.  This means that if an error occurs after that
      time but before the request completes successfully, that
      reference is leaked.
      
      There's really no reason to take the reference until the request is
      registered in the osd client's list of lingering requests, and
      that only happens when the lingering (watch) request completes
      successfully.
      
      So take that reference only when it gets registered following
      successful completion, and drop it (as before) when the request
      gets unregistered.  This avoids the reference problem on error
      in rbd.
      
      Rearrange ceph_osdc_unregister_linger_request() to avoid using
      the request pointer after it may have been freed.
      
      And hold an extra reference in kick_requests() while handling
      a linger request that has not yet been registered, to ensure
      it doesn't go away.
      
      This resolves:
          http://tracker.ceph.com/issues/3859
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
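The reference-counting change can be modeled with a small sketch. The struct req_sketch and the helpers set_linger(), complete_request_sketch() and unregister_linger() are hypothetical simplifications; the real osd client code tracks considerably more state.

```c
#include <stdbool.h>

struct req_sketch {
	int refs;		/* starts at 1 for the submitter */
	bool linger;		/* marked with the linger flag */
	bool registered;	/* on the list of lingering requests */
};

/* Marking a request as lingering no longer takes a reference. */
static void set_linger(struct req_sketch *req)
{
	req->linger = true;
}

/* The extra reference is taken only when a lingering request completes
 * successfully and is registered. */
static void complete_request_sketch(struct req_sketch *req, int result)
{
	if (req->linger && result == 0 && !req->registered) {
		req->registered = true;
		req->refs++;
	}
}

/* Unregistering drops the reference taken at registration. */
static void unregister_linger(struct req_sketch *req)
{
	if (req->registered) {
		req->registered = false;
		req->refs--;
	}
}
```

In this model a request that fails before completing is never registered, so no extra reference exists that could be leaked on the error path.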
  4. 01 Jul, 2013 2 commits