1. 04 Mar, 2010 1 commit
    • ceph: reset osd after relevant messages timed out · 422d2cb8
      Yehuda Sadeh authored
      This simplifies the process of timing out messages. We
      keep an LRU list of the messages currently in flight. If a
      timeout has passed, we reset the osd connection so that the
      messages will be retransmitted.  This is a failsafe in case
      we hit some sort of problem sending a message to the OSD.
      Normally, we'll get notification via an updated osdmap if
      there are problems.
      
      If a request is older than the keepalive timeout, send a
      keepalive to ensure we detect any breaks in the TCP connection.
      Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: Sage Weil <sage@newdream.net>
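A minimal userspace sketch of the timeout scheme described above. It assumes a doubly linked list ordered by send time (oldest at the head) stands in for the in-flight LRU, that counters stand in for resetting a connection and sending a keepalive, and that timeout > keepalive > 0. All names (`struct req`, `lru_touch`, `handle_timeout`, `MAX_OSD`) are hypothetical, not the kernel's, and only the timed-out request itself is re-stamped on reset. Because the list is ordered by stamp, the scan can stop at the first request younger than the keepalive threshold.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OSD 8   /* hypothetical: OSD ids are 0..MAX_OSD-1 */

/* Hypothetical in-flight request bound to one OSD. */
struct req {
    int osd;                 /* target OSD id */
    long stamp;              /* time the request was last (re)sent */
    struct req *prev, *next; /* must start out NULL */
};

/* LRU of in-flight requests: head is the oldest, tail the newest. */
struct lru {
    struct req *head, *tail;
};

struct stats {
    int resets[MAX_OSD];     /* connection resets per OSD */
    int keepalives[MAX_OSD]; /* keepalives sent per OSD */
};

static void lru_unlink(struct lru *l, struct req *r)
{
    if (r->prev) r->prev->next = r->next; else l->head = r->next;
    if (r->next) r->next->prev = r->prev; else l->tail = r->prev;
    r->prev = r->next = NULL;
}

/* On (re)send: stamp the request and move it to the tail. */
static void lru_touch(struct lru *l, struct req *r, long now)
{
    if (r->prev || r->next || l->head == r)
        lru_unlink(l, r);
    r->stamp = now;
    r->prev = l->tail;
    r->next = NULL;
    if (l->tail) l->tail->next = r; else l->head = r;
    l->tail = r;
}

/* Scan oldest-first. A request older than `timeout` resets its OSD
 * connection (once per OSD per scan) and is retransmitted, which re-stamps
 * it and moves it to the tail. A request older than `keepalive` only
 * triggers one keepalive to its OSD. The scan stops at the first request
 * younger than `keepalive`; re-stamped requests are younger, so they end
 * the loop as well. */
static void handle_timeout(struct lru *l, struct stats *s, long now,
                           long timeout, long keepalive)
{
    int reset[MAX_OSD] = {0}, pinged[MAX_OSD] = {0};
    struct req *r = l->head;

    while (r && now - r->stamp >= keepalive) {
        struct req *next = r->next;  /* saved before r may move */

        assert(r->osd >= 0 && r->osd < MAX_OSD);
        if (now - r->stamp >= timeout) {
            if (!reset[r->osd]) {
                reset[r->osd] = 1;
                s->resets[r->osd]++;
            }
            lru_touch(l, r, now);    /* retransmit */
        } else if (!reset[r->osd] && !pinged[r->osd]) {
            pinged[r->osd] = 1;
            s->keepalives[r->osd]++;
        }
        r = next;
    }
}
```

For example, with requests stamped at t=0, 50 and 95, a scan at t=100 with a 60 s timeout and a 30 s keepalive resets the first request's OSD, sends a keepalive for the second, and leaves the third alone.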
  2. 01 Mar, 2010 9 commits
  3. 26 Feb, 2010 2 commits
    • ceph: remove bogus mds forward warning · 080af17e
      Sage Weil authored
      The must_resend flag is always true, not false.  In any case, we can
      just ignore it.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: remove fragile __map_osds optimization · c99eb1c7
      Sage Weil authored
      We used to try to avoid freeing and then reallocating the osd
      struct.  This is a bit fragile due to potential interactions with
      other references (beyond o_requests), and may be the cause of
      this crash:
      
      [120633.442358] BUG: unable to handle kernel NULL pointer dereference at (null)
      [120633.443292] IP: [<ffffffff812549b6>] rb_erase+0x11d/0x277
      [120633.443292] PGD f7ff3067 PUD f7f53067 PMD 0
      [120633.443292] Oops: 0000 [#1] PREEMPT SMP
      [120633.443292] last sysfs file: /sys/kernel/uevent_seqnum
      [120633.443292] CPU 1
      [120633.443292] Modules linked in: ceph fan ac battery psmouse ehci_hcd ide_pci_generic ohci_hcd thermal processor button
      [120633.443292] Pid: 3023, comm: ceph-msgr/1 Not tainted 2.6.32-rc2 #12 H8SSL
      [120633.443292] RIP: 0010:[<ffffffff812549b6>]  [<ffffffff812549b6>] rb_erase+0x11d/0x277
      [120633.443292] RSP: 0018:ffff8800f7b13a50  EFLAGS: 00010246
      [120633.443292] RAX: ffff880022907819 RBX: ffff880022907818 RCX: 0000000000000000
      [120633.443292] RDX: ffff8800f7b13a80 RSI: ffff8800f587eb48 RDI: 0000000000000000
      [120633.443292] RBP: ffff8800f7b13a60 R08: 0000000000000000 R09: 0000000000000004
      [120633.443292] R10: 0000000000000000 R11: ffff8800c4441000 R12: ffff8800f587eb48
      [120633.443292] R13: ffff8800f58eaa00 R14: ffff8800f413c000 R15: 0000000000000001
      [120633.443292] FS:  00007fbef6e226e0(0000) GS:ffff880009200000(0000) knlGS:0000000000000000
      [120633.443292] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      [120633.443292] CR2: 0000000000000000 CR3: 00000000f7c53000 CR4: 00000000000006e0
      [120633.443292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [120633.443292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [120633.443292] Process ceph-msgr/1 (pid: 3023, threadinfo ffff8800f7b12000, task ffff8800f5858b40)
      [120633.443292] Stack:
      [120633.443292]  ffff8800f413c000 ffff8800f587e9c0 ffff8800f7b13a80 ffffffffa0098a86
      [120633.443292] <0> 00000000000006f1 0000000000000000 ffff8800f7b13af0 ffffffffa009959b
      [120633.443292] <0> ffff8800f413c000 ffff880022a68400 ffff880022a68400 ffff8800f587e9c0
      [120633.443292] Call Trace:
      [120633.443292]  [<ffffffffa0098a86>] __remove_osd+0x4d/0xbc [ceph]
      [120633.443292]  [<ffffffffa009959b>] __map_osds+0x199/0x4fa [ceph]
      [120633.443292]  [<ffffffffa00999f4>] ? __send_request+0xf8/0x186 [ceph]
      [120633.443292]  [<ffffffffa0099beb>] kick_requests+0x169/0x3cb [ceph]
      [120633.443292]  [<ffffffffa009a8c1>] ceph_osdc_handle_map+0x370/0x522 [ceph]
      
      Since we're probably screwed anyway if a small kmalloc is
      failing, don't bother with trying to be clever here.
      Signed-off-by: Sage Weil <sage@newdream.net>
  4. 25 Feb, 2010 2 commits
  5. 23 Feb, 2010 6 commits
  6. 19 Feb, 2010 4 commits
  7. 17 Feb, 2010 12 commits
    • 2c27c9a5
      Sage Weil authored
    • ceph: v0.19 release · a17d6473
      Sage Weil authored
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: use rbtree for pg pools; decode new osdmap format · 4fc51be8
      Sage Weil authored
      Since we can now create and destroy pg pools, the pool ids will be sparse,
      and an array no longer makes sense for looking up by pool id.  Use an
      rbtree instead.
      
      The OSDMap encoding also no longer has a max pool count (previously used to
      allocate the array).  There is a new pool_max, which is the largest pool id
      we've ever used, although we don't actually need it in the client.
      Signed-off-by: Sage Weil <sage@newdream.net>
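To illustrate why a search tree suits sparse pool ids, here is a minimal sketch keyed by pool id. It uses a plain unbalanced binary search tree in place of the kernel's red-black tree (which additionally rebalances on insert), and `struct pool`, `pool_insert` and `pool_lookup` are hypothetical names. With pool ids 3, 7 and 1000, the tree holds three nodes, where an array indexed by pool id would need 1001 slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool record: just the key and one payload field. */
struct pool {
    uint64_t id;              /* pool id: sparse once pools can be deleted */
    int pg_num;
    struct pool *left, *right;
};

/* Insert p keyed by id. Returns 0, or -1 if that id is already present. */
static int pool_insert(struct pool **link, struct pool *p)
{
    while (*link) {
        if (p->id < (*link)->id)
            link = &(*link)->left;
        else if (p->id > (*link)->id)
            link = &(*link)->right;
        else
            return -1;
    }
    p->left = p->right = NULL;
    *link = p;
    return 0;
}

/* Look up a pool by id; NULL if no such pool exists. */
static struct pool *pool_lookup(struct pool *n, uint64_t id)
{
    while (n && n->id != id)
        n = id < n->id ? n->left : n->right;
    return n;
}
```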
    • ceph: fix memory leak when destroying osdmap with pg_temp mappings · 9794b146
      Sage Weil authored
      Also move _lookup_pg_mapping into a helper.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: fix iterate_caps removal race · 7c1332b8
      Sage Weil authored
      We need to be able to iterate over all caps on a session with a
      possibly slow callback on each cap.  To allow this, we used to
      prevent cap reordering while we were iterating.  However, we were
      not safe from races with removal: removing the 'next' cap would
      make the next pointer from list_for_each_entry_safe be invalid,
      and cause a lockup or similar badness.
      
      Instead, we keep an iterator pointer in the session pointing to
      the current cap.  As before, we avoid reordering.  For removal,
      if the cap isn't the current cap we are iterating over, we are
      fine.  If it is, we clear cap->ci (to mark the cap as pending
      removal) but leave it in the session list.  In iterate_caps, we
      can safely finish removal and get the next cap pointer.
      
      While we're at it, clean up put_cap to not take a cap reservation
      context, as it was never used.
      Signed-off-by: Sage Weil <sage@newdream.net>
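The deferred-removal scheme can be modeled in userspace as follows. This is a sketch under simplifying assumptions: a plain doubly linked list, no locking (the commit implies the callback may be slow, which this model ignores), and hypothetical names (`struct cap`, `struct session`, `remove_cap`, `iterate_caps`, `drop_pair`). The key point is that `iterate_caps` reads `c->next` only after the callback returns, and a cap removed while it is the current one is merely marked (`ci = NULL`) and then unlinked by the iterator itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cap on a session list; ci == NULL marks pending removal. */
struct cap {
    void *ci;                 /* owning inode, or NULL once removal is deferred */
    struct cap *prev, *next;
};

struct session {
    struct cap *head;
    struct cap *iterator;     /* cap currently handed to a callback */
};

static void unlink_cap(struct session *s, struct cap *c)
{
    if (c->prev) c->prev->next = c->next; else s->head = c->next;
    if (c->next) c->next->prev = c->prev;
}

/* If c is the cap being iterated over, only mark it; iterate_caps()
 * finishes the removal. Any other cap is unlinked and freed right away. */
static void remove_cap(struct session *s, struct cap *c)
{
    if (c == s->iterator) {
        c->ci = NULL;
        return;
    }
    unlink_cap(s, c);
    free(c);
}

/* cb may remove any cap, including the current one or its successor. */
static void iterate_caps(struct session *s,
                         void (*cb)(struct session *, struct cap *, void *),
                         void *arg)
{
    struct cap *c = s->head;

    while (c) {
        struct cap *next;

        s->iterator = c;
        cb(s, c, arg);
        s->iterator = NULL;
        next = c->next;       /* read only after cb is done with the list */
        if (!c->ci) {         /* deferred removal: finish it now */
            unlink_cap(s, c);
            free(c);
        }
        c = next;
    }
}

/* Push a new cap at the head of the session list. */
static struct cap *add_cap(struct session *s, void *ci)
{
    struct cap *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;
    c->ci = ci;
    c->next = s->head;
    if (s->head)
        s->head->prev = c;
    s->head = c;
    return c;
}

/* Example callback: counts visits, then removes the current cap and its
 * successor, the case that invalidated list_for_each_entry_safe's saved
 * next pointer. */
static void drop_pair(struct session *s, struct cap *c, void *arg)
{
    ++*(int *)arg;
    if (c->next)
        remove_cap(s, c->next);
    remove_cap(s, c);
}
```

With five caps, `drop_pair` is called three times (each call removes two caps, the last call only one), and the list ends up empty without ever following a freed pointer.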
    • ceph: clean up readdir caps reservation · 85ccce43
      Sage Weil authored
      Use a global counter for the minimum number of allocated caps instead of
      hard coding a check against readdir_max.  This takes into account multiple
      client instances, and avoids examining the superblock mount options when a
      cap is dropped.
      Signed-off-by: Sage Weil <sage@newdream.net>
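A sketch of the global-counter idea, assuming a single process-wide pool in which each mounted client adds its `readdir_max` to a shared minimum. The names `struct cap_pool`, `client_mount`, `client_unmount` and `put_cap` are hypothetical, and real allocation and freeing are reduced to a count. When a cap is dropped, only the counter is consulted, not any superblock mount options.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global cap pool shared by all client instances. */
struct cap_pool {
    int avail;       /* preallocated caps kept on the free list */
    int min_count;   /* sum of readdir_max over all mounted clients */
};

/* At mount: each client adds its own minimum to the global counter. */
static void client_mount(struct cap_pool *p, int readdir_max)
{
    p->min_count += readdir_max;
}

/* At unmount: remove that client's contribution again. */
static void client_unmount(struct cap_pool *p, int readdir_max)
{
    p->min_count -= readdir_max;
}

/* When a cap is dropped, keep it for reuse while the pool is below the
 * global minimum; otherwise it is freed. Returns true if it was kept. */
static bool put_cap(struct cap_pool *p)
{
    if (p->avail < p->min_count) {
        p->avail++;
        return true;
    }
    return false;
}
```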
    • ceph: fix authentication races, auth_none oops · 5ce6e9db
      Sage Weil authored
      Call __validate_auth() under monc->mutex, and use helper for
      initial hello so that the pending_auth flag is set.  This fixes
      possible races in which we have an authentication request (hello
      or otherwise) pending and send another one.  In particular, with
      auth_none, we _never_ want to call ceph_build_auth() from
      __validate_auth(), since the ->build_request() method is NULL.
      Signed-off-by: Sage Weil <sage@newdream.net>
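A loose model of the fix, assuming a single pending flag guards all auth traffic. Every send path, including the initial hello, goes through one helper that sets `pending_auth`, and the validation step returns early while a request is in flight or when the method has no `build_request` (as the commit says of auth_none). The names and the shape of `struct auth_ops` here are hypothetical, and the `monc->mutex` locking is only noted in a comment, not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical auth method table; an auth_none-like method leaves it NULL. */
struct auth_ops {
    int (*build_request)(void);
};

struct monc {
    bool pending_auth;            /* an auth request (hello or other) is in flight */
    const struct auth_ops *ops;   /* NULL until a protocol is chosen */
    int sent;                     /* auth messages sent, for illustration */
};

/* Single send helper, so pending_auth is set on every path. */
static void send_auth(struct monc *m)
{
    m->pending_auth = true;
    m->sent++;
}

/* The initial hello goes through the same helper. */
static void send_hello(struct monc *m)
{
    send_auth(m);
}

/* Caller holds monc->mutex (not modeled here). */
static void validate_auth(struct monc *m)
{
    if (m->pending_auth)
        return;                   /* never stack a second request */
    if (!m->ops || !m->ops->build_request)
        return;                   /* nothing to build, e.g. auth_none */
    if (m->ops->build_request() == 0)
        send_auth(m);
}

/* A reply completes the in-flight request. */
static void handle_auth_reply(struct monc *m)
{
    m->pending_auth = false;
}

static int build_dummy(void) { return 0; }
static const struct auth_ops auth_none_ops = { NULL };
static const struct auth_ops auth_dummy_ops = { build_dummy };
```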
    • ceph: use rbtree for mon statfs requests · 85ff03f6
      Sage Weil authored
      An rbtree is lighter weight, particularly given we will generally have
      very few in-flight statfs requests.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: use rbtree for snap_realms · a105f00c
      Sage Weil authored
      Switch from radix tree to rbtree for snap realms.  This is much more
      appropriate given that realm keys are few and far between.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: use rbtree for mds requests · 44ca18f2
      Sage Weil authored
      The rbtree is a more appropriate data structure than a radix_tree.  It
      avoids extra memory usage and simplifies the code.
      
      It also fixes a bug where the debugfs 'mdsc' file wasn't including the
      most recent mds request.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: cancel delayed work when closing connection · 91e45ce3
      Sage Weil authored
      This ensures that if/when we reopen the connection, we can requeue work on
      the connection immediately, without waiting for an old timer to expire.
      Queue new delayed work inside con->mutex to avoid any race.
      
      This fixes problems with clients failing to reconnect to the MDS due to
      the client_reconnect message arriving too late (due to waiting for an old
      delayed work timeout to expire).
      Signed-off-by: Sage Weil <sage@newdream.net>
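A sketch of why a stale timer delays reconnects, assuming (as with the kernel's `queue_delayed_work()`, which does nothing and returns false when the work is already pending) that queueing a pending item is a no-op. `struct dwork`, `con_close` and `con_reopen` are hypothetical names, and the `con->mutex` locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical delayed work item: queueing it while pending is a no-op. */
struct dwork {
    bool pending;
    long due;     /* time at which the work will run */
};

static bool queue_dwork(struct dwork *w, long now, long delay)
{
    if (w->pending)
        return false;         /* old timer still armed; request ignored */
    w->pending = true;
    w->due = now + delay;
    return true;
}

static void cancel_dwork(struct dwork *w)
{
    w->pending = false;
}

struct con {
    bool open;
    struct dwork work;
};

/* Close the connection; with `cancel`, also drop any armed timer. */
static void con_close(struct con *c, bool cancel)
{
    c->open = false;
    if (cancel)
        cancel_dwork(&c->work);
}

/* Reopen and ask for work to run immediately; returns when it will run. */
static long con_reopen(struct con *c, long now)
{
    c->open = true;
    queue_dwork(&c->work, now, 0);
    return c->work.due;
}
```

With a 300 s backoff armed at t=0, a connection reopened at t=10 without cancellation still runs its work at t=300, while one whose timer was cancelled on close runs it at t=10.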
    • ceph: allow connection to be reopened by fault callback · e2663ab6
      Sage Weil authored
      Fix the messenger to allow a ceph_con_open() during the fault callback.
      Previously the work wasn't getting queued on the connection because the
      fault path avoids requeued work (normally spurious).  Loop on reopening by
      checking for the OPENING state bit.
      
      This fixes OSD reconnects when a TCP connection drops.
      Signed-off-by: Sage Weil <sage@newdream.net>
  8. 15 Feb, 2010 1 commit
  9. 14 Feb, 2010 1 commit
  10. 11 Feb, 2010 2 commits