1. 25 Apr, 2016 1 commit
    • Ilya Dryomov's avatar
      libceph: make authorizer destruction independent of ceph_auth_client · 6c1ea260
      Ilya Dryomov authored
      Starting the kernel client with cephx disabled and then enabling cephx
      and restarting userspace daemons can result in a crash:
      
          [262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000
          [262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130
          [262671.584334] PGD 0
          [262671.635847] Oops: 0000 [#1] SMP
          [262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu
          [262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014
          [262672.268937] Workqueue: ceph-msgr con_work [libceph]
          [262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000
          [262672.428330] RIP: 0010:[<ffffffff811cd04a>]  [<ffffffff811cd04a>] kfree+0x5a/0x130
          [262672.535880] RSP: 0018:ffff880149aeba58  EFLAGS: 00010286
          [262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018
          [262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012
          [262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000
          [262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40
          [262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840
          [262673.131722] FS:  0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000
          [262673.245377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0
          [262673.417556] Stack:
          [262673.472943]  ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04
          [262673.583767]  ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00
          [262673.694546]  ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800
          [262673.805230] Call Trace:
          [262673.859116]  [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph]
          [262673.968705]  [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph]
          [262674.078852]  [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph]
          [262674.134249]  [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph]
          [262674.189124]  [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph]
          [262674.243749]  [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph]
          [262674.297485]  [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph]
          [262674.350813]  [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph]
          [262674.403312]  [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph]
          [262674.454712]  [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690
          [262674.505096]  [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph]
          [262674.555104]  [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0
          [262674.604072]  [<ffffffff810901ea>] worker_thread+0x11a/0x470
          [262674.652187]  [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310
          [262674.699022]  [<ffffffff810957a2>] kthread+0xd2/0xf0
          [262674.744494]  [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0
          [262674.789543]  [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70
          [262674.834094]  [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0
      
      What happens is the following:
      
          (1) new MON session is established
          (2) old "none" ac is destroyed
          (3) new "cephx" ac is constructed
          ...
          (4) old OSD session (w/ "none" authorizer) is put
                ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer)
      
      osd->o_auth.authorizer in the "none" case is just a bare pointer into
      ac, which contains a single static copy for all services.  By the time
      we get to (4), "none" ac, freed in (2), is long gone.  On top of that,
      a new vtable installed in (3) points us at ceph_x_destroy_authorizer(),
      so we end up trying to destroy a "none" authorizer with a "cephx"
      destructor operating on invalid memory!
      
      To fix this, decouple authorizer destruction from ac and do away with
      a single static "none" authorizer by making a copy for each OSD or MDS
      session.  Authorizers themselves are independent of ac and so there is
      no reason for destroy_authorizer() to be an ac op.  Make it an op on
      the authorizer itself by turning ceph_authorizer into a real struct.
      
      Fixes: http://tracker.ceph.com/issues/15447Reported-by: default avatarAlan Zhang <alan.zhang@linux.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      6c1ea260
  2. 24 Apr, 2016 2 commits
  3. 23 Apr, 2016 10 commits
  4. 22 Apr, 2016 24 commits
  5. 21 Apr, 2016 3 commits
    • cpaul@redhat.com's avatar
      drm/dp/mst: Validate port in drm_dp_payload_send_msg() · deba0a2a
      cpaul@redhat.com authored
      With the joys of things running concurrently, there's always a chance
      that the port we get passed in drm_dp_payload_send_msg() isn't actually
      valid anymore. Because of this, we need to make sure we validate the
      reference to the port before we use it otherwise we risk running into
      various race conditions. For instance, on the Dell MST monitor I have
      here for testing, hotplugging it enough times causes us to kernel panic:
      
      [drm:intel_mst_enable_dp] 1
      [drm:drm_dp_update_payload_part2] payload 0 1
      [drm:intel_get_hpd_pins] hotplug event received, stat 0x00200000, dig 0x10101011, pins 0x00000020
      [drm:intel_hpd_irq_handler] digital hpd port B - short
      [drm:intel_dp_hpd_pulse] got hpd irq on port B - short
      [drm:intel_dp_check_mst_status] got esi 00 10 00
      [drm:drm_dp_update_payload_part2] payload 1 1
      general protection fault: 0000 [#1] SMP
      …
      Call Trace:
       [<ffffffffa012b632>] drm_dp_update_payload_part2+0xc2/0x130 [drm_kms_helper]
       [<ffffffffa032ef08>] intel_mst_enable_dp+0xf8/0x180 [i915]
       [<ffffffffa0310dbd>] haswell_crtc_enable+0x3ed/0x8c0 [i915]
       [<ffffffffa030c84d>] intel_atomic_commit+0x5ad/0x1590 [i915]
       [<ffffffffa01db877>] ? drm_atomic_set_crtc_for_connector+0x57/0xe0 [drm]
       [<ffffffffa01dc4e7>] drm_atomic_commit+0x37/0x60 [drm]
       [<ffffffffa0130a3a>] drm_atomic_helper_set_config+0x7a/0xb0 [drm_kms_helper]
       [<ffffffffa01cc482>] drm_mode_set_config_internal+0x62/0x100 [drm]
       [<ffffffffa01d02ad>] drm_mode_setcrtc+0x3cd/0x4e0 [drm]
       [<ffffffffa01c18e3>] drm_ioctl+0x143/0x510 [drm]
       [<ffffffffa01cfee0>] ? drm_mode_setplane+0x1b0/0x1b0 [drm]
       [<ffffffff810f79a7>] ? hrtimer_start_range_ns+0x1b7/0x3a0
       [<ffffffff81212962>] do_vfs_ioctl+0x92/0x570
       [<ffffffff81590852>] ? __sys_recvmsg+0x42/0x80
       [<ffffffff81212eb9>] SyS_ioctl+0x79/0x90
       [<ffffffff816b4e32>] entry_SYSCALL_64_fastpath+0x1a/0xa4
      RIP  [<ffffffffa012b026>] drm_dp_payload_send_msg+0x146/0x1f0 [drm_kms_helper]
      
      Which occurs because of the hotplug event shown in the log, which ends
      up causing DRM's dp helpers to drop the port we're updating the payload
      on and panic.
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarLyude <cpaul@redhat.com>
      Reviewed-by: default avatarDavid Airlie <airlied@linux.ie>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      deba0a2a
    • Dave Airlie's avatar
      Merge branch 'linux-4.6' of git://github.com/skeggsb/linux into drm-fixes · 3addc4e3
      Dave Airlie authored
      Single nouveau regression fix
      * 'linux-4.6' of git://github.com/skeggsb/linux:
        drm/nouveau/kms: fix setting of default values for dithering properties
      3addc4e3
    • Ben Skeggs's avatar