1. 07 Jun, 2017 36 commits
    • Bart Van Assche's avatar
      IB/multicast: Check ib_find_pkey() return value · 029555a5
      Bart Van Assche authored
      commit d3a2418e upstream.
      
      This patch avoids that Coverity complains about not checking the
      ib_find_pkey() return value.
      
      Fixes: commit 547af765 ("IB/multicast: Report errors on multicast groups if P_key changes")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      029555a5
    • Bart Van Assche's avatar
      IB/mad: Fix an array index check · 67ae25b4
      Bart Van Assche authored
      commit 2fe2f378 upstream.
      
      The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
      (80) elements. Hence compare the array index with that value instead
      of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
      reports the following:
      
      Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).
      
      Fixes: commit b7ab0b19 ("IB/mad: Verify mgmt class in received MADs")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: default avatarHal Rosenstock <hal@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      67ae25b4
    • Steven Rostedt (Red Hat)'s avatar
      ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it · 882f8f11
      Steven Rostedt (Red Hat) authored
      commit 847fa1a6 upstream.
      
      With new binutils, gcc may get smart with its optimization and change a jmp
      from a 5 byte jump to a 2 byte one even though it was jumping to a global
      function. But that global function existed within a 2 byte radius, and gcc
      was able to optimize it. Unfortunately, that jump was also being modified
      when function graph tracing begins. Since ftrace expected that jump to be 5
      bytes, but it was only two, it overwrote code after the jump, causing a
      crash.
      
      This was fixed for x86_64 with commit 8329e818, with the same subject as
      this commit, but nothing was done for x86_32.
      
      Fixes: d61f82d0 ("ftrace: use dynamic patching for updating mcount calls")
      Reported-by: default avatarColin Ian King <colin.king@canonical.com>
      Tested-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      882f8f11
    • Steffen Maier's avatar
      scsi: zfcp: fix rport unblock race with LUN recovery · 1c2287b9
      Steffen Maier authored
      commit 6f2ce1c6 upstream.
      
      It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
      with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
      window when zfcp detected an unavailable rport but
      fc_remote_port_delete(), which is asynchronous via
      zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
      
      However, for the case when the rport becomes available again, we should
      prevent unblocking the rport too early.  In contrast to other FCP LLDDs,
      zfcp has to open each LUN with the FCP channel hardware before it can
      send I/O to a LUN.  So if a port already has LUNs attached and we
      unblock the rport just after port recovery, recoveries of LUNs behind
      this port can still be pending which in turn force
      zfcp_scsi_queuecommand() to unnecessarily finish requests with
      DID_IMM_RETRY.
      
      This also opens a time window with unblocked rport (until the followup
      LUN reopen recovery has finished).  If a scsi_cmnd timeout occurs during
      this time window fc_timed_out() cannot work as desired and such command
      would indeed time out and trigger scsi_eh. This prevents a clean and
      timely path failover.  This should not happen if the path issue can be
      recovered on FC transport layer such as path issues involving RSCNs.
      
      Fix this by only calling zfcp_scsi_schedule_rport_register(), to
      asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
      children of the rport have finished and no new recoveries of equal or
      higher order were triggered meanwhile.  Finished intentionally includes
      any recovery result no matter if successful or failed (still unblock
      rport so other successful LUNs work).  For simplicity, we check after
      each finished LUN recovery if there is another LUN recovery pending on
      the same port and then do nothing.  We handle the special case of a
      successful recovery of a port without LUN children the same way without
      changing this case's semantics.
      
      For debugging we introduce 2 new trace records written if the rport
      unblock attempt was aborted due to still unfinished or freshly triggered
      recovery. The records are only written above the default trace level.
      
      Benjamin noticed the important special case of new recovery that can be
      triggered between having given up the erp_lock and before calling
      zfcp_erp_action_cleanup() within zfcp_erp_strategy().  We must avoid the
      following sequence:
      
      ERP thread                 rport_work      other context
      -------------------------  --------------  --------------------------------
      port is unblocked, rport still blocked,
       due to pending/running ERP action,
       so ((port->status & ...UNBLOCK) != 0)
       and (port->rport == NULL)
      unlock ERP
      zfcp_erp_action_cleanup()
      case ZFCP_ERP_ACTION_REOPEN_LUN:
      zfcp_erp_try_rport_unblock()
      ((status & ...UNBLOCK) != 0) [OLD!]
                                                 zfcp_erp_port_reopen()
                                                 lock ERP
                                                 zfcp_erp_port_block()
                                                 port->status clear ...UNBLOCK
                                                 unlock ERP
                                                 zfcp_scsi_schedule_rport_block()
                                                 port->rport_task = RPORT_DEL
                                                 queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task != RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_block()
                                 if (!port->rport) return
      zfcp_scsi_schedule_rport_register()
      port->rport_task = RPORT_ADD
      queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task == RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_register()
                                 (port->rport == NULL)
                                 rport = fc_remote_port_add()
                                 port->rport = rport;
      
      Now the rport was erroneously unblocked while the zfcp_port is blocked.
      This is another situation we want to avoid due to scsi_eh
      potential. This state would at least remain until the new recovery from
      the other context finished successfully, or potentially forever if it
      failed.  In order to close this race, we take the erp_lock inside
      zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
      LUN.  With that, the possible corresponding rport state sequences would
      be: (unblock[ERP thread],block[other context]) if the ERP thread gets
      erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
      (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
      after the other context has already cleard ...UNBLOCK from port->status.
      
      Since checking fields of struct erp_action is unsafe because they could
      have been overwritten (re-used for new recovery) meanwhile, we only
      check status of zfcp_port and LUN since these are only changed under
      erp_lock elsewhere. Regarding the check of the proper status flags (port
      or port_forced are similar to the shown adapter recovery):
      
      [zfcp_erp_adapter_shutdown()]
      zfcp_erp_adapter_reopen()
       zfcp_erp_adapter_block()
        * clear UNBLOCK ---------------------------------------+
       zfcp_scsi_schedule_rports_block()                       |
       write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
       zfcp_erp_action_enqueue()                            |  |
        zfcp_erp_setup_act()                                |  |
         * set ERP_INUSE -----------------------------------|--|--+
       write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
      .context-switch.                                         |  |
      zfcp_erp_thread()                                        |  |
       zfcp_erp_strategy()                                     |  |
        write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
        ...                                                 |  |  |
        zfcp_erp_strategy_check_target()                    |  |  |
         zfcp_erp_strategy_check_adapter()                  |  |  |
          zfcp_erp_adapter_unblock()                        |  |  |
           * set UNBLOCK -----------------------------------|--+  |
        zfcp_erp_action_dequeue()                           |     |
         * clear ERP_INUSE ---------------------------------|-----+
        ...                                                 |
        write_unlock_irqrestore(&adapter->erp_lock, flags);-+
      
      Hence, we should check for both UNBLOCK and ERP_INUSE because they are
      interleaved.  Also we need to explicitly check ERP_FAILED for the link
      down case which currently does not clear the UNBLOCK flag in
      zfcp_fsf_link_down_info_eval().
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 8830271c ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
      Fixes: a2fa0aed ("[SCSI] zfcp: Block FC transport rports early on errors")
      Fixes: 5f852be9 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
      Fixes: 338151e0 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
      Fixes: 3859f6a2 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1c2287b9
    • Steffen Maier's avatar
      scsi: zfcp: do not trace pure benign residual HBA responses at default level · 5d35a963
      Steffen Maier authored
      commit 56d23ed7 upstream.
      
      Since quite a while, Linux issues enough SCSI commands per scsi_device
      which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE,
      and SAM_STAT_GOOD.  This floods the HBA trace area and we cannot see
      other and important HBA trace records long enough.
      
      Therefore, do not trace HBA response errors for pure benign residual
      under counts at the default trace level.
      
      This excludes benign residual under count combined with other validity
      bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL.  For all those other
      cases, we still do want to see both the HBA record and the corresponding
      SCSI record by default.
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: a54ca0f6 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5d35a963
    • Benjamin Block's avatar
      scsi: zfcp: fix use-after-"free" in FC ingress path after TMF · 54539504
      Benjamin Block authored
      commit dac37e15 upstream.
      
      When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and
      eh_target_reset_handler(), it expects us to relent the ownership over
      the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN
      or target - when returning with SUCCESS from the callback ('release'
      them).  SCSI EH can then reuse those commands.
      
      We did not follow this rule to release commands upon SUCCESS; and if
      later a reply arrived for one of those supposed to be released commands,
      we would still make use of the scsi_cmnd in our ingress tasklet. This
      will at least result in undefined behavior or a kernel panic because of
      a wrong kernel pointer dereference.
      
      To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req
      *)->data in the matching scope if a TMF was successful. This is done
      under the locks (struct zfcp_adapter *)->abort_lock and (struct
      zfcp_reqlist *)->lock to prevent the requests from being removed from
      the request-hashtable, and the ingress tasklet from making use of the
      scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler().
      
      For cases where a reply arrives during SCSI EH, but before we get a
      chance to NULLify the pointer - but before we return from the callback
      -, we assume that the code is protected from races via the CAS operation
      in blk_complete_request() that is called in scsi_done().
      
      The following stacktrace shows an example for a crash resulting from the
      previous behavior:
      
      Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000
      Oops: 0038 [#1] SMP
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted
      task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000
      Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40)
                 R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
      Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015
                 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800
                 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93
                 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918
      Krnl Code: 00000000001156a2: a7190000        lghi    %r1,0
                 00000000001156a6: a7380015        lhi    %r3,21
                #00000000001156aa: e32050000008    ag    %r2,0(%r5)
                >00000000001156b0: 482022b0        lh    %r2,688(%r2)
                 00000000001156b4: ae123000        sigp    %r1,%r2,0(%r3)
                 00000000001156b8: b2220020        ipm    %r2
                 00000000001156bc: 8820001c        srl    %r2,28
                 00000000001156c0: c02700000001    xilf    %r2,1
      Call Trace:
      ([<0000000000000000>] 0x0)
       [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp]
       [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp]
       [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp]
       [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp]
       [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio]
       [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio]
       [<0000000000141fd4>] tasklet_action+0x9c/0x170
       [<0000000000141550>] __do_softirq+0xe8/0x258
       [<000000000010ce0a>] do_softirq+0xba/0xc0
       [<000000000014187c>] irq_exit+0xc4/0xe8
       [<000000000046b526>] do_IRQ+0x146/0x1d8
       [<00000000005d6a3c>] io_return+0x0/0x8
       [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0
      ([<0000000000000000>] 0x0)
       [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0
       [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8
       [<0000000000114782>] smp_start_secondary+0xda/0xe8
       [<00000000005d6efe>] restart_int_handler+0x56/0x6c
       [<0000000000000000>] 0x0
      Last Breaking-Event-Address:
       [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0
      Suggested-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Fixes: ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      54539504
    • Rabin Vincent's avatar
      block: protect iterate_bdevs() against concurrent close · 97c6c855
      Rabin Vincent authored
      commit af309226 upstream.
      
      If a block device is closed while iterate_bdevs() is handling it, the
      following NULL pointer dereference occurs because bdev->b_disk is NULL
      in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
      turn called by the mapping_cap_writeback_dirty() call in
      __filemap_fdatawrite_range()):
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
       IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
       PGD 9e62067 PUD 9ee8067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
       task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
       RIP: 0010:[<ffffffff81314790>]  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
       RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
       RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
       RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
       R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
       FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
       Stack:
        ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
        0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
        ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
       Call Trace:
        [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90
        [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30
        [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20
        [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130
        [<ffffffff811b2763>] sys_sync+0x63/0x90
        [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76
       Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
       RIP  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
        RSP <ffff880009f5fe68>
       CR2: 0000000000000508
       ---[ end trace 2487336ceb3de62d ]---
      
      The crash is easily reproducible by running the following command, if an
      msleep(100) is inserted before the call to func() in iterate_devs():
      
       while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
      
      Fix it by holding the bd_mutex across the func() call and only calling
      func() if the bdev is opened.
      
      Fixes: 5c0d6b60 ("vfs: Create function for iterating over block devices")
      Reported-and-tested-by: default avatarWei Fang <fangwei1@huawei.com>
      Signed-off-by: default avatarRabin Vincent <rabinv@axis.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      97c6c855
    • Nicolai Stange's avatar
      f2fs: set ->owner for debugfs status file's file_operations · 49c82256
      Nicolai Stange authored
      commit 05e6ea26 upstream.
      
      The struct file_operations instance serving the f2fs/status debugfs file
      lacks an initialization of its ->owner.
      
      This means that although that file might have been opened, the f2fs module
      can still get removed. Any further operation on that opened file, releasing
      included,  will cause accesses to unmapped memory.
      
      Indeed, Mike Marshall reported the following:
      
        BUG: unable to handle kernel paging request at ffffffffa0307430
        IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
        <...>
        Call Trace:
         [] __fput+0xdf/0x1d0
         [] ____fput+0xe/0x10
         [] task_work_run+0x8e/0xc0
         [] do_exit+0x2ae/0xae0
         [] ? __audit_syscall_entry+0xae/0x100
         [] ? syscall_trace_enter+0x1ca/0x310
         [] do_group_exit+0x44/0xc0
         [] SyS_exit_group+0x14/0x20
         [] do_syscall_64+0x61/0x150
         [] entry_SYSCALL64_slow_path+0x25/0x25
        <...>
        ---[ end trace f22ae883fa3ea6b8 ]---
        Fixing recursive fault but reboot is needed!
      
      Fix this by initializing the f2fs/status file_operations' ->owner with
      THIS_MODULE.
      
      This will allow debugfs to grab a reference to the f2fs module upon any
      open on that file, thus preventing it from getting removed.
      
      Fixes: 902829aa ("f2fs: move proc files to debugfs")
      Reported-by: default avatarMike Marshall <hubcap@omnibond.com>
      Reported-by: default avatarMartin Brandenburg <martin@omnibond.com>
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      49c82256
    • Dan Carpenter's avatar
      ext4: return -ENOMEM instead of success · e74af998
      Dan Carpenter authored
      commit 578620f4 upstream.
      
      We should set the error code if kzalloc() fails.
      
      Fixes: 67cf5b09 ("ext4: add the basic function for inline data support")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e74af998
    • Darrick J. Wong's avatar
      ext4: reject inodes with negative size · b70876f0
      Darrick J. Wong authored
      commit 7e6e1ef4 upstream.
      
      Don't load an inode with a negative size; this causes integer overflow
      problems in the VFS.
      
      [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT]
      
      js: use EIO for 3.12 instead of EFSCORRUPTED.
      
      Fixes: a48380f7 (ext4: rename i_dir_acl to i_size_high)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b70876f0
    • Chandan Rajendra's avatar
      ext4: fix stack memory corruption with 64k block size · ac3d8fba
      Chandan Rajendra authored
      commit 30a9d7af upstream.
      
      The number of 'counters' elements needed in 'struct sg' is
      super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
      elements in the array. This is insufficient for block sizes >= 32k. In
      such cases the memcpy operation performed in ext4_mb_seq_groups_show()
      would cause stack memory corruption.
      
      Fixes: c9de560dSigned-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ac3d8fba
    • Chandan Rajendra's avatar
      ext4: fix mballoc breakage with 64k block size · cafd2159
      Chandan Rajendra authored
      commit 69e43e8c upstream.
      
      'border' variable is set to a value of 2 times the block size of the
      underlying filesystem. With 64k block size, the resulting value won't
      fit into a 16-bit variable. Hence this commit changes the data type of
      'border' to 'unsigned int'.
      
      Fixes: c9de560dSigned-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      cafd2159
    • Alex Porosanu's avatar
      crypto: caam - fix AEAD givenc descriptors · 625208a0
      Alex Porosanu authored
      commit d128af17 upstream.
      
      The AEAD givenc descriptor relies on moving the IV through the
      output FIFO and then back to the CTX2 for authentication. The
      SEQ FIFO STORE could be scheduled before the data can be
      read from OFIFO, especially since the SEQ FIFO LOAD needs
      to wait for the SEQ FIFO LOAD SKIP to finish first. The
      SKIP takes more time when the input is SG than when it's
      a contiguous buffer. If the SEQ FIFO LOAD is not scheduled
      before the STORE, the DECO will hang waiting for data
      to be available in the OFIFO so it can be transferred to C2.
      In order to overcome this, first force transfer of IV to C2
      by starting the "cryptlen" transfer first and then starting to
      store data from OFIFO to the output buffer.
      
      Fixes: 1acebad3 ("crypto: caam - faster aead implementation")
      Signed-off-by: default avatarAlex Porosanu <alexandru.porosanu@nxp.com>
      Signed-off-by: default avatarHoria Geantă <horia.geanta@nxp.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      625208a0
    • NeilBrown's avatar
      block_dev: don't test bdev->bd_contains when it is not stable · 725da55c
      NeilBrown authored
      commit bcc7f5b4 upstream.
      
      bdev->bd_contains is not stable before calling __blkdev_get().
      When __blkdev_get() is called on a parition with ->bd_openers == 0
      it sets
        bdev->bd_contains = bdev;
      which is not correct for a partition.
      After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
      and then ->bd_contains is stable.
      
      When FMODE_EXCL is used, blkdev_get() calls
         bd_start_claiming() ->  bd_prepare_to_claim() -> bd_may_claim()
      
      This call happens before __blkdev_get() is called, so ->bd_contains
      is not stable.  So bd_may_claim() cannot safely use ->bd_contains.
      It currently tries to use it, and this can lead to a BUG_ON().
      
      This happens when a whole device is already open with a bd_holder (in
      use by dm in my particular example) and two threads race to open a
      partition of that device for the first time, one opening with O_EXCL and
      one without.
      
      The thread that doesn't use O_EXCL gets through blkdev_get() to
      __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
      
      Immediately thereafter the other thread, using FMODE_EXCL, calls
      bd_start_claiming() from blkdev_get().  This should fail because the
      whole device has a holder, but because bdev->bd_contains == bdev
      bd_may_claim() incorrectly reports success.
      This thread continues and blocks on bd_mutex.
      
      The first thread then sets bdev->bd_contains correctly and drops the mutex.
      The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
      again in:
      			BUG_ON(!bd_may_claim(bdev, whole, holder));
      The BUG_ON fires.
      
      Fix this by removing the dependency on ->bd_contains in
      bd_may_claim().  As bd_may_claim() has direct access to the whole
      device, it can simply test if the target bdev is the whole device.
      
      Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      725da55c
    • Johan Hovold's avatar
      USB: serial: kl5kusb105: fix open error path · 6017ea9d
      Johan Hovold authored
      commit 6774d5f5 upstream.
      
      Kill urbs and disable read before returning from open on failure to
      retrieve the line state.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6017ea9d
    • Robbie Ko's avatar
      Btrfs: fix tree search logic when replaying directory entry deletes · 6e97d31a
      Robbie Ko authored
      commit 2a7bf53f upstream.
      
      If a log tree has a layout like the following:
      
      leaf N:
              ...
              item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
                      dir log end 1275809046
      leaf N + 1:
              item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8
                      dir log end 18446744073709551615
              ...
      
      When we pass the value 1275809046 + 1 as the parameter start_ret to the
      function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
      end up with path->slots[0] having the value 239 (points to the last item
      of leaf N, item 240). Because the dir log item in that position has an
      offset value smaller than *start_ret (1275809046 + 1) we need to move on
      to the next leaf, however the logic for that is wrong since it compares
      the current slot to the number of items in the leaf, which is smaller
      and therefore we don't lookup for the next leaf but instead we set the
      slot to point to an item that does not exist, at slot 240, and we later
      operate on that slot which has unexpected content or in the worst case
      can result in an invalid memory access (accessing beyond the last page
      of leaf N's extent buffer).
      
      So fix the logic that checks when we need to lookup at the next leaf
      by first incrementing the slot and only after to check if that slot
      is beyond the last item of the current leaf.
      Signed-off-by: default avatarRobbie Ko <robbieko@synology.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Fixes: e02119d5 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      [Modified changelog for clarity and correctness]
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6e97d31a
    • Michal Hocko's avatar
      hotplug: Make register and unregister notifier API symmetric · 57bf12f4
      Michal Hocko authored
      commit 777c6e0d upstream.
      
      Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
      notifiers when HOTPLUG_CPU=y while the registration might succeed even
      when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
      might keep a stale notifier on the list on the manual clean up during
      the pool tear down and thus corrupt the list. Resulting in the following
      
      [  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
      [  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
      <snipped>
      [  145.122628] Call Trace:
      [  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
      [  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
      [  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
      [  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
      [  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
      [  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
      [  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
      [  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
      [  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
      [  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
      [  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
      [  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
      [  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
      [  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
      [  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      This can be even triggered manually by changing
      /sys/module/zswap/parameters/compressor multiple times.
      
      Fix this issue by making unregister APIs symmetric to the register so
      there are no surprises.
      
      [js] backport to 3.12
      
      Fixes: 47e627bc ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
      Reported-and-tested-by: default avatarYu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: linux-mm@kvack.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      57bf12f4
    • Boris Brezillon's avatar
      m68k: Fix ndelay() macro · 3d1b965c
      Boris Brezillon authored
      commit 7e251bb2 upstream.
      
      The current ndelay() macro definition has an extra semi-colon at the
      end of the line thus leading to a compilation error when ndelay is used
      in a conditional block without curly braces like this one:
      
      	if (cond)
      		ndelay(t);
      	else
      		...
      
      which, after the preprocessor pass gives:
      
      	if (cond)
      		m68k_ndelay(t);;
      	else
      		...
      
      thus leading to the following gcc error:
      
      	error: 'else' without a previous 'if'
      
      Remove this extra semi-colon.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: c8ee038b ("m68k: Implement ndelay() based on the existing udelay() logic")
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3d1b965c
    • Thomas Gleixner's avatar
      locking/rtmutex: Prevent dequeue vs. unlock race · 07f03860
      Thomas Gleixner authored
      commit dbb26055 upstream.
      
      David reported a futex/rtmutex state corruption. It's caused by the
      following problem:
      
      CPU0		CPU1		CPU2
      
      l->owner=T1
      		rt_mutex_lock(l)
      		lock(l->wait_lock)
      		l->owner = T1 | HAS_WAITERS;
      		enqueue(T2)
      		boost()
      		  unlock(l->wait_lock)
      		schedule()
      
      				rt_mutex_lock(l)
      				lock(l->wait_lock)
      				l->owner = T1 | HAS_WAITERS;
      				enqueue(T3)
      				boost()
      				  unlock(l->wait_lock)
      				schedule()
      		signal(->T2)	signal(->T3)
      		lock(l->wait_lock)
      		dequeue(T2)
      		deboost()
      		  unlock(l->wait_lock)
      				lock(l->wait_lock)
      				dequeue(T3)
      				  ===> wait list is now empty
      				deboost()
      				 unlock(l->wait_lock)
      		lock(l->wait_lock)
      		fixup_rt_mutex_waiters()
      		  if (wait_list_empty(l)) {
      		    owner = l->owner & ~HAS_WAITERS;
      		    l->owner = owner
      		     ==> l->owner = T1
      		  }
      
      				lock(l->wait_lock)
      rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
      				  if (wait_list_empty(l)) {
      				    owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
       ===> Success (l->owner = NULL)
      				    l->owner = owner
      				     ==> l->owner = T1
      				  }
      
      That means the problem is caused by fixup_rt_mutex_waiters() which does the
      RMW to clear the waiters bit unconditionally when there are no waiters in
      the rtmutexes rbtree.
      
      This can be fatal: A concurrent unlock can release the rtmutex in the
      fastpath because the waiters bit is not set. If the cmpxchg() gets in the
      middle of the RMW operation then the previous owner, which just unlocked
      the rtmutex is set as the owner again when the write takes place after the
      successfull cmpxchg().
      
      The solution is rather trivial: verify that the owner member of the rtmutex
      has the waiters bit set before clearing it. This does not require a
      cmpxchg() or other atomic operations because the waiters bit can only be
      set and cleared with the rtmutex wait_lock held. It's also safe against the
      fast path unlock attempt. The unlock attempt via cmpxchg() will either see
      the bit set and take the slowpath or see the bit cleared and release it
      atomically in the fastpath.
      
      It's remarkable that the test program provided by David triggers on ARM64
      and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
      problem exists there as well. That refusal might explain that this got not
      discovered earlier despite the bug existing from day one of the rtmutex
      implementation more than 10 years ago.
      
      Thanks to David for meticulously instrumenting the code and providing the
      information which allowed to decode this subtle problem.
      Reported-by: default avatarDavid Daney <ddaney@caviumnetworks.com>
      Tested-by: default avatarDavid Daney <david.daney@cavium.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [wt: s/{READ,WRITE}_ONCE/ACCESS_ONCE/]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      07f03860
    • Jan Kara's avatar
      ext4: fix data exposure after a crash · 9488a477
      Jan Kara authored
      commit 06bd3c36 upstream.
      
      Huang has reported that in his powerfail testing he is seeing stale
      block contents in some of recently allocated blocks although he mounts
      ext4 in data=ordered mode. After some investigation I have found out
      that indeed when delayed allocation is used, we don't add inode to
      transaction's list of inodes needing flushing before commit. Originally
      we were doing that but commit f3b59291 removed the logic with a
      flawed argument that it is not needed.
      
      The problem is that although for delayed allocated blocks we write their
      contents immediately after allocating them, there is no guarantee that
      the IO scheduler or device doesn't reorder things and thus transaction
      allocating blocks and attaching them to inode can reach stable storage
      before actual block contents. Actually whenever we attach freshly
      allocated blocks to inode using a written extent, we should add inode to
      transaction's ordered inode list to make sure we properly wait for block
      contents to be written before committing the transaction. So that is
      what we do in this patch. This also handles other cases where stale data
      exposure was possible - like filling hole via mmap in
      data=ordered,nodelalloc mode.
      
      The only exception to the above rule are extending direct IO writes where
      blkdev_direct_IO() waits for IO to complete before increasing i_size and
      thus stale data exposure is not possible. For now we don't complicate
      the code with optimizing this special case since the overhead is pretty
      low. In case this is observed to be a performance problem we can always
      handle it using a special flag to ext4_map_blocks().
      
      Fixes: f3b59291Reported-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Tested-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9488a477
    • Eric Biggers's avatar
      KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings · d19182e2
      Eric Biggers authored
      commit c9f838d1 upstream.
      
      This fixes CVE-2017-7472.
      
      Running the following program as an unprivileged user exhausts kernel
      memory by leaking thread keyrings:
      
      	#include <keyutils.h>
      
      	int main()
      	{
      		for (;;)
      			keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
      	}
      
      Fix it by only creating a new thread keyring if there wasn't one before.
      To make things more consistent, make install_thread_keyring_to_cred()
      and install_process_keyring_to_cred() both return 0 if the corresponding
      keyring is already present.
      
      Fixes: d84f4f99 ("CRED: Inaugurate COW credentials")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d19182e2
    • David Howells's avatar
      KEYS: Change the name of the dead type to ".dead" to prevent user access · 5b22a8a5
      David Howells authored
      commit c1644fe0 upstream.
      
      This fixes CVE-2017-6951.
      
      Userspace should not be able to do things with the "dead" key type as it
      doesn't have some of the helper functions set upon it that the kernel
      needs.  Attempting to use it may cause the kernel to crash.
      
      Fix this by changing the name of the type to ".dead" so that it's rejected
      up front on userspace syscalls by key_get_type_from_user().
      
      Though this doesn't seem to affect recent kernels, it does affect older
      ones, certainly those prior to:
      
      	commit c06cfb08
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Tue Sep 16 17:36:06 2014 +0100
      	KEYS: Remove key_type::match in favour of overriding default by match_preparse
      
      which went in before 3.18-rc1.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5b22a8a5
    • David Howells's avatar
      KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings · 8e728d26
      David Howells authored
      commit ee8f844e upstream.
      
      This fixes CVE-2016-9604.
      
      Keyrings whose name begin with a '.' are special internal keyrings and so
      userspace isn't allowed to create keyrings by this name to prevent
      shadowing.  However, the patch that added the guard didn't fix
      KEYCTL_JOIN_SESSION_KEYRING.  Not only can that create dot-named keyrings,
      it can also subscribe to them as a session keyring if they grant SEARCH
      permission to the user.
      
      This, for example, allows a root process to set .builtin_trusted_keys as
      its session keyring, at which point it has full access because now the
      possessor permissions are added.  This permits root to add extra public
      keys, thereby bypassing module verification.
      
      This also affects kexec and IMA.
      
      This can be tested by (as root):
      
      	keyctl session .builtin_trusted_keys
      	keyctl add user a a @s
      	keyctl list @s
      
      which on my test box gives me:
      
      	2 keys in keyring:
      	180010936: ---lswrv     0     0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05
      	801382539: --alswrv     0     0 user: a
      
      Fix this by rejecting names beginning with a '.' in the keyctl.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8e728d26
    • Andy Whitcroft's avatar
      xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder · cbd95d10
      Andy Whitcroft authored
      commit f843ee6d upstream.
      
      Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
      wrapping issues.  To ensure we are correctly ensuring that the two ESN
      structures are the same size compare both the overall size as reported
      by xfrm_replay_state_esn_len() and the internal length are the same.
      
      CVE-2017-7184
      Signed-off-by: default avatarAndy Whitcroft <apw@canonical.com>
      Acked-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      cbd95d10
    • Andy Whitcroft's avatar
      xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window · ccf85441
      Andy Whitcroft authored
      commit 677e806d upstream.
      
      When a new xfrm state is created during an XFRM_MSG_NEWSA call we
      validate the user supplied replay_esn to ensure that the size is valid
      and to ensure that the replay_window size is within the allocated
      buffer.  However later it is possible to update this replay_esn via a
      XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
      buffer matches the existing state and if so inject the contents.  We do
      not at this point check that the replay_window is within the allocated
      memory.  This leads to out-of-bounds reads and writes triggered by
      netlink packets.  This leads to memory corruption and the potential for
      priviledge escalation.
      
      We already attempt to validate the incoming replay information in
      xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
      is not trying to change the size of the replay state buffer which
      includes the replay_esn.  It however does not check the replay_window
      remains within that buffer.  Add validation of the contained
      replay_window.
      
      CVE-2017-7184
      Signed-off-by: default avatarAndy Whitcroft <apw@canonical.com>
      Acked-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ccf85441
    • Eric Dumazet's avatar
      tcp: avoid infinite loop in tcp_splice_read() · 452655e7
      Eric Dumazet authored
      commit ccf7abb9 upstream.
      
      Splicing from TCP socket is vulnerable when a packet with URG flag is
      received and stored into receive queue.
      
      __tcp_splice_read() returns 0, and sk_wait_data() immediately
      returns since there is the problematic skb in queue.
      
      This is a nice way to burn cpu (aka infinite loop) and trigger
      soft lockups.
      
      Again, this gem was found by syzkaller tool.
      
      Fixes: 9c55e01c ("[TCP]: Splice receive support.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      452655e7
    • Stephen Smalley's avatar
      selinux: fix off-by-one in setprocattr · a71b4196
      Stephen Smalley authored
      commit 0c461cb7 upstream.
      
      SELinux tries to support setting/clearing of /proc/pid/attr attributes
      from the shell by ignoring terminating newlines and treating an
      attribute value that begins with a NUL or newline as an attempt to
      clear the attribute.  However, the test for clearing attributes has
      always been wrong; it has an off-by-one error, and this could further
      lead to reading past the end of the allocated buffer since commit
      bb646cdb ("proc_pid_attr_write():
      switch to memdup_user()").  Fix the off-by-one error.
      
      Even with this fix, setting and clearing /proc/pid/attr attributes
      from the shell is not straightforward since the interface does not
      support multiple write() calls (so shells that write the value and
      newline separately will set and then immediately clear the attribute,
      requiring use of echo -n to set the attribute), whereas trying to use
      echo -n "" to clear the attribute causes the shell to skip the
      write() call altogether since POSIX says that a zero-length write
      causes no side effects. Thus, one must use echo -n to set and echo
      without -n to clear, as in the following example:
      $ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
      $ cat /proc/$$/attr/fscreate
      unconfined_u:object_r:user_home_t:s0
      $ echo "" > /proc/$$/attr/fscreate
      $ cat /proc/$$/attr/fscreate
      
      Note the use of /proc/$$ rather than /proc/self, as otherwise
      the cat command will read its own attribute value, not that of the shell.
      
      There are no users of this facility to my knowledge; possibly we
      should just get rid of it.
      
      UPDATE: Upon further investigation it appears that a local process
      with the process:setfscreate permission can cause a kernel panic as a
      result of this bug.  This patch fixes CVE-2017-2618.
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      [PM: added the update about CVE-2017-2618 to the commit description]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a71b4196
    • Kees Cook's avatar
      fbdev: color map copying bounds checking · e39ab836
      Kees Cook authored
      commit 2dc705a9 upstream.
      
      Copying color maps to userspace doesn't check the value of to->start,
      which will cause kernel heap buffer OOB read due to signedness wraps.
      
      CVE-2016-8405
      
      Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: Peter Pi (@heisecode) of Trend Micro
      Cc: Min Chong <mchong@google.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e39ab836
    • Gu Zheng's avatar
      tmpfs: clear S_ISGID when setting posix ACLs · 1026a8ce
      Gu Zheng authored
      commit 497de07d upstream.
      
      This change was missed the tmpfs modification in In CVE-2016-7097
      commit 07393101 ("posix_acl: Clear SGID bit when setting
      file permissions")
      It can test by xfstest generic/375, which failed to clear
      setgid bit in the following test case on tmpfs:
      
        touch $testfile
        chown 100:100 $testfile
        chmod 2755 $testfile
        _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile
      Signed-off-by: default avatarGu Zheng <guzheng1@huawei.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1026a8ce
    • Jan Kara's avatar
      posix_acl: Clear SGID bit when setting file permissions · dd2421b5
      Jan Kara authored
      commit 07393101 upstream.
      
      When file permissions are modified via chmod(2) and the user is not in
      the owning group or capable of CAP_FSETID, the setgid bit is cleared in
      inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
      permissions as well as the new ACL, but doesn't clear the setgid bit in
      a similar way; this allows to bypass the check in chmod(2).  Fix that.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      [wt: dropped hfsplus changes : no xattr in 3.10]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd2421b5
    • Steve Rutherford's avatar
      KVM: x86: Introduce segmented_write_std · 27bf4b6e
      Steve Rutherford authored
      commit 129a72a0 upstream.
      
      Introduces segemented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 96051572
      Fixes: 283c95d0Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      27bf4b6e
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "MOV SS, null selector" · 2dd7d7e4
      Paolo Bonzini authored
      commit 33ab9110 upstream.
      
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      
      [js] backport to 3.12
      Reported-by: default avatarXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2dd7d7e4
    • Ilya Dryomov's avatar
      libceph: don't set weight to IN when OSD is destroyed · dd7d4be1
      Ilya Dryomov authored
      commit b581a585 upstream.
      
      Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
      osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
      This changes the result of applying an incremental for clients, not
      just OSDs.  Because CRUSH computations are obviously affected,
      pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
      object placement, resulting in misdirected requests.
      
      Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
      
      Fixes: 930c5328 ("libceph: apply new_state before new_up_client on incrementals")
      Link: http://tracker.ceph.com/issues/19122Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd7d4be1
    • Ryan Ware's avatar
      EVM: Use crypto_memneq() for digest comparisons · 37510515
      Ryan Ware authored
      commit 613317bd upstream.
      
      This patch fixes vulnerability CVE-2016-2085.  The problem exists
      because the vm_verify_hmac() function includes a use of memcmp().
      Unfortunately, this allows timing side channel attacks; specifically
      a MAC forgery complexity drop from 2^128 to 2^12.  This patch changes
      the memcmp() to the cryptographically safe crypto_memneq().
      Reported-by: default avatarXiaofei Rex Guo <xiaofei.rex.guo@intel.com>
      Signed-off-by: default avatarRyan Ware <ware@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      37510515
    • James Yonan's avatar
      crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks · 5fd53819
      James Yonan authored
      commit 6bf37e5a upstream.
      
      When comparing MAC hashes, AEAD authentication tags, or other hash
      values in the context of authentication or integrity checking, it
      is important not to leak timing information to a potential attacker,
      i.e. when communication happens over a network.
      
      Bytewise memory comparisons (such as memcmp) are usually optimized so
      that they return a nonzero value as soon as a mismatch is found. E.g,
      on x86_64/i5 for 512 bytes this can be ~50 cyc for a full mismatch
      and up to ~850 cyc for a full match (cold). This early-return behavior
      can leak timing information as a side channel, allowing an attacker to
      iteratively guess the correct result.
      
      This patch adds a new method crypto_memneq ("memory not equal to each
      other") to the crypto API that compares memory areas of the same length
      in roughly "constant time" (cache misses could change the timing, but
      since they don't reveal information about the content of the strings
      being compared, they are effectively benign). Iow, best and worst case
      behaviour take the same amount of time to complete (in contrast to
      memcmp).
      
      Note that crypto_memneq (unlike memcmp) can only be used to test for
      equality or inequality, NOT for lexicographical order. This, however,
      is not an issue for its use-cases within the crypto API.
      
      We tried to locate all of the places in the crypto API where memcmp was
      being used for authentication or integrity checking, and convert them
      over to crypto_memneq.
      
      crypto_memneq is declared noinline, placed in its own source file,
      and compiled with optimizations that might increase code size disabled
      ("Os") because a smart compiler (or LTO) might notice that the return
      value is always compared against zero/nonzero, and might then
      reintroduce the same early-return optimization that we are trying to
      avoid.
      
      Using #pragma or __attribute__ optimization annotations of the code
      for disabling optimization was avoided as it seems to be considered
      broken or unmaintained for long time in GCC [1]. Therefore, we work
      around that by specifying the compile flag for memneq.o directly in
      the Makefile. We found that this seems to be most appropriate.
      
      As we use ("Os"), this patch also provides a loop-free "fast-path" for
      frequently used 16 byte digests. Similarly to kernel library string
      functions, leave an option for future even further optimized architecture
      specific assembler implementations.
      
      This was a joint work of James Yonan and Daniel Borkmann. Also thanks
      for feedback from Florian Weimer on this and earlier proposals [2].
      
        [1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html
        [2] https://lkml.org/lkml/2013/2/10/131Signed-off-by: default avatarJames Yonan <james@openvpn.net>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5fd53819
    • Philip Pettersson's avatar
      packet: fix race condition in packet_set_ring · 4bfb6dd7
      Philip Pettersson authored
      commit 84ac7260 upstream.
      
      When packet_set_ring creates a ring buffer it will initialize a
      struct timer_list if the packet version is TPACKET_V3. This value
      can then be raced by a different thread calling setsockopt to
      set the version to TPACKET_V1 before packet_set_ring has finished.
      
      This leads to a use-after-free on a function pointer in the
      struct timer_list when the socket is closed as the previously
      initialized timer will not be deleted.
      
      The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
      changing the packet version while also taking the lock at the start
      of packet_set_ring.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarPhilip Pettersson <philip.pettersson@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: gregkh@linuxfoundation.org
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4bfb6dd7
  2. 10 Feb, 2017 4 commits