- 26 Jan, 2017 40 commits
-
Bart Van Assche authored
commit d3a2418e upstream. This patch avoids a Coverity complaint about not checking the ib_find_pkey() return value. Fixes: commit 547af765 ("IB/multicast: Report errors on multicast groups if P_key changes") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
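A minimal sketch of the kind of check this commit describes; the device, port number and P_Key value passed in are placeholders, not the actual multicast driver code:

    u16 pkey_index;

    /* Don't ignore the lookup result: fall back to an explicit
     * "no valid index" value if the P_Key is not found. */
    if (ib_find_pkey(ib_dev, port_num, pkey, &pkey_index))
        pkey_index = 0xFFFF;    /* hypothetical invalid marker */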
-
Bart Van Assche authored
commit 2fe2f378 upstream. The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS (80) elements. Hence compare the array index with that value instead of with IB_MGMT_MAX_METHODS (128). This patch avoids the following Coverity report: Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127). Fixes: commit b7ab0b19 ("IB/mad: Verify mgmt class in received MADs") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
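Illustrative only (identifiers taken from the commit text, surrounding control flow assumed): the index derived from the management class has to be bounded by the table's real size before it is used:

    int idx = convert_mgmt_class(mad_hdr->mgmt_class);

    /* method_table has MAX_MGMT_CLASS (80) entries, so reject anything at
     * or beyond that instead of comparing against IB_MGMT_MAX_METHODS (128). */
    if (idx >= MAX_MGMT_CLASS)
        goto out;                       /* treat as an unsupported class */
    method = class->method_table[idx];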
-
Steven Rostedt (Red Hat) authored
commit 847fa1a6 upstream. With new binutils, gcc may get smart with its optimization and change a jmp from a 5 byte jump to a 2 byte one even though it was jumping to a global function. But that global function existed within a 2 byte radius, and gcc was able to optimize it. Unfortunately, that jump was also being modified when function graph tracing begins. Since ftrace expected that jump to be 5 bytes, but it was only two, it overwrote code after the jump, causing a crash. This was fixed for x86_64 with commit 8329e818, with the same subject as this commit, but nothing was done for x86_32. Fixes: d61f82d0 ("ftrace: use dynamic patching for updating mcount calls") Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jim Mattson authored
commit ef85b673 upstream. When L2 exits to L0 due to "exception or NMI", software exceptions (#BP and #OF) for which L1 has requested an intercept should be handled by L1 rather than L0. Previously, only hardware exceptions were forwarded to L1. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Konstantin Khlebnikov authored
commit e8d7c332 upstream. The current implementation employs a 16-bit counter of active stripes in the lower bits of bio->bi_phys_segments. If a request is big enough to overflow this counter, the bio will be completed and freed too early. Fortunately this does not happen in the default configuration because several other limits prevent it: stripe_cache_size * nr_disks effectively limits the count of active stripes, and the small max_sectors_kb at the lower disks prevents it during normal read/write operations. Overflow easily happens in discard if it is enabled by the module parameter "devices_handle_discard_safely" and stripe_cache_size is set big enough. This patch limits the request size to 256MB - 8KB to prevent overflows. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Shaohua Li <shli@kernel.org> Cc: Neil Brown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
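The arithmetic of the new cap, as a sketch; how the limit is actually wired into the request queue may differ from the real patch:

    /* 256 MB minus 8 KB, expressed in 512-byte sectors */
    unsigned int max_sectors = ((256U << 20) - (8U << 10)) >> 9;

    /* keep any single request safely below the 16-bit active-stripe
     * counter kept in the low bits of bio->bi_phys_segments */
    blk_queue_max_hw_sectors(mddev->queue, max_sectors);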
-
Gerald Schaefer authored
commit 5457e03d upstream. The buffer for iucv_message_receive() needs to be below 2 GB. In __iucv_message_receive(), the buffer address is cast to a u32, which would result in either memory corruption or an addressing exception when using addresses >= 2 GB. Fix this by using GFP_DMA for the buffer allocation. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
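A sketch of the allocation change with placeholder variable names; on s390, GFP_DMA keeps the buffer below 2 GB so the 32-bit address handed to __iucv_message_receive() stays valid:

    /* must be a below-2GB address: the IUCV interface truncates it to a u32 */
    buffer = kzalloc(datalen, GFP_DMA | GFP_KERNEL);
    if (!buffer)
        return -ENOMEM;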
-
Wei Fang authored
commit d2a14525 upstream. A race between scanning and fc_remote_port_delete() may result in a permanent stop if the device gets blocked before scsi_sysfs_add_sdev() and unblocked after. The reason is that blocking a device sets both the SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED. However, scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the device to be ignored by scsi_target_unblock() and thus never have its QUEUE_FLAG_STOPPED cleared leading to a device which is apparently running but has a stopped queue. We actually have two places where SDEV_RUNNING is set: once in scsi_add_lun() which respects the blocked flag and once in scsi_sysfs_add_sdev() which doesn't. Since the second set is entirely spurious, simply remove it to fix the problem. Reported-by: Zengxi Chen <chenzengxi@huawei.com> Signed-off-by: Wei Fang <fangwei1@huawei.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Steffen Maier authored
commit 6f2ce1c6 upstream. It is unavoidable that zfcp_scsi_queuecommand() has to finish requests with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time window when zfcp detected an unavailable rport but fc_remote_port_delete(), which is asynchronous via zfcp_scsi_schedule_rport_block(), has not yet blocked the rport. However, for the case when the rport becomes available again, we should prevent unblocking the rport too early.

In contrast to other FCP LLDDs, zfcp has to open each LUN with the FCP channel hardware before it can send I/O to a LUN. So if a port already has LUNs attached and we unblock the rport just after port recovery, recoveries of LUNs behind this port can still be pending, which in turn forces zfcp_scsi_queuecommand() to unnecessarily finish requests with DID_IMM_RETRY. This also opens a time window with an unblocked rport (until the follow-up LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during this time window, fc_timed_out() cannot work as desired and such a command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on the FC transport layer, such as path issues involving RSCNs.

Fix this by only calling zfcp_scsi_schedule_rport_register(), to asynchronously trigger fc_remote_port_add(), after all LUN recoveries as children of the rport have finished and no new recoveries of equal or higher order were triggered meanwhile. "Finished" intentionally includes any recovery result, no matter whether successful or failed (still unblock the rport so other successful LUNs work). For simplicity, we check after each finished LUN recovery whether there is another LUN recovery pending on the same port and then do nothing. We handle the special case of a successful recovery of a port without LUN children the same way without changing this case's semantics.

For debugging we introduce 2 new trace records written if the rport unblock attempt was aborted due to still unfinished or freshly triggered recovery. The records are only written above the default trace level.

Benjamin noticed the important special case of new recovery that can be triggered between having given up the erp_lock and before calling zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the following sequence (steps annotated with the context they run in). Initially the port is unblocked, the rport still blocked due to a pending/running ERP action, so ((port->status & ...UNBLOCK) != 0) and (port->rport == NULL):

    ERP thread:
        unlock ERP
        zfcp_erp_action_cleanup()
          case ZFCP_ERP_ACTION_REOPEN_LUN:
          zfcp_erp_try_rport_unblock()
            ((status & ...UNBLOCK) != 0) [OLD!]
    other context:
        zfcp_erp_port_reopen()
          lock ERP
          zfcp_erp_port_block()
            port->status clear ...UNBLOCK
          unlock ERP
          zfcp_scsi_schedule_rport_block()
            port->rport_task = RPORT_DEL
            queue_work(rport_work)
    rport_work:
        zfcp_scsi_rport_work()
          (port->rport_task != RPORT_ADD)
          port->rport_task = RPORT_NONE
          zfcp_scsi_rport_block()
            if (!port->rport) return
    ERP thread:
        zfcp_scsi_schedule_rport_register()
          port->rport_task = RPORT_ADD
          queue_work(rport_work)
    rport_work:
        zfcp_scsi_rport_work()
          (port->rport_task == RPORT_ADD)
          port->rport_task = RPORT_NONE
          zfcp_scsi_rport_register()
            (port->rport == NULL)
            rport = fc_remote_port_add()
            port->rport = rport;

Now the rport was erroneously unblocked while the zfcp_port is blocked. This is another situation we want to avoid due to scsi_eh potential. This state would at least remain until the new recovery from the other context finished successfully, or potentially forever if it failed.

In order to close this race, we take the erp_lock inside zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or LUN. With that, the possible corresponding rport state sequences would be: (unblock[ERP thread], block[other context]) if the ERP thread gets the erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), or (block[other context], NOP[ERP thread]) if the ERP thread gets the erp_lock after the other context has already cleared ...UNBLOCK from port->status.

Since checking fields of struct erp_action is unsafe because they could have been overwritten (re-used for new recovery) meanwhile, we only check the status of zfcp_port and LUN since these are only changed under the erp_lock elsewhere.

Regarding the check of the proper status flags (port or port_forced are similar to the adapter recovery shown; the clear/set of UNBLOCK, the set/clear of ERP_INUSE and the erp_lock acquire/release pair up across the sequence):

    [zfcp_erp_adapter_shutdown()]
    zfcp_erp_adapter_reopen()
      zfcp_erp_adapter_block()
        * clear UNBLOCK
      zfcp_scsi_schedule_rports_block()
      write_lock_irqsave(&adapter->erp_lock, flags);
      zfcp_erp_action_enqueue()
        zfcp_erp_setup_act()
          * set ERP_INUSE
      write_unlock_irqrestore(&adapter->erp_lock, flags);
    .context-switch.
    zfcp_erp_thread()
      zfcp_erp_strategy()
        write_lock_irqsave(&adapter->erp_lock, flags);
        ...
        zfcp_erp_strategy_check_target()
          zfcp_erp_strategy_check_adapter()
            zfcp_erp_adapter_unblock()
              * set UNBLOCK
        zfcp_erp_action_dequeue()
          * clear ERP_INUSE
        ...
        write_unlock_irqrestore(&adapter->erp_lock, flags);

Hence, we should check for both UNBLOCK and ERP_INUSE because they are interleaved. Also we need to explicitly check ERP_FAILED for the link-down case, which currently does not clear the UNBLOCK flag in zfcp_fsf_link_down_info_eval(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8830271c ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport") Fixes: a2fa0aed ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e0 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a2 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Steffen Maier authored
commit 56d23ed7 upstream. Since quite a while, Linux issues enough SCSI commands per scsi_device which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE, and SAM_STAT_GOOD. This floods the HBA trace area and we cannot see other and important HBA trace records long enough. Therefore, do not trace HBA response errors for pure benign residual under counts at the default trace level. This excludes benign residual under count combined with other validity bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL. For all those other cases, we still do want to see both the HBA record and the corresponding SCSI record by default. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f6 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Benjamin Block authored
commit dac37e15 upstream. When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req *)->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter *)->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block 
<bblock@linux.vnet.ibm.com> Fixes: ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Rabin Vincent authored
commit af309226 upstream. If a block device is closed while iterate_bdevs() is handling it, the following NULL pointer dereference occurs because bdev->b_disk is NULL in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in turn called by the mapping_cap_writeback_dirty() call in __filemap_fdatawrite_range()): BUG: unable to handle kernel NULL pointer dereference at 0000000000000508 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 PGD 9e62067 PUD 9ee8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000 RIP: 0010:[<ffffffff81314790>] [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38 FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0 Stack: ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001 ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f Call Trace: [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90 [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30 [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20 [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130 [<ffffffff811b2763>] sys_sync+0x63/0x90 [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d RIP [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP <ffff880009f5fe68> CR2: 0000000000000508 ---[ end trace 2487336ceb3de62d ]--- The crash is easily reproducible by running the following command, if an msleep(100) is inserted before the call to func() in iterate_devs(): while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done Fix it by holding the bd_mutex across the func() call and only calling func() if the bdev is opened. Fixes: 5c0d6b60 ("vfs: Create function for iterating over block devices") Reported-and-tested-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
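Roughly the shape of the fix described above, as a sketch of the per-device body inside iterate_bdevs(); the field names follow struct block_device of that era rather than any particular patch hunk:

    mutex_lock(&bdev->bd_mutex);
    if (bdev->bd_openers)           /* skip devices that were closed meanwhile */
        func(bdev, arg);
    mutex_unlock(&bdev->bd_mutex);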
-
Russell Currey authored
commit 298360af upstream. ast_get_dram_info() configures a window in order to access BMC memory. A BMC register can be configured to disallow this, and if so, causes an infinite loop in the ast driver which renders the system unusable. Fix this by erroring out if an error is detected. On powerpc systems with EEH, this leads to the device being fenced and the system continuing to operate. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161215051241.20815-1-ruscur@russell.cc Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Patrik Jakobsson authored
commit 0a97c81a upstream. Hook up drm_compat_ioctl to support 32-bit userspace on 64-bit kernels. It turns out that the N2600 and N2800 come with 64-bit enabled. We previously assumed there were no such systems out there. Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20161101144315.2955-1-patrik.r.jakobsson@gmail.com Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Alex Deucher authored
commit 8729675c upstream. New variant. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Krzysztof Kozlowski authored
commit f37fabb8 upstream. In the critical sysfs entry the thermal hwmon was returning the wrong temperature to user-space: it reported the temperature of the first trip point instead of the temperature of the critical trip point. For example:

    /sys/class/hwmon/hwmon0/temp1_crit:50000
    /sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
    /sys/class/thermal/thermal_zone0/trip_point_0_type:active
    /sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
    /sys/class/thermal/thermal_zone0/trip_point_3_type:critical

Since commit e68b16ab ("thermal: add hwmon sysfs I/F") the driver has been registering a sysfs entry if the get_crit_temp() callback was provided. However, when accessed, it was calling get_trip_temp() instead of get_crit_temp(). Fixes: e68b16ab ("thermal: add hwmon sysfs I/F") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Larry Finger authored
commit 8052d724 upstream. When there is a CRC error in the SPROM read from the device, the code attempts to handle a fallback SPROM. When this also fails, the driver returns zero rather than an error code. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Eric Sandeen authored
commit 6b10b23c upstream. xlog_recover_clear_agi_bucket didn't set the type to XFS_BLFT_AGI_BUF, so we got a warning during log replay (or an ASSERT on a debug build). XFS (md0): Unknown buffer type 0! XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1 Fix this, as was done in f19b872b for 2 other locations with the same problem. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Julien Grall authored
commit 24d5373d upstream. The function xen_guest_init is using __alloc_percpu with an alignment which are not power of two. However, the percpu allocator never supported alignments which are not power of two and has always behaved incorectly in thise case. Commit 3ca45a46 "percpu: ensure requested alignment is power of two" introduced a check which trigger a warning [1] when booting linux-next on Xen. But in reality this bug was always present. This can be fixed by replacing the call to __alloc_percpu with alloc_percpu. The latter will use an alignment which are a power of two. [1] [ 0.023921] illegal size (48) or align (48) for percpu allocation [ 0.024167] ------------[ cut here ]------------ [ 0.024344] WARNING: CPU: 0 PID: 1 at linux/mm/percpu.c:892 pcpu_alloc+0x88/0x6c0 [ 0.024584] Modules linked in: [ 0.024708] [ 0.024804] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc7-next-20161128 #473 [ 0.025012] Hardware name: Foundation-v8A (DT) [ 0.025162] task: ffff80003d870000 task.stack: ffff80003d844000 [ 0.025351] PC is at pcpu_alloc+0x88/0x6c0 [ 0.025490] LR is at pcpu_alloc+0x88/0x6c0 [ 0.025624] pc : [<ffff00000818e678>] lr : [<ffff00000818e678>] pstate: 60000045 [ 0.025830] sp : ffff80003d847cd0 [ 0.025946] x29: ffff80003d847cd0 x28: 0000000000000000 [ 0.026147] x27: 0000000000000000 x26: 0000000000000000 [ 0.026348] x25: 0000000000000000 x24: 0000000000000000 [ 0.026549] x23: 0000000000000000 x22: 00000000024000c0 [ 0.026752] x21: ffff000008e97000 x20: 0000000000000000 [ 0.026953] x19: 0000000000000030 x18: 0000000000000010 [ 0.027155] x17: 0000000000000a3f x16: 00000000deadbeef [ 0.027357] x15: 0000000000000006 x14: ffff000088f79c3f [ 0.027573] x13: ffff000008f79c4d x12: 0000000000000041 [ 0.027782] x11: 0000000000000006 x10: 0000000000000042 [ 0.027995] x9 : ffff80003d847a40 x8 : 6f697461636f6c6c [ 0.028208] x7 : 6120757063726570 x6 : ffff000008f79c84 [ 0.028419] x5 : 0000000000000005 x4 : 0000000000000000 [ 0.028628] x3 : 0000000000000000 x2 : 000000000000017f [ 0.028840] x1 : ffff80003d870000 x0 : 0000000000000035 [ 0.029056] [ 0.029152] ---[ end trace 0000000000000000 ]--- [ 0.029297] Call trace: [ 0.029403] Exception stack(0xffff80003d847b00 to 0xffff80003d847c30) [ 0.029621] 7b00: 0000000000000030 0001000000000000 ffff80003d847cd0 ffff00000818e678 [ 0.029901] 7b20: 0000000000000002 0000000000000004 ffff000008f7c060 0000000000000035 [ 0.030153] 7b40: ffff000008f79000 ffff000008c4cd88 ffff80003d847bf0 ffff000008101778 [ 0.030402] 7b60: 0000000000000030 0000000000000000 ffff000008e97000 00000000024000c0 [ 0.030647] 7b80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.030895] 7ba0: 0000000000000035 ffff80003d870000 000000000000017f 0000000000000000 [ 0.031144] 7bc0: 0000000000000000 0000000000000005 ffff000008f79c84 6120757063726570 [ 0.031394] 7be0: 6f697461636f6c6c ffff80003d847a40 0000000000000042 0000000000000006 [ 0.031643] 7c00: 0000000000000041 ffff000008f79c4d ffff000088f79c3f 0000000000000006 [ 0.031877] 7c20: 00000000deadbeef 0000000000000a3f [ 0.032051] [<ffff00000818e678>] pcpu_alloc+0x88/0x6c0 [ 0.032229] [<ffff00000818ece8>] __alloc_percpu+0x18/0x20 [ 0.032409] [<ffff000008d9606c>] xen_guest_init+0x174/0x2f4 [ 0.032591] [<ffff0000080830f8>] do_one_initcall+0x38/0x130 [ 0.032783] [<ffff000008d90c34>] kernel_init_freeable+0xe0/0x248 [ 0.032995] [<ffff00000899a890>] kernel_init+0x10/0x100 [ 0.033172] [<ffff000008082ec0>] ret_from_fork+0x10/0x50 Reported-by: Wei Chen <wei.chen@arm.com> Link: https://lkml.org/lkml/2016/11/28/669Signed-off-by: 
Julien Grall <julien.grall@arm.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Boris Ostrovsky authored
commit 30faaafd upstream. Commit 9c17d965 ("xen/gntdev: Grant maps should not be subject to NUMA balancing") set VM_IO flag to prevent grant maps from being subjected to NUMA balancing. It was discovered recently that this flag causes get_user_pages() to always fail with -EFAULT:

    check_vma_flags
    __get_user_pages
    __get_user_pages_locked
    __get_user_pages_unlocked
    get_user_pages_fast
    iov_iter_get_pages
    dio_refill_pages
    do_direct_IO
    do_blockdev_direct_IO
    do_blockdev_direct_IO
    ext4_direct_IO_read
    generic_file_read_iter
    aio_run_iocb

(which can happen if guest's vdisk has direct-io-safe option). To avoid this let's use VM_MIXEDMAP flag instead --- it prevents NUMA balancing just as VM_IO does and has no effect on check_vma_flags(). Reported-by: Olaf Hering <olaf@aepfle.de> Suggested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Tested-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
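A sketch of the flag change in the gntdev mmap path; the exact set of companion flags is an assumption, only the VM_IO-to-VM_MIXEDMAP swap comes from the commit text:

    /* VM_MIXEDMAP keeps NUMA balancing away from the grant mapping but,
     * unlike VM_IO, is not rejected by check_vma_flags() when
     * get_user_pages() walks this VMA. */
    vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_MIXEDMAP;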
-
Pavel Shilovsky authored
commit e3d240e9 upstream. If maxBuf is not 0 but less than the size of the SMB2 lock structure, we can end up with memory corruption. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
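A hedged sketch of the kind of guard this commit implies, assuming the lock-element structure and maxBuf field named here; the real check may sit elsewhere in the SMB2 locking path:

    unsigned int max_buf = tcon->ses->server->maxBuf;

    /* a non-zero maxBuf that cannot hold even one lock element would make
     * the later "max_buf / sizeof(struct smb2_lock_element)" sizing
     * degenerate to zero and corrupt the lock array handling */
    if (max_buf < sizeof(struct smb2_lock_element))
        return -EINVAL;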
-
Pavel Shilovsky authored
commit 4772c795 upstream. Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Pavel Shilovsky authored
commit 53e0e11e upstream. We can not unlock/lock cifs_tcp_ses_lock while walking through ses and tcon lists because it can corrupt list iterator pointers and a tcon structure can be released if we don't hold an extra reference. Fix it by moving a reconnect process to a separate delayed work and acquiring a reference to every tcon that needs to be reconnected. Also do not send an echo request on newly established connections. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Benjamin Marzinski authored
commit 314c25c5 upstream. In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations is still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'. [js] no nr_blocks test in 3.12 yet Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Ondrej Kozina authored
commit 265e9098 upstream. In crypt_set_key(), if a failure occurs while replacing the old key (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag set. Otherwise, the crypto layer would have an invalid key that still has DM_CRYPT_KEY_VALID flag set. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
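The invariant the fix enforces, as a sketch; the helper name is assumed from the dm-crypt code of that era, not quoted from the patch:

    /* drop the flag first so a failed setkey never leaves a key marked
     * valid that the crypto layer does not actually hold */
    clear_bit(DM_CRYPT_KEY_VALID, &cc->flags);

    r = crypt_setkey_allcpus(cc);   /* assumed helper wrapping tfm->setkey() */
    if (r)
        return r;

    set_bit(DM_CRYPT_KEY_VALID, &cc->flags);
    return 0;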
-
Aleksa Sarai authored
commit 613cc2b6 upstream. If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); Which will return 0, during the race window and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Reported-by: Michael Crosby <crosbymichael@gmail.com> Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Shaohua Li authored
commit 5f33a080 upstream. Our system uses significantly more slab memory with memcg enabled with the latest kernel. With the 3.10 kernel, slab uses 2G of memory, while with the 4.6 kernel, 6G of memory is used. The shrinker has a problem. Say we have two memcgs for one shrinker. In do_shrink_slab:

    1. Check cg1. nr_deferred = 0, assume total_scan = 700. The batch size is 1024, so no memory is freed. nr_deferred = 700.
    2. Check cg2. nr_deferred = 700. Assume freeable = 20, then total_scan = 10 or 40. Let's assume it's 10. No memory is freed. nr_deferred = 10.

The deferred share of cg1 is lost in this case. kswapd will free no memory even if it runs the above steps again and again. The fix makes sure one memcg's deferred share isn't lost. Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Nicolai Stange authored
commit 05e6ea26 upstream. The struct file_operations instance serving the f2fs/status debugfs file lacks an initialization of its ->owner. This means that although that file might have been opened, the f2fs module can still get removed. Any further operation on that opened file, releasing included, will cause accesses to unmapped memory. Indeed, Mike Marshall reported the following: BUG: unable to handle kernel paging request at ffffffffa0307430 IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90 <...> Call Trace: [] __fput+0xdf/0x1d0 [] ____fput+0xe/0x10 [] task_work_run+0x8e/0xc0 [] do_exit+0x2ae/0xae0 [] ? __audit_syscall_entry+0xae/0x100 [] ? syscall_trace_enter+0x1ca/0x310 [] do_group_exit+0x44/0xc0 [] SyS_exit_group+0x14/0x20 [] do_syscall_64+0x61/0x150 [] entry_SYSCALL64_slow_path+0x25/0x25 <...> ---[ end trace f22ae883fa3ea6b8 ]--- Fixing recursive fault but reboot is needed! Fix this by initializing the f2fs/status file_operations' ->owner with THIS_MODULE. This will allow debugfs to grab a reference to the f2fs module upon any open on that file, thus preventing it from getting removed. Fixes: 902829aa ("f2fs: move proc files to debugfs") Reported-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
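The fix pattern, sketched with the field layout typical of a seq_file-backed debugfs node; the open/read helper names are assumptions for illustration:

    static const struct file_operations stat_fops = {
        .owner   = THIS_MODULE,   /* pins f2fs while f2fs/status is open */
        .open    = stat_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
    };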
-
Dan Carpenter authored
commit 578620f4 upstream. We should set the error code if kzalloc() fails. Fixes: 67cf5b09 ("ext4: add the basic function for inline data support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
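The generic shape of the fix; the variable names are illustrative, not the exact ext4 inline-data code:

    backup = kzalloc(inline_size, GFP_NOFS);
    if (!backup) {
        error = -ENOMEM;          /* previously the error stayed 0 on this path */
        goto out;
    }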
-
Darrick J. Wong authored
commit 7e6e1ef4 upstream. Don't load an inode with a negative size; this causes integer overflow problems in the VFS. [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT] js: use EIO for 3.12 instead of EFSCORRUPTED. Fixes: a48380f7 (ext4: rename i_dir_acl to i_size_high) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
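A sketch of the guard, assuming it sits in the inode-loading path; as noted above, the 3.12 backport returns -EIO where later mainline kernels use EFSCORRUPTED:

    loff_t size = ext4_isize(raw_inode);

    if (size < 0) {
        /* corrupt on-disk i_size_high/i_size_lo combination */
        ret = -EIO;
        goto bad_inode;
    }
    inode->i_size = size;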
-
Theodore Ts'o authored
commit c48ae41b upstream. The commit "ext4: sanity check the block and cluster size at mount time" should prevent any problems, but in case the superblock is modified while the file system is mounted, add an extra safety check to make sure we won't overrun the allocated buffer. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Theodore Ts'o authored
commit 5aee0f8a upstream. Fix a large number of problems with how we handle mount options in the superblock. For one, if the string in the superblock is long enough that it is not null terminated, we could run off the end of the string and try to interpret superblocks fields as characters. It's unlikely this will cause a security problem, but it could result in an invalid parse. Also, parse_options is destructive to the string, so in some cases if there is a comma-separated string, it would be modified in the superblock. (Fortunately it only happens on file systems with a 1k block size.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Theodore Ts'o authored
commit cd6bb35b upstream. Centralize the checks for inodes_per_block and be more strict to make sure the inodes_per_block_group can't end up being zero. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Chandan Rajendra authored
commit 30a9d7af upstream. The number of 'counters' elements needed in 'struct sg' is super_block->s_blocksize_bits + 2. Presently we have 16 'counters' elements in the array. This is insufficient for block sizes >= 32k. In such cases the memcpy operation performed in ext4_mb_seq_groups_show() would cause stack memory corruption. Fixes: c9de560d Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
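What the sizing constraint looks like in code, as a sketch; the hard-coded "16 + 2" stands in for whatever named maximum-block-log2 constant the actual patch introduced:

    /* needs s_blocksize_bits + 2 slots; with a maximum block size of 64k
     * (log2 = 16) that means 18 entries, so the old 16 was too small */
    struct sg {
        struct ext4_group_info info;
        ext4_grpblk_t counters[16 + 2];
    } sg;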
-
Chandan Rajendra authored
commit 69e43e8c upstream. The 'border' variable is set to a value of 2 times the block size of the underlying filesystem. With a 64k block size, the resulting value won't fit into a 16-bit variable. Hence this commit changes the data type of 'border' to 'unsigned int'. Fixes: c9de560d Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Alex Porosanu authored
commit d128af17 upstream. The AEAD givenc descriptor relies on moving the IV through the output FIFO and then back to the CTX2 for authentication. The SEQ FIFO STORE could be scheduled before the data can be read from OFIFO, especially since the SEQ FIFO LOAD needs to wait for the SEQ FIFO LOAD SKIP to finish first. The SKIP takes more time when the input is SG than when it's a contiguous buffer. If the SEQ FIFO LOAD is not scheduled before the STORE, the DECO will hang waiting for data to be available in the OFIFO so it can be transferred to C2. In order to overcome this, first force transfer of IV to C2 by starting the "cryptlen" transfer first and then starting to store data from OFIFO to the output buffer. Fixes: 1acebad3 ("crypto: caam - faster aead implementation") Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
NeilBrown authored
commit bcc7f5b4 upstream. bdev->bd_contains is not stable before calling __blkdev_get(). When __blkdev_get() is called on a parition with ->bd_openers == 0 it sets bdev->bd_contains = bdev; which is not correct for a partition. After a call to __blkdev_get() succeeds, ->bd_openers will be > 0 and then ->bd_contains is stable. When FMODE_EXCL is used, blkdev_get() calls bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim() This call happens before __blkdev_get() is called, so ->bd_contains is not stable. So bd_may_claim() cannot safely use ->bd_contains. It currently tries to use it, and this can lead to a BUG_ON(). This happens when a whole device is already open with a bd_holder (in use by dm in my particular example) and two threads race to open a partition of that device for the first time, one opening with O_EXCL and one without. The thread that doesn't use O_EXCL gets through blkdev_get() to __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev; Immediately thereafter the other thread, using FMODE_EXCL, calls bd_start_claiming() from blkdev_get(). This should fail because the whole device has a holder, but because bdev->bd_contains == bdev bd_may_claim() incorrectly reports success. This thread continues and blocks on bd_mutex. The first thread then sets bdev->bd_contains correctly and drops the mutex. The thread using FMODE_EXCL then continues and when it calls bd_may_claim() again in: BUG_ON(!bd_may_claim(bdev, whole, holder)); The BUG_ON fires. Fix this by removing the dependency on ->bd_contains in bd_may_claim(). As bd_may_claim() has direct access to the whole device, it can simply test if the target bdev is the whole device. Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
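A sketch of bd_may_claim() after the change, reconstructed from the description above; the comments and exact ordering are an approximation, not a verbatim copy of fs/block_dev.c:

    static bool bd_may_claim(struct block_device *bdev,
                             struct block_device *whole, void *holder)
    {
        if (bdev->bd_holder == holder)
            return true;     /* already claimed by this holder */
        else if (bdev->bd_holder != NULL)
            return false;    /* held by someone else */
        else if (whole == bdev)
            return true;     /* the whole device itself, and it is unheld */
        else if (whole->bd_holder == bd_may_claim)
            return true;     /* partition of a device being claimed */
        else if (whole->bd_holder != NULL)
            return false;    /* partition of a held device */
        else
            return true;     /* partition of an unheld device */
    }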
-
Liu Bo authored
commit 2571e739 upstream. So we can read a btree block via readahead or intentional read, and we can end up with a memory leak when something happens as follows, 1) readahead starts to read block A but does not wait for read completion, 2) btree_readpage_end_io_hook finds that block A is corrupted, and it needs to clear all block A's pages' uptodate bit. 3) meanwhile an intentional read kicks in and checks block A's pages' uptodate to decide which page needs to be read. 4) when some pages have the uptodate bit during 3)'s check so 3) doesn't count them for eb->io_pages, but they are later cleared by 2) so we has to readpage on the page, we get the wrong eb->io_pages which results in a memory leak of this block. This fixes the problem by firstly getting all pages's locking and then checking pages' uptodate bit. t1(readahead) t2(readahead endio) t3(the following read) read_extent_buffer_pages end_bio_extent_readpage for pg in eb: for page 0,1,2 in eb: if pg is uptodate: btree_readpage_end_io_hook(pg) num_reads++ if uptodate: eb->io_pages = num_reads SetPageUptodate(pg) _______________ for pg in eb: for page 3 in eb: read_extent_buffer_pages if pg is NOT uptodate: btree_readpage_end_io_hook(pg) for pg in eb: __extent_read_full_page(pg) sanity check reports something wrong if pg is uptodate: clear_extent_buffer_uptodate(eb) num_reads++ for pg in eb: eb->io_pages = num_reads ClearPageUptodate(page) _______________ for pg in eb: if pg is NOT uptodate: __extent_read_full_page(pg) So t3's eb->io_pages is not consistent with the number of pages it's reading, and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative number so that we're not able to free the eb. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Takashi Iwai authored
commit f73cd43a upstream. HP Z1 Gen3 AiO with Conexant codec doesn't give an unsolicited event to the headset mic pin upon the jack plugging, it reports only to the headphone pin. It results in the missing mic switching. Let's fix up by simply gating the jack event. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jussi Laako authored
commit 995c6a7f upstream. Sampling rate changes after the first one has been set are not reflected to the hardware, while the driver and ALSA think the rate has been changed. Fix the problem by properly stopping the interface at the beginning of the prepare call, allowing the new rate to be set to the hardware. This keeps the hardware in sync with the driver. Signed-off-by: Jussi Laako <jussi@sonarnerd.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Con Kolivas authored
commit 82ffb6fc upstream. The Logitech QuickCam Communicate Deluxe/S7500 microphone fails with the following warning. [ 6.778995] usb 2-1.2.2.2: Warning! Unlikely big volume range (=3072), cval->res is probably wrong. [ 6.778996] usb 2-1.2.2.2: [5] FU [Mic Capture Volume] ch = 1, val = 4608/7680/1 Adding it to the list of devices in volume_control_quirks makes it work properly, fixing related typo. Signed-off-by: Con Kolivas <kernel@kolivas.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-