- 29 Aug, 2017 12 commits
-
-
Himanshu Jha authored
calling memcpy immediately after memset with the same region of memory makes memset redundant. Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
When VBUS is not discovered within PD_T_PS_SOURCE_ON although Rp is detected on CC, TCPM switches the port to SNK_UNATTACHED state. SNK_UNATTACHED, however does not force TYPEC_CC_OPEN which makes the partner(source) to think that it is connected. To overcome this issue, force the port into PORT_RESET state to make sure the CC lines are open. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
PING messages are used to monitor the connect/disconnect. However, when PD is carried over CC, so this is not required. Also, the spec does not clearly say if PD is possible when Type-c is connected to Type-A/B. So, removing sending PING messages altogether. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
Once, Rp or Rd is switched, wait for PD_T_CC_DEBOUNCE. If not the PS_RDY message transmitted might result in failure. Also, Only wait for PD_T_SRCSWAPSTDBY while in PR_SWAP_SRC_SNK_TRANSITION_OFF. PD_T_PS_SOURCE_OFF is the overall time after which the initial sink would issue hard reset. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
In the case that the lower layer driver reports a cc change directly from SINK state to SOURCE state, TCPM doesn't handle these cc change in SRC_SEND_CAPABILITIES, SRC_READY states. And with SRC_ATTACHED state, the change is not handled as the port is still considered connected. [49606.131672] state change DRP_TOGGLING -> SRC_ATTACH_WAIT [49606.131701] pending state change SRC_ATTACH_WAIT -> SRC_ATTACHED @ 200 ms [49606.329952] state change SRC_ATTACH_WAIT -> SRC_ATTACHED [delayed 200 ms] [49606.329978] polarity 0 [49606.329989] Requesting mux mode 1, config 0, polarity 0 [49606.349416] vbus:=1 charge=0 [49606.372274] pending state change SRC_ATTACHED -> SRC_UNATTACHED @ 480 ms [49606.372431] VBUS on [49606.372488] state change SRC_ATTACHED -> SRC_STARTUP ... (the lower layer driver reports a direct change from source to sink) [49606.536927] pending state change SRC_SEND_CAPABILITIES -> SRC_SEND_CAPABILITIES @ 150 ms [49606.547244] CC1: 2 -> 5, CC2: 0 -> 0 [state SRC_SEND_CAPABILITIES, polarity 0, connected] This can happen when the lower layer driver and/or the hardware handles a portion of the Type-C state machine work, and quietly goes through the unattached state. Originally-from: Yueyao Zhu <yueyao@google.com> Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
While performing PORT_RESET, upon receiving the cc disconnect signal from the underlaying tcpc device, TCPM transitions into unattached state. Consider the current type of port while determining the unattached state. In the below logs, although the port_type was set to sink, TCPM transitioned into SRC_UNATTACHED. [ 762.290654] state change SRC_READY -> PORT_RESET [ 762.324531] Setting voltage/current limit 0 mV 0 mA [ 762.327912] polarity 0 [ 762.334864] cc:=0 [ 762.347193] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [ 762.347200] VBUS off [ 762.347203] CC1: 2 -> 0, CC2: 0 -> 0 [state PORT_RESET, polarity 0, disconnected] [ 762.347206] state change PORT_RESET -> SRC_UNATTACHED Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
According to the spec: "4.5.2.2.10.2 Exiting from TryWait.SNK State The port shall transition to Attached.SNK after tCCDebounce if or when VBUS is detected. Note the Source may initiate USB PD communications which will cause brief periods of the SNK.Open state on both the CC1 and CC2 pins, but this event will not exceed tPDDebounce. The port shall transition to Unattached.SNK when the state of both of the CC1 and CC2 pins is SNK.Open for at least tPDDebounce." Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
According to spec: " 4.5.2.2.9.2 Exiting from Try.SRC State: The port shall transition to Attached.SRC when the SRC.Rd state is detected on exactly one of the CC1 or CC2 pins for at least tPDDebounce. The port shall transition to TryWait.SNK after tDRPTry and the SRC.Rd state has not been detected." Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
According the spec, the following is the conditions for exiting Try.SNK state: "The port shall wait for tDRPTry and only then begin monitoring the CC1 and CC2 pins for the SNK.Rp state. The port shall then transition to Attached.SNK when the SNK.Rp state is detected on exactly one of the CC1 or CC2 pins for at least tPDDebounce and V BUS is detected. Alternatively, the port shall transition to TryWait.SRC if SNK.Rp state is not detected for tPDDebounce." Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
According to the spec the following is the condition for exiting TryWait.SRC: "The port shall transition to Attached.SRC when V BUS is at vSafe0V and the SRC.Rd state is detected on exactly one of the CC pins for at least tCCDebounce. The port shall transition to Unattached.SNK after tDRPTry if neither of the CC1 or CC2 pins are in the SRC.Rd state" TCPM at present keeps re-entering the SRC_TRYWAIT and keeps restarting tDRPTry if the CC presents Rp and disconnects within tCCDebounce. For example: [ 447.164308] pending state change SRC_TRYWAIT -> SRC_ATTACHED @ 200 ms [ 447.164386] CC1: 2 -> 0, CC2: 0 -> 0 [state SRC_TRYWAIT, polarity 0, disconnected] [ 447.164406] state change SRC_TRYWAIT -> SRC_TRYWAIT [ 447.164573] cc:=3 [ 447.191408] pending state change SRC_TRYWAIT -> SRC_TRYWAIT_UNATTACHED @ 100 ms [ 447.191478] CC1: 0 -> 0, CC2: 0 -> 0 [state SRC_TRYWAIT, polarity 0, disconnected] [ 447.207261] CC1: 0 -> 2, CC2: 0 -> 0 [state SRC_TRYWAIT, polarity 0, connected] [ 447.207306] state change SRC_TRYWAIT -> SRC_TRYWAIT [ 447.207485] cc:=3 [ 447.237283] pending state change SRC_TRYWAIT -> SRC_ATTACHED @ 200 ms [ 447.237357] CC1: 2 -> 0, CC2: 0 -> 0 [state SRC_TRYWAIT, polarity 0, disconnected] [ 447.237379] state change SRC_TRYWAIT -> SRC_TRYWAIT [ 447.237532] cc:=3 [ 447.263219] pending state change SRC_TRYWAIT -> SRC_TRYWAIT_UNATTACHED @ 100 ms [ 447.263289] CC1: 0 -> 0, CC2: 0 -> 0 [state SRC_TRYWAIT, polarity 0, disconnected] [ 447.280926] CC1: 0 -> 2, CC2: 0 -> 0 [state SRC_TRYWAIT, polarity 0, connected] [ 447.280970] state change SRC_TRYWAIT -> SRC_TRYWAIT [ 447.281158] cc:=3 [ 447.307767] pending state change SRC_TRYWAIT -> SRC_ATTACHED @ 200 ms [ 447.307838] CC1: 2 -> 0, CC2: 0 -> 0 [state SRC_TRYWAIT, polarity 0, disconnected] [ 447.307858] state change SRC_TRYWAIT -> SRC_TRYWAIT In TCPM, tDRPTry is set tp 100ms (min 75ms and max 150ms) and tCCdebounce is set to 200ms (min 100ms and max 200ms). To overcome the issue, record the time at which the port enters TryWait.SRC(SRC_TRYWAIT) and re-enter SRC_TRYWAIT only when CC keeps debouncing within tDRPTry. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
Enable Try.SRC or Try.SNK only when port_type is DRP. Try.SRC or Try.SNK state machines are not valid for SRC only or SNK only ports. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Badhri Jagan Sridharan authored
The port type callback call enquires the tcpc_dev if the requested port type is supported. If supported, then performs a tcpm reset if required after setting the tcpm internal port_type variable. Check against the tcpm port_type instead of checking against caps.type as port_type reflects the current configuration. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 28 Aug, 2017 26 commits
-
-
Okash Khawaja authored
Since we have tty_kopen, we no longer need to export tty_open_by_driver. This patch makes this function static. Signed-off-by: Okash Khawaja <okash.khawaja@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Okash Khawaja authored
This patch replaces call to tty_open_by_driver with a tty_kopen and uses tty_kclose instead of tty_release_struct to close it. Signed-off-by: Okash Khawaja <okash.khawaja@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Okash Khawaja authored
The commit 12e84c71 ("tty: export tty_open_by_driver") exports tty_open_by_device to allow tty to be opened from inside kernel which works fine except that it doesn't handle contention with user space or another kernel-space open of the same tty. For example, opening a tty from user space while it is kernel opened results in failure and a kernel log message about mismatch between tty->count and tty's file open count. This patch makes kernel access to tty exclusive, so that if a user process or kernel opens a kernel opened tty, it gets -EBUSY. It does this by adding TTY_KOPENED flag to tty->flags. When this flag is set, tty_open_by_driver returns -EBUSY. Instead of overloading tty_open_by_driver for both kernel and user space, this patch creates a separate function tty_kopen which closely follows tty_open_by_driver. tty_kclose closes the tty opened by tty_kopen. To address the mismatch between tty->count and #fd's, this patch adds #kopen's to the count before comparing it with tty->count. That way check_tty_count reflects correct usage count. Returning -EBUSY on tty open is a change in the interface. I have tested this with minicom, picocom and commands like "echo foo > /dev/ttyS0". They all correctly report "Device or resource busy" when the tty is already kernel opened. Signed-off-by: Okash Khawaja <okash.khawaja@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
We want the staging and iio fixes in here to handle the merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bhumika Goyal authored
Make this const as it is only used in a copy operation. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans de Goede authored
This results in a nice cleanup, and fixes link errors when fbdev support is disabled. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joe Perches authored
Adding __printf verification can help avoid format/argument mismatches. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Briskin authored
1. Remove module_init()/module_exit() macroes and visorbus_register_visor_driver/visorbus_unregister_visor_driver functions. 2. Replace with a short module_driver macro Signed-off-by: Alex Briskin <br.shurik@gmail.com> Acked-by: David Kershner <david.kershner@unisys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
"hdr" can't be NULL. We take skb->data which is non-NULL and add an offset to get "hdr". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Cihangir Akturk authored
Remove "struct pt_regs *" parameter from interrupt handlers, since it is no longer passed to interrupt handlers. Also, convert return types to irqreturn_t. Additionally, move DIO_irq_handler variable into the setup_GPIO function, as it's not used outside of this function. Signed-off-by: Cihangir Akturk <cakturk@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shurong Zhang authored
This printk doesn't really add anything worthwhile. Signed-off-by: Shurong Zhang <zhang_shurong@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Colin Ian King authored
The call to _rtl_dbg_trace via macro HALMAC_RT_TRACE will trigger a null pointer deference on the null driver_adapter. Fix this by assigning driver_adapter earlier to halmac_adapter->driver_adapter before the tracing call so that a non-null driver_adapter is passed instead. Detected by CoverityScan, CID#1454613 ("Explicit null dereferenced") Fixes: 938a0447 ("staging: r8822be: Add code for halmac sub-driver") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Colin Ian King authored
A memory leak of eeprom_map occurs if the call to halmac_eeprom_parser_88xx fails. Fix this by kfree'ing it before returning. Detected by CoverityScan, CID#1454569 ("Resource leak") Fixes: 938a0447 ("staging: r8822be: Add code for halmac sub-driver") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
The obd_init_checks() function can either return -EOVERFLOW or -EINVAL but we accidentally ignore -EINVAL returns. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
The copy_from_user() function returns the number of bytes which we weren't able to copy. We don't want to return that to the user but instead we want to return -EFAULT. Fixes: d7e09d03 ("staging: add Lustre file system client support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
We recently changed from using obd_ioctl_popdata() to calling copy_to_user() directly. This if statement was supposed to be deleted but it was over looked. "err" is zero at this point so it means we return success. Fixes: b03679f6 ("staging: lustre: uapi: remove obd_ioctl_popdata() wrapper") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Himanshu Jha authored
calling memcpy immediately after memset with the same region of memory makes memset redundant. Build successfully. Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Larry Finger authored
The changes in this commit are also being sent to the main rtlwifi drivers in wireless-next; however, these changes will also be useful for any debugging of r8822be before it gets moved into the main tree. Use debugfs to dump register and btcoex status, and also write registers and h2c. We create topdir in /sys/kernel/debug/rtlwifi/, and use the MAC address as subdirectory with several entries to dump mac_reg, bb_reg, rf_reg etc. An example is /sys/kernel/debug/rtlwifi/00-11-22-33-44-55-66/mac_0 This change permits examination of device registers in a dynamic manner, a feature not available with the current debug mechanism. We use seq_file to replace RT_TRACE to dump status, then we can use 'cat' to access btcoex's status through debugfs. (i.e. /sys/kernel/debug/rtlwifi/00-11-22-33-44-55-66/btcoex) Other related changes are 1. implement btc_disp_dbg_msg() to access btcoex's common status. 2. remove obsolete field bt_exist Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Yan-Hsuan Chuang <yhchuang@realtek.com> Cc: Birming Chiu <birming@realtek.com> Cc: Shaofu <shaofu@realtek.com> Cc: Steven Ting <steventing@realtek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
Smatch is distrustful of the "capab" value and marks it as user controlled. I think it actually comes from the firmware? Anyway, I looked at other drivers and they added a bounds check and it seems like a harmless thing to have so I have added it here as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Janani Sankara Babu authored
This patch is created to solve the following coding style issue reported by the checkpatch script. CHECK: spaces preffered around that '&' (ctx:VxV) Signed-off-by: Janani Sankara Babu <jananis37@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Janani Sankara Babu authored
This patch solves the following warning shown by the checkpatch script WARNING: Comparisons should place the constants on the right side of the test Signed-off-by: Janani Sankara Babu <jananis37@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull IOMMU fix from Joerg Roedel: "Another fix, this time in common IOMMU sysfs code. In the conversion from the old iommu sysfs-code to the iommu_device_register interface, I missed to update the release path for the struct device associated with an IOMMU. It freed the 'struct device', which was a pointer before, but is now embedded in another struct. Freeing from the middle of allocated memory had all kinds of nasty side effects when an IOMMU was unplugged. Unfortunatly nobody unplugged and IOMMU until now, so this was not discovered earlier. The fix is to make the 'struct device' a pointer again" * tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Fix wrong freeing of iommu_device->dev
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull char/misc fix from Greg KH: "Here is a single misc driver fix for 4.13-rc7. It resolves a reported problem in the Android binder driver due to previous patches in 4.13-rc. It's been in linux-next with no reported issues" * tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: ANDROID: binder: fix proc->tsk check.
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull staging/iio fixes from Greg KH: "Here are few small staging driver fixes, and some more IIO driver fixes for 4.13-rc7. Nothing major, just resolutions for some reported problems. All of these have been in linux-next with no reported problems" * tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: magnetometer: st_magn: remove ihl property for LSM303AGR iio: magnetometer: st_magn: fix status register address for LSM303AGR iio: hid-sensor-trigger: Fix the race with user space powering up sensors iio: trigger: stm32-timer: fix get trigger mode iio: imu: adis16480: Fix acceleration scale factor for adis16480 PATCH] iio: Fix some documentation warnings staging: rtl8188eu: add RNX-N150NUB support Revert "staging: fsl-mc: be consistent when checking strcmp() return" iio: adc: stm32: fix common clock rate iio: adc: ina219: Avoid underflow for sleeping time iio: trigger: stm32-timer: add enable attribute iio: trigger: stm32-timer: fix get/set down count direction iio: trigger: stm32-timer: fix write_raw return value iio: trigger: stm32-timer: fix quadrature mode get routine iio: bmp280: properly initialize device for humidity reading
-
git://github.com/jonmason/ntbLinus Torvalds authored
Pull NTB fixes from Jon Mason: "NTB bug fixes to address an incorrect ntb_mw_count reference in the NTB transport, improperly bringing down the link if SPADs are corrupted, and an out-of-order issue regarding link negotiation and data passing" * tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb: ntb: ntb_test: ensure the link is up before trying to configure the mws ntb: transport shouldn't disable link due to bogus values in SPADs ntb: use correct mw_count function in ntb_tool and ntb_transport
-
- 27 Aug, 2017 2 commits
-
-
Linus Torvalds authored
The "lock_page_killable()" function waits for exclusive access to the page lock bit using the WQ_FLAG_EXCLUSIVE bit in the waitqueue entry set. That means that if it gets woken up, other waiters may have been skipped. That, in turn, means that if it sees the page being unlocked, it *must* take that lock and return success, even if a lethal signal is also pending. So instead of checking for lethal signals first, we need to check for them after we've checked the actual bit that we were waiting for. Even if that might then delay the killing of the process. This matches the order of the old "wait_on_bit_lock()" infrastructure that the page locking used to use (and is still used in a few other areas). Note that if we still return an error after having unsuccessfully tried to acquire the page lock, that is ok: that means that some other thread was able to get ahead of us and lock the page, and when that other thread then unlocks the page, the wakeup event will be repeated. So any other pending waiters will now get properly woken up. Fixes: 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit") Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jan Kara <jack@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Tim Chen and Kan Liang have been battling a customer load that shows extremely long page wakeup lists. The cause seems to be constant NUMA migration of a hot page that is shared across a lot of threads, but the actual root cause for the exact behavior has not been found. Tim has a patch that batches the wait list traversal at wakeup time, so that we at least don't get long uninterruptible cases where we traverse and wake up thousands of processes and get nasty latency spikes. That is likely 4.14 material, but we're still discussing the page waitqueue specific parts of it. In the meantime, I've tried to look at making the page wait queues less expensive, and failing miserably. If you have thousands of threads waiting for the same page, it will be painful. We'll need to try to figure out the NUMA balancing issue some day, in addition to avoiding the excessive spinlock hold times. That said, having tried to rewrite the page wait queues, I can at least fix up some of the braindamage in the current situation. In particular: (a) we don't want to continue walking the page wait list if the bit we're waiting for already got set again (which seems to be one of the patterns of the bad load). That makes no progress and just causes pointless cache pollution chasing the pointers. (b) we don't want to put the non-locking waiters always on the front of the queue, and the locking waiters always on the back. Not only is that unfair, it means that we wake up thousands of reading threads that will just end up being blocked by the writer later anyway. Also add a comment about the layout of 'struct wait_page_key' - there is an external user of it in the cachefiles code that means that it has to match the layout of 'struct wait_bit_key' in the two first members. It so happens to match, because 'struct page *' and 'unsigned long *' end up having the same values simply because the page flags are the first member in struct page. Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Christopher Lameter <cl@linux.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-