Commits · 0b9b241406818a871c6d25390aa487dba966d548 · Kirill Smelkov / linux

12 Dec, 2020 27 commits

inet: frags: batch fqdir destroy works · 0b9b2414

SeongJae Park authored Dec 11, 2020

On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls
make the number of active slab objects including 'sock_inode_cache' type
rapidly and continuously increase.  As a result, memory pressure occurs.

In more detail, I made an artificial reproducer that resembles the
workload that we found the problem and reproduce the problem faster.  It
merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop.  It takes
about 2 minutes.  On 40 CPU cores / 70GB DRAM machine, the available
memory continuously reduced in a fast speed (about 120MB per second,
15GB in total within the 2 minutes).  Note that the issue don't
reproduce on every machine.  On my 6 CPU cores machine, the problem
didn't reproduce.

'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the
relevant memory objects.  They are asynchronously invoked by the work
queues and internally use 'rcu_barrier()' to ensure safe destructions.
'cleanup_net()' works in a batched maneer in a single thread worker,
while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the
'system_wq'.  Therefore, 'fqdir_work_fn()' called frequently under the
workload and made the contention for 'rcu_barrier()' high.  In more
detail, the global mutex, 'rcu_state.barrier_mutex' became the
bottleneck.

This commit avoids such contention by doing the 'rcu_barrier()' and
subsequent lightweight works in a batched manner, as similar to that of
'cleanup_net()'.  The fqdir hashtable destruction, which is done before
the 'rcu_barrier()', is still allowed to run in parallel for fast
processing, but this commit makes it to use a dedicated work queue
instead of the 'system_wq', to make sure that the number of threads is
bounded.
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201211112405.31158-1-sjpark@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0b9b2414

nfc: s3fwrn5: let core configure the interrupt trigger · e0a64d1d

Krzysztof Kozlowski authored Dec 10, 2020

If interrupt trigger is not set when requesting the interrupt, the core
will take care of reading trigger type from Devicetree. There is no
point to do it in the driver.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20201210211824.214949-1-krzk@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e0a64d1d

net: dsa: mt7530: enable MTU normalization · 771c8901

DENG Qingfang authored Dec 11, 2020

MT7530 has a global RX length register, so we are actually changing its
MRU.
Enable MTU normalization for this reason.
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Acked-by: Landen Chao <landen.chao@mediatek.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20201210170322.3433-1-dqfext@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

771c8901

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · e2437ac2

Jakub Kicinski authored Dec 12, 2020

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-12-12

Just one patch this time:

1) Redact the SA keys with kernel lockdown confidentiality.
   If enabled, no secret keys are sent to uuserspace.
   From Antony Antony.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: redact SA secret with lockdown confidentiality
====================

Link: https://lore.kernel.org/r/20201212085737.2101294-1-steffen.klassert@secunet.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e2437ac2

Merge tag 'wireless-drivers-next-2020-12-12' of... · e5795aac

Jakub Kicinski authored Dec 12, 2020

Merge tag 'wireless-drivers-next-2020-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.11

Second set of patches for v5.11. iwlwifi gaining support for the new
6 GHz band and rtw88 got a new channel. Lots of new features for mt76
and ath11k now has working suspend for PCI devices. And as always,
smaller fixes and cleanups all over.

Major changes:

rtw88
 * add support for channel 144

mt76
 * support for more sta interfaces on mt7615/mt7915
 * mt7915 encapsulation offload
 * performance improvements
 * channel noise report on mt7915
 * mt7915 testmode support
 * mt7915 DBDC support

iwlwifi
 * support 6 GHz band

ath11k
 * suspend support for QCA6390 PCI devices
 * support TXOP duration based RTS threshold
 * mesh: add support for 256 bitmap in blockack frames in 11ax

* tag 'wireless-drivers-next-2020-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (197 commits)
  ath11k: implement suspend for QCA6390 PCI devices
  ath11k: hif: add ce irq enable and disable functions
  ath11k: implement WoW enable and wakeup commands
  ath11k: set credit_update flag for flow controlled ep only
  ath11k: dp: stop rx pktlog before suspend
  ath11k: htc: implement suspend handling
  ath11k: htc: remove unused struct ath11k_htc_ops
  ath11k: pci: read select_window register to ensure write is finished
  ath11k: hif: implement suspend and resume functions
  ath11k: mhi: hook suspend and resume
  ath11k: Fix incorrect tlvs in scan start command
  ath11k: pci: disable VDD4BLOW
  ath11k: pci: fix L1ss clock unstable problem
  ath11k: pci: fix hot reset stability issues
  ath11k: put hw to DBS using WMI_PDEV_SET_HW_MODE_CMDID
  ath11k: mhi: print a warning if firmware crashed
  ath11k: use MHI provided APIs to allocate and free MHI controller
  ath10k: add atomic protection for device recovery
  ath10k: add option for chip-id based BDF selection
  mt76: remove unused variable q
  ...
====================

Link: https://lore.kernel.org/r/20201212050839.EF50EC433C6@smtp.codeaurora.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e5795aac

Merge tag 'mac80211-next-for-net-next-2020-12-11' of... · 00f7763a

Jakub Kicinski authored Dec 12, 2020

Merge tag 'mac80211-next-for-net-next-2020-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A new set of wireless changes:
 * validate key indices for key deletion
 * more preamble support in mac80211
 * various 6 GHz scan fixes/improvements
 * a common SAR power limitations API
 * various small fixes & code improvements

* tag 'mac80211-next-for-net-next-2020-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (35 commits)
  mac80211: add ieee80211_set_sar_specs
  nl80211: add common API to configure SAR power limitations
  mac80211: fix a mistake check for rx_stats update
  mac80211: mlme: save ssid info to ieee80211_bss_conf while assoc
  mac80211: Update rate control on channel change
  mac80211: don't filter out beacons once we start CSA
  mac80211: Fix calculation of minimal channel width
  mac80211: ignore country element TX power on 6 GHz
  mac80211: use bitfield helpers for BA session action frames
  mac80211: support Rx timestamp calculation for all preamble types
  mac80211: don't set set TDLS STA bandwidth wider than possible
  mac80211: support driver-based disconnect with reconnect hint
  cfg80211: support immediate reconnect request hint
  mac80211: use struct assignment for he_obss_pd
  cfg80211: remove struct ieee80211_he_bss_color
  nl80211: validate key indexes for cfg80211_registered_device
  cfg80211: include block-tx flag in channel switch started event
  mac80211: disallow band-switch during CSA
  ieee80211: update reduced neighbor report TBTT info length
  cfg80211: Save the regulatory domain when setting custom regulatory
  ...
====================

Link: https://lore.kernel.org/r/20201211142552.209018-1-johannes@sipsolutions.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

00f7763a

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 46d5e62d

Jakub Kicinski authored Dec 11, 2020

xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

46d5e62d

Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git · 7ab25038

Kalle Valo authored Dec 12, 2020

ath.git patches for v5.11. Major changes:

ath11k

* suspend support for QCA6390 PCI devices

* support TXOP duration based RTS threshold

* mesh: add support for 256 bitmap in blockack frames in 11ax

7ab25038

ath11k: implement suspend for QCA6390 PCI devices · d1b0c338

Carl Huang authored Dec 11, 2020

Now that all the needed pieces are in place implement suspend support QCA6390
PCI devices. All other devices will return -EOPNOTSUPP during suspend. The
suspend is implemented by switching the firmware to WoW mode during suspend, so
the firmware will be running on low power mode while host is in suspend.

At the moment we are not able to shutdown and fully power off the device due to
bugs in MHI subsystem, so WoW mode is a workaround for the time being.

During suspend we enable WoW mode, disable CE irq and DP irq, then put MHI to
suspend state. During resume, driver resumes MHI firstly, then enables CE irq
and dp IRQ, and sends WoW wakeup command to firmware.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607708150-21066-11-git-send-email-kvalo@codeaurora.org

d1b0c338

ath11k: hif: add ce irq enable and disable functions · d578ec2a

Carl Huang authored Dec 11, 2020

Add ce irq enable and disable hif layer functions, so core module can enable
enable them without cleaning pipe and refilling pipe. Needed for suspend.

d578ec2a

ath11k: implement WoW enable and wakeup commands · 79802b13

Carl Huang authored Dec 11, 2020

Implement wow enable ane wow wakeup commands which are needed for suspend.

79802b13

ath11k: set credit_update flag for flow controlled ep only · 2151ffde

Carl Huang authored Dec 11, 2020

Firmware will check all the pipes before entering WoW mode during suspend. If
ATH11K_HTC_FLAG_NEED_CREDIT_UPDATE is set, firmware treats this pipe needed to
return credit even though it's actually not required. If any pipe needs to
return credit, the suspend_complete message doesn't send to host but is
dropped. So host gets time out and WoW suspend failed.

2151ffde

ath11k: dp: stop rx pktlog before suspend · 840c36fa

Carl Huang authored Dec 11, 2020

Stop dp rx pktlog when entering suspend and reap the mon_status buffer to keep
it empty. During resume restart the reap timer.

840c36fa

ath11k: htc: implement suspend handling · 8733d835

Carl Huang authored Dec 11, 2020

When ath11k sends suspend command to firmware, firmware will
return suspend_complete events and add handlers for those.

8733d835

ath11k: htc: remove unused struct ath11k_htc_ops · d50370c9

Kalle Valo authored Dec 11, 2020

No need for it so remove. Compile tested only.
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607708150-21066-5-git-send-email-kvalo@codeaurora.org

d50370c9

ath11k: pci: read select_window register to ensure write is finished · f6fa37a4

Carl Huang authored Dec 11, 2020

Just when resume from WoW, the write to select_window doesn't take
effect immediately, so read the register again to ensure the write
operation is finished.

Another change is to reset select_window to ZERO because this
register isn't restored after WoW, so the content of this register
becomes ZERO too.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607708150-21066-4-git-send-email-kvalo@codeaurora.org

f6fa37a4

ath11k: hif: implement suspend and resume functions · fa5917e4

Carl Huang authored Dec 11, 2020

For suspend support add suspend and resume to HIF layer. These ops are optional
and, for example, AHB bus driver does not need to implement these.

fa5917e4

ath11k: mhi: hook suspend and resume · 34fb81e4

Carl Huang authored Dec 11, 2020

MHI suspend and resume isn't hooked in ath11k yet, so hook these
functions needed for suspend support.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607708150-21066-2-git-send-email-kvalo@codeaurora.org

34fb81e4

ath11k: Fix incorrect tlvs in scan start command · f57ad6a9

Pradeep Kumar Chitrapu authored Dec 10, 2020

Currently 6G specific tlvs have duplicate entries which is causing
scan failures. Fix this by removing the duplicate entries of the same
tlv. This also fixes out-of-bound memory writes caused due to
adding tlvs when num_hint_bssid and num_hint_s_ssid are ZEROs.

Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1-01386-QCAHKSWPL_SILICONZ-1

Fixes: 74601ecf ("ath11k: Add support for 6g scan hint")
Reported-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607609124-17250-7-git-send-email-kvalo@codeaurora.org

f57ad6a9

ath11k: pci: disable VDD4BLOW · 0ccdf439

Carl Huang authored Dec 10, 2020

It's recommended to disable VDD4BLOW during initialisation.

0ccdf439

ath11k: pci: fix L1ss clock unstable problem · 06999407

Carl Huang authored Dec 10, 2020

For QCA6390, one PCI related clock drifts sometimes, and
it makes PCI link difficult to quit L1ss. Fix it by writing
some registers which are known to fix the problem.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607609124-17250-5-git-send-email-kvalo@codeaurora.org

06999407

ath11k: pci: fix hot reset stability issues · babb0ced

Carl Huang authored Dec 10, 2020

For QCA6390, host needs to reset some registers before MHI power up to fix PCI
link unstable issue if hot reset happened. Also clear all pending interrupts
during power up.

babb0ced

ath11k: put hw to DBS using WMI_PDEV_SET_HW_MODE_CMDID · 43ed15e1

Carl Huang authored Dec 10, 2020

It's recommended to use wmi command WMI_PDEV_SET_HW_MODE_CMDID to put hardware
to dbs mode instead of wmi_init command. This fixes a few strange stability
issues.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607609124-17250-3-git-send-email-kvalo@codeaurora.org

43ed15e1

ath11k: mhi: print a warning if firmware crashed · fc46e1b2

Kalle Valo authored Dec 10, 2020

There was no way to detect if the firmware crashed so add a warning. At the
moment the firmware is not restarted or anything like that, so when this
happens ath11k modules need to be reloaded.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607609124-17250-2-git-send-email-kvalo@codeaurora.org

fc46e1b2

ath11k: use MHI provided APIs to allocate and free MHI controller · 57449b07

Bhaumik Bhatt authored Nov 17, 2020

Use MHI provided APIs to allocate and free MHI controller to
improve MHI host driver handling. This also fixes a memory leak
as the MHI controller was allocated but never freed.
Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1605634436-36506-1-git-send-email-bbhatt@codeaurora.org

57449b07

ath10k: add atomic protection for device recovery · 5dadbe4e

Wen Gong authored Sep 08, 2020

When it has more than one restart_work queued meanwhile, the 2nd
restart_work is very easy to break the 1st restart work and lead
recovery fail.

Add a flag to allow only one restart work running untill
device successfully recovered.

It already has flag ATH10K_FLAG_CRASH_FLUSH, but it can not use this
flag again, because it is clear in ath10k_core_start. The function
ieee80211_reconfig(called by ieee80211_restart_work) of mac80211 do
many things and drv_start(call to ath10k_core_start) is 1st thing,
when drv_start complete, it does not mean restart complete. So it
add new flag and clear it in ath10k_reconfig_complete, because it
is the last thing called from drv_reconfig_complete of function
ieee80211_reconfig, after it, the restart process finished.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/010101746bead6a0-d5e97c66-dedd-4b92-810e-c2e4840fafc9-000000@us-west-2.amazonses.com

5dadbe4e

ath10k: add option for chip-id based BDF selection · 2bc2b87b

Abhishek Kumar authored Dec 09, 2020

In some devices difference in chip-id should be enough to pick
the right BDF. Add another support for chip-id based BDF selection.
With this new option, ath10k supports 2 fallback options.

The board name with chip-id as option looks as follows
board name 'bus=snoc,qmi-board-id=ff,qmi-chip-id=320'

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
Tested-on: QCA6174 HW3.2 WLAN.RM.4.4.1-00157-QCARMSWPZ-1
Signed-off-by: Abhishek Kumar <kuabhs@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201207231824.v3.1.Ia6b95087ca566f77423f3802a78b946f7b593ff5@changeid

2bc2b87b

11 Dec, 2020 13 commits

Merge tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 7f376f19

Linus Torvalds authored Dec 11, 2020

Pull mtd fixes from Miquel Raynal:
"Second series of fixes for raw NAND drivers initiated because of a
rework of the ECC engine subsystem.

The location of the DT parsing logic got moved, breaking several
drivers which in fact were not doing the ECC engine initialization at
the right place.

These drivers have been fixed by enforcing a particular ECC engine
type and algorithm, software Hamming, while the algorithm may be
overwritten by a DT property. This merge request fixes this in the
xway, socrates, plat_nand, pasemi, orion, mpc5121, gpio, au1550 and
ams-delta controller drivers"

* tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: xway: Do not force a particular software ECC engine
mtd: rawnand: socrates: Do not force a particular software ECC engine
mtd: rawnand: plat_nand: Do not force a particular software ECC engine
mtd: rawnand: pasemi: Do not force a particular software ECC engine
mtd: rawnand: orion: Do not force a particular software ECC engine
mtd: rawnand: mpc5121: Do not force a particular software ECC engine
mtd: rawnand: gpio: Do not force a particular software ECC engine
mtd: rawnand: au1550: Do not force a particular software ECC engine
mtd: rawnand: ams-delta: Do not force a particular software ECC engine

7f376f19

Merge tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 1de5d12b

Linus Torvalds authored Dec 11, 2020

Pull MMC fixes from Ulf Hansson:
 "A couple of MMC fixes:

  MMC core:
   - Fixup condition for CMD13 polling for RPMB requests

  MMC host:
   - mtk-sd: Fix system suspend/resume support for CQHCI
   - mtd-sd: Extend SDIO IRQ fix to more variants
   - sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
   - tmio: Bring HW to a sane state after a power off"

* tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: mediatek: mark PM functions as __maybe_unused
  mmc: block: Fixup condition for CMD13 polling for RPMB requests
  mmc: tmio: improve bringing HW to a sane state with MMC_POWER_OFF
  mmc: sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
  mmc: mediatek: Extend recheck_sdio_irq fix to more variants
  mmc: mediatek: Fix system suspend/resume support for CQHCI

1de5d12b

Merge tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · 782598ec

Linus Torvalds authored Dec 11, 2020

Pull zonefs fix from Damien Le Moal:
 "A single patch in this pull request to fix a BIO and page reference
  leak when writing sequential zone files"

* tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: fix page reference and BIO leak

782598ec

bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers · b7906b70

Andrii Nakryiko authored Dec 11, 2020

Remove bpf_ prefix, which causes these helpers to be reported in verifier
dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets
fix it as long as it is still possible before UAPI freezes on these helpers.

Fixes: eaa6bcb7 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b7906b70

Merge branch 'akpm' (patches from Andrew) · a06caa4a

Linus Torvalds authored Dec 11, 2020

Merge misc fixes from Andrew Morton:
 "8 patches.

  Subsystems affected by this patch series: proc, selftests, kbuild, and
  mm (pagecache, kasan, hugetlb)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/hugetlb: clear compound_nr before freeing gigantic pages
  kasan: fix object remaining in offline per-cpu quarantine
  elfcore: fix building with clang
  initramfs: fix clang build failure
  kbuild: avoid static_assert for genksyms
  selftest/fpu: avoid clang warning
  proc: use untagged_addr() for pagemap_read addresses
  revert "mm/filemap: add static for function __add_to_page_cache_locked"

a06caa4a

mm/hugetlb: clear compound_nr before freeing gigantic pages · ba9c1201

Gerald Schaefer authored Dec 11, 2020

Commit 1378a5ee ("mm: store compound_nr as well as compound_order")
added compound_nr counter to first tail struct page, overlaying with
page->mapping.  The overlay itself is fine, but while freeing gigantic
hugepages via free_contig_range(), a "bad page" check will trigger for
non-NULL page->mapping on the first tail page:

  BUG: Bad page state in process bash  pfn:380001
  page:00000000c35f0856 refcount:0 mapcount:0 mapping:00000000126b68aa index:0x0 pfn:0x380001
  aops:0x0
  flags: 0x3ffff00000000000()
  raw: 3ffff00000000000 0000000000000100 0000000000000122 0000000100000000
  raw: 0000000000000000 0000000000000000 ffffffff00000000 0000000000000000
  page dumped because: non-NULL mapping
  Modules linked in:
  CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-20201208 #1
  Hardware name: IBM 3906 M03 703 (LPAR)
  Call Trace:
    show_stack+0x6e/0xe8
    dump_stack+0x90/0xc8
    bad_page+0xd6/0x130
    free_pcppages_bulk+0x26a/0x800
    free_unref_page+0x6e/0x90
    free_contig_range+0x94/0xe8
    update_and_free_page+0x1c4/0x2c8
    free_pool_huge_page+0x11e/0x138
    set_max_huge_pages+0x228/0x300
    nr_hugepages_store_common+0xb8/0x130
    kernfs_fop_write+0xd2/0x218
    vfs_write+0xb0/0x2b8
    ksys_write+0xac/0xe0
    system_call+0xe6/0x288
  Disabling lock debugging due to kernel taint

This is because only the compound_order is cleared in
destroy_compound_gigantic_page(), and compound_nr is set to
1U << order == 1 for order 0 in set_compound_order(page, 0).

Fix this by explicitly clearing compound_nr for first tail page after
calling set_compound_order(page, 0).

Link: https://lkml.kernel.org/r/20201208182813.66391-2-gerald.schaefer@linux.ibm.com
Fixes: 1378a5ee ("mm: store compound_nr as well as compound_order")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ba9c1201

kasan: fix object remaining in offline per-cpu quarantine · 6c82d45c

Kuan-Ying Lee authored Dec 11, 2020

We hit this issue in our internal test.  When enabling generic kasan, a
kfree()'d object is put into per-cpu quarantine first.  If the cpu goes
offline, object still remains in the per-cpu quarantine.  If we call
kmem_cache_destroy() now, slub will report "Objects remaining" error.

  =============================================================================
  BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  Disabling lock debugging due to kernel taint
  INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
  CPU: 3 PID: 176 Comm: cat Tainted: G    B             5.10.0-rc1-00007-g4525c878-dirty #10
  Hardware name: linux,dummy-virt (DT)
  Call trace:
     dump_backtrace+0x0/0x2b0
     show_stack+0x18/0x68
     dump_stack+0xfc/0x168
     slab_err+0xac/0xd4
     __kmem_cache_shutdown+0x1e4/0x3c8
     kmem_cache_destroy+0x68/0x130
     test_version_show+0x84/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
     do_el0_svc+0x38/0xa0
     el0_sync_handler+0x170/0x178
     el0_sync+0x174/0x180
  INFO: Object 0x(____ptrval____) @offset=15848
  INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
     stack_trace_save+0x9c/0xd0
     set_track+0x64/0xf0
     alloc_debug_processing+0x104/0x1a0
     ___slab_alloc+0x628/0x648
     __slab_alloc.isra.0+0x2c/0x58
     kmem_cache_alloc+0x560/0x588
     test_version_show+0x98/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
  kmem_cache_destroy test_module_slab: Slab cache still has objects

Register a cpu hotplug function to remove all objects in the offline
per-cpu quarantine when cpu is going offline.  Set a per-cpu variable to
indicate this cpu is offline.

[qiang.zhang@windriver.com: fix slab double free when cpu-hotplug]
  Link: https://lkml.kernel.org/r/20201204102206.20237-1-qiang.zhang@windriver.com

Link: https://lkml.kernel.org/r/1606895585-17382-2-git-send-email-Kuan-Ying.Lee@mediatek.comSigned-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Guangye Yang <guangye.yang@mediatek.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Qian Cai <qcai@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6c82d45c

elfcore: fix building with clang · 6e7b64b9

Arnd Bergmann authored Dec 11, 2020

kernel/elfcore.c only contains weak symbols, which triggers a bug with
clang in combination with recordmcount:

  Cannot find symbol for section 2: .text.
  kernel/elfcore.o: failed

Move the empty stubs into linux/elfcore.h as inline functions.  As only
two architectures use these, just use the architecture specific Kconfig
symbols to key off the declaration.

Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.orgSigned-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Barret Rhoden <brho@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6e7b64b9

initramfs: fix clang build failure · 55d5b7dd

Arnd Bergmann authored Dec 11, 2020

There is only one function in init/initramfs.c that is in the .text
section, and it is marked __weak.  When building with clang-12 and the
integrated assembler, this leads to a bug with recordmcount:

  ./scripts/recordmcount  "init/initramfs.o"
  Cannot find symbol for section 2: .text.
  init/initramfs.o: failed

I'm not quite sure what exactly goes wrong, but I notice that this
function is only ever called from an __init function, and normally
inlined.  Marking it __init as well is clearly correct and it leads to
recordmcount no longer complaining.

Link: https://lkml.kernel.org/r/20201204165742.3815221-1-arnd@kernel.orgSigned-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Barret Rhoden <brho@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

55d5b7dd

kbuild: avoid static_assert for genksyms · 14dc3983

Arnd Bergmann authored Dec 11, 2020

genksyms does not know or care about the _Static_assert() built-in, and
sometimes falls back to ignoring the later symbols, which causes
undefined behavior such as

WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned.
ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object
net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation

Redefine static_assert for genksyms to avoid that.

Link: https://lkml.kernel.org/r/20201203230955.1482058-1-arnd@kernel.orgSigned-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14dc3983

selftest/fpu: avoid clang warning · 84edc2ef

Arnd Bergmann authored Dec 11, 2020

With extra warnings enabled, clang complains about the redundant
-mhard-float argument:

  clang: error: argument unused during compilation: '-mhard-float' [-Werror,-Wunused-command-line-argument]

Move this into the gcc-only part of the Makefile.

Link: https://lkml.kernel.org/r/20201203223652.1320700-1-arnd@kernel.org
Fixes: 4185b3b9 ("selftests/fpu: Add an FPU selftest")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Petteri Aimonen <jpa@git.mail.kapsi.fi>
Cc: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

84edc2ef

proc: use untagged_addr() for pagemap_read addresses · 40d6366e

Miles Chen authored Dec 11, 2020

When we try to visit the pagemap of a tagged userspace pointer, we find
that the start_vaddr is not correct because of the tag.
To fix it, we should untag the userspace pointers in pagemap_read().

I tested with 5.10-rc4 and the issue remains.

Explanation from Catalin in [1]:

 "Arguably, that's a user-space bug since tagged file offsets were never
  supported. In this case it's not even a tag at bit 56 as per the arm64
  tagged address ABI but rather down to bit 47. You could say that the
  problem is caused by the C library (malloc()) or whoever created the
  tagged vaddr and passed it to this function. It's not a kernel
  regression as we've never supported it.

  Now, pagemap is a special case where the offset is usually not
  generated as a classic file offset but rather derived by shifting a
  user virtual address. I guess we can make a concession for pagemap
  (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"

My test code is based on [2]:

A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8

userspace program:

  uint64 OsLayer::VirtualToPhysical(void *vaddr) {
	uint64 frame, paddr, pfnmask, pagemask;
	int pagesize = sysconf(_SC_PAGESIZE);
	off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
	int fd = open(kPagemapPath, O_RDONLY);
	...

	if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
		int err = errno;
		string errtxt = ErrorString(err);
		if (fd >= 0)
			close(fd);
		return 0;
	}
  ...
  }

kernel fs/proc/task_mmu.c:

  static ssize_t pagemap_read(struct file *file, char __user *buf,
		size_t count, loff_t *ppos)
  {
	...
	src = *ppos;
	svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
	start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
	end_vaddr = mm->task_size;

	/* watch out for wraparound */
	// svpfn == 0xb400007662f54
	// (mm->task_size >> PAGE) == 0x8000000
	if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
		start_vaddr = end_vaddr;

	ret = 0;
	while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr
		int len;
		unsigned long end;
		...
	}
	...
  }

[1] https://lore.kernel.org/patchwork/patch/1343258/
[2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158

Link: https://lkml.kernel.org/r/20201204024347.8295-1-miles.chen@mediatek.comSigned-off-by: Miles Chen <miles.chen@mediatek.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
Cc: <stable@vger.kernel.org>	[5.4-]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

40d6366e

revert "mm/filemap: add static for function __add_to_page_cache_locked" · 16c0cc0c

Andrew Morton authored Dec 11, 2020

Revert commit 3351b16a ("mm/filemap: add static for function
__add_to_page_cache_locked") due to incompatibility with
ALLOW_ERROR_INJECTION which result in build errors.

Link: https://lkml.kernel.org/r/CAADnVQJ6tmzBXvtroBuEH6QA0H+q7yaSKxrVvVxhqr3KBZdEXg@mail.gmail.comTested-by: Justin Forbes <jmforbes@linuxtx.org>
Tested-by: Greg Thelen <gthelen@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16c0cc0c