Commits · 8adc0a06d68a2e433b960377e515e7a6b19b429f · Kirill Smelkov / linux

21 Jan, 2021 11 commits

perf script: Fix overrun issue for dynamically-allocated PMU type number · 8adc0a06

Jin Yao authored Dec 09, 2020

When unpacking the event which is from dynamic PMU, the array
output[OUTPUT_TYPE_MAX] may be overrun. For example, type number of SKL
uncore_imc is 10, but OUTPUT_TYPE_MAX is 7 now (OUTPUT_TYPE_MAX =
PERF_TYPE_MAX + 1).

/* In builtin-script.c */

process_event()
{
        unsigned int type = output_type(attr->type);

        if (output[type].fields == 0)
                return;
}

output[10] is overrun.

Create a type OUTPUT_TYPE_OTHER for dynamic PMU events, then
output_type(attr->type) will return OUTPUT_TYPE_OTHER here.

Note that if PERF_TYPE_MAX ever changed, then there would be a conflict
between old perf.data files that had a dynamicaliy allocated PMU number
that would then be the same as a fixed PERF_TYPE.

Example:

  # perf record --switch-events -C 0 -e "{cpu-clock,uncore_imc/data_reads/,uncore_imc/data_writes/}:SD" -a -- sleep 1
  # perf script

  Before:
         swapper     0 [000] 1479253.987551:     277766               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.987797:     246709               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988127:     329883               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988273:     146393               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988523:     249977               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988877:     354090               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.989023:     145940               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.989383:     359856               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.989523:     140082               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])

  After:
         swapper     0 [000] 1397040.402011:     272384               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1397040.402011:       5396  uncore_imc/data_reads/:
         swapper     0 [000] 1397040.402011:        967 uncore_imc/data_writes/:
         swapper     0 [000] 1397040.402259:     249153               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1397040.402259:       7231  uncore_imc/data_reads/:
         swapper     0 [000] 1397040.402259:       1297 uncore_imc/data_writes/:
         swapper     0 [000] 1397040.402508:     249108               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1397040.402508:       5333  uncore_imc/data_reads/:
         swapper     0 [000] 1397040.402508:       1008 uncore_imc/data_writes/:
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20201209005828.21302-1-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8adc0a06

perf metricgroup: Fix system PMU metrics · 3d6e79ee

John Garry authored Jan 19, 2021

Joakim reports that getting "perf stat" for multiple system PMU metrics
segfaults:

  $ perf stat -a -I 1000 -M imx8mm_ddr_write.all,imx8mm_ddr_write.all
  Segmentation fault
  $

While the same works without issue for a single metric.

The logic in metricgroup__add_metric_sys_event_iter() is broken, in that
add_metric() @M argument should be NULL for each new metric. Fix by not
passing a holder for that, and rather make local in
metricgroup__add_metric_sys_event_iter().

Fixes: be335ec2 ("perf metricgroup: Support adding metrics for system PMUs")
Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611050655-44020-1-git-send-email-john.garry@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

3d6e79ee

perf metricgroup: Fix for metrics containing duration_time · 9c880c24

John Garry authored Jan 21, 2021

Metrics containing duration_time cause a segfault:

  $ perf stat -v -M L1D_Cache_Fill_BW sleep 1
  Using CPUID GenuineIntel-6-3D-4
  metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
  found event duration_time
  found event l1d.replacement
  adding {l1d.replacement}:W,duration_time
  l1d.replacement -> cpu/umask=0x1,(null)=0x1e8483,event=0x51/
  Segmentation fault
  $

In commit c2337d67 ("perf metricgroup: Fix metrics using aliases
covering multiple PMUs"), the logic in find_evsel_group() when iter'ing
events was changed to not only select events in same group, but also for
aliased PMUs.

Checking whether events were for aliased PMUs was done by comparing the
event PMU name. This was not safe for duration_time event, which has no
associated PMU (and no PMU name), so fix by checking if the event PMU name
is set also.

Committer testing:

Reproduced the bug, then, on a:

  $ grep -m1 ^'model name' /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  $

We now get:

  $ perf stat -M L1D_Cache_Fill_BW sleep 1

   Performance counter stats for 'sleep 1':

               4,141      l1d.replacement:u
       1,001,285,107 ns   duration_time:u

         1.001285107 seconds time elapsed

         0.000000000 seconds user
         0.001119000 seconds sys

  $

Detais from -v:

  Using CPUID GenuineIntel-6-8E-A
  metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
  found event duration_time
  found event l1d.replacement
  adding {l1d.replacement}:W,duration_time
  l1d.replacement -> cpu/(null)=0x1e8483,umask=0x1,event=0x51/
  Control descriptor is not initialized
  Warning:
  kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples
  Warning:
  kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples
  l1d.replacement:u: 4592 612201 612201
  duration_time:u: 1001478621 1001478621 1001478621

Fixes: c2337d67 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs")
Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611159518-226883-1-git-send-email-john.garry@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

9c880c24

perf evlist: Fix id index for heterogeneous systems · fc705fec

Adrian Hunter authored Jan 21, 2021

perf_evlist__set_sid_idx() updates perf_sample_id with the evlist map
index, CPU number and TID. It is passed indexes to the evsel's cpu and
thread maps, but references the evlist's maps instead. That results in
using incorrect CPU numbers on heterogeneous systems. Fix it by using
evsel maps.

The id index (PERF_RECORD_ID_INDEX) is used by AUX area tracing when in
sampling mode. Having an incorrect CPU number causes the trace data to
be attributed to the wrong CPU, and can result in decoder errors because
the trace data is then associated with the wrong process.

Committer notes:

Keep the class prefix convention in the function name, switching from
perf_evlist__set_sid_idx() to perf_evsel__set_sid_idx().

Fixes: 3c659eed ("perf tools: Add id index")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210121125446.11287-1-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

fc705fec

Merge tag 'fs_for_v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 9f29bd8b

Linus Torvalds authored Jan 21, 2021

Pull fs and udf fixes from Jan Kara:
 "A lazytime handling fix from Eric Biggers and a fix of UDF session
  handling for large devices"

* tag 'fs_for_v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: fix the problem that the disc content is not displayed
  fs: fix lazytime expiration handling in __writeback_single_inode()

9f29bd8b

Merge tag 'printk-for-5.11-printk-rework-fixup' of... · 2561bbbe

Linus Torvalds authored Jan 21, 2021

Merge tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fixes from Petr Mladek:

 - Fix line counting and buffer size calculation. Both regressions
   caused that a reader buffer might not get filled as much as possible.

 - Restore non-documented behavior of printk() reader API and make it
   official.

   It did not fill the last byte of the provided buffer before 5.10. Two
   architectures, powerpc and um, used it to add the trailing '\0'.
   There might theoretically be more callers depending on this behavior
   in userspace.

* tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: fix buffer overflow potential for print_text()
  printk: fix kmsg_dump_get_buffer length calulations
  printk: ringbuffer: fix line counting

2561bbbe

Merge tag 'acpi-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 6a52f4cf

Linus Torvalds authored Jan 21, 2021

Pull ACPI fix from Rafael Wysocki:
 "Modify a helper function in the ACPI core to match the behavior
  expected by its users so as to prevent NULL pointer dereferences and
  occasional memory corruption from occurring (Hans de Goede)"

* tag 'acpi-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: scan: Make acpi_bus_get_device() clear return pointer on error

6a52f4cf

Merge tag 'sound-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 120fbdb8

Linus Torvalds authored Jan 21, 2021

Pull sound fixes from Takashi Iwai:
 "Here is a collection of sound fixes targeted for 5.11-rc5. Most
  notably, USB-audio still got a few intensive changes for covering the
  regressions while the rest are all small fixes.

   - A trivial fix for sequencer OSS emulation error path

   - HD-audio runtime PM regression fix, a few quirks and new IDs

   - USB-audio regression fixes for Pioneer device, Logitech webcams,
     etc

   - ASoC SOF Intel coverage

   - MAINTAINERS file update

   - A fix in the jack handling in ASoC HDMI codec"

* tag 'sound-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Fix hw constraints dependencies
  ALSA: hda: Balance runtime/system PM if direct-complete is disabled
  ALSA: usb-audio: Avoid implicit feedback on Pioneer devices
  ALSA: usb-audio: Set sample rate for all sharing EPs on UAC1
  ALSA: usb-audio: Fix UAC1 rate setup for secondary endpoints
  MAINTAINERS: update qcom ASoC drivers list
  MAINTAINERS: update maintainers of qcom audio
  ALSA: hda: Add Cometlake-R PCI ID
  ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
  ALSA: hda/via: Add minimum mute flag
  ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
  ALSA: usb-audio: Always apply the hw constraints for implicit fb sync
  ASoC: SOF: Intel: fix page fault at probe if i915 init fails
  ALSA: hda: Add AlderLake-P PCI ID and HDMI codec vid
  ASoC: SOF: Intel: hda: Avoid checking jack on system suspend
  ASoC: SOF: Intel: hda: Modify existing helper to disable WAKEEN
  ASoC: SOF: Intel: hda: Resume codec to do jack detection
  MAINTAINERS: update references to stm32 audio bindings
  ASoC: hdmi-codec: Fix return value in hdmi_codec_set_jack()

120fbdb8

Merge tag 'gpio-fixes-for-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · d7631e43

Linus Torvalds authored Jan 21, 2021

Pull gpio fixes from Bartosz Golaszewski:

 - rework the character device code to avoid a frame size warning

 - fix printk format issues in gpio-tools

 - warn on redefinition of the to_irq callback in core gpiolib code

 - fix PWM period calculation in gpio-mvebu

 - make gpio-sifive Kconfig entry consistent with other drivers

 - fix a build issue in gpio-tegra

* tag 'gpio-fixes-for-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: tegra: Add missing dependencies
  gpio: sifive: select IRQ_DOMAIN_HIERARCHY rather than depend on it
  gpio: mvebu: fix pwm .get_state period calculation
  gpiolib: add a warning on gpiochip->to_irq defined
  tools: gpio: fix %llu warning in gpio-watch.c
  tools: gpio: fix %llu warning in gpio-event-mon.c
  gpiolib: cdev: fix frame size warning in gpio_ioctl()

d7631e43

Merge tag 'pinctrl-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 63858ac3

Linus Torvalds authored Jan 21, 2021

Pull pin control fixes from Linus Walleij:
 "These are all driver fixes, the Qualcomm stuff is the most widely used
  and important:

   - The main matter is a complicated fixup for the Qualcomm deep sleep
     states.

     This manifests in how interrupts get handled or in some cases not
     handled in cooperation with the PDC (Power Domain Controller). It's
     one of these really hardcore bug fixes that signifies high maturity
     of the platform.

   - Fix a register layout problem in the JZ4760 driver

   - Fix a register offset in the Aspeed G6 driver

   - Fix a compiler warning in the Nomadik driver

   - Fix a fallback code path in the mediatek driver"

* tag 'pinctrl-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: Don't clear pending interrupts when enabling
  pinctrl: qcom: Properly clear "intr_ack_high" interrupts when unmasking
  pinctrl: qcom: No need to read-modify-write the interrupt status
  pinctrl: qcom: Allow SoCs to specify a GPIO function that's not 0
  pinctrl: mediatek: Fix fallback call path
  pinctrl: nomadik: Remove unused variable in nmk_gpio_dbg_show_one
  pinctrl: aspeed: g6: Fix PWMG0 pinctrl setting
  pinctrl: ingenic: Rename registers from JZ4760_GPIO_* to JZ4770_GPIO_*
  pinctrl: ingenic: Fix JZ4760 support

63858ac3

Merge branch 'printk-rework' into for-linus · 535b6a12
Petr Mladek authored Jan 21, 2021

535b6a12

20 Jan, 2021 23 commits

Merge tag 'for-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 9791581c

Linus Torvalds authored Jan 20, 2021

Pull btrfs fixes from David Sterba:
 "A few more one line fixes for various bugs, stable material.

   - fix send when emitting clone operation from the same file and root

   - fix double free on error when cleaning backrefs

   - lockdep fix during relocation

   - handle potential error during reloc when starting transaction

   - skip running delayed refs during commit (leftover from code removal
     in this dev cycle)"

* tag 'for-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't clear ret in btrfs_start_dirty_block_groups
  btrfs: fix lockdep splat in btrfs_recover_relocation
  btrfs: do not double free backref nodes on error
  btrfs: don't get an EINTR during drop_snapshot for reloc
  btrfs: send: fix invalid clone operations when cloning from the same file and root
  btrfs: no need to run delayed refs after commit_fs_roots during commit

9791581c

ALSA: usb-audio: Fix hw constraints dependencies · 506c203c

Takashi Iwai authored Jan 20, 2021

Since the recent refactoring, it's been reported that some USB-audio
devices (typically webcams) are no longer detected properly by
PulseAudio. The debug session revealed that it's failing at probing
by PA to try the sample rate 44.1kHz while the device has discrete
sample rates other than 44.1kHz. But the puzzle was that arecord
works as is, and some other devices with the discrete rates work,
either.

After all, this turned out to be the lack of the dependencies in a few
hw constraint rules: snd_pcm_hw_rule_add() has the (variable)
arguments specifying the dependent parameters, and some functions
didn't set the target parameter itself as the dependencies. This
resulted in an invalid parameter that could be generated only in a
certain call pattern. This bug itself has been present in the code,
but it didn't trigger errors just because the rules were casually
avoiding such a corner case. After the recent refactoring and
cleanup, however, the hw constraints work "as expected", and the
problem surfaced now.

For fixing the problem above, this patch adds the missing dependent
parameters to each snd_pcm_hw_rule() call.

Fixes: bc4e94aa ("ALSA: usb-audio: Handle discrete rates properly in hw constraints")
BugLink: http://bugzilla.opensuse.org/show_bug.cgi?id=1181014
Link: https://lore.kernel.org/r/20210120204554.30177-1-tiwai@suse.deSigned-off-by: Takashi Iwai <tiwai@suse.de>

506c203c

Merge tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 75439bc4

Linus Torvalds authored Jan 20, 2021

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and
  can trees.

  Current release - regressions:

   - nfc: nci: fix the wrong NCI_CORE_INIT parameters

  Current release - new code bugs:

   - bpf: allow empty module BTFs

  Previous releases - regressions:

   - bpf: fix signed_{sub,add32}_overflows type handling

   - tcp: do not mess with cloned skbs in tcp_add_backlog()

   - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach

   - bpf: don't leak memory in bpf getsockopt when optlen == 0

   - tcp: fix potential use-after-free due to double kfree()

   - mac80211: fix encryption issues with WEP

   - devlink: use right genl user_ptr when handling port param get/set

   - ipv6: set multicast flag on the multicast route

   - tcp: fix TCP_USER_TIMEOUT with zero window

  Previous releases - always broken:

   - bpf: local storage helpers should check nullness of owner ptr passed

   - mac80211: fix incorrect strlen of .write in debugfs

   - cls_flower: call nla_ok() before nla_next()

   - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too"

* tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net: systemport: free dev before on error path
  net: usb: cdc_ncm: don't spew notifications
  net: mscc: ocelot: Fix multicast to the CPU port
  tcp: Fix potential use-after-free due to double kfree()
  bpf: Fix signed_{sub,add32}_overflows type handling
  can: peak_usb: fix use after free bugs
  can: vxcan: vxcan_xmit: fix use after free bug
  can: dev: can_restart: fix use after free bug
  tcp: fix TCP socket rehash stats mis-accounting
  net: dsa: b53: fix an off by one in checking "vlan->vid"
  tcp: do not mess with cloned skbs in tcp_add_backlog()
  selftests: net: fib_tests: remove duplicate log test
  net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
  sh_eth: Fix power down vs. is_opened flag ordering
  net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
  netfilter: rpfilter: mask ecn bits before fib lookup
  udp: mask TOS bits in udp_v4_early_demux()
  xsk: Clear pool even for inactive queues
  bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
  sh_eth: Make PHY access aware of Runtime PM to fix reboot crash
  ...

75439bc4

Merge tag 'for-linus-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 2e4ceed6

Linus Torvalds authored Jan 20, 2021

Pull xen fix from Juergen Gross:
 "A fix for build failure showing up in some configurations"

* tag 'for-linus-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: fix 'nopvspin' build error

2e4ceed6

X.509: Fix crash caused by NULL pointer · 7178a107

Tianjia Zhang authored Jan 19, 2021

On the following call path, `sig->pkey_algo` is not assigned
in asymmetric_key_verify_signature(), which causes runtime
crash in public_key_verify_signature().

  keyctl_pkey_verify
    asymmetric_key_verify_signature
      verify_signature
        public_key_verify_signature

This patch simply check this situation and fixes the crash
caused by NULL pointer.

Fixes: 21552563 ("X.509: support OSCCA SM2-with-SM3 certificate verification")
Reported-by: Tobias Markus <tobias@markus-regensburg.de>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: João Fonseca <jpedrofonseca@ua.pt>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7178a107

cachefiles: Drop superfluous readpages aops NULL check · db58465f

Takashi Iwai authored Jan 20, 2021

After the recent actions to convert readpages aops to readahead, the
NULL checks of readpages aops in cachefiles_read_or_alloc_page() may
hit falsely.  More badly, it's an ASSERT() call, and this panics.

Drop the superfluous NULL checks for fixing this regression.

[DH: Note that cachefiles never actually used readpages, so this check was
 never actually necessary]

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883
BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245
Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

db58465f

ACPI: scan: Make acpi_bus_get_device() clear return pointer on error · 78a18fec

Hans de Goede authored Jan 15, 2021

Set the acpi_device pointer which acpi_bus_get_device() returns-by-
reference to NULL on errors.

We've recently had 2 cases where callers of acpi_bus_get_device()
did not properly error check the return value, so set the returned-
by-reference acpi_device pointer to NULL, because at least some
callers of acpi_bus_get_device() expect that to be done on errors.

[ rjw: This issue was exposed by commit 71da201f ("ACPI: scan:
  Defer enumeration of devices with _DEP lists") which caused it to
  be much more likely to occur on some systems, but the real defect
  had been introduced by an earlier commit. ]

Fixes: 40e7fcb1 ("ACPI: Add _DEP support to fix battery issue on Asus T100TA")
Fixes: bcfcd409 ("usb: split code locating ACPI companion into port and device")
Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Diagnosed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: All applicable <stable@vger.kernel.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

78a18fec

Merge tag 'linux-can-fixes-for-5.11-20210120' of... · 535d3159

Jakub Kicinski authored Jan 20, 2021

Merge tag 'linux-can-fixes-for-5.11-20210120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
linux-can-fixes-for-5.11-20210120

All three patches are by Vincent Mailhol and fix a potential use after free bug
in the CAN device infrastructure, the vxcan driver, and the peak_usk driver. In
the TX-path the skb is used to read from after it was passed to the networking
stack with netif_rx_ni().

* tag 'linux-can-fixes-for-5.11-20210120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: peak_usb: fix use after free bugs
  can: vxcan: vxcan_xmit: fix use after free bug
  can: dev: can_restart: fix use after free bug
====================

Link: https://lore.kernel.org/r/20210120125202.2187358-1-mkl@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

535d3159

net: systemport: free dev before on error path · 0c630a66

Pan Bian authored Jan 19, 2021

On the error path, it should goto the error handling label to free
allocated memory rather than directly return.

Fixes: 31bc72d9 ("net: systemport: fetch and use clock resources")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210120044423.1704-1-bianpan2016@163.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0c630a66

net: usb: cdc_ncm: don't spew notifications · de658a19

Grant Grundler authored Jan 19, 2021

RTL8156 sends notifications about every 32ms.
Only display/log notifications when something changes.

This issue has been reported by others:
	https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472
	https://lkml.org/lkml/2020/8/27/1083

...
[785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00
[785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN
[785962.929954] usb 1-1: Manufacturer: Realtek
[785962.929956] usb 1-1: SerialNumber: 000000001
[785962.991755] usbcore: registered new interface driver cdc_ether
[785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15
[785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384
[785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384
[785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15
[785963.019211] usbcore: registered new interface driver cdc_ncm
[785963.023856] usbcore: registered new interface driver cdc_wdm
[785963.025461] usbcore: registered new interface driver cdc_mbim
[785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0
[785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
...

This is about 2KB per second and will overwrite all contents of a 1MB
dmesg buffer in under 10 minutes rendering them useless for debugging
many kernel problems.

This is also an extra 180 MB/day in /var/logs (or 1GB per week) rendering
the majority of those logs useless too.

When the link is up (expected state), spew amount is >2x higher:
...
[786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
...

Chrome OS cannot support RTL8156 until this is fixed.
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

de658a19

net: mscc: ocelot: Fix multicast to the CPU port · 584b7cfc

Alban Bedel authored Jan 19, 2021

Multicast entries in the MAC table use the high bits of the MAC
address to encode the ports that should get the packets. But this port
mask does not work for the CPU port, to receive these packets on the
CPU port the MAC_CPU_COPY flag must be set.

Because of this IPv6 was effectively not working because neighbor
solicitations were never received. This was not apparent before commit
9403c158 (net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb
entries) as the IPv6 entries were broken so all incoming IPv6
multicast was then treated as unknown and flooded on all ports.

To fix this problem rework the ocelot_mact_learn() to set the
MAC_CPU_COPY flag when a multicast entry that target the CPU port is
added. For this we have to read back the ports endcoded in the pseudo
MAC address by the caller. It is not a very nice design but that avoid
changing the callers and should make backporting easier.
Signed-off-by: Alban Bedel <alban.bedel@aerq.com>
Fixes: 9403c158 ("net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries")
Link: https://lore.kernel.org/r/20210119140638.203374-1-alban.bedel@aerq.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

584b7cfc

tcp: Fix potential use-after-free due to double kfree() · c89dffc7

Kuniyuki Iwashima authored Jan 18, 2021

Receiving ACK with a valid SYN cookie, cookie_v4_check() allocates struct
request_sock and then can allocate inet_rsk(req)->ireq_opt. After that,
tcp_v4_syn_recv_sock() allocates struct sock and copies ireq_opt to
inet_sk(sk)->inet_opt. Normally, tcp_v4_syn_recv_sock() inserts the full
socket into ehash and sets NULL to ireq_opt. Otherwise,
tcp_v4_syn_recv_sock() has to reset inet_opt by NULL and free the full
socket.

The commit 01770a16 ("tcp: fix race condition when creating child
sockets from syncookies") added a new path, in which more than one cores
create full sockets for the same SYN cookie. Currently, the core which
loses the race frees the full socket without resetting inet_opt, resulting
in that both sock_put() and reqsk_put() call kfree() for the same memory:

  sock_put
    sk_free
      __sk_free
        sk_destruct
          __sk_destruct
            sk->sk_destruct/inet_sock_destruct
              kfree(rcu_dereference_protected(inet->inet_opt, 1));

  reqsk_put
    reqsk_free
      __reqsk_free
        req->rsk_ops->destructor/tcp_v4_reqsk_destructor
          kfree(rcu_dereference_protected(inet_rsk(req)->ireq_opt, 1));

Calling kmalloc() between the double kfree() can lead to use-after-free, so
this patch fixes it by setting NULL to inet_opt before sock_put().

As a side note, this kind of issue does not happen for IPv6. This is
because tcp_v6_syn_recv_sock() clones both ipv6_opt and pktopts which
correspond to ireq_opt in IPv4.

Fixes: 01770a16 ("tcp: fix race condition when creating child sockets from syncookies")
CC: Ricardo Dias <rdias@singlestore.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210118055920.82516-1-kuniyu@amazon.co.jpSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c89dffc7

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · b3741b43

Jakub Kicinski authored Jan 20, 2021

Daniel Borkmann says:

====================
pull-request: bpf 2021-01-20

1) Fix wrong bpf_map_peek_elem_proto helper callback, from Mircea Cirjaliu.

2) Fix signed_{sub,add32}_overflows type truncation, from Daniel Borkmann.

3) Fix AF_XDP to also clear pools for inactive queues, from Maxim Mikityanskiy.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix signed_{sub,add32}_overflows type handling
  xsk: Clear pool even for inactive queues
  bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
====================

Link: https://lore.kernel.org/r/20210120163439.8160-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b3741b43

bpf: Fix signed_{sub,add32}_overflows type handling · bc895e8b

Daniel Borkmann authored Jan 20, 2021

Fix incorrect signed_{sub,add32}_overflows() input types (and a related buggy
comment). It looks like this might have slipped in via copy/paste issue, also
given prior to 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
the signature of signed_sub_overflows() had s64 a and s64 b as its input args
whereas now they are truncated to s32. Thus restore proper types. Also, the case
of signed_add32_overflows() is not consistent to signed_sub32_overflows(). Both
have s32 as inputs, therefore align the former.

Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: De4dCr0w <sa516203@mail.ustc.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>

bc895e8b

can: peak_usb: fix use after free bugs · 50aca891

Vincent Mailhol authored Jan 20, 2021

After calling peak_usb_netif_rx_ni(skb), dereferencing skb is unsafe.
Especially, the can_frame cf which aliases skb memory is accessed
after the peak_usb_netif_rx_ni().

Reordering the lines solves the issue.

Fixes: 0a25e1f4 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Link: https://lore.kernel.org/r/20210120114137.200019-4-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

50aca891

can: vxcan: vxcan_xmit: fix use after free bug · 75854cad

Vincent Mailhol authored Jan 20, 2021

After calling netif_rx_ni(skb), dereferencing skb is unsafe.
Especially, the canfd_frame cfd which aliases skb memory is accessed
after the netif_rx_ni().

Fixes: a8f820a3 ("can: add Virtual CAN Tunnel driver (vxcan)")
Link: https://lore.kernel.org/r/20210120114137.200019-3-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

75854cad

can: dev: can_restart: fix use after free bug · 03f16c50

Vincent Mailhol authored Jan 20, 2021

After calling netif_rx_ni(skb), dereferencing skb is unsafe.
Especially, the can_frame cf which aliases skb memory is accessed
after the netif_rx_ni() in:
stats->rx_bytes += cf->len;

Reordering the lines solves the issue.

Fixes: 39549eef ("can: CAN Network device driver and Netlink interface")
Link: https://lore.kernel.org/r/20210120114137.200019-2-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

03f16c50

tcp: fix TCP socket rehash stats mis-accounting · 9c30ae83

Yuchung Cheng authored Jan 19, 2021

The previous commit 32efcc06 ("tcp: export count for rehash attempts")
would mis-account rehashing SNMP and socket stats:

  a. During handshake of an active open, only counts the first
     SYN timeout

  b. After handshake of passive and active open, stop updating
     after (roughly) TCP_RETRIES1 recurring RTOs

  c. After the socket aborts, over count timeout_rehash by 1

This patch fixes this by checking the rehash result from sk_rethink_txhash.

Fixes: 32efcc06 ("tcp: export count for rehash attempts")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20210119192619.1848270-1-ycheng@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9c30ae83

net: dsa: b53: fix an off by one in checking "vlan->vid" · 8e4052c3

Dan Carpenter authored Jan 19, 2021

The > comparison should be >= to prevent accessing one element beyond
the end of the dev->vlans[] array in the caller function, b53_vlan_add().
The "dev->vlans" array is allocated in the b53_switch_init() function
and it has "dev->num_vlans" elements.

Fixes: a2482d2c ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/YAbxI97Dl/pmBy5V@mwandaSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8e4052c3

tcp: do not mess with cloned skbs in tcp_add_backlog() · b160c285

Eric Dumazet authored Jan 19, 2021

Heiner Kallweit reported that some skbs were sent with
the following invalid GSO properties :
- gso_size > 0
- gso_type == 0

This was triggerring a WARN_ON_ONCE() in rtl8169_tso_csum_v2.

Juerg Haefliger was able to reproduce a similar issue using
a lan78xx NIC and a workload mixing TCP incoming traffic
and forwarded packets.

The problem is that tcp_add_backlog() is writing
over gso_segs and gso_size even if the incoming packet will not
be coalesced to the backlog tail packet.

While skb_try_coalesce() would bail out if tail packet is cloned,
this overwriting would lead to corruptions of other packets
cooked by lan78xx, sharing a common super-packet.

The strategy used by lan78xx is to use a big skb, and split
it into all received packets using skb_clone() to avoid copies.
The drawback of this strategy is that all the small skb share a common
struct skb_shared_info.

This patch rewrites TCP gso_size/gso_segs handling to only
happen on the tail skb, since skb_try_coalesce() made sure
it was not cloned.

Fixes: 4f693b55 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Juerg Haefliger <juergh@canonical.com>
Tested-by: Juerg Haefliger <juergh@canonical.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209423
Link: https://lore.kernel.org/r/20210119164900.766957-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b160c285

selftests: net: fib_tests: remove duplicate log test · fd23d2dc

Hangbin Liu authored Jan 19, 2021

The previous test added an address with a specified metric and check if
correspond route was created. I somehow added two logs for the same
test. Remove the duplicated one.
Reported-by: Antoine Tenart <atenart@redhat.com>
Fixes: 0d29169a ("selftests/net/fib_tests: update addr_metric_test for peer route testing")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210119025930.2810532-1-liuhangbin@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fd23d2dc

net: nfc: nci: fix the wrong NCI_CORE_INIT parameters · 4964e5a1

Bongsu Jeon authored Jan 19, 2021

Fix the code because NCI_CORE_INIT_CMD includes two parameters in NCI2.0
but there is no parameters in NCI1.x.

Fixes: bcd684aa ("net/nfc/nci: Support NCI 2.x initial sequence")
Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Link: https://lore.kernel.org/r/20210118205522.317087-1-bongsu.jeon@samsung.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4964e5a1

sh_eth: Fix power down vs. is_opened flag ordering · f6a2e94b

Geert Uytterhoeven authored Jan 18, 2021

sh_eth_close() does a synchronous power down of the device before
marking it closed. Revert the order, to make sure the device is never
marked opened while suspended.

While at it, use pm_runtime_put() instead of pm_runtime_put_sync(), as
there is no reason to do a synchronous power down.

Fixes: 7fa2955f ("sh_eth: Fix sleeping function called from invalid context")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20210118150812.796791-1-geert+renesas@glider.beSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f6a2e94b

19 Jan, 2021 6 commits

net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled · a3eb4e9d

Tariq Toukan authored Jan 17, 2021

With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be
logically done when RXCSUM offload is off.

Fixes: 14136564 ("net: Add TLS RX offload feature")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a3eb4e9d

Merge branch 'ipv4-ensure-ecn-bits-don-t-influence-source-address-validation' · 2565ff4e

Jakub Kicinski authored Jan 19, 2021

Guillaume Nault says:

====================
ipv4: Ensure ECN bits don't influence source address validation

Functions that end up calling fib_table_lookup() should clear the ECN
bits from the TOS, otherwise ECT(0) and ECT(1) packets can be treated
differently.

Most functions already clear the ECN bits, but there are a few cases
where this is not done. This series only fixes the ones related to
source address validation.
====================

Link: https://lore.kernel.org/r/cover.1610790904.git.gnault@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2565ff4e

netfilter: rpfilter: mask ecn bits before fib lookup · 2e5a6266

Guillaume Nault authored Jan 16, 2021

RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.

Reproducer:

  Create two netns, connected with a veth:
  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up
  $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/32 dev veth10

  Add a route to ns1 in ns0:
  $ ip -netns ns0 route add 192.0.2.11/32 dev veth01

  In ns1, only packets with TOS 4 can be routed to ns0:
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10

  Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
  is 4:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
    ... 0% packet loss ...

  Now use iptable's rpfilter module in ns1:
  $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP

  Not-ECT and ECT(1) packets still pass:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
    ... 0% packet loss ...

  But ECT(0) and ECN packets are dropped:
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
    ... 100% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
    ... 100% packet loss ...

After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.

Fixes: 8f97339d ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2e5a6266

udp: mask TOS bits in udp_v4_early_demux() · 8d2b51b0

Guillaume Nault authored Jan 16, 2021

udp_v4_early_demux() is the only function that calls
ip_mc_validate_source() with a TOS that hasn't been masked with
IPTOS_RT_MASK.

This results in different behaviours for incoming multicast UDPv4
packets, depending on if ip_mc_validate_source() is called from the
early-demux path (udp_v4_early_demux) or from the regular input path
(ip_route_input_noref).

ECN would normally not be used with UDP multicast packets, so the
practical consequences should be limited on that side. However,
IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align
with the non-early-demux path behaviour.

Reproducer:

  Setup two netns, connected with veth:
  $ ip netns add ns0
  $ ip netns add ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up
  $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
  $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10

  In ns0, add route to multicast address 224.0.2.0/24 using source
  address 198.51.100.10:
  $ ip -netns ns0 address add 198.51.100.10/32 dev lo
  $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10

  In ns1, define route to 198.51.100.10, only for packets with TOS 4:
  $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10

  Also activate rp_filter in ns1, so that incoming packets not matching
  the above route get dropped:
  $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1

  Now try to receive packets on 224.0.2.11:
  $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -

  In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
  tos 6 for socat):
  $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6

  The "test0" message is properly received by socat in ns1, because
  early-demux has no cached dst to use, so source address validation
  is done by ip_route_input_mc(), which receives a TOS that has the
  ECN bits masked.

  Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
  $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6

  The "test1" message isn't received by socat in ns1, because, now,
  early-demux has a cached dst to use and calls ip_mc_validate_source()
  immediately, without masking the ECN bits.

Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8d2b51b0

xsk: Clear pool even for inactive queues · b425e24a

Maxim Mikityanskiy authored Jan 18, 2021

The number of queues can change by other means, rather than ethtool. For
example, attaching an mqprio qdisc with num_tc > 1 leads to creating
multiple sets of TX queues, which may be then destroyed when mqprio is
deleted. If an AF_XDP socket is created while mqprio is active,
dev->_tx[queue_id].pool will be filled, but then real_num_tx_queues may
decrease with deletion of mqprio, which will mean that the pool won't be
NULLed, and a further increase of the number of TX queues may expose a
dangling pointer.

To avoid any potential misbehavior, this commit clears pool for RX and
TX queues, regardless of real_num_*_queues, still taking into
consideration num_*_queues to avoid overflows.

Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
Fixes: a41b4f3c ("xsk: simplify xdp_clear_umem_at_qid implementation")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210118160333.333439-1-maximmi@mellanox.com

b425e24a

Merge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block · 45dfb8a5

Linus Torvalds authored Jan 19, 2021

Pull task_work fix from Jens Axboe:
 "The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
  task_work run we had in get_signal().

  This caused a regression for some setups, since we're relying on eg
  ____fput() being run to close and release, for example, a pipe and
  wake the other end.

  For 5.11, I prefer the simple solution of just reinstating the
  unconditional run, even if it conceptually doesn't make much sense -
  if you need that kind of guarantee, you should be using TWA_SIGNAL
  instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
  ensure that other potential gotchas/assumptions for task_work don't
  regress for 5.11.

  We're looking into further simplifying the task_work notifications for
  5.12 which would resolve that too"

* tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
  task_work: unconditionally run task_work from get_signal()

45dfb8a5