1. 21 Jan, 2021 11 commits
    • perf script: Fix overrun issue for dynamically-allocated PMU type number · 8adc0a06
      Jin Yao authored
      When unpacking an event from a dynamic PMU, the array
      output[OUTPUT_TYPE_MAX] may be overrun. For example, the type number of
      SKL uncore_imc is 10, but OUTPUT_TYPE_MAX is currently 7 (OUTPUT_TYPE_MAX =
      PERF_TYPE_MAX + 1).
      
      /* In builtin-script.c */
      
      process_event()
      {
              unsigned int type = output_type(attr->type);
      
              if (output[type].fields == 0)
                      return;
      }
      
      output[10] is overrun.
      
      Create a type OUTPUT_TYPE_OTHER for dynamic PMU events, so that
      output_type(attr->type) returns OUTPUT_TYPE_OTHER in this case.
      
      Note that if PERF_TYPE_MAX ever changed, old perf.data files containing
      a dynamically allocated PMU number could conflict with what would then
      be a fixed PERF_TYPE of the same value.
      
      Example:
      
        # perf record --switch-events -C 0 -e "{cpu-clock,uncore_imc/data_reads/,uncore_imc/data_writes/}:SD" -a -- sleep 1
        # perf script
      
        Before:
               swapper     0 [000] 1479253.987551:     277766               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.987797:     246709               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.988127:     329883               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.988273:     146393               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.988523:     249977               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.988877:     354090               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.989023:     145940               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.989383:     359856               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1479253.989523:     140082               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
      
        After:
               swapper     0 [000] 1397040.402011:     272384               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1397040.402011:       5396  uncore_imc/data_reads/:
               swapper     0 [000] 1397040.402011:        967 uncore_imc/data_writes/:
               swapper     0 [000] 1397040.402259:     249153               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1397040.402259:       7231  uncore_imc/data_reads/:
               swapper     0 [000] 1397040.402259:       1297 uncore_imc/data_writes/:
               swapper     0 [000] 1397040.402508:     249108               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
               swapper     0 [000] 1397040.402508:       5333  uncore_imc/data_reads/:
               swapper     0 [000] 1397040.402508:       1008 uncore_imc/data_writes/:

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20201209005828.21302-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metricgroup: Fix system PMU metrics · 3d6e79ee
      John Garry authored
      Joakim reports that running "perf stat" with multiple system PMU metrics
      segfaults:
      
        $ perf stat -a -I 1000 -M imx8mm_ddr_write.all,imx8mm_ddr_write.all
        Segmentation fault
        $
      
      The same command works without issue for a single metric.
      
      The logic in metricgroup__add_metric_sys_event_iter() is broken, in that
      the add_metric() @M argument should be NULL for each new metric. Fix it by
      no longer passing in a holder for that pointer and instead making it a
      local variable in metricgroup__add_metric_sys_event_iter().
      
      Fixes: be335ec2 ("perf metricgroup: Support adding metrics for system PMUs")
      Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linuxarm@openeuler.org
      Link: https://lore.kernel.org/r/1611050655-44020-1-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metricgroup: Fix for metrics containing duration_time · 9c880c24
      John Garry authored
      Metrics containing duration_time cause a segfault:
      
        $ perf stat -v -M L1D_Cache_Fill_BW sleep 1
        Using CPUID GenuineIntel-6-3D-4
        metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
        found event duration_time
        found event l1d.replacement
        adding {l1d.replacement}:W,duration_time
        l1d.replacement -> cpu/umask=0x1,(null)=0x1e8483,event=0x51/
        Segmentation fault
        $
      
      In commit c2337d67 ("perf metricgroup: Fix metrics using aliases
      covering multiple PMUs"), the logic in find_evsel_group() for iterating
      over events was changed to select not only events in the same group, but
      also events for aliased PMUs.
      
      Checking whether events were for aliased PMUs was done by comparing the
      event PMU names. This was not safe for the duration_time event, which has
      no associated PMU (and so no PMU name). Fix it by also checking that the
      event PMU name is set.
      
      Committer testing:
      
      Reproduced the bug, then tested the fix on:
      
        $ grep -m1 ^'model name' /proc/cpuinfo
        model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
        $
      
      We now get:
      
        $ perf stat -M L1D_Cache_Fill_BW sleep 1
      
         Performance counter stats for 'sleep 1':
      
                     4,141      l1d.replacement:u
             1,001,285,107 ns   duration_time:u
      
               1.001285107 seconds time elapsed
      
               0.000000000 seconds user
               0.001119000 seconds sys
      
        $
      
      Details from -v:
      
        Using CPUID GenuineIntel-6-8E-A
        metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
        found event duration_time
        found event l1d.replacement
        adding {l1d.replacement}:W,duration_time
        l1d.replacement -> cpu/(null)=0x1e8483,umask=0x1,event=0x51/
        Control descriptor is not initialized
        Warning:
        kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples
        Warning:
        kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples
        l1d.replacement:u: 4592 612201 612201
        duration_time:u: 1001478621 1001478621 1001478621
      
      Fixes: c2337d67 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs")
      Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linuxarm@openeuler.org
      Link: https://lore.kernel.org/r/1611159518-226883-1-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Fix id index for heterogeneous systems · fc705fec
      Adrian Hunter authored
      perf_evlist__set_sid_idx() updates perf_sample_id with the evlist map
      index, CPU number and TID. It is passed indexes into the evsel's CPU and
      thread maps, but looks them up in the evlist's maps instead. That results
      in incorrect CPU numbers on heterogeneous systems. Fix it by using the
      evsel maps.
      
      The id index (PERF_RECORD_ID_INDEX) is used by AUX area tracing when in
      sampling mode. Having an incorrect CPU number causes the trace data to
      be attributed to the wrong CPU, and can result in decoder errors because
      the trace data is then associated with the wrong process.
      
      Committer notes:
      
      Keep the class prefix convention in the function name, switching from
      perf_evlist__set_sid_idx() to perf_evsel__set_sid_idx().
      
      Fixes: 3c659eed ("perf tools: Add id index")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20210121125446.11287-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag 'fs_for_v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 9f29bd8b
      Linus Torvalds authored
      Pull fs and udf fixes from Jan Kara:
       "A lazytime handling fix from Eric Biggers and a fix of UDF session
        handling for large devices"
      
      * tag 'fs_for_v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: fix the problem that the disc content is not displayed
        fs: fix lazytime expiration handling in __writeback_single_inode()
    • Merge tag 'printk-for-5.11-printk-rework-fixup' of... · 2561bbbe
      Linus Torvalds authored
      Merge tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
      
      Pull printk fixes from Petr Mladek:
      
       - Fix line counting and buffer size calculation. Both regressions
         could cause a reader buffer not to be filled as much as possible.
      
       - Restore non-documented behavior of printk() reader API and make it
         official.
      
         It did not fill the last byte of the provided buffer before 5.10. Two
         architectures, powerpc and um, used it to add the trailing '\0'.
         There might theoretically be more callers depending on this behavior
         in userspace.
      
      * tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk: fix buffer overflow potential for print_text()
        printk: fix kmsg_dump_get_buffer length calulations
        printk: ringbuffer: fix line counting
    • Merge tag 'acpi-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 6a52f4cf
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Modify a helper function in the ACPI core to match the behavior
        expected by its users so as to prevent NULL pointer dereferences and
        occasional memory corruption from occurring (Hans de Goede)"
      
      * tag 'acpi-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
    • Merge tag 'sound-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 120fbdb8
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Here is a collection of sound fixes targeted for 5.11-rc5. Most
        notably, USB-audio still got a few intensive changes for covering the
        regressions while the rest are all small fixes.
      
         - A trivial fix for sequencer OSS emulation error path
      
         - HD-audio runtime PM regression fix, a few quirks and new IDs
      
         - USB-audio regression fixes for Pioneer device, Logitech webcams,
           etc
      
         - ASoC SOF Intel coverage
      
         - MAINTAINERS file update
      
         - A fix in the jack handling in ASoC HDMI codec"
      
      * tag 'sound-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Fix hw constraints dependencies
        ALSA: hda: Balance runtime/system PM if direct-complete is disabled
        ALSA: usb-audio: Avoid implicit feedback on Pioneer devices
        ALSA: usb-audio: Set sample rate for all sharing EPs on UAC1
        ALSA: usb-audio: Fix UAC1 rate setup for secondary endpoints
        MAINTAINERS: update qcom ASoC drivers list
        MAINTAINERS: update maintainers of qcom audio
        ALSA: hda: Add Cometlake-R PCI ID
        ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
        ALSA: hda/via: Add minimum mute flag
        ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
        ALSA: usb-audio: Always apply the hw constraints for implicit fb sync
        ASoC: SOF: Intel: fix page fault at probe if i915 init fails
        ALSA: hda: Add AlderLake-P PCI ID and HDMI codec vid
        ASoC: SOF: Intel: hda: Avoid checking jack on system suspend
        ASoC: SOF: Intel: hda: Modify existing helper to disable WAKEEN
        ASoC: SOF: Intel: hda: Resume codec to do jack detection
        MAINTAINERS: update references to stm32 audio bindings
        ASoC: hdmi-codec: Fix return value in hdmi_codec_set_jack()
    • Merge tag 'gpio-fixes-for-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · d7631e43
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - rework the character device code to avoid a frame size warning
      
       - fix printk format issues in gpio-tools
      
       - warn on redefinition of the to_irq callback in core gpiolib code
      
       - fix PWM period calculation in gpio-mvebu
      
       - make gpio-sifive Kconfig entry consistent with other drivers
      
       - fix a build issue in gpio-tegra
      
      * tag 'gpio-fixes-for-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: tegra: Add missing dependencies
        gpio: sifive: select IRQ_DOMAIN_HIERARCHY rather than depend on it
        gpio: mvebu: fix pwm .get_state period calculation
        gpiolib: add a warning on gpiochip->to_irq defined
        tools: gpio: fix %llu warning in gpio-watch.c
        tools: gpio: fix %llu warning in gpio-event-mon.c
        gpiolib: cdev: fix frame size warning in gpio_ioctl()
    • Merge tag 'pinctrl-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 63858ac3
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "These are all driver fixes, the Qualcomm stuff is the most widely used
        and important:
      
         - The main matter is a complicated fixup for the Qualcomm deep sleep
           states.
      
           This manifests in how interrupts get handled or in some cases not
           handled in cooperation with the PDC (Power Domain Controller). It's
           one of these really hardcore bug fixes that signifies high maturity
           of the platform.
      
         - Fix a register layout problem in the JZ4760 driver
      
         - Fix a register offset in the Aspeed G6 driver
      
         - Fix a compiler warning in the Nomadik driver
      
         - Fix a fallback code path in the mediatek driver"
      
      * tag 'pinctrl-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: qcom: Don't clear pending interrupts when enabling
        pinctrl: qcom: Properly clear "intr_ack_high" interrupts when unmasking
        pinctrl: qcom: No need to read-modify-write the interrupt status
        pinctrl: qcom: Allow SoCs to specify a GPIO function that's not 0
        pinctrl: mediatek: Fix fallback call path
        pinctrl: nomadik: Remove unused variable in nmk_gpio_dbg_show_one
        pinctrl: aspeed: g6: Fix PWMG0 pinctrl setting
        pinctrl: ingenic: Rename registers from JZ4760_GPIO_* to JZ4770_GPIO_*
        pinctrl: ingenic: Fix JZ4760 support
    • Merge branch 'printk-rework' into for-linus · 535b6a12
      Petr Mladek authored
  2. 20 Jan, 2021 23 commits
  3. 19 Jan, 2021 6 commits
    • net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled · a3eb4e9d
      Tariq Toukan authored
      With NETIF_F_HW_TLS_RX, packets are decrypted in HW. This cannot
      logically be done when RXCSUM offload is off.
      
      Fixes: 14136564 ("net: Add TLS RX offload feature")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'ipv4-ensure-ecn-bits-don-t-influence-source-address-validation' · 2565ff4e
      Jakub Kicinski authored
      Guillaume Nault says:
      
      ====================
      ipv4: Ensure ECN bits don't influence source address validation
      
      Functions that end up calling fib_table_lookup() should clear the ECN
      bits from the TOS, otherwise ECT(0) and ECT(1) packets can be treated
      differently.
      
      Most functions already clear the ECN bits, but there are a few cases
      where this is not done. This series only fixes the ones related to
      source address validation.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1610790904.git.gnault@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: rpfilter: mask ecn bits before fib lookup · 2e5a6266
      Guillaume Nault authored
      RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
      treats Not-ECT or ECT(1) packets in a different way than those with
      ECT(0) or CE.
      
      Reproducer:
      
        Create two netns, connected with a veth:
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
        $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
        $ ip -netns ns1 address add 192.0.2.11/32 dev veth10
      
        Add a route to ns1 in ns0:
        $ ip -netns ns0 route add 192.0.2.11/32 dev veth01
      
        In ns1, only packets with TOS 4 can be routed to ns0:
        $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10
      
        Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
        is 4:
        $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
          ... 0% packet loss ...
      
        Now use iptables' rpfilter module in ns1:
        $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP
      
        Not-ECT and ECT(1) packets still pass:
        $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
          ... 0% packet loss ...
      
        But ECT(0) and CE packets are dropped:
        $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
          ... 100% packet loss ...
        $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
          ... 100% packet loss ...
      
      After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.
      
      Fixes: 8f97339d ("netfilter: add ipv4 reverse path filter match")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • udp: mask TOS bits in udp_v4_early_demux() · 8d2b51b0
      Guillaume Nault authored
      udp_v4_early_demux() is the only function that calls
      ip_mc_validate_source() with a TOS that hasn't been masked with
      IPTOS_RT_MASK.
      
      This results in different behaviours for incoming multicast UDPv4
      packets, depending on whether ip_mc_validate_source() is called from the
      early-demux path (udp_v4_early_demux) or from the regular input path
      (ip_route_input_noref).
      
      ECN would normally not be used with UDP multicast packets, so the
      practical consequences should be limited on that side. However,
      IPTOS_RT_MASK is used to also mask the TOS' high order bits, to align
      with the non-early-demux path behaviour.
      
      Reproducer:
      
        Setup two netns, connected with veth:
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip -netns ns0 link set dev lo up
        $ ip -netns ns1 link set dev lo up
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
        $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
        $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10
      
        In ns0, add route to multicast address 224.0.2.0/24 using source
        address 198.51.100.10:
        $ ip -netns ns0 address add 198.51.100.10/32 dev lo
        $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10
      
        In ns1, define route to 198.51.100.10, only for packets with TOS 4:
        $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10
      
        Also activate rp_filter in ns1, so that incoming packets not matching
        the above route get dropped:
        $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1
      
        Now try to receive packets on 224.0.2.11:
        $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -
      
        In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
        tos 6 for socat):
        $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
      
        The "test0" message is properly received by socat in ns1, because
        early-demux has no cached dst to use, so source address validation
        is done by ip_route_input_mc(), which receives a TOS that has the
        ECN bits masked.
      
        Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
        $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
      
        The "test1" message isn't received by socat in ns1, because, now,
        early-demux has a cached dst to use and calls ip_mc_validate_source()
        immediately, without masking the ECN bits.
      
      Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • xsk: Clear pool even for inactive queues · b425e24a
      Maxim Mikityanskiy authored
      The number of queues can change by means other than ethtool. For
      example, attaching an mqprio qdisc with num_tc > 1 leads to creating
      multiple sets of TX queues, which may then be destroyed when mqprio is
      deleted. If an AF_XDP socket is created while mqprio is active,
      dev->_tx[queue_id].pool will be filled. Deleting mqprio may then decrease
      real_num_tx_queues, which means the pool won't be NULLed, and a further
      increase in the number of TX queues may expose a dangling pointer.
      
      To avoid any potential misbehavior, this commit clears the pool for RX
      and TX queues regardless of real_num_*_queues, while still checking
      against num_*_queues to avoid overflows.
      
      Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
      Fixes: a41b4f3c ("xsk: simplify xdp_clear_umem_at_qid implementation")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/20210118160333.333439-1-maximmi@mellanox.com
    • Merge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block · 45dfb8a5
      Linus Torvalds authored
      Pull task_work fix from Jens Axboe:
       "The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
        task_work run we had in get_signal().
      
        This caused a regression for some setups, since we're relying on e.g.
        ____fput() being run to close and release, for example, a pipe and
        wake the other end.
      
        For 5.11, I prefer the simple solution of just reinstating the
        unconditional run, even if it conceptually doesn't make much sense -
        if you need that kind of guarantee, you should be using TWA_SIGNAL
        instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
        ensure that other potential gotchas/assumptions for task_work don't
        regress for 5.11.
      
        We're looking into further simplifying the task_work notifications for
        5.12 which would resolve that too"
      
      * tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
        task_work: unconditionally run task_work from get_signal()