1. 08 Sep, 2022 2 commits
  2. 07 Sep, 2022 8 commits
    • Yacan Liu's avatar
      net/smc: Fix possible access to freed memory in link clear · e9b1a4f8
      Yacan Liu authored
      After modifying the QP to the Error state, all RX WR would be completed
      with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not
      wait for it is done, but destroy the QP and free the link group directly.
      So there is a risk that accessing the freed memory in tasklet context.
      
      Here is a crash example:
      
       BUG: unable to handle page fault for address: ffffffff8f220860
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0002) - not-present page
       PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060
       Oops: 0002 [#1] SMP PTI
       CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S         OE     5.10.0-0607+ #23
       Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018
       RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0
       Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
       RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086
       RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000
       RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00
       RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b
       R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010
       R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040
       FS:  0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <IRQ>
        _raw_spin_lock_irqsave+0x30/0x40
        mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib]
        smc_wr_rx_tasklet_fn+0x56/0xa0 [smc]
        tasklet_action_common.isra.21+0x66/0x100
        __do_softirq+0xd5/0x29c
        asm_call_irq_on_stack+0x12/0x20
        </IRQ>
        do_softirq_own_stack+0x37/0x40
        irq_exit_rcu+0x9d/0xa0
        sysvec_call_function_single+0x34/0x80
        asm_sysvec_call_function_single+0x12/0x20
      
      Fixes: bd4ad577 ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR")
      Signed-off-by: default avatarYacan Liu <liuyacan@corp.netease.com>
      Reviewed-by: default avatarTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9b1a4f8
    • Lorenzo Bianconi's avatar
      net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb · f27b405e
      Lorenzo Bianconi authored
      Even if max hash configured in hw in mtk_ppe_hash_entry is
      MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in
      mtk_ppe_check_skb routine
      
      Fixes: c4f033d9 ("net: ethernet: mtk_eth_soc: rework hardware flow table management")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f27b405e
    • Menglong Dong's avatar
      net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM · 9cb252c4
      Menglong Dong authored
      As Eric reported, the 'reason' field is not presented when trace the
      kfree_skb event by perf:
      
      $ perf record -e skb:kfree_skb -a sleep 10
      $ perf script
        ip_defrag 14605 [021]   221.614303:   skb:kfree_skb:
        skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1
        reason:
      
      The cause seems to be passing kernel address directly to TP_printk(),
      which is not right. As the enum 'skb_drop_reason' is not exported to
      user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason
      string from the 'reason' field, which is a number.
      
      Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used
      to define the trace enum by TRACE_DEFINE_ENUM(). With the help of
      DEFINE_DROP_REASON(), now we can remove the auto-generate that we
      introduced in the commit ec43908d
      ("net: skb: use auto-generation to convert skb drop reason to string"),
      and define the string array 'drop_reasons'.
      
      Hmmmm...now we come back to the situation that have to maintain drop
      reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they
      are both in dropreason.h, which makes it easier.
      
      After this commit, now the format of kfree_skb is like this:
      
      $ cat /tracing/events/skb/kfree_skb/format
      name: kfree_skb
      ID: 1524
      format:
              field:unsigned short common_type;       offset:0;       size:2; signed:0;
              field:unsigned char common_flags;       offset:2;       size:1; signed:0;
              field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
              field:int common_pid;   offset:4;       size:4; signed:1;
      
              field:void * skbaddr;   offset:8;       size:8; signed:0;
              field:void * location;  offset:16;      size:8; signed:0;
              field:unsigned short protocol;  offset:24;      size:2; signed:0;
              field:enum skb_drop_reason reason;      offset:28;      size:4; signed:0;
      
      print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ......
      
      Fixes: ec43908d ("net: skb: use auto-generation to convert skb drop reason to string")
      Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarMenglong Dong <imagedong@tencent.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9cb252c4
    • Lorenzo Bianconi's avatar
      net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear · 0e80707d
      Lorenzo Bianconi authored
      Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine.
      
      Fixes: 33fc42de ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0e80707d
    • David S. Miller's avatar
      Merge branch 'dsa-felix-fixes' · 0f51fa2a
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fixes for Felix DSA driver calculation of tc-taprio guard bands
      
      This series fixes some bugs which are not quite new, but date from v5.13
      when static guard bands were enabled by Michael Walle to prevent
      tc-taprio overruns.
      
      The investigation started when Xiaoliang asked privately what is the
      expected max SDU for a traffic class when its minimum gate interval is
      10 us. The answer, as it turns out, is not an L1 size of 1250 octets,
      but 1245 octets, since otherwise, the switch will not consider frames
      for egress scheduling, because the static guard band is exactly as large
      as the time interval. The switch needs a minimum of 33 ns outside of the
      guard band to consider a frame for scheduling, and the reduction of the
      max SDU by 5 provides exactly for that.
      
      The fix for that (patch 1/3) is relatively small, but during testing, it
      became apparent that cut-through forwarding prevents oversized frame
      dropping from working properly. This is solved through the larger patch
      2/3. Finally, patch 3/3 fixes one more tc-taprio locking problem found
      through code inspection.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0f51fa2a
    • Vladimir Oltean's avatar
      net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set · a4bb481a
      Vladimir Oltean authored
      The read-modify-write of QSYS_TAG_CONFIG from vsc9959_sched_speed_set()
      runs unlocked with respect to the other functions that access it, which
      are vsc9959_tas_guard_bands_update(), vsc9959_qos_port_tas_set() and
      vsc9959_tas_clock_adjust(). All the others are under ocelot->tas_lock,
      so move the vsc9959_sched_speed_set() access under that lock as well, to
      resolve the concurrency.
      
      Fixes: 55a515b1 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4bb481a
    • Vladimir Oltean's avatar
      net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio · 843794bb
      Vladimir Oltean authored
      Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
      frames even way larger than 601 octets are transmitted even though these
      should be considered as oversized, according to the documentation, and
      dropped.
      
      Since oversized frame dropping depends on frame size, which is only
      known at the EOF stage, and therefore not at SOF when cut-through
      forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
      into consideration for traffic classes that are cut-through.
      
      Since cut-through forwarding has no UAPI to control it, and the driver
      enables it based on the mantra "if we can, then why not", the strategy
      is to alter vsc9959_cut_through_fwd() to take into consideration which
      tc's have oversize frame dropping enabled, and disable cut-through for
      them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
      cut-through determination process.
      
      There are 2 strategies for vsc9959_cut_through_fwd() to determine
      whether a tc has oversized dropping enabled or not. One is to keep a bit
      mask of traffic classes per port, and the other is to read back from the
      hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
      feature is enabled). We choose reading back from registers, because
      struct ocelot_port is shared with drivers (ocelot, seville) that don't
      support either cut-through nor tc-taprio, and we don't have a felix
      specific extension of struct ocelot_port. Furthermore, reading registers
      from the Felix hardware is quite cheap, since they are memory-mapped.
      
      Fixes: 55a515b1 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      843794bb
    • Vladimir Oltean's avatar
      net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet · 11afdc65
      Vladimir Oltean authored
      The blamed commit broke tc-taprio schedules such as this one:
      
      tc qdisc replace dev $swp1 root taprio \
              num_tc 8 \
              map 0 1 2 3 4 5 6 7 \
              queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
              base-time 0 \
              sched-entry S 0x7f 990000 \
              sched-entry S 0x80  10000 \
              flags 0x2
      
      because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
      band added earlier than its 'gate close' event, such that packet
      overruns won't occur in the worst case of the largest packet possible.
      
      Since guard bands are statically determined based on the per-tc
      QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
      we need to discuss what happens with TC 7 depending on kernel version,
      since the driver, prior to commit 55a515b1 ("net: dsa: felix: drop
      oversized frames with tc-taprio instead of hanging the port"), did not
      touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
      
      1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
        1518, and at gigabit this introduces a static guard band (independent
        of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
        time of 20 octets => 160 ns). But this is larger than the time window
        itself, of 10000 ns. So, the queue system never considers a frame with
        TC 7 as eligible for transmission, since the gate practically never
        opens, and these frames are forever stuck in the TX queues and hang
        the port.
      
      2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
        enabling oversized frame dropping, we make an effort to set
        QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
        one more role, which we did not take into account: per-tc static guard
        band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
        There is a discrepancy between what the driver thinks (that there is
        no guard band, and 100% of min_gate_len[tc] is available for egress
        scheduling) and what the hardware actually does (crops the equivalent
        of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
        means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
      
      In both cases, even minimum sized Ethernet frames are stuck on egress
      rather than being considered for scheduling on TC 7, even if they would
      fit given a proper configuration. Considering the current situation,
      with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
      octets in size are not eligible for oversized dropping (because they are
      smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
      for scheduling either, because the min_gate_len[7] (10000 ns) minus the
      guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
      octet == 9840 ns) minus the guard band auto-added for L1 overhead by
      QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 octets)
      leaves 0 ns for scheduling in the queue system proper.
      
      Investigating the hardware behavior, it becomes apparent that the queue
      system needs precisely 33 ns of 'gate open' time in order to consider a
      frame as eligible for scheduling to a tc. So the solution to this
      problem is to amend vsc9959_tas_guard_bands_update(), by giving the
      per-tc guard bands less space by exactly 33 ns, just enough for one
      frame to be scheduled in that interval. This allows the queue system to
      make forward progress for that port-tc, and prevents it from hanging.
      
      Fixes: 297c4de6 ("net: dsa: felix: re-enable TAS guard band mode")
      Reported-by: default avatarXiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11afdc65
  3. 06 Sep, 2022 3 commits
    • jerry.meng's avatar
      net: usb: qmi_wwan: add Quectel RM520N · e1091e22
      jerry.meng authored
      add support for Quectel RM520N which is based on Qualcomm SDX62 chip.
      
      0x0801: DIAG + NMEA + AT + MODEM + RMNET
      
      T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 10 Spd=480  MxCh= 0
      D:  Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=2c7c ProdID=0801 Rev= 5.04
      S:  Manufacturer=Quectel
      S:  Product=RM520N-GL
      S:  SerialNumber=384af524
      C:* #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=40 Driver=option
      E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=88(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Signed-off-by: default avatarjerry.meng <jerry-meng@foxmail.com>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Link: https://lore.kernel.org/r/tencent_E50CA8A206904897C2D20DDAE90731183C05@qq.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      e1091e22
    • Christian Marangi's avatar
      net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data · 42b998d4
      Christian Marangi authored
      of_device_get_match_data is called on priv->dev before priv->dev is
      actually set. Move of_device_get_match_data after priv->dev is correctly
      set to fix this kernel panic.
      
      Fixes: 3bb0844e ("net: dsa: qca8k: cache match data to speed up access")
      Signed-off-by: default avatarChristian Marangi <ansuelsmth@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220904215319.13070-1-ansuelsmth@gmail.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      42b998d4
    • Neal Cardwell's avatar
      tcp: fix early ETIMEDOUT after spurious non-SACK RTO · 686dc2db
      Neal Cardwell authored
      Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
      of a spurious non-SACK RTO could cause a connection to fail to clear
      retrans_stamp, causing a later RTO to very prematurely time out the
      connection with ETIMEDOUT.
      
      Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
      report:
      
      (*1) Send one data packet on a non-SACK connection
      
      (*2) Because no ACK packet is received, the packet is retransmitted
           and we enter CA_Loss; but this retransmission is spurious.
      
      (*3) The ACK for the original data is received. The transmitted packet
           is acknowledged.  The TCP timestamp is before the retrans_stamp,
           so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
           true without changing state to Open (because tcp_is_sack() is
           false), and tcp_process_loss() returns without calling
           tcp_try_undo_recovery().  Normally after undoing a CA_Loss
           episode, tcp_fastretrans_alert() would see that the connection
           has returned to CA_Open and fall through and call
           tcp_try_to_open(), which would set retrans_stamp to 0.  However,
           for non-SACK connections we hold the connection in CA_Loss, so do
           not fall through to call tcp_try_to_open() and do not set
           retrans_stamp to 0. So retrans_stamp is (erroneously) still
           non-zero.
      
           At this point the first "retransmission event" has passed and
           been recovered from. Any future retransmission is a completely
           new "event". However, retrans_stamp is erroneously still
           set. (And we are still in CA_Loss, which is correct.)
      
      (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
           packet is sent. Note: No data is transmitted between (*3) and
           (*4) and we disabled keep alives.
      
           The socket's timeout SHOULD be calculated from this point in
           time, but instead it's calculated from the prior "event" 16
           minutes ago (step (*2)).
      
      (*5) Because no ACK packet is received, the packet is retransmitted.
      
      (*6) At the time of the 2nd retransmission, the socket returns
           ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
           too far in the past (set at the time of (*2)).
      
      This commit fixes this bug by ensuring that we reuse in
      tcp_try_undo_loss() the same careful logic for non-SACK connections
      that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
      we factor out that logic into a new
      tcp_is_non_sack_preventing_reopen() helper and call that helper from
      both undo functions.
      
      Fixes: da34ac76 ("tcp: only undo on partial ACKs in CA_Loss")
      Reported-by: default avatarNagaraj Arankal <nagaraj.p.arankal@hpe.com>
      Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220903121023.866900-1-ncardwell.kernel@gmail.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      686dc2db
  4. 05 Sep, 2022 8 commits
    • David S. Miller's avatar
      Merge tag 'for-net-2022-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · beb43252
      David S. Miller authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix regression preventing ACL packet transmission
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      beb43252
    • Christophe JAILLET's avatar
      stmmac: intel: Simplify intel_eth_pci_remove() · 1621e70f
      Christophe JAILLET authored
      There is no point to call pcim_iounmap_regions() in the remove function,
      this frees a managed resource that would be release by the framework
      anyway.
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1621e70f
    • Greg Kroah-Hartman's avatar
      net: mvpp2: debugfs: fix memory leak when using debugfs_lookup() · fe2c9c61
      Greg Kroah-Hartman authored
      When calling debugfs_lookup() the result must have dput() called on it,
      otherwise the memory will leak over time.  Fix this up to be much
      simpler logic and only create the root debugfs directory once when the
      driver is first accessed.  That resolves the memory leak and makes
      things more obvious as to what the intent is.
      
      Cc: Marcin Wojtas <mw@semihalf.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Cc: stable <stable@kernel.org>
      Fixes: 21da57a2 ("net: mvpp2: add a debugfs interface for the Header Parser")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe2c9c61
    • David Lebrun's avatar
      ipv6: sr: fix out-of-bounds read when setting HMAC data. · 84a53580
      David Lebrun authored
      The SRv6 layer allows defining HMAC data that can later be used to sign IPv6
      Segment Routing Headers. This configuration is realised via netlink through
      four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and
      SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual
      length of the SECRET attribute, it is possible to provide invalid combinations
      (e.g., secret = "", secretlen = 64). This case is not checked in the code and
      with an appropriately crafted netlink message, an out-of-bounds read of up
      to 64 bytes (max secret length) can occur past the skb end pointer and into
      skb_shared_info:
      
      Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
      208		memcpy(hinfo->secret, secret, slen);
      (gdb) bt
       #0  seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
       #1  0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600,
          extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>,
          family=<optimized out>) at net/netlink/genetlink.c:731
       #2  0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00,
          family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775
       #3  genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792
       #4  0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>)
          at net/netlink/af_netlink.c:2501
       #5  0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803
       #6  0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000)
          at net/netlink/af_netlink.c:1319
       #7  netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>)
          at net/netlink/af_netlink.c:1345
       #8  0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921
      ...
      (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end
      $1 = 0xffff88800b1b76c0
      (gdb) p/x secret
      $2 = 0xffff88800b1b76c0
      (gdb) p slen
      $3 = 64 '@'
      
      The OOB data can then be read back from userspace by dumping HMAC state. This
      commit fixes this by ensuring SECRETLEN cannot exceed the actual length of
      SECRET.
      Reported-by: default avatarLucas Leong <wmliang.tw@gmail.com>
      Tested: verified that EINVAL is correctly returned when secretlen > len(secret)
      Fixes: 4f4853dc ("ipv6: sr: implement API to control SR HMAC structure")
      Signed-off-by: default avatarDavid Lebrun <dlebrun@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      84a53580
    • David S. Miller's avatar
      Merge branch 'bonding-fixes' · 060ad609
      David S. Miller authored
      Hangbin Liu says:
      
      ====================
      bonding: fix lladdr finding and confirmation
      
      This patch set fixed 3 issues when setting lladdr as bonding IPv6 target.
      Please see each patch for the details.
      
      v2: separate the patch to 3 parts
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      060ad609
    • Hangbin Liu's avatar
      bonding: accept unsolicited NA message · 592335a4
      Hangbin Liu authored
      The unsolicited NA message with all-nodes multicast dest address should
      be valid, as this also means the link could reach the target.
      
      Also rename bond_validate_ns() to bond_validate_na().
      Reported-by: default avatarLiLiang <liali@redhat.com>
      Fixes: 5e1eeef6 ("bonding: NS target should accept link local address")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      592335a4
    • Hangbin Liu's avatar
      bonding: add all node mcast address when slave up · fd16eb94
      Hangbin Liu authored
      When a link is enslave to bond, it need to set the interface down first.
      This makes the slave remove mac multicast address 33:33:00:00:00:01(The
      IPv6 multicast address ff02::1 is kept even when the interface down). When
      bond set the slave up, ipv6_mc_up() was not called due to commit c2edacf8
      ("bonding / ipv6: no addrconf for slaves separately from master").
      
      This is not an issue before we adding the lladdr target feature for bonding,
      as the mac multicast address will be added back when bond interface up and
      join group ff02::1.
      
      But after adding lladdr target feature for bonding. When user set a lladdr
      target, the unsolicited NA message with all-nodes multicast dest will be
      dropped as the slave interface never add 33:33:00:00:00:01 back.
      
      Fix this by calling ipv6_mc_up() to add 33:33:00:00:00:01 back when
      the slave interface up.
      Reported-by: default avatarLiLiang <liali@redhat.com>
      Fixes: 5e1eeef6 ("bonding: NS target should accept link local address")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd16eb94
    • Hangbin Liu's avatar
      bonding: use unspecified address if no available link local address · b7f14132
      Hangbin Liu authored
      When ns_ip6_target was set, the ipv6_dev_get_saddr() will be called to get
      available source address and send IPv6 neighbor solicit message.
      
      If the target is global address, ipv6_dev_get_saddr() will get any
      available src address. But if the target is link local address,
      ipv6_dev_get_saddr() will only get available address from our interface,
      i.e. the corresponding bond interface.
      
      But before bond interface up, all the address is tentative, while
      ipv6_dev_get_saddr() will ignore tentative address. This makes we can't
      find available link local src address, then bond_ns_send() will not be
      called and no NS message was sent. Finally bond interface will keep in
      down state.
      
      Fix this by sending NS with unspecified address if there is no available
      source address.
      Reported-by: default avatarLiLiang <liali@redhat.com>
      Fixes: 5e1eeef6 ("bonding: NS target should accept link local address")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b7f14132
  5. 04 Sep, 2022 1 commit
  6. 03 Sep, 2022 12 commits
  7. 02 Sep, 2022 6 commits
    • Luiz Augusto von Dentz's avatar
      Bluetooth: hci_sync: Fix hci_read_buffer_size_sync · be318363
      Luiz Augusto von Dentz authored
      hci_read_buffer_size_sync shall not use HCI_OP_LE_READ_BUFFER_SIZE_V2
      sinze that is LE specific, instead it is hci_le_read_buffer_size_sync
      version that shall use it.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216382
      Fixes: 26afbd82 ("Bluetooth: Add initial implementation of CIS connections")
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      be318363
    • Ivan Vecera's avatar
      iavf: Detach device during reset task · aa626da9
      Ivan Vecera authored
      iavf_reset_task() takes crit_lock at the beginning and holds
      it during whole call. The function subsequently calls
      iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs
      when userspace initiates during the reset task any ndo callback
      that runs under RTNL like iavf_open() because some of that
      functions tries to take crit_lock. This leads to classic A-B B-A
      deadlock scenario.
      
      To resolve this situation the device should be detached in
      iavf_reset_task() prior taking crit_lock to avoid subsequent
      ndos running under RTNL and reattach the device at the end.
      
      Fixes: 62fe2a86 ("i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability")
      Cc: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
      Cc: SlawomirX Laba <slawomirx.laba@intel.com>
      Tested-by: default avatarVitaly Grinberg <vgrinber@redhat.com>
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      aa626da9
    • Ivan Vecera's avatar
      i40e: Fix kernel crash during module removal · fb8396ae
      Ivan Vecera authored
      The driver incorrectly frees client instance and subsequent
      i40e module removal leads to kernel crash.
      
      Reproducer:
      1. Do ethtool offline test followed immediately by another one
      host# ethtool -t eth0 offline; ethtool -t eth0 offline
      2. Remove recursively irdma module that also removes i40e module
      host# modprobe -r irdma
      
      Result:
      [ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting
      [ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished
      [ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting
      [ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished
      [ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110
      [ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2
      [ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01
      [ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1
      [ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [ 8687.768755] #PF: supervisor read access in kernel mode
      [ 8687.773895] #PF: error_code(0x0000) - not-present page
      [ 8687.779034] PGD 0 P4D 0
      [ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G        W I        5.19.0+ #2
      [ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019
      [ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e]
      [ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b
      [ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202
      [ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000
      [ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000
      [ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000
      [ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0
      [ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008
      [ 8687.870342] FS:  00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000
      [ 8687.878427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0
      [ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 8687.905572] PKRU: 55555554
      [ 8687.908286] Call Trace:
      [ 8687.910737]  <TASK>
      [ 8687.912843]  i40e_remove+0x2c0/0x330 [i40e]
      [ 8687.917040]  pci_device_remove+0x33/0xa0
      [ 8687.920962]  device_release_driver_internal+0x1aa/0x230
      [ 8687.926188]  driver_detach+0x44/0x90
      [ 8687.929770]  bus_remove_driver+0x55/0xe0
      [ 8687.933693]  pci_unregister_driver+0x2a/0xb0
      [ 8687.937967]  i40e_exit_module+0xc/0xf48 [i40e]
      
      Two offline tests cause IRDMA driver failure (ETIMEDOUT) and this
      failure is indicated back to i40e_client_subtask() that calls
      i40e_client_del_instance() to free client instance referenced
      by pf->cinst and sets this pointer to NULL. During the module
      removal i40e_remove() calls i40e_lan_del_device() that dereferences
      pf->cinst that is NULL -> crash.
      Do not remove client instance when client open callbacks fails and
      just clear __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs
      to take care about this situation (when netdev is up and client
      is NOT opened) in i40e_notify_client_of_netdev_close() and
      calls client close callback only when __I40E_CLIENT_INSTANCE_OPENED
      is set.
      
      Fixes: 0ef2d5af ("i40e: KISS the client interface")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Tested-by: default avatarHelena Anna Dubel <helena.anna.dubel@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      fb8396ae
    • Przemyslaw Patynowski's avatar
      i40e: Fix ADQ rate limiting for PF · 45bb006d
      Przemyslaw Patynowski authored
      Fix HW rate limiting for ADQ.
      Fallback to kernel queue selection for ADQ, as it is network stack
      that decides which queue to use for transmit with ADQ configured.
      Reset PF after creation of VMDq2 VSIs required for ADQ, as to
      reprogram TX queue contexts in i40e_configure_tx_ring.
      Without this patch PF would limit TX rate only according to TC0.
      
      Fixes: a9ce82f7 ("i40e: Enable 'channel' mode in mqprio for TC configs")
      Signed-off-by: default avatarPrzemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: default avatarJan Sokolowski <jan.sokolowski@intel.com>
      Tested-by: default avatarBharathi Sreenivas <bharathi.sreenivas@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      45bb006d
    • Michal Swiatkowski's avatar
      ice: use bitmap_free instead of devm_kfree · 59ac3255
      Michal Swiatkowski authored
      pf->avail_txqs was allocated using bitmap_zalloc, bitmap_free should be
      used to free this memory.
      
      Fixes: 78b5713a ("ice: Alloc queue management bitmaps and arrays dynamically")
      Signed-off-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      59ac3255
    • Przemyslaw Patynowski's avatar
      ice: Fix DMA mappings leak · 7e753eb6
      Przemyslaw Patynowski authored
      Fix leak, when user changes ring parameters.
      During reallocation of RX buffers, new DMA mappings are created for
      those buffers. New buffers with different RX ring count should
      substitute older ones, but those buffers were freed in ice_vsi_cfg_rxq
      and reallocated again with ice_alloc_rx_buf. kfree on rx_buf caused
      leak of already mapped DMA.
      Reallocate ZC with xdp_buf struct, when BPF program loads. Reallocate
      back to rx_buf, when BPF program unloads.
      If BPF program is loaded/unloaded and XSK pools are created, reallocate
      RX queues accordingly in XDP_SETUP_XSK_POOL handler.
      
      Steps for reproduction:
      while :
      do
      	for ((i=0; i<=8160; i=i+32))
      	do
      		ethtool -G enp130s0f0 rx $i tx $i
      		sleep 0.5
      		ethtool -g enp130s0f0
      	done
      done
      
      Fixes: 617f3e1b ("ice: xsk: allocate separate memory for XDP SW ring")
      Signed-off-by: default avatarPrzemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      7e753eb6