1. 04 Aug, 2020 30 commits
    • sfc_ef100: read Design Parameters at probe time · adcfc348
      Edward Cree authored
      Several parts of the EF100 architecture are parameterised (to allow
       varying capabilities on FPGAs according to resource constraints), and
       these parameters are exposed to the driver through a TLV-encoded
       region of the BAR.
      For the most part we either don't care about these values at all or
       just need to sanity-check them against the driver's assumptions, but
       there are a number of TSO limits which we record so that we will be
       able to check against them in the TX path when handling GSO skbs.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      adcfc348
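The probe-time walk over a TLV-encoded region can be pictured roughly as follows. This is a userspace sketch under stated assumptions, not the driver's code: the one-byte type and length fields, the type codes, the expected page shift of 12, and every name (`dp_limits`, `dp_parse`, `DP_TSO_MAX_HDR_LEN`, ...) are hypothetical. The commit message does not describe the real EF100 encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; the commit message does not list the real ones. */
enum {
	DP_TSO_MAX_HDR_LEN = 0x01, /* recorded for later use in the TX path */
	DP_TSO_MAX_SEGS    = 0x02, /* recorded for later use in the TX path */
	DP_PAGE_SHIFT      = 0x03, /* only sanity-checked, never stored */
};

struct dp_limits {
	uint32_t tso_max_hdr_len;
	uint32_t tso_max_segs;
};

/* Read a little-endian value of up to 4 bytes. */
static uint32_t dp_read_le(const uint8_t *p, size_t len)
{
	uint32_t v = 0;
	size_t i;

	for (i = 0; i < len && i < 4; i++)
		v |= (uint32_t)p[i] << (8 * i);
	return v;
}

/* Returns 0 on success, -1 on a malformed region or a failed sanity check. */
int dp_parse(const uint8_t *buf, size_t size, struct dp_limits *out)
{
	size_t off = 0;

	while (off < size) {
		uint8_t type, len;
		const uint8_t *val;

		if (size - off < 2)
			return -1; /* truncated type/length header */
		type = buf[off];
		len = buf[off + 1];
		if (len > size - off - 2)
			return -1; /* value runs past the end of the region */
		val = buf + off + 2;

		switch (type) {
		case DP_TSO_MAX_HDR_LEN:
			out->tso_max_hdr_len = dp_read_le(val, len);
			break;
		case DP_TSO_MAX_SEGS:
			out->tso_max_segs = dp_read_le(val, len);
			break;
		case DP_PAGE_SHIFT:
			/* Assumed driver expectation, for illustration only. */
			if (dp_read_le(val, len) != 12)
				return -1;
			break;
		default:
			break; /* a parameter we do not care about: skip it */
		}
		off += 2 + (size_t)len;
	}
	return 0;
}
```

A caller would pass the mapped region and a zeroed `struct dp_limits`, then consult the recorded limits when segmenting GSO packets.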
    • sfc_ef100: fail the probe if NIC uses unsol_ev credits · 4496363b
      Edward Cree authored
      In the future, EF100 is planned to have a credit-based scheme for
       handling unsolicited events, which drivers will need to use in order
       to function correctly.  However, current EF100 hardware does not yet
       generate unsolicited events and the credit scheme has not yet been
       implemented in firmware.  To prevent compatibility problems later if
       the current driver is used with future firmware which does implement
       it, we check for the corresponding capability flag (which that
       future firmware will set), and if found, we refuse to probe.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4496363b
    • sfc_ef100: check firmware version at start-of-day · 8e737145
      Edward Cree authored
      Early in EF100 development there was a different format of event
       descriptor; if the NIC is somehow running the very old firmware
       which will use that format, fail the probe.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e737145
    • enetc: use napi_schedule to be compatible with PREEMPT_RT · 215602a8
      Jiafei Pan authored
      The driver calls napi_schedule_irqoff() from a context where, in RT,
      hardirqs are not disabled, since the IRQ handler is force-threaded.
      
      In the call path of this function, __raise_softirq_irqoff() is modifying
      its per-CPU mask of pending softirqs that must be processed, using
      or_softirq_pending(). The or_softirq_pending() function is not atomic,
      but since interrupts are supposed to be disabled, nobody should be
      preempting it, and the operation should be safe.
      
      Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case,
      it isn't safe, and the pending softirqs mask can get corrupted,
      resulting in softirqs being lost and never processed.
      
      To have common code that works with PREEMPT_RT and with mainline Linux,
      we can use plain napi_schedule() instead. The difference is that
      napi_schedule() (via __napi_schedule) also calls local_irq_save, which
      disables hardirqs if they aren't already. But, since they already are
      disabled in non-RT, this means that in practice we don't see any
      measurable difference in throughput or latency with this patch.
      Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      215602a8
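The lost-update hazard described in this message can be replayed deterministically in userspace. The sketch below is an illustration only, not kernel code: `pending`, `or_begin()`, `or_finish()` and the two `raise_*` helpers are hypothetical names, and the "interrupt" is simulated by calling the second raise by hand between the read and the write of the first.

```c
#include <stdint.h>

/* Stand-in for the per-CPU mask of pending softirqs. */
static uint32_t pending;

/* A non-atomic OR is a read followed by a write. Splitting it in two
 * lets the caller simulate being interrupted in between. */
static uint32_t or_begin(void)
{
	return pending;
}

static void or_finish(uint32_t snapshot, unsigned int nr)
{
	pending = snapshot | (1u << nr);
}

/* Interrupts "off": each OR runs to completion before the next starts. */
uint32_t raise_uninterrupted(unsigned int a, unsigned int b)
{
	pending = 0;
	or_finish(or_begin(), a);
	or_finish(or_begin(), b);
	return pending;
}

/* Interrupts "on": raising b lands between the read and write for a. */
uint32_t raise_interrupted(unsigned int a, unsigned int b)
{
	uint32_t snapshot;

	pending = 0;
	snapshot = or_begin();    /* threaded handler reads the mask */
	or_finish(or_begin(), b); /* a hardirq raises b in the meantime */
	or_finish(snapshot, a);   /* stale snapshot is written back */
	return pending;           /* bit b has been lost */
}
```

With `a = 3` and `b = 1`, the uninterrupted variant leaves both bits set, while the interrupted variant leaves only bit 3 set. Disabling interrupts around the update, as `napi_schedule()` does via `local_irq_save`, closes that window.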
    • dpaa2-eth: use napi_schedule to be compatible with PREEMPT_RT · 6c33ae1a
      Jiafei Pan authored
      The driver calls napi_schedule_irqoff() from a context where, in RT,
      hardirqs are not disabled, since the IRQ handler is force-threaded.
      
      In the call path of this function, __raise_softirq_irqoff() is modifying
      its per-CPU mask of pending softirqs that must be processed, using
      or_softirq_pending(). The or_softirq_pending() function is not atomic,
      but since interrupts are supposed to be disabled, nobody should be
      preempting it, and the operation should be safe.
      
      Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case,
      it isn't safe, and the pending softirqs mask can get corrupted,
      resulting in softirqs being lost and never processed.
      
      To have common code that works with PREEMPT_RT and with mainline Linux,
      we can use plain napi_schedule() instead. The difference is that
      napi_schedule() (via __napi_schedule) also calls local_irq_save, which
      disables hardirqs if they aren't already. But, since they already are
      disabled in non-RT, this means that in practice we don't see any
      measurable difference in throughput or latency with this patch.
      Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c33ae1a
    • Merge branch 'net-dsa-loop-Preparatory-changes-for-802-1Q-data-path' · d8f375ea
      David S. Miller authored
      net: dsa: loop: Preparatory changes for 802.1Q data path
      Florian Fainelli says:
      
      ====================
      These patches are all meant to help pave the way for an 802.1Q data path
      added to the mockup driver, making it more useful than just testing for
      configuration. Sending those out now since there is no real need to
      wait.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8f375ea
    • net: dsa: loop: Set correct number of ports · 947b6ef9
      Florian Fainelli authored
      We only support DSA_LOOP_NUM_PORTS in the switch, so do not tell the DSA
      core to allocate up to DSA_MAX_PORTS, which is nearly double (6 vs.
      11).
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      947b6ef9
    • net: dsa: loop: Wire-up MTU callbacks · c99194ed
      Florian Fainelli authored
      For now we simply store the port MTU into a per-port member.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c99194ed
    • net: dsa: loop: Move data structures to header · 6c84a589
      Florian Fainelli authored
      In preparation for adding support for a mockup data path, move the
      driver data structures to include/linux/dsa/loop.h such that we can
      share them between net/dsa/ and drivers/net/dsa/ later on.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c84a589
    • net: dsa: loop: Support 4K VLANs · 916a8d16
      Florian Fainelli authored
      Allocate a 4K array of VLANs instead of limiting ourselves to just 5,
      which is arbitrary.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      916a8d16
    • net: dsa: loop: PVID should be per-port · 81d4e8e0
      Florian Fainelli authored
      The PVID should be per-port. This is a preliminary change to support an
      802.1Q data path in the driver.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      81d4e8e0
    • cxgb4: add TC-MATCHALL IPv6 support · 59b328cf
      Rahul Lakkireddy authored
      Matching IPv6 traffic requires allocating separate, individual slots
      in TCAM. So, fetch additional slots to insert IPv6 rules. Also, fetch
      the cumulative stats of all the slots occupied by the Matchall rule.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59b328cf
    • net: dsa: sja1105: poll for extts events from a timer · af9fdd2b
      Vladimir Oltean authored
      The current poll interval is enough to ensure that rising and falling
      edge events are not lost for a 1 PPS signal with 50% duty cycle.
      
      But when we deliver the events to user space, it will try to infer
      whether they correspond to a rising or to a falling edge (the kernel
      driver doesn't know that either). User space will try to make that
      inference based on the time at which the PPS master had emitted the
      pulse (i.e. if it's a .0 time, it's a rising edge, if it's a .5 time,
      it's a falling edge).
      
      But there is no in-kernel API for retrieving the precise timestamp
      corresponding to a PPS master (aka perout) pulse. So user space has to
      guess even that. It will read the PTP time on the PPS master right after
      we've delivered the extts event, and declare that the PPS master time
      was just the closest integer second, based on 2 thresholds (lower than
      .25, or higher than .75, and ignore anything else).
      
      Except that, if we poll for extts events (and our hardware doesn't
      really help us, by not providing an interrupt), then there is a risk
      that the poll period (and therefore the time at which the event is
      delivered) might confuse user space.
      
      Because we are always scheduling the next extts poll at
      SJA1105_EXTTS_INTERVAL "from now" (that's the only thing that the
      schedule_delayed_work() API gives us), it means that the start time of
      the next delayed workqueue will always be shifted to the right a little
      bit (shifted with the SPI access duration of this workqueue run).
      In turn, because user space sees extts events that are non-periodic
      compared to the PPS master's time, this means that it might start making
      wrong guesses about rising/falling edge.
      
      To understand the effect, here is the output of ts2phc currently. Notice
      the 'src' timestamps of the 'SKIP extts' events, and how they have a
      large wander. They keep increasing until the upper limit for the ignore
      threshold (.75 seconds), after which the application starts ignoring the
      _other_ edge.
      
      ts2phc[26.624]: /dev/ptp3 SKIP extts index 0 at 21.449898912 src 21.657784518
      ts2phc[27.133]: adding tstamp 21.949894240 to clock /dev/ptp3
      ts2phc[27.133]: adding tstamp 22.000000000 to clock /dev/ptp1
      ts2phc[27.133]: /dev/ptp3 offset        640 s2 freq   +5112
      ts2phc[27.636]: /dev/ptp3 SKIP extts index 0 at 22.449889360 src 22.669398022
      ts2phc[28.140]: adding tstamp 22.949884376 to clock /dev/ptp3
      ts2phc[28.140]: adding tstamp 23.000000000 to clock /dev/ptp1
      ts2phc[28.140]: /dev/ptp3 offset         96 s2 freq   +4760
      ts2phc[28.644]: /dev/ptp3 SKIP extts index 0 at 23.449879504 src 23.677420422
      ts2phc[29.153]: adding tstamp 23.949874704 to clock /dev/ptp3
      ts2phc[29.153]: adding tstamp 24.000000000 to clock /dev/ptp1
      ts2phc[29.153]: /dev/ptp3 offset       -264 s2 freq   +4429
      ts2phc[29.656]: /dev/ptp3 SKIP extts index 0 at 24.449870008 src 24.689407238
      ts2phc[30.160]: adding tstamp 24.949865376 to clock /dev/ptp3
      ts2phc[30.160]: adding tstamp 25.000000000 to clock /dev/ptp1
      ts2phc[30.160]: /dev/ptp3 offset       -280 s2 freq   +4334
      ts2phc[30.664]: /dev/ptp3 SKIP extts index 0 at 25.449860760 src 25.697449926
      ts2phc[31.168]: adding tstamp 25.949856176 to clock /dev/ptp3
      ts2phc[31.168]: adding tstamp 26.000000000 to clock /dev/ptp1
      ts2phc[31.168]: /dev/ptp3 offset       -176 s2 freq   +4354
      ts2phc[31.672]: /dev/ptp3 SKIP extts index 0 at 26.449851584 src 26.705433606
      ts2phc[32.180]: adding tstamp 26.949846992 to clock /dev/ptp3
      ts2phc[32.180]: adding tstamp 27.000000000 to clock /dev/ptp1
      ts2phc[32.180]: /dev/ptp3 offset        -80 s2 freq   +4397
      ts2phc[32.684]: /dev/ptp3 SKIP extts index 0 at 27.449842384 src 27.717415110
      ts2phc[33.192]: adding tstamp 27.949837768 to clock /dev/ptp3
      ts2phc[33.192]: adding tstamp 28.000000000 to clock /dev/ptp1
      ts2phc[33.192]: /dev/ptp3 offset          0 s2 freq   +4453
      ts2phc[33.696]: /dev/ptp3 SKIP extts index 0 at 28.449833128 src 28.729412902
      ts2phc[34.200]: adding tstamp 28.949828472 to clock /dev/ptp3
      ts2phc[34.200]: adding tstamp 29.000000000 to clock /dev/ptp1
      ts2phc[34.200]: /dev/ptp3 offset          8 s2 freq   +4461
      ts2phc[34.704]: /dev/ptp3 SKIP extts index 0 at 29.449823816 src 29.737416038
      ts2phc[35.208]: adding tstamp 29.949819152 to clock /dev/ptp3
      ts2phc[35.208]: adding tstamp 30.000000000 to clock /dev/ptp1
      ts2phc[35.208]: /dev/ptp3 offset         -8 s2 freq   +4447
      ts2phc[35.712]: /dev/ptp3 SKIP extts index 0 at 30.449814496 src 30.745554982
      ts2phc[36.216]: adding tstamp 30.949809840 to clock /dev/ptp3
      ts2phc[36.216]: adding tstamp 31.000000000 to clock /dev/ptp1
      ts2phc[36.216]: /dev/ptp3 offset         -8 s2 freq   +4445
      ts2phc[36.468]: /dev/ptp3 SKIP extts index 0 at 31.449805184 src 31.501109446
      ts2phc[36.972]: adding tstamp 31.949800536 to clock /dev/ptp3
      ts2phc[36.972]: adding tstamp 32.000000000 to clock /dev/ptp1
      ts2phc[36.972]: /dev/ptp3 offset         -8 s2 freq   +4442
      ts2phc[37.480]: /dev/ptp3 SKIP extts index 0 at 32.449795896 src 32.513320070
      ts2phc[37.984]: adding tstamp 32.949791248 to clock /dev/ptp3
      ts2phc[37.984]: adding tstamp 33.000000000 to clock /dev/ptp1
      ts2phc[37.984]: /dev/ptp3 offset          0 s2 freq   +4448
      
      Fix that by taking the following measures:
      - Schedule the poll from a timer. Because we are really scheduling the
        timer periodically, the extts events delivered to user space are
        periodic too, and don't suffer from the "shift-to-the-right" effect.
      - Increase the poll frequency to 6 times a second. This imposes a smaller
        upper bound on the shift that can occur in the delivery time of extts
        events, and lets user space (ts2phc) always interpret correctly
        which events should be skipped and which shouldn't.
      - Move the SPI readout itself to the main PTP kernel thread, instead of
        the generic workqueue. This is because the timer runs in atomic
        context, but is also better than before, because if needed, we can
        chrt & taskset this kernel thread, to ensure it gets enough priority
        under load.
      
      After this patch, one can notice that the wander is greatly reduced, and
      that the latencies of one extts poll are not propagated to the next. The
      'src' timestamp that is skipped is never larger than .65 seconds (which
      means .15 seconds later than the time at which the real event
      occurred, and .10 seconds below the .75 upper threshold for ignoring
      the falling edge):
      
      ts2phc[40.076]: adding tstamp 34.949261296 to clock /dev/ptp3
      ts2phc[40.076]: adding tstamp 35.000000000 to clock /dev/ptp1
      ts2phc[40.076]: /dev/ptp3 offset         48 s2 freq   +4631
      ts2phc[40.568]: /dev/ptp3 SKIP extts index 0 at 35.449256496 src 35.595791078
      ts2phc[41.064]: adding tstamp 35.949251744 to clock /dev/ptp3
      ts2phc[41.064]: adding tstamp 36.000000000 to clock /dev/ptp1
      ts2phc[41.064]: /dev/ptp3 offset       -224 s2 freq   +4374
      ts2phc[41.552]: /dev/ptp3 SKIP extts index 0 at 36.449247088 src 36.579825574
      ts2phc[42.044]: adding tstamp 36.949242456 to clock /dev/ptp3
      ts2phc[42.044]: adding tstamp 37.000000000 to clock /dev/ptp1
      ts2phc[42.044]: /dev/ptp3 offset       -240 s2 freq   +4290
      ts2phc[42.536]: /dev/ptp3 SKIP extts index 0 at 37.449237848 src 37.563828774
      ts2phc[43.028]: adding tstamp 37.949233264 to clock /dev/ptp3
      ts2phc[43.028]: adding tstamp 38.000000000 to clock /dev/ptp1
      ts2phc[43.028]: /dev/ptp3 offset       -144 s2 freq   +4314
      ts2phc[43.520]: /dev/ptp3 SKIP extts index 0 at 38.449228656 src 38.547823238
      ts2phc[44.012]: adding tstamp 38.949224048 to clock /dev/ptp3
      ts2phc[44.012]: adding tstamp 39.000000000 to clock /dev/ptp1
      ts2phc[44.012]: /dev/ptp3 offset        -80 s2 freq   +4335
      ts2phc[44.508]: /dev/ptp3 SKIP extts index 0 at 39.449219432 src 39.535846118
      ts2phc[44.996]: adding tstamp 39.949214816 to clock /dev/ptp3
      ts2phc[44.996]: adding tstamp 40.000000000 to clock /dev/ptp1
      ts2phc[44.996]: /dev/ptp3 offset        -32 s2 freq   +4359
      ts2phc[45.488]: /dev/ptp3 SKIP extts index 0 at 40.449210192 src 40.515824678
      ts2phc[45.980]: adding tstamp 40.949205568 to clock /dev/ptp3
      ts2phc[45.980]: adding tstamp 41.000000000 to clock /dev/ptp1
      ts2phc[45.980]: /dev/ptp3 offset          8 s2 freq   +4390
      ts2phc[46.636]: /dev/ptp3 SKIP extts index 0 at 41.449200928 src 41.664176902
      ts2phc[47.132]: adding tstamp 41.949196288 to clock /dev/ptp3
      ts2phc[47.132]: adding tstamp 42.000000000 to clock /dev/ptp1
      ts2phc[47.132]: /dev/ptp3 offset          0 s2 freq   +4384
      ts2phc[47.620]: /dev/ptp3 SKIP extts index 0 at 42.449191656 src 42.648117190
      ts2phc[48.112]: adding tstamp 42.949187016 to clock /dev/ptp3
      ts2phc[48.112]: adding tstamp 43.000000000 to clock /dev/ptp1
      ts2phc[48.112]: /dev/ptp3 offset          0 s2 freq   +4384
      ts2phc[48.604]: /dev/ptp3 SKIP extts index 0 at 43.449182384 src 43.632112582
      ts2phc[49.100]: adding tstamp 43.949177736 to clock /dev/ptp3
      ts2phc[49.100]: adding tstamp 44.000000000 to clock /dev/ptp1
      ts2phc[49.100]: /dev/ptp3 offset         -8 s2 freq   +4376
      ts2phc[49.588]: /dev/ptp3 SKIP extts index 0 at 44.449173096 src 44.616136774
      ts2phc[50.080]: adding tstamp 44.949168464 to clock /dev/ptp3
      ts2phc[50.080]: adding tstamp 45.000000000 to clock /dev/ptp1
      ts2phc[50.080]: /dev/ptp3 offset          8 s2 freq   +4390
      ts2phc[50.572]: /dev/ptp3 SKIP extts index 0 at 45.449163816 src 45.600134662
      ts2phc[51.064]: adding tstamp 45.949159160 to clock /dev/ptp3
      ts2phc[51.064]: adding tstamp 46.000000000 to clock /dev/ptp1
      ts2phc[51.064]: /dev/ptp3 offset         -8 s2 freq   +4376
      ts2phc[51.556]: /dev/ptp3 SKIP extts index 0 at 46.449154528 src 46.584588550
      ts2phc[52.048]: adding tstamp 46.949149896 to clock /dev/ptp3
      ts2phc[52.048]: adding tstamp 47.000000000 to clock /dev/ptp1
      ts2phc[52.048]: /dev/ptp3 offset          0 s2 freq   +4382
      ts2phc[52.540]: /dev/ptp3 SKIP extts index 0 at 47.449145256 src 47.568132198
      ts2phc[53.032]: adding tstamp 47.949140616 to clock /dev/ptp3
      ts2phc[53.032]: adding tstamp 48.000000000 to clock /dev/ptp1
      ts2phc[53.032]: /dev/ptp3 offset          0 s2 freq   +4382
      ts2phc[53.524]: /dev/ptp3 SKIP extts index 0 at 48.449135968 src 48.552121446
      ts2phc[54.016]: adding tstamp 48.949131320 to clock /dev/ptp3
      ts2phc[54.016]: adding tstamp 49.000000000 to clock /dev/ptp1
      ts2phc[54.016]: /dev/ptp3 offset          0 s2 freq   +4382
      ts2phc[54.512]: /dev/ptp3 SKIP extts index 0 at 49.449126680 src 49.540147014
      ts2phc[55.000]: adding tstamp 49.949122040 to clock /dev/ptp3
      ts2phc[55.000]: adding tstamp 50.000000000 to clock /dev/ptp1
      ts2phc[55.000]: /dev/ptp3 offset          0 s2 freq   +4382
      ts2phc[55.492]: /dev/ptp3 SKIP extts index 0 at 50.449117400 src 50.520119078
      ts2phc[55.988]: adding tstamp 50.949112768 to clock /dev/ptp3
      ts2phc[55.988]: adding tstamp 51.000000000 to clock /dev/ptp1
      ts2phc[55.988]: /dev/ptp3 offset          8 s2 freq   +4390
      ts2phc[56.476]: /dev/ptp3 SKIP extts index 0 at 51.449108120 src 51.504175910
      ts2phc[57.132]: adding tstamp 51.949103480 to clock /dev/ptp3
      ts2phc[57.132]: adding tstamp 52.000000000 to clock /dev/ptp1
      ts2phc[57.132]: /dev/ptp3 offset          0 s2 freq   +4384
      ts2phc[57.624]: /dev/ptp3 SKIP extts index 0 at 52.449098840 src 52.651833574
      ts2phc[58.116]: adding tstamp 52.949094200 to clock /dev/ptp3
      ts2phc[58.116]: adding tstamp 53.000000000 to clock /dev/ptp1
      ts2phc[58.116]: /dev/ptp3 offset          8 s2 freq   +4392
      ts2phc[58.612]: /dev/ptp3 SKIP extts index 0 at 53.449089560 src 53.639826918
      ts2phc[59.100]: adding tstamp 53.949084920 to clock /dev/ptp3
      ts2phc[59.100]: adding tstamp 54.000000000 to clock /dev/ptp1
      ts2phc[59.100]: /dev/ptp3 offset          8 s2 freq   +4394
      ts2phc[59.592]: /dev/ptp3 SKIP extts index 0 at 54.449080272 src 54.619842278
      ts2phc[60.084]: adding tstamp 54.949075624 to clock /dev/ptp3
      ts2phc[60.084]: adding tstamp 55.000000000 to clock /dev/ptp1
      ts2phc[60.084]: /dev/ptp3 offset          8 s2 freq   +4397
      ts2phc[60.576]: /dev/ptp3 SKIP extts index 0 at 55.449070968 src 55.603885542
      ts2phc[61.068]: adding tstamp 55.949066312 to clock /dev/ptp3
      ts2phc[61.068]: adding tstamp 56.000000000 to clock /dev/ptp1
      ts2phc[61.068]: /dev/ptp3 offset          0 s2 freq   +4391
      ts2phc[61.560]: /dev/ptp3 SKIP extts index 0 at 56.449061680 src 56.587885798
      ts2phc[62.052]: adding tstamp 56.949057032 to clock /dev/ptp3
      ts2phc[62.052]: adding tstamp 57.000000000 to clock /dev/ptp1
      ts2phc[62.052]: /dev/ptp3 offset         -8 s2 freq   +4383
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af9fdd2b
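The "shift-to-the-right" effect and its interaction with the .25/.75 thresholds can be reproduced with a small userspace model. This is a rough sketch under several assumptions that are not in the commit message: falling edges occur exactly at .5 of each second, every poll takes a fixed `spi_us` to complete, user space reads the master time at the instant of delivery, and the first poll at or after an edge is the one that reports it. All function names and the example numbers (250 ms, 20 ms) are hypothetical.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

typedef uint64_t (*delivery_fn)(uint64_t edge_us, uint64_t interval_us,
				uint64_t spi_us);

/* Each poll re-arms itself interval_us "from now", i.e. after its own
 * SPI access, so the effective period is interval_us + spi_us. */
uint64_t delivery_rearm(uint64_t edge_us, uint64_t interval_us, uint64_t spi_us)
{
	uint64_t start = 0;

	while (start < edge_us)
		start += spi_us + interval_us;
	return start + spi_us;
}

/* A periodic timer fires at exact multiples of interval_us. */
uint64_t delivery_timer(uint64_t edge_us, uint64_t interval_us, uint64_t spi_us)
{
	uint64_t ticks = (edge_us + interval_us - 1) / interval_us;

	return ticks * interval_us + spi_us;
}

/* User space keeps an event only if the master time it reads is close
 * to an integer second: below .25 or above .75. */
int src_is_used(uint64_t src_us)
{
	uint64_t frac = src_us % USEC_PER_SEC;

	return frac < USEC_PER_SEC / 4 || frac > 3 * USEC_PER_SEC / 4;
}

/* Count falling edges that would wrongly be treated as rising edges. */
unsigned int count_misread_falling_edges(delivery_fn deliver,
					 unsigned int seconds,
					 uint64_t interval_us, uint64_t spi_us)
{
	unsigned int bad = 0;
	unsigned int s;

	for (s = 0; s < seconds; s++) {
		uint64_t edge = (uint64_t)s * USEC_PER_SEC + USEC_PER_SEC / 2;

		if (src_is_used(deliver(edge, interval_us, spi_us)))
			bad++;
	}
	return bad;
}
```

In this model, re-arming every 250 ms with a 20 ms SPI access delivers the falling edge of second 6 at 6.77 s, which crosses the .75 threshold. A timer firing about 6 times a second keeps the delivery delay below roughly 0.19 s, so falling edges are always skipped.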
    • mptcp: fix bogus sendmsg() return code under pressure · 8555c6bf
      Paolo Abeni authored
      In case of memory pressure, mptcp_sendmsg() may call
      sk_stream_wait_memory() after successfully transmitting some
      bytes. If the latter fails, we currently return the error code
      to user space, ignoring the successful transmission.
      
      Address the issue by always checking for transmitted bytes
      before mptcp_sendmsg() completes.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8555c6bf
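The return-value rule described in this message (report progress first, and fall back to the error only when nothing was sent) can be sketched in userspace as follows. `struct fake_sock` and `fake_sendmsg()` are hypothetical stand-ins, and the model assumes that waiting for memory always fails with `wait_err`. It is not the mptcp code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical socket model: 'room' bytes fit before memory pressure
 * sets in, after which waiting for memory fails with 'wait_err'. */
struct fake_sock {
	size_t room;
	int wait_err; /* negative errno */
};

long fake_sendmsg(struct fake_sock *sk, size_t len)
{
	size_t copied = 0;
	int err = 0;

	while (copied < len) {
		size_t chunk = len - copied;

		if (sk->room == 0) {
			/* Models a failing sk_stream_wait_memory() call. */
			err = sk->wait_err;
			break;
		}
		if (chunk > sk->room)
			chunk = sk->room;
		sk->room -= chunk;
		copied += chunk;
	}

	/* Report the bytes already sent; only a send that made no
	 * progress at all returns the error code. */
	return copied ? (long)copied : err;
}
```

A caller that asked for 300 bytes and got 100 back simply retries with the remainder, which is the usual short-write contract of `sendmsg()`.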
    • Merge branch 'mlxsw-Add-support-for-buffer-drop-traps' · f8deaea0
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Add support for buffer drop traps
      
      Petr says:
      
      A recent patch set added the ability to mirror buffer related drops
      (e.g., early drops) through a netdev. This patch set adds the ability to
      trap such packets to the local CPU for analysis.
      
      The trapping towards the CPU is configured by using tc-trap action
      instead of tc-mirred as was done when the packets were mirrored through
      a netdev. A future patch set will also add the ability to sample the
      dropped packets using tc-sample action.
      
      The buffer related drop traps are added to devlink, which means that the
      dropped packets can be reported to user space via the kernel's
      drop_monitor module.
      
      Patch set overview:
      
      Patch #1 adds the early_drop trap to devlink
      
      Patch #2 adds extack to a few devlink operations to facilitate better
      error reporting to user space. This is necessary - among other things -
      because the action of buffer drop traps cannot be changed in mlxsw
      
      Patch #3 performs a small refactoring in mlxsw, patch #4 fixes a bug that
      this patchset would trigger.
      
      Patches #5-#6 add the infrastructure required to support different traps
      / trap groups in mlxsw per-ASIC. This is required because buffer drop
      traps are not supported by Spectrum-1
      
      Patch #7 extends mlxsw to register the early_drop trap
      
      Patch #8 adds the offload logic for the "trap" action at a qevent block.
      
      Patch #9 adds a mlxsw-specific selftest.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8deaea0
    • selftests: mlxsw: RED: Test offload of trapping on RED qevents · 8fb6ac45
      Petr Machata authored
      Add a selftest for RED early_drop and mark qevents when a trap action is
      attached at the associated block.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8fb6ac45
    • mlxsw: spectrum_qdisc: Offload action trap for qevents · 54a92385
      Petr Machata authored
      When offloading action trap on a qevent, pass to_dev of NULL to the SPAN
      module to trigger the mirror to the CPU port. Query the buffer drops
      policer and use it for policing of the trapped traffic.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54a92385
    • mlxsw: spectrum_trap: Add early_drop trap · 6687e953
      Ido Schimmel authored
      As previously explained, packets that are dropped due to buffer related
      reasons (e.g., tail drop, early drop) can be mirrored to the CPU port.
      These packets are then trapped with one of the "mirror session" traps
      and their CQE includes the reason for which the packet was mirrored.
      
      Register with devlink a new trap, early_drop, and initialize the
      corresponding Rx listener with the appropriate mirror reason. Return an
      error if the user tries to change the trap's action, as this is not
      supported.
      
      Since Spectrum-1 does not support these traps, the above is only done
      for Spectrum-2 onwards.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6687e953
    • mlxsw: spectrum_trap: Allow for per-ASIC traps initialization · 869c7be9
      Ido Schimmel authored
      Subsequent patches will need to register different traps for Spectrum-1
      and Spectrum-2 onwards.
      
      Enable that by invoking a per-ASIC operation during traps
      initialization.
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      869c7be9
    • mlxsw: spectrum_trap: Allow for per-ASIC trap groups initialization · 36d1fd68
      Ido Schimmel authored
      Subsequent patches will need to register different trap groups for
      Spectrum-1 and Spectrum-2 onwards.
      
      Enable that by invoking a per-ASIC operation during trap groups
      initialization.
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36d1fd68
    • mlxsw: spectrum_span: On policer_id_base_ref_count, use dec_and_test · 928345c0
      Petr Machata authored
      When unsetting policer base, the SPAN code currently uses refcount_dec().
      However that function splats when the counter reaches zero, because
      reaching zero without actually testing is in general indicative of a
      missing cleanup. There is no cleanup to be done here, but nonetheless, use
      refcount_dec_and_test() as required.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      928345c0
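The difference between the two decrement flavours can be mimicked in userspace. The sketch below is a simplified model with hypothetical names (`struct fake_refcount`, `fake_refcount_dec()`, `fake_refcount_dec_and_test()`). It counts warnings instead of printing a splat, is not atomic, and assumes the caller holds at least one reference.

```c
#include <stdbool.h>

struct fake_refcount {
	unsigned int refs;
	unsigned int warnings; /* number of "hit zero untested" reports */
};

/* Plain decrement: reaching zero here is reported, because the caller
 * cannot have noticed that the last reference just went away. */
void fake_refcount_dec(struct fake_refcount *r)
{
	if (--r->refs == 0)
		r->warnings++;
}

/* Decrement and tell the caller whether this was the last reference,
 * so that any cleanup (even an empty one) is an explicit decision. */
bool fake_refcount_dec_and_test(struct fake_refcount *r)
{
	return --r->refs == 0;
}
```

Dropping the last reference through `fake_refcount_dec()` bumps `warnings`, whereas the `_and_test` variant returns `true` and leaves it untouched. The caller is free to ignore the result when there is nothing to clean up.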
    • mlxsw: spectrum_trap: Use 'size_t' for array sizes · 76ba292c
      Ido Schimmel authored
      Use 'size_t' instead of 'u64' for array sizes, as this is the correct
      type to use for expressions involving sizeof().
      Suggested-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      76ba292c
    • devlink: Pass extack when setting trap's action and group's parameters · c88e11e0
      Ido Schimmel authored
      A later patch will refuse to set the action of certain traps in mlxsw
      and also to change the policer binding of certain groups. Pass extack so
      that failure could be communicated clearly to user space.
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c88e11e0
    • devlink: Add early_drop trap · 08e335f6
      Amit Cohen authored
      Add the packet trap that can report packets that were ECN marked due to RED
      AQM.
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08e335f6
    • fib: Fix undef compile warning · 80fbbb16
      YueHaibing authored
      net/core/fib_rules.c:26:7: warning: "CONFIG_IP_MULTIPLE_TABLES" is not defined, evaluates to 0 [-Wundef]
       #elif CONFIG_IP_MULTIPLE_TABLES
             ^~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fixes: 8b66a6fd ("fib: fix another fib_rules_ops indirect call wrapper problem")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-By: Brian Vazquez <brianvv@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80fbbb16
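The -Wundef warning quoted in the commit above fires because `#elif NAME` evaluates an undefined macro as 0. A minimal sketch of one common way to avoid it is to test with `defined()` instead; this is an illustration, not the actual patch, and `CONFIG_EXAMPLE_FIRST`, `CONFIG_EXAMPLE_SECOND`, `SELECTED_BRANCH`, and `selected_branch()` are hypothetical names.

```c
/*
 * Hypothetical config macros: neither is defined in this sketch.
 * Writing "#elif CONFIG_EXAMPLE_SECOND" would silently evaluate the
 * undefined name as 0 and trigger -Wundef; "defined()" asks only
 * whether the macro exists, so no warning is produced.
 */
#if defined(CONFIG_EXAMPLE_FIRST)
#define SELECTED_BRANCH 1
#elif defined(CONFIG_EXAMPLE_SECOND)
#define SELECTED_BRANCH 2
#else
#define SELECTED_BRANCH 0
#endif

/* Returns which preprocessor branch was compiled in. */
static int selected_branch(void)
{
	return SELECTED_BRANCH;
}
```

With neither macro defined, the `#else` branch is selected and the file compiles cleanly under `-Wundef`.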
    • Geliang Tang's avatar
      mptcp: use mptcp_for_each_subflow in mptcp_stream_accept · 190f8b06
      Geliang Tang authored
      Use mptcp_for_each_subflow in mptcp_stream_accept instead of
      open-coding.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      190f8b06
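The general idea behind replacing an open-coded loop with an iteration helper, as the commit above does, can be sketched in plain C. This is a simplified illustration that assumes a singly linked list; the types `struct subflow` and `struct conn`, the macro `for_each_subflow`, and `count_subflows()` are hypothetical and are not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a connection and its subflows. */
struct subflow {
	int id;
	struct subflow *next;
};

struct conn {
	struct subflow *subflows;	/* head of the list, NULL when empty */
};

/* Iteration helper: hides the list-walking details from every caller. */
#define for_each_subflow(conn, sf) \
	for ((sf) = (conn)->subflows; (sf) != NULL; (sf) = (sf)->next)

/* Counts the subflows using the helper instead of an open-coded loop. */
static int count_subflows(const struct conn *c)
{
	const struct subflow *sf;
	int n = 0;

	for_each_subflow(c, sf)
		n++;
	return n;
}
```

Callers that use the helper keep working unchanged if the underlying list representation is later modified, which is the usual motivation for this kind of cleanup.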
    • David S. Miller's avatar
      Merge tag 'mac80211-next-for-davem-2020-08-03' of... · ee494f42
      David S. Miller authored
      Merge tag 'mac80211-next-for-davem-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      A few more changes, notably:
       * handle new SAE (WPA3 authentication) status codes in the correct way
       * fix a while that should be an if instead, avoiding infinite loops
       * handle beacon filtering changing better
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee494f42
    • Jisheng Zhang's avatar
      net: stmmac: fix failed to suspend if phy based WOL is enabled · 01f4d47a
      Jisheng Zhang authored
      With the latest net-next tree, testing suspend/resume after enabling
      WOL produces the following error:
      
      [  487.086365] dpm_run_callback(): mdio_bus_suspend+0x0/0x30 returns -16
      [  487.086375] PM: Device stmmac-0:00 failed to suspend: error -16
      
      -16 means -EBUSY; this is because I didn't enable wakeup of the correct
      device when implementing the phy based WOL feature. To be honest, I caught
      the issue while implementing phy based WOL and fixed it locally, but
      forgot to amend the phy based WOL patch. Today, I found the issue again
      by testing the net-next tree.
      Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      01f4d47a
    • Ioana-Ruxandra Stăncioi's avatar
      seg6_iptunnel: Refactor seg6_lwt_headroom out of uapi header · 88fab21c
      Ioana-Ruxandra Stăncioi authored
      Refactor the function seg6_lwt_headroom out of the seg6_iptunnel.h uapi
      header, because it is only used in seg6_iptunnel.c. Moreover, it is only
      used in the kernel code, as indicated by the "#ifdef __KERNEL__".
      Suggested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Ioana-Ruxandra Stăncioi <stancioi@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      88fab21c
    • Jianfeng Wang's avatar
      tcp: apply a floor of 1 for RTT samples from TCP timestamps · 730e700e
      Jianfeng Wang authored
      For retransmitted packets, TCP needs to resort to using TCP timestamps
      for computing RTT samples. In the common case where the data and ACK
      fall in the same 1-millisecond interval, TCP senders with millisecond-
      granularity TCP timestamps compute a ca_rtt_us of 0. This ca_rtt_us
      of 0 propagates to rs->rtt_us.
      
      This value of 0 can cause performance problems for congestion control
      modules. For example, in BBR, the zero min_rtt sample can bring the
      min_rtt and BDP estimate down to 0, reduce snd_cwnd and result in a
      low throughput. It would be hard to mitigate this with filtering in
      the congestion control module, because the proper floor to apply would
      depend on the method of RTT sampling (using timestamp options or
      internally-saved transmission timestamps).
      
      This fix applies a floor of 1 for the RTT sample delta from TCP
      timestamps, so that seq_rtt_us, ca_rtt_us, and rs->rtt_us will be at
      least 1 * (USEC_PER_SEC / TCP_TS_HZ).
      
      Note that the receiver RTT computation in tcp_rcv_rtt_measure() and
      min_rtt computation in tcp_update_rtt_min() both already apply a floor
      of 1 timestamp tick, so this commit makes the code more consistent in
      avoiding this edge case of a value of 0.
      Signed-off-by: Jianfeng Wang <jfwang@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Kevin Yang <yyd@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      730e700e
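The floor described in the commit above can be illustrated with a small user-space sketch. It is not the kernel code: the helper name `ts_rtt_sample_us` is hypothetical, and `TCP_TS_HZ` is assumed to be 1000 (millisecond timestamp ticks), matching the 1-millisecond granularity the message describes.

```c
#include <stdint.h>

#define USEC_PER_SEC	1000000
#define TCP_TS_HZ	1000	/* assumption: millisecond timestamp ticks */

/*
 * Hypothetical helper: converts the difference between the current
 * timestamp and the echoed timestamp into an RTT sample in microseconds.
 * A delta of 0 ticks (data and ACK in the same tick) is raised to 1 tick,
 * so the sample is never 0.
 */
static int64_t ts_rtt_sample_us(uint32_t now_ts, uint32_t echoed_ts)
{
	uint32_t delta = now_ts - echoed_ts;	/* unsigned math handles wraparound */

	if (delta == 0)
		delta = 1;			/* floor of one timestamp tick */
	return (int64_t)delta * (USEC_PER_SEC / TCP_TS_HZ);
}
```

Under these assumptions, a data segment and ACK that share the same tick yield a sample of 1000 microseconds rather than 0, so a congestion control module that tracks the minimum RTT never sees a zero sample from this path.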
  2. 03 Aug, 2020 10 commits