1. 17 Jan, 2018 31 commits
    • bnxt_en: Expand bnxt_check_rings() to check all resources. · 8f23d638
      Michael Chan authored
      bnxt_check_rings() is called by ethtool, XDP setup, and ndo_setup_tc()
      to see if there are enough resources to support the new configuration.
      Expand the call to test all resources if the firmware supports the new
      API.  With the more flexible resource allocation scheme, this call must
      be made to check that all resources are available before committing to
      allocate the resources.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f23d638
    • bnxt_en: Implement new method for the PF to assign SRIOV resources. · 4673d664
      Michael Chan authored
      Instead of the old method of evenly dividing the resources to the VFs,
      use the new firmware API to specify min and max resources for each VF.
      This way, there is more flexibility for each VF to allocate more or fewer
      resources.
      
      The min is the absolute minimum for each VF to function.  The max is the
      global resources minus the resources used by the PF.  Each VF is
      guaranteed the min.  Up to max resources may be available for some VFs.
      
      The PF driver can use one of 2 strategies specified in NVRAM to assign
      the resources: the legacy strategy of evenly dividing the resources,
      or the new flexible strategy.
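
The min/max arithmetic described above can be sketched as follows. This is only an illustration for a single resource type: `struct vf_limits`, `vf_limits_flexible()` and `vf_limits_legacy()` are hypothetical names, not the driver's or the firmware's API, and modelling the legacy strategy as min == max is an assumption.

```c
#include <assert.h>

/* Hypothetical per-VF limits for one resource type (e.g. RX rings). */
struct vf_limits {
    unsigned int min; /* guaranteed to every VF */
    unsigned int max; /* upper bound a VF may be able to reach */
};

/*
 * Flexible strategy as described in the commit message: every VF is
 * guaranteed vf_min, and the max is the global pool minus what the PF uses.
 */
static struct vf_limits vf_limits_flexible(unsigned int global_total,
                                           unsigned int pf_used,
                                           unsigned int vf_min)
{
    struct vf_limits l;

    assert(pf_used <= global_total);
    l.min = vf_min;
    l.max = global_total - pf_used;
    return l;
}

/*
 * Legacy strategy: divide what the PF leaves over evenly between the VFs.
 * Assumption: each VF then gets a fixed share, so min equals max.
 */
static struct vf_limits vf_limits_legacy(unsigned int global_total,
                                         unsigned int pf_used,
                                         unsigned int num_vfs)
{
    struct vf_limits l;

    assert(pf_used <= global_total && num_vfs > 0);
    l.min = l.max = (global_total - pf_used) / num_vfs;
    return l;
}
```

With 128 resources, 16 used by the PF and 8 VFs, the legacy split gives each VF a fixed 14, while the flexible scheme guarantees the chosen minimum and lets a VF request up to 112.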
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4673d664
    • bnxt_en: Reserve resources for RFS. · 6a1eef5b
      Michael Chan authored
      In bnxt_rfs_capable(), add call to reserve vnic resources to support
      NTUPLE.  Return true if we can successfully reserve enough vnics.
      Otherwise, reserve the minimum 1 VNIC for normal operations not
      supporting NTUPLE and return false.
      
      Also, suppress warning message about not enough resources for NTUPLE when
      only 1 RX ring is in use.  NTUPLE filters by definition require multiple
      RX rings.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a1eef5b
    • bnxt_en: Implement new method to reserve rings. · 674f50a5
      Michael Chan authored
      The new method will call firmware to reserve the desired tx, rx, cmpl
      rings, ring groups, stats context, and vnic resources.  A second query
      call will check the actual resources that firmware is able to reserve.
      The driver will then trim and adjust based on the actual resources
      provided by firmware.  The driver will then reserve the final resources
      in use.
      
      This method is a more flexible way of using hardware resources.  The
      resources are not fixed and can be adjusted by firmware.  The driver
      adapts to the available resources that the firmware can reserve for
      the driver.
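
The reserve, query, trim, reserve-again sequence described above can be sketched roughly as below. Everything here is hypothetical: `struct fake_fw`, `fw_reserve()` and `fw_query()` merely stand in for the firmware calls, only three of the resource types are modelled, and the real driver's trimming logic is more involved.

```c
#include <assert.h>

/* Hypothetical subset of the resources named in the commit message. */
struct ring_counts {
    unsigned int tx, rx, cmpl;
};

/* Stand-in for firmware: what it can offer and what it has reserved. */
struct fake_fw {
    struct ring_counts avail;
    struct ring_counts reserved;
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* "Reserve" call: the fake firmware grants at most what is available. */
static void fw_reserve(struct fake_fw *fw, const struct ring_counts *want)
{
    fw->reserved.tx = min_u(want->tx, fw->avail.tx);
    fw->reserved.rx = min_u(want->rx, fw->avail.rx);
    fw->reserved.cmpl = min_u(want->cmpl, fw->avail.cmpl);
}

/* "Query" call: report what was actually reserved. */
static struct ring_counts fw_query(const struct fake_fw *fw)
{
    return fw->reserved;
}

static struct ring_counts reserve_rings(struct fake_fw *fw,
                                        struct ring_counts want)
{
    struct ring_counts got;

    assert(fw);
    fw_reserve(fw, &want);               /* 1. ask for the desired rings  */
    got = fw_query(fw);                  /* 2. check what was granted     */
    want.tx = min_u(want.tx, got.tx);    /* 3. trim to the granted counts */
    want.rx = min_u(want.rx, got.rx);
    want.cmpl = min_u(want.cmpl, got.cmpl);
    fw_reserve(fw, &want);               /* 4. reserve the final counts   */
    return want;
}
```

The point of the second reservation is that the driver commits only to the trimmed numbers it will actually use.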
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      674f50a5
    • bnxt_en: Set initial default RX and TX ring numbers the same in combined mode. · 58ea801a
      Michael Chan authored
      In combined mode, the driver is currently not setting RX and TX ring
      numbers the same when firmware can allocate more RX than TX or vice versa.
      This will confuse the user as the ethtool convention assumes they are the
      same in combined mode.  Fix it by adding bnxt_trim_dflt_sh_rings() to trim
      RX and TX ring numbers to be the same as the completion ring number in
      combined mode.
      
      Note that if TCs are enabled and/or XDP is enabled, the number of TX rings
      will not be the same as RX rings in combined mode.
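
As an illustration of the trimming idea (not the body of bnxt_trim_dflt_sh_rings() itself), the sketch below assumes that in combined mode the completion ring count is the smaller of the RX and TX counts, and that both are then trimmed to it. `struct dflt_rings` and `trim_shared_rings()` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical default ring counts for combined (shared) mode. */
struct dflt_rings {
    unsigned int rx, tx, cmpl;
};

/*
 * Make RX and TX equal to the completion ring count.
 * Assumption: one completion ring serves an RX/TX pair, so the usable
 * completion ring count is min(rx, tx).
 */
static void trim_shared_rings(struct dflt_rings *r)
{
    assert(r);
    r->cmpl = r->rx < r->tx ? r->rx : r->tx;
    r->rx = r->cmpl;
    r->tx = r->cmpl;
}
```

If firmware can allocate 8 RX rings but only 4 TX rings, this yields 4 of each, which matches what ethtool users expect to see for combined channels.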
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58ea801a
    • bnxt_en: Add the new firmware API to query hardware resources. · be0dd9c4
      Michael Chan authored
      The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
      resources.  Use the new API when it is supported by firmware.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be0dd9c4
    • bnxt_en: Refactor hardware resource data structures. · 6a4f2947
      Michael Chan authored
      In preparation for new firmware APIs to allocate hardware resources,
      add a new struct bnxt_hw_resc to hold various min, max and reserved
      resources.  This new structure is common for PFs and VFs.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a4f2947
    • bnxt_en: Restore MSIX after disabling SRIOV. · 80fcaf46
      Michael Chan authored
      After SRIOV has been enabled and disabled, the MSIX vectors assigned to
      the VFs have to be re-initialized.  Otherwise they cannot be re-used by
      the PF.  For example, increasing the number of PF rings after disabling
      SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.
      
      To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
      NIC, clear and re-init MSIX, and re-open the NIC.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80fcaf46
    • bnxt_en: Refactor bnxt_close_nic(). · 86e953db
      Michael Chan authored
      Add a new __bnxt_close_nic() function to do all the work previously done
      in bnxt_close_nic() except waiting for SRIOV configuration.  The new
      function will be used in the next patch as part of SRIOV cleanup.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86e953db
    • bnxt_en: Update firmware interface to 1.9.0. · 894aa69a
      Michael Chan authored
      The version has new firmware APIs to allocate PF/VF resources more
      flexibly.
      
      New toolchains were used to generate this file, resulting in a one-time
      large diffstat.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      894aa69a
    • Merge branch 'dwmac-meson8b-clock-fixes-for-Meson8b' · ee81098e
      David S. Miller authored
      Martin Blumenstingl says:
      
      ====================
      dwmac-meson8b: clock fixes for Meson8b
      
      this series is now successfully tested, thus we think it's ready to be
      applied to your net-next tree.
      
      Emiliano reported [0] that he couldn't get dwmac-meson8b to work on his
      Odroid-C1. This is the (hopefully) final version of this series, which
      was successfully tested.
      
      Due to the fact that the public S805/S905/S912 datasheets all seem to
      be outdated regarding the description of the PRG_ETH0 (also called
      PRG_ETHERNET_ADDR0) register Linus Lüssing offered to help testing with
      an oscilloscope and an Odroid-C1. I would like to say HUGE thanks to him
      at this point as he spent hours figuring out the effects of the bits
      that are (thought to be) relevant to get Ethernet working on the
      Odroid-C1.
      We tested three scenarios, all based on version 3 of this series:
      1) MPLL2 at ~500MHz, m250_div set to 1, bit 10 enabled
      this resulted in a clock rate twice as high as expected at the RGMII TX
      clock pin (250MHz instead of 125MHz for Gbit connections and 50MHz
      instead of 25MHz for 100Mbit/s connections). it did not change the
      rate at the XTAL_IN pin of PHY (which stayed consistently at 25MHz)
      2) MPLL2 at ~250MHz, m250_div set to 1, bit 10 disabled
      the oscilloscope shows "no clock" for the RGMII TX clock pin at its
      highest resolution (and random rates at lower resolutions). XTAL_IN is
      still at 25MHz
      3) MPLL2 at ~250MHz, m250_div set to 1, bit 10 enabled
      this resulted in a 125MHz signal at the RGMII TX clock pin for Gbit
      speeds and 25MHz for 100Mbit/s - both values are as expected. The rate
      on the XTAL_IN pin was at 25MHz
      -> boot-logs (with the PRG_ETH0 register value) and screenshots from the
      readings of the oscilloscope can be found at:
      https://metameute.de/~tux/linux/amlogic/odroidc1/ethernet/
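
The three readings are consistent with a simple model in which the RGMII TX clock is the MPLL2 rate divided by m250_div, then by a fixed 2, and gated by bit 10 (the interpretation given in the v3 changelog of this letter). A minimal sketch of that model, with the hypothetical helper name `rgmii_tx_rate_hz()` and nominal round rates instead of the measured ones:

```c
#include <assert.h>

/*
 * Model of the RGMII TX clock output at Gbit speed.
 * Assumptions: bit 10 acts as a gate, and a fixed divide-by-2 follows
 * the m250_div divider.
 */
static unsigned long rgmii_tx_rate_hz(unsigned long mpll2_hz,
                                      unsigned int m250_div,
                                      int bit10_enabled)
{
    assert(m250_div >= 1);
    if (!bit10_enabled)
        return 0; /* gate closed: no clock on the pin */
    return mpll2_hz / m250_div / 2;
}
```

Scenario 1 (500MHz, divider 1, gate on) gives 250MHz, scenario 2 (gate off) gives no clock, and scenario 3 (250MHz, divider 1, gate on) gives the expected 125MHz.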
      
      Version 4 of this series is based on the results from Linus Lüssing's
      help with the oscilloscope and Odroid-C1.
      Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
      only partially test this. @Emiliano: Could you please give this version
      a try and let me know about the results (preferably with a "Tested-by"
      if it works)?
      You obviously still need your two "ARM: dts: meson8b" patches which
      - add the "amlogic,meson8b-dwmac" compatible to meson8b.dtsi
      - enable Ethernet on the Odroid-C1 (according to your last test a TX
        delay of 4ns is required to make it work properly)
      
      When testing on Meson8b this also needs a fix for the MPLL clock driver:
      "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
      https://patchwork.kernel.org/patch/10131677/
      
      I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
      and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
      fine (so let's hope that this also fixes your Meson8b issue :)).
      
      changes since v4 at [4]:
      - dropped "RFT" status since Jerome tested this series successfully!
      - dropped PATCH #2 ("simplify generating the clock names"). I will
        improve the whole clock registration in a separate series. since that
        patch didn't really improve anything I dropped it for now
      - added Jerome's Acked-/Reviewed-/Tested-by's - many thanks!
      
      changes since v3 at [3]:
      - renamed the function in PATCH #1 from meson8b_init_rgmii_clk to
        meson8b_init_rgmii_tx_clk since we now know what the register bits
        mean
      - rewrote PATCH #3 because bit 10 is a gate clock and it seems that
        there is an internal fixed divide-by-2 clock. see the patch
        description for a detailed explanation
      - updated the description of PATCH #4 and #5 as the clock we're trying
        to fix is the "RGMII TX" clock (old version stated that this is the
        "RGMII clock" or "PHY reference clock"). also updated the numbers in
        the description now that we have the clock hierarchy right (at least
        we hope so)
      
      changes since v2 at [2]:
      - added PATCH #2 to make the following patch easier
      - Emiliano reported that there's currently another bug in the
        dwmac-meson8b driver which prevents it from working with RGMII PHYs on
        Meson8b: bit 10 of the PRG_ETH0 register configures a clock gate
        (instead of a divide by 5 or divide by 10 clock divider). This has not
        been visible on GXBB and later due to the input clock which always led
        to a selection of "divide by 10" (which is done internally in the IP
        block, but the bit actually means "enable RGMII clock output").
        PATCH #3 was added to address this issue.
      - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
        updated and the patch itself rebased because the m25_div clock was
        removed with the new PATCH #3 (so some of the statements were not
        valid anymore)
      
      changes since v1 at [1]:
      - changed the subject of the cover-letter to indicate that this is all
        about the RGMII clock
      - added PATCH #1 which ensures that we don't unnecessarily change the
        parent clocks in RMII mode (and also makes the code easier to
        understand)
      - changed subject of PATCH #2 (formerly PATCH #1) to state that this
        is about the RGMII clock
      - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
      - replaced PATCH #3 (formerly PATCH #2) with one that sets
        CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
        on Meson8b correctly
      
      [0] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
      [1] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
      [2] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005861.html
      [3] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005899.html
      [4] http://lists.infradead.org/pipermail/linux-amlogic/2018-January/006125.html
      ====================
      Tested-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee81098e
    • net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock · fb7d38a7
      Martin Blumenstingl authored
      On Meson8b the only valid input clock is MPLL2. The bootloader
      configures that to run at 500002394Hz which cannot be divided evenly
      down to 125MHz using the m250_div clock. Currently the common clock
      framework chooses a m250_div of 2 - with the internal fixed
      "divide by 10" this results in a RGMII TX clock of 125001197Hz (1197Hz
      above the requested 125MHz).
      
      Letting the common clock framework propagate the rate changes up to the
      parent of m250_mux allows us to get the best possible clock rate. With
      this patch the common clock framework calculates a rate of
      very-close-to-250MHz (249999701Hz to be exact) for the MPLL2 clock
      (which is the mux input). Dividing that by 2 (which is an internal,
      fixed divider for the RGMII TX clock) gives us an RGMII TX clock of
      124999850Hz (which is only 150Hz off the requested 125MHz, compared to
      1197Hz based on the MPLL2 rate set by u-boot and the Amlogic GPL kernel
      sources).
      
      SoCs from the Meson GX series are not affected by this change because
      the input clock is FCLK_DIV2 whose rate cannot be changed (which is fine
      since it's running at 1GHz, so it's already a multiple of 250MHz and
      125MHz).
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fb7d38a7
    • net: stmmac: dwmac-meson8b: fix setting the RGMII TX clock on Meson8b · 433c6cab
      Martin Blumenstingl authored
      Meson8b only supports MPLL2 as clock input. The rate of the MPLL2 clock
      set by Odroid-C1's u-boot is close to (but not exactly) 500MHz. The
      exact rate is 500002394Hz, which is calculated in
      drivers/clk/meson/clk-mpll.c using the following formula:
      DIV_ROUND_UP_ULL((u64)parent_rate * SDM_DEN, (SDM_DEN * n2) + sdm)
      Odroid-C1's u-boot configures MPLL2 with the following values:
      - SDM_DEN = 16384
      - SDM = 1638
      - N2 = 5
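
The formula can be checked with a small standalone program. The helper names below are hypothetical user-space stand-ins for the kernel macro, and the parent rate of 2550000000Hz is an assumption (it is not stated in this commit message); it is used here because it reproduces the quoted 500002394Hz.

```c
#include <assert.h>
#include <stdint.h>

#define SDM_DEN 16384ULL

/* User-space stand-in for the kernel's DIV_ROUND_UP_ULL(). */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

/* MPLL output rate for a given parent rate and (sdm, n2) setting. */
static uint64_t mpll_rate_hz(uint64_t parent_rate, uint64_t sdm, uint64_t n2)
{
    return div_round_up_u64(parent_rate * SDM_DEN, (SDM_DEN * n2) + sdm);
}
```

With SDM = 1638 and N2 = 5 the divisor is 16384 * 5 + 1638 = 83558, and an assumed 2550000000Hz parent yields 500002394Hz after rounding up.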
      
      The 250MHz clock (m250_div) inside dwmac-meson8b driver is derived from
      the MPLL2 clock. Due to MPLL2 running slightly faster than 500MHz the
      common clock framework chooses a divider which is too big to generate
      the 250MHz clock (a divider of 2 would be needed, but this is rounded up
      to a divider of 3). This breaks the RTL8211F RGMII PHY on Odroid-C1
      because it requires a (close to) 125MHz RGMII TX clock (on Gbit speeds,
      the IP block internally divides that down to 25MHz on 100Mbit/s
      connections and 2.5MHz on 10Mbit/s connections - we don't need any
      special configuration for that).
      
      Round the divider to the closest value to prevent this issue on Meson8b.
      This means we'll now end up with a clock rate for the RGMII TX clock of
      125001197Hz (= 125MHz plus 1197Hz), which is close-enough to 125MHz.
      This has no effect on the Meson GX SoCs since there fclk_div2 is used as
      input clock, which has a rate of 1000MHz (and thus is divisible cleanly
      to 250MHz and 125MHz).
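
The effect of the rounding mode on the m250_div divider can be illustrated with plain integer helpers. This is a sketch under the assumption of simple round-up versus round-to-closest division; `div_round_up()` and `div_round_closest()` are hypothetical names, not the common clock framework's divider code.

```c
#include <assert.h>
#include <stdint.h>

/* Divider chosen when the quotient is always rounded up. */
static uint64_t div_round_up(uint64_t parent_hz, uint64_t target_hz)
{
    assert(target_hz > 0);
    return (parent_hz + target_hz - 1) / target_hz;
}

/* Divider chosen when the quotient is rounded to the closest integer. */
static uint64_t div_round_closest(uint64_t parent_hz, uint64_t target_hz)
{
    assert(target_hz > 0);
    return (parent_hz + target_hz / 2) / target_hz;
}
```

For a 500002394Hz parent and a 250MHz target, rounding up picks a divider of 3 while rounding to the closest value picks 2. For the 1000MHz input used on Meson GX both modes pick 4, which is why those SoCs are unaffected.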
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      433c6cab
    • net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration · 4f6a71b8
      Martin Blumenstingl authored
      Tests (using an oscilloscope and an Odroid-C1 board with a RTL8211F
      RGMII PHY) have shown that the PRG_ETH0 register behaves as follows:
      - bit 4 is a mux to choose between two parent clocks. according to the
        public S805 datasheet the only supported parent clock is MPLL2 (this
        was not verified using the oscilloscope).
        The public S805/S905 datasheet claims that this bit is reserved.
      - bits 9:7 control a one-based divider (register value 1 means "divide
        by 1", etc.) for the input clock. we call this clock the "m250_div"
        clock because its value is always supposed to be (close to) 250MHz
        (see below for an explanation).
        The description in the public S805/S905 datasheet is a bit cryptic,
        but it comes down to "input clock = 250MHz * value" (which could also
        be expressed as "250MHz = input clock / value")
      - there seems to be an internal fixed divide-by-2 clock which takes the
        output from the m250_div and divides it by 2. This is not unusual on
        Amlogic SoCs, since the SDIO (MMC) driver also uses an internal fixed
        divide-by-2 clock.
        This is not documented in the public S805/S905 datasheet
      - bit 10 controls a gate clock which enables or disables the RGMII TX
        clock (which is an output on the MAC/SoC and an input in the PHY). we
        call this the "rgmii_tx_en" clock. if this bit is set to "0" the RGMII
        TX clock output is close to 0
        The description for this bit in the public S805/S905 datasheet is
        "Generate 25MHz clock for PHY". Based on these tests it's believed
        that this is wrong, and should probably read "Generate the 125MHz
        RGMII TX clock for the PHY"
      - the RGMII TX clock has to be set to 125MHz - the IP block adjusts the
        output (automatically) depending on the line speed (RGMII specifies
        that Gbit connections use a 125MHz clock, 100Mbit/s connections use a
        25MHz clock and 10Mbit/s connections use a 2.5MHz clock. only Gbit and
        100Mbit/s were tested with an oscilloscope). Due to the requirement
        that this clock always has to be set to 125MHz and due to the fixed
        divide-by-2 parent clock this means that m250_div will always end up
        with a rate of (close to) 250MHz.
      - bits 6:5 are the TX delay, which is also named "clock phase" in some
        of Amlogic's older GPL kernel sources.
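
The bit layout listed above can be summarised as a small encode/decode sketch. The struct and function names are hypothetical and only mirror the bit positions given in this commit message; they are not the driver's actual macros.

```c
#include <assert.h>
#include <stdint.h>

/* PRG_ETH0 fields as described above (names are illustrative only). */
struct prg_eth0_fields {
    unsigned int clk_mux;     /* bit 4: parent clock selection        */
    unsigned int tx_delay;    /* bits 6:5: TX delay ("clock phase")   */
    unsigned int m250_div;    /* bits 9:7: one-based input divider    */
    unsigned int rgmii_tx_en; /* bit 10: gate for the RGMII TX clock  */
};

static uint32_t prg_eth0_encode(struct prg_eth0_fields f)
{
    assert(f.clk_mux <= 1 && f.tx_delay <= 3);
    assert(f.m250_div <= 7 && f.rgmii_tx_en <= 1);
    return ((uint32_t)f.clk_mux << 4) |
           ((uint32_t)f.tx_delay << 5) |
           ((uint32_t)f.m250_div << 7) |
           ((uint32_t)f.rgmii_tx_en << 10);
}

static struct prg_eth0_fields prg_eth0_decode(uint32_t reg)
{
    struct prg_eth0_fields f;

    f.clk_mux = (reg >> 4) & 0x1;
    f.tx_delay = (reg >> 5) & 0x3;
    f.m250_div = (reg >> 7) & 0x7;
    f.rgmii_tx_en = (reg >> 10) & 0x1;
    return f;
}
```

Only the four fields discussed here are modelled; the other bits of the register are ignored by this sketch.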
      
      The PHY also has an XTAL_IN pin where a 25MHz clock has to be provided.
      Tests with the oscilloscope have shown that this is routed to a crystal
      right next to the RTL8211F PHY. The same seems to be true on the Khadas
      VIM2 (which uses a GXM SoC) board - however the 25MHz crystal is on the
      other side of the PCB there.
      
      This updates the clocks in the dwmac-meson8b driver by replacing the
      "m25_div" with the "rgmii_tx_en" clock and additionally introducing a
      fixed divide-by-2 clock between "m250_div" and "rgmii_tx_en".
      Now we also need to set a frequency of 125MHz on the RGMII clock
      (as opposed to the 25MHz we set before, with that non-existent
      divide-by-5-or-10 divider).
      
      Special thanks go to Linus Lüssing for testing the various bits and
      checking the results with an oscilloscope on his Odroid-C1!
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Acked-by: Jerome Brunet <jbrunet@baylibre.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f6a71b8
    • net: stmmac: dwmac-meson8b: only configure the clocks in RGMII mode · 37512b42
      Martin Blumenstingl authored
      None of the m25_div_clk, m250_div_clk and m250_mux_clk clocks is used in
      RMII mode. The m25_div_clk output is routed to the RGMII PHY's "RGMII
      clock".
      This means that we don't need to configure the clocks in RMII mode. The
      driver however did this - with no effect since the clocks are not routed
      to the PHY in RMII mode.
      
      While here also rename meson8b_init_clk to meson8b_init_rgmii_tx_clk to
      make it easier to understand the code.
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37512b42
    • net: sched: red: don't reset the backlog on every stat dump · 416ef9b1
      Jakub Kicinski authored
      Commit 0dfb33a0 ("sch_red: report backlog information") copied
      child's backlog into RED's backlog.  Back then RED did not maintain
      its own backlog counts.  This has changed after commit 2ccccf5f
      ("net_sched: update hierarchical backlog too") and commit d7f4f332
      ("sch_red: update backlog as well").  Copying is no longer necessary.
      
      Tested:
      
      $ tc -s qdisc show dev veth0
      qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
       Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 1260b 14p requeues 14
        marked 0 early 0 pdrop 0 other 0
      qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
       Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
       backlog 1260b 14p requeues 14
      
      Recently RED offload was added.  We need to make sure drivers don't
      depend on resetting the stats.  This means backlog should be treated
      like any other statistic:
      
        total_stat = new_hw_stat - prev_hw_stat;
      
      Adjust mlxsw.
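
The "treat backlog like any other statistic" rule can be sketched as a delta-based update. This is an illustration only: `struct hw_counter` and `counter_update()` are hypothetical names, and accumulating the delta into a running software-side total is an assumption about how a driver would apply the formula above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one hardware-backed statistic. */
struct hw_counter {
    uint64_t prev_hw; /* last value read from hardware        */
    uint64_t total;   /* value reported to the stack so far   */
};

/*
 * Apply the difference between two successive hardware readings.
 * Unsigned arithmetic wraps modulo 2^64, so a reading that went down
 * (as a backlog can) still moves the total in the right direction.
 */
static void counter_update(struct hw_counter *c, uint64_t new_hw)
{
    assert(c);
    c->total += new_hw - c->prev_hw;
    c->prev_hw = new_hw;
}
```

Because only the delta is applied, nothing depends on the reported value being reset between dumps.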
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Nogah Frankel <nogahf@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      416ef9b1
    • net/mlx5: Fix build break · 2d83619d
      Saeed Mahameed authored
      The latest merge between net and net-next introduced a compiler assert in
      mlx5 driver.  In hca_cap_bits older fields are kept along with newer
      fields that should have replaced them.
      
      Fixes: c02b3741 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d83619d
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c02b3741
      David S. Miller authored
      Overlapping changes all over.
      
      The mini-qdisc bits were a little bit tricky, however.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c02b3741
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 7018d1b3
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-01-17
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Add initial BPF map offloading for nfp driver. So far only
         programs were supported, w/o being able to access maps.
         Offloaded programs are right now only allowed to perform map
         lookups, and control path is responsible for populating the
         maps. BPF core infrastructure along with nfp implementation is
         provided, from Jakub.
      
      2) Various follow-ups to Josef's BPF error injections. More
         specifically that includes: properly check whether the error
         injectable event is on function entry or not, remove the percpu
         bpf_kprobe_override and rather compare instruction pointer
         with original one, separate error-injection from kprobes since
         it's not limited to it, add injectable error types in order to
         specify what is the expected type of failure, and last but not
         least also support the kernel's fault injection framework, all
         from Masami.
      
      3) Various misc improvements and cleanups to the libbpf Makefile.
         That is, fix permissions when installing BPF header files, remove
         unused variables and functions, and also install the libbpf.h
         header, from Jesper.
      
      4) When offloading to nfp JIT and the BPF insn is unsupported in the
         JIT, then reject right at verification time. Also fix libbpf with
         regards to ELF section name matching by properly treating the
         program type as prefix. Both from Quentin.
      
      5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler.
         This is needed, for example, when building libbfd from source as
         bpftool doesn't supply a config.h for bfd.h. Fix from Jiong.
      
      6) xdp_convert_ctx_access() is simplified since it doesn't need to
         set target size during verification, from Jesper.
      
      7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE
         program types, from Roman.
      
      8) Various functions in BPF cpumap were not declared static, from Wei.
      
      9) Fix a double semicolon in BPF samples, from Luis.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7018d1b3
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 8cbab92d
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "We had a few more items creep up over the last week. Given we are in
        -rc8, these are obviously limited to bugs that have a big downside and
        for which we are certain of the fix.
      
        The first is a straight up oops bug that all you have to do is read
        the code to see it's a guaranteed 100% oops bug.
      
        The second is a use-after-free issue. We get away lucky if the queue
        we are shutting down is empty, but if it isn't, we can end up oopsing.
        We really need to drain the queue before destroying it.
      
        The final one is an issue with bad user input causing us to access our
        port array out of bounds. While fixing the array out of bounds issue,
        it was noticed that the original code did the same thing twice (the
        call to rdma_ah_set_port_num()), so its removal is not balanced by a
        readd elsewhere, it was already where it needed to be in addition to
        where it didn't need to be.
      
        Summary:
      
         - Oops fix in hfi1 driver
      
         - use-after-free issue in iser-target
      
         - use of user supplied array index without proper checking"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/mlx5: Fix out-of-bound access while querying AH
        IB/hfi1: Prevent a NULL dereference
        iser-target: Fix possible use-after-free in connection establishment error
      8cbab92d
    • Merge branch 'bpf-libbpf-cleanups' · e8a9d968
      Daniel Borkmann authored
      Jesper Dangaard Brouer says:
      
      ====================
      This patchset contains some small improvements and cleanup for
      the Makefile in tools/lib/bpf/.
      
      It worries me that the libbpf.so shared library is not versioned,
      but it is not addressed in this patchset.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e8a9d968
    • libbpf: Makefile set specified permission mode · 7110d80d
      Jesper Dangaard Brouer authored
      The third parameter to do_install was not used by $(INSTALL) command.
      Fix this by only setting the -m option when the third parameter is supplied.
      
      The use of a third parameter was introduced in commit eb54e522 ("bpf:
      install libbpf headers on 'make install'").
      
      Without this change, the header files are installed as executable files (755).
      
      Fixes: eb54e522 ("bpf: install libbpf headers on 'make install'")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7110d80d
    • libbpf: cleanup Makefile, remove unused elements · 63c85910
      Jesper Dangaard Brouer authored
      The plugin_dir_SQ variable is not used, remove it.
      The function update_dir is also unused, remove it.
      The variable $VERSION_FILES is empty, remove it.
      
      These all originate from the introduction of the Makefile, and are likely a copy-paste
      from tools/lib/traceevent/Makefile.
      
      Fixes: 1b76c13e ("bpf tools: Introduce 'bpf' library and add bpf feature check")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      63c85910
    • libbpf: install the header file libbpf.h · 7d386c62
      Jesper Dangaard Brouer authored
      It seems like an oversight not to install the header file for libbpf,
      given the libbpf.so + libbpf.a files are installed.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7d386c62
    • Merge branch 'bpf-various-improvements' · f2f742f4
      Daniel Borkmann authored
      Jakub Kicinski says:
      
      ====================
      This series combines a number of random improvements ranging from
      libbpf to nfp driver.  NFP patches make better use of the verifier
      log.  There is a requested adjustment to the map offload code, and
      a warning fix for a W=1 build to the disassembler.  Quentin also
      fixes the libbpf program type detection, while Jiong allows the use
      of libbfd compiled from source.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f2f742f4
    • Quentin Monnet's avatar
      nfp: bpf: reject program on instructions unknown to the JIT compiler · 74801e50
      Quentin Monnet authored
      If an eBPF instruction is unknown to the driver JIT compiler, we can
      reject the program at verification time.
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      74801e50
    • Jakub Kicinski's avatar
      nfp: bpf: print map lookup problems into verifier log · 7dfa4d87
      Jakub Kicinski authored
      Use the verifier log to output error messages if map lookup
      can't be offloaded.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7dfa4d87
    • Quentin Monnet's avatar
      libbpf: fix string comparison for guessing eBPF program type · d77be689
      Quentin Monnet authored
      libbpf is able to deduce the type of a program from the name of the ELF
      section in which it is located. However, the comparison is made on the
      first n characters, n being determined with sizeof() applied to the
      reference string (e.g. "xdp"). When such section names are supposed to
      receive a suffix separated with a slash (e.g. "kprobe/"), using sizeof()
      takes the final NUL character of the reference string into account,
      which implies that both strings must be equal. Instead, the desired
      behaviour would consist in taking the length of the string, *without*
      accounting for the ending NUL character, and to make sure the reference
      string is a prefix to the ELF section name.
      
      Subtract 1 from the total size of the string to obtain the length used
      for the comparison.
      
      Fixes: 583c9009 ("libbpf: add ability to guess program type based on section name")
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      d77be689
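
The sizeof() pitfall described in d77be689 can be reproduced in a few lines of plain C. This is a minimal sketch, not the libbpf code: the macro and function names are hypothetical, and only the standard strncmp() behaviour is assumed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helpers for illustration; not taken from libbpf. */

/* Buggy form: sizeof("kprobe/") is 8 because it counts the trailing NUL,
 * so strncmp() also compares that NUL and only an exact match succeeds. */
#define SEC_MATCH_BUGGY(name, ref) \
	(strncmp((name), (ref), sizeof(ref)) == 0)

/* Fixed form: compare only the characters of the reference string,
 * which makes the reference a prefix test on the section name. */
#define SEC_MATCH_PREFIX(name, ref) \
	(strncmp((name), (ref), sizeof(ref) - 1) == 0)

static bool is_kprobe_buggy(const char *sec)
{
	return SEC_MATCH_BUGGY(sec, "kprobe/");
}

static bool is_kprobe_fixed(const char *sec)
{
	return SEC_MATCH_PREFIX(sec, "kprobe/");
}
```

With the buggy form, a section named "kprobe/sys_open" is not recognised, because the eighth character compared is 's' against NUL. The fixed form accepts it.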
    • Jiong Wang's avatar
      tools: bpftool: add -DPACKAGE when including bfd.h · 39b72ccd
      Jiong Wang authored
      bfd.h requires config.h to be included first unless PACKAGE or
      PACKAGE_VERSION is defined.
      
        /* PR 14072: Ensure that config.h is included first.  */
        #if !defined PACKAGE && !defined PACKAGE_VERSION
        #error config.h must be included before this header
        #endif
      
      This check was introduced in May 2012. It doesn't show up in bfd.h
      on some Linux distributions, probably because the distributions remove it
      when building the package.
      
      However, sometimes the user might build libbfd from source code and then
      link bpftool against it. In this case, bfd.h is the original version, so we
      need to define PACKAGE or PACKAGE_VERSION.
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      39b72ccd
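
The effect of defining PACKAGE can be seen without bfd.h installed. The sketch below copies the guard quoted in 39b72ccd into a standalone file as a stand-in for the real header. The value "bpftool" and the helper package_name() are assumptions for illustration; in a real build the macro would normally come from the compiler command line (for example a -DPACKAGE=... flag) rather than from a #define in the source.

```c
/* Assumed definition; in practice this would be passed with -DPACKAGE=... */
#ifndef PACKAGE
#define PACKAGE "bpftool"
#endif

/* Stand-in for the guard at the top of an unmodified bfd.h.
 * Without the definition above, compilation stops here. */
#if !defined PACKAGE && !defined PACKAGE_VERSION
#error config.h must be included before this header
#endif

/* Hypothetical helper so the macro value can be inspected. */
static const char *package_name(void)
{
	return PACKAGE;
}
```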
    • Jakub Kicinski's avatar
      bpf: annotate bpf_insn_print_t with __printf · 48b32563
      Jakub Kicinski authored
      Functions of type bpf_insn_print_t take printf-like format
      string, mark the type accordingly.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      48b32563
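
Marking a callback type as printf-like lets the compiler check format strings at every call made through the pointer. The following is a userspace sketch of that idea, assuming GCC or Clang. PRINTF_LIKE, insn_print_t, struct log_buf, log_print() and dump_insn() are hypothetical names; the kernel's own __printf() macro and bpf_insn_print_t type are not reproduced here.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userspace equivalent of a printf-format annotation. */
#define PRINTF_LIKE(fmt_idx, args_idx) \
	__attribute__((format(printf, fmt_idx, args_idx)))

/* Callback type: context pointer, then format string, then varargs.
 * The attribute makes the compiler check calls through this pointer. */
typedef PRINTF_LIKE(2, 3) void (*insn_print_t)(void *ctx, const char *fmt, ...);

struct log_buf {
	char data[128];
	size_t len;
};

/* Appends formatted text to a log_buf, truncating when it is full. */
PRINTF_LIKE(2, 3)
static void log_print(void *ctx, const char *fmt, ...)
{
	struct log_buf *log = ctx;
	size_t room = sizeof(log->data) - log->len;
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(log->data + log->len, room, fmt, args);
	va_end(args);

	if (n > 0)
		log->len += (size_t)n < room - 1 ? (size_t)n : room - 1;
}

/* A mismatched argument here (e.g. passing a pointer for %d) would
 * trigger a -Wformat warning because of the annotated typedef. */
static void dump_insn(insn_print_t print, void *ctx, int idx, int imm)
{
	print(ctx, "insn %d: imm=%d\n", idx, imm);
}
```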
    • Jakub Kicinski's avatar
      bpf: offload: make bpf_offload_dev_match() reject host+host case · 0a2d28ff
      Jakub Kicinski authored
      Daniel suggests it would be more logical for bpf_offload_dev_match()
      to return false if either the program or the map is not offloaded,
      rather than treating the case where neither is offloaded as a
      "matching CPU/host device".
      
      This makes no functional difference today, since the verifier only calls
      bpf_offload_dev_match() when one of the objects is offloaded.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0a2d28ff
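
The change in matching semantics from 0a2d28ff can be shown with a simplified model. This is a sketch under stated assumptions, not the kernel function: struct obj, dev_match_old() and dev_match_new() are hypothetical, and a NULL offload_dev stands for an object that lives on the host.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: offload_dev is NULL for a host (non-offloaded) object. */
struct obj {
	const void *offload_dev;
};

/* Old behaviour: two host objects count as a match. */
static bool dev_match_old(const struct obj *prog, const struct obj *map)
{
	if (!prog->offload_dev)
		return !map->offload_dev;
	if (!map->offload_dev)
		return false;
	return prog->offload_dev == map->offload_dev;
}

/* New behaviour: a match requires both objects to be offloaded,
 * and to the same device. */
static bool dev_match_new(const struct obj *prog, const struct obj *map)
{
	if (!prog->offload_dev || !map->offload_dev)
		return false;
	return prog->offload_dev == map->offload_dev;
}
```

The two versions differ only for the host+host input, which matches the commit's note that callers see no functional change as long as at least one object is offloaded.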
  2. 16 Jan, 2018 9 commits
    • Luis de Bethencourt's avatar
      samples/bpf: Fix trailing semicolon · 4c38f74c
      Luis de Bethencourt authored
      The trailing semicolon is an empty statement that does nothing,
      so remove it.
      Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4c38f74c
    • Wei Yongjun's avatar
      bpf: cpumap: make some functions static · 0fe875c5
      Wei Yongjun authored
      Fixes the following sparse warnings:
      
      kernel/bpf/cpumap.c:146:6: warning:
       symbol '__cpu_map_queue_destructor' was not declared. Should it be static?
      kernel/bpf/cpumap.c:225:16: warning:
       symbol 'cpu_map_build_skb' was not declared. Should it be static?
      kernel/bpf/cpumap.c:340:26: warning:
       symbol '__cpu_map_entry_alloc' was not declared. Should it be static?
      kernel/bpf/cpumap.c:398:6: warning:
       symbol '__cpu_map_entry_free' was not declared. Should it be static?
      kernel/bpf/cpumap.c:441:6: warning:
       symbol '__cpu_map_entry_replace' was not declared. Should it be static?
      kernel/bpf/cpumap.c:454:5: warning:
       symbol 'cpu_map_delete_elem' was not declared. Should it be static?
      kernel/bpf/cpumap.c:467:5: warning:
       symbol 'cpu_map_update_elem' was not declared. Should it be static?
      kernel/bpf/cpumap.c:505:6: warning:
       symbol 'cpu_map_free' was not declared. Should it be static?
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0fe875c5
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b45a53be
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers.
      
       2) Memory leak in key_notify_policy(), from Steffen Klassert.
      
       3) Fix overflow with bpf arrays, from Daniel Borkmann.
      
       4) Fix RDMA regression with mlx5 due to mlx5 no longer using
          pci_irq_get_affinity(), from Saeed Mahameed.
      
       5) Missing RCU read locking in nl80211_send_iface() when it calls
          ieee80211_bss_get_ie(), from Dominik Brodowski.
      
       6) cfg80211 should check dev_set_name()'s return value, from Johannes
          Berg.
      
       7) Missing module license tag in 9p protocol, from Stephen Hemminger.
      
       8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike
          Maloney.
      
       9) Fix endless loop in netlink extack code, from David Ahern.
      
      10) TLS socket layer sets inverted error codes, resulting in an endless
          loop. From Robert Hering.
      
      11) Revert openvswitch erspan tunnel support, it's mis-designed and we
          need to kill it before it goes into a real release. From William Tu.
      
      12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
        net, sched: fix panic when updating miniq {b,q}stats
        qed: Fix potential use-after-free in qed_spq_post()
        nfp: use the correct index for link speed table
        lan78xx: Fix failure in USB Full Speed
        sctp: do not allow the v4 socket to bind a v4mapped v6 address
        sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
        sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
        ibmvnic: Fix pending MAC address changes
        netlink: extack: avoid parenthesized string constant warning
        ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
        net: Allow neigh contructor functions ability to modify the primary_key
        sh_eth: fix dumping ARSTR
        Revert "openvswitch: Add erspan tunnel support."
        net/tls: Fix inverted error codes to avoid endless loop
        ipv6: ip6_make_skb() needs to clear cork.base.dst
        sctp: avoid compiler warning on implicit fallthru
        net: ipv4: Make "ip route get" match iif lo rules again.
        netlink: extack needs to be reset each time through loop
        tipc: fix a memory leak in tipc_nl_node_get_link()
        ipv6: fix udpv6 sendmsg crash caused by too small MTU
        ...
      b45a53be
    • Colin Ian King's avatar
      bnxt_en: don't update cpr->rx_bytes with uninitialized length len · e7e70fa6
      Colin Ian King authored
      Currently in the cases where cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP or
      CMP_TYPE_RX_L2_TPA_END_CMP the exit path updates cpr->rx_bytes with an
      uninitialized length len.  Fix this by adding a new exit path that does
      not update the cpr stats with the bogus length len and remove the unused
      label next_rx_no_prod.
      
      Detected by CoverityScan, CID#1463807 ("Uninitialized scalar variable")
      Fixes: 6a8788f2 ("bnxt_en: add support for software dynamic interrupt moderation")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e7e70fa6
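
The shape of the fix in e7e70fa6 is a second exit label that skips the byte accounting on paths where the length was never assigned. The sketch below is a reduced model of that pattern, not the bnxt_en code: struct rx_stats, enum cmp_kind, rx_one() and the label name are all hypothetical.

```c
/* Hypothetical reduced model of a receive handler with two exit paths. */
struct rx_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
};

enum cmp_kind { CMP_RX_L2, CMP_TPA_START, CMP_TPA_END };

static int rx_one(struct rx_stats *st, enum cmp_kind kind, unsigned int pkt_len)
{
	unsigned int len;
	int rc = 1;

	if (kind == CMP_TPA_START || kind == CMP_TPA_END) {
		/* len is never assigned on this path, so jump past the
		 * statistics update instead of adding a bogus value. */
		goto out_no_len;
	}

	len = pkt_len;
	st->rx_packets++;
	st->rx_bytes += len;

out_no_len:
	return rc;
}
```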
    • Linus Torvalds's avatar
      Merge tag 'sound-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 41aa5e5d
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A few small last-minute fixes that should sneak into 4.15:
      
         - remove a spurious WARN_ON() triggered by syzkaller
      
         - fix for ioctl races in ALSA sequencer
      
         - two trivial HD-audio fixup entries"
      
      * tag 'sound-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: seq: Make ioctls race-free
        ALSA: pcm: Remove yet superfluous WARN_ON()
        ALSA: hda - Apply the existing quirk to iMac 14,1
        ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
      41aa5e5d
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 921d4f67
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Bring back context level recursive protection in ring buffer.
      
         The simpler counter protection failed, due to a path when tracing
         with trace_clock_global() as it could not be reentrant and depended
         on the ring buffer recursive protection to keep that from happening.
      
       - Prevent branch profiling when FORTIFY_SOURCE is enabled.
      
         It causes 50 - 60 MB in warning messages. Branch profiling should
         never be run on production systems, so there's no reason that it
         needs to be enabled with FORTIFY_SOURCE.
      
      * tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
        ring-buffer: Bring back context level recursive checks
      921d4f67
    • Daniel Borkmann's avatar
      net, sched: fix panic when updating miniq {b,q}stats · 81d947e2
      Daniel Borkmann authored
      While working on fixing another bug, I ran into the following panic
      on arm64 by simply attaching clsact qdisc, adding a filter and running
      traffic on ingress to it:
      
        [...]
        [  178.188591] Unable to handle kernel read from unreadable memory at virtual address 810fb501f000
        [  178.197314] Mem abort info:
        [  178.200121]   ESR = 0x96000004
        [  178.203168]   Exception class = DABT (current EL), IL = 32 bits
        [  178.209095]   SET = 0, FnV = 0
        [  178.212157]   EA = 0, S1PTW = 0
        [  178.215288] Data abort info:
        [  178.218175]   ISV = 0, ISS = 0x00000004
        [  178.222019]   CM = 0, WnR = 0
        [  178.224997] user pgtable: 4k pages, 48-bit VAs, pgd = 0000000023cb3f33
        [  178.231531] [0000810fb501f000] *pgd=0000000000000000
        [  178.236508] Internal error: Oops: 96000004 [#1] SMP
        [...]
        [  178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: G        W        4.15.0-rc7+ #5
        [  178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
        [  178.326887] pstate: 60400005 (nZCv daif +PAN -UAO)
        [  178.331685] pc : __netif_receive_skb_core+0x49c/0xac8
        [  178.336728] lr : __netif_receive_skb+0x28/0x78
        [  178.341161] sp : ffff00002344b750
        [  178.344465] x29: ffff00002344b750 x28: ffff810fbdfd0580
        [  178.349769] x27: 0000000000000000 x26: ffff000009378000
        [...]
        [  178.418715] x1 : 0000000000000054 x0 : 0000000000000000
        [  178.424020] Process ping (pid: 2497, stack limit = 0x000000009f0a3ff4)
        [  178.430537] Call trace:
        [  178.432976]  __netif_receive_skb_core+0x49c/0xac8
        [  178.437670]  __netif_receive_skb+0x28/0x78
        [  178.441757]  process_backlog+0x9c/0x160
        [  178.445584]  net_rx_action+0x2f8/0x3f0
        [...]
      
      Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init()
      which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc.
      Problem is that this cannot work since they are actually set up right after
      the qdisc ->init() callback in qdisc_create(), so first packet going into
      sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we
      therefore panic.
      
      In order to fix this, allocation of {b,q}stats needs to happen before we
      call into ->init(). In net-next, there's already such option through commit
      d59f5ffa ("net: sched: a dflt qdisc may be used with per cpu stats").
      However, the bug needs to be fixed in net still for 4.15. Thus, include
      these bits to reduce any merge churn and reuse the static_flags field to
      set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since
      there is no other user left. Prashant Bhole ran into the same issue but
      for net-next, thus adding him below as well as co-author. Same issue was
      also reported by Sandipan Das when using bcc.
      
      Fixes: 46209401 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")
      Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html
      Reported-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Co-authored-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Co-authored-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      81d947e2
    • Alexey Dobriyan's avatar
      net: delete /proc THIS_MODULE references · 96890d62
      Alexey Dobriyan authored
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96890d62
    • Roland Dreier's avatar
      qed: Fix potential use-after-free in qed_spq_post() · 70eeff66
      Roland Dreier authored
      We need to check if p_ent->comp_mode is QED_SPQ_MODE_EBLOCK before
      calling qed_spq_add_entry().  The test is fine if the mode is EBLOCK,
      but if it isn't then qed_spq_add_entry() might kfree(p_ent).
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70eeff66
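
The ordering issue in 70eeff66 comes down to reading a field before calling a function that may free the object. A minimal sketch of that pattern follows, assuming a simplified ownership rule: struct entry, add_entry() and post() are hypothetical stand-ins, and free() plays the role of kfree().

```c
#include <stdbool.h>
#include <stdlib.h>

enum comp_mode { MODE_EBLOCK, MODE_CB };

struct entry {
	enum comp_mode comp_mode;
};

/* Hypothetical stand-in for the queueing step: entries that are not
 * in blocking mode may be freed before this function returns. */
static int add_entry(struct entry *ent)
{
	if (ent->comp_mode != MODE_EBLOCK)
		free(ent);
	return 0;
}

/* Returns true when the caller still owns ent and must wait on it. */
static bool post(struct entry *ent)
{
	/* Cache the mode first: after add_entry() returns, ent may
	 * already be freed and must not be dereferenced. */
	bool eblock = (ent->comp_mode == MODE_EBLOCK);

	if (add_entry(ent) != 0)
		return false;

	return eblock;
}
```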