1. 05 Jun, 2019 31 commits
    • ixgbe: fix PTP SDP pin setup on X540 hardware · 68d9676f
      Jacob Keller authored
      The function ixgbe_ptp_setup_sdp_X540 attempts to program a software
      defined pin, in order to generate a pulse-per-second output on SDP 0.
      
      It does work to generate the output, but does not align the output on
      the full second. Additionally, it does not take into account the
      cyclecounter multiplier. This leads to somewhat confusing code which is
      likely to be incorrect if blindly copied to another hardware type.
      
      Update this code to account for the cyclecounter multiplier, and to
      directly use timecounter_read.
      
      This change ensures that the SDP output will align properly on a full
      second, and makes the intent of the calculations a bit more clear.
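      A minimal user-space sketch of the alignment arithmetic, assuming the usual cyclecounter relation ns = (cycles * mult) >> shift. The struct and function names are hypothetical and this is not the driver's code: given the time returned by a timecounter read and the raw cycle value it was based on, it computes the cycle value at which the next full second begins.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical stand-in for a cyclecounter's scaling pair:
 * nanoseconds = (cycles * mult) >> shift. */
struct cc_scale {
	uint32_t mult;
	uint32_t shift;
};

/* Return the raw cycle value at which the next full second begins.
 * now_ns is the time from a timecounter read and cycle_last is the raw
 * cycle value that read was based on. */
static uint64_t next_second_edge(const struct cc_scale *cc, uint64_t now_ns,
				 uint64_t cycle_last)
{
	assert(cc->mult != 0);

	/* Nanoseconds left until the next full second. */
	uint64_t rem = NS_PER_SEC - (now_ns % NS_PER_SEC);

	/* Invert the multiplier to turn that ns delta into cycles:
	 * cycles = (ns << shift) / mult. */
	return cycle_last + ((rem << cc->shift) / cc->mult);
}
```

      With mult = 1 << shift (one cycle per nanosecond), a reading of 1.25 s taken at cycle 1000 yields an edge at cycle 750001000.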
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: reduce PTP Tx timestamp timeout to 1 second · 8fd70994
      Jacob Keller authored
      Previously we waited for a whole 15 seconds before we cleared the Tx
      timestamp state. This is astronomically long compared to the worst case
      timings expected by our devices. In addition, this is longer than the
      wait in ptp4l when it detects a fault (caused by missing Tx timestamps).
      Thus, reduce the timer to only 1 second, which is well after the maximum
      expected delay. This should reduce user frustration when a timestamp
      does get dropped for some reason.
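      A sketch of the timeout check, assuming a free-running 32-bit millisecond counter; the driver itself works in jiffies, and the names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant matching the new 1 second limit (was 15 s). */
#define TX_TSTAMP_TIMEOUT_MS 1000u

/* True when a Tx timestamp request started at start_ms should be
 * abandoned at now_ms. The unsigned subtraction stays correct when the
 * millisecond counter wraps around. */
static bool tx_tstamp_timed_out(uint32_t start_ms, uint32_t now_ms)
{
	return (uint32_t)(now_ms - start_ms) > TX_TSTAMP_TIMEOUT_MS;
}
```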
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: fix AF_XDP tx packet count · 1bc1ffb0
      William Tu authored
      The total_packets count at ixgbe_clean_xdp_tx_irq is
      always zero when testing with xdpsock -t -N. Set the gso_segs
      to 1 to make the tx packet count correct.
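      The accounting can be sketched as below, assuming (as the fix implies) that the completion path sums gso_segs per descriptor. The structure and function names are hypothetical simplifications, not ixgbe's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down Tx buffer bookkeeping entry. */
struct tx_buf {
	uint32_t bytecount;
	uint16_t gso_segs;	/* packets this descriptor represents */
};

/* Completion-side accounting: packets are summed from gso_segs, so a
 * descriptor left at gso_segs == 0 is never counted. */
static void clean_tx(const struct tx_buf *bufs, size_t n,
		     uint64_t *total_packets, uint64_t *total_bytes)
{
	for (size_t i = 0; i < n; i++) {
		*total_packets += bufs[i].gso_segs;
		*total_bytes += bufs[i].bytecount;
	}
}

/* Submit side: an AF_XDP frame is exactly one packet, so set gso_segs. */
static void fill_tx(struct tx_buf *buf, uint32_t len)
{
	buf->bytecount = len;
	buf->gso_segs = 1;
}
```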
      Signed-off-by: William Tu <u9012063@gmail.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: fix AF_XDP tx byte count · 30d5703b
      William Tu authored
      The Tx byte count is accumulated twice. When running
      './xdpsock -t -N -i eth3' and checking 'ip -s link show dev eth3',
      the average packet size is reported as 120 instead of 60. Remove
      the extra accumulation.
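      A hedged sketch of the intended accounting: each completed frame adds its length to the byte counter exactly once, so the reported average matches the real frame size. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-ring counters. */
struct ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Completion-side accounting for one transmitted frame. Bytes are added
 * in exactly one place; adding them a second time elsewhere is what
 * doubles the average reported by 'ip -s link'. */
static void account_tx_completion(struct ring_stats *s, uint32_t len)
{
	s->packets += 1;
	s->bytes += len;
}

static uint64_t avg_packet_size(const struct ring_stats *s)
{
	assert(s->packets != 0);
	return s->bytes / s->packets;
}
```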
      Signed-off-by: William Tu <u9012063@gmail.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: remove umem from adapter · 9ba095a6
      Jan Sokolowski authored
      As the current netdev implementation already stores and provides
      umems for us, we no longer need to keep these structures in
      ixgbe_adapter.
      
      Refactor the code to operate on netdev-provided umems.
      Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: add tracking of AF_XDP zero-copy state for each queue pair · d49e286d
      Jan Sokolowski authored
      Here, we add a bitmap to the ixgbe_adapter that tracks if a
      certain queue pair has been "zero-copy enabled" via the ndo_bpf.
      The bitmap is used in ixgbe_xsk_umem, and enables zero-copy if
      and only if XDP is enabled, the corresponding qid in the bitmap
      is set, and the umem is non-NULL.
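      The enable condition can be sketched as follows. The queue limit, struct layout and helper names are hypothetical; kernel code would use its own bitmap helpers such as test_bit() rather than open-coded masks.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 64	/* hypothetical queue-pair limit */
#define ZC_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical subset of adapter state. */
struct adapter {
	bool xdp_enabled;
	unsigned long zc_qps[(MAX_QUEUES + ZC_BITS_PER_LONG - 1) / ZC_BITS_PER_LONG];
	void *umems[MAX_QUEUES];	/* stands in for netdev-provided umems */
};

/* Mark or clear a queue pair as "zero-copy enabled". */
static void set_zc(struct adapter *a, unsigned int qid, bool on)
{
	assert(qid < MAX_QUEUES);
	unsigned long mask = 1UL << (qid % ZC_BITS_PER_LONG);

	if (on)
		a->zc_qps[qid / ZC_BITS_PER_LONG] |= mask;
	else
		a->zc_qps[qid / ZC_BITS_PER_LONG] &= ~mask;
}

/* Zero-copy only if XDP is on, the queue's bit is set, and a umem exists. */
static void *xsk_umem(const struct adapter *a, unsigned int qid)
{
	if (!a->xdp_enabled || qid >= MAX_QUEUES)
		return NULL;
	if (!(a->zc_qps[qid / ZC_BITS_PER_LONG] & (1UL << (qid % ZC_BITS_PER_LONG))))
		return NULL;
	return a->umems[qid];	/* NULL when no umem is registered */
}
```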
      Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • net: fec_ptp: Use dev_err() instead of pr_err() · 11694b03
      Fabio Estevam authored
      dev_err() is more appropriate for printing error messages inside
      drivers, so switch to dev_err().
      Signed-off-by: Fabio Estevam <festevam@gmail.com>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'r8169-factor-out-firmware-handling' · e88e17fd
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      r8169: factor out firmware handling
      
      Let's factor out firmware handling into a separate source code file.
      This simplifies reading the code and makes clearer what the interface
      between driver and firmware handling is.
      
      v2:
      - fix small whitespace issue in patch 2
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: factor out firmware handling · 8197f9d2
      Heiner Kallweit authored
      Let's factor out firmware handling into a separate source code file.
      This simplifies reading the code and makes clearer what the interface
      between driver and firmware handling is.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: rename r8169.c to r8169_main.c · 25e992a4
      Heiner Kallweit authored
      In preparation for factoring out firmware handling, rename r8169.c to
      r8169_main.c.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mediatek: fix mtk_eth_soc build errors & warnings · d28d66e5
      Randy Dunlap authored
      Fix build errors in Mediatek mtk_eth_soc driver.
      
      It looks like these 3 source files were meant to be linked together,
      since 2 of them contain library-like functions, but they are
      currently being built as 3 loadable modules.
      
      Fixes these build errors:
      
        WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/mediatek/mtk_eth_path.o
        WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/mediatek/mtk_sgmii.o
        ERROR: "mtk_sgmii_init" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
        ERROR: "mtk_setup_hw_path" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
        ERROR: "mtk_sgmii_setup_mode_force" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
        ERROR: "mtk_sgmii_setup_mode_an" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
        ERROR: "mtk_w32" [drivers/net/ethernet/mediatek/mtk_eth_path.ko] undefined!
        ERROR: "mtk_r32" [drivers/net/ethernet/mediatek/mtk_eth_path.ko] undefined!
      
      This changes the loadable module name from mtk_eth_soc to mtk_eth.
      I didn't see a way to leave it as mtk_eth_soc.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Sean Wang <sean.wang@mediatek.com>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Felix Fietkau <nbd@openwrt.org>
      Cc: Nelson Chang <nelson.chang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dsa-mv88e6xxx-support-for-mv88e6250' · 2a99283c
      David S. Miller authored
      Rasmus Villemoes says:
      
      ====================
      net: dsa: mv88e6xxx: support for mv88e6250
      
      This adds support for the mv88e6250 chip. Initially based on the
      mv88e6240, this time around, I've been through each ->ops callback and
      checked that it makes sense, either replacing with a 6250 specific
      variant or dropping it if no equivalent functionality seems to exist
      for the 6250. Along the way, I found a few oddities in the existing
      code, mostly sent as separate patches/questions.
      
      The one relevant to the 6250 is the ieee_pri_map callback, where the
      existing mv88e6085_g1_ieee_pri_map() is actually wrong for many of the
      existing users. I've put the mv88e6250_g1_ieee_pri_map() patch first
      in case some of the existing chips get switched over to use that and
      it is deemed important enough for -stable.
      
      v4:
      - fix style issue in 1/10
      - add Andrew's reviewed-by to 1,6,7,8,9,10.
      
      v3:
      - rebase on top of net-next/master
      - add reviewed-bys to patches unchanged from v2 (2,3,4,5)
      - add 6250-specific ->ieee_pri_map, ->port_set_speed, ->port_link_state (1,6,7)
      - in addition, use mv88e6065_phylink_validate for ->phylink_validate,
        and don't implement ->port_get_cmode, ->port_set_jumbo_size,
        ->port_disable_learn_limit, ->rmu_disable
      - drop ptp support
      - add patch adding the compatible string to the DT binding (9)
      - add small refactoring patch (10)
      
      v2:
      - rebase on top of net-next/master
      - add reviewed-by to two patches unchanged from v1 (2,3)
      - add separate watchdog_ops
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: refactor mv88e6352_g1_reset · 7358fd80
      Rasmus Villemoes authored
      The new mv88e6250_g1_reset() is identical to mv88e6352_g1_reset() except
      for the call of mv88e6352_g1_wait_ppu_polling(), so refactor the 6352
      version in terms of the 6250 one. No functional change.
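      A sketch of the refactoring pattern with stubbed register access. The function names are hypothetical, and the order (shared reset first, then the PPU polling wait) is an assumption about the driver.

```c
#include <assert.h>

/* Hypothetical chip handle; the counters stand in for register I/O. */
struct chip {
	int resets;
	int ppu_waits;
};

/* Stub: wait for the PPU polling state, needed only on the 6352. */
static int g1_wait_ppu_polling(struct chip *c)
{
	c->ppu_waits++;
	return 0;
}

/* 6250 flavour: the shared software-reset sequence (stubbed). */
static int g1_reset_6250(struct chip *c)
{
	c->resets++;
	return 0;
}

/* 6352 flavour expressed in terms of the 6250 one plus the extra wait. */
static int g1_reset_6352(struct chip *c)
{
	int err = g1_reset_6250(c);

	if (err)
		return err;
	return g1_wait_ppu_polling(c);
}
```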
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: dsa: marvell: add "marvell,mv88e6250" compatible string · dabde0da
      Rasmus Villemoes authored
      The mv88e6250 has port_base_addr 0x8 or 0x18 (depending on
      configuration pins), so it constitutes a new family and hence needs
      its own compatible string.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: add support for mv88e6250 · 1f71836f
      Rasmus Villemoes authored
      This adds support for the Marvell 88E6250. I've checked that each
      member in the ops-structure makes sense, and basic switchdev
      functionality works fine.
      
      It uses the new dual_chip option, and since its port registers start
      at SMI address 0x08 or 0x18 (i.e., always sw_addr + 0x08), we need to
      introduce a new compatible string in order for the auto-identification
      in mv88e6xxx_detect() to work.
      
      The chip has four per-port 16-bit statistics registers, two of which
      correspond to the existing "sw_in_filtered" and "sw_out_filtered" (but
      at offsets 0x13 and 0x10 rather than 0x12 and 0x13, because why should
      this be easy...). Wiring up those four statistics seems to require
      introducing a STATS_TYPE_PORT_6250 bit or similar, which seems a tad
      ugly, so for now this just allows access to the STATS_TYPE_BANK0 ones.
      
      The chip does have ptp support, and the existing
      mv88e6352_{gpio,avb,ptp}_ops at first glance seem like they would work
      out-of-the-box, but for simplicity (and lack of testing) I'm eliding
      this.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: implement port_link_state for mv88e6250 · ce91c453
      Rasmus Villemoes authored
      The mv88e6250 has a rather different way of reporting the link, speed
      and duplex status. A simple difference is that the link bit is bit 12
      rather than bit 11 of the port status register.
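      A sketch of the link-bit difference alone. The macro and function names are hypothetical, and speed/duplex decoding via PortMode is left out because its encoding is only partly described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link bit in the port status register, per the description above. */
#define PORT_STS_LINK_6250  (1u << 12)
#define PORT_STS_LINK_OTHER (1u << 11)	/* bit used by the existing chips */

static bool port_link_up(uint16_t status, bool is_6250)
{
	return status & (is_6250 ? PORT_STS_LINK_6250 : PORT_STS_LINK_OTHER);
}
```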
      
      It gets more complicated for speed and duplex, which do not have
      separate fields. Instead, there's a four-bit PortMode field, and
      decoding that depends on whether it's a phy or mii port. For the phy
      ports, only four of the 16 values have defined meaning; the rest are
      called "reserved", so returning {SPEED,DUPLEX}_UNKNOWN seems
      reasonable.
      
      For the mii ports, most possible values are documented (0x3 and 0x5
      are reserved), but I'm unable to make sense of them all. Since the
      bits simply reflect the Px_MODE[3:0] configuration pins, just support
      the subset that I'm certain about. Support for other setups can be
      added later.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: implement port_set_speed for mv88e6250 · a528e5be
      Rasmus Villemoes authored
      The data sheet also mentions the possibility of selecting 200 Mbps for
      the MII ports (ports 5 and 6) by setting the ForceSpd field to
      0x2 (aka MV88E6065_PORT_MAC_CTL_SPEED_200). However, there's a note
      that "actual speed is determined by bit 8 above", and flipping back a
      page, one finds that bits 13:8 are reserved...
      
      So without further information on what bit 8 means, let's stick to
      supporting just 10 and 100 Mbps on all ports.
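      A hedged sketch of the resulting policy. Only the 0x2 (200 Mbps) ForceSpd value comes from the text above; the 10 and 100 Mbps encodings and all names are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical ForceSpd encodings; only 0x2 is taken from the data
 * sheet note quoted above. */
enum force_spd {
	FORCE_SPD_10 = 0x0,
	FORCE_SPD_100 = 0x1,
	FORCE_SPD_200 = 0x2,	/* deliberately not offered */
};

/* Map a requested speed in Mbps to a ForceSpd value, or -EOPNOTSUPP. */
static int force_spd_for(int speed_mbps)
{
	switch (speed_mbps) {
	case 10:
		return FORCE_SPD_10;
	case 100:
		return FORCE_SPD_100;
	default:
		/* 200 Mbps stays unsupported until "bit 8" is understood. */
		return -EOPNOTSUPP;
	}
}
```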
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: implement watchdog_ops for mv88e6250 · 855cdfde
      Rasmus Villemoes authored
      The MV88E6352_G2_WDOG_CTL_* bits almost, but not quite, describe the
      watchdog control register on the mv88e6250. Among those actually
      referenced in the code, only QC_ENABLE differs (bit 6 rather than bit
      5).
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: implement vtu_getnext and vtu_loadpurge for mv88e6250 · bec8e572
      Rasmus Villemoes authored
      These are almost identical to the 6185 variants, but have fewer bits
      for the FID.
      
      Bit 10 of the VTU_OP register (offset 0x05) is the VidPolicy bit,
      which one should probably preserve in mv88e6xxx_g1_vtu_op(), instead
      of always writing a 0. However, on the 6352 family, that bit is
      located at bit 12 in the VTU FID register (offset 0x02), and is always
      unconditionally cleared by the mv88e6xxx_g1_vtu_fid_write()
      function.
      
      Since nothing in the existing driver seems to know or care about that
      bit, it seems reasonable to not add the boilerplate to preserve it for
      the 6250 (which would require adding a chip-specific vtu_op function,
      or adding chip-quirks to the existing one).
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: prepare mv88e6xxx_g1_atu_op() for the mv88e6250 · 7b83df0d
      Rasmus Villemoes authored
      All the currently supported chips have .num_databases either 256 or
      4096, so this patch does not change behaviour for any of those. The
      mv88e6250, however, has .num_databases == 64, and it does not put the
      upper two bits in ATU control 13:12, but rather in ATU Operation
      9:8. So change the logic to prepare for supporting mv88e6250.
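      A sketch of the 64-database case. It assumes the low four FID bits stay in ATU Operation 3:0 as on the other chips (that placement is not stated above), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 6-bit database number (FID) into an ATU Operation value for a
 * chip with num_databases == 64. */
static uint16_t atu_op_with_fid_64(uint16_t op, uint16_t fid)
{
	assert(fid < 64);
	op |= fid & 0x0f;		/* DBNum[3:0] -> ATU Operation 3:0 (assumed) */
	op |= (fid & 0x30) << 4;	/* DBNum[5:4] -> ATU Operation 9:8 */
	return op;
}
```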
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: introduce support for two chips using direct smi addressing · f30a19b8
      Rasmus Villemoes authored
      The 88e6250 (as well as 6220, 6071, 6070, 6020) do not support
      multi-chip (indirect) addressing. However, one can still have two of
      them on the same mdio bus, since the device only uses 16 of the 32
      possible addresses, either addresses 0x00-0x0F or 0x10-0x1F depending
      on the ADDR4 pin at reset [since ADDR4 is internally pulled high, the
      latter is the default].
      
      In order to prepare for supporting the 88e6250 and friends, introduce
      mv88e6xxx_info::dual_chip to allow having a non-zero sw_addr while
      still using direct addressing.
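      The resulting address layout can be sketched as below, combining the ADDR4 rule above with the 88E6250's port registers starting at sw_addr + 0x08. The one-address-per-port assumption and the function names are illustrative, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Base SMI address: 0x10 if ADDR4 is high at reset (the default, since
 * it is pulled up internally), else 0x00. */
static unsigned int sw_addr_from_addr4(bool addr4_high)
{
	return addr4_high ? 0x10 : 0x00;
}

/* Per-port register block, assuming one SMI address per port counting
 * up from sw_addr + 0x08. */
static unsigned int port_smi_addr(unsigned int sw_addr, unsigned int port)
{
	unsigned int addr = sw_addr + 0x08 + port;

	/* The chip only owns the 16 addresses sw_addr..sw_addr + 0x0F. */
	assert(addr < sw_addr + 0x10);
	return addr;
}
```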
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: add mv88e6250_g1_ieee_pri_map · df63b0d9
      Rasmus Villemoes authored
      Quite a few of the existing supported chips that use
      mv88e6085_g1_ieee_pri_map as ->ieee_pri_map (including, incidentally,
      mv88e6085 itself) actually have a reset value of 0xfa50 in the
      G1_IEEE_PRI register.
      
      The data sheet for the mv88e6095, however, does describe a reset value
      of 0xfa41.
      
      So rather than changing the value in the existing callback, introduce
      a new variant with the 0xfa50 value. That will be used by the upcoming
      mv88e6250, and existing chips can be switched over one by one,
      preferably double-checking both the data sheet and actual hardware in
      each case - if anybody actually feels this is important enough to
      care.
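      To see how the two reset values differ, here is a sketch that decodes the register. It assumes the register holds eight 2-bit queue priorities indexed by the 3-bit IEEE priority, with priority 0 in bits 1:0 and priority 7 in bits 15:14; that layout and the helper name are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Return the 2-bit queue priority assigned to IEEE priority pcp. */
static unsigned int ieee_pri_to_queue(uint16_t reg, unsigned int pcp)
{
	assert(pcp < 8);
	return (reg >> (2 * pcp)) & 0x3;
}
```

      Under that assumed layout, 0xfa50 maps priorities 0-1, 2-3, 4-5 and 6-7 to queues 0, 1, 2 and 3, while 0xfa41 differs only for priorities 0 and 2 (queues 1 and 0 respectively).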
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vmxnet3: turn off lro when rxcsum is disabled · 3dd7400b
      Ronak Doshi authored
      Currently, when Rx checksum offload is disabled, the vmxnet3 driver does
      not turn off LRO, which can cause performance issues if the user does
      not turn off LRO explicitly. This patch adds fix_features support,
      which is used to turn off LRO whenever RXCSUM is disabled.
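      A user-space sketch of the fix_features rule. The feature bits are hypothetical stand-ins; in the kernel the hook is ndo_fix_features, operating on netdev_features_t with flags such as NETIF_F_RXCSUM and NETIF_F_LRO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for NETIF_F_RXCSUM/NETIF_F_LRO. */
#define FEAT_RXCSUM (1ULL << 0)
#define FEAT_LRO    (1ULL << 1)

/* LRO depends on Rx checksum offload, so drop it whenever RXCSUM is off. */
static uint64_t fix_features(uint64_t features)
{
	if (!(features & FEAT_RXCSUM))
		features &= ~FEAT_LRO;
	return features;
}
```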
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Acked-by: Rishi Mehta <rmehta@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-add-struct-nexthop-to-fib-info' · 9ec49a7e
      David S. Miller authored
      David Ahern says:
      
      ====================
      net: add struct nexthop to fib{6}_info
      
      Set 10 of 11 to improve route scalability via support for nexthops as
      standalone objects for fib entries.
          https://lwn.net/Articles/763950/
      
      This set adds 'struct nexthop' to fib_info and fib6_info. IPv4
      already handles multiple fib_nh entries in a single fib_info, so
      the conversion to use a nexthop struct is fairly mechanical. IPv6
      using a nexthop struct with a fib6_info impacts a lot of core logic
      which is built around the assumption of a single, builtin fib6_nh
      per fib6_info. To make this easier to review, this set adds
      nexthop to fib6_info and adds checks in most places fib6_info is
      used. The next set finishes the IPv6 conversion, walking through
      the places that need to consider all fib6_nh within a nexthop struct.
      
      Offload drivers - mlx5, mlxsw and rocker - are changed to fail FIB
      entries using nexthop objects. That limitation can be removed once
      the drivers are updated to properly support separate nexthops.
      
      This set starts by adding accessors for fib_nh and fib_nhs in a
      fib_info. This makes it easier to extract the number of nexthops
      in the fib entry and a specific fib_nh once the entry references
      a struct nexthop. Patch 2 converts more of IPv4 code to use
      fib_nh_common allowing a struct nexthop to use a fib6_nh with an
      IPv4 entry.
      
      Patches 3 and 4 add 'struct nexthop' to fib{6}_info and update
      references to both take a different path when it is set. New
      exported functions are added to the nexthop code to validate a
      nexthop struct when configured for use with a fib entry. IPv4
      is allowed to use a nexthop with either v4 or v6 entries. IPv6
      is limited to v6 entries only. In both cases list_heads track
      the fib entries using a nexthop struct for fast correlation on
      events (e.g., device events or nexthop events like delete or
      replace).
      
      The last 3 patches add hooks to drivers listening for FIB
      notifications. All 3 of them reject the routes as unsupported,
      returning an error message to the user via extack. For mlxsw
      at least, this is a stopgap measure until the driver is updated for
      proper support.
      
      Functional tests for nexthops have already been committed. Those tests
      will be active after the next patch set which makes the code paths
      created by this set and the next one live.
      
      Existing code paths moved to the else branch of 'if (f{6}i->nh)' checks
      are covered by existing tests under selftests/net.
      
      v3
      - remove ip6_create_rt_rcu from ip6_pol_route in patch 4 and use pcpu
        routes for REJECT routes with the blackhole nexthop (request from Wei)
      
      v2
      - no code changes from v1
      - commit messages for first 4 patches updated
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: Fail attempts to use routes with nexthop objects · dbcc4fa7
      David Ahern authored
      Fail attempts to use nexthop objects with routes until support can be
      properly added.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx5: Fail attempts to use routes with nexthop objects · 6a87afc0
      David Ahern authored
      Fail attempts to use nexthop objects with routes until support can be
      properly added.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Fail attempts to use routes with nexthop objects · 54250805
      David Ahern authored
      Fail attempts to use nexthop objects with routes until support can be
      properly added.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Plumb support for nexthop object in a fib6_info · f88d8ea6
      David Ahern authored
      Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
      fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info
      referencing a nexthop object can not have 'sibling' entries (the old way
      of doing multipath routes), the nh_list is a union with fib6_siblings.
      
      Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
      using a nexthop instance. Update __remove_nexthop_fib to walk f6i_list
      and delete fib entries using the nexthop.
      
      Add a few nexthop helpers for use when a nexthop is added to fib6_info:
      - nexthop_fib6_nh - return first fib6_nh in a nexthop object
      - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
        if the fib6_info references a nexthop object
      - nexthop_path_fib6_result - similar to ipv4, select a path within a
        multipath nexthop object. If the nexthop is a blackhole, set
        fib6_result type to RTN_BLACKHOLE, and set the REJECT flag
      
      Update the fib6_info references to check for nh and take a different path
      as needed:
      - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
        be coalesced with other fib entries into a multipath route
      - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
        a nexthop
      - addrconf (host routes), RAs and info entries (anything configured via
        ndisc) do not use nexthop objects
      - fib6_info_destroy_rcu - put reference to nexthop object
      - fib6_purge_rt - drop fib6_info from f6i_list
      - fib6_select_path - update to use the new nexthop_path_fib6_result when
        fib entry uses a nexthop object
      - rt6_device_match - update to catch use of nexthop object as a blackhole
        and set fib6_type and flags.
      - ip6_route_info_create - don't add space for fib6_nh if fib entry is
        going to reference a nexthop object, take a reference to nexthop object,
        disallow use of source routing
      - rt6_nlmsg_size - add space for RTA_NH_ID
      - add rt6_fill_node_nexthop to add nexthop data on a dump
      
      As with ipv4, most of the changes push existing code into the else branch
      of whether the fib entry uses a nexthop object.
      
      Update the nexthop code to walk f6i_list when a nexthop is deleted, to
      remove fib entries referencing it.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Plumb support for nexthop object in a fib_info · 4c7e8084
      David Ahern authored
      Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the
      fib_info side of the nexthop <-> fib_info relationship.
      
      Add fi_list list_head to 'struct nexthop' to track fib_info entries
      using a nexthop instance. Add __remove_nexthop_fib and add it to
      __remove_nexthop to walk the new list_head and mark those fib entries
      as dead when the nexthop is deleted.
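      A simplified sketch of the fi_list bookkeeping: each fib entry that uses a nexthop is linked into that nexthop's list, so deleting the nexthop can mark exactly those entries dead. The types are hypothetical, and a singly linked list replaces the kernel's list_head for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified fib entry. */
struct fib_entry {
	bool dead;
	struct fib_entry *next_on_nh;	/* stands in for the nh_list linkage */
};

/* Hypothetical, simplified nexthop object. */
struct nexthop_obj {
	struct fib_entry *fi_list;	/* entries using this nexthop */
};

/* Record that fi now references nh. */
static void nh_attach(struct nexthop_obj *nh, struct fib_entry *fi)
{
	fi->next_on_nh = nh->fi_list;
	nh->fi_list = fi;
}

/* On nexthop deletion, walk its list and mark every user dead. */
static void nh_remove(struct nexthop_obj *nh)
{
	for (struct fib_entry *fi = nh->fi_list; fi; fi = fi->next_on_nh)
		fi->dead = true;
	nh->fi_list = NULL;
}
```

      A doubly linked list_head, as used in the kernel, additionally lets a single entry be unlinked (for example from fib_release_info) without walking the list.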
      
      Add a few nexthop helpers for use when a nexthop is added to fib_info:
      - nexthop_cmp to determine if 2 nexthops are the same
      - nexthop_path_fib_result to select a path for a multipath
        'struct nexthop'
      - nexthop_fib_nhc to select a specific fib_nh_common within a
        multipath 'struct nexthop'
      
      Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses
      a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop
      case.
      
      Update the fib_info functions to check for fi->nh and take a different
      path as needed:
      - free_fib_info_rcu - put the nexthop object reference
      - fib_release_info - remove the fib_info from the nexthop's fi_list
      - nh_comp - use nexthop_cmp when either fib_info references a nexthop
        object
      - fib_info_hashfn - use the nexthop id for the hashing vs the oif of
        each fib_nh in a fib_info
      - fib_nlmsg_size - add space for the RTA_NH_ID attribute
      - fib_create_info - verify nexthop reference can be taken, verify
        nexthop spec is valid for fib entry, and add fib_info to fi_list for
        a nexthop
      - fib_select_multipath - use the new nexthop_path_fib_result to select a
        path when nexthop objects are used
      - fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat
        it the same as a fib entry using 'blackhole'
      
      The bulk of the changes are in fib_semantics.c and most of that is
      moving the existing change_nexthops into an else branch.
      
      Update the nexthop code to walk fi_list when a nexthop is deleted, to
      remove fib entries referencing it.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Prepare for fib6_nh from a nexthop object · dcb1ecb5
      David Ahern authored
      Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
      to use a fib6_nh based nexthop. In the end, only code not using a
      nexthop object in a fib_info should directly access fib_nh in a fib_info
      without checking the family and going through fib_nh_common. Those
      functions will be marked when it is not directly evident.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Use accessors for fib_info nexthop data · 5481d73f
      David Ahern authored
      Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
      fib_dev macro which is an alias for the first nexthop. Replacements:
      
        fi->fib_dev    --> fib_info_nh(fi, 0)->fib_nh_dev
        fi->fib_nh     --> fib_info_nh(fi, 0)
        fi->fib_nh[i]  --> fib_info_nh(fi, i)
        fi->fib_nhs    --> fib_info_num_path(fi)
      
      where fib_info_nh(fi, i) returns &fi->fib_nh[i] and fib_info_num_path
      returns fi->fib_nhs.
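      A trimmed-down sketch of the two accessors. Only the helper names and their described behaviour come from the text above; the struct layouts are simplified assumptions (the real fib_info ends in a variable-length fib_nh array).

```c
#include <assert.h>

/* Hypothetical, simplified nexthop entry. */
struct fib_nh {
	int fib_nh_oif;
};

/* Hypothetical, simplified fib_info with a fixed-size nexthop array. */
struct fib_info {
	int fib_nhs;
	struct fib_nh fib_nh[4];
};

/* Return the nhsel-th nexthop of a fib_info. */
static inline struct fib_nh *fib_info_nh(struct fib_info *fi, int nhsel)
{
	assert(nhsel >= 0 && nhsel < fi->fib_nhs);
	return &fi->fib_nh[nhsel];
}

/* Return the number of paths (nexthops) in a fib_info. */
static inline int fib_info_num_path(const struct fib_info *fi)
{
	return fi->fib_nhs;
}
```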
      
      Move the existing fib_info_nhc to nexthop.h and define the new ones
      there. A later patch adds a check if a fib_info uses a nexthop object,
      and defining the helpers in nexthop.h avoids circular header
      dependencies.
      
      After this all remaining open coded references to fi->fib_nhs and
      fi->fib_nh are in:
      - fib_create_info and helpers used to lookup an existing fib_info
        entry, and
      - the netdev event functions fib_sync_down_dev and fib_sync_up.
      
      The latter two will not be reused for nexthops, and the fib_create_info
      will be updated to handle a nexthop in a fib_info.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Jun, 2019 9 commits