1. 27 Nov, 2018 6 commits
    • libbpf: Verify versioned symbols · 306b267c
      Andrey Ignatov authored
      Since ABI versioning info is kept separately from the code it's easy to
      forget to update it while adding a new API.
      
      Add simple verification that all global symbols exported with LIBBPF_API
      are versioned in libbpf.map version script.
      
      The idea is to check that number of global symbols in libbpf-in.o, that
      is the input to the linker, matches with number of unique versioned
      symbols in libbpf.so, that is the output of the linker. If these numbers
      don't match, it may mean some symbol was not versioned and make will
      fail.
      
      "Unique" means that if a symbol is present in more than one version of
      ABI due to ABI changes, it'll be counted once.
      
      Another option for calculating the number of global symbols in the
      "input" would be to count the LIBBPF_API entries in the C headers, but
      that seems fragile.
      
      Example of output when a symbol is missing in version script:
      
          ...
          LD       libbpf-in.o
          LINK     libbpf.a
          LINK     libbpf.so
        Warning: Num of global symbols in libbpf-in.o (115) does NOT match
        with num of versioned symbols in libbpf.so (114). Please make sure all
        LIBBPF_API symbols are versioned in libbpf.map.
        make: *** [check_abi] Error 1
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
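The counting rule from the "Verify versioned symbols" commit can be sketched in C. This is only an illustration: the commit performs the check at build time (the check_abi make target in the output above), and the function names and the `name@VERSION` input format below are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the symbol name without its "@VERSION" or "@@VERSION" suffix. */
static size_t base_len(const char *sym)
{
	const char *at = strchr(sym, '@');

	return at ? (size_t)(at - sym) : strlen(sym);
}

/*
 * Count versioned symbols, counting a name once even if it appears in
 * more than one ABI version ("unique" in the commit message).
 */
size_t count_unique_versioned(const char *const *syms, size_t n)
{
	size_t unique = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = base_len(syms[i]);
		int seen = 0;

		assert(len > 0);
		for (size_t j = 0; j < i && !seen; j++)
			seen = base_len(syms[j]) == len &&
			       strncmp(syms[i], syms[j], len) == 0;
		if (!seen)
			unique++;
	}
	return unique;
}

/* Return 0 when the input and output counts match, -1 otherwise. */
int check_abi(size_t num_global, const char *const *versioned, size_t n)
{
	return num_global == count_unique_versioned(versioned, n) ? 0 : -1;
}
```

A symbol listed under two versions contributes one to the count, so a mismatch with the number of global symbols in the linker input points at a symbol missing from the version script.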
    • libbpf: Add version script for DSO · 16192a77
      Andrey Ignatov authored
      More and more projects use libbpf and one day it'll likely be packaged
      and distributed as DSO and that requires ABI versioning so that both
      compatible and incompatible changes to ABI can be introduced in a safe
      way in the future without breaking executables dynamically linked with a
      previous version of the library.
      
      Usual way to do ABI versioning is version script for the linker. Add
      such a script for libbpf. All global symbols currently exported via
      LIBBPF_API macro are added to the version script libbpf.map.
      
      The version name LIBBPF_0.0.1 is constructed from the name of the
      library + version specified by $(LIBBPF_VERSION) in Makefile.
      
      The version script does not duplicate the work done by the LIBBPF_API
      macro; rather, it complements it. The macro is used at compile time,
      where it lets the compiler do optimizations that can't be done at link
      time, and it is purely about global symbol visibility. The version
      script, in turn, is used at link time and takes care of ABI versioning.
      Both techniques are described in detail in [1].
      
      Whenever ABI is changed in the future, version script should be changed
      appropriately.
      
      [1] https://www.akkadia.org/drepper/dsohowto.pdf

      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
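The split between the compile-time macro and the link-time version script might look roughly like the sketch below. The macro body is an assumption about what a visibility macro such as LIBBPF_API expands to, and example_add() and example_helper() are hypothetical functions invented for this example, not libbpf APIs.

```c
#include <assert.h>

/*
 * Compile-time half: mark a symbol as globally visible. This definition
 * is an assumed example of a visibility macro, not copied from libbpf.
 */
#ifndef LIBBPF_API
#define LIBBPF_API __attribute__((visibility("default")))
#endif

/* Exported: this name would be listed in the version script. */
LIBBPF_API int example_add(int a, int b);

/* Internal helper: never exported, so it needs no version entry. */
static int example_helper(int x)
{
	return x;
}

int example_add(int a, int b)
{
	return example_helper(a) + example_helper(b);
}

/*
 * Link-time half: a GNU ld version script in the spirit of libbpf.map,
 * passed to the linker with -Wl,--version-script=<file>:
 *
 *   LIBBPF_0.0.1 {
 *           global:
 *                   example_add;
 *           local:
 *                   *;
 *   };
 */
```

The macro only decides whether a symbol is visible at all; the version node name (LIBBPF_0.0.1 in the commit) is what later lets compatible and incompatible ABI changes coexist.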
    • libbpf: Name changing for btf_get_from_id · 1d2f44ca
      Martin KaFai Lau authored
      s/btf_get_from_id/btf__get_from_id/ to restore the API naming convention.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'non-jit-btf-func_info' · b89c2998
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      Commit 838e9690 ("bpf: Introduce bpf_func_info")
      added bpf func info support. The userspace is able
      to get better ksym's for bpf programs with jit, and
      is able to print out func prototypes.
      
      For a program containing func-to-func calls, the existing
      implementation returns user specified number of function
      calls and BTF types if jit is enabled. If the jit is not
      enabled, it only returns the type for the main function.
      
      This is undesirable. Interpreter may still be used
      and we should keep feature identical regardless of
      whether jit is enabled or not.
      This patch fixed this discrepancy.
      
      The following example shows bpftool output for
      the bpf program in selftests test_btf_haskv.o when jit
      is disabled:
        $ bpftool prog dump xlated id 1490
        int _dummy_tracepoint(struct dummy_tracepoint_args * arg):
           0: (85) call pc+2#__bpf_prog_run_args32
           1: (b7) r0 = 0
           2: (95) exit
        int test_long_fname_1(struct dummy_tracepoint_args * arg):
           3: (85) call pc+1#__bpf_prog_run_args32
           4: (95) exit
        int test_long_fname_2(struct dummy_tracepoint_args * arg):
           5: (b7) r2 = 0
           6: (63) *(u32 *)(r10 -4) = r2
           7: (79) r1 = *(u64 *)(r1 +8)
           8: (15) if r1 == 0x0 goto pc+9
           9: (bf) r2 = r10
          10: (07) r2 += -4
          11: (18) r1 = map[id:1173]
          13: (85) call bpf_map_lookup_elem#77088
          14: (15) if r0 == 0x0 goto pc+3
          15: (61) r1 = *(u32 *)(r0 +4)
          16: (07) r1 += 1
          17: (63) *(u32 *)(r0 +4) = r1
          18: (95) exit
        $ bpftool prog dump jited id 1490
          no instructions returned
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: change selftest test_btf for both jit and non-jit · 812dd689
      Yonghong Song authored
      The selftest test_btf is changed to test both jit and non-jit.
      The test result should be the same regardless of whether jit
      is enabled or not.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: btf: support proper non-jit func info · ba64e7d8
      Yonghong Song authored
      Commit 838e9690 ("bpf: Introduce bpf_func_info")
      added bpf func info support. The userspace is able
      to get better ksym's for bpf programs with jit, and
      is able to print out func prototypes.
      
      For a program containing func-to-func calls, the existing
      implementation returns user specified number of function
      calls and BTF types if jit is enabled. If the jit is not
      enabled, it only returns the type for the main function.
      
      This is undesirable. Interpreter may still be used
      and we should keep feature identical regardless of
      whether jit is enabled or not.
      This patch fixed this discrepancy.
      
      Fixes: 838e9690 ("bpf: Introduce bpf_func_info")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 26 Nov, 2018 5 commits
  3. 25 Nov, 2018 10 commits
  4. 24 Nov, 2018 19 commits
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d146194f
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix wrong conflict resolution around CONFIG_ARM64_SSBD
      
       - Fix sparse warning on unsigned long constant
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
        arm64: sysreg: fix sparse warnings
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 857fa628
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.
      
       2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.
      
       3) Fix socket wmem accounting in SCTP, from Xin Long.
      
       4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.
      
       5) qed driver passes bytes instead of bits into second arg of
          bitmap_weight(). From Denis Bolotin.
      
       6) Fix reset deadlock in ibmvnic, from Juliet Kim.
      
       7) skb_scrub_packet() needs to scrub the fwd marks too, from Petr
          Machata.
      
       8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
          compression during this period, from Eric Dumazet.
      
       9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.
      
      10) Don't leave dangling error pointer if bpf_prog_add() fails in
          thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
          headers, set sq->tso_hdrs to NULL.
      
      11) Fix race condition over state variables in act_police, from Davide
          Caratti.
      
      12) Disable guest csum in the presence of XDP in virtio_net, from Jason
          Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
        net: gemini: Fix copy/paste error
        net: phy: mscc: fix deadlock in vsc85xx_default_config
        dt-bindings: dsa: Fix typo in "probed"
        net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
        net: amd: add missing of_node_put()
        team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
        virtio-net: fail XDP set if guest csum is negotiated
        virtio-net: disable guest csum during XDP set
        net/sched: act_police: add missing spinlock initialization
        net: don't keep lonely packets forever in the gro hash
        net/ipv6: re-do dad when interface has IFF_NOARP flag change
        packet: copy user buffers before orphan or clone
        ibmvnic: Update driver queues after change in ring size support
        ibmvnic: Fix RX queue buffer cleanup
        net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
        net/dim: Update DIM start sample after each DIM iteration
        net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
        net/smc: use after free fix in smc_wr_tx_put_slot()
        net/smc: atomic SMCD cursor handling
        net/smc: add SMC-D shutdown signal
        ...
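Item 5 of the networking pull above turns on a units mix-up: the second argument of bitmap_weight() is a number of bits, not bytes. A minimal userspace sketch of the difference follows; mock_bitmap_weight() is a hypothetical stand-in written for this example, not the kernel function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace stand-in for a bitmap weight helper: count the set bits among
 * the first nbits bits of the bitmap. The second argument is a bit count,
 * not a byte count.
 */
unsigned int mock_bitmap_weight(const unsigned long *bitmap, unsigned int nbits)
{
	unsigned int weight = 0;

	assert(bitmap != NULL);
	for (unsigned int i = 0; i < nbits; i++) {
		if (bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			weight++;
	}
	return weight;
}
```

With a fully set map, passing sizeof(map) inspects only the first sizeof(map) bits, while sizeof(map) * CHAR_BIT covers the whole map.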
    • Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · abe72ff4
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Dave and I have continued our work fixing corruption problems that can
        be found when running long-term burn-in exercisers on xfs. Here are
        some patches fixing most of the problems, but there will likely be
        more. :/
      
         - Numerous corruption fixes for copy on write
      
         - Numerous corruption fixes for blocksize < pagesize writes
      
         - Don't miscalculate AG reservations for small final AGs
      
         - Fix page cache truncation to work properly for reflink and extent
           shifting
      
         - Fix use-after-free when retrying failed inode/dquot buffer logging
      
         - Fix corruptions seen when using copy_file_range in directio mode"
      
      * tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: readpages doesn't zero page tail beyond EOF
        vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
        iomap: dio data corruption and spurious errors when pipes fill
        iomap: sub-block dio needs to zeroout beyond EOF
        iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
        xfs: delalloc -> unwritten COW fork allocation can go wrong
        xfs: flush removing page cache in xfs_reflink_remap_prep
        xfs: extent shifting doesn't fully invalidate page cache
        xfs: finobt AG reserves don't consider last AG can be a runt
        xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
        xfs: uncached buffer tracing needs to print bno
        xfs: make xfs_file_remap_range() static
        xfs: fix shared extent data corruption due to missing cow reservation
    • net: gemini: Fix copy/paste error · 07093b76
      Andreas Fiedler authored
      The TX stats should be started with the tx_stats_syncp,
      there seems to be a copy/paste error in the driver.
      Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mscc: fix deadlock in vsc85xx_default_config · 3fa528b7
      Quentin Schulz authored
      The vsc85xx_default_config function called in the vsc85xx_config_init
      function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs
      mistakenly calls phy_read and phy_write in-between phy_select_page and
      phy_restore_page.
      
      phy_select_page and phy_restore_page take and release the MDIO bus
      lock, while phy_write and phy_read each take and release the same lock
      to access a PHY register. Calling them while the lock is already held
      therefore deadlocks.
      
      Let's fix this deadlock by using phy_modify_paged, which correctly
      handles a read followed by a write in a non-standard page.
      
      Fixes: 6a0bfbbe ("net: phy: mscc: migrate to phy_select/restore_page functions")
      Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
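The locking problem in the mscc fix can be modelled in userspace. Everything in this sketch is a mock invented for illustration (the mock_ helpers, the single-flag lock, the two-page register file); it does not use the kernel PHY API, and the lock reports -EDEADLK instead of hanging so the effect is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock MDIO bus: one non-recursive lock and one register per page. */
struct mock_bus {
	bool locked;
	int page;
	uint16_t reg[2];
};

/* A second acquisition by the holder would block forever; report it. */
static int mock_lock(struct mock_bus *bus)
{
	if (bus->locked)
		return -EDEADLK;
	bus->locked = true;
	return 0;
}

static void mock_unlock(struct mock_bus *bus)
{
	assert(bus->locked);
	bus->locked = false;
}

/* Modelled on a "select page" helper: takes the lock and keeps it held. */
static int mock_select_page(struct mock_bus *bus, int page)
{
	int err = mock_lock(bus);

	if (err)
		return err;
	bus->page = page;
	return 0;
}

/* Modelled on a "restore page" helper: restores the page, drops the lock. */
static void mock_restore_page(struct mock_bus *bus, int old_page)
{
	bus->page = old_page;
	mock_unlock(bus);
}

/* Modelled on a locked register read: takes and releases the lock itself. */
static int mock_read(struct mock_bus *bus)
{
	int err = mock_lock(bus);
	int val;

	if (err)
		return err;
	val = bus->reg[bus->page];
	mock_unlock(bus);
	return val;
}

/* Buggy pattern: a self-locking accessor between select and restore. */
int buggy_update(struct mock_bus *bus)
{
	int old = bus->page;
	int ret = mock_select_page(bus, 1);

	if (ret)
		return ret;
	ret = mock_read(bus);		/* lock is already held here */
	mock_restore_page(bus, old);
	return ret;
}

/* Fixed pattern: the whole read-modify-write runs under one lock hold. */
int fixed_update(struct mock_bus *bus, int page, uint16_t mask, uint16_t set)
{
	int old = bus->page;
	int ret = mock_select_page(bus, page);

	if (ret)
		return ret;
	bus->reg[page] = (uint16_t)((bus->reg[page] & ~mask) | set);
	mock_restore_page(bus, old);
	return 0;
}
```

In this model buggy_update() fails with -EDEADLK where real code would hang, while fixed_update() completes because it never re-acquires the lock it already holds.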
    • dt-bindings: dsa: Fix typo in "probed" · e7b9fb4f
      Fabio Estevam authored
      The correct form is "can be probed", so fix the typo.
      Signed-off-by: Fabio Estevam <festevam@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue · ef2a7cf1
      Lorenzo Bianconi authored
      Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
      since it is used to check if tso dma descriptor queue has been previously
      allocated. The issue can be triggered with the following reproducer:
      
      $ ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
      $ ip link set dev enP2p1s0v0 xdpdrv off
      
      [  341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
      [  341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
      [  341.521874] pstate: 60400005 (nZCv daif +PAN -UAO)
      [  341.526654] pc : __vunmap+0x98/0xe0
      [  341.530132] lr : __vunmap+0x98/0xe0
      [  341.533609] sp : ffff00001c5db860
      [  341.536913] x29: ffff00001c5db860 x28: 0000000000020000
      [  341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000
      [  341.547515] x25: 0000000000000000 x24: 00000000fbd00000
      [  341.552816] x23: 0000000000000000 x22: ffff810feb5090b0
      [  341.558117] x21: 0000000000000000 x20: 0000000000000000
      [  341.563418] x19: ffff000017e57000 x18: 0000000000000000
      [  341.568719] x17: 0000000000000000 x16: 0000000000000000
      [  341.574020] x15: 0000000000000010 x14: ffffffffffffffff
      [  341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f
      [  341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510
      [  341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8
      [  341.595224] x7 : 3430303030303030 x6 : 00000000000006ef
      [  341.600525] x5 : 00000000003fffff x4 : 0000000000000000
      [  341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff
      [  341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038
      [  341.616428] Call trace:
      [  341.618866]  __vunmap+0x98/0xe0
      [  341.621997]  vunmap+0x3c/0x50
      [  341.624961]  arch_dma_free+0x68/0xa0
      [  341.628534]  dma_direct_free+0x50/0x80
      [  341.632285]  nicvf_free_resources+0x160/0x2d8 [nicvf]
      [  341.637327]  nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
      [  341.642890]  nicvf_stop+0x298/0x340 [nicvf]
      [  341.647066]  __dev_close_many+0x9c/0x108
      [  341.650977]  dev_close_many+0xa4/0x158
      [  341.654720]  rollback_registered_many+0x140/0x530
      [  341.659414]  rollback_registered+0x54/0x80
      [  341.663499]  unregister_netdevice_queue+0x9c/0xe8
      [  341.668192]  unregister_netdev+0x28/0x38
      [  341.672106]  nicvf_remove+0xa4/0xa8 [nicvf]
      [  341.676280]  nicvf_shutdown+0x20/0x30 [nicvf]
      [  341.680630]  pci_device_shutdown+0x44/0x88
      [  341.684720]  device_shutdown+0x144/0x250
      [  341.688640]  kernel_restart_prepare+0x44/0x50
      [  341.692986]  kernel_restart+0x20/0x68
      [  341.696638]  __se_sys_reboot+0x210/0x238
      [  341.700550]  __arm64_sys_reboot+0x24/0x30
      [  341.704555]  el0_svc_handler+0x94/0x110
      [  341.708382]  el0_svc+0x8/0xc
      [  341.711252] ---[ end trace 3f4019c8439959c9 ]---
      [  341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4
      [  341.723872] flags: 0x1fffe000000000()
      [  341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
      [  341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
      [  341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      
      where xdp_dummy.c is a simple bpf program that forwards the incoming
      frames to the network stack (available here:
      https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Fixes: 4863dea3 ("net: Adding support for Cavium ThunderX network controller")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
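The thunderx fix relies on a common pattern: when a pointer doubles as the "was this allocated?" flag, it has to be reset once the buffer is released. A minimal userspace sketch, assuming a hypothetical mock_snd_queue structure and malloc()/free() in place of the driver's DMA allocation:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a send queue; only one field matters here. */
struct mock_snd_queue {
	void *tso_hdrs;	/* non-NULL means the header buffer is allocated */
	int frees;	/* how many times the buffer was actually released */
};

int mock_alloc_snd_queue(struct mock_snd_queue *sq, size_t len)
{
	assert(sq != NULL);
	sq->tso_hdrs = malloc(len);
	return sq->tso_hdrs ? 0 : -1;
}

void mock_free_snd_queue(struct mock_snd_queue *sq)
{
	assert(sq != NULL);
	if (!sq->tso_hdrs)
		return;		/* never allocated, or already released */
	free(sq->tso_hdrs);
	sq->frees++;
	/* The fix: without this reset, a later call would free again. */
	sq->tso_hdrs = NULL;
}
```

Calling mock_free_snd_queue() twice releases the buffer once; with the reset removed, the stale pointer would make the second call release memory that is no longer owned.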
    • ptp: Fix pass zero to ERR_PTR() in ptp_clock_register · aea0a897
      YueHaibing authored
      Fix smatch warning:
      
      drivers/ptp/ptp_clock.c:298 ptp_clock_register() warn:
       passing zero to 'ERR_PTR'
      
      'err' should be set when device_create_with_groups() or
      pps_register_source() fails.
      
      Fixes: 85a66e55 ("ptp: create "pins" together with the rest of attributes")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
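Why passing zero to ERR_PTR() matters can be shown with userspace re-creations of the error-pointer helpers. The helpers below are written for this example and only mimic the kernel macros; mock_create_device(), register_buggy() and register_fixed() are hypothetical and do not reproduce ptp_clock_register().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace re-creations of the error-pointer helpers, illustration only. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical sub-step that can fail with an error pointer. */
static void *mock_create_device(bool fail)
{
	static int dev_obj;

	return fail ? ERR_PTR(-ENODEV) : (void *)&dev_obj;
}

/* Buggy: err is still 0 on the failure path, so ERR_PTR(0) is returned. */
void *register_buggy(bool fail)
{
	static int clock_obj;
	int err = 0;
	void *dev = mock_create_device(fail);

	if (IS_ERR(dev))
		goto out;	/* bug: err was never set */
	return &clock_obj;
out:
	return ERR_PTR(err);
}

/* Fixed: propagate the sub-step's error code before bailing out. */
void *register_fixed(bool fail)
{
	static int clock_obj;
	int err = 0;
	void *dev = mock_create_device(fail);

	if (IS_ERR(dev)) {
		err = (int)PTR_ERR(dev);
		goto out;
	}
	return &clock_obj;
out:
	return ERR_PTR(err);
}
```

ERR_PTR(0) is a NULL pointer, which IS_ERR() does not flag, so a caller checking only IS_ERR() would treat the failed registration as a success.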
    • Merge branch 'switchdev-blocking-notifiers' · 06d21290
      David S. Miller authored
      Petr Machata says:
      
      ====================
      switchdev: Convert switchdev_port_obj_{add,del}() to notifiers
      
      An offloading driver may need to have access to switchdev events on
      ports that aren't directly under its control. An example is a VXLAN port
      attached to a bridge offloaded by a driver. The driver needs to know
      about VLANs configured on the VXLAN device. However the VXLAN device
      isn't stashed between the bridge and a front-panel-port device (such as
      is the case e.g. for LAG devices), so the usual switchdev ops don't
      reach the driver.
      
      VXLAN is likely not the only device type like this: in theory any L2
      tunnel device that needs offloading will prompt requirement of this
      sort.
      
      A way to fix this is to give up the notion of port object addition /
      deletion as a switchdev operation, which assumes somewhat tight coupling
      between the message producer and consumer. And instead send the message
      over a notifier chain.
      
      The series starts with a clean-up patch #1, where
      SWITCHDEV_OBJ_PORT_{VLAN, MDB}() are fixed up to lift the constraint
      that the passed-in argument be a simple variable named "obj".
      
      switchdev_port_obj_add and _del are invoked in a context that permits
      blocking. Not only that, at least for the VLAN notification, being able
      to signal failure is actually important. Therefore introduce a new
      blocking notifier chain that the new events will be sent on. That's done
      in patch #2. Retain the current (atomic) notifier chain for the
      preexisting notifications.
      
      In patch #3, introduce two new switchdev notifier types,
      SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
      communicate the same event as the corresponding switchdev op, except in
      a form of a notification. struct switchdev_notifier_port_obj_info was
      added to carry the fields that correspond to the switchdev op arguments.
      An additional field, handled, will be used to communicate back to
      switchdev that the event has reached an interested party, which will be
      important for the two-phase commit.
      
      In patches #4, #5, and #7, rocker, DSA resp. ethsw are updated to
      subscribe to the switchdev blocking notifier chain, and handle the new
      notifier types. #6 introduces a helper to determine whether a
      netdevice corresponds to a front panel port.
      
      What these three drivers have in common is that their ports don't
      support any uppers besides bridge. That makes it possible to ignore any
      notifiers that don't reference a front-panel port device, because they
      are certainly out of scope.
      
      Unlike the previous three, mlxsw and ocelot drivers admit stacked
      devices as uppers. While the current switchdev code recursively descends
      through layers of lower devices, eventually calling the op on a
      front-panel port device, the notifier would reference a stacking device
      that's one of front-panel ports uppers. The filtering is thus more
      complex.
      
      For ocelot, such iteration is currently pretty much required, because
      there's no bookkeeping of LAG devices. mlxsw does keep the list of LAGs,
      however it iterates the lower devices anyway when deciding whether an
      event on a tunnel device pertains to the driver or not.
      
      Therefore this patch set instead introduces, in patch #8, a helper to
      iterate through lowers, much like the current switchdev code does,
      looking for devices that match a given predicate.
      
      Then in patches #9 and #10, first mlxsw and then ocelot are updated to
      dispatch the newly-added notifier types to the preexisting
      port_obj_add/_del handlers. The dispatch is done via the new helper, to
      recursively descend through lower devices.
      
      Finally in patch #11, the actual switch is made, retiring the current
      SDO-based code in favor of a notifier.
      
      Now that the event is distributed through a notifier, the explicit
      netdevice check in rocker, DSA and ethsw doesn't let through any events
      except those done on a front-panel port itself. It is therefore
      unnecessary to check in VLAN-handling code whether a VLAN was added to
      the bridge itself: such events will simply be ignored much sooner.
      Therefore remove it in patch #12.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker, dsa, ethsw: Don't filter VLAN events on bridge itself · ab4a1686
      Petr Machata authored
      Due to an explicit check in rocker_world_port_obj_vlan_add(),
      dsa_slave_switchdev_event() resp. port_switchdev_event(), VLAN objects
      that are added to a device that is not a front-panel port device are
      ignored. Therefore this check is immaterial.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: Replace port obj add/del SDO with a notification · d17d9f5e
      Petr Machata authored
      Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
      this field from all clients, which were migrated to use switchdev
      notification in the previous patches.
      
      Add a new function switchdev_port_obj_notify() that sends the switchdev
      notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.
      
      Update switchdev_port_obj_del_now() to dispatch to this new function.
      Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
      likewise.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ocelot: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL · 0e332c85
      Petr Machata authored
      Following patches will change the way of distributing port object
      changes from a switchdev operation to a switchdev notifier. The
      switchdev code currently recursively descends through layers of lower
      devices, eventually calling the op on a front-panel port device. The
      notifier will instead be sent referencing the bridge port device, which
      may be a stacking device that's one of front-panel ports uppers, or a
      completely unrelated device.
      
      Dispatch the new events to ocelot_port_obj_add() resp. _del() to
      maintain the same behavior that the switchdev operation based code
      currently has. Pass through switchdev_handle_port_obj_add() / _del() to
      handle the recursive descent, because Ocelot supports LAG uppers.
      
      Register to the new switchdev blocking notifier chain to get the new
      events when they start getting distributed.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL · 52a227b3
      Petr Machata authored
      Following patches will change the way of distributing port object
      changes from a switchdev operation to a switchdev notifier. The
      switchdev code currently recursively descends through layers of lower
      devices, eventually calling the op on a front-panel port device. The
      notifier will instead be sent referencing the bridge port device, which
      may be a stacking device that's one of front-panel ports uppers, or a
      completely unrelated device.
      
      To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking
      notifier chain. Dispatch to mlxsw_sp_port_obj_add() resp. _del() to
      maintain the behavior that the switchdev operation based code currently
      has. Defer to switchdev_handle_port_obj_add() / _del() to handle the
      recursive descent, because mlxsw supports a number of upper types.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: Add helpers to aid traversal through lower devices · f30f0601
      Petr Machata authored
      After the transition from switchdev operations to notifier chain (which
      will take place in following patches), the onus is on the driver to find
      its own devices below possible layer of LAG or other uppers.
      
      The logic to do so is fairly repetitive: each driver is looking for its
      own devices among the lowers of the notified device. For those that it
      finds, it calls a handler. To indicate that the event was handled,
      struct switchdev_notifier_port_obj_info.handled is set. The differences
      lie only in what constitutes an "own" device and what handler to call.
      
      Therefore abstract this logic into two helpers,
      switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If
      a driver only supports physical ports under a bridge device, it will
      simply avoid this layer of indirection.
      
      One area where this helper diverges from the current switchdev behavior
      is the case of mixed lowers, some of which are switchdev ports and some
      of which are not. Previously, such scenario would fail with -EOPNOTSUPP.
      The helper could do that for lowers for which the passed-in predicate
      doesn't hold. That would however break the case that switchdev ports
      from several different drivers are stashed under one master, a scenario
      that switchdev currently happily supports. Therefore tolerate any and
      all unknown netdevices, whether they are backed by a switchdev driver
      or not.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
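The traversal described in the helpers commit (find the driver's own devices among the lowers, call a handler on each, record that the event was handled, tolerate everything else) can be sketched with mock types. The structures, callback signatures and return conventions below are assumptions made for this example; they are not the signatures of switchdev_handle_port_obj_add().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOWERS 4

/* Hypothetical miniature of a netdevice: an owner tag plus lower devices. */
struct mock_dev {
	int driver_id;			/* owning driver, 0 = none (e.g. a LAG) */
	int objs_added;			/* how many objects the handler added */
	struct mock_dev *lowers[MAX_LOWERS];
	size_t num_lowers;
};

struct mock_obj_info {
	bool handled;			/* set once some driver took the event */
};

typedef bool (*check_cb_t)(const struct mock_dev *dev);
typedef int (*add_cb_t)(struct mock_dev *dev);

/*
 * Walk dev and its lowers, calling add_cb on every device accepted by
 * check_cb. Devices rejected by the predicate are tolerated, so ports of
 * several drivers can sit under one master.
 */
int mock_handle_port_obj_add(struct mock_dev *dev, struct mock_obj_info *info,
			     check_cb_t check_cb, add_cb_t add_cb)
{
	assert(dev != NULL && info != NULL);

	if (check_cb(dev)) {
		info->handled = true;
		return add_cb(dev);
	}
	for (size_t i = 0; i < dev->num_lowers; i++) {
		int err = mock_handle_port_obj_add(dev->lowers[i], info,
						   check_cb, add_cb);

		if (err && err != -EOPNOTSUPP)
			return err;
	}
	return 0;
}

/* Example predicate and handler for a hypothetical "driver 1". */
bool is_driver1_port(const struct mock_dev *dev)
{
	return dev->driver_id == 1;
}

int driver1_add(struct mock_dev *dev)
{
	dev->objs_added++;
	return 0;
}
```

Each driver supplies only its own predicate and handler; the shared walk is the part the commit factors out, and the handled flag tells the notifier's sender whether any interested party saw the event.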
    • staging: fsl-dpaa2: ethsw: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL · a39b8888
      Petr Machata authored
      Following patches will change the way of distributing port object
      changes from a switchdev operation to a switchdev notifier. The
      switchdev code currently recursively descends through layers of lower
      devices, eventually calling the op on a front-panel port device. The
      notifier will instead be sent referencing the bridge port device, which
      may be a stacking device that's one of front-panel ports uppers, or a
      completely unrelated device.
      
      ethsw currently doesn't support any uppers other than bridge.
      SWITCHDEV_OBJ_ID_HOST_MDB and _PORT_MDB objects are always notified on
      the bridge port device. Thus the only case that a stacked device could
      be validly referenced by port object notifications are bridge
      notifications for VLAN objects added to the bridge itself. But the
      driver explicitly rejects such notifications in port_vlans_add(). It is
      therefore safe to assume that the only interesting case is that the
      notification is on a front-panel port netdevice.
      
      To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking
      notifier chain. Dispatch to swdev_port_obj_add() resp. _del() to
      maintain the behavior that the switchdev operation based code currently
      has.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • staging: fsl-dpaa2: ethsw: Introduce ethsw_port_dev_check() · bb896805
      Petr Machata authored
      ethsw currently uses an open-coded comparison of netdev_ops to determine
      whether a device represents a front panel port. Wrap this into a
      named function to simplify reuse.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: slave: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL · 2b239f67
      Petr Machata authored
      Following patches will change the way of distributing port object
      changes from a switchdev operation to a switchdev notifier. The
      switchdev code currently recursively descends through layers of lower
      devices, eventually calling the op on a front-panel port device. The
      notifier will instead be sent referencing the bridge port device, which
      may be a stacking device that's one of the front-panel port's uppers, or a
      completely unrelated device.
      
      DSA currently doesn't support any uppers other than bridge.
      SWITCHDEV_OBJ_ID_HOST_MDB and _PORT_MDB objects are always notified on
      the bridge port device. Thus the only case in which a stacked device
      could be validly referenced by port object notifications is bridge
      notifications for VLAN objects added to the bridge itself. But the
      driver explicitly rejects such notifications in dsa_port_vlan_add(). It
      is therefore safe to assume that the only interesting case is that the
      notification is on a front-panel port netdevice. Therefore keep the
      filtering by dsa_slave_dev_check() in place.
      
      To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking
      notifier chain. Dispatch to dsa_slave_port_obj_add() resp. _del() to
      maintain the behavior that the switchdev operation based code currently
      has.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b239f67
    • Petr Machata's avatar
      rocker: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL · c6fa35b2
      Petr Machata authored
      Following patches will change the way of distributing port object
      changes from a switchdev operation to a switchdev notifier. The
      switchdev code currently recursively descends through layers of lower
      devices, eventually calling the op on a front-panel port device. The
      notifier will instead be sent referencing the bridge port device, which
      may be a stacking device that's one of the front-panel port's uppers, or a
      completely unrelated device.
      
      rocker currently doesn't support any uppers other than bridge. Thus the
      only case in which a stacked device could be validly referenced by port
      object notifications is bridge notifications for VLAN objects added to
      the bridge itself. But the driver explicitly rejects such notifications
      in rocker_world_port_obj_vlan_add(). It is therefore safe to assume that
      the only interesting case is that the notification is on a front-panel
      port netdevice.
      
      Subscribe to the blocking notifier chain. In the handler, filter out
      notifications on any foreign netdevices. Dispatch the new notifiers to
      rocker_port_obj_add() resp. _del() to maintain the behavior that the
      switchdev operation based code currently has.
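
      The handler shape described above (filter foreign netdevices, then
      dispatch by event type) might look roughly like the userspace sketch
      below. Everything here is a simplified stand-in: the struct, the
      is_rocker_port flag, the helper names and the event values are
      assumptions for illustration, and NOTIFY_DONE / NOTIFY_OK are local
      copies of the kernel's notifier return codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the kernel's notifier return codes. */
#define NOTIFY_DONE 0	/* event not of interest to this subscriber */
#define NOTIFY_OK   1	/* event processed */

/* Event codes named after the notifier types; values are arbitrary. */
enum { SWITCHDEV_PORT_OBJ_ADD = 1, SWITCHDEV_PORT_OBJ_DEL = 2 };

/* Stand-in for struct net_device (illustration only). */
struct net_device {
	const char *name;
	bool is_rocker_port;	/* hypothetical ownership marker */
	int nr_objs;		/* hypothetical count of installed objects */
};

/* Hypothetical ownership test: is this one of our front-panel ports? */
static bool port_dev_check(const struct net_device *dev)
{
	assert(dev != NULL);
	return dev->is_rocker_port;
}

/* Hypothetical stand-ins for the existing add/del code paths. */
static void port_obj_add(struct net_device *dev) { dev->nr_objs++; }
static void port_obj_del(struct net_device *dev) { dev->nr_objs--; }

/* Blocking-chain handler: ignore foreign devices, dispatch the rest. */
static int switchdev_blocking_event(unsigned long event,
				    struct net_device *dev)
{
	if (!port_dev_check(dev))
		return NOTIFY_DONE;

	switch (event) {
	case SWITCHDEV_PORT_OBJ_ADD:
		port_obj_add(dev);
		return NOTIFY_OK;
	case SWITCHDEV_PORT_OBJ_DEL:
		port_obj_del(dev);
		return NOTIFY_OK;
	default:
		return NOTIFY_DONE;
	}
}
```

      Because the ownership check comes first, a notification that
      references a bridge or any other foreign device never reaches the
      add/del paths, which matches the behavior of the op-based code.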
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6fa35b2
    • Petr Machata's avatar
      switchdev: Add SWITCHDEV_PORT_OBJ_ADD, SWITCHDEV_PORT_OBJ_DEL · aa4efe21
      Petr Machata authored
      An offloading driver may need to have access to switchdev events on
      ports that aren't directly under its control. An example is a VXLAN port
      attached to a bridge offloaded by a driver. The driver needs to know
      about VLANs configured on the VXLAN device. However the VXLAN device
      isn't stashed between the bridge and a front-panel-port device (such as
      is the case e.g. for LAG devices), so the usual switchdev ops don't
      reach the driver.
      
      VXLAN is likely not the only device type like this: in theory any L2
      tunnel device that needs offloading will prompt a requirement of this
      sort. This falsifies the assumption that only the lower devices of a
      front-panel port need to be notified to achieve flawless offloading.
      
      A way to fix this is to give up the notion of port object addition /
      deletion as a switchdev operation, which assumes somewhat tight coupling
      between the message producer and consumer, and instead send the message
      over a notifier chain.
      
      To that end, introduce two new switchdev notifier types,
      SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
      communicate the same event as the corresponding switchdev op, except in
      the form of a notification. struct switchdev_notifier_port_obj_info was
      added to carry the fields that the switchdev op carries. An additional
      field, handled, will be used to communicate back to switchdev that the
      event has reached an interested party, which will be important for the
      two-phase commit.
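
      The role of the handled field can be illustrated with a small
      userspace model of a notifier chain. This is a sketch under stated
      assumptions, not the kernel implementation: the chain is a fixed
      array, struct port_obj_info is a reduced stand-in for struct
      switchdev_notifier_port_obj_info, and the offloaded flag,
      ERR_UNHANDLED and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event codes named after the new notifier types; values are arbitrary. */
enum { SWITCHDEV_PORT_OBJ_ADD = 1, SWITCHDEV_PORT_OBJ_DEL = 2 };

#define ERR_UNHANDLED (-1)	/* hypothetical: no subscriber took the event */

/* Stand-in for struct net_device (illustration only). */
struct net_device {
	const char *name;
	bool offloaded;	/* hypothetical: a driver is interested in it */
	int nr_objs;	/* hypothetical count of installed objects */
};

/* Reduced stand-in for struct switchdev_notifier_port_obj_info. */
struct port_obj_info {
	struct net_device *dev;
	bool handled;	/* set by a subscriber that consumed the event */
};

typedef int (*notifier_fn)(unsigned long event, struct port_obj_info *info);

#define MAX_SUBSCRIBERS 4
static notifier_fn chain[MAX_SUBSCRIBERS];
static size_t chain_len;

static int chain_register(notifier_fn fn)
{
	assert(fn != NULL);
	if (chain_len == MAX_SUBSCRIBERS)
		return -1;
	chain[chain_len++] = fn;
	return 0;
}

/* Producer side: broadcast, then inspect handled to learn the outcome. */
static int port_obj_notify(unsigned long event, struct net_device *dev)
{
	struct port_obj_info info = { .dev = dev, .handled = false };
	size_t i;

	for (i = 0; i < chain_len; i++) {
		int err = chain[i](event, &info);

		if (err)
			return err;
	}
	return info.handled ? 0 : ERR_UNHANDLED;
}

/* Consumer side: a driver that reacts to any device it offloads. */
static int driver_event(unsigned long event, struct port_obj_info *info)
{
	if (!info->dev->offloaded)
		return 0;	/* not ours: leave handled untouched */

	switch (event) {
	case SWITCHDEV_PORT_OBJ_ADD:
		info->dev->nr_objs++;
		break;
	case SWITCHDEV_PORT_OBJ_DEL:
		info->dev->nr_objs--;
		break;
	default:
		return 0;
	}
	info->handled = true;
	return 0;
}
```

      The producer does not need to know which driver, if any, owns the
      device. It learns only whether some subscriber claimed the event,
      which is the information the op-based call used to return directly.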
      
      The two switchdev operations themselves are kept in place. Following
      patches first convert individual clients to the notifier protocol, and
      only then are the operations removed.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa4efe21