1. 20 Nov, 2019 40 commits
    • r8169: add check for PHY_MDIO_CHG to rtl_nic_fw_data_ok · df0120f1
      Heiner Kallweit authored
      Only values 0 and 1 are currently defined as parameters for
      PHY_MDIO_CHG. Instead of silently ignoring unknown values and
      misinterpreting the firmware code let's explicitly check.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
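The check described in this commit can be pictured with a small userspace sketch. The layout of the action word (opcode in the top four bits, parameter in the low 16 bits), the opcode number, and the names `FW_OPCODE`, `FW_DATA`, `OP_MDIO_CHG` and `fw_action_ok` are assumptions made for illustration; this is not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one firmware action word (illustrative only):
 * bits 31..28 hold the opcode, bits 15..0 hold its parameter. */
#define FW_OPCODE(action)  ((uint32_t)(action) >> 28)
#define FW_DATA(action)    ((uint32_t)(action) & 0xffffu)

/* Hypothetical opcode number for the MDIO-change instruction. */
#define OP_MDIO_CHG  0x4u

/* Return true if the action is acceptable. For OP_MDIO_CHG only the
 * parameters 0 and 1 are defined, so any other value is rejected
 * instead of being silently ignored. Other opcodes are not examined
 * in this sketch. */
bool fw_action_ok(uint32_t action)
{
	if (FW_OPCODE(action) == OP_MDIO_CHG)
		return FW_DATA(action) <= 1;
	return true;
}
```

A validation loop would call such a helper for every action word and refuse to load the firmware as soon as one call returns false.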
    • r8169: use macro FIELD_SIZEOF in definition of FW_OPCODE_SIZE · cfccde80
      Heiner Kallweit authored
      Using the FIELD_SIZEOF macro makes this define easier to understand.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
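For readers unfamiliar with the macro, here is a minimal userspace sketch of how a FIELD_SIZEOF-style definition reads. The struct `fw_phy_action` and the helper `fw_opcode_count` are hypothetical stand-ins, and the macro is redefined locally because the kernel header is not available in userspace.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of a struct member without needing an instance of the struct.
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced. */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

/* Hypothetical stand-in for a firmware action table. */
struct fw_phy_action {
	uint32_t *code;   /* array of 32-bit opcodes */
	size_t size;      /* number of opcodes */
};

/* Size of one opcode, derived from the element type of the member. */
#define FW_OPCODE_SIZE FIELD_SIZEOF(struct fw_phy_action, code[0])

/* Number of whole opcodes that fit into a payload of the given size. */
size_t fw_opcode_count(size_t payload_bytes)
{
	return payload_bytes / FW_OPCODE_SIZE;
}
```

The benefit is that the opcode size follows the member's type automatically if that type ever changes.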
    • r8169: change mdelay to msleep in rtl_fw_write_firmware · e20c43db
      Heiner Kallweit authored
      We're not in atomic context here, therefore switch to msleep.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipconfig: Wait for deferred device probes · e2ffe3ff
      Thomas Bogendoerfer authored
      If network device drivers are using deferred probing, it was possible
      that the wait for devices to show up in ipconfig was already over
      when the device eventually showed up. By calling wait_for_device_probe()
      we now make sure deferred probing is done before checking for available
      devices.
      Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock/vmci: make vmci_vsock_cb_host_called static · 2be8ca97
      Mao Wenan authored
      When using make C=2 drivers/misc/vmw_vmci/vmci_driver.o
      to compile, below warning can be seen:
      drivers/misc/vmw_vmci/vmci_driver.c:33:6: warning:
      symbol 'vmci_vsock_cb_host_called' was not declared. Should it be static?
      
      This patch makes the symbol vmci_vsock_cb_host_called static.
      
      Fixes: b1bba80a ("vsock/vmci: register vmci_transport only when VMCI guest/host are active")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'page_pool-DMA-sync' · e07e7541
      David S. Miller authored
      Lorenzo Bianconi says:
      
      ====================
      add DMA-sync-for-device capability to page_pool API
      
      Introduce the possibility to sync DMA memory for device in the page_pool API.
      This feature allows syncing only the DMA size that is needed instead of
      always the full buffer (dma_sync_single_for_device can be very costly).
      Please note DMA-sync-for-CPU is still the device driver's responsibility.
      Relying on page_pool DMA sync, the mvneta driver improves XDP_DROP pps by
      about 170Kpps:
      
      - XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
      - XDP_DROP DMA sync managed by page_pool API:	~585Kpps
      
      Do not change naming convention for the moment since the changes will hit other
      drivers as well. I will address it in another series.
      
      Changes since v4:
      - do not allow the driver to set max_len to 0
      - convert PP_FLAG_DMA_MAP/PP_FLAG_DMA_SYNC_DEV to BIT() macro
      
      Changes since v3:
      - move dma_sync_for_device before putting the page in ptr_ring in
        __page_pool_recycle_into_ring since ptr_ring can be consumed
        concurrently. Simplify the code moving dma_sync_for_device
        before running __page_pool_recycle_direct/__page_pool_recycle_into_ring
      
      Changes since v2:
      - rely on PP_FLAG_DMA_SYNC_DEV flag instead of dma_sync
      
      Changes since v1:
      - rename sync in dma_sync
      - set dma_sync_size to 0xFFFFFFFF in page_pool_recycle_direct and
        page_pool_put_page routines
      - Improve documentation
      ====================
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
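The v4 changelog notes (flags expressed through BIT(), max_len not allowed to be 0) suggest a parameter check along the following lines. This is a sketch under stated assumptions: the bit positions, the rule that device sync requires DMA mapping, and the helper name `pool_flags_check` are illustrative and not taken from the series itself.

```c
#include <errno.h>

/* Local redefinition for userspace; bit positions are assumed. */
#define BIT(n)                (1u << (n))
#define PP_FLAG_DMA_MAP       BIT(0)
#define PP_FLAG_DMA_SYNC_DEV  BIT(1)

/* Hypothetical check of the driver-supplied settings.
 * Returns 0 when they are consistent, -EINVAL otherwise. */
int pool_flags_check(unsigned int flags, unsigned int max_len)
{
	if (flags & PP_FLAG_DMA_SYNC_DEV) {
		/* Assumption of this sketch: syncing for device only makes
		 * sense when the pool also does the DMA mapping. */
		if (!(flags & PP_FLAG_DMA_MAP))
			return -EINVAL;
		/* From the changelog: a max_len of 0 is not allowed. */
		if (max_len == 0)
			return -EINVAL;
	}
	return 0;
}
```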
    • net: mvneta: get rid of huge dma sync in mvneta_rx_refill · 07e13edb
      Lorenzo Bianconi authored
      Get rid of the costly dma_sync_single_for_device in mvneta_rx_refill
      since the driver can now let the page_pool API manage the needed DMA
      sync with a proper size.
      
      - XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
      - XDP_DROP DMA sync managed by page_pool API:	~585Kpps
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: page_pool: add the possibility to sync DMA memory for device · e68bc756
      Lorenzo Bianconi authored
      Introduce the following parameters in order to add the possibility to sync
      DMA memory for device before putting allocated pages in the page_pool
      caches:
      - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
        the driver gets from page_pool will be DMA-synced-for-device according
        to the length provided by the device driver. Please note DMA-sync-for-CPU
        is still device driver responsibility
      - offset: DMA address offset where the DMA engine starts copying rx data
      - max_len: maximum DMA memory size page_pool is allowed to flush. This
        is currently used in __page_pool_alloc_pages_slow routine when pages
        are allocated from page allocator
      These parameters are supposed to be set by device drivers.
      
      This optimization reduces the length of the DMA-sync-for-device.
      The optimization is valid because pages are initially
      DMA-synced-for-device as defined via max_len. At RX time, the driver
      will perform a DMA-sync-for-CPU on the memory for the packet length.
      What is important is the memory occupied by packet payload, because
      this is the area CPU is allowed to read and modify. As we don't track
      cache-lines written into by the CPU, simply use the packet payload length
      as dma_sync_size at page_pool recycle time. This also takes into account
      any tail-extend.
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
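The length selection described in this commit (sync from `offset`, for the packet length, never more than `max_len`) can be sketched in plain C. The type `sync_range` and the function `recycle_sync_range` are hypothetical; the real page_pool code operates on DMA addresses and calls the DMA API, which this userspace sketch leaves out.

```c
#include <stddef.h>

/* Region that would be synced for the device, relative to the page's
 * DMA address (illustrative type). */
struct sync_range {
	size_t start;   /* equals the pool's offset parameter */
	size_t len;     /* number of bytes to sync */
};

/* Pick the region to sync at recycle time. dma_sync_size is the length
 * the driver reports (typically the packet payload length); it is
 * clamped to max_len, so a caller passing a very large value ends up
 * syncing exactly max_len bytes. */
struct sync_range recycle_sync_range(size_t offset, size_t max_len,
				     size_t dma_sync_size)
{
	struct sync_range r;

	r.start = offset;
	r.len = dma_sync_size < max_len ? dma_sync_size : max_len;
	return r;
}
```

With a 2048-byte max_len, a 128-byte packet would lead to a 128-byte sync, while the 0xFFFFFFFF value mentioned in the v1 changelog of the series would fall back to the full 2048 bytes.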
    • net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp · f383b295
      Lorenzo Bianconi authored
      Rely on page_pool_recycle_direct and not on xdp_return_buff in
      mvneta_run_xdp. This is a preliminary patch to limit the dma sync len
      to the one strictly necessary
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: pie: enable timestamp based delay calculation · cec2975f
      Gautam Ramakrishnan authored
      RFC 8033 suggests an alternative approach to calculate the queue
      delay in PIE by using a timestamp on every enqueued packet. This
      patch adds an implementation of that approach and sets it as the
      default method to calculate queue delay. The previous method (based
      on Little's law) to calculate queue delay is set as optional.
      Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
      Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
      Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
      Acked-by: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
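The two ways of estimating queue delay mentioned in this commit can be contrasted with a small sketch. Function names, units (microseconds, bytes per second) and the choice to return 0 when no drain rate is known are assumptions of this example, not details of the sch_pie implementation.

```c
#include <stdint.h>

/* Timestamp method: the delay is simply how long the packet at the
 * head of the queue has been waiting since it was enqueued. */
uint64_t qdelay_from_timestamp(uint64_t now_us, uint64_t enqueue_us)
{
	return now_us - enqueue_us;
}

/* Little's law method: delay = backlog / average departure rate.
 * Returns microseconds. A rate of 0 means no estimate is available
 * yet; this sketch reports a delay of 0 in that case. */
uint64_t qdelay_from_rate(uint64_t backlog_bytes, uint64_t drain_bytes_per_sec)
{
	if (drain_bytes_per_sec == 0)
		return 0;
	return backlog_bytes * 1000000ull / drain_bytes_per_sec;
}
```

The timestamp variant needs per-packet state but no rate estimation, which is the trade-off behind making it the default.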
    • isdn: Fix Kconfig indentation · f01b437d
      Krzysztof Kozlowski authored
      Adjust indentation from spaces to tab (+optional two spaces) as in
      coding style with command like:
      	$ sed -e 's/^        /\t/' -i */Kconfig
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: Fix Kconfig indentation · 041ccdb6
      Krzysztof Kozlowski authored
      Adjust indentation from spaces to tab (+optional two spaces) as in
      coding style with command like:
      	$ sed -e 's/^        /\t/' -i */Kconfig
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cxgb4-add-TC-MATCHALL-classifier-offload' · 07def463
      David S. Miller authored
      Rahul Lakkireddy says:
      
      ====================
      cxgb4: add TC-MATCHALL classifier offload
      
      This series of patches add support to offload TC-MATCHALL classifier
      to hardware to classify all outgoing and incoming traffic on the
      underlying port. Only 1 egress and 1 ingress rule each can be
      offloaded on the underlying port.
      
      Patch 1 adds support for TC-MATCHALL classifier offload on the egress
      side. TC-POLICE is the only action that can be offloaded on the egress
      side and is used to rate limit all outgoing traffic to specified max
      rate.
      
      Patch 2 adds logic to reject the current rule offload if its priority
      conflicts with existing rules in the TCAM.
      
      Patch 3 adds support for TC-MATCHALL classifier offload on the ingress
      side. The same set of actions supported by existing TC-FLOWER
      classifier offload can be applied on all the incoming traffic.
      
      v5:
      - Fixed commit message and comment to include comparison for equal
        priority in patch 2.
      
      v4:
      - Removed check in patch 1 to reject police offload if prio is not 1.
      - Moved TC_SETUP_BLOCK code to separate function in patch 1.
      - Added logic to ensure the prio passed by TC doesn't conflict with
        other rules in TCAM in patch 2.
      - Higher index has lower priority than lower index in TCAM. So, rework
        cxgb4_get_free_ftid() to search free index from end of TCAM in
        descending order in patch 2.
      - Added check to ensure the matchall rule's prio doesn't conflict with
        other rules in TCAM in patch 3.
      - Added logic to fill default mask for VIID, if none has been
        provided, to prevent conflict with duplicate VIID rules in patch 3.
      - Used existing variables in private structure to fill VIID info,
        instead of extracting the info manually in patch 3.
      
      v3:
      - Added check in patch 1 to reject police offload if prio is not 1.
      - Assign block_shared variable only for TC_SETUP_BLOCK in patch 1.
      
      v2:
      - Added check to reject flow block sharing for policers in patch 1.
      - Removed logic to fetch free index from end of TCAM in patch 2.
        Must maintain the same ordering as in kernel.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: add TC-MATCHALL classifier ingress offload · 21c4c60b
      Rahul Lakkireddy authored
      Add TC-MATCHALL classifier ingress offload support. The same actions
      supported by existing TC-FLOWER offload can be applied to all incoming
      traffic on the underlying interface.
      
      Ensure the rule priority doesn't conflict with existing rules in the
      TCAM. Only 1 ingress matchall rule can be active at a time on the
      underlying interface.
      
      v5:
      - No change.
      
      v4:
      - Added check to ensure the matchall rule's prio doesn't conflict with
        other rules in TCAM.
      - Added logic to fill default mask for VIID, if none has been
        provided, to prevent conflict with duplicate VIID rules.
      - Used existing variables in private structure to fill VIID info,
        instead of extracting the info manually.
      
      v3:
      - No change.
      
      v2:
      - Removed logic to fetch free index from end of TCAM. Must maintain
        same ordering as in kernel.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: check rule prio conflicts before offload · 41ec03e5
      Rahul Lakkireddy authored
      Only offload rule if it satisfies both of the following conditions:
      1. The immediate previous rule has priority <= current rule's priority.
      2. The immediate next rule has priority >= current rule's priority.
      
      Also rework free entry fetch logic to search from end of TCAM, instead
      of beginning, because higher indices have lower priority than lower
      indices. This is similar to how TC auto generates priority values.
      
      v5:
      - Fixed commit message and comment to include comparison for equal
        priority.
      
      v4:
      - Patch added in this version.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
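The two conditions listed in this commit can be expressed as a small check over a table indexed like the TCAM. This is a sketch only: representing the table as an array of priorities, using 0 to mark a free slot, and the name `prio_fits` are assumptions for illustration and do not reflect the driver's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* prios[i] is the priority of the rule at index i, or 0 if the slot is
 * free (assumed encoding). Returns true if a rule with priority prio
 * may be placed at idx: the nearest rule before it must have a
 * priority <= prio and the nearest rule after it a priority >= prio. */
bool prio_fits(const unsigned int *prios, size_t n, size_t idx,
	       unsigned int prio)
{
	size_t i;

	/* Walk down from idx - 1 to 0 looking for the previous rule. */
	for (i = idx; i-- > 0; ) {
		if (prios[i] != 0) {
			if (prios[i] > prio)
				return false;
			break;
		}
	}

	/* Walk up from idx + 1 looking for the next rule. */
	for (i = idx + 1; i < n; i++) {
		if (prios[i] != 0) {
			if (prios[i] < prio)
				return false;
			break;
		}
	}

	return true;
}
```

Equal priorities are accepted on both sides, matching the v5 note about including the comparison for equal priority.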
    • cxgb4: add TC-MATCHALL classifier egress offload · 4ec4762d
      Rahul Lakkireddy authored
      Add TC-MATCHALL classifier offload with TC-POLICE action applied for
      all outgoing traffic on the underlying interface. Split flow block
      offload to support both egress and ingress classification.
      
      For example, to rate limit all outgoing traffic to 1 Gbps:
      
      $ tc qdisc add dev enp2s0f4 clsact
      $ tc filter add dev enp2s0f4 egress matchall skip_sw \
      	action police rate 1Gbit burst 8Kbit
      
      Note that skip_sw is important. Otherwise, both stack and hardware
      will end up doing policing. Policing can't be shared across flow
      blocks. Only 1 egress matchall rule can be active at a time on the
      underlying interface.
      
      v5:
      - No change.
      
      v4:
      - Removed check to reject police offload if prio is not 1.
      - Moved TC_SETUP_BLOCK code to separate function.
      
      v3:
      - Added check to reject police offload if prio is not 1.
      - Assign block_shared variable only for TC_SETUP_BLOCK.
      
      v2:
      - Added check to reject flow block sharing for policers.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'page_pool-API-for-numa-node-change-handling' · 77c05d2f
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      page_pool: API for numa node change handling
      
      This series extends page pool API to allow page pool consumers to update
      page pool numa node on the fly. This is required since on some systems,
      rx rings irqs can migrate between numa nodes, due to irq balancer or user
      defined scripts, current page pool has no way to know of such migration
      and will keep allocating and holding on to pages from a wrong numa node,
      which is bad for the consumer performance.
      
      1) Add API to update numa node id of the page pool
      Consumers will call this API to update the page pool numa node id.
      
      2) Don't recycle non-reusable pages:
      Page pool will check upon page return whether a page is suitable for
      recycling or not.
       2.1) when it belongs to a different NUMA node.
       2.2) when it was allocated under memory pressure.
      
      3) mlx5 will use the new API to update page pool numa id on demand.
      
      The series is a joint work between me and Jonathan, we tested it and it
      proved itself worthy to avoid page allocator bottlenecks and improve
      packet rate and cpu utilization significantly for the described
      scenarios above.
      
      Performance testing:
      XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
      while migrating rx ring irq from close to far numa:
      
      mlx5 internal page cache was locally disabled to get pure page pool
      results.
      
      CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
      NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
      
      XDP Drop/TX single core:
      NUMA  | XDP  | Before    | After
      ---------------------------------------
      Close | Drop | 11   Mpps | 10.9 Mpps
      Far   | Drop | 4.4  Mpps | 5.8  Mpps
      
      Close | TX   | 6.5 Mpps  | 6.5 Mpps
      Far   | TX   | 3.5 Mpps  | 4   Mpps
      
      Improvement is about 30% drop packet rate, 15% tx packet rate for numa
      far test.
      No degradation for numa close tests.
      
      TCP single/multi cpu/stream:
      NUMA  | #cpu | Before  | After
      --------------------------------------
      Close | 1    | 18 Gbps | 18 Gbps
      Far   | 1    | 15 Gbps | 18 Gbps
      Close | 12   | 80 Gbps | 80 Gbps
      Far   | 12   | 68 Gbps | 80 Gbps
      
      In all test cases we see improvement for the far numa case, and no
      impact on the close numa case.
      
      ==================
      
      Performance analysis and conclusions by Jesper [1]:
      Impact on XDP drop x86_64 is inconclusive and shows only 0.3459ns
      slow-down, as this is below measurement accuracy of system.
      
      v2->v3:
       - Rebase on top of latest net-next and Jesper's page pool object
         release patchset [2]
       - No code changes
       - Performance analysis by Jesper added to the cover letter.
      
      v1->v2:
        - Drop last patch, as requested by Ilias and Jesper.
        - Fix documentation's performance numbers order.
      
      [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool04_inflight_changes.org#performance-notes
      [2] https://patchwork.ozlabs.org/cover/1192098/
      ====================
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Rx, Update page pool numa node when changed · 6849c6d8
      Saeed Mahameed authored
      Once every napi poll cycle, check if numa node is different than
      the page pool's numa id, and update it using page_pool_update_nid().
      
      Alternatively, we could have registered an irq affinity change handler,
      but page_pool_update_nid() must be called from napi context anyways, so
      the handler won't actually help.
      
      Performance testing:
      XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
      while migrating rx ring irq from close to far numa:
      
      mlx5 internal page cache was locally disabled to get pure page pool
      results.
      
      CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
      NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
      
      XDP Drop/TX single core:
      NUMA  | XDP  | Before    | After
      ---------------------------------------
      Close | Drop | 11   Mpps | 10.9 Mpps
      Far   | Drop | 4.4  Mpps | 5.8  Mpps
      
      Close | TX   | 6.5 Mpps  | 6.5 Mpps
      Far   | TX   | 3.5 Mpps  | 4  Mpps
      
      Improvement is about 30% drop packet rate, 15% tx packet rate for numa
      far test.
      No degradation for numa close tests.
      
      TCP single/multi cpu/stream:
      NUMA  | #cpu | Before  | After
      --------------------------------------
      Close | 1    | 18 Gbps | 18 Gbps
      Far   | 1    | 15 Gbps | 18 Gbps
      Close | 12   | 80 Gbps | 80 Gbps
      Far   | 12   | 68 Gbps | 80 Gbps
      
      In all test cases we see improvement for the far numa case, and no
      impact on the close numa case.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Don't recycle non-reusable pages · d5394610
      Saeed Mahameed authored
      A page is NOT reusable when at least one of the following is true:
      1) allocated when system was under some pressure. (page_is_pfmemalloc)
      2) belongs to a different NUMA node than pool->p.nid.
      
      To update pool->p.nid users should call page_pool_update_nid().
      
      Holding on to such pages in the pool will hurt the consumer performance
      when the pool migrates to a different numa node.
      
      Performance testing:
      XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
      while migrating rx ring irq from close to far numa:
      
      mlx5 internal page cache was locally disabled to get pure page pool
      results.
      
      CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
      NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
      
      XDP Drop/TX single core:
      NUMA  | XDP  | Before    | After
      ---------------------------------------
      Close | Drop | 11   Mpps | 10.9 Mpps
      Far   | Drop | 4.4  Mpps | 5.8  Mpps
      
      Close | TX   | 6.5 Mpps  | 6.5 Mpps
      Far   | TX   | 3.5 Mpps  | 4  Mpps
      
      Improvement is about 30% drop packet rate, 15% tx packet rate for numa
      far test.
      No degradation for numa close tests.
      
      TCP single/multi cpu/stream:
      NUMA  | #cpu | Before  | After
      --------------------------------------
      Close | 1    | 18 Gbps | 18 Gbps
      Far   | 1    | 15 Gbps | 18 Gbps
      Close | 12   | 80 Gbps | 80 Gbps
      Far   | 12   | 68 Gbps | 80 Gbps
      
      In all test cases we see improvement for the far numa case, and no
      impact on the close numa case.
      
      The impact of adding a check per page is very negligible, and shows no
      performance degradation whatsoever, also functionality wise it seems more
      correct and more robust for page pool to verify when pages should be
      recycled, since page pool can't guarantee where pages are coming from.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
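The reusability rule stated at the top of this commit reduces to a two-part predicate. The sketch below models a page with a hypothetical `fake_page` struct carrying just the two facts the rule needs; the kernel derives them from `struct page`, which is not reproduced here.

```c
#include <stdbool.h>

/* Minimal stand-in for the per-page facts the rule depends on. */
struct fake_page {
	int nid;           /* NUMA node the page belongs to */
	bool pfmemalloc;   /* allocated while the system was under pressure */
};

/* A page may go back into the pool only if it was not allocated under
 * memory pressure and it lives on the pool's current NUMA node. */
bool pool_page_is_reusable(const struct fake_page *page, int pool_nid)
{
	return !page->pfmemalloc && page->nid == pool_nid;
}
```

Pages failing the predicate would be handed back to the page allocator instead of being recycled.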
    • page_pool: Add API to update numa node · bc836748
      Saeed Mahameed authored
      Add page_pool_update_nid() to be called by page pool consumers when they
      detect numa node changes.
      
      It will update the page pool nid value to start allocating from the new
      effective numa node.
      
      This is to mitigate page pool allocating pages from a wrong numa node,
      where the pool was originally allocated, and holding on to pages that
      belong to a different numa node, which causes performance degradation.
      
      For pages that are already being consumed and could be returned to the
      pool by the consumer, in next patch we will add a check per page to avoid
      recycling them back to the pool and return them to the page allocator.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
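How a consumer might use such an update call can be sketched as follows. The struct `fake_pool`, its `nid_updates` counter, and both function names are hypothetical; the sketch only models the "compare, then update" pattern a driver could run once per poll cycle, not the kernel's page_pool internals.

```c
#include <stdbool.h>

/* Minimal stand-in for a pool that remembers its preferred NUMA node. */
struct fake_pool {
	int nid;                    /* node new pages are allocated from */
	unsigned int nid_updates;   /* how often the node was changed */
};

/* Switch the pool to a new node so later allocations come from there. */
void pool_update_nid(struct fake_pool *pool, int new_nid)
{
	pool->nid = new_nid;
	pool->nid_updates++;
}

/* Cheap check intended for a hot path: update only when the node the
 * consumer currently runs on differs from the pool's node. Returns
 * true if an update was performed. */
bool pool_nid_changed(struct fake_pool *pool, int cur_nid)
{
	if (pool->nid == cur_nid)
		return false;
	pool_update_nid(pool, cur_nid);
	return true;
}
```

Keeping the comparison separate from the update means the common case, where nothing has migrated, costs a single integer compare.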
    • Merge branch 'cpsw-switchdev' · 1f12177b
      David S. Miller authored
      Grygorii Strashko says:
      
      ====================
      net: ethernet: ti: introduce new cpsw switchdev based driver
      
      Thank you All for review of v6.
      
      There are no significant changes in this version, just fixed comments to v6.
      
      --- v6
      The major change in this version is the conversion of the DT bindings to
      json-schema, along with fixes for other comments on v5. Also added a patch
      to clean up ALE on init and netif restart.
      
      --- v5
      The major part of work done in this iteration is rebasing on top of net-next
      with XDP series from Ivan Khoronzhuk [3], and enable XDP support in the new
      CPSW switchdev driver (it was little bit painful ;(). There are mostly no
      functional changes in new CPSW driver, just few fixes, sync with old driver
      and cleanups/optimizations. So, I've kept rest of cover letter unchanged.
      
      ---
      This series originally based on work [1][2] done by
      Ilias Apalodimas <ilias.apalodimas@linaro.org>.
      
      This is the RFC v5, which introduces a new CPSW switchdev based driver which is
      operating in dual-emac mode by default, thus working as 2 individual
      network interfaces. The Switch mode can be enabled by configuring devlink driver
      parameter "switch_mode" to 1/true:
      	devlink dev param set platform/48484000.switch \
      	name switch_mode value 1 cmode runtime
      This can be done regardless of the state of Port's netdev devices - UP/DOWN, but
      Port's netdev devices have to be in UP before joining the bridge to avoid
      overwriting of bridge configuration as CPSW switch driver completely reloads its
      configuration when first Port changes its state to UP.
      When both interfaces have joined the bridge, the CPSW switch driver will start
      marking packets with offload_fwd_mark flag unless "ale_bypass=0".
      All configuration is implemented via switchdev API.
      
      The previous solution of tracking both Ports joined the bridge
      (from netdevice_notifier) proved to be not correct as changing CPSW switch
      driver mode required cleanup of ALE table and CPSW settings which happens
      while second Port is joined bridge and as result configuration loaded
      by bridge for the first Port became corrupted.
      
      The introduction of the new CPSW switchdev based driver (cpsw_new.c) is split
      into two parts: Part 1 - basic dual-emac driver; Part 2 - switchdev support.
      This approach has simplified code development and testing a lot. And, I hope,
      it will help with better review.
      
      patches #1 - 5: preparation patches which also moves common code to cpsw_priv.c
      patches #6 - 9: Introduce TI CPSW switch driver based on switchdev and new
       DT bindings
      patch #10: new CPSW switchdev driver documentation
      patch #11: adds DT nodes for new CPSW switchdev driver added for DRA7 SoC
      patch #12: adds DT nodes for new cpsw switchdev driver for am571x-idk board
      patch #13: enables build of TI CPSW driver
      
      Most of the contents of the previous cover-letter have been added in
      new driver documentation, so please refer to that for configuration,
      testing and future work.
      
      These patches can be found at (branch contains some additional patches required
      for testing on top of net-next):
       https://github.com/grygoriyS/linux.git
       branch: lkml-5.4-switch-tbd-v7
      
      changes in v7:
       - patch 2: added check for devm_kmalloc_array() return value
       - patch 6: fixed comments
      
      changes in v6: https://lkml.org/lkml/2019/11/9/108
       - DT bindings converted to json-schema
       - netdev initialization is split on creation and registration.
         The netdevs registration happens now at the end of the probe.
       - reworked cpsw_set_pauseparam() to use PHYlib APIs.
       - other comments for v5 fixed
      
      v5: https://patchwork.kernel.org/cover/11208785/
       - rebase on top of net-next with XDP series from Ivan Khoronzhuk [3],
         and enable XDP support in the new CPSW switchdev driver
         cpsw driver (tested XDP_DROP only)
       - sync with old cpsw driver
       - implement comments from  Ivan Khoronzhuk and Rob Herring
       - fixed "NETDEV WATCHDOG: .." warning after interface UP/DOWN,
         missed TX wake in cpsw_adjust_link()
      
      v4: https://patchwork.kernel.org/cover/11010523/
       - finished split of common CPSW code
       - added devlink support
       - changed CPSW mode configuration approach: from netdevice_notifier to devlink
         parameter
       - refactor and clean up ALE changes which allows to modify VLANs/MDBs entries
       - added missed support for port QDISC_CBS and QDISC_MQPRIO
       - the CPSW is split on two parts: basic dual_mac driver and switchdev support
       - added missed callback .ndo_get_port_parent_id()
       - reworked ingress frames marking in switch mode (offload_fwd_mark)
       - applied comments from Andrew Lunn
      
      v3: https://lwn.net/Articles/786677/
      Changes in v3:
      - a lot of work done to properly split common code between legacy and switchdev
        CPSW drivers and clean up code
      - CPSW switchdev interface updated to the current LKML switchdev interface
      - actually new CPSW switchdev based driver introduced
      - optimized dual_mac mode in new driver. Main change is that in promiscuous
      mode P0_UNI_FLOOD (both ports) is enabled in addition to ALLMULTI (current
      port) instead of ALE_BYPASS.  So, port in non promiscuous mode will keep
      possibility of mcast and vlan filtering.
      - changed bridge join sequence: now switch mode will be enabled only when
      both ports joined the bridge. CPSW will be switched to dual_mac mode if any
      port leaves the bridge. ALE table is completely cleared and then refilled while
      switching to switch mode - this simplifies code a lot, but introduces some
      limitation to bridge setup sequence:
       ip link add name br0 type bridge
       ip link set dev br0 type bridge ageing_time 1000
       ip link set dev br0 type bridge vlan_filtering 0 <- disable
       echo 0 > /sys/class/net/br0/bridge/default_vlan
      
       ip link set dev sw0p1 up <- add ports
       ip link set dev sw0p2 up
       ip link set dev sw0p1 master br0
       ip link set dev sw0p2 master br0
      
       echo 1 > /sys/class/net/br0/bridge/default_vlan <- enable
       ip link set dev br0 type bridge vlan_filtering 1
       bridge vlan add dev br0 vid 1 pvid untagged self
      - STP tested with vlan_filtering 1/0. To make STP work I've had to set
        NO_SA_UPDATE for all slave ports (see comment in code). It also required to
        statically register STP mcast address {0x01, 0x80, 0xc2, 0x0, 0x0, 0x0};
      - allowed build both TI_CPSW and TI_CPSW_SWITCHDEV drivers
      - PTP can be enabled on both ports in dual_mac mode
      
      [1] https://patchwork.ozlabs.org/cover/929367/
      [2] https://patches.linaro.org/cover/136709/
      [3] https://patchwork.kernel.org/cover/11035813/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f12177b
    • Grygorii Strashko's avatar
      arm: omap2plus_defconfig: enable new cpsw switchdev driver · 3727d259
      Grygorii Strashko authored
      Add CONFIG_TI_CPSW_SWITCHDEV option to enable new cpsw switchdev driver
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3727d259
    • Grygorii Strashko's avatar
      ARM: dts: am571x-idk: enable for new cpsw switch dev driver · 15b991ad
      Grygorii Strashko authored
      Add DT nodes for new cpsw switchdev driver for am571x-idk board for now to
      enable testing of the new solution.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15b991ad
    • Grygorii Strashko's avatar
      ARM: dts: dra7: add dt nodes for new cpsw switch dev driver · 39331a49
      Grygorii Strashko authored
      Add DT nodes for new cpsw switch dev driver.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39331a49
    • Ilias Apalodimas's avatar
      Documentation: networking: add cpsw switchdev based driver documentation · 14c815a9
      Ilias Apalodimas authored
      A new cpsw driver based on switchdev was added. Add documentation about
      basic configuration and future features.
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14c815a9
    • Grygorii Strashko's avatar
      phy: ti: phy-gmii-sel: dependency from ti cpsw-switchdev driver · da84e50c
      Grygorii Strashko authored
      Add dependency on TI_CPSW_SWITCHDEV.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da84e50c
    • Ilias Apalodimas's avatar
      net: ethernet: ti: introduce cpsw switchdev based driver part 2 - switch · 111cf1ab
      Ilias Apalodimas authored
      CPSW switchdev based driver which is operating in dual-emac mode by
      default, thus working as 2 individual network interfaces. The Switch mode
      can be enabled by configuring devlink driver parameter "switch_mode" to 1:
      
      	devlink dev param set platform/48484000.switch \
      	name switch_mode value 1 cmode runtime
      
      This can be done regardless of the state of Port's netdevs - UP/DOWN, but
      Port's netdev devices have to be UP before joining the bridge to avoid
      overwriting of bridge configuration as CPSW switch driver completely
      reloads its configuration when first Port changes its state to UP.
      When both interfaces have joined the bridge, the CPSW switch driver will start
      marking packets with offload_fwd_mark flag unless "ale_bypass=0".
      
      All configuration is implemented via switchdev API and notifiers.
      Supported:
       - SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS
       - SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS: BR_MCAST_FLOOD
       - SWITCHDEV_ATTR_ID_PORT_STP_STATE
       - SWITCHDEV_OBJ_ID_PORT_VLAN
       - SWITCHDEV_OBJ_ID_PORT_MDB
       - SWITCHDEV_OBJ_ID_HOST_MDB
      
      Hence CPSW switchdev driver supports:
      - FDB offloading
      - MDB offloading
      - VLAN filtering and offloading
      - STP
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      111cf1ab
    • Ilias Apalodimas's avatar
      net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac · ed3525ed
      Ilias Apalodimas authored
      Part 1:
       Introduce basic CPSW dual_mac driver (cpsw_new.c) which is operating in
      dual-emac mode by default, thus working as 2 individual network interfaces.
      Main differences from legacy CPSW driver are:
      
       - optimized promiscuous mode: The P0_UNI_FLOOD (both ports) is enabled in
      addition to ALLMULTI (current port) instead of ALE_BYPASS. So, Ports in
      promiscuous mode will keep possibility of mcast and vlan filtering, which
      provides significant benefits when ports are joined to the same bridge,
      but without enabling "switch" mode, or to different bridges.
       - learning disabled on ports as it does not make much sense for
         segregated ports - no forwarding in HW.
       - enabled basic support for devlink.
      
      	devlink dev show
      		platform/48484000.switch
      
      	devlink dev param show
      	 platform/48484000.switch:
      	name ale_bypass type driver-specific
      	 values:
      		cmode runtime value false
      
       - "ale_bypass" devlink driver parameter allows enabling
      ALE_CONTROL(4).BYPASS mode for debug purposes.
       - updated DT bindings.
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed3525ed
    • Grygorii Strashko's avatar
      dt-bindings: net: ti: add new cpsw switch driver bindings · ef63fe72
      Grygorii Strashko authored
      Add bindings for the new TI CPSW switch driver. Compared to the legacy
      bindings (net/cpsw.txt):
      - ports definition follows DSA bindings (net/dsa/dsa.txt) and ports can be
      marked as "disabled" if not physically wired.
      - all deprecated properties dropped;
      - all legacy properties dropped which represent constant HW capabilities
      (cpdma_channels, ale_entries, bd_ram_size, mac_control, slaves,
      active_slave)
      - TI CPTS DT properties are reused as is, but grouped in "cpts" sub-node
      - TI Davinci MDIO DT bindings are reused as is, because Davinci MDIO is
      reused.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef63fe72
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw: move set of common functions in cpsw_priv · c5013ac1
      Grygorii Strashko authored
      As a preparatory patch to add support for a switchdev based cpsw driver,
      move common functions to cpsw-priv.c so that they can be used across both
      drivers.
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5013ac1
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw: resolve build deps of cpsw drivers · 51a95337
      Grygorii Strashko authored
      The following patches introduce a new CPSW switchdev driver which shares common
      code with the legacy CPSW driver. This will introduce a build dependency between
      the CPSW switchdev and CPSW legacy drivers related to for_each_slave() and
      cpsw_slave_index() - both can be compiled, but one of them will not be
      functional, depending on Kconfig settings, due to differences in Slave
      Ports indexes calculation.
      
      To fix this make for_each_slave() local (it's used now only by legacy CPSW
      driver) and convert cpsw_slave_index() to be a function pointer which is
      assigned in probe. Driver to probe is defined by DT.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51a95337
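The function-pointer approach described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the driver's code: the structure layouts, the two index rules, and the helper names (`legacy_probe`, `switchdev_probe`, `shared_get_slave`) are all assumptions made for the example; only the idea of assigning `cpsw_slave_index()` at probe time comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the drivers' private structures. */
struct cpsw_common;

struct cpsw_priv {
	struct cpsw_common *cpsw;
	int emac_port;		/* port number this netdev is bound to */
};

struct cpsw_common {
	int active_slave;	/* slave used when a single netdev is exposed */
	int dual_emac;		/* non-zero when each port has its own netdev */
	/* Chosen once at probe time so shared code never hard-codes a rule. */
	int (*slave_index)(const struct cpsw_common *cpsw,
			   const struct cpsw_priv *priv);
};

/* Assumed legacy rule (illustrative only). */
static int legacy_slave_index(const struct cpsw_common *cpsw,
			      const struct cpsw_priv *priv)
{
	return cpsw->dual_emac ? priv->emac_port : cpsw->active_slave;
}

/* Assumed switchdev rule (illustrative only): ports count from 1. */
static int switchdev_slave_index(const struct cpsw_common *cpsw,
				 const struct cpsw_priv *priv)
{
	(void)cpsw;
	return priv->emac_port - 1;
}

/* Each driver's probe installs its own index calculation. */
static void legacy_probe(struct cpsw_common *cpsw)
{
	cpsw->slave_index = legacy_slave_index;
}

static void switchdev_probe(struct cpsw_common *cpsw)
{
	cpsw->slave_index = switchdev_slave_index;
}

/* Shared code calls through the pointer and works for either driver. */
static int shared_get_slave(const struct cpsw_priv *priv)
{
	assert(priv->cpsw->slave_index != NULL);
	return priv->cpsw->slave_index(priv->cpsw, priv);
}
```

Because the pointer is set per device at probe time, both drivers can be built into the same kernel and the shared code picks up the right rule for whichever driver the DT selected.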
    • Ilias Apalodimas's avatar
      net: ethernet: ti: ale: modify vlan/mdb api for switchdev · e85c1437
      Ilias Apalodimas authored
      A following patch introduces switchdev functionality, so modify
      ALE engine VLANs/MDBs API:
      - cpsw_ale_del_mcast(): update so it will remove only selected ports from
      mcast port_mask or delete whole mcast record if !port_mask
      - cpsw_ale_del_vlan(): update so it will remove only selected ports from
      all VLAN record's masks or delete whole VLAN record if !port_mask
      - add cpsw_ale_vlan_add_modify() to add or modify existing VLAN record's
      masks
      - add cpsw_ale_set_unreg_mcast() for enabling unreg mcast on port VLANs
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e85c1437
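The port_mask semantics described for cpsw_ale_del_mcast() in the commit above can be illustrated with a small in-memory model. This is only a sketch: `struct mcast_entry` and `mcast_del_ports()` are hypothetical names, the real ALE table lives in hardware, and freeing the record once its mask becomes empty is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory model of one multicast ALE record. */
struct mcast_entry {
	bool    in_use;		/* record currently occupies a table slot */
	uint8_t port_mask;	/* one bit per member port */
};

/*
 * Remove the ports in port_mask from the record. A zero port_mask means
 * "delete the whole record". Returns true if the record was freed.
 */
static bool mcast_del_ports(struct mcast_entry *e, uint8_t port_mask)
{
	assert(e->in_use);

	if (port_mask)
		e->port_mask &= (uint8_t)~port_mask;	/* drop selected ports */
	else
		e->port_mask = 0;			/* drop every port */

	/* Assumption: a record with no member ports left is freed. */
	if (!e->port_mask) {
		e->in_use = false;
		return true;
	}
	return false;
}
```

For example, removing port bit 0x02 from a record with mask 0x07 leaves 0x05 and keeps the record, while a call with port_mask 0 frees it regardless of the current members.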
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw: allow untagged traffic on host port · 4b41d343
      Grygorii Strashko authored
      Currently untagged vlan traffic is not supported on Host P0 port. This patch adds
      to the ALE context a bitmap of VLANs for which the Host P0 port bit is set in the Force
      Untagged Packet Egress bitmask of the VLAN ALE entries, and adds a corresponding
      check to the VLAN encapsulation header parsing function cpsw_rx_vlan_encap().
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b41d343
    • Grygorii Strashko's avatar
      net: ethernet: ti: ale: clean ale tbl on init and intf restart · 7fe579df
      Grygorii Strashko authored
      Clean CPSW ALE on init and intf restart (up/down) to avoid reading obsolete
      or garbage entries from ALE table.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7fe579df
    • David S. Miller's avatar
      Merge branch 'nf_tables_offload-vlan-matching-support' · b9242da6
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      nf_tables_offload: vlan matching support
      
      The following patchset contains Netfilter support for vlan matching
      offloads:
      
      1) Constify nft_reg_load() as a preparation patch.
      2) Restrict rule matching to ingress interface type ARPHRD_ETHER.
      3) Add new vlan_tci field to flow_dissector_key_vlan structure,
         to allow to set up vlan_id, vlan_dei and vlan_priority in one go.
      4) C-VLAN matching support.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9242da6
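Item 3 of the series above mentions setting vlan_id, vlan_dei and vlan_priority "in one go" through a single vlan_tci field. The sketch below shows how those three values share the 16-bit 802.1Q Tag Control Information word (3-bit priority, 1-bit DEI, 12-bit VLAN ID). The helper names are hypothetical and the value is kept in host byte order for simplicity; on the wire the TCI is big-endian.

```c
#include <assert.h>
#include <stdint.h>

/* 802.1Q TCI layout: PCP in bits 15..13, DEI in bit 12, VID in bits 11..0. */
#define TCI_VID_MASK	0x0fffu
#define TCI_DEI_SHIFT	12
#define TCI_PRIO_SHIFT	13

/* Combine the three fields into one host-order TCI value. */
static uint16_t tci_pack(uint16_t vid, unsigned int dei, unsigned int prio)
{
	assert(vid <= TCI_VID_MASK && dei <= 1 && prio <= 7);
	return (uint16_t)((prio << TCI_PRIO_SHIFT) |
			  (dei << TCI_DEI_SHIFT) | vid);
}

/* Extract the individual fields again. */
static uint16_t tci_vid(uint16_t tci)
{
	return tci & TCI_VID_MASK;
}

static unsigned int tci_dei(uint16_t tci)
{
	return (tci >> TCI_DEI_SHIFT) & 1u;
}

static unsigned int tci_prio(uint16_t tci)
{
	return (unsigned int)tci >> TCI_PRIO_SHIFT;
}
```

With this layout, VLAN 100 at priority 5 with DEI clear packs to 0xA064, so a single 16-bit value and mask can describe a match on any combination of the three fields.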
    • Pablo Neira Ayuso's avatar
      netfilter: nft_payload: add C-VLAN offload support · 89d8fd44
      Pablo Neira Ayuso authored
      Match on h_vlan_encapsulated_proto and set up protocol dependency. Check
      for protocol dependency before accessing the tci field. Allow to match
      on the encapsulated ethertype too.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89d8fd44
    • Pablo Neira Ayuso's avatar
      netfilter: nft_payload: add VLAN offload support · a82055af
      Pablo Neira Ayuso authored
      Match on ethertype and set up protocol dependency. Check for protocol
      dependency before accessing the tci field. Allow to match on the
      encapsulated ethertype too.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a82055af
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables_offload: allow ethernet interface type only · 8819efc9
      Pablo Neira Ayuso authored
      Hardware offload support at this stage assumes an ethernet device in
      place. The flow dissector provides the intermediate representation to
      express this selector, so extend it to allow to store the interface
      type. Flower does not use this, so skb_flow_dissect_meta() is not
      extended to match on this new field.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8819efc9
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: constify nft_reg_load{8, 16, 64}() · 7cd9a58d
      Pablo Neira Ayuso authored
      This patch constifies the pointer to source register data that is passed
      as an input parameter.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7cd9a58d
    • Xin Long's avatar
      lwtunnel: add support for multiple geneve opts · 2f1d370b
      Xin Long authored
      geneve RFC (draft-ietf-nvo3-geneve-14) allows a geneve packet to carry
      multiple geneve opts, so it's necessary for lwtunnel to support adding
      multiple geneve opts in one lwtunnel route. vxlan and erspan are still
      limited to a single option each.
      
      With this patch, iproute2 can set them up like this:
      
        # ip r a 1.1.1.0/24 encap ip id 1 geneve_opts 0:0:12121212,1:2:12121212 \
          dst 10.1.0.2 dev geneve1
      
        # ip r a 1.1.1.0/24 encap ip id 1 vxlan_opts 456 \
          dst 10.1.0.2 dev erspan1
      
        # ip r a 1.1.1.0/24 encap ip id 1 erspan_opts 1:123:0:0 \
          dst 10.1.0.2 dev erspan1
      
      This is much like cls_flower and act_tunnel_key.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f1d370b
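The geneve_opts argument shown in the commit above is a comma-separated list of options. As a rough illustration of how such a list could be split into individual options, the sketch below parses "CLASS:TYPE:DATA" tokens. It is not iproute2's parser: the function and struct names are hypothetical, and treating all three fields as hexadecimal is an assumption of this example. The 4-byte granularity of option data follows the Geneve option format.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPT_DATA 124	/* 5-bit length field in 4-byte units: 31 * 4 */

/* Hypothetical container for one parsed option. */
struct geneve_opt_sketch {
	uint16_t opt_class;
	uint8_t  type;
	uint8_t  data[MAX_OPT_DATA];
	size_t   len;		/* number of data bytes */
};

/* Parse one "CLASS:TYPE:DATA" token; returns 0 on success, -1 on error. */
static int parse_one_opt(const char *tok, struct geneve_opt_sketch *opt)
{
	unsigned int cls, type, byte;
	int consumed = 0;
	const char *p;
	size_t n = 0;

	assert(tok != NULL && opt != NULL);

	/* consumed stays 0 unless both ':' separators were matched. */
	if (sscanf(tok, "%x:%x:%n", &cls, &type, &consumed) != 2 || !consumed)
		return -1;
	if (cls > 0xffff || type > 0xff)
		return -1;

	/* DATA is a string of hex digit pairs, one pair per byte. */
	for (p = tok + consumed; *p; p += 2) {
		if (n == MAX_OPT_DATA ||
		    !isxdigit((unsigned char)p[0]) ||
		    !isxdigit((unsigned char)p[1]))
			return -1;
		sscanf(p, "%2x", &byte);
		opt->data[n++] = (uint8_t)byte;
	}
	if (n % 4)	/* option data is carried in multiples of 4 bytes */
		return -1;

	opt->opt_class = (uint16_t)cls;
	opt->type = (uint8_t)type;
	opt->len = n;
	return 0;
}

/* Split a comma-separated list; returns the option count or -1 on error. */
static int parse_geneve_opts(const char *arg, struct geneve_opt_sketch *opts,
			     size_t max_opts)
{
	const char *start = arg;
	size_t count = 0;

	while (*start) {
		const char *end = strchr(start, ',');
		size_t len = end ? (size_t)(end - start) : strlen(start);
		char tok[300];

		if (count == max_opts || len == 0 || len >= sizeof(tok))
			return -1;
		memcpy(tok, start, len);
		tok[len] = '\0';
		if (parse_one_opt(tok, &opts[count]) < 0)
			return -1;
		count++;
		start += len;
		if (*start == ',')
			start++;
	}
	return (int)count;
}
```

Called on the string from the example route, "0:0:12121212,1:2:12121212", this yields two options of four data bytes each, which is the multi-option case the patch enables for geneve.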