- 15 Oct, 2021 40 commits
-
Maciej Fijalkowski authored
Optimize Tx descriptor cleaning for XDP. The current approach doesn't really scale and chokes when multiple flows are handled. Introduce two ring fields, @next_dd and @next_rs, that keep track of the descriptor that should be looked at when the need for cleaning arises and the descriptor that should have the RS bit set, respectively.

Note that at this point the threshold is a constant (32), but it is something that we could make configurable.

First thing is to get away from setting the RS bit on each descriptor. Do this only once NTU is higher than the current @next_rs value. In such a case, grab tx_desc[next_rs], set the RS bit in the descriptor and advance @next_rs by 32.

Second thing is to clean the Tx ring only when there are fewer than 32 free entries. For that case, look up tx_desc[next_dd] for the DD bit. This bit is written back by HW to let the driver know that the xmit was successful. It will happen only for those descriptors that had the RS bit set. Clean only 32 descriptors and advance @next_dd.

The actual cleaning routine is moved from ice_napi_poll() down to ice_xmit_xdp_ring(). It is safe to do so as the XDP ring will not get any SKBs that would rely on interrupts for the cleaning. A nice side effect is that for the rare case of the Tx fallback path (which the next patch introduces) we don't have to trigger the SW irq to clean the ring.

With those two concepts, the ring is kept almost full, but it is guaranteed that the driver will be able to produce Tx descriptors. This approach works out well even though the Tx descriptors are produced one by one.

The test was conducted with the ice HW bombarded with packets from a HW generator configured to generate 30 flows. The xdp2 sample yields the following results:

<snip>
 proto 17:   79973066 pkt/s
 proto 17:   80018911 pkt/s
 proto 17:   80004654 pkt/s
 proto 17:   79992395 pkt/s
 proto 17:   79975162 pkt/s
 proto 17:   79955054 pkt/s
 proto 17:   79869168 pkt/s
 proto 17:   79823947 pkt/s
 proto 17:   79636971 pkt/s
</snip>

As that sample reports the Rx'ed frames, let's look at the sar output. It shows that what we Rx'ed we actually also Tx'ed, with no noticeable drops:

Average:  IFACE    rxpck/s      txpck/s      rxkB/s      txkB/s      rxcmp/s  txcmp/s  rxmcst/s  %ifutil
Average:  ens4f1   79842324.00  79842310.40  4678261.17  4678260.38  0.00     0.00     0.00      38.32

with tx_busy staying calm. When compared to the state before:

Average:  IFACE    rxpck/s      txpck/s      rxkB/s      txkB/s      rxcmp/s  txcmp/s  rxmcst/s  %ifutil
Average:  ens4f1   90919711.60  42233822.60  5327326.85  2474638.04  0.00     0.00     0.00      43.64

it can be observed that txpck/s is almost doubled, meaning that the performance is improved by around 90%. All of that loss was due to drops in the driver; previously the tx_busy stat was bumped at a 7 Mpps rate.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
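A minimal userspace model of the batched RS/DD scheme described above. This is purely illustrative: the ring size, flag layout and all names are invented and do not match the actual ice descriptor format.

#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 512
#define THRESH    32          /* one RS bit and one cleaning pass per 32 descriptors */
#define F_RS      (1u << 0)   /* "report status" request, set by the driver */
#define F_DD      (1u << 1)   /* "descriptor done", written back by HW on RS descriptors */

struct model_desc { uint32_t flags; };

struct model_ring {
	struct model_desc desc[RING_SIZE];
	uint16_t ntu;       /* next to use */
	uint16_t next_rs;   /* next descriptor that gets the RS bit */
	uint16_t next_dd;   /* next descriptor whose DD bit we poll */
	uint16_t free;      /* free descriptors */
};

static void model_ring_init(struct model_ring *r)
{
	r->ntu = 0;
	/* RS lands on the last descriptor of each 32-descriptor batch, so a DD
	 * writeback at next_dd means the whole batch before it has completed. */
	r->next_rs = THRESH - 1;
	r->next_dd = THRESH - 1;
	r->free = RING_SIZE;
}

static void model_clean_if_needed(struct model_ring *r)
{
	if (r->free >= THRESH)
		return;
	if (!(r->desc[r->next_dd].flags & F_DD))
		return;                       /* HW not done with this batch yet */
	r->free += THRESH;
	r->next_dd = (r->next_dd + THRESH) % RING_SIZE;
}

static bool model_xmit_one(struct model_ring *r)
{
	model_clean_if_needed(r);
	if (!r->free)
		return false;                 /* ring genuinely full */

	r->desc[r->ntu].flags = 0;            /* fill the descriptor */
	if (r->ntu == r->next_rs) {           /* request a writeback once per batch */
		r->desc[r->ntu].flags |= F_RS;
		r->next_rs = (r->next_rs + THRESH) % RING_SIZE;
	}
	r->ntu = (r->ntu + 1) % RING_SIZE;
	r->free--;
	return true;
}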
-
Maciej Fijalkowski authored
With rings being split, it is now convenient to introduce a pointer to the XDP ring within the Rx ring. For XDP_TX workloads this means that the xdp_rings array access, which was executed per processed frame, will be skipped.

Also, read the XDP prog once per NAPI and, if the prog is present, set up the local xdp_ring pointer. Reading the prog a single time was discussed in [1] with some concern raised by Toke around dispatcher handling and the need for going through the RCU grace period in the ndo_bpf driver callback, but ice currently tears down NAPI instances regardless of the prog presence on the VSI.

Although the pointer to the XDP ring introduced to the Rx ring makes things a lot slimmer/simpler, I still feel that a single prog read per NAPI lifetime is beneficial. A further patch that introduces the fallback path will also profit from that, as the xdp_ring pointer will be set during the XDP rings setup.

[1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
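A rough sketch of the idea with invented names (not the ice code): the Rx ring caches a pointer to its XDP Tx ring, so the per-frame xdp_rings[] lookup goes away, and the prog pointer is read once per NAPI poll instead of once per frame. In the real driver the prog read would be an RCU/READ_ONCE access.

struct sketch_tx_ring;
struct bpf_prog;

struct sketch_rx_ring {
	struct bpf_prog *xdp_prog;
	struct sketch_tx_ring *xdp_ring;   /* set when XDP rings are configured */
};

static int sketch_napi_poll(struct sketch_rx_ring *rx, int budget)
{
	struct bpf_prog *prog = rx->xdp_prog;                  /* once per poll */
	struct sketch_tx_ring *xdp_ring = prog ? rx->xdp_ring : NULL;
	int done = 0;

	(void)xdp_ring;	/* would be handed to the XDP_TX xmit helper */
	while (done < budget) {
		/* ... run prog on the frame; XDP_TX uses xdp_ring directly ... */
		done++;
	}
	return done;
}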
-
Maciej Fijalkowski authored
xdp_frame is not needed for the XDP_TX data path in the ice driver's case. For this data path, cleaning of the sent descriptor will not happen anywhere outside of the driver, which means that the information about the underlying memory model carried via xdp_frame will not be used. Therefore, this conversion can simply be dropped, which relieves the CPU a bit. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Maciej Fijalkowski authored
There has been a long-lasting issue of improper xdp_rings indexing for XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index is mixed with smp_processor_id(), there could be a situation where Tx descriptors are produced onto the XDP Tx ring but the tail is never bumped - for example when a particular queue id is pinned to a non-matching IRQ line. Address this problem by ignoring the user ring count setting and always initializing the xdp_rings array to be of num_possible_cpus() size. Then, always use smp_processor_id() as the index into the xdp_rings array. This provides serialization, as at a given time only a single softirq can run on a particular CPU. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
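A sketch of the indexing scheme (smp_processor_id() is the real kernel helper; the structs are hypothetical):

struct sketch_tx_ring;

struct sketch_vsi {
	/* array sized to num_possible_cpus(), regardless of the user ring count */
	struct sketch_tx_ring **xdp_rings;
};

static struct sketch_tx_ring *sketch_pick_xdp_ring(struct sketch_vsi *vsi)
{
	/* Called from softirq context, so the CPU cannot change under us and
	 * at most one softirq per CPU touches a given ring at a time. */
	return vsi->xdp_rings[smp_processor_id()];
}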
-
Maciej Fijalkowski authored
While it was convenient to have a generic ring structure that served both the Tx and Rx sides, the next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull the Tx ring out into new ice_tx_ring and ice_rx_ring structs. The Rx ring could be handled by the old ice_ring, which would reduce the code churn within this patch, but that would make things asymmetric. Make a union out of the ring container within ice_q_vector so that it is possible to iterate over the newly introduced ice_tx_ring. Remove @size as it's only accessed from the control path and it can be calculated pretty easily. Change the definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. The sizes of the Rx and Tx ring structs are 256 and 192 bytes, respectively. In the Rx ring, xdp_rxq_info occupies its own cacheline, so that is the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
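A very rough sketch of the shape of the split, with invented field sets (the real structs carry many more members and deliberate cacheline placement):

struct sketch_tx_ring {
	void *desc;
	u16 next_to_use;
	u16 next_to_clean;
	/* Tx-only additions from later commits no longer bloat the Rx ring */
	u16 next_dd;
	u16 next_rs;
};

struct sketch_rx_ring {
	void *desc;
	u16 next_to_use;
	u16 next_to_clean;
	struct xdp_rxq_info xdp_rxq;   /* sits in its own cacheline on Rx */
};

/* the ring container keeps a union so iteration code picks the right type */
struct sketch_ring_container {
	union {
		struct sketch_tx_ring *tx_ring;
		struct sketch_rx_ring *rx_ring;
	};
};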
-
Maciej Fijalkowski authored
Currently ice_container_type is scoped only to ice_ethtool.c. The next commit, which will split the ice_ring struct into Rx/Tx-specific ring structs, is also going to modify the type of the linked list of rings within ice_ring_container. Therefore, the functions that take ice_ring_container as an input argument will need to be aware of the ring type that will be looked up. Embed ice_container_type within ice_ring_container and initialize it properly when allocating the q_vectors. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Maciej Fijalkowski authored
This field is dead and the driver is not making any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
David S. Miller authored
Ioana Ciornei says:
====================
dpaa2-eth: add support for IRQ coalescing

This patch set adds support for interrupt coalescing in dpaa2-eth. The first patches add support for the hardware-level configuration of IRQ coalescing in the dpio driver, while the ones that touch the dpaa2-eth driver are responsible for the ethtool user interaction.

With the adaptive IRQ coalescing in place and enabled, we have observed the following changes in interrupt rates on one A72 core @2.2GHz (LX2160A) while running an Rx TCP flow. The TCP stream is sent on a 10Gbit link and the only CPU that does Rx is fully utilized.

                            IRQ rate (irqs / sec)
before:  4.59 Gbits/sec     24k
after:   5.67 Gbits/sec     1.3k
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ioana Ciornei authored
Add support for adaptive interrupt coalescing to the dpaa2-eth driver. First of all, ETHTOOL_COALESCE_USE_ADAPTIVE_RX is defined as a supported coalesce parameter and the requested state is configured through the dpio APIs added in the previous patch. Besides the ethtool API interaction, we keep track of how many bytes and frames are dequeued per CDAN (Channel Data Availability Notification) and update the Net DIM instance through the dpaa2_io_update_net_dim() API. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ioana Ciornei authored
Use the generic dynamic interrupt moderation (dim) framework to implement adaptive interrupt coalescing on Rx. With the per-packet interrupt scheme, a high interrupt rate has been noted for moderate traffic flows leading to high CPU utilization. The dpio driver exports new functions to enable/disable adaptive IRQ coalescing on a DPIO object, to query the state or to update Net DIM with a new set of bytes and frames dequeued. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ioana Ciornei authored
Use the newly exported dpio driver API to manually configure the IRQ coalescing parameters requested by the user. The .get_coalesce() and .set_coalesce() net_device callbacks are implemented and directly export or set up the rx-usecs on all the configured channels. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ioana Ciornei authored
In DPAA2 based SoCs, the IRQ coalescing support per software portal has 2 configurable parameters:
 - the IRQ timeout period (QBMAN_CINH_SWP_ITPR): how many periods of 256 QBMAN cycles need to pass until a dequeue interrupt is asserted.
 - the IRQ threshold (QBMAN_CINH_SWP_DQRR_ITR): how many dequeue responses in the DQRR ring would generate an IRQ.

Add support for setting up and querying these IRQ coalescing related parameters.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
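Back-of-the-envelope conversion implied by the description above: turning a requested rx-usecs value into an ITP timeout value, given the QBMAN clock frequency (which the dpio_get_attributes() patch below exposes). Purely illustrative; the helper name and rounding are not taken from the driver.

#include <stdint.h>

static uint32_t usecs_to_itp(uint32_t usecs, uint32_t qbman_clk_hz)
{
	/* number of QBMAN cycles in 'usecs', then scale by the 256-cycle unit */
	uint64_t cycles = (uint64_t)usecs * qbman_clk_hz / 1000000;

	return (uint32_t)(cycles / 256);
}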
-
Ioana Ciornei authored
Through the dpio_get_attributes() firmware call, the dpio driver has access to the QBMAN clock frequency. Extend the structure which holds the firmware's response so that we have access to this information. This will be needed in the next patches, which add support for interrupt coalescing that needs to be configured based on this frequency. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says:
====================
net/sched: implement L4S style ce_threshold_ect1 marking

As suggested by Ingemar Johansson, Neal Cardwell, and others, fq_codel can be used for Low Latency, Low Loss, Scalable Throughput (L4S) with a small change. In ce_threshold_ect1 mode, only ECT(1) packets can be marked CE if their sojourn time is above the threshold.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Add a TCA_FQ_CODEL_CE_THRESHOLD_ECT1 boolean option to select Low Latency, Low Loss, Scalable Throughput (L4S) style marking, along with ce_threshold. If enabled, only packets with ECT(1) can be transformed to CE if their sojourn time is above the ce_threshold.

Note that this new option does not change the rules for the codel law. In particular, if TCA_FQ_CODEL_ECN is left enabled (this is the default when the fq_codel qdisc is created), ECT(0) packets can still get CE if the codel law (as governed by limit/target) decides so.

Section 4.3.b of the current draft [1] states:

  b. A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can be used for L4S. For instance within each queue of an FQ-CoDel system, as well as a CoDel AQM, there is typically also ECN marking at an immediate (unsmoothed) shallow threshold to support use in data centres (see Sec.5.2.7 of [RFC8290]). This can be modified so that the shallow threshold is solely applied to ECT(1) packets. Then if there is a flow of non-ECN or ECT(0) packets in the per-flow-queue, the Classic AQM (e.g. CoDel) is applied; while if there is a flow of ECT(1) packets in the queue, the shallower (typically sub-millisecond) threshold is applied.

Tested:

tc qd replace dev eth1 root fq_codel ce_threshold_ect1 50usec
netperf ... -t TCP_STREAM -- -K dctcp
tc -s -d qd sh dev eth1

qdisc fq_codel 8022: root refcnt 32 limit 10240p flows 1024 quantum 9212 target 5ms ce_threshold_ect1 49us interval 100ms memory_limit 32Mb ecn drop_batch 64
 Sent 14388596616 bytes 9543449 pkt (dropped 0, overlimits 0 requeues 152013)
 backlog 0b 0p requeues 152013
  maxpacket 68130 drop_overlimit 0 new_flow_count 95678
  ecn_mark 0 ce_mark 7639
  new_flows_len 0 old_flows_len 0

[1] L4S current draft: https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Cc: Tom Henderson <tomh@tomh.org>
Cc: Bob Briscoe <in@bobbriscoe.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
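The marking decision itself boils down to something like the sketch below (simplified; the real fq_codel code is structured differently). ECT(1) is ECN codepoint 0b01 in the two low bits of the DS field, and only such packets are eligible for the shallow ce_threshold marking; everything else stays under the normal CoDel law.

static bool ce_mark_ect1_sketch(int dsfield, u64 sojourn_ns,
				u64 ce_threshold_ns, bool ect1_only)
{
	if (dsfield < 0)                      /* not an IPv4/IPv6 packet */
		return false;
	if (sojourn_ns <= ce_threshold_ns)
		return false;
	if (ect1_only && (dsfield & INET_ECN_MASK) != INET_ECN_ECT_1)
		return false;
	return true;                          /* caller would set CE on the packet */
}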
-
Eric Dumazet authored
skb_get_dsfield(skb) gets the dsfield from skb, or -1 if an error was found. This is basically a wrapper around ipv4_get_dsfield() and ipv6_get_dsfield(). It is used by the following patch for fq_codel. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Cc: Tom Henderson <tomh@tomh.org> Signed-off-by: David S. Miller <davem@davemloft.net>
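Roughly how such a helper can be built from the existing accessors (a simplified sketch, not the actual implementation; the real helper also validates that the network header can be pulled before reading it):

static inline int sketch_get_dsfield(const struct sk_buff *skb)
{
	switch (skb->protocol) {
	case htons(ETH_P_IP):
		return ipv4_get_dsfield(ip_hdr(skb));
	case htons(ETH_P_IPV6):
		return ipv6_get_dsfield(ipv6_hdr(skb));
	default:
		return -1;	/* no DS field to report */
	}
}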
-
Eric Dumazet authored
Use of the percpu_counter structure to track the count of orphaned sockets is causing problems on modern hosts with 256 cpus or more. Stefan Bach reported a serious spinlock contention in real workloads, which I was able to reproduce with a netfilter rule dropping incoming FIN packets.

  53.56%  server  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
          |
          ---queued_spin_lock_slowpath
             |
              --53.51%--_raw_spin_lock_irqsave
                        |
                         --53.51%--__percpu_counter_sum
                                   tcp_check_oom
                                   |
                                   |--39.03%--__tcp_close
                                   |          tcp_close
                                   |          inet_release
                                   |          inet6_release
                                   |          sock_close
                                   |          __fput
                                   |          ____fput
                                   |          task_work_run
                                   |          exit_to_usermode_loop
                                   |          do_syscall_64
                                   |          entry_SYSCALL_64_after_hwframe
                                   |          __GI___libc_close
                                   |
                                    --14.48%--tcp_out_of_resources
                                              tcp_write_timeout
                                              tcp_retransmit_timer
                                              tcp_write_timer_handler
                                              tcp_write_timer
                                              call_timer_fn
                                              expire_timers
                                              __run_timers
                                              run_timer_softirq
                                              __softirqentry_text_start

As explained in commit cf86a086 ("net/dst: use a smaller percpu_counter batch for dst entries accounting"), the default batch size is too big for the default value of tcp_max_orphans (262144). But even if we reduce batch sizes, there would still be cases where the estimated count of orphans is beyond the limit, and where tcp_too_many_orphans() has to call the expensive percpu_counter_sum_positive().

One solution is to use plain per-cpu counters, and have a timer to periodically refresh this cache. Updating this cache every 100ms seems about right; tcp pressure state is not changing radically over shorter periods.

percpu_counter was nice 15 years ago while hosts had fewer than 16 cpus, not anymore by current standards.

v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m, reported by kernel test robot <lkp@intel.com>
    Remove unused socket argument from tcp_too_many_orphans()

Fixes: dd24c001 ("net: Use a percpu_counter for orphan_count")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bach <sfb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
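A userspace model of the replacement scheme: counters that are cheap to bump on the hot path plus a cached total that a periodic job (every ~100ms in the patch) refreshes, so the limit check never sums all CPUs. Everything below is invented for illustration; the kernel code uses real per-CPU variables and its own timer machinery.

#include <stdatomic.h>

#define NR_CPUS_MODEL 256

/* in the kernel these would be true per-CPU variables, one per CPU */
static _Atomic long orphans_percpu[NR_CPUS_MODEL];
static _Atomic long orphans_cached;	/* what the limit check reads */

static void orphan_inc(int cpu)
{
	atomic_fetch_add_explicit(&orphans_percpu[cpu], 1, memory_order_relaxed);
}

static void orphan_dec(int cpu)
{
	atomic_fetch_sub_explicit(&orphans_percpu[cpu], 1, memory_order_relaxed);
}

/* called periodically, e.g. every 100 ms */
static void orphan_refresh_cache(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += atomic_load_explicit(&orphans_percpu[cpu],
					    memory_order_relaxed);
	atomic_store_explicit(&orphans_cached, sum > 0 ? sum : 0,
			      memory_order_relaxed);
}

static int too_many_orphans(long limit)
{
	return atomic_load_explicit(&orphans_cached, memory_order_relaxed) > limit;
}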
-
Matt Johnston authored
mctp_key_alloc() returns a key already referenced. The mctp_route_input() path receives a packet for a bind socket and allocates a key. It passes the key to mctp_key_add() which takes a refcount and adds the key to lists. mctp_route_input() should then release its own refcount when setting the key pointer to NULL. In the mctp_alloc_local_tag() path (for mctp_local_output()) we similarly need to unref the key before returning (mctp_reserve_tag() takes a refcount and adds the key to lists). Fixes: 73c61845 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ansuel Smith says: ==================== Multiple improvement for qca8337 switch This series is the final step of a long process of porting 80+ devices to use the new qca8k driver instead of the hacky qca one based on never merged swconfig platform. Some background to justify all these additions. QCA used a special binding to declare raw initval to set the swich. I made a script to convert all these magic values and convert 80+ dts and scan all the needed "unsupported regs". We find a baseline where we manage to find the common and used regs so in theory hopefully we don't have to add anymore things. We discovered lots of things with this, especially about how differently qca8327 works compared to qca8337. In short, we found that qca8327 have some problem with suspend/resume for their internal phy. It instead sets some dedicated regs that suspend the phy without setting the standard bit. First 4 patch are to fix this. There is also a patch about preferring master. This is directly from the original driver and it seems to be needed to prevent some problem with the pause frame. Every ipq806x target sets the mac power sel and this specific reg regulates the output voltage of the regulator. Without this some instability can occur. Some configuration (for some reason) swap mac6 with mac0. We add support for this. Also, we discovered that some device doesn't work at all with pll enabled for sgmii line. In the original code this was based on the switch revision. In later revision the pll regs were decided based on the switch type (disabled for qca8327 and enabled for qca8337) but still some device had that disabled in the initval regs. Considering we found at least one qca8337 device that required pll disabled to work (no traffic problem) we decided to introduce a binding to enable pll and set it only with that. Lastly, we add support for led open drain that require the power-on-sel to set. Also, some device have only the power-on-sel set in the initval so we add also support for that. This is needed for the correct function of the switch leds. Qca8327 have a special reg in the pws regs that set it to a reduced 48pin layout. This is needed or the switch doesn't work. These are all the special configuration we find on all these devices that are from various targets. Mostly ath79, ipq806x and bcm53xx. Changes v7: - Fix missing newline in yaml - Handle error with wrong cpu port detected - Move yaml commit as last to fix bot error Changes v6: - Convert Documentation to yaml - Add extra check for cpu port and invalid phy mode - Add co developed by tag to give credits to Matthew Changes v5: - Swap patch. Document first then implement. - Fix some grammar error reported. - Rework function. Remove phylink mac_config DT scan and move everything to dedicated function in probe. - Introduce new logic for delay selection where is also supported with internal delay declared and rgmii set as phy mode - Start working on ymal conversion. Will later post this in v6 when we finally take final decision about mac swap. Changes v4: - Fix typo in SGMII falling edge about using PHY id instead of switch id Changes v3: - Drop phy patches (proposed separateley) - Drop special pwr binding. Rework to ipq806x specific - Better describe compatible and add serial print on switch chip - Drop mac exchange. Rework falling edge and move it to mac_config - Add support for port 6 cpu port. Drop hardcoded cpu port to port0 - Improve port stability with sgmii. 
QCA source have intenal delay also for sgmii - Add warning with pll enabled on wrong configuration Changes v2: - Reword Documentation patch to dt-bindings - Propose first 2 phy patch to net - Better describe and add hint on how to use all the new bindings - Rework delay scan function and move to phylink mac_config - Drop package48 wrong binding - Introduce support for qca8328 switch - Fix wrong binding name power-on-sel - Return error on wrong config with led open drain and ignore-power-on-sel not set ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matthew Hagan authored
Convert the qca8k bindings to YAML format. Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Fix warning now that we have qca8k switch Documentation using yaml. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Move ports related config to dedicated struct to keep things organized. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
The original QCA code reports port instability and says that SGMII also requires internal delays to be set. Generalize the rgmii delay function and apply the advised values if they are not defined in the DT. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
The QCA8328 switch is the bigger brother of the qca8327. Same regs, different chip. Change the function to set the correct pin layout and introduce a new match_data to differentiate the 2 switches, as they have the same ID and their internal PHYs have the same ID. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
QCA8328 is the bigger brother of qca8327. Document the new compatible binding and add some information to help understand the various switch compatibles. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Some qca8327 switches require forcing the ignore of the power-on-sel strapping. Some switches require setting the led open drain mode in regs instead of using strapping. While most devices implement this the correct way using pin strapping, there are still some broken devices that require it to be set using sw regs. Introduce a new binding and support these special configurations. As led open drain requires ignoring pin strapping to work, the probe fails with an EINVAL error on incorrect configuration. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Document the new binding qca,ignore-power-on-sel used to ignore power-on strapping and use sw regs instead. Document qca,led-open-drain to set the leds to open drain mode; qca,ignore-power-on-sel is mandatory with this enabled or an error will be reported. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Support enabling the PLL on the SGMII CPU port. Some devices require this special configuration or no traffic is transmitted and the switch doesn't work at all. A dedicated binding is added to the CPU port node to apply the correct reg on mac config. Fail to configure sgmii with the qca8327 switch and warn if the pll is used on a qca8337 with a revision greater than 1. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Document qca,sgmii-enable-pll binding used in the CPU nodes to enable SGMII PLL on MAC config. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Future-proof commit. This switch has 2 CPU ports, and one valid configuration is the first CPU port set to sgmii and the second CPU port set to rgmii-id. The current implementation detects the delay only for CPU port zero set to rgmii and doesn't account for any delay set on a secondary CPU port. Drop the current delay scan function and move it to the sgmii parser function to generalize it and implicitly add support for a secondary CPU port set to rgmii-id. Introduce new logic where the delay is also enabled when an internal delay binding is declared and rgmii is set as the PHY mode; see the sketch below. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
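Sketch of the delay-selection idea (hypothetical helper, not the qca8k code): RGMII delays get applied when the PHY mode itself asks for them, or when explicit delay values were declared in the DT even though the mode is plain "rgmii".

static bool sketch_want_rx_delay(int phy_mode, bool dt_rx_delay_declared)
{
	switch (phy_mode) {
	case PHY_INTERFACE_MODE_RGMII_ID:
	case PHY_INTERFACE_MODE_RGMII_RXID:
		return true;
	case PHY_INTERFACE_MODE_RGMII:
		return dt_rx_delay_declared;	/* internal delay binding in DT */
	default:
		return false;
	}
}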
-
Ansuel Smith authored
Currently the CPU port is always hardcoded to port 0. This switch has 2 CPU ports. The original intention of this driver seems to be to use the mac06_exchange bit to swap MAC0 with MAC6 in the strange configuration where the device has only CPU port 6 connected. To skip the introduction of a new binding, rework the driver to address the secondary CPU port as primary and drop any reference to the hardcoded port. With the mac06 exchange configuration, just skip the definition of port0 and define the CPU port as secondary. The driver will autoconfigure the switch to use that as the primary CPU port. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
The switch now supports the CPU port being set to 6 instead of being hardcoded to 0. Document support for it and describe the selection logic. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Add support for this in the qca8k driver. Also add support for SGMII rx/tx clock falling edge. This is only present for pad0; pad5 and pad6 have these bits reserved per documentation. Add a comment that this is hardcoded to PAD0 as qca8327/28/34/37 have a unique sgmii line, and setting the falling edge in port0 applies to both configurations with sgmii used for port0 or port6. Co-developed-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Add names and descriptions of additional PORT0_PAD_CTRL properties. qca,sgmii-(rx|tx)clk-falling-edge are for setting the respective clock phase to falling edge. Co-developed-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ansuel Smith authored
Add the missing mac power sel support needed for the ipq8064/5 SoCs, which require 1.8v for the internal regulator port instead of the default 1.5v. If other devices need this, consider adding a dedicated binding to support it. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The variable err is being initialized with a value that is never read; it is updated immediately afterwards. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yunsheng Lin authored
As 32-bit arches with 64-bit DMA seem to be rare these days, and page pool might carry a lot of code and complexity for systems that possibly don't exist, disable dma mapping support for such systems. If drivers really want to work on such systems, they have to implement their own DMA-mapping fallback tracking outside page_pool. Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
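The gist of the restriction, as an illustrative check (not the exact page_pool code): refuse PP_FLAG_DMA_MAP when dma_addr_t is 64-bit but the arch is 32-bit, leaving DMA-address tracking to the driver as described above.

#if defined(CONFIG_ARCH_DMA_ADDR_T_64BIT) && (BITS_PER_LONG < 64)
#define SKETCH_DMA_MAP_POSSIBLE 0
#else
#define SKETCH_DMA_MAP_POSSIBLE 1
#endif

static int sketch_check_pool_flags(unsigned int flags)
{
	if ((flags & PP_FLAG_DMA_MAP) && !SKETCH_DMA_MAP_POSSIBLE)
		return -EINVAL;	/* driver must do its own DMA-mapping tracking */
	return 0;
}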
-
Jakub Kicinski authored
Srujana Challa says:
====================
octeontx2-af: Miscellaneous changes for CPT

This patchset consists of miscellaneous changes for CPT. The first patch enables the CPT HW interrupts, the second patch adds support for CPT LF teardown in the non-FLR path, and the final patch does a CPT CTX flush in the FLR handler.

v2:
- Fixed a warning reported by kernel test robot.
====================
Link: https://lore.kernel.org/r/20211013055621.1812301-1-schalla@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Srujana Challa authored
Adds support to flush or invalidate CPT CTX entries as part of FLR and also provides a mailbox to flush CPT CTX entries in case of graceful exit. This patch also adds support for AF -> CPT PF uplink mailbox messages and adds a new mbox message to submit a CPT instruction from AF. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Nithin Dabilpuram authored
Perform CPT LF teardown in the non-FLR path as well, via cpt_lf_free(). Currently the CPT LF teardown and reset sequence is only done when FLR is handled with the CPT LF still attached. This patch also fixes cpt_lf_alloc() to set EXEC_LDWB in CPT_AF_LFX_CTL2 when it is being completely overwritten, as that is the default value and is better for performance. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-