Commits · d28d6d2e37d10d607f931d25c835a0bd94d370e3 · Kirill Smelkov / linux

29 Nov, 2021 26 commits

net: lan966x: add port module support · d28d6d2e

Horatiu Vultur authored Nov 29, 2021

This patch adds support for netdev and phylink in the switch. The
injection + extraction is register based. This will be replaced with DMA
accees.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d28d6d2e

net: lan966x: add the basic lan966x driver · db8bcaad

Horatiu Vultur authored Nov 29, 2021

This patch adds basic SwitchDev driver framework for lan966x. It
includes only the IO range mapping and probing of the switch.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db8bcaad

dt-bindings: net: lan966x: Add lan966x-switch bindings · 642fcf53

Horatiu Vultur authored Nov 29, 2021

Document the lan966x switch device driver bindings
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

642fcf53

net: ixp4xx_hss: Convert to use DT probing · 35aefaad

Linus Walleij authored Nov 22, 2021

IXP4xx is being migrated to device tree only. Convert this
driver to use device tree probing.

Pull in all the boardfile code from the one boardfile and
make it local, pull all the boardfile parameters from the
device tree instead of the board file.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

35aefaad

dt-bindings: net: Add bindings for IXP4xx V.35 WAN HSS · 9c37b09d

Linus Walleij authored Nov 22, 2021

This adds device tree bindings for the IXP4xx V.35 WAN high
speed serial (HSS) link.

An example is added to the NPE example where the HSS appears
as a child.

Cc: devicetree@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c37b09d

net: dsa: rtl8365mb: set RGMII RX delay in steps of 0.3 ns · ef136837

Alvin Šipraga authored Nov 29, 2021

A contact at Realtek has clarified what exactly the units of RGMII RX
delay are. The answer is that the unit of RX delay is "about 0.3 ns".
Take this into account when parsing rx-internal-delay-ps by
approximating the closest step value. Delays of more than 2.1 ns are
rejected.

This obviously contradicts the previous assumption in the driver that a
step value of 4 was "about 2 ns", but Realtek also points out that it is
easy to find more than one RX delay step value which makes RGMII work.

Fixes: 4af2950c ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Cc: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef136837

net: dsa: rtl8365mb: fix garbled comment · 1ecab937

Alvin Šipraga authored Nov 29, 2021

Fixes: 4af2950c ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ecab937

net: dsa: realtek-smi: don't log an error on EPROBE_DEFER · b014861d

Alvin Šipraga authored Nov 29, 2021

Probe deferral is not an error, so don't log this as an error:

[0.590156] realtek-smi ethernet-switch: unable to register switch ret = -517
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

b014861d

selftests: net: bridge: fix typo in vlan_filtering dependency test · 754d71be

Ivan Vecera authored Nov 29, 2021

Prior patch:
]# TESTS=vlmc_filtering_test ./bridge_vlan_mcast.sh
TEST: Vlan multicast snooping enable                                [ OK ]
Device "bridge" does not exist.
TEST: Disable multicast vlan snooping when vlan filtering is disabled   [FAIL]
        Vlan filtering is disabled but multicast vlan snooping is still enabled

After patch:
# TESTS=vlmc_filtering_test ./bridge_vlan_mcast.sh
TEST: Vlan multicast snooping enable                                [ OK ]
TEST: Disable multicast vlan snooping when vlan filtering is disabled   [ OK ]

Fixes: f5a9dd58 ("selftests: net: bridge: add test for vlan_filtering dependency")
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

754d71be

Merge branch 'mpls-cleanups' · fe42e885

David S. Miller authored Nov 29, 2021

Benjamin Poirier says:

====================
net: mpls: Cleanup nexthop iterator macros

The mpls macros for_nexthops and change_nexthops were probably copied
from decnet or ipv4 but they grew a superfluous variable and lost a
beneficial "const".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fe42e885

net: mpls: Make for_nexthops iterator const · f05b0b97

Benjamin Poirier authored Nov 29, 2021

There are separate for_nexthops and change_nexthops iterators. The
for_nexthops variant should use const.
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f05b0b97

net: mpls: Remove duplicate variable from iterator macro · 69d9c0d0

Benjamin Poirier authored Nov 29, 2021

__nh is just a copy of nh with a different type.
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69d9c0d0

Merge branch 'qualcomm-bam-dmux' · 688e0757

David S. Miller authored Nov 29, 2021

Stephan Gerhold says:

====================
net: wwan: Add Qualcomm BAM-DMUX WWAN network driver

The BAM Data Multiplexer provides access to the network data channels
of modems integrated into many older Qualcomm SoCs, e.g. Qualcomm MSM8916
or MSM8974. This series adds a driver that allows using it.

All the changes in this patch series are based on a quite complicated
driver from Qualcomm [1]. The driver has been used in postmarketOS [2]
on various smartphones/tablets based on Qualcomm MSM8916 and MSM8974
for more than a year now with no reported problems. It works out of
the box with open-source WWAN userspace such as ModemManager.

[1]: https://source.codeaurora.org/quic/la/kernel/msm-3.10/tree/drivers/soc/qcom/bam_dmux.c?h=LA.BR.1.2.9.1-02310-8x16.0
[2]: https://postmarketos.org/

Changes in v3:
  - Clarify DT schema based on discussion
  - Drop bam_dma/dmaengine patches since they already landed in 5.16
  - Rebase on net-next
  - Simplify cover letter and commit messages

Changes in v2:
  - Rename "qcom,remote-power-collapse" -> "qcom,powered-remotely"
  - Rebase on net-next and fix conflicts
  - Rename network interfaces from "rmnet%d" -> "wwan%d"
  - Fix wrong file name in MAINTAINERS entry
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

688e0757

net: wwan: Add Qualcomm BAM-DMUX WWAN network driver · 21a0ffd9

Stephan Gerhold authored Nov 27, 2021

The BAM Data Multiplexer provides access to the network data channels of
modems integrated into many older Qualcomm SoCs, e.g. Qualcomm MSM8916 or
MSM8974. It is built using a simple protocol layer on top of a DMA engine
(Qualcomm BAM) and bidirectional interrupts to coordinate power control.

The modem announces a fixed set of channels by sending an OPEN command.
The driver exports each channel as separate network interface so that
a connection can be established via QMI from userspace. The network
interface can work either in Ethernet or Raw-IP mode (configurable via
QMI). However, Ethernet mode seems to be broken with most firmwares
(network packets are actually received as Raw-IP), therefore the driver
only supports Raw-IP mode.

Note that the control channel (QMI/AT) is entirely separate from
BAM-DMUX and is already supported by the RPMSG_WWAN_CTRL driver.

The driver uses runtime PM to coordinate power control with the modem.
TX/RX buffers are put in a kind of "ring queue" and submitted via
the bam_dma driver of the DMAEngine subsystem.

The basic architecture looks roughly like this:

                   +------------+                +-------+
         [IPv4/6]  |  BAM-DMUX  |                |       |
         [Data...] |            |                |       |
        ---------->|wwan0       | [DMUX chan: x] |       |
         [IPv4/6]  | (chan: 0)  | [IPv4/6]       |       |
         [Data...] |            | [Data...]      |       |
        ---------->|wwan1       |--------------->| Modem |
                   | (chan: 1)  |      BAM       |       |
         [IPv4/6]  | ...        |  (DMA Engine)  |       |
         [Data...] |            |                |       |
        ---------->|wwan7       |                |       |
                   | (chan: 7)  |                |       |
                   +------------+                +-------+

Note that some newer firmware versions support QMAP ("rmnet" driver)
as additional multiplexing layer on top of BAM-DMUX, but this is not
currently supported by this driver.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

21a0ffd9

dt-bindings: net: Add schema for Qualcomm BAM-DMUX · f3aee7c9

Stephan Gerhold authored Nov 27, 2021

The BAM Data Multiplexer provides access to the network data channels of
modems integrated into many older Qualcomm SoCs, e.g. Qualcomm MSM8916 or
MSM8974. It is built using a simple protocol layer on top of a DMA engine
(Qualcomm BAM) and bidirectional interrupts to coordinate power control.

The device tree node combines the incoming interrupt with the outgoing
interrupts (smem-states) as well as the two DMA channels, which allows
the BAM-DMUX driver to request all necessary resources.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3aee7c9

Merge branch 'vxlan-port' · fc1e5a36

David S. Miller authored Nov 29, 2021

Guangbin Huang says:

====================
net: vxlan: add macro definition for number of IANA VXLAN-GPE port

This series add macro definition for number of IANA VXLAN-GPE port for
cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fc1e5a36

net: hns3: use macro IANA_VXLAN_GPE_UDP_PORT to replace number 4790 · e54b708c

Hao Chen authored Nov 27, 2021

This patch uses macro IANA_VXLAN_GPE_UDP_PORT to replace number 4790 for
cleanup.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e54b708c

net: vxlan: add macro definition for number of IANA VXLAN-GPE port · ed618bd8

Hao Chen authored Nov 27, 2021

Add macro definition for number of IANA VXLAN-GPE port for generic use.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed618bd8

net: Write lock dev_base_lock without disabling bottom halves. · fd888e85

Sebastian Andrzej Siewior authored Nov 26, 2021

The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.

On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves (as in write_lock_bh()) and somewhere else with
enabled bottom halves (as by read_lock() in netstat_show()) followed by
disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
-> spin_lock_bh()). This is the reverse locking order.

All read_lock() invocation are from sysfs callback which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.

Acquire the write lock of dev_base_lock without disabling bottom halves.
Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd888e85

net/l2tp: convert tunnel rwlock_t to rcu · 07b8ca37

Tom Parkin authored Nov 26, 2021

Previously commit e02d494d ("l2tp: Convert rwlock to RCU") converted
most, but not all, rwlock instances in the l2tp subsystem to RCU.

The remaining rwlock protects the per-tunnel hashlist of sessions which
is used for session lookups in the UDP-encap data path.

Convert the remaining rwlock to rcu to improve performance of UDP-encap
tunnels.

Note that the tunnel and session, which both live on RCU-protected
lists, use slightly different approaches to incrementing their refcounts
in the various getter functions.

The tunnel has to use refcount_inc_not_zero because the tunnel shutdown
process involves dropping the refcount to zero prior to synchronizing
RCU readers (via. kfree_rcu).

By contrast, the session shutdown removes the session from the list(s)
it is on, synchronizes with readers, and then decrements the session
refcount.  Since the getter functions increment the session refcount
with the RCU read lock held we prevent getters seeing a zero session
refcount, and therefore don't need to use refcount_inc_not_zero.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07b8ca37

Merge branch 'mvneta-next' · 275f37ea

David S. Miller authored Nov 29, 2021

Maxime Chevallier says:

====================
net: mvneta: mqprio cleanups and shaping support

This is the second version of the series that adds some improvements to the
existing mqprio implementation in mvneta, and adds support for
egress shaping offload.

The first 3 patches are some minor cleanups, such as using the
tc_mqprio_qopt_offload structure to get access to more offloading
options, cleaning the logic to detect whether or not we should offload
mqprio setting, and allowing to have a 1 to N mapping between TCs and
queues.

The last patch adds traffic shaping offload, using mvneta's per-queue
token buckets, allowing to limit rates from 10Kbps up to 5Gbps with
10Kbps increments.

This was tested only on an Armada 3720, with traffic up to 2.5Gbps.

Changes since V1 fixes the build for 32bits kernels, using the right
div helpers as suggested by Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

275f37ea

net: mvneta: Add TC traffic shaping offload · 2551dc9e

Maxime Chevallier authored Nov 26, 2021

The mvneta controller is able to do some tocken-bucket per-queue traffic
shaping. This commit adds support for setting these using the TC mqprio
interface.

The token-bucket parameters are customisable, but the current
implementation configures them to have a 10kbps resolution for the
rate limitation, since it allows to cover the whole range of max_rate
values from 10kbps to 5Gbps with 10kbps increments.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2551dc9e

net: mvneta: Allow having more than one queue per TC · e9f7099d

Maxime Chevallier authored Nov 26, 2021

The current mqprio implementation assumed that we are only using one
queue per TC. Use the offset and count parameters to allow using
multiple queues per TC. In that case, the controller will use a standard
round-robin algorithm to pick queues assigned to the same TC, with the
same priority.

This only applies to VLAN priorities in ingress traffic, each TC
corresponding to a vlan priority.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9f7099d

net: mvneta: Don't force-set the offloading flag · e7ca75fe

Maxime Chevallier authored Nov 26, 2021

The qopt->hw flag is set by the TC code according to the offloading mode
asked by user. Don't force-set it in the driver, but instead read it to
make sure we do what's asked.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7ca75fe

net: mvneta: Use struct tc_mqprio_qopt_offload for MQPrio configuration · 75fa71e3

Maxime Chevallier authored Nov 26, 2021

The struct tc_mqprio_qopt_offload is a container for struct tc_mqprio_qopt,
that allows passing extra parameters, such as traffic shaping. This commit
converts the current mqprio code to that new struct.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

75fa71e3

net: mdio: ipq8064: replace ioremap() with devm_ioremap() · 2f7ed29f

Yang Yingliang authored Nov 26, 2021

Use devm_ioremap() instead of ioremap() to avoid iounmap() missing.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f7ed29f

27 Nov, 2021 14 commits

Merge branch 'af_unix-replace-unix_table_lock-with-per-hash-locks' · d40ce48c

Jakub Kicinski authored Nov 26, 2021

Kuniyuki Iwashima says:

====================
af_unix: Replace unix_table_lock with per-hash locks.

The hash table of AF_UNIX sockets is protected by a single big lock,
unix_table_lock.  This series replaces it with small per-hash locks.

1st -  2nd : Misc refactoring
3rd -  8th : Separate BSD/abstract address logics
9th - 11th : Prep to save a hash in each socket
12th       : Replace the big lock
13th       : Speed up autobind()

Note to maintainers:
The 12th patch adds two kinds of Sparse warnings on patchwork:

  about unix_table_double_lock/unlock()
    We can avoid this by adding two apparent acquires/releases annotations,
    but there are the same kinds of warnings about unix_state_double_lock().

  about unix_next_socket() and unix_seq_stop() (/proc/net/unix)
    This is because Sparse does not understand logic in unix_next_socket(),
    which leaves a spin lock held until it returns NULL.
    Also, tcp_seq_stop() causes a warning for the same reason.

These warnings seem reasonable, but let me know if there is any better way.
Please see [0] for details.

[0]: https://lore.kernel.org/netdev/20211117001611.74123-1-kuniyu@amazon.co.jp/
====================

Link: https://lore.kernel.org/r/20211124021431.48956-1-kuniyu@amazon.co.jpSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d40ce48c

af_unix: Relax race in unix_autobind(). · 9acbc584

Kuniyuki Iwashima authored Nov 24, 2021

When we bind an AF_UNIX socket without a name specified, the kernel selects
an available one from 0x00000 to 0xFFFFF. unix_autobind() starts searching
from a number in the 'static' variable and increments it after acquiring
two locks.

If multiple processes try autobind, they obtain the same lock and check if
a socket in the hash list has the same name. If not, one process uses it,
and all except one end up retrying the _next_ number (actually not, it may
be incremented by the other processes). The more we autobind sockets in
parallel, the longer the latency gets. We can avoid such a race by
searching for a name from a random number.

These show latency in unix_autobind() while 64 CPUs are simultaneously
autobind-ing 1024 sockets for each.

Without this patch:

usec : count distribution
0 : 1176 |*** |
2 : 3655 |*********** |
4 : 4094 |************* |
6 : 3831 |************ |
8 : 3829 |************ |
10 : 3844 |************ |
12 : 3638 |*********** |
14 : 2992 |********* |
16 : 2485 |******* |
18 : 2230 |******* |
20 : 2095 |****** |
22 : 1853 |***** |
24 : 1827 |***** |
26 : 1677 |***** |
28 : 1473 |**** |
30 : 1573 |***** |
32 : 1417 |**** |
34 : 1385 |**** |
36 : 1345 |**** |
38 : 1344 |**** |
40 : 1200 |*** |

With this patch:

usec : count distribution
0 : 1855 |****** |
2 : 6464 |********************* |
4 : 9936 |******************************** |
6 : 12107 |****************************************|
8 : 10441 |********************************** |
10 : 7264 |*********************** |
12 : 4254 |************** |
14 : 2538 |******** |
16 : 1596 |***** |
18 : 1088 |*** |
20 : 800 |** |
22 : 670 |** |
24 : 601 |* |
26 : 562 |* |
28 : 525 |* |
30 : 446 |* |
32 : 378 |* |
34 : 337 |* |
36 : 317 |* |
38 : 314 |* |
40 : 298 | |
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9acbc584

af_unix: Replace the big lock with small locks. · afd20b92

Kuniyuki Iwashima authored Nov 24, 2021

The hash table of AF_UNIX sockets is protected by the single lock. This
patch replaces it with per-hash locks.

The effect is noticeable when we handle multiple sockets simultaneously.
Here is a test result on an EC2 c5.24xlarge instance. It shows latency
(under 10us only) in unix_insert_unbound_socket() while 64 CPUs creating
1024 sockets for each in parallel.

Without this patch:

nsec : count distribution
0 : 179 | |
500 : 3021 |********* |
1000 : 6271 |******************* |
1500 : 6318 |******************* |
2000 : 5828 |***************** |
2500 : 5124 |*************** |
3000 : 4426 |************* |
3500 : 3672 |*********** |
4000 : 3138 |********* |
4500 : 2811 |******** |
5000 : 2384 |******* |
5500 : 2023 |****** |
6000 : 1954 |***** |
6500 : 1737 |***** |
7000 : 1749 |***** |
7500 : 1520 |**** |
8000 : 1469 |**** |
8500 : 1394 |**** |
9000 : 1232 |*** |
9500 : 1138 |*** |
10000 : 994 |*** |

With this patch:

nsec : count distribution
0 : 1634 |**** |
500 : 13170 |****************************************|
1000 : 13156 |*************************************** |
1500 : 9010 |*************************** |
2000 : 6363 |******************* |
2500 : 4443 |************* |
3000 : 3240 |********* |
3500 : 2549 |******* |
4000 : 1872 |***** |
4500 : 1504 |**** |
5000 : 1247 |*** |
5500 : 1035 |*** |
6000 : 889 |** |
6500 : 744 |** |
7000 : 634 |* |
7500 : 498 |* |
8000 : 433 |* |
8500 : 355 |* |
9000 : 336 |* |
9500 : 284 | |
10000 : 243 | |
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

afd20b92

af_unix: Save hash in sk_hash. · e6b4b873

Kuniyuki Iwashima authored Nov 24, 2021

To replace unix_table_lock with per-hash locks in the next patch, we need
to save a hash in each socket because /proc/net/unix or BPF prog iterate
sockets while holding a hash table lock and release it later in a different
function.

Currently, we store a real/pseudo hash in struct unix_address.  However, we
do not allocate it to unbound sockets, nor should we do just for that.  For
this purpose, we can use sk_hash.  Then, we no longer use the hash field in
struct unix_address and can remove it.

Also, this patch does
  - rename unix_insert_socket() to unix_insert_unbound_socket()
  - remove the redundant list argument from __unix_insert_socket() and
     unix_insert_unbound_socket()
  - use 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash()
  - remove 'inline' from unix_remove_socket() and
     unix_insert_unbound_socket().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e6b4b873

af_unix: Add helpers to calculate hashes. · f452be49

Kuniyuki Iwashima authored Nov 24, 2021

This patch adds three helper functions that calculate hashes for unbound
sockets and bound sockets with BSD/abstract addresses.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f452be49

af_unix: Remove UNIX_ABSTRACT() macro and test sun_path[0] instead. · 5ce7ab49

Kuniyuki Iwashima authored Nov 24, 2021

In BSD and abstract address cases, we store sockets in the hash table with
keys between 0 and UNIX_HASH_SIZE - 1. However, the hash saved in a socket
varies depending on its address type; sockets with BSD addresses always
have UNIX_HASH_SIZE in their unix_sk(sk)->addr->hash.

This is just for the UNIX_ABSTRACT() macro used to check the address type.
The difference of the saved hashes comes from the first byte of the address
in the first place. So, we can test it directly.

Then we can keep a real hash in each socket and replace unix_table_lock
with per-hash locks in the later patch.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5ce7ab49

af_unix: Allocate unix_address in unix_bind_(bsd|abstract)(). · 12f21c49

Kuniyuki Iwashima authored Nov 24, 2021

To terminate address with '\0' in unix_bind_bsd(), we add
unix_create_addr() and call it in unix_bind_bsd() and unix_bind_abstract().

Also, unix_bind_abstract() does not return -EEXIST.  Only
kern_path_create() and vfs_mknod() in unix_bind_bsd() can return it,
so we move the last error check in unix_bind() to unix_bind_bsd().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

12f21c49

af_unix: Remove unix_mkname(). · 5c32a3ed

Kuniyuki Iwashima authored Nov 24, 2021

This patch removes unix_mkname() and postpones calculating a hash to
unix_bind_abstract().  Some BSD stuffs still remain in unix_bind()
though, the next patch packs them into unix_bind_bsd().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5c32a3ed

af_unix: Copy unix_mkname() into unix_find_(bsd|abstract)(). · d2d8c9fd

Kuniyuki Iwashima authored Nov 24, 2021

We should not call unix_mkname() before unix_find_other() and instead do
the same thing where necessary based on the address type:

  - terminating the address with '\0' in unix_find_bsd()
  - calculating the hash in unix_find_abstract().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d2d8c9fd

af_unix: Cut unix_validate_addr() out of unix_mkname(). · b8a58aa6

Kuniyuki Iwashima authored Nov 24, 2021

unix_mkname() tests socket address length and family and does some
processing based on the address type.  It is called in the early stage,
and therefore some instructions are redundant and can end up in vain.

The address length/family tests are done twice in unix_bind().  Also, the
address type is rechecked later in unix_bind() and unix_find_other(), where
we can do the same processing.  Moreover, in the BSD address case, the hash
is set to 0 but never used and confusing.

This patch moves the address tests out of unix_mkname(), and the following
patches move the other part into appropriate places and remove
unix_mkname() finally.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b8a58aa6

af_unix: Return an error as a pointer in unix_find_other(). · aed26f55

Kuniyuki Iwashima authored Nov 24, 2021

We can return an error as a pointer and need not pass an additional
argument to unix_find_other().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

aed26f55

af_unix: Factorise unix_find_other() based on address types. · fa39ef0e

Kuniyuki Iwashima authored Nov 24, 2021

As done in the commit fa42d910 ("unix_bind(): take BSD and abstract
address cases into new helpers"), this patch moves BSD and abstract address
cases from unix_find_other() into unix_find_bsd() and unix_find_abstract().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fa39ef0e

af_unix: Pass struct sock to unix_autobind(). · f7ed31f4

Kuniyuki Iwashima authored Nov 24, 2021

We do not use struct socket in unix_autobind() and pass struct sock to
unix_bind_bsd() and unix_bind_abstract().  Let's pass it to unix_autobind()
as well.

Also, this patch fixes these errors by checkpatch.pl.

  ERROR: do not use assignment in if condition
  #1795: FILE: net/unix/af_unix.c:1795:
  +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr

  CHECK: Logical continuations should be on the previous line
  #1796: FILE: net/unix/af_unix.c:1796:
  +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
  +	    && (err = unix_autobind(sock)) != 0)
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f7ed31f4

af_unix: Use offsetof() instead of sizeof(). · 755662ce

Kuniyuki Iwashima authored Nov 24, 2021

The length of the AF_UNIX socket address contains an offset to the member
sun_path of struct sockaddr_un.

Currently, the preceding member is just sun_family, and its type is
sa_family_t and resolved to short.  Therefore, the offset is represented by
sizeof(short).  However, it is not clear and fragile to changes in struct
sockaddr_storage or sockaddr_un.

This commit makes it clear and robust by rewriting sizeof() with
offsetof().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

755662ce