1. 15 Oct, 2020 1 commit
    • Mathieu Desnoyers's avatar
      ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) · e1e84eb5
      Mathieu Desnoyers authored
      As per RFC792, ICMP errors should be sent to the source host.
      
      However, in configurations with Virtual Routing and Forwarding tables,
      looking up which routing table to use is currently done by using the
      destination net_device.
      
      commit 9d1a6c4e ("net: icmp_route_lookup should use rt dev to
      determine L3 domain") changes the interface passed to
      l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
      to skb_dst(skb_in)->dev. This effectively uses the destination device
      rather than the source device for choosing which routing table should be
      used to lookup where to send the ICMP error.
      
      Therefore, if the source and destination interfaces are within separate
      VRFs, or one in the global routing table and the other in a VRF, looking
      up the source host in the destination interface's routing table will
      fail if the destination interface's routing table contains no route to
      the source host.
      
      One observable effect of this issue is that traceroute does not work in
      the following cases:
      
      - Route leaking between global routing table and VRF
      - Route leaking between VRFs
      
      Preferably use the source device routing table when sending ICMP error
      messages. If no source device is set, fall-back on the destination
      device routing table. Else, use the main routing table (index 0).
      
      [ It has been pointed out that a similar issue may exist with ICMP
        errors triggered when forwarding between network namespaces. It would
        be worthwhile to investigate, but is outside of the scope of this
        investigation. ]
      
      [ It has also been pointed out that a similar issue exists with
        unreachable / fragmentation needed messages, which can be triggered by
        changing the MTU of eth1 in r1 to 1400 and running:
      
        ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
      
        Some investigation points to raw_icmp_error() and raw_err() as being
        involved in this last scenario. The focus of this patch is TTL expired
        ICMP messages, which go through icmp_route_lookup.
        Investigation of failure modes related to raw_icmp_error() is beyond
        this investigation's scope. ]
      
      Fixes: 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
      Link: https://tools.ietf.org/html/rfc792Signed-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e1e84eb5
  2. 14 Oct, 2020 4 commits
  3. 13 Oct, 2020 2 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nf_log: missing vlan offload tag and proto · 0d9826bc
      Pablo Neira Ayuso authored
      Dump vlan tag and proto for the usual vlan offload case if the
      NF_LOG_MACDECODE flag is set on. Without this information the logging is
      misleading as there is no reference to the VLAN header.
      
      [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0
      [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1
      
      Fixes: 83e96d44 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0d9826bc
    • Willem de Bruijn's avatar
      docs: networking: update XPS to account for netif_set_xps_queue · 254941f3
      Willem de Bruijn authored
      With the introduction of netif_set_xps_queue, XPS can be enabled
      by the driver at initialization.
      
      Update the documentation to reflect this, as otherwise users
      may incorrectly believe that the feature is off by default.
      
      Fixes: 537c00de ("net: Add functions netif_reset_xps_queue and netif_set_xps_queue")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      254941f3
  4. 12 Oct, 2020 5 commits
    • Marek Vasut's avatar
      net: fec: Fix phy_device lookup for phy_reset_after_clk_enable() · 64a632da
      Marek Vasut authored
      The phy_reset_after_clk_enable() is always called with ndev->phydev,
      however that pointer may be NULL even though the PHY device instance
      already exists and is sufficient to perform the PHY reset.
      
      This condition happens in fec_open(), where the clock must be enabled
      first, then the PHY must be reset, and then the PHY IDs can be read
      out of the PHY.
      
      If the PHY still is not bound to the MAC, but there is OF PHY node
      and a matching PHY device instance already, use the OF PHY node to
      obtain the PHY device instance, and then use that PHY device instance
      when triggering the PHY reset.
      
      Fixes: 1b0a83ac ("net: fec: add phy_reset_after_clk_enable() support")
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: NXP Linux Team <linux-imx@nxp.com>
      Cc: Richard Leitner <richard.leitner@skidata.com>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      64a632da
    • Jonathan Lemon's avatar
      mlx4: handle non-napi callers to napi_poll · b2b8a927
      Jonathan Lemon authored
      netcons calls napi_poll with a budget of 0 to transmit packets.
      Handle this by:
       - skipping RX processing
       - do not try to recycle TX packets to the RX cache
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b2b8a927
    • Valentin Vidic's avatar
      net: korina: fix kfree of rx/tx descriptor array · 3af5f0f5
      Valentin Vidic authored
      kmalloc returns KSEG0 addresses so convert back from KSEG1
      in kfree. Also make sure array is freed when the driver is
      unloaded from the kernel.
      
      Fixes: ef11291b ("Add support the Korina (IDT RC32434) Ethernet MAC")
      Signed-off-by: default avatarValentin Vidic <vvidic@valentin-vidic.from.hr>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3af5f0f5
    • Christian Eggers's avatar
      net: dsa: microchip: fix race condition · 8098bd69
      Christian Eggers authored
      Between queuing the delayed work and finishing the setup of the dsa
      ports, the process may sleep in request_module() (via
      phy_device_create()) and the queued work may be executed prior to the
      switch net devices being registered. In ksz_mib_read_work(), a NULL
      dereference will happen within netof_carrier_ok(dp->slave).
      
      Not queuing the delayed work in ksz_init_mib_timer() makes things even
      worse because the work will now be queued for immediate execution
      (instead of 2000 ms) in ksz_mac_link_down() via
      dsa_port_link_register_of().
      
      Call tree:
      ksz9477_i2c_probe()
      \--ksz9477_switch_register()
         \--ksz_switch_register()
            +--dsa_register_switch()
            |  \--dsa_switch_probe()
            |     \--dsa_tree_setup()
            |        \--dsa_tree_setup_switches()
            |           +--dsa_switch_setup()
            |           |  +--ksz9477_setup()
            |           |  |  \--ksz_init_mib_timer()
            |           |  |     |--/* Start the timer 2 seconds later. */
            |           |  |     \--schedule_delayed_work(&dev->mib_read, msecs_to_jiffies(2000));
            |           |  \--__mdiobus_register()
            |           |     \--mdiobus_scan()
            |           |        \--get_phy_device()
            |           |           +--get_phy_id()
            |           |           \--phy_device_create()
            |           |              |--/* sleeping, ksz_mib_read_work() can be called meanwhile */
            |           |              \--request_module()
            |           |
            |           \--dsa_port_setup()
            |              +--/* Called for non-CPU ports */
            |              +--dsa_slave_create()
            |              |  +--/* Too late, ksz_mib_read_work() may be called beforehand */
            |              |  \--port->slave = ...
            |             ...
            |              +--Called for CPU port */
            |              \--dsa_port_link_register_of()
            |                 \--ksz_mac_link_down()
            |                    +--/* mib_read must be initialized here */
            |                    +--/* work is already scheduled, so it will be executed after 2000 ms */
            |                    \--schedule_delayed_work(&dev->mib_read, 0);
            \-- /* here port->slave is setup properly, scheduling the delayed work should be safe */
      
      Solution:
      1. Do not queue (only initialize) delayed work in ksz_init_mib_timer().
      2. Only queue delayed work in ksz_mac_link_down() if init is completed.
      3. Queue work once in ksz_switch_register(), after dsa_register_switch()
      has completed.
      
      Fixes: 7c6ff470 ("net: dsa: microchip: add MIB counter reading support")
      Signed-off-by: default avatarChristian Eggers <ceggers@arri.de>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8098bd69
    • Pablo Neira Ayuso's avatar
      netfilter: nftables: extend error reporting for chain updates · 98a381a7
      Pablo Neira Ayuso authored
      The initial support for netlink extended ACK is missing the chain update
      path, which results in misleading error reporting in case of EEXIST.
      
      Fixes 36dd1bcc ("netfilter: nf_tables: initial support for extended ACK reporting")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      98a381a7
  5. 11 Oct, 2020 2 commits
  6. 10 Oct, 2020 9 commits
    • David Ahern's avatar
      ipv4: Restore flowi4_oif update before call to xfrm_lookup_route · 874fb9e2
      David Ahern authored
      Tobias reported regressions in IPsec tests following the patch
      referenced by the Fixes tag below. The root cause is dropping the
      reset of the flowi4_oif after the fib_lookup. Apparently it is
      needed for xfrm cases, so restore the oif update to ip_route_output_flow
      right before the call to xfrm_lookup_route.
      
      Fixes: 2fbc6e89 ("ipv4: Update exception handling for multipath routes via same device")
      Reported-by: default avatarTobias Brunner <tobias@strongswan.org>
      Signed-off-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      874fb9e2
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-some-fallback-fixes' · 49fb2f33
      Jakub Kicinski authored
      Paolo Abeni says:
      
      ====================
      mptcp: some fallback fixes
      
      pktdrill pointed-out we currently don't handle properly some
      fallback scenario for MP_JOIN subflows
      
      The first patch addresses such issue.
      
      Patch 2/2 fixes a related pre-existing issue that is more
      evident after 1/2: we could keep using for MPTCP signaling
      closed subflows.
      ====================
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      49fb2f33
    • Paolo Abeni's avatar
      mptcp: subflows garbage collection · 0e4f35d7
      Paolo Abeni authored
      The msk can close MP_JOIN subflows if the initial handshake
      fails. Currently such subflows are kept alive in the
      conn_list until the msk itself is closed.
      
      Beyond the wasted memory, we could end-up sending the
      DATA_FIN and the DATA_FIN ack on such socket, even after a
      reset.
      
      Fixes: 43b54c6e ("mptcp: Use full MPTCP-level disconnect state machine")
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0e4f35d7
    • Paolo Abeni's avatar
      mptcp: fix fallback for MP_JOIN subflows · d5824847
      Paolo Abeni authored
      Additional/MP_JOIN subflows that do not pass some initial handshake
      tests currently causes fallback to TCP. That is an RFC violation:
      we should instead reset the subflow and leave the the msk untouched.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/91
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d5824847
    • Pujin Shi's avatar
      net: smc: fix missing brace warning for old compilers · 16cb3653
      Pujin Shi authored
      For older versions of gcc, the array = {0}; will cause warnings:
      
      net/smc/smc_llc.c: In function 'smc_llc_add_link_local':
      net/smc/smc_llc.c:1212:9: warning: missing braces around initializer [-Wmissing-braces]
        struct smc_llc_msg_add_link add_llc = {0};
               ^
      net/smc/smc_llc.c:1212:9: warning: (near initialization for 'add_llc.hd') [-Wmissing-braces]
      net/smc/smc_llc.c: In function 'smc_llc_srv_delete_link_local':
      net/smc/smc_llc.c:1245:9: warning: missing braces around initializer [-Wmissing-braces]
        struct smc_llc_msg_del_link del_llc = {0};
               ^
      net/smc/smc_llc.c:1245:9: warning: (near initialization for 'del_llc.hd') [-Wmissing-braces]
      
      2 warnings generated
      
      Fixes: 4dadd151 ("net/smc: enqueue local LLC messages")
      Signed-off-by: default avatarPujin Shi <shipujin.t@gmail.com>
      Acked-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      16cb3653
    • Pujin Shi's avatar
      net: smc: fix missing brace warning for old compilers · 7e94e46c
      Pujin Shi authored
      For older versions of gcc, the array = {0}; will cause warnings:
      
      net/smc/smc_llc.c: In function 'smc_llc_send_link_delete_all':
      net/smc/smc_llc.c:1317:9: warning: missing braces around initializer [-Wmissing-braces]
        struct smc_llc_msg_del_link delllc = {0};
               ^
      net/smc/smc_llc.c:1317:9: warning: (near initialization for 'delllc.hd') [-Wmissing-braces]
      
      1 warnings generated
      
      Fixes: f3811fd7 ("net/smc: send DELETE_LINK, ALL message and wait for send to complete")
      Signed-off-by: default avatarPujin Shi <shipujin.t@gmail.com>
      Acked-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7e94e46c
    • Jakub Kicinski's avatar
      Merge tag 'linux-can-fixes-for-5.9-20201008' of... · b54fa649
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-5.9-20201008' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      ====================
      linux-can-fixes-for-5.9-20201008
      
      The first patch is by Lucas Stach and fixes m_can driver by removing an
      erroneous call to m_can_class_suspend() in runtime suspend. Which causes the
      pinctrl state to get stuck on the "sleep" state, which breaks all CAN
      functionality on SoCs where this state is defined.
      
      The last two patches target the j1939 protocol: Cong Wang fixes a syzbot
      finding of an uninitialized variable in the j1939 transport protocol. I
      contribute a patch, that fixes the initialization of a same uninitialized
      variable in a different function.
      ====================
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b54fa649
    • Hoang Huu Le's avatar
      tipc: fix NULL pointer dereference in tipc_named_rcv · 7b50ee3d
      Hoang Huu Le authored
      In the function node_lost_contact(), we call __skb_queue_purge() without
      grabbing the list->lock. This can cause to a race-condition why processing
      the list 'namedq' in calling path tipc_named_rcv()->tipc_named_dequeue().
      
          [] BUG: kernel NULL pointer dereference, address: 0000000000000000
          [] #PF: supervisor read access in kernel mode
          [] #PF: error_code(0x0000) - not-present page
          [] PGD 7ca63067 P4D 7ca63067 PUD 6c553067 PMD 0
          [] Oops: 0000 [#1] SMP NOPTI
          [] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G  O  5.9.0-rc6+ #2
          [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [...]
          [] RIP: 0010:tipc_named_rcv+0x103/0x320 [tipc]
          [] Code: 41 89 44 24 10 49 8b 16 49 8b 46 08 49 c7 06 00 00 00 [...]
          [] RSP: 0018:ffffc900000a7c58 EFLAGS: 00000282
          [] RAX: 00000000000012ec RBX: 0000000000000000 RCX: ffff88807bde1270
          [] RDX: 0000000000002c7c RSI: 0000000000002c7c RDI: ffff88807b38f1a8
          [] RBP: ffff88807b006288 R08: ffff88806a367800 R09: ffff88806a367900
          [] R10: ffff88806a367a00 R11: ffff88806a367b00 R12: ffff88807b006258
          [] R13: ffff88807b00628a R14: ffff888069334d00 R15: ffff88806a434600
          [] FS:  0000000000000000(0000) GS:ffff888079480000(0000) knlGS:0[...]
          [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [] CR2: 0000000000000000 CR3: 0000000077320000 CR4: 00000000000006e0
          [] Call Trace:
          []  ? tipc_bcast_rcv+0x9a/0x1a0 [tipc]
          []  tipc_rcv+0x40d/0x670 [tipc]
          []  ? _raw_spin_unlock+0xa/0x20
          []  tipc_l2_rcv_msg+0x55/0x80 [tipc]
          []  __netif_receive_skb_one_core+0x8c/0xa0
          []  process_backlog+0x98/0x140
          []  net_rx_action+0x13a/0x420
          []  __do_softirq+0xdb/0x316
          []  ? smpboot_thread_fn+0x2f/0x1e0
          []  ? smpboot_thread_fn+0x74/0x1e0
          []  ? smpboot_thread_fn+0x14e/0x1e0
          []  run_ksoftirqd+0x1a/0x40
          []  smpboot_thread_fn+0x149/0x1e0
          []  ? sort_range+0x20/0x20
          []  kthread+0x131/0x150
          []  ? kthread_unuse_mm+0xa0/0xa0
          []  ret_from_fork+0x22/0x30
          [] Modules linked in: veth tipc(O) ip6_udp_tunnel udp_tunnel [...]
          [] CR2: 0000000000000000
          [] ---[ end trace 65c276a8e2e2f310 ]---
      
      To fix this, we need to grab the lock of the 'namedq' list on both
      path calling.
      
      Fixes: cad2929d ("tipc: update a binding service via broadcast")
      Acked-by: default avatarJon Maloy <jmaloy@redhat.com>
      Signed-off-by: default avatarHoang Huu Le <hoang.h.le@dektech.com.au>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7b50ee3d
    • Cong Wang's avatar
      tipc: fix the skb_unshare() in tipc_buf_append() · ed42989e
      Cong Wang authored
      skb_unshare() drops a reference count on the old skb unconditionally,
      so in the failure case, we end up freeing the skb twice here.
      And because the skb is allocated in fclone and cloned by caller
      tipc_msg_reassemble(), the consequence is actually freeing the
      original skb too, thus triggered the UAF by syzbot.
      
      Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy().
      
      Fixes: ff48b622 ("tipc: use skb_unshare() instead in tipc_buf_append()")
      Reported-and-tested-by: syzbot+e96a7ba46281824cc46a@syzkaller.appspotmail.com
      Cc: Jon Maloy <jmaloy@redhat.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ed42989e
  7. 09 Oct, 2020 8 commits
  8. 08 Oct, 2020 9 commits