1. 28 Feb, 2018 11 commits
    • David S. Miller's avatar
      Merge branch 'tcp-revert-a-F-RTO-extension-due-to-broken-middle-boxes' · 55e84dd7
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      tcp: revert a F-RTO extension due to broken middle-boxes
      
      This patch series reverts a (non-standard) TCP F-RTO extension that aimed
      to detect more spurious timeouts. Unfortunately it could result in poor
      performance due to broken middle-boxes that modify TCP packets. E.g.
      https://www.spinics.net/lists/netdev/msg484154.html
      We believe the best and simplest solution is to just revert the change.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      55e84dd7
    • Yuchung Cheng's avatar
      tcp: revert F-RTO extension to detect more spurious timeouts · fc68e171
      Yuchung Cheng authored
      This reverts commit 89fe18e4.
      
      While the patch could detect more spurious timeouts, it could cause
      poor TCP performance on broken middle-boxes that modifies TCP packets
      (e.g. receive window, SACK options). Since the performance gain is
      much smaller compared to the potential loss. The best solution is
      to fully revert the change.
      
      Fixes: 89fe18e4 ("tcp: extend F-RTO to catch more spurious timeouts")
      Reported-by: default avatarTeodor Milkov <tm@del.bg>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fc68e171
    • Yuchung Cheng's avatar
      tcp: revert F-RTO middle-box workaround · d4131f09
      Yuchung Cheng authored
      This reverts commit cc663f4d. While fixing
      some broken middle-boxes that modifies receive window fields, it does not
      address middle-boxes that strip off SACK options. The best solution is
      to fully revert this patch and the root F-RTO enhancement.
      
      Fixes: cc663f4d ("tcp: restrict F-RTO to work-around broken middle-boxes")
      Reported-by: default avatarTeodor Milkov <tm@del.bg>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4131f09
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · c8431622
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-02-27
      
      please apply some more qeth patches for -net and stable.
      
      One patch fixes a performance bug in the TSO path. Then there's several
      more fixes for IP management on L3 devices - including a revert, so that
      the subsequent fix cleanly applies to earlier kernels.
      The final patch takes care of a race in the control IO code that causes
      qeth to miss the cmd response, and subsequently trigger device recovery.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8431622
    • Julian Wiedmann's avatar
      s390/qeth: fix IPA command submission race · d22ffb5a
      Julian Wiedmann authored
      If multiple IPA commands are build & sent out concurrently,
      fill_ipacmd_header() may assign a seqno value to a command that's
      different from what send_control_data() later assigns to this command's
      reply.
      This is due to other commands passing through send_control_data(),
      and incrementing card->seqno.ipa along the way.
      
      So one IPA command has no reply that's waiting for its seqno, while some
      other IPA command has multiple reply objects waiting for it.
      Only one of those waiting replies wins, and the other(s) times out and
      triggers a recovery via send_ipa_cmd().
      
      Fix this by making sure that the same seqno value is assigned to
      a command and its reply object.
      Do so immediately before submitting the command & while holding the
      irq_pending "lock", to produce nicely ascending seqnos.
      
      As a side effect, *all* IPA commands now use a reply object that's
      waiting for its actual seqno. Previously, early IPA commands that were
      submitted while the card was still DOWN used the "catch-all" IDX seqno.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d22ffb5a
    • Julian Wiedmann's avatar
      s390/qeth: fix IP address lookup for L3 devices · c5c48c58
      Julian Wiedmann authored
      Current code ("qeth_l3_ip_from_hash()") matches a queried address object
      against objects in the IP table by IP address, Mask/Prefix Length and
      MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually
      require is either
      a) "is this IP address registered" (ie. match by IP address only),
      before adding a new address.
      b) or "is this address object registered" (ie. match all relevant
         attributes), before deleting an address.
      
      Right now
      1. the ADD path is too strict in its lookup, and eg. doesn't detect
      conflicts between an existing NORMAL address and a new VIPA address
      (because the NORMAL address will have mask != 0, while VIPA has
      a mask == 0),
      2. the DELETE path is not strict enough, and eg. allows del_rxip() to
      delete a VIPA address as long as the IP address matches.
      
      Fix all this by adding helpers (_addr_match_ip() and _addr_match_all())
      that do the appropriate checking.
      
      Note that the ADD path for NORMAL addresses is special, as qeth keeps
      track of how many times such an address is in use (and there is no
      immediate way of returning errors to the caller). So when a requested
      NORMAL address _fully_ matches an existing one, it's not considered a
      conflict and we merely increment the refcount.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5c48c58
    • Julian Wiedmann's avatar
      Revert "s390/qeth: fix using of ref counter for rxip addresses" · 4964c66f
      Julian Wiedmann authored
      This reverts commit cb816192.
      
      The issue this attempted to fix never actually occurs.
      l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address
      was previously added to the card. If so, it returns -EEXIST and doesn't
      call l3_add_ip().
      As a result, the "address exists" path in l3_add_ip() is never taken
      for rxip addresses, and this patch had no effect.
      
      Fixes: cb816192 ("s390/qeth: fix using of ref counter for rxip addresses")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4964c66f
    • Julian Wiedmann's avatar
      s390/qeth: fix double-free on IP add/remove race · 14d066c3
      Julian Wiedmann authored
      Registering an IPv4 address with the HW takes quite a while, so we
      temporarily drop the ip_htable lock. Any concurrent add/remove of the
      same IP adjusts the IP's use count, and (on remove) is then blocked by
      addr->in_progress.
      After the register call has completed, we check the use count for
      concurrently attempted add/remove calls - and possibly straight-away
      deregister the IP again. This happens via l3_delete_ip(), which
      1) looks up the queried IP in the htable (getting a reference to the
         *same* queried object),
      2) deregisters the IP from the HW, and
      3) frees the IP object.
      
      The caller in l3_add_ip() then does a second free on the same object.
      
      For this case, skip all the extra checks and lookups in l3_delete_ip()
      and just deregister & free the IP object ourselves.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14d066c3
    • Julian Wiedmann's avatar
      s390/qeth: fix IP removal on offline cards · 98d823ab
      Julian Wiedmann authored
      If the HW is not reachable, then none of the IPs in qeth's internal
      table has been registered with the HW yet. So when deleting such an IP,
      there's no need to stage it for deregistration - just drop it from
      the table.
      
      This fixes the "add-delete-add" scenario on an offline card, where the
      the second "add" merely increments the IP's use count. But as the IP is
      still set to DISP_ADDR_DELETE from the previous "delete" step,
      l3_recover_ip() won't register it with the HW when the card goes online.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98d823ab
    • Julian Wiedmann's avatar
      s390/qeth: fix overestimated count of buffer elements · 12472af8
      Julian Wiedmann authored
      qeth_get_elements_for_range() doesn't know how to handle a 0-length
      range (ie. start == end), and returns 1 when it should return 0.
      Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all
      of the skb's linear data) are skipped when mapping the skb into regular
      buffer elements.
      
      This overestimation may cause several performance-related issues:
      1. sub-optimal IO buffer selection, where the next buffer gets selected
         even though the skb would actually still fit into the current buffer.
      2. forced linearization, if the element count for a non-linear skb
         exceeds QETH_MAX_BUFFER_ELEMENTS.
      
      Rather than modifying qeth_get_elements_for_range() and adding overhead
      to every caller, fix up those callers that are in risk of passing a
      0-length range.
      
      Fixes: 2863c613 ("qeth: refactor calculation of SBALE count")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      12472af8
    • Claudiu Manoil's avatar
      gianfar: Fix Rx byte accounting for ndev stats · 590399dd
      Claudiu Manoil authored
      Don't include in the Rx bytecount of the packet sent up the stack:
      the FCB (frame control block), and the padding bytes inserted by
      the controller into the frame payload, nor the FCS. All these are
      being pulled out of the skb by gfar_process_frame().
      This issue is old, likely from the driver's beginnings, however
      it was amplified by recent:
      commit d903ec77 ("gianfar: simplify FCS handling and fix memory leak")
      which basically added the FCS to the Rx bytecount, and so brought
      this to my attention.
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      590399dd
  2. 27 Feb, 2018 14 commits
  3. 26 Feb, 2018 15 commits
    • Thomas Winter's avatar
      ip_tunnel: Do not use mark in skb by default · 4e994776
      Thomas Winter authored
      This reverts commit 5c38bd1b.
      
      skb->mark contains the mark the encapsulated traffic which
      can result in incorrect routing decisions being made such
      as routing loops if the route chosen is via tunnel itself.
      The correct method should be to use tunnel->fwmark.
      Signed-off-by: default avatarThomas Winter <thomas.winter@alliedtelesis.co.nz>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e994776
    • Ido Schimmel's avatar
      bridge: Fix VLAN reference count problem · 0e5a82ef
      Ido Schimmel authored
      When a VLAN is added on a port, a reference is taken on the
      corresponding master VLAN entry. If it does not already exist, then it
      is created and a reference taken.
      
      However, in the second case a reference is not really taken when
      CONFIG_REFCOUNT_FULL is enabled as refcount_inc() is replaced by
      refcount_inc_not_zero().
      
      Fix this by using refcount_set() on a newly created master VLAN entry.
      
      Fixes: 25127759 ("net, bridge: convert net_bridge_vlan.refcnt from atomic_t to refcount_t")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0e5a82ef
    • Sergei Shtylyov's avatar
      DT: net: renesas,ravb: document R8A77980 bindings · 3a291aa1
      Sergei Shtylyov authored
      Renesas R-Car V3H (R8A77980) SoC has the R-Car gen3 compatible EtherAVB
      device, so document the SoC specific bindings.
      Signed-off-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: default avatarSimon Horman <horms+renesas@verge.net.au>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a291aa1
    • Ramon Fried's avatar
      qrtr: add MODULE_ALIAS macro to smd · c77f5fbb
      Ramon Fried authored
      Added MODULE_ALIAS("rpmsg:IPCRTR") to ensure qrtr-smd and qrtr will load
      when IPCRTR channel is detected.
      Signed-off-by: default avatarRamon Fried <rfried@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c77f5fbb
    • Denis Du's avatar
      hdlc_ppp: carrier detect ok, don't turn off negotiation · b6c3bad1
      Denis Du authored
      Sometimes when physical lines have a just good noise to make the protocol
      handshaking fail, but the carrier detect still good. Then after remove of
      the noise, nobody will trigger this protocol to be start again to cause
      the link to never come back. The fix is when the carrier is still on, not
      terminate the protocol handshaking.
      Signed-off-by: default avatarDenis Du <dudenis2000@yahoo.ca>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6c3bad1
    • Jason Wang's avatar
      tuntap: correctly add the missing XDP flush · 1bb4f2e8
      Jason Wang authored
      We don't flush batched XDP packets through xdp_do_flush_map(), this
      will cause packets stall at TX queue. Consider we don't do XDP on NAPI
      poll(), the only possible fix is to call xdp_do_flush_map()
      immediately after xdp_do_redirect().
      
      Note, this in fact won't try to batch packets through devmap, we could
      address in the future.
      Reported-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1bb4f2e8
    • Jason Wang's avatar
      tuntap: disable preemption during XDP processing · 23e43f07
      Jason Wang authored
      Except for tuntap, all other drivers' XDP was implemented at NAPI
      poll() routine in a bh. This guarantees all XDP operation were done at
      the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
      for tuntap, we do it in process context and we try to protect XDP
      processing by RCU reader lock. This is insufficient since
      CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
      breaks the assumption that all XDP were processed in the same CPU.
      
      Fixing this by simply disabling preemption during XDP processing.
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      23e43f07
    • Jason Wang's avatar
      Revert "tuntap: add missing xdp flush" · f249be4d
      Jason Wang authored
      This reverts commit 762c330d. The
      reason is we try to batch packets for devmap which causes calling
      xdp_do_flush() in the process context. Simply disabling preemption
      may not work since process may move among processors which lead
      xdp_do_flush() to miss some flushes on some processors.
      
      So simply revert the patch, a follow-up patch will add the xdp flush
      correctly.
      Reported-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Fixes: 762c330d ("tuntap: add missing xdp flush")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f249be4d
    • Emil Tantilov's avatar
      ixgbe: fix crash in build_skb Rx code path · 0c5661ec
      Emil Tantilov authored
      Add check for build_skb enabled ring in ixgbe_dma_sync_frag().
      In that case &skb_shinfo(skb)->frags[0] may not always be set which
      can lead to a crash. Instead we derive the page offset from skb->data.
      
      Fixes: 42073d91
      ("ixgbe: Have the CPU take ownership of the buffers sooner")
      CC: stable <stable@vger.kernel.org>
      Reported-by: default avatarAmbarish Soman <asoman@redhat.com>
      Suggested-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarEmil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c5661ec
    • David S. Miller's avatar
      ARM: orion5x: Revert commit 4904dbda. · 13a55372
      David S. Miller authored
      It is not valid for orion5x to use mac_pton().
      
      First of all, the orion5x buffer is not NULL terminated.  mac_pton()
      has no business operating on non-NULL terminated buffers because
      only the caller can know that this is valid and in what manner it
      is ok to parse this NULL'less buffer.
      
      Second of all, orion5x operates on an __iomem pointer, which cannot
      be dereferenced using normal C pointer operations.  Accesses to
      such areas much be performed with the proper iomem accessors.
      
      Fixes: 4904dbda ("ARM: orion5x: use mac_pton() helper")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      13a55372
    • David S. Miller's avatar
      Merge branch 'l2tp-fix-API-races-discovered-by-syzbot' · 44e524cf
      David S. Miller authored
      James Chapman says:
      
      ====================
      l2tp: fix API races discovered by syzbot
      
      This patch series addresses several races with L2TP APIs discovered by
      syzbot. There are no functional changes.
      
      The set of patches 1-5 in combination fix the following syzbot reports.
      
      19c09769f WARNING in debug_print_object
      347bd5acd KASAN: use-after-free Read in inet_shutdown
      6e6a5ec8d general protection fault in pppol2tp_connect
      9df43faf0 KASAN: use-after-free Read in pppol2tp_connect
      
      My first attempts to fix these issues were as net-next patches but
      the series included other refactoring and cleanup work. I was asked to
      separate out the bugfixes and redo for the net tree, which is what
      these patches are.
      
      The changes are:
      
       1. Fix inet_shutdown races when L2TP tunnels and sessions close. (patches 1-2)
       2. Fix races with tunnel and its socket. (patch 3)
       3. Fix race in pppol2tp_release with session and its socket. (patch 4)
       4. Fix tunnel lookup use-after-free. (patch 5)
      
      All of the syzbot reproducers hit races in the tunnel and pppol2tp
      session create and destroy paths. These tests create and destroy
      pppol2tp tunnels and sessions rapidly using multiple threads,
      provoking races in several tunnel/session create/destroy paths. The
      key problem was that each tunnel/session socket could be destroyed
      while its associated tunnel/session object still existed (patches 3,
      4). Patch 5 addresses a problem with the way tunnels are removed from
      the tunnel list. Patch 5 is tagged that it addresses all four syzbot
      issues, though all 5 patches are needed.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44e524cf
    • James Chapman's avatar
      l2tp: fix tunnel lookup use-after-free race · 28f5bfb8
      James Chapman authored
      l2tp_tunnel_get walks the tunnel list to find a matching tunnel
      instance and if a match is found, its refcount is increased before
      returning the tunnel pointer. But when tunnel objects are destroyed,
      they are on the tunnel list after their refcount hits zero. Fix this
      by moving the code that removes the tunnel from the tunnel list from
      the tunnel socket destructor into in the l2tp_tunnel_delete path,
      before the tunnel refcount is decremented.
      
      refcount_t: increment on 0; use-after-free.
      WARNING: CPU: 3 PID: 13507 at lib/refcount.c:153 refcount_inc+0x47/0x50
      Modules linked in:
      CPU: 3 PID: 13507 Comm: syzbot_6e6a5ec8 Not tainted 4.16.0-rc2+ #36
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      RIP: 0010:refcount_inc+0x47/0x50
      RSP: 0018:ffff8800136ffb20 EFLAGS: 00010286
      RAX: dffffc0000000008 RBX: ffff880017068e68 RCX: ffffffff814d3333
      RDX: 0000000000000000 RSI: ffff88001a59f6d8 RDI: ffff88001a59f6d8
      RBP: ffff8800136ffb28 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff8800136ffab0 R11: 0000000000000000 R12: ffff880017068e50
      R13: 0000000000000000 R14: ffff8800174da800 R15: 0000000000000004
      FS:  00007f403ab1e700(0000) GS:ffff88001a580000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000205fafd2 CR3: 0000000016770000 CR4: 00000000000006e0
      Call Trace:
       l2tp_tunnel_get+0x2dd/0x4e0
       pppol2tp_connect+0x428/0x13c0
       ? pppol2tp_session_create+0x170/0x170
       ? __might_fault+0x115/0x1d0
       ? lock_downgrade+0x860/0x860
       ? __might_fault+0xe5/0x1d0
       ? security_socket_connect+0x8e/0xc0
       SYSC_connect+0x1b6/0x310
       ? SYSC_bind+0x280/0x280
       ? __do_page_fault+0x5d1/0xca0
       ? up_read+0x1f/0x40
       ? __do_page_fault+0x3c8/0xca0
       SyS_connect+0x29/0x30
       ? SyS_accept+0x40/0x40
       do_syscall_64+0x1e0/0x730
       ? trace_hardirqs_off_thunk+0x1a/0x1c
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x7f403a42f259
      RSP: 002b:00007f403ab1dee8 EFLAGS: 00000296 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 00000000205fafe4 RCX: 00007f403a42f259
      RDX: 000000000000002e RSI: 00000000205fafd2 RDI: 0000000000000004
      RBP: 00007f403ab1df20 R08: 00007f403ab1e700 R09: 0000000000000000
      R10: 00007f403ab1e700 R11: 0000000000000296 R12: 0000000000000000
      R13: 00007ffc81906cbf R14: 0000000000000000 R15: 00007f403ab2b040
      Code: 3b ff 5b 5d c3 e8 ca 5f 3b ff 80 3d 49 8e 66 04 00 75 ea e8 bc 5f 3b ff 48 c7 c7 60 69 64 85 c6 05 34 8e 66 04 01 e8 59 49 15 ff <0f> 0b eb ce 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49
      
      Fixes: f8ccac0e ("l2tp: put tunnel socket release on a workqueue")
      Reported-and-tested-by: syzbot+19c09769f14b48810113@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+347bd5acde002e353a36@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+6e6a5ec8de31a94cd015@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+9df43faf09bd400f2993@syzkaller.appspotmail.com
      Signed-off-by: default avatarJames Chapman <jchapman@katalix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28f5bfb8
    • James Chapman's avatar
      l2tp: fix race in pppol2tp_release with session object destroy · d02ba2a6
      James Chapman authored
      pppol2tp_release uses call_rcu to put the final ref on its socket. But
      the session object doesn't hold a ref on the session socket so may be
      freed while the pppol2tp_put_sk RCU callback is scheduled. Fix this by
      having the session hold a ref on its socket until the session is
      destroyed. It is this ref that is dropped via call_rcu.
      
      Sessions are also deleted via l2tp_tunnel_closeall. This must now also put
      the final ref via call_rcu. So move the call_rcu call site into
      pppol2tp_session_close so that this happens in both destroy paths. A
      common destroy path should really be implemented, perhaps with
      l2tp_tunnel_closeall calling l2tp_session_delete like pppol2tp_release
      does, but this will be looked at later.
      
      ODEBUG: activate active (active state 1) object type: rcu_head hint:           (null)
      WARNING: CPU: 3 PID: 13407 at lib/debugobjects.c:291 debug_print_object+0x166/0x220
      Modules linked in:
      CPU: 3 PID: 13407 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      RIP: 0010:debug_print_object+0x166/0x220
      RSP: 0018:ffff880013647a00 EFLAGS: 00010082
      RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff814d3333
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88001a59f6d0
      RBP: ffff880013647a40 R08: 0000000000000000 R09: 0000000000000001
      R10: ffff8800136479a8 R11: 0000000000000000 R12: 0000000000000001
      R13: ffffffff86161420 R14: ffffffff85648b60 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88001a580000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020e77000 CR3: 0000000006022000 CR4: 00000000000006e0
      Call Trace:
       debug_object_activate+0x38b/0x530
       ? debug_object_assert_init+0x3b0/0x3b0
       ? __mutex_unlock_slowpath+0x85/0x8b0
       ? pppol2tp_session_destruct+0x110/0x110
       __call_rcu.constprop.66+0x39/0x890
       ? __call_rcu.constprop.66+0x39/0x890
       call_rcu_sched+0x17/0x20
       pppol2tp_release+0x2c7/0x440
       ? fcntl_setlk+0xca0/0xca0
       ? sock_alloc_file+0x340/0x340
       sock_release+0x92/0x1e0
       sock_close+0x1b/0x20
       __fput+0x296/0x6e0
       ____fput+0x1a/0x20
       task_work_run+0x127/0x1a0
       do_exit+0x7f9/0x2ce0
       ? SYSC_connect+0x212/0x310
       ? mm_update_next_owner+0x690/0x690
       ? up_read+0x1f/0x40
       ? __do_page_fault+0x3c8/0xca0
       do_group_exit+0x10d/0x330
       ? do_group_exit+0x330/0x330
       SyS_exit_group+0x22/0x30
       do_syscall_64+0x1e0/0x730
       ? trace_hardirqs_off_thunk+0x1a/0x1c
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x7f362e471259
      RSP: 002b:00007ffe389abe08 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f362e471259
      RDX: 00007f362e471259 RSI: 000000000000002e RDI: 0000000000000000
      RBP: 00007ffe389abe30 R08: 0000000000000000 R09: 00007f362e944270
      R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
      R13: 00007ffe389abf50 R14: 0000000000000000 R15: 0000000000000000
      Code: 8d 3c dd a0 8f 64 85 48 89 fa 48 c1 ea 03 80 3c 02 00 75 7b 48 8b 14 dd a0 8f 64 85 4c 89 f6 48 c7 c7 20 85 64 85 e
      8 2a 55 14 ff <0f> 0b 83 05 ad 2a 68 04 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41
      
      Fixes: ee40fb2e ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
      Signed-off-by: default avatarJames Chapman <jchapman@katalix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d02ba2a6
    • James Chapman's avatar
      l2tp: fix races with tunnel socket close · d00fa9ad
      James Chapman authored
      The tunnel socket tunnel->sock (struct sock) is accessed when
      preparing a new ppp session on a tunnel at pppol2tp_session_init. If
      the socket is closed by a thread while another is creating a new
      session, the threads race. In pppol2tp_connect, the tunnel object may
      be created if the pppol2tp socket is associated with the special
      session_id 0 and the tunnel socket is looked up using the provided
      fd. When handling this, pppol2tp_connect cannot sock_hold the tunnel
      socket to prevent it being destroyed during pppol2tp_connect since
      this may itself may race with the socket being destroyed. Doing
      sockfd_lookup in pppol2tp_connect isn't sufficient to prevent
      tunnel->sock going away either because a given tunnel socket fd may be
      reused between calls to pppol2tp_connect. Instead, have
      l2tp_tunnel_create sock_hold the tunnel socket before it does
      sockfd_put. This ensures that the tunnel's socket is always extant
      while the tunnel object exists. Hold a ref on the socket until the
      tunnel is destroyed and ensure that all tunnel destroy paths go
      through a common function (l2tp_tunnel_delete) since this will do the
      final sock_put to release the tunnel socket.
      
      Since the tunnel's socket is now guaranteed to exist if the tunnel
      exists, we no longer need to use sockfd_lookup via l2tp_sock_to_tunnel
      to derive the tunnel from the socket since this is always
      sk_user_data.
      
      Also, sessions no longer sock_hold the tunnel socket since sessions
      already hold a tunnel ref and the tunnel sock will not be freed until
      the tunnel is freed. Removing these sock_holds in
      l2tp_session_register avoids a possible sock leak in the
      pppol2tp_connect error path if l2tp_session_register succeeds but
      attaching a ppp channel fails. The pppol2tp_connect error path could
      have been fixed instead and have the sock ref dropped when the session
      is freed, but doing a sock_put of the tunnel socket when the session
      is freed would require a new session_free callback. It is simpler to
      just remove the sock_hold of the tunnel socket in
      l2tp_session_register, now that the tunnel socket lifetime is
      guaranteed.
      
      Finally, some init code in l2tp_tunnel_create is reordered to ensure
      that the new tunnel object's refcount is set and the tunnel socket ref
      is taken before the tunnel socket destructor callbacks are set.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 4360 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #34
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      RIP: 0010:pppol2tp_session_init+0x1d6/0x500
      RSP: 0018:ffff88001377fb40 EFLAGS: 00010212
      RAX: dffffc0000000000 RBX: ffff88001636a940 RCX: ffffffff84836c1d
      RDX: 0000000000000045 RSI: 0000000055976744 RDI: 0000000000000228
      RBP: ffff88001377fb60 R08: ffffffff84836bc8 R09: 0000000000000002
      R10: ffff88001377fab8 R11: 0000000000000001 R12: 0000000000000000
      R13: ffff88001636aac8 R14: ffff8800160f81c0 R15: 1ffff100026eff76
      FS:  00007ffb3ea66700(0000) GS:ffff88001a400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020e77000 CR3: 0000000016261000 CR4: 00000000000006f0
      Call Trace:
       pppol2tp_connect+0xd18/0x13c0
       ? pppol2tp_session_create+0x170/0x170
       ? __might_fault+0x115/0x1d0
       ? lock_downgrade+0x860/0x860
       ? __might_fault+0xe5/0x1d0
       ? security_socket_connect+0x8e/0xc0
       SYSC_connect+0x1b6/0x310
       ? SYSC_bind+0x280/0x280
       ? __do_page_fault+0x5d1/0xca0
       ? up_read+0x1f/0x40
       ? __do_page_fault+0x3c8/0xca0
       SyS_connect+0x29/0x30
       ? SyS_accept+0x40/0x40
       do_syscall_64+0x1e0/0x730
       ? trace_hardirqs_off_thunk+0x1a/0x1c
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x7ffb3e376259
      RSP: 002b:00007ffeda4f6508 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 0000000020e77012 RCX: 00007ffb3e376259
      RDX: 000000000000002e RSI: 0000000020e77000 RDI: 0000000000000004
      RBP: 00007ffeda4f6540 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
      R13: 00007ffeda4f6660 R14: 0000000000000000 R15: 0000000000000000
      Code: 80 3d b0 ff 06 02 00 0f 84 07 02 00 00 e8 13 d6 db fc 49 8d bc 24 28 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f
      a 48 c1 ea 03 <80> 3c 02 00 0f 85 ed 02 00 00 4d 8b a4 24 28 02 00 00 e8 13 16
      
      Fixes: 80d84ef3 ("l2tp: prevent l2tp_tunnel_delete racing with userspace close")
      Signed-off-by: default avatarJames Chapman <jchapman@katalix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d00fa9ad
    • James Chapman's avatar
      l2tp: don't use inet_shutdown on ppp session destroy · 225eb264
      James Chapman authored
      Previously, if a ppp session was closed, we called inet_shutdown to mark
      the socket as unconnected such that userspace would get errors and
      then close the socket. This could race with userspace closing the
      socket. Instead, leave userspace to close the socket in its own time
      (our session will be detached anyway).
      
      BUG: KASAN: use-after-free in inet_shutdown+0x5d/0x1c0
      Read of size 4 at addr ffff880010ea3ac0 by task syzbot_347bd5ac/8296
      
      CPU: 3 PID: 8296 Comm: syzbot_347bd5ac Not tainted 4.16.0-rc1+ #91
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      Call Trace:
       dump_stack+0x101/0x157
       ? inet_shutdown+0x5d/0x1c0
       print_address_description+0x78/0x260
       ? inet_shutdown+0x5d/0x1c0
       kasan_report+0x240/0x360
       __asan_load4+0x78/0x80
       inet_shutdown+0x5d/0x1c0
       ? pppol2tp_show+0x80/0x80
       pppol2tp_session_close+0x68/0xb0
       l2tp_tunnel_closeall+0x199/0x210
       ? udp_v6_flush_pending_frames+0x90/0x90
       l2tp_udp_encap_destroy+0x6b/0xc0
       ? l2tp_tunnel_del_work+0x2e0/0x2e0
       udpv6_destroy_sock+0x8c/0x90
       sk_common_release+0x47/0x190
       udp_lib_close+0x15/0x20
       inet_release+0x85/0xd0
       inet6_release+0x43/0x60
       sock_release+0x53/0x100
       ? sock_alloc_file+0x260/0x260
       sock_close+0x1b/0x20
       __fput+0x19f/0x380
       ____fput+0x1a/0x20
       task_work_run+0xd2/0x110
       exit_to_usermode_loop+0x18d/0x190
       do_syscall_64+0x389/0x3b0
       entry_SYSCALL_64_after_hwframe+0x26/0x9b
      RIP: 0033:0x7fe240a45259
      RSP: 002b:00007fe241132df8 EFLAGS: 00000297 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fe240a45259
      RDX: 00007fe240a45259 RSI: 0000000000000000 RDI: 00000000000000a5
      RBP: 00007fe241132e20 R08: 00007fe241133700 R09: 0000000000000000
      R10: 00007fe241133700 R11: 0000000000000297 R12: 0000000000000000
      R13: 00007ffc49aff84f R14: 0000000000000000 R15: 00007fe241141040
      
      Allocated by task 8331:
       save_stack+0x43/0xd0
       kasan_kmalloc+0xad/0xe0
       kasan_slab_alloc+0x12/0x20
       kmem_cache_alloc+0x144/0x3e0
       sock_alloc_inode+0x22/0x130
       alloc_inode+0x3d/0xf0
       new_inode_pseudo+0x1c/0x90
       sock_alloc+0x30/0x110
       __sock_create+0xaa/0x4c0
       SyS_socket+0xbe/0x130
       do_syscall_64+0x128/0x3b0
       entry_SYSCALL_64_after_hwframe+0x26/0x9b
      
      Freed by task 8314:
       save_stack+0x43/0xd0
       __kasan_slab_free+0x11a/0x170
       kasan_slab_free+0xe/0x10
       kmem_cache_free+0x88/0x2b0
       sock_destroy_inode+0x49/0x50
       destroy_inode+0x77/0xb0
       evict+0x285/0x340
       iput+0x429/0x530
       dentry_unlink_inode+0x28c/0x2c0
       __dentry_kill+0x1e3/0x2f0
       dput.part.21+0x500/0x560
       dput+0x24/0x30
       __fput+0x2aa/0x380
       ____fput+0x1a/0x20
       task_work_run+0xd2/0x110
       exit_to_usermode_loop+0x18d/0x190
       do_syscall_64+0x389/0x3b0
       entry_SYSCALL_64_after_hwframe+0x26/0x9b
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: default avatarJames Chapman <jchapman@katalix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      225eb264