1. 12 Jul, 2018 18 commits
    • Willem de Bruijn's avatar
      nsh: set mac len based on inner packet · bab2c80e
      Willem de Bruijn authored
      When pulling the NSH header in nsh_gso_segment, set the mac length
      based on the encapsulated packet type.
      
      skb_reset_mac_len computes an offset to the network header, which
      here still points to the outer packet:
      
        >     skb_reset_network_header(skb);
        >     [...]
        >     __skb_pull(skb, nsh_len);
        >     skb_reset_mac_header(skb);    // now mac hdr starts nsh_len == 8B after net hdr
        >     skb_reset_mac_len(skb);       // mac len = net hdr - mac hdr == (u16) -8 == 65528
        >     [..]
        >     skb_mac_gso_segment(skb, ..)
      
      Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA@mail.gmail.com
      Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com
      Fixes: c411ed85 ("nsh: add GSO support")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bab2c80e
    • Stefano Brivio's avatar
      net: Don't copy pfmemalloc flag in __copy_skb_header() · 8b700862
      Stefano Brivio authored
      The pfmemalloc flag indicates that the skb was allocated from
      the PFMEMALLOC reserves, and the flag is currently copied on skb
      copy and clone.
      
      However, an skb copied from an skb flagged with pfmemalloc
      wasn't necessarily allocated from PFMEMALLOC reserves, and on
      the other hand an skb allocated that way might be copied from an
      skb that wasn't.
      
      So we should not copy the flag on skb copy, and rather decide
      whether to allow an skb to be associated with sockets unrelated
      to page reclaim depending only on how it was allocated.
      
      Move the pfmemalloc flag before headers_start[0] using an
      existing 1-bit hole, so that __copy_skb_header() doesn't copy
      it.
      
      When cloning, we'll now take care of this flag explicitly,
      contravening to the warning comment of __skb_clone().
      
      While at it, restore the newline usage introduced by commit
      b1937227 ("net: reorganize sk_buff for faster
      __copy_skb_header()") to visually separate bytes used in
      bitfields after headers_start[0], that was gone after commit
      a9e419dc ("netfilter: merge ctinfo into nfct pointer storage
      area"), and describe the pfmemalloc flag in the kernel-doc
      structure comment.
      
      This doesn't change the size of sk_buff or cacheline boundaries,
      but consolidates the 15 bits hole before tc_index into a 2 bytes
      hole before csum, that could now be filled more easily.
      Reported-by: default avatarPatrick Talbert <ptalbert@redhat.com>
      Fixes: c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC reserves")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b700862
    • David S. Miller's avatar
      Merge branch 'sfc-filter-locking-fixes' · 1ff9c66b
      David S. Miller authored
      Bert Kenward says:
      
      ====================
      sfc: filter locking fixes
      
      Two fixes for sfc ef10 filter table locking. Initially spotted
      by lockdep, but one issue has also been seen in normal use.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ff9c66b
    • Bert Kenward's avatar
      sfc: hold filter_sem consistently during reset · 193f2003
      Bert Kenward authored
      We should take and release the filter_sem consistently during the
      reset process, in the same manner as the mac_lock and reset_lock.
      
      For lockdep consistency we also take the filter_sem for write around
      other calls to efx->type->init().
      
      Fixes: c2bebe37 ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock")
      Signed-off-by: default avatarBert Kenward <bkenward@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      193f2003
    • Bert Kenward's avatar
      sfc: avoid hang from nested use of the filter_sem · 1c56c099
      Bert Kenward authored
      In some situations we may end up calling down_read while already
      holding the semaphore for write, thus hanging. This has been seen
      when setting the MAC address for the interface. The hung task log
      in this situation includes this stack:
        down_read
        efx_ef10_filter_insert
        efx_ef10_filter_insert_addr_list
        efx_ef10_filter_vlan_sync_rx_mode
        efx_ef10_filter_add_vlan
        efx_ef10_filter_table_probe
        efx_ef10_set_mac_address
        efx_set_mac_address
        dev_set_mac_address
      
      In addition, lockdep rightly points out that nested calling of
      down_read is incorrect.
      
      Fixes: c2bebe37 ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock")
      Tested-by: default avatarJarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarBert Kenward <bkenward@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1c56c099
    • Florian Fainelli's avatar
      net: systemport: Fix CRC forwarding check for SYSTEMPORT Lite · 9e3bff92
      Florian Fainelli authored
      SYSTEMPORT Lite reversed the logic compared to SYSTEMPORT, the
      GIB_FCS_STRIP bit is set when the Ethernet FCS is stripped, and that bit
      is not set by default. Fix the logic such that we properly check whether
      that bit is set or not and we don't forward an extra 4 bytes to the
      network stack.
      
      Fixes: 44a4524c ("net: systemport: Add support for SYSTEMPORT Lite")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9e3bff92
    • Stefan Baranoff's avatar
      tcp: allow user to create repair socket without window probes · 70b7ff13
      Stefan Baranoff authored
      Under rare conditions where repair code may be used it is possible that
      window probes are either unnecessary or undesired. If the user knows that
      window probes are not wanted or needed this change allows them to skip
      sending them when a socket comes out of repair.
      Signed-off-by: default avatarStefan Baranoff <sbaranoff@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      70b7ff13
    • Stefan Baranoff's avatar
      tcp: fix sequence numbers for repaired sockets re-using TIME-WAIT sockets · 21684dc4
      Stefan Baranoff authored
      This patch fixes a bug where the sequence numbers of a socket created using
      TCP repair functionality are lower than set after connect is called.
      This occurs when the repair socket overlaps with a TIME-WAIT socket and
      triggers the re-use code. The amount lower is equal to the number of times
      that a particular IP/port set is re-used and then put back into TIME-WAIT.
      Re-using the first time the sequence number is 1 lower, closing that socket
      and then re-opening (with repair) a new socket with the same addresses/ports
      puts the sequence number 2 lower than set via setsockopt. The third time is
      3 lower, etc. I have not tested what the limit of this acrewal is, if any.
      
      The fix is, if a socket is in repair mode, to respect the already set
      sequence number and timestamp when it would have already re-used the
      TIME-WAIT socket.
      Signed-off-by: default avatarStefan Baranoff <sbaranoff@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      21684dc4
    • Jacob Keller's avatar
      sch_fq_codel: zero q->flows_cnt when fq_codel_init fails · 83fe6b87
      Jacob Keller authored
      When fq_codel_init fails, qdisc_create_dflt will cleanup by using
      qdisc_destroy. This function calls the ->reset() op prior to calling the
      ->destroy() op.
      
      Unfortunately, during the failure flow for sch_fq_codel, the ->flows
      parameter is not initialized, so the fq_codel_reset function will null
      pointer dereference.
      
         kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
         kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel]
         kernel: PGD 0 P4D 0
         kernel: Oops: 0000 [#1] SMP PTI
         kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci
         kernel:  mdio drm igb libsas crc32c_intel firewire_core ptp pps_core scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: i40e]
         kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G           OE    4.16.13custom-fq-codel-test+ #3
         kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.05.0004.051120151007 05/11/2015
         kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel]
         kernel: RSP: 0018:ffffbfbf4c1fb620 EFLAGS: 00010246
         kernel: RAX: 0000000000000400 RBX: 0000000000000000 RCX: 00000000000005b9
         kernel: RDX: 0000000000000000 RSI: ffff9d03264a60c0 RDI: ffff9cfd17b31c00
         kernel: RBP: 0000000000000001 R08: 00000000000260c0 R09: ffffffffb679c3e9
         kernel: R10: fffff1dab06a0e80 R11: ffff9cfd163af800 R12: ffff9cfd17b31c00
         kernel: R13: 0000000000000001 R14: ffff9cfd153de600 R15: 0000000000000001
         kernel: FS:  00007fdec2f92800(0000) GS:ffff9d0326480000(0000) knlGS:0000000000000000
         kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         kernel: CR2: 0000000000000008 CR3: 0000000c1956a006 CR4: 00000000000606e0
         kernel: Call Trace:
         kernel:  qdisc_destroy+0x56/0x140
         kernel:  qdisc_create_dflt+0x8b/0xb0
         kernel:  mq_init+0xc1/0xf0
         kernel:  qdisc_create_dflt+0x5a/0xb0
         kernel:  dev_activate+0x205/0x230
         kernel:  __dev_open+0xf5/0x160
         kernel:  __dev_change_flags+0x1a3/0x210
         kernel:  dev_change_flags+0x21/0x60
         kernel:  do_setlink+0x660/0xdf0
         kernel:  ? down_trylock+0x25/0x30
         kernel:  ? xfs_buf_trylock+0x1a/0xd0 [xfs]
         kernel:  ? rtnl_newlink+0x816/0x990
         kernel:  ? _xfs_buf_find+0x327/0x580 [xfs]
         kernel:  ? _cond_resched+0x15/0x30
         kernel:  ? kmem_cache_alloc+0x20/0x1b0
         kernel:  ? rtnetlink_rcv_msg+0x200/0x2f0
         kernel:  ? rtnl_calcit.isra.30+0x100/0x100
         kernel:  ? netlink_rcv_skb+0x4c/0x120
         kernel:  ? netlink_unicast+0x19e/0x260
         kernel:  ? netlink_sendmsg+0x1ff/0x3c0
         kernel:  ? sock_sendmsg+0x36/0x40
         kernel:  ? ___sys_sendmsg+0x295/0x2f0
         kernel:  ? ebitmap_cmp+0x6d/0x90
         kernel:  ? dev_get_by_name_rcu+0x73/0x90
         kernel:  ? skb_dequeue+0x52/0x60
         kernel:  ? __inode_wait_for_writeback+0x7f/0xf0
         kernel:  ? bit_waitqueue+0x30/0x30
         kernel:  ? fsnotify_grab_connector+0x3c/0x60
         kernel:  ? __sys_sendmsg+0x51/0x90
         kernel:  ? do_syscall_64+0x74/0x180
         kernel:  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
         kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 <48> 8b 73 08 48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00
         kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: ffffbfbf4c1fb620
         kernel: CR2: 0000000000000008
         kernel: ---[ end trace e81a62bede66274e ]---
      
      This is caused because flows_cnt is non-zero, but flows hasn't been
      initialized. fq_codel_init has left the private data in a partially
      initialized state.
      
      To fix this, reset flows_cnt to 0 when we fail to initialize.
      Additionally, to make the state more consistent, also cleanup the flows
      pointer when the allocation of backlogs fails.
      
      This fixes the NULL pointer dereference, since both the for-loop and
      memset in fq_codel_reset will be no-ops when flow_cnt is zero.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      83fe6b87
    • David S. Miller's avatar
      Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 35288486
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2018-07-12
      
      This series contains updates to ixgbe and e100/e1000 kernel documentation.
      
      Alex fixes ixgbe to ensure that we are more explicit about the ordering
      of updates to the receive address register (RAR) table.
      
      Dan Carpenter fixes an issue where we were reading one element beyond
      the end of the array.
      
      Mauro Carvalho Chehab fixes formatting issues in the e100.rst and
      e1000.rst that were causing errors during 'make htmldocs'.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35288486
    • Mauro Carvalho Chehab's avatar
      networking: e1000.rst: Get rid of Sphinx warnings · 8dc4b1a7
      Mauro Carvalho Chehab authored
      Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation.
          Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a blank line; unexpected unindent.
          Documentation/networking/e1000.rst:173: WARNING: Definition list ends without a blank line; unexpected unindent.
          Documentation/networking/e1000.rst:236: WARNING: Definition list ends without a blank line; unexpected unindent.
      
      While here, fix highlights and mark a table as such.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      8dc4b1a7
    • Mauro Carvalho Chehab's avatar
      networking: e100.rst: Get rid of Sphinx warnings · b203cc7a
      Mauro Carvalho Chehab authored
      Documentation/networking/e100.rst:57: WARNING: Literal block expected; none found.
          Documentation/networking/e100.rst:68: WARNING: Literal block expected; none found.
          Documentation/networking/e100.rst:75: WARNING: Literal block expected; none found.
          Documentation/networking/e100.rst:84: WARNING: Literal block expected; none found.
          Documentation/networking/e100.rst:93: WARNING: Inline emphasis start-string without end-string.
      
      While here, fix some highlights.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      b203cc7a
    • Dan Carpenter's avatar
      ixgbe: Off by one in ixgbe_ipsec_tx() · c4111041
      Dan Carpenter authored
      The ipsec->tx_tbl[] has IXGBE_IPSEC_MAX_SA_COUNT elements so the > needs
      to be changed to >= so we don't read one element beyond the end of the
      array.
      
      Fixes: 59259470 ("ixgbe: process the Tx ipsec offload")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarShannon Nelson <shannon.nelson@oracle.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c4111041
    • Alexander Duyck's avatar
      ixgbe: Be more careful when modifying MAC filters · d14c780c
      Alexander Duyck authored
      This change makes it so that we are much more explicit about the ordering
      of updates to the receive address register (RAR) table. Prior to this patch
      I believe we may have been updating the table while entries were still
      active, or possibly allowing for reordering of things since we weren't
      explicitly flushing writes to either the lower or upper portion of the
      register prior to accessing the other half.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Reviewed-by: default avatarShannon Nelson <shannon.nelson@oracle.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      d14c780c
    • David S. Miller's avatar
      Merge branch 'ieee802154-for-davem-2018-07-11' of... · 672f5cce
      David S. Miller authored
      Merge branch 'ieee802154-for-davem-2018-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2018-07-11
      
      An update from ieee802154 for your *net* tree.
      
      Build system fix for a missing include from Arnd Bergmann.
      Setting the IFLA_LINK for the lowpan parent from Lubomir Rintel.
      Fixes for some RX corner cases in adf7242 driver by Michael Hennerich.
      And some small patches to cleanup our BUG_ON vs WARN_ON usage.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      672f5cce
    • Ewan D. Milne's avatar
      qed: fix spelling mistake "successffuly" -> "successfully" · 20c4515a
      Ewan D. Milne authored
      Trivial fix to spelling mistake in qed_probe message.
      Signed-off-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      20c4515a
    • Russell King's avatar
      sfp: fix module initialisation with netdev already up · 576cd320
      Russell King authored
      It was been observed that with a particular order of initialisation,
      the netdev can be up, but the SFP module still has its TX_DISABLE
      signal asserted.  This occurs when the network device brought up before
      the SFP kernel module has been inserted by userspace.
      
      This occurs because sfp-bus layer does not hear about the change in
      network device state, and so assumes that it is still down.  Set
      netdev->sfp when the upstream is registered to work around this problem.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      576cd320
    • Russell King's avatar
      sfp: ensure we clean up properly on bus registration failure · f20a4c46
      Russell King authored
      We fail to correctly clean up after a bus registration failure, which
      can lead to an incorrect assumption about the registration state of
      the upstream or sfp cage.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f20a4c46
  2. 09 Jul, 2018 17 commits
    • Taehee Yoo's avatar
      rhashtable: add restart routine in rhashtable_free_and_destroy() · 0026129c
      Taehee Yoo authored
      rhashtable_free_and_destroy() cancels re-hash deferred work
      then walks and destroys elements. at this moment, some elements can be
      still in future_tbl. that elements are not destroyed.
      
      test case:
      nft_rhash_destroy() calls rhashtable_free_and_destroy() to destroy
      all elements of sets before destroying sets and chains.
      But rhashtable_free_and_destroy() doesn't destroy elements of future_tbl.
      so that splat occurred.
      
      test script:
         %cat test.nft
         table ip aa {
      	   map map1 {
      		   type ipv4_addr : verdict;
      		   elements = {
      			   0 : jump a0,
      			   1 : jump a0,
      			   2 : jump a0,
      			   3 : jump a0,
      			   4 : jump a0,
      			   5 : jump a0,
      			   6 : jump a0,
      			   7 : jump a0,
      			   8 : jump a0,
      			   9 : jump a0,
      		}
      	   }
      	   chain a0 {
      	   }
         }
         flush ruleset
         table ip aa {
      	   map map1 {
      		   type ipv4_addr : verdict;
      		   elements = {
      			   0 : jump a0,
      			   1 : jump a0,
      			   2 : jump a0,
      			   3 : jump a0,
      			   4 : jump a0,
      			   5 : jump a0,
      			   6 : jump a0,
      			   7 : jump a0,
      			   8 : jump a0,
      			   9 : jump a0,
      		   }
      	   }
      	   chain a0 {
      	   }
         }
         flush ruleset
      
         %while :; do nft -f test.nft; done
      
      Splat looks like:
      [  200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
      [  200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
      [  200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
      [  200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
      [  200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
      [  200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
      [  200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
      [  200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
      [  200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
      [  200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
      [  200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
      [  200.906354] FS:  00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
      [  200.915533] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
      [  200.930353] Call Trace:
      [  200.932351]  ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
      [  200.939525]  ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
      [  200.947525]  ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
      [  200.952383]  ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
      [  200.959532]  ? nla_parse+0xab/0x230
      [  200.963529]  ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
      [  200.968384]  ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
      [  200.975525]  ? debug_show_all_locks+0x290/0x290
      [  200.980363]  ? debug_show_all_locks+0x290/0x290
      [  200.986356]  ? sched_clock_cpu+0x132/0x170
      [  200.990352]  ? find_held_lock+0x39/0x1b0
      [  200.994355]  ? sched_clock_local+0x10d/0x130
      [  200.999531]  ? memset+0x1f/0x40
      
      V2:
       - free all tables requested by Herbert Xu
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0026129c
    • David S. Miller's avatar
      Merge branch 'bnxt_en-Bug-fixes' · 252dd176
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      These are bug fixes in error code paths, TC Flower VLAN TCI flow
      checking bug fix, proper filtering of Broadcast packets if IFF_BROADCAST
      is not set, and a bug fix in bnxt_get_max_rings() to return 0 ring
      parameters when the return value is -ENOMEM.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      252dd176
    • Vikas Gupta's avatar
      bnxt_en: Fix for system hang if request_irq fails · c58387ab
      Vikas Gupta authored
      Fix bug in the error code path when bnxt_request_irq() returns failure.
      bnxt_disable_napi() should not be called in this error path because
      NAPI has not been enabled yet.
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: default avatarVikas Gupta <vikas.gupta@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c58387ab
    • Michael Chan's avatar
      bnxt_en: Do not modify max IRQ count after RDMA driver requests/frees IRQs. · 30f52947
      Michael Chan authored
      Calling bnxt_set_max_func_irqs() to modify the max IRQ count requested or
      freed by the RDMA driver is flawed.  The max IRQ count is checked when
      re-initializing the IRQ vectors and this can happen multiple times
      during ifup or ethtool -L.  If the max IRQ is reduced and the RDMA
      driver is operational, we may not initailize IRQs correctly.  This
      problem shows up on VFs with very small number of MSIX.
      
      There is no other logic that relies on the IRQ count excluding the ones
      used by RDMA.  So we fix it by just removing the call to subtract or
      add the IRQs used by RDMA.
      
      Fixes: a588e458 ("bnxt_en: Add interface to support RDMA driver.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30f52947
    • Michael Chan's avatar
      bnxt_en: Support clearing of the IFF_BROADCAST flag. · 30e33848
      Michael Chan authored
      Currently, the driver assumes IFF_BROADCAST is always set and always sets
      the broadcast filter.  Modify the code to set or clear the broadcast
      filter according to the IFF_BROADCAST flag.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30e33848
    • Michael Chan's avatar
      bnxt_en: Always set output parameters in bnxt_get_max_rings(). · 78f058a4
      Michael Chan authored
      The current code returns -ENOMEM and does not bother to set the output
      parameters to 0 when no rings are available.  Some callers, such as
      bnxt_get_channels() will display garbage ring numbers when that happens.
      Fix it by always setting the output parameters.
      
      Fixes: 6e6c5a57 ("bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      78f058a4
    • Michael Chan's avatar
      bnxt_en: Fix inconsistent BNXT_FLAG_AGG_RINGS logic. · 07f4fde5
      Michael Chan authored
      If there aren't enough RX rings available, the driver will attempt to
      use a single RX ring without the aggregation ring.  If that also
      fails, the BNXT_FLAG_AGG_RINGS flag is cleared but the other ring
      parameters are not set consistently to reflect that.  If more RX
      rings become available at the next open, the RX rings will be in
      an inconsistent state and may crash when freeing the RX rings.
      
      Fix it by restoring the BNXT_FLAG_AGG_RINGS if not enough RX rings are
      available to run without aggregation rings.
      
      Fixes: bdbd1eb5 ("bnxt_en: Handle no aggregation ring gracefully.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07f4fde5
    • Venkat Duvvuru's avatar
      bnxt_en: Fix the vlan_tci exact match check. · e32d4e60
      Venkat Duvvuru authored
      It is possible that OVS may set don’t care for DEI/CFI bit in
      vlan_tci mask. Hence, checking for vlan_tci exact match will endup
      in a vlan flow rejection.
      
      This patch fixes the problem by checking for vlan_pcp and vid
      separately, instead of checking for the entire vlan_tci.
      
      Fixes: e85a9be9 (bnxt_en: do not allow wildcard matches for L2 flows)
      Signed-off-by: default avatarVenkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e32d4e60
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 26420d9c
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree:
      
      1) Missing module autoloadfor icmp and icmpv6 x_tables matches,
         from Florian Westphal.
      
      2) Possible non-linear access to TCP header from tproxy, from
         Mate Eckl.
      
      3) Do not allow rbtree to be used for single elements, this patch
         moves all set backend into one single module since such thing
         can only happen if hashtable module is explicitly blacklisted,
         which should not ever be done.
      
      4) Reject error and standard targets from nft_compat for sanity
         reasons, they are never used from there.
      
      5) Don't crash on double hashsize module parameter, from Andrey
         Ryabinin.
      
      6) Drop dst on skb before placing it in the fragmentation
         reassembly queue, from Florian Westphal.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26420d9c
    • Florian Westphal's avatar
      netfilter: ipv6: nf_defrag: drop skb dst before queueing · 84379c9a
      Florian Westphal authored
      Eric Dumazet reports:
       Here is a reproducer of an annoying bug detected by syzkaller on our production kernel
       [..]
       ./b78305423 enable_conntrack
       Then :
       sleep 60
       dmesg | tail -10
       [  171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2
       [  181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2
       [  191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2
       [  201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2
       [  211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2
       [  221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2
      
      Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook.
      skb gets queued until frag timer expiry -- 1 minute.
      
      Normally nf_conntrack_reasm gets called during prerouting, so skb has
      no dst yet which might explain why this wasn't spotted earlier.
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Reported-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Tested-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      84379c9a
    • Andrey Ryabinin's avatar
      netfilter: nf_conntrack: Fix possible possible crash on module loading. · 2045cdfa
      Andrey Ryabinin authored
      Loading the nf_conntrack module with doubled hashsize parameter, i.e.
      	  modprobe nf_conntrack hashsize=12345 hashsize=12345
      causes NULL-ptr deref.
      
      If 'hashsize' specified twice, the nf_conntrack_set_hashsize() function
      will be called also twice.
      The first nf_conntrack_set_hashsize() call will set the
      'nf_conntrack_htable_size' variable:
      
      	nf_conntrack_set_hashsize()
      		...
      		/* On boot, we can set this without any fancy locking. */
      		if (!nf_conntrack_htable_size)
      			return param_set_uint(val, kp);
      
      But on the second invocation, the nf_conntrack_htable_size is already set,
      so the nf_conntrack_set_hashsize() will take a different path and call
      the nf_conntrack_hash_resize() function. Which will crash on the attempt
      to dereference 'nf_conntrack_hash' pointer:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      	RIP: 0010:nf_conntrack_hash_resize+0x255/0x490 [nf_conntrack]
      	Call Trace:
      	 nf_conntrack_set_hashsize+0xcd/0x100 [nf_conntrack]
      	 parse_args+0x1f9/0x5a0
      	 load_module+0x1281/0x1a50
      	 __se_sys_finit_module+0xbe/0xf0
      	 do_syscall_64+0x7c/0x390
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix this, by checking !nf_conntrack_hash instead of
      !nf_conntrack_htable_size. nf_conntrack_hash will be initialized only
      after the module loaded, so the second invocation of the
      nf_conntrack_set_hashsize() won't crash, it will just reinitialize
      nf_conntrack_htable_size again.
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2045cdfa
    • Florian Westphal's avatar
      netfilter: nft_compat: explicitly reject ERROR and standard target · 21d5e078
      Florian Westphal authored
      iptables-nft never requests these, but make this explicitly illegal.
      If it were quested, kernel could oops as ->eval is NULL, furthermore,
      the builtin targets have no owning module so its possible to rmmod
      eb/ip/ip6_tables module even if they would be loaded.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      21d5e078
    • Michael Hennerich's avatar
      net: ieee802154: adf7242: Fix OCL calibration runs · 58e9683d
      Michael Hennerich authored
      Reissuing RC_RX every 400ms - to adjust for offset drift in
      receiver see datasheet page 61, OCL section.
      Signed-off-by: default avatarMichael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: default avatarAlexandru Ardelean <alexandru.ardelean@analog.com>
      Signed-off-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      58e9683d
    • Michael Hennerich's avatar
      net: ieee802154: adf7242: Fix erroneous RX enable · 36d26d6b
      Michael Hennerich authored
      Only enable RX mode if the netdev is opened.
      Signed-off-by: default avatarMichael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: default avatarAlexandru Ardelean <alexandru.ardelean@analog.com>
      Signed-off-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      36d26d6b
    • Stefan Schmidt's avatar
      ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem · 8f2fbc6c
      Stefan Schmidt authored
      The check is valid but it does not warrant to crash the kernel. A
      WARN_ON() is good enough here.
      Found by checkpatch.
      Signed-off-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      8f2fbc6c
    • Stefan Schmidt's avatar
      ieee802154: at86rf230: use __func__ macro for debug messages · 8a81388e
      Stefan Schmidt authored
      Instead of having the function name hard-coded (it might change and we
      forgot to update them in the debug output) we can use __func__ instead
      and also shorter the line so we do not need to break it. Also fix an
      extra blank line while being here.
      Found by checkpatch.
      Signed-off-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      8a81388e
    • Stefan Schmidt's avatar
      ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem · 20f33045
      Stefan Schmidt authored
      The check is valid but it does not warrant to crash the kernel. A
      WARN_ON() is good enough here.
      Found by checkpatch.
      Signed-off-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      20f33045
  3. 08 Jul, 2018 5 commits
    • Eric Dumazet's avatar
      tcp: cleanup copied_seq and urg_data in tcp_disconnect · 6508b678
      Eric Dumazet authored
      tcp_zerocopy_receive() relies on tcp_inq() to limit number of bytes
      requested by user.
      
      syzbot found that after tcp_disconnect(), tcp_inq() was returning
      a stale value (number of bytes in queue before the disconnect).
      
      Note that after this patch, ioctl(fd, SIOCINQ, &val) is also fixed
      and returns 0, so this might be a candidate for all known linux kernels.
      
      While we are at this, we probably also should clear urg_data to
      avoid other syzkaller reports after it discovers how to deal with
      urgent data.
      
      syzkaller repro :
      
      socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
      bind(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("224.0.0.1")}, 16) = 0
      connect(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
      send(3, ..., 4096, 0) = 4096
      connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 128) = 0
      getsockopt(3, SOL_TCP, TCP_ZEROCOPY_RECEIVE, ..., [16]) = 0 // CRASH
      
      Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6508b678
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 7f93d129
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2018-07-07
      
      The following pull-request contains BPF updates for your *net* tree.
      
      Plenty of fixes for different components:
      
      1) A set of critical fixes for sockmap and sockhash, from John Fastabend.
      
      2) fixes for several race conditions in af_xdp, from Magnus Karlsson.
      
      3) hash map refcnt fix, from Mauricio Vasquez.
      
      4) samples/bpf fixes, from Taeung Song.
      
      5) ifup+mtu check for xdp_redirect, from Toshiaki Makita.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f93d129
    • Paolo Abeni's avatar
      ipfrag: really prevent allocation on netns exit · f6f2a4a2
      Paolo Abeni authored
      Setting the low threshold to 0 has no effect on frags allocation,
      we need to clear high_thresh instead.
      
      The code was pre-existent to commit 648700f7 ("inet: frags:
      use rhashtables for reassembly units"), but before the above,
      such assignment had a different role: prevent concurrent eviction
      from the worker and the netns cleanup helper.
      
      Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6f2a4a2
    • Lorenzo Colitti's avatar
      net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort · acc2cf4e
      Lorenzo Colitti authored
      When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first
      frees it by calling inet_csk_reqsk_queue_drop_and_and_put in
      tcp_abort, and then frees it again by calling sock_gen_put.
      
      Since tcp_abort only has one caller, and all the other codepaths
      in tcp_abort don't free the socket, just remove the free in that
      function.
      
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Tested: passes Android sock_diag_test.py, which exercises this codepath
      Fixes: d7226c7a ("net: diag: Fix refcnt leak in error path destroying socket")
      Signed-off-by: default avatarLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      acc2cf4e
    • David Ahern's avatar
      net/ipv4: Set oif in fib_compute_spec_dst · e7372197
      David Ahern authored
      Xin reported that icmp replies may not use the address on the device the
      echo request is received if the destination address is broadcast. Instead
      a route lookup is done without considering VRF context. Fix by setting
      oif in flow struct to the master device if it is enslaved. That directs
      the lookup to the VRF table. If the device is not enslaved, oif is still
      0 so no affect.
      
      Fixes: cd2fbe1b ("net: Use VRF device index for lookups on RX")
      Reported-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7372197