1. 19 Oct, 2017 2 commits
      net: ena: fix rare kernel crash when bar memory remap fails · 411838e7
      Netanel Belgazal authored
      This failure is rare and was only found in testing, where
      devm_ioremap() was deliberately made to fail.
      
      [  451.170464] ena 0000:04:00.0: failed to remap regs bar
      [  451.170549] Workqueue: pciehp-1 pciehp_power_thread
      [  451.170551] task: ffff88085a5f2d00 task.stack: ffffc9000756c000
      [  451.170552] RIP: 0010:devm_iounmap+0x2d/0x40
      [  451.170553] RSP: 0018:ffffc9000756fac0 EFLAGS: 00010282
      [  451.170554] RAX: 00000000fffffffe RBX: 0000000000000000 RCX: 0000000000000000
      [  451.170555] RDX: ffffffff813a7e00 RSI: 0000000000000282 RDI: 0000000000000282
      [  451.170556] RBP: ffffc9000756fac8 R08: 00000000fffffffe R09: 00000000000009b7
      [  451.170557] R10: 0000000000000005 R11: 00000000000009b6 R12: ffff880856c9d0a0
      [  451.170558] R13: ffffc9000f5c90c0 R14: ffff880856c9d0a0 R15: 0000000000000028
      [  451.170559] FS:  0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000
      [  451.170560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  451.170561] CR2: 00007f169038b000 CR3: 0000000001c09000 CR4: 00000000003406f0
      [  451.170562] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  451.170562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  451.170563] Call Trace:
      [  451.170572]  ena_release_bars.isra.48+0x34/0x60 [ena]
      [  451.170574]  ena_probe+0x144/0xd90 [ena]
      [  451.170579]  ? ida_simple_get+0x98/0x100
      [  451.170585]  ? kernfs_next_descendant_post+0x40/0x50
      [  451.170591]  local_pci_probe+0x45/0xa0
      [  451.170592]  pci_device_probe+0x157/0x180
      [  451.170599]  driver_probe_device+0x2a8/0x460
      [  451.170600]  __device_attach_driver+0x7e/0xe0
      [  451.170602]  ? driver_allows_async_probing+0x30/0x30
      [  451.170603]  bus_for_each_drv+0x68/0xb0
      [  451.170605]  __device_attach+0xdd/0x160
      [  451.170607]  device_attach+0x10/0x20
      [  451.170610]  pci_bus_add_device+0x4f/0xa0
      [  451.170611]  pci_bus_add_devices+0x39/0x70
      [  451.170613]  pciehp_configure_device+0x96/0x120
      [  451.170614]  pciehp_enable_slot+0x1b3/0x290
      [  451.170616]  pciehp_power_thread+0x3b/0xb0
      [  451.170622]  process_one_work+0x149/0x360
      [  451.170623]  worker_thread+0x4d/0x3c0
      [  451.170626]  kthread+0x109/0x140
      [  451.170627]  ? rescuer_thread+0x380/0x380
      [  451.170628]  ? kthread_park+0x60/0x60
      [  451.170632]  ret_from_fork+0x25/0x30
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
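The trace above shows the cleanup path reaching devm_iounmap() for a register BAR that was never mapped. A minimal userspace sketch of the usual guard for this situation follows. The names `fake_dev`, `fake_unmap` and `fake_release_bars` are hypothetical stand-ins, not the driver's real types or functions, and the actual change in the commit may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct fake_dev {
    void *reg_bar;   /* NULL if the register BAR was never mapped */
    void *mem_bar;   /* NULL if the memory BAR was never mapped */
    int unmap_calls; /* counts how many unmaps were actually issued */
};

/* Hypothetical stand-in for an unmap routine that must never see NULL. */
void fake_unmap(struct fake_dev *dev, void *addr)
{
    assert(addr != NULL);
    dev->unmap_calls++;
}

/* Release only the BARs that were successfully mapped. */
void fake_release_bars(struct fake_dev *dev)
{
    if (dev->reg_bar) {
        fake_unmap(dev, dev->reg_bar);
        dev->reg_bar = NULL;
    }
    if (dev->mem_bar) {
        fake_unmap(dev, dev->mem_bar);
        dev->mem_bar = NULL;
    }
}
```

With both pointers NULL, as after a failed remap early in probe, `fake_release_bars()` issues no unmap at all, so the cleanup path is safe to call from any error label.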
      net: ena: reduce the severity of some printouts · cd7aea18
      Netanel Belgazal authored
      Decrease log level of checksum errors as these messages can be
      triggered remotely by bad packets.
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 18 Oct, 2017 4 commits
      bpf: disallow arithmetic operations on context pointer · 28e33f9d
      Jakub Kicinski authored
      Commit f1174f77 ("bpf/verifier: rework value tracking")
      removed the crafty selection of which pointer types are
      allowed to be modified.  This is OK for most pointer types
      since adjust_ptr_min_max_vals() will catch operations on
      immutable pointers.  One exception is PTR_TO_CTX, which is
      now allowed to be offset freely.
      
      The intent of aforementioned commit was to allow context
      access via modified registers.  The offset passed to
      ->is_valid_access() verifier callback has been adjusted
      by the value of the variable offset.
      
      What is missing, however, is taking the variable offset
      into account when the context register is used.  Or, in terms
      of the code, adding the offset to the value passed to the
      ->convert_ctx_access() callback.  This leads to the following
      eBPF user code:
      
           r1 += 68
           r0 = *(u32 *)(r1 + 8)
           exit
      
      being translated to this in kernel space:
      
         0: (07) r1 += 68
         1: (61) r0 = *(u32 *)(r1 +180)
         2: (95) exit
      
      Offset 8 corresponds to 180 in the kernel, but offset
      76 is valid too.  The verifier will "accept" access to offset
      68+8=76 but then "convert" access to offset 8 as 180.
      The effective access to offset 68+180=248 is beyond the kernel context.
      (This is a __sk_buff example on a debug-heavy kernel -
      packet mark is 8 -> 180, 76 would be data.)
      
      Dereferencing the modified context pointer is not as easy
      as dereferencing other types, because we have to translate
      the access to reading a field in kernel structures which is
      usually at a different offset and often of a different size.
      To allow modifying the pointer we would have to make sure
      that a given eBPF instruction will always access the same
      field or the fields accessed are "compatible" in terms of
      offset and size...
      
      Disallow dereferencing modified context pointers and add
      to selftests the test case described here.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
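The offset arithmetic in this commit message can be reproduced with a small sketch. It models only the single field from the example (user offset 8 mapping to kernel offset 180). The function names are hypothetical and are not the verifier's real callbacks.

```c
#include <assert.h>

/* Hypothetical translation for the one field in the example:
 * user-visible offset 8 (packet mark) maps to kernel offset 180.
 * Returns -1 for offsets this sketch does not model. */
int convert_ctx_offset(int user_off)
{
    return user_off == 8 ? 180 : -1;
}

/* Offset the validity check sees: pointer adjustment plus the
 * offset encoded in the load instruction. */
int checked_offset(int ptr_adjust, int insn_off)
{
    return ptr_adjust + insn_off;
}

/* Offset actually dereferenced after rewriting: only the instruction
 * offset is translated, the pointer adjustment is carried over as-is. */
int effective_offset(int ptr_adjust, int insn_off)
{
    int kernel_off = convert_ctx_offset(insn_off);

    assert(kernel_off >= 0); /* sketch only models offset 8 */
    return ptr_adjust + kernel_off;
}
```

For the program in the message, `checked_offset(68, 8)` gives 76 while `effective_offset(68, 8)` gives 248. The check and the access disagree, which is why the fix rejects dereferences of a modified context pointer rather than trying to reconcile the two.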
      netlink: fix netlink_ack() extack race · 48044eb4
      Johannes Berg authored
      It seems that it's possible to toggle NETLINK_F_EXT_ACK
      through setsockopt() while another thread/CPU is building
      a message inside netlink_ack().  This could then trigger
      the WARN_ON()s I added: if the flag goes from being turned
      off to being turned on between allocating and filling the
      message, the skb could end up being too small.
      
      Avoid this whole situation by storing the value of this
      flag in a separate variable and using that throughout the
      function instead.
      
      Fixes: 2d4bc933 ("netlink: extended ACK reporting")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
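The approach described here, reading the flag once and using that copy for both sizing and filling, can be sketched in plain C. Everything below is illustrative: `fake_sock`, the two length constants and the function names are assumptions for this sketch and do not come from the netlink code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket state; ext_ack may be flipped by another thread. */
struct fake_sock {
    bool ext_ack;
};

#define BASE_LEN 16 /* assumed size of the plain ACK, for illustration */
#define EXT_LEN  32 /* assumed extra room for extended ACK attributes */

/* Size the buffer from a snapshot of the flag. */
size_t ack_size(bool ext_ack)
{
    return BASE_LEN + (ext_ack ? EXT_LEN : 0);
}

/* Bytes written when filling; must never exceed the allocated capacity. */
size_t ack_fill(bool ext_ack, size_t capacity)
{
    size_t need = BASE_LEN + (ext_ack ? EXT_LEN : 0);

    assert(need <= capacity);
    return need;
}

size_t build_ack(struct fake_sock *sk)
{
    bool ext_ack = sk->ext_ack; /* read the shared flag exactly once */
    size_t capacity = ack_size(ext_ack);

    /* A concurrent toggle of sk->ext_ack at this point no longer
     * matters, because only the local copy is consulted below. */
    return ack_fill(ext_ack, capacity);
}
```

Because `ack_size()` and `ack_fill()` both receive the same local value, the size computed at allocation time always matches what is written, whatever happens to the shared flag in between.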
      ibmvnic: Fix calculation of number of TX header descriptors · 2de09681
      Thomas Falcon authored
      This patch correctly sets the number of additional header descriptors
      that will be sent in an indirect SCRQ entry.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mlxsw: core: Fix possible deadlock · d965465b
      Ido Schimmel authored
      When an EMAD is transmitted, a timeout work item is scheduled with a
      delay of 200ms, so that the EMAD can be retried on timeout, up to a
      maximum of five retries.
      
      In certain situations, it's possible for the function waiting on the
      EMAD to be associated with a work item that is queued on the same
      workqueue (`mlxsw_core`) as the timeout work item. This results in
      flushing a work item on the same workqueue.
      
      According to commit e159489b ("workqueue: relax lockdep annotation
      on flush_work()") the above may lead to a deadlock in case the workqueue
      has only one worker active or if the system is under memory pressure and
      the rescue worker is in use. The latter explains the very rare and
      random nature of the lockdep splats we have been seeing:
      
      [   52.730240] ============================================
      [   52.736179] WARNING: possible recursive locking detected
      [   52.742119] 4.14.0-rc3jiri+ #4 Not tainted
      [   52.746697] --------------------------------------------
      [   52.752635] kworker/1:3/599 is trying to acquire lock:
      [   52.758378]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c4fa4>] flush_work+0x3a4/0x5e0
      [   52.767837]
                     but task is already holding lock:
      [   52.774360]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.784495]
                     other info that might help us debug this:
      [   52.791794]  Possible unsafe locking scenario:
      [   52.798413]        CPU0
      [   52.801144]        ----
      [   52.803875]   lock(mlxsw_core_driver_name);
      [   52.808556]   lock(mlxsw_core_driver_name);
      [   52.813236]
                      *** DEADLOCK ***
      [   52.819857]  May be due to missing lock nesting notation
      [   52.827450] 3 locks held by kworker/1:3/599:
      [   52.832221]  #0:  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.842846]  #1:  ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.854537]  #2:  (rtnl_mutex){+.+.}, at: [<ffffffff822ad8e7>] rtnl_lock+0x17/0x20
      [   52.863021]
                     stack backtrace:
      [   52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4
      [   52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
      [   52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum]
      [   52.894060] Call Trace:
      [   52.909122]  __lock_acquire+0xf6f/0x2a10
      [   53.025412]  lock_acquire+0x158/0x440
      [   53.047557]  flush_work+0x3c4/0x5e0
      [   53.087571]  __cancel_work_timer+0x3ca/0x5e0
      [   53.177051]  cancel_delayed_work_sync+0x13/0x20
      [   53.182142]  mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core]
      [   53.194571]  mlxsw_core_reg_access+0x586/0x990 [mlxsw_core]
      [   53.225365]  mlxsw_reg_query+0x10/0x20 [mlxsw_core]
      [   53.230882]  mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum]
      [   53.237801]  process_one_work+0x8f1/0x12f0
      [   53.321804]  worker_thread+0x1fd/0x10c0
      [   53.435158]  kthread+0x28e/0x370
      [   53.448703]  ret_from_fork+0x2a/0x40
      [   53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774)
      [   53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770)
      [   53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed))
      [   53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications
      
      Fix this by creating another workqueue for EMAD timeouts, thereby
      preventing the situation of a work item trying to flush a work item
      queued on the same workqueue.
      
      Fixes: caf7297e ("mlxsw: core: Introduce support for asynchronous EMAD register access")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 16 Oct, 2017 13 commits
  4. 15 Oct, 2017 10 commits
  5. 13 Oct, 2017 4 commits
  6. 12 Oct, 2017 1 commit
      net/ncsi: Don't limit vids based on hot_channel · 6e9c0075
      Samuel Mendoza-Jonas authored
      Currently we drop any new VLAN ids if there are more than the current
      (or last used) channel can support. Most importantly this is a problem
      if no channel has been selected yet, resulting in a segfault.
      
      Secondly this does not necessarily reflect the capabilities of any other
      channels. Instead only drop a new VLAN id if we are already tracking the
      maximum allowed by the NCSI specification. Per-channel limits are
      already handled by ncsi_add_filter(), but add a message to set_one_vid()
      to make it obvious that the channel cannot support any more VLAN ids.
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
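The policy described in this message, dropping a new VLAN id only when a global maximum is already tracked and never consulting a possibly unselected channel, might look roughly like the sketch below. `MAX_VIDS`, `vid_tracker` and `track_vid` are hypothetical names, and the limit of 15 is an assumed value for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VIDS 15 /* assumed global limit, for illustration only */

/* Hypothetical tracker of VLAN ids known to the device. */
struct vid_tracker {
    unsigned short vids[MAX_VIDS];
    int count;
};

/* Accept a new id unless the global limit is reached. No channel is
 * consulted here, so this works before any channel has been selected. */
bool track_vid(struct vid_tracker *t, unsigned short vid)
{
    for (int i = 0; i < t->count; i++)
        if (t->vids[i] == vid)
            return true; /* already tracked */

    if (t->count >= MAX_VIDS)
        return false;    /* at the global maximum: drop */

    t->vids[t->count++] = vid;
    return true;
}
```

Per-channel limits would still be enforced separately when a filter is programmed on a specific channel. This sketch covers only the global bookkeeping step.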
  7. 11 Oct, 2017 4 commits
  8. 10 Oct, 2017 2 commits