1. 25 Jul, 2019 7 commits
    • time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint · 84ec3a07
      Paul E. McKenney authored
      
      The TASKS03 and TREE04 rcutorture scenarios produce the following
      lockdep complaint:
      
      	WARNING: inconsistent lock state
      	5.2.0-rc1+ #513 Not tainted
      	--------------------------------
      	inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      	migration/1/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
      	(____ptrval____) (tick_broadcast_lock){?...}, at: tick_broadcast_offline+0xf/0x70
      	{IN-HARDIRQ-W} state was registered at:
      	  lock_acquire+0xb0/0x1c0
      	  _raw_spin_lock_irqsave+0x3c/0x50
      	  tick_broadcast_switch_to_oneshot+0xd/0x40
      	  tick_switch_to_oneshot+0x4f/0xd0
      	  hrtimer_run_queues+0xf3/0x130
      	  run_local_timers+0x1c/0x50
      	  update_process_times+0x1c/0x50
      	  tick_periodic+0x26/0xc0
      	  tick_handle_periodic+0x1a/0x60
      	  smp_apic_timer_interrupt+0x80/0x2a0
      	  apic_timer_interrupt+0xf/0x20
      	  _raw_spin_unlock_irqrestore+0x4e/0x60
      	  rcu_nocb_gp_kthread+0x15d/0x590
      	  kthread+0xf3/0x130
      	  ret_from_fork+0x3a/0x50
      	irq event stamp: 171
      	hardirqs last  enabled at (171): [<ffffffff8a201a37>] trace_hardirqs_on_thunk+0x1a/0x1c
      	hardirqs last disabled at (170): [<ffffffff8a201a53>] trace_hardirqs_off_thunk+0x1a/0x1c
      	softirqs last  enabled at (0): [<ffffffff8a264ee0>] copy_process.part.56+0x650/0x1cb0
      	softirqs last disabled at (0): [<0000000000000000>] 0x0
      
              [...]
      
      To reproduce, run the following rcutorture test:
      
       $ tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TASKS03 TREE04"
      
      It turns out that tick_broadcast_offline() was an innocent bystander.
      After all, interrupts are supposed to be disabled throughout
      take_cpu_down(), and therefore should have been disabled upon entry to
      tick_offline_cpu() and thus to tick_broadcast_offline().  This suggests
      that one of the CPU-hotplug notifiers was incorrectly enabling interrupts,
      and leaving them enabled on return.
      
      Some debugging code showed that the culprit was sched_cpu_dying().
      It had irqs enabled after return from sched_tick_stop().  Which in turn
      had irqs enabled after return from cancel_delayed_work_sync().  Which is a
      wrapper around __cancel_work_timer().  Which can sleep in the case where
      something else is concurrently trying to cancel the same delayed work,
      and as Thomas Gleixner pointed out on IRC, sleeping is a decidedly bad
      idea when you are invoked from take_cpu_down(), regardless of the state
      you leave interrupts in upon return.
      
      Code inspection located no reason why the delayed work absolutely
      needed to be canceled from sched_tick_stop():  The work is not
      bound to the outgoing CPU by design, given that the whole point is
      to collect statistics without disturbing the outgoing CPU.
      
      This commit therefore simply drops the cancel_delayed_work_sync() from
      sched_tick_stop().  Instead, a new ->state field is added to the tick_work
      structure so that the delayed-work handler function sched_tick_remote()
      can avoid reposting itself.  A cpu_is_offline() check is also added to
      sched_tick_remote() to avoid mucking with the state of an offlined CPU
      (though it does appear safe to do so).  The sched_tick_start() and
      sched_tick_stop() functions also update ->state, and sched_tick_start()
      also schedules the delayed work if ->state indicates that it is not
      already in flight.
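
      The ->state handshake described above can be sketched in userspace C11
      as follows. This is a minimal model under stated assumptions, not the
      kernel code: the state names, struct tick_work_model, and the model_*
      functions are hypothetical, a counter stands in for queuing the delayed
      work, and the cpu_is_offline() check is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state names; the commit text only says ->state exists. */
enum model_tick_state {
	MODEL_TICK_OFFLINE   = 0,	/* no delayed work in flight */
	MODEL_TICK_OFFLINING = 1,	/* stop requested, work still in flight */
	MODEL_TICK_RUNNING   = 2,	/* work in flight and allowed to repost */
};

struct tick_work_model {
	atomic_int state;
	int queued;	/* counts simulated "queue the delayed work" calls */
};

static void model_tick_init(struct tick_work_model *tw)
{
	atomic_init(&tw->state, MODEL_TICK_OFFLINE);
	tw->queued = 0;
}

/* Start: mark the work as running; queue it only if none is in flight. */
static void model_tick_start(struct tick_work_model *tw)
{
	int old = atomic_exchange(&tw->state, MODEL_TICK_RUNNING);

	if (old == MODEL_TICK_OFFLINE)
		tw->queued++;
}

/* Stop: no cancel and no sleeping, only a request that the handler quit. */
static void model_tick_stop(struct tick_work_model *tw)
{
	int old = atomic_exchange(&tw->state, MODEL_TICK_OFFLINING);

	assert(old == MODEL_TICK_RUNNING);	/* the model expects start before stop */
}

/* Handler: returns true if it reposted itself. */
static bool model_tick_remote(struct tick_work_model *tw)
{
	int expected = MODEL_TICK_OFFLINING;

	/* A pending stop request is acknowledged by going fully offline. */
	if (atomic_compare_exchange_strong(&tw->state, &expected,
					   MODEL_TICK_OFFLINE))
		return false;

	/* On failure, expected holds the state that was actually observed. */
	if (expected != MODEL_TICK_RUNNING)
		return false;

	tw->queued++;
	return true;
}
```

      In this model, a stop followed by a start before the handler has run
      leaves the original work in flight, so model_tick_start() does not
      queue a second copy.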
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      [ paulmck: Apply Peter Zijlstra and Frederic Weisbecker atomics feedback. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190625165238.GJ26519@linux.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix imbalance due to CPU affinity · f6cad8df
      Vincent Guittot authored
      load_balance() has a dedicated mechanism to detect when an imbalance
      is due to CPU affinity and must be handled at the parent level. In this case,
      the imbalance field of the parent's sched_group is set.
      
      The description of sg_imbalanced() gives a typical example of two groups
      of 4 CPUs each and 4 tasks each with a cpumask covering 1 CPU of the first
      group and 3 CPUs of the second group. Something like:
      
      	{ 0 1 2 3 } { 4 5 6 7 }
      	        *     * * *
      
      But load_balance() fails to fix this use case on my octa-core system
      made of 2 clusters of quad cores.

      Although load_balance() is able to detect that the imbalance is due to
      CPU affinity, it fails to fix it because the imbalance field is cleared
      before the parent level gets a chance to run. In fact, when the imbalance is
      detected, load_balance() reruns without the CPU with pinned tasks. But
      there are no other running tasks in the situation described above and
      everything looks balanced this time, so the imbalance field is immediately
      cleared.
      
      The imbalance field should not be cleared if there is no other task to move
      when the imbalance is detected.
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1561996022-28829-1-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Change task_numa_work() storage to static · 9434f9f5
      Valentin Schneider authored
      There are no callers outside of fair.c.
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: mgorman@suse.de
      Cc: riel@surriel.com
      Link: https://lkml.kernel.org/r/20190715102508.32434-4-valentin.schneider@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Move task_numa_work() init to init_numa_balancing() · b34920d4
      Valentin Schneider authored
      We only need to set the callback_head worker function once, do it
      during sched_fork().
      
      While at it, move the comment regarding double task_work addition to
      init_numa_balancing(), since the double add sentinel is first set there.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: mgorman@suse.de
      Cc: riel@surriel.com
      Link: https://lkml.kernel.org/r/20190715102508.32434-3-valentin.schneider@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Move init_numa_balancing() below task_numa_work() · d35927a1
      Valentin Schneider authored
      To reference task_numa_work() from within init_numa_balancing(), we
      need the former to be declared before the latter. Do just that.
      
      This is a pure code movement.
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: mgorman@suse.de
      Cc: riel@surriel.com
      Link: https://lkml.kernel.org/r/20190715102508.32434-2-valentin.schneider@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Use RCU accessors consistently for ->numa_group · cb361d8c
      Jann Horn authored
      The old code used RCU annotations and accessors inconsistently for
      ->numa_group, which can lead to use-after-frees and NULL dereferences.
      
      Let all accesses to ->numa_group use proper RCU helpers to prevent such
      issues.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Fixes: 8c8a743c ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
      Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Don't free p->numa_faults with concurrent readers · 16d51a59
      Jann Horn authored
      When going through execve(), zero out the NUMA fault statistics instead of
      freeing them.
      
      During execve, the task is reachable through procfs and the scheduler. A
      concurrent /proc/*/sched reader can read data from a freed ->numa_faults
      allocation (confirmed by KASAN) and write it back to userspace.
      I believe that it would also be possible for a use-after-free read to occur
      through a race between a NUMA fault and execve(): task_numa_fault() can
      lead to task_numa_compare(), which invokes task_weight() on the currently
      running task of a different CPU.
      
      Another way to fix this would be to make ->numa_faults RCU-managed or add
      extra locking, but it seems easier to wipe the NUMA fault statistics on
      execve.
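
      The idea of wiping the statistics while keeping the allocation alive can
      be sketched as below. This is an illustrative userspace model only:
      struct model_numa_task and model_numa_reset_on_exec() are hypothetical
      names, and the real ->numa_faults layout is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_numa_task {
	unsigned long *numa_faults;	/* buffer that other contexts may read */
	size_t nr_faults;		/* number of counters in the buffer */
};

/* On exec: zero the counters but leave the pointer and allocation intact. */
static void model_numa_reset_on_exec(struct model_numa_task *t)
{
	assert(t != NULL);

	if (!t->numa_faults)
		return;		/* nothing was ever allocated */

	memset(t->numa_faults, 0, t->nr_faults * sizeof(*t->numa_faults));
}
```

      With this approach a reader holding the old pointer may see zeroed or
      partly zeroed counters, but it never touches freed memory.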
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Fixes: 82727018 ("sched/numa: Call task_numa_free() from do_execve()")
      Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 22 Jul, 2019 9 commits
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7b5cf701
      Linus Torvalds authored
      Pull preemption Kconfig fix from Thomas Gleixner:
       "The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
        PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
        and PREEMPT_RT.
      
        Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
        obviously can't work anymore. oldconfig builds are affected as well,
        but it's more obvious as the user gets asked. [old]defconfig silently
        fixes it up and selects PREEMPT_NONE.
      
        Unbreak it by undoing the rename and adding an intermediate config
        symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
        chasing down a few #ifdefs, but it's better than tweaking 114
        defconfigs and annoying users"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
    • Merge tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 44b912cd
      Linus Torvalds authored
      Pull pidfd polling fix from Christian Brauner:
       "A fix for pidfd polling. It ensures that the task's exit state is
        visible to all waiters"
      
      * tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        pidfd: fix a poll race when setting exit_state
    • Merge tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 21c730d7
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - fixes for leaks caused by recently merged patches
      
       - one build fix
      
       - a fix to prevent mixing of incompatible features
      
      * tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: don't leak extent_map in btrfs_get_io_geometry()
        btrfs: free checksum hash on in close_ctree
        btrfs: Fix build error while LIBCRC32C is module
        btrfs: inode: Don't compress if NODATASUM or NODATACOW set
    • sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y · b8d33498
      Thomas Gleixner authored
      The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
      CONFIG_PREEMPT_LL, which causes all defconfigs which have CONFIG_PREEMPT=y
      set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
      the preemption mode choice which defaults to NONE. This also affects
      oldconfig builds.

      So rather than changing 114 defconfig files and being an annoyance to
      users, revert the rename and select a new config symbol PREEMPTION. That
      keeps everything working smoothly and the relevant ifdef's are going to be
      fixed up step by step.
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: a50a3f4b ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Merge tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · c92f0380
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "For two regressions in media core:
      
         - v4l2-subdev: fix regression in check_pad()
      
         - videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
           in use"
      
      * tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
        media: v4l2-subdev: fix regression in check_pad()
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 83768245
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Several netfilter fixes including a nfnetlink deadlock fix from
          Florian Westphal and fix for dropping VRF packets from Miaohe Lin.
      
       2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
          proper block sharing.
      
       3) Fix r8169 PHY init from Thomas Voegtle.
      
       4) Fix memory leak in mac80211, from Lorenzo Bianconi.
      
       5) Missing NULL check on object allocation in cxgb4, from Navid
          Emamdoost.
      
       6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.
      
       7) Check that there is actually an ip header to access in skb->data in
          VRF, from Peter Kosyh.
      
       8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.
      
       9) One more tweak to the TCP fragmentation memory limit changes, to be
          less harmful to applications setting small SO_SNDBUF values. From
          Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
        tcp: be more careful in tcp_fragment()
        hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
        vrf: make sure skb->data contains ip header to make routing
        connector: remove redundant input callback from cn_dev
        qed: Prefer pcie_capability_read_word()
        igc: Prefer pcie_capability_read_word()
        cxgb4: Prefer pcie_capability_read_word()
        be2net: Synchronize be_update_queues with dev_watchdog
        bnx2x: Prevent load reordering in tx completion processing
        net: phy: sfp: hwmon: Fix scaling of RX power
        net: sched: verify that q!=NULL before setting q->flags
        chelsio: Fix a typo in a function name
        allocate_flower_entry: should check for null deref
        net: hns3: typo in the name of a constant
        kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
        tipc: Fix a typo
        mac80211: don't warn about CW params when not using them
        mac80211: fix possible memory leak in ieee80211_assign_beacon
        nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
        nl80211: fix VENDOR_CMD_RAW_DATA
        ...
    • pidfd: fix a poll race when setting exit_state · b191d649
      Suren Baghdasaryan authored
      There is a race between reading task->exit_state in pidfd_poll and
      writing it after do_notify_parent calls do_notify_pidfd. Expected
      sequence of events is:
      
      CPU 0                            CPU 1
      ------------------------------------------------
      exit_notify
        do_notify_parent
          do_notify_pidfd
        tsk->exit_state = EXIT_DEAD
                                        pidfd_poll
                                           if (tsk->exit_state)
      
      However nothing prevents the following sequence:
      
      CPU 0                            CPU 1
      ------------------------------------------------
      exit_notify
        do_notify_parent
          do_notify_pidfd
                                         pidfd_poll
                                            if (tsk->exit_state)
        tsk->exit_state = EXIT_DEAD
      
      This causes a polling task to wait forever, since poll blocks because
      exit_state is 0 and the waiting task is not notified again. A stress
      test continuously doing pidfd poll and process exits uncovered this bug.
      
      To fix it, we make sure that the task's exit_state is always set before
      calling do_notify_pidfd.
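
      The effect of the ordering can be illustrated with a small single-threaded
      model. It is only a sketch: struct model_task, MODEL_EXIT_DEAD, and the
      model_* functions are hypothetical stand-ins, and the "poller" is simulated
      by checking exit_state at the moment of notification, which corresponds to
      the problematic interleaving shown in the second diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EXIT_DEAD 1	/* hypothetical nonzero "task has exited" value */

struct model_task {
	int exit_state;		/* 0 while the task is alive */
	bool poller_saw_exit;	/* what the woken poller observed */
};

/*
 * Stand-in for the notification: the woken poller re-checks exit_state
 * immediately, before the notifier gets to run any further.
 */
static void model_notify_pidfd(struct model_task *t)
{
	assert(t != NULL);
	t->poller_saw_exit = (t->exit_state != 0);
}

/* Racy order: notify first, publish exit_state afterwards. */
static void model_exit_racy(struct model_task *t)
{
	model_notify_pidfd(t);
	t->exit_state = MODEL_EXIT_DEAD;
}

/* Fixed order: publish exit_state first, then notify. */
static void model_exit_fixed(struct model_task *t)
{
	t->exit_state = MODEL_EXIT_DEAD;
	model_notify_pidfd(t);
}
```

      In the racy variant the poller observes exit_state == 0 and would go back
      to sleep with no further wakeup; in the fixed variant it always observes
      the exit.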
      
      Fixes: b53b0b9d ("pidfd: add polling support")
      Cc: kernel-team@android.com
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
      [christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
      Signed-off-by: Christian Brauner <christian@brauner.io>
    • tcp: be more careful in tcp_fragment() · b617158d
      Eric Dumazet authored
      Some applications set tiny SO_SNDBUF values and expect
      TCP to just work. Recent patches to address CVE-2019-11478
      broke them in case of losses, since retransmits might
      be prevented.
      
      We should allow these flows to make progress.
      
      This patch allows the first and last skb in retransmit queue
      to be split even if memory limits are hit.
      
      It also adds some room due to the fact that tcp_sendmsg()
      and tcp_sendpage() might overshoot sk_wmem_queued by about one full
      TSO skb (64KB size). Note this allowance was already present
      in stable backports for kernels < 4.15.
      
      Note for < 4.15 backports :
       tcp_rtx_queue_tail() will probably look like :
      
      static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
      {
      	struct sk_buff *skb = tcp_send_head(sk);
      
      	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
      }
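
      The policy described above (always allow splitting the first and last skb
      of the retransmit queue, and give everything else some slack) might be
      modelled as follows. This is a hedged sketch, not the kernel check: the
      function name, its parameters, and the way the limit is expressed are
      assumptions made for illustration; only the 64KB allowance and the
      first/last exemption come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Slack of about one full TSO skb (64KB), as described in the text. */
#define MODEL_TSO_SLACK ((size_t)64 * 1024)

/*
 * Returns true if the skb may be split.
 * wmem_queued: bytes currently queued (models sk_wmem_queued)
 * limit:       hypothetical memory limit derived from the send buffer
 */
static bool model_may_fragment(size_t wmem_queued, size_t limit,
			       bool is_rtx_head, bool is_rtx_tail)
{
	assert(limit <= SIZE_MAX - MODEL_TSO_SLACK);

	/* First and last skb may always be split so retransmits progress. */
	if (is_rtx_head || is_rtx_tail)
		return true;

	/* Everything else must fit within the limit plus the slack. */
	return wmem_queued <= limit + MODEL_TSO_SLACK;
}
```

      Under these assumptions, a flow with a tiny send buffer that is far over
      its limit can still split the head or tail of its retransmit queue.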
      
      Fixes: f070ef2a ("tcp: tcp_fragment() should apply sane memory limits")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrew Prout <aprout@ll.mit.edu>
      Tested-by: Andrew Prout <aprout@ll.mit.edu>
      Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Tested-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Christoph Paasch <cpaasch@apple.com>
      Cc: Jonathan Looney <jtl@netflix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() · be4363bd
      Haiyang Zhang authored
      There is an extra rcu_read_unlock left in netvsc_recv_callback(),
      after a previous patch that removes RCU from this function.
      This patch removes the extra RCU unlock.
      
      Fixes: 345ac089 ("hv_netvsc: pass netvsc_device to receive callback")
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 21 Jul, 2019 24 commits
    • Linus 5.3-rc1 · 5f9e832c
      Linus Torvalds authored
    • vrf: make sure skb->data contains ip header to make routing · 107e47cc
      Peter Kosyh authored
      vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
      using ip/ipv6 addresses, but don't make sure the header is available
      in skb->data[] (skb_headlen() is less than header size).
      
      Case:
      
      1) igb driver from Intel.
      2) Packet size is greater than 255.
      3) MPLS forwards to VRF device.
      
      So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
      functions.
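
      The kind of check that a pskb_may_pull() call provides can be modelled in
      userspace as below. This is a simplified sketch: struct model_skb and the
      model_* helpers are hypothetical, and the model only tracks lengths
      rather than moving any packet data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IPV4_HDR_LEN 20	/* minimum IPv4 header size in bytes */
#define MODEL_IPV6_HDR_LEN 40	/* fixed IPv6 header size in bytes */

struct model_skb {
	size_t headlen;	/* bytes available in the linear area */
	size_t len;	/* total bytes, linear area plus fragments */
};

/* Model of a may-pull check: ensure `need` bytes are linear, if possible. */
static bool model_may_pull(struct model_skb *skb, size_t need)
{
	assert(skb->headlen <= skb->len);

	if (need <= skb->headlen)
		return true;		/* already linear */
	if (need > skb->len)
		return false;		/* packet is too short */
	skb->headlen = need;		/* pretend the bytes were pulled in */
	return true;
}

/* Outbound paths should not route unless the whole header is readable. */
static bool model_vrf_v4_header_ok(struct model_skb *skb)
{
	return model_may_pull(skb, MODEL_IPV4_HDR_LEN);
}

static bool model_vrf_v6_header_ok(struct model_skb *skb)
{
	return model_may_pull(skb, MODEL_IPV6_HDR_LEN);
}
```

      A packet whose header lives entirely in fragments (headlen of 0) passes
      the check only after the header bytes have been made linear.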
      Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
      Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • connector: remove redundant input callback from cn_dev · 903e9d1b
      Vasily Averin authored
      A small cleanup: this callback is never used.
      Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
      for OpenVZ7 bug OVZ-6877
      
      cc: stanislav.kinsburskiy@gmail.com
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Prefer pcie_capability_read_word() · 93428c58
      Frederick Lawler authored
      Commit 8c0d3a02 ("PCI: Add accessors for PCI Express Capability")
      added accessors for the PCI Express Capability so that drivers didn't
      need to be aware of differences between v1 and v2 of the PCI
      Express Capability.
      
      Replace pci_read_config_word() and pci_write_config_word() calls with
      pcie_capability_read_word() and pcie_capability_write_word().
      Signed-off-by: Frederick Lawler <fred@fredlawl.com>
      Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • igc: Prefer pcie_capability_read_word() · a16f6d3a
      Frederick Lawler authored
      Commit 8c0d3a02 ("PCI: Add accessors for PCI Express Capability")
      added accessors for the PCI Express Capability so that drivers didn't
      need to be aware of differences between v1 and v2 of the PCI
      Express Capability.
      
      Replace pci_read_config_word() and pci_write_config_word() calls with
      pcie_capability_read_word() and pcie_capability_write_word().
      Signed-off-by: Frederick Lawler <fred@fredlawl.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Prefer pcie_capability_read_word() · 6133b920
      Frederick Lawler authored
      Commit 8c0d3a02 ("PCI: Add accessors for PCI Express Capability")
      added accessors for the PCI Express Capability so that drivers didn't
      need to be aware of differences between v1 and v2 of the PCI
      Express Capability.
      
      Replace pci_read_config_word() and pci_write_config_word() calls with
      pcie_capability_read_word() and pcie_capability_write_word().
      Signed-off-by: Frederick Lawler <fred@fredlawl.com>
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Synchronize be_update_queues with dev_watchdog · ffd342e0
      Benjamin Poirier authored
      As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
      ethtool set_channels operation is started. be_tx_timeout(), which dumps
      some queue structures, is not written to run concurrently with
      be_update_queues(), which frees/allocates those queue structures. Add some
      synchronization between the two.
      
      Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Prevent load reordering in tx completion processing · ea811b79
      Brian King authored
      This patch fixes an issue seen on Power systems with bnx2x in which the
      "skb is NULL" WARN_ON in bnx2x_free_tx_pkt fires because the skb pointer
      is loaded in bnx2x_free_tx_pkt before the hw_cons
      load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.
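
      The load-ordering requirement can be illustrated with C11 atomics. This
      is a simplified userspace analogy, not the driver code: struct
      model_tx_ring and the model_* functions are hypothetical, a software
      producer plays the role of the device, and an acquire fence stands in
      for the read memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_RING_SIZE 8

struct model_tx_ring {
	atomic_uint hw_cons;		/* advanced when a slot completes */
	void *skb[MODEL_RING_SIZE];	/* per-slot packet pointers */
	unsigned int sw_cons;		/* next slot the consumer will free */
};

static void model_ring_init(struct model_tx_ring *r)
{
	atomic_init(&r->hw_cons, 0);
	r->sw_cons = 0;
	for (size_t i = 0; i < MODEL_RING_SIZE; i++)
		r->skb[i] = NULL;
}

/* Producer: fill the slot, then publish the new hw_cons with release. */
static void model_complete(struct model_tx_ring *r, void *skb)
{
	unsigned int idx = atomic_load_explicit(&r->hw_cons,
						memory_order_relaxed);

	assert(skb != NULL);
	r->skb[idx % MODEL_RING_SIZE] = skb;
	atomic_store_explicit(&r->hw_cons, idx + 1, memory_order_release);
}

/* Consumer: returns the next completed skb, or NULL if none is pending. */
static void *model_tx_int(struct model_tx_ring *r)
{
	unsigned int hw = atomic_load_explicit(&r->hw_cons,
					       memory_order_relaxed);

	if (r->sw_cons == hw)
		return NULL;

	/*
	 * Read barrier: the skb load below must not be satisfied before
	 * the hw_cons load above, or a stale (NULL) pointer could be seen.
	 */
	atomic_thread_fence(memory_order_acquire);

	return r->skb[r->sw_cons++ % MODEL_RING_SIZE];
}
```

      Without the fence, a weakly ordered CPU would be free to perform the
      slot load early and pair a fresh hw_cons value with a stale slot.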
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: sfp: hwmon: Fix scaling of RX power · 0cea0e11
      Andrew Lunn authored
      The RX power read from the SFP uses units of 0.1uW. This must be
      scaled to units of uW for HWMON. This requires a divide by 10, not the
      current 100.
      
      With this change in place, sensors(1) and ethtool -m agree:
      
      sff2-isa-0000
      Adapter: ISA adapter
      in0:          +3.23 V
      temp1:        +33.1 C
      power1:      270.00 uW
      power2:      200.00 uW
      curr1:        +0.01 A
      
              Laser output power                        : 0.2743 mW / -5.62 dBm
              Receiver signal average optical power     : 0.2014 mW / -6.96 dBm
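
      The unit conversion can be checked with a short calculation. The sketch
      below is illustrative: the function names are hypothetical and
      round-to-nearest is an assumption about how the division is done. The
      ethtool reading of 0.2014 mW is 201.4 uW, which would correspond to a
      raw value of 2014 in units of 0.1 uW.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw SFP RX power reading (units of 0.1 uW) to whole uW. */
static uint32_t model_rx_power_to_uw(uint32_t raw_tenth_uw)
{
	assert(raw_tenth_uw <= UINT32_MAX - 5);
	return (raw_tenth_uw + 5) / 10;	/* round to nearest (assumed) */
}

/* The previous scaling divided by 100, giving a value ten times too small. */
static uint32_t model_rx_power_old(uint32_t raw_tenth_uw)
{
	assert(raw_tenth_uw <= UINT32_MAX - 50);
	return (raw_tenth_uw + 50) / 100;
}
```

      For a raw value of 2014, dividing by 10 gives about 201 uW, in line with
      the ethtool figure, whereas dividing by 100 gives only about 20 uW.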
      
      Reported-by: chris.healy@zii.aero
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: verify that q!=NULL before setting q->flags · 503d81d4
      Vlad Buslov authored
      In tc_new_tfilter(), the q pointer can be NULL when adding a filter
      on a shared block. With the recent change that resets TCQ_F_CAN_BYPASS after
      filter creation, the following NULL pointer dereference happens when the
      parent block is shared:
      
      [  212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010
      [  212.925445] #PF: supervisor write access in kernel mode
      [  212.925709] #PF: error_code(0x0002) - not-present page
      [  212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0
      [  212.926302] Oops: 0002 [#1] SMP KASAN PTI
      [  212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G    B             5.2.0+ #512
      [  212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
      [  212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 fb 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
      [  212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296
      [  212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
      [  212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297
      [  212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d
      [  212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040
      [  212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680
      [  212.929999] FS:  00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000
      [  212.930235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0
      [  212.930588] Call Trace:
      [  212.930682]  ? tc_del_tfilter+0xa40/0xa40
      [  212.930811]  ? __lock_acquire+0x5b5/0x2460
      [  212.930948]  ? find_held_lock+0x85/0xa0
      [  212.931081]  ? tc_del_tfilter+0xa40/0xa40
      [  212.931201]  rtnetlink_rcv_msg+0x4ab/0x5f0
      [  212.931332]  ? rtnl_dellink+0x490/0x490
      [  212.931454]  ? lockdep_hardirqs_on+0x260/0x260
      [  212.931589]  ? netlink_deliver_tap+0xab/0x5a0
      [  212.931717]  ? match_held_lock+0x1b/0x240
      [  212.931844]  netlink_rcv_skb+0xd0/0x200
      [  212.931958]  ? rtnl_dellink+0x490/0x490
      [  212.932079]  ? netlink_ack+0x440/0x440
      [  212.932205]  ? netlink_deliver_tap+0x161/0x5a0
      [  212.932335]  ? lock_downgrade+0x360/0x360
      [  212.932457]  ? lock_acquire+0xe5/0x210
      [  212.932579]  netlink_unicast+0x296/0x350
      [  212.932705]  ? netlink_attachskb+0x390/0x390
      [  212.932834]  ? _copy_from_iter_full+0xe0/0x3a0
      [  212.932976]  netlink_sendmsg+0x394/0x600
      [  212.937998]  ? netlink_unicast+0x350/0x350
      [  212.943033]  ? move_addr_to_kernel.part.0+0x90/0x90
      [  212.948115]  ? netlink_unicast+0x350/0x350
      [  212.953185]  sock_sendmsg+0x96/0xa0
      [  212.958099]  ___sys_sendmsg+0x482/0x520
      [  212.962881]  ? match_held_lock+0x1b/0x240
      [  212.967618]  ? copy_msghdr_from_user+0x250/0x250
      [  212.972337]  ? lock_downgrade+0x360/0x360
      [  212.976973]  ? rwlock_bug.part.0+0x60/0x60
      [  212.981548]  ? __mod_node_page_state+0x1f/0xa0
      [  212.986060]  ? match_held_lock+0x1b/0x240
      [  212.990567]  ? find_held_lock+0x85/0xa0
      [  212.994989]  ? do_user_addr_fault+0x349/0x5b0
      [  212.999387]  ? lock_downgrade+0x360/0x360
      [  213.003713]  ? find_held_lock+0x85/0xa0
      [  213.007972]  ? __fget_light+0xa1/0xf0
      [  213.012143]  ? sockfd_lookup_light+0x91/0xb0
      [  213.016165]  __sys_sendmsg+0xba/0x130
      [  213.020040]  ? __sys_sendmsg_sock+0xb0/0xb0
      [  213.023870]  ? handle_mm_fault+0x337/0x470
      [  213.027592]  ? page_fault+0x8/0x30
      [  213.031316]  ? lockdep_hardirqs_off+0xbe/0x100
      [  213.034999]  ? mark_held_locks+0x24/0x90
      [  213.038671]  ? do_syscall_64+0x1e/0xe0
      [  213.042297]  do_syscall_64+0x74/0xe0
      [  213.045828]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  213.049354] RIP: 0033:0x7fe7c527c7b8
      [  213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
      [  213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8
      [  213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003
      [  213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0
      [  213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080
      [  213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000
      [  213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common sb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
      [  213.112326] CR2: 0000000000000010
      [  213.117429] ---[ end trace adb58eb0a4ee6283 ]---
      
      Verify that q pointer is not NULL before setting the 'flags' field.
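
      The faulting address in the trace above (CR2: 0000000000000010) is a
      small offset from address zero, which is what a field access through a
      NULL pointer typically looks like. The following is a minimal user-space
      sketch of the kind of guard described here, not the actual patch:
      'struct qdisc_sketch', 'clear_can_bypass()' and 'CAN_BYPASS_FLAG' are
      hypothetical stand-ins, and the flag value is illustrative only.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real flag; the value is illustrative only. */
#define CAN_BYPASS_FLAG 0x4u

/* Hypothetical stand-in for the structure that 'q' points to. */
struct qdisc_sketch {
	unsigned int flags;
};

/*
 * Clear the bypass flag only when 'q' is valid.
 * Returns 1 if the 'flags' field was updated, 0 if 'q' was NULL.
 */
static int clear_can_bypass(struct qdisc_sketch *q)
{
	if (!q)
		return 0;	/* nothing to update, and no NULL dereference */

	q->flags &= ~CAN_BYPASS_FLAG;
	return 1;
}
```

      With the guard in place, a NULL 'q' is simply skipped instead of
      faulting on the write to 'flags'.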
      
      Fixes: 3f05e688 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Christophe JAILLET's avatar
      chelsio: Fix a typo in a function name · 85d9bf97
      Christophe JAILLET authored
      It is likely that 'my3216_poll()' should be 'my3126_poll()' (1 and 2
      switched in 3126).
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Navid Emamdoost's avatar
      allocate_flower_entry: should check for null deref · bb132083
      Navid Emamdoost authored
      allocate_flower_entry() does not check whether the allocation succeeded
      before dereferencing the result. I only moved the spin_lock under the NULL
      check, because the caller already checks the allocation's status at line 652.
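
      A minimal user-space sketch of the pattern described above, under stated
      assumptions: 'struct flower_entry_sketch', 'lock_init_sketch()' and
      'allocate_entry_sketch()' are hypothetical names, calloc() stands in for
      the kernel allocator, and a plain int models the lock. It is not the
      driver's code; it only shows the lock being touched after the NULL check
      while the NULL result is still handed back to the caller.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the entry; an int models the lock state. */
struct flower_entry_sketch {
	int lock_initialized;
};

/* Hypothetical stand-in for initialising a lock. */
static void lock_init_sketch(int *lock)
{
	*lock = 1;
}

/*
 * Allocate an entry and initialise its lock only if the allocation
 * succeeded. A NULL result is passed through for the caller to check.
 */
static struct flower_entry_sketch *allocate_entry_sketch(void)
{
	struct flower_entry_sketch *entry = calloc(1, sizeof(*entry));

	if (entry)
		lock_init_sketch(&entry->lock_initialized);
	return entry;
}
```
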
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Christophe JAILLET's avatar
      net: hns3: typo in the name of a constant · 4803d010
      Christophe JAILLET authored
      All constants in 'enum HCLGE_MBX_OPCODE' start with HCLGE, except
      'HLCGE_MBX_PUSH_VLAN_INFO' (C and L switched)
      
      s/HLC/HCL/
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jeremy Sowden's avatar
      kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist. · 408d2bbb
      Jeremy Sowden authored
      net/netfilter/nf_tables_offload.h includes net/netfilter/nf_tables.h
      which is itself on the blacklist.
      Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Christophe JAILLET's avatar
      tipc: Fix a typo · bad7f869
      Christophe JAILLET authored
      s/tipc_toprsv_listener_data_ready/tipc_topsrv_listener_data_ready/
      (r and s switched in topsrv)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2019-07-20' of... · 953ba0a6
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2019-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      We have a handful of fixes:
       * ignore bad CW parameters if we aren't using them,
         instead of warning
       * fix operation (and then build) with the new netlink vendor
         command policy requirement
       * fix a memory leak in an error path when setting beacons
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Linus Torvalds's avatar
      Merge tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · c7bf0a0f
      Linus Torvalds authored
      Pull Devicetree fixes from Rob Herring:
       "Fix several warnings/errors in validation of binding schemas"
      
      * tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples
        dt-bindings: iio: ad7124: Fix dtc warnings in example
        dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example
        dt-bindings: pinctrl: aspeed: Fix AST2500 example errors
        dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors
        dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes
        dt-bindings: Ensure child nodes are of type 'object'
    • Linus Torvalds's avatar
      Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d6788eb7
      Linus Torvalds authored
      Pull vfs documentation typo fix from Al Viro.
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        typo fix: it's d_make_root, not d_make_inode...
    • Linus Torvalds's avatar
      Merge tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 91962d0f
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Two fixes for stable, one that had dependency on earlier patch in this
        merge window and can now go in, and a perf improvement in SMB3 open"
      
      * tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal module number
        cifs: flush before set-info if we have writeable handles
        smb3: optimize open to not send query file internal info
        cifs: copy_file_range needs to strip setuid bits and update timestamps
        CIFS: fix deadlock in cached root handling
    • Qian Cai's avatar
      iommu/amd: fix a crash in iova_magazine_free_pfns · 8cf66504
      Qian Cai authored
      Commit b3aa14f0 ("iommu: remove the mapping_error dma_map_ops
      method") incorrectly changed the check of the dma_ops_alloc_iova()
      return value in map_sg(). This causes a crash under memory pressure,
      because dma_ops_alloc_iova() never returns DMA_MAPPING_ERROR on failure
      but 0, so the error handling is all wrong.
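
      The mismatch can be illustrated with a small user-space sketch. All
      names below ('dma_addr_sketch_t', 'MAPPING_ERROR_SKETCH',
      'alloc_iova_sketch()' and the two check helpers) are hypothetical, and
      the sentinel is assumed here to be an all-ones value; this is not the
      driver's code. It only shows why comparing against a sentinel the
      allocator never returns lets a failed allocation pass as success.

```c
#include <stdint.h>

typedef uint64_t dma_addr_sketch_t;

/* Assumed all-ones sentinel, modelled on the DMA_MAPPING_ERROR idea. */
#define MAPPING_ERROR_SKETCH (~(dma_addr_sketch_t)0)

/* Hypothetical allocator that, like the one described, returns 0 on failure. */
static dma_addr_sketch_t alloc_iova_sketch(int simulate_failure)
{
	return simulate_failure ? 0 : 0x1000;
}

/* Mismatched check: compares against the sentinel, so a 0 result slips through. */
static int failed_wrong_check(dma_addr_sketch_t addr)
{
	return addr == MAPPING_ERROR_SKETCH;
}

/* Matching check: tests for the value the allocator actually returns on failure. */
static int failed_matching_check(dma_addr_sketch_t addr)
{
	return addr == 0;
}
```

      In this sketch a simulated failure is reported only by
      failed_matching_check(); failed_wrong_check() treats the 0 result as a
      valid address.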
      
         kernel BUG at drivers/iommu/iova.c:801!
          Workqueue: kblockd blk_mq_run_work_fn
          RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
          Call Trace:
           free_cpu_cached_iovas+0xbd/0x150
           alloc_iova_fast+0x8c/0xba
           dma_ops_alloc_iova.isra.6+0x65/0xa0
           map_sg+0x8c/0x2a0
           scsi_dma_map+0xc6/0x160
           pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
           pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
           scsi_queue_rq+0x79c/0x1200
           blk_mq_dispatch_rq_list+0x4dc/0xb70
           blk_mq_sched_dispatch_requests+0x249/0x310
           __blk_mq_run_hw_queue+0x128/0x200
           blk_mq_run_work_fn+0x27/0x30
           process_one_work+0x522/0xa10
           worker_thread+0x63/0x5b0
           kthread+0x1d2/0x1f0
           ret_from_fork+0x22/0x40
      
      Fixes: b3aa14f0 ("iommu: remove the mapping_error dma_map_ops method")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Mike Rapoport's avatar
      hexagon: switch to generic version of pte allocation · 618381f0
      Mike Rapoport authored
      The hexagon implementation of pte_alloc_one(), pte_alloc_one_kernel(),
      pte_free_kernel() and pte_free() is identical to the generic one, except
      for the lack of __GFP_ACCOUNT for the user PTE allocation.

      Switch hexagon to use the generic version of these functions.
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Linus Torvalds's avatar
      Merge tag 'ntb-5.3' of git://github.com/jonmason/ntb · bec5545e
      Linus Torvalds authored
      Pull NTB updates from Jon Mason:
       "New feature to add support for NTB virtual MSI interrupts, the ability
        to test and use this feature in the NTB transport layer.
      
        Also, bug fixes for the AMD and Switchtec drivers, as well as some
        general patches"
      
      * tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
        NTB: Describe the ntb_msi_test client in the documentation.
        NTB: Add MSI interrupt support to ntb_transport
        NTB: Add ntb_msi_test support to ntb_test
        NTB: Introduce NTB MSI Test Client
        NTB: Introduce MSI library
        NTB: Rename ntb.c to support multiple source files in the module
        NTB: Introduce functions to calculate multi-port resource index
        NTB: Introduce helper functions to calculate logical port number
        PCI/switchtec: Add module parameter to request more interrupts
        PCI/MSI: Support allocating virtual MSI interrupts
        ntb_hw_switchtec: Fix setup MW with failure bug
        ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
        ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
        NTB: correct ntb_dev_ops and ntb_dev comment typos
        NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
        ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
        NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
        NTB: ntb_hw_amd: set peer limit register
        NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
        NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
        ...
    • Al Viro's avatar
      typo fix: it's d_make_root, not d_make_inode... · 1b03bc5c
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Rob Herring's avatar
      dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples · e2297f7c
      Rob Herring authored
      Now that examples are validated against the DT schema, an error with
      required 'clocks' property missing is exposed:
      
      Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
      pinctrl@40020000: gpio@0: 'clocks' is a required property
      Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
      pinctrl@50020000: gpio@1000: 'clocks' is a required property
      Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
      pinctrl@50020000: gpio@2000: 'clocks' is a required property
      
      Add the missing 'clocks' properties to the examples to fix the errors.
      
      Fixes: 2c9239c1 ("dt-bindings: pinctrl: Convert stm32 pinctrl bindings to json-schema")
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: linux-gpio@vger.kernel.org
      Cc: linux-stm32@st-md-mailman.stormreply.com
      Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
      Signed-off-by: Rob Herring <robh@kernel.org>