  1. 08 Aug, 2019 2 commits
    • sched: Clean up active_mm reference counting · 139d025c
      Peter Zijlstra authored
      The current active_mm reference counting is confusing and sub-optimal.
      
      Rewrite the code to explicitly consider the 4 separate cases:
      
          user -> user
      
      	When switching between two user tasks, all we need to consider
      	is switch_mm().
      
          user -> kernel
      
      	When switching from a user task to a kernel task (which
      	doesn't have an associated mm) we retain the last mm in our
      	active_mm. Increment a reference count on active_mm.
      
          kernel -> kernel
      
      	When switching between kernel threads, all we need to do is
      	pass along the active_mm reference.
      
          kernel -> user
      
      	When switching from a kernel task to a user task, we must switch
      	from the last active_mm to the next mm, hoping of course that
      	these are the same. Decrement a reference on the active_mm.
      
      The code keeps a different order, because as you'll note, both 'to
      user' cases require switch_mm().
      
      And where the old code would increment/decrement for the 'kernel ->
      kernel' case, the new code observes this is a neutral operation and
      avoids touching the reference count.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: luto@kernel.org
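
The four cases described above can be sketched as a small user-space model. This is an illustration under assumptions, not the kernel code: `struct mm_model`, `struct task_model`, `switch_mm_model()` and `context_switch_mm()` are hypothetical names, the reference count is a plain integer, and the drop on the 'kernel -> user' path is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm and task structures. */
struct mm_model {
	int refs;                   /* models the mm reference count */
};

struct task_model {
	struct mm_model *mm;        /* NULL for a kernel thread */
	struct mm_model *active_mm; /* address space currently loaded */
};

int switch_mm_calls;                /* counts address-space switches */

void switch_mm_model(struct mm_model *prev, struct mm_model *next)
{
	(void)prev;
	(void)next;
	switch_mm_calls++;
}

/*
 * Returns the mm whose borrowed reference the caller must drop once the
 * switch has completed, or NULL if there is nothing to drop.
 */
struct mm_model *context_switch_mm(struct task_model *prev,
				   struct task_model *next)
{
	struct mm_model *to_drop = NULL;

	assert(prev->active_mm != NULL); /* a running task always has one */

	if (!next->mm) {                        /* to kernel */
		next->active_mm = prev->active_mm;
		if (prev->mm)                   /* from user: take a reference */
			prev->active_mm->refs++;
		else                            /* from kernel: pass it along */
			prev->active_mm = NULL;
	} else {                                /* to user */
		switch_mm_model(prev->active_mm, next->mm);
		if (!prev->mm) {                /* from kernel: hand it back */
			to_drop = prev->active_mm;
			prev->active_mm = NULL;
		}
	}
	return to_drop;
}
```

In this model the 'kernel -> kernel' path only moves a pointer, so the count is untouched, and both 'to user' paths share the single `switch_mm_model()` call.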
    • rcu/tree: Fix SCHED_FIFO params · 130d9c33
      Peter Zijlstra authored
      A rather embarrassing mistake had us call sched_setscheduler() before
      initializing the parameters passed to it.
      
      Fixes: 1a763fd7 ("rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
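
The ordering issue described above (parameters must be filled in before they are handed to the scheduler) can be shown with the user-space sched_setscheduler(2) call. This is only an analogue of the in-kernel fix, not the patch itself; `init_fifo_param()` and `set_fifo()` are hypothetical helper names, and clamping the priority is an added assumption.

```c
#include <assert.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: fill in *sp for SCHED_FIFO, clamping the priority. */
void init_fifo_param(struct sched_param *sp, int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	assert(sp != NULL);
	memset(sp, 0, sizeof(*sp));
	if (prio < lo)
		prio = lo;
	if (prio > hi)
		prio = hi;
	sp->sched_priority = prio;
}

/* Initialize the parameters first, and only then pass them on. */
int set_fifo(pid_t pid, int prio)
{
	struct sched_param sp;

	init_fifo_param(&sp, prio);
	return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

Calling `set_fifo(0, prio)` on the current process usually needs elevated privileges; without them the call is expected to fail and set errno.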
  2. 25 Jul, 2019 23 commits
  3. 22 Jul, 2019 9 commits
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7b5cf701
      Linus Torvalds authored
      Pull preemption Kconfig fix from Thomas Gleixner:
       "The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
        PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
        and PREEMPT_RT.
      
        Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
        obviously can't work anymore. oldconfig builds are affected as well,
        but it's more obvious as the user gets asked. [old]defconfig silently
        fixes it up and selects PREEMPT_NONE.
      
        Unbreak it by undoing the rename and adding an intermediate config
        symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
        chasing down a few #ifdefs, but it's better than tweaking 114
        defconfigs and annoying users"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
    • Merge tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 44b912cd
      Linus Torvalds authored
      Pull pidfd polling fix from Christian Brauner:
       "A fix for pidfd polling. It ensures that the task's exit state is
        visible to all waiters"
      
      * tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        pidfd: fix a poll race when setting exit_state
    • Merge tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 21c730d7
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - fixes for leaks caused by recently merged patches
      
       - one build fix
      
       - a fix to prevent mixing of incompatible features
      
      * tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: don't leak extent_map in btrfs_get_io_geometry()
        btrfs: free checksum hash on in close_ctree
        btrfs: Fix build error while LIBCRC32C is module
        btrfs: inode: Don't compress if NODATASUM or NODATACOW set
    • sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y · b8d33498
      Thomas Gleixner authored
      The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
      CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y
      set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
      the preemption mode choice, which defaults to NONE. This also affects
      oldconfig builds.
      
      So rather than changing 114 defconfig files and being an annoyance to
      users, revert the rename and select a new config symbol PREEMPTION. That
      keeps everything working smoothly and the relevant ifdefs are going to be
      fixed up step by step.
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: a50a3f4b ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Merge tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · c92f0380
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "For two regressions in media core:
      
         - v4l2-subdev: fix regression in check_pad()
      
         - videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
           in use"
      
      * tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
        media: v4l2-subdev: fix regression in check_pad()
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 83768245
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Several netfilter fixes including an nfnetlink deadlock fix from
          Florian Westphal and a fix for dropping VRF packets from Miaohe Lin.
      
       2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
          proper block sharing.
      
       3) Fix r8169 PHY init from Thomas Voegtle.
      
       4) Fix memory leak in mac80211, from Lorenzo Bianconi.
      
       5) Missing NULL check on object allocation in cxgb4, from Navid
          Emamdoost.
      
       6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.
      
       7) Check that there is actually an ip header to access in skb->data in
          VRF, from Peter Kosyh.
      
       8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.
      
       9) One more tweak to the TCP fragmentation memory limit changes, to be
          less harmful to applications setting small SO_SNDBUF values. From
          Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
        tcp: be more careful in tcp_fragment()
        hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
        vrf: make sure skb->data contains ip header to make routing
        connector: remove redundant input callback from cn_dev
        qed: Prefer pcie_capability_read_word()
        igc: Prefer pcie_capability_read_word()
        cxgb4: Prefer pcie_capability_read_word()
        be2net: Synchronize be_update_queues with dev_watchdog
        bnx2x: Prevent load reordering in tx completion processing
        net: phy: sfp: hwmon: Fix scaling of RX power
        net: sched: verify that q!=NULL before setting q->flags
        chelsio: Fix a typo in a function name
        allocate_flower_entry: should check for null deref
        net: hns3: typo in the name of a constant
        kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
        tipc: Fix a typo
        mac80211: don't warn about CW params when not using them
        mac80211: fix possible memory leak in ieee80211_assign_beacon
        nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
        nl80211: fix VENDOR_CMD_RAW_DATA
        ...
    • pidfd: fix a poll race when setting exit_state · b191d649
      Suren Baghdasaryan authored
      There is a race between reading task->exit_state in pidfd_poll and
      writing it after do_notify_parent calls do_notify_pidfd. Expected
      sequence of events is:
      
      CPU 0                            CPU 1
      ------------------------------------------------
      exit_notify
        do_notify_parent
          do_notify_pidfd
        tsk->exit_state = EXIT_DEAD
                                        pidfd_poll
                                           if (tsk->exit_state)
      
      However nothing prevents the following sequence:
      
      CPU 0                            CPU 1
      ------------------------------------------------
      exit_notify
        do_notify_parent
          do_notify_pidfd
                                         pidfd_poll
                                            if (tsk->exit_state)
        tsk->exit_state = EXIT_DEAD
      
      This causes a polling task to wait forever, since poll blocks because
      exit_state is 0 and the waiting task is not notified again. A stress
      test continuously doing pidfd poll and process exits uncovered this bug.
      
      To fix it, we make sure that the task's exit_state is always set before
      calling do_notify_pidfd.
      
      Fixes: b53b0b9d ("pidfd: add polling support")
      Cc: kernel-team@android.com
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
      [christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
      Signed-off-by: Christian Brauner <christian@brauner.io>
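
The race described above can be replayed deterministically in a small single-threaded model. This is a sketch under assumptions rather than the kernel implementation: `struct exit_model`, `poll_model()`, `notify_model()` and `waiter_ready_after()` are hypothetical names, a woken waiter simply re-runs the poll check, and `exit_state` is reduced to zero or non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one exiting task and one polling waiter. */
struct exit_model {
	int exit_state;        /* 0 until the task is marked as exited */
	bool waiter_sleeping;  /* waiter blocked in poll */
	bool waiter_ready;     /* waiter has observed the exit */
};

/* Waiter side: report readiness if the state is set, otherwise sleep. */
void poll_model(struct exit_model *t)
{
	if (t->exit_state)
		t->waiter_ready = true;
	else
		t->waiter_sleeping = true;
}

/* Exiting side: wake a sleeping waiter, which then re-checks the state. */
void notify_model(struct exit_model *t)
{
	if (t->waiter_sleeping) {
		t->waiter_sleeping = false;
		poll_model(t);
	}
}

/*
 * Runs the two exit steps with the waiter's poll placed at slot poll_pos
 * (0 = before both steps, 1 = between them, 2 = after both).
 * state_first selects the fixed ordering: set the state, then notify.
 */
bool waiter_ready_after(bool state_first, int poll_pos)
{
	struct exit_model t = { 0, false, false };
	int exiter_step = 0;

	assert(poll_pos >= 0 && poll_pos <= 2);
	for (int slot = 0; slot < 3; slot++) {
		if (slot == poll_pos) {
			poll_model(&t);
			continue;
		}
		bool set_state = state_first ? (exiter_step == 0)
					     : (exiter_step == 1);
		if (set_state)
			t.exit_state = 1;
		else
			notify_model(&t);
		exiter_step++;
	}
	return t.waiter_ready;
}
```

With the state set first, the waiter ends up ready in every slot; with the notification first, a waiter that polls before the state is written stays asleep, which matches the hang described in the commit message.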
    • tcp: be more careful in tcp_fragment() · b617158d
      Eric Dumazet authored
      Some applications set tiny SO_SNDBUF values and expect
      TCP to just work. Recent patches to address CVE-2019-11478
      broke them in case of losses, since retransmits might
      be prevented.
      
      We should allow these flows to make progress.
      
      This patch allows the first and last skb in retransmit queue
      to be split even if memory limits are hit.
      
      It also adds some room due to the fact that tcp_sendmsg()
      and tcp_sendpage() might overshoot sk_wmem_queued by about one full
      TSO skb (64KB size). Note this allowance was already present
      in stable backports for kernels < 4.15.
      
      Note for < 4.15 backports:
       tcp_rtx_queue_tail() will probably look like:
      
      static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
      {
      	struct sk_buff *skb = tcp_send_head(sk);
      
      	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
      }
      
      Fixes: f070ef2a ("tcp: tcp_fragment() should apply sane memory limits")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrew Prout <aprout@ll.mit.edu>
      Tested-by: Andrew Prout <aprout@ll.mit.edu>
      Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Tested-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Christoph Paasch <cpaasch@apple.com>
      Cc: Jonathan Looney <jtl@netflix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
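
The rule described in this message (refuse a split only when the send queue is over its limit and the skb is neither the first nor the last in the retransmit queue) can be sketched as a standalone predicate. This is a simplified model, not the patched tcp_fragment(): `may_split()` and `SPLIT_ALLOWANCE` are hypothetical names, the 64KB allowance is taken from the "about one full TSO skb" remark, and the exact limit expression used by the kernel is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed allowance: roughly one full TSO skb, per the message above. */
#define SPLIT_ALLOWANCE (64L * 1024L)

/*
 * Returns true if splitting is allowed in this model.
 * sndbuf and wmem_queued are byte counts; is_rtx_head and is_rtx_tail say
 * whether the skb is the first or last entry of the retransmit queue.
 */
bool may_split(long sndbuf, long wmem_queued,
	       bool is_rtx_head, bool is_rtx_tail)
{
	long limit = sndbuf + SPLIT_ALLOWANCE;

	assert(sndbuf >= 0 && wmem_queued >= 0);

	if (wmem_queued <= limit)
		return true;                 /* under the memory limit */
	return is_rtx_head || is_rtx_tail;   /* let these flows make progress */
}
```

For a socket with a tiny send buffer, the head and tail exemptions are what keep retransmits possible once the queue has grown past the limit.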
    • hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() · be4363bd
      Haiyang Zhang authored
      There is an extra rcu_read_unlock left in netvsc_recv_callback(),
      after a previous patch that removes RCU from this function.
      This patch removes the extra RCU unlock.
      
      Fixes: 345ac089 ("hv_netvsc: pass netvsc_device to receive callback")
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 21 Jul, 2019 6 commits