1. 03 Nov, 2017 4 commits
  2. 02 Nov, 2017 1 commit
    • Dongli Zhang's avatar
      xen/time: do not decrease steal time after live migration on xen · 5e25f5db
      Dongli Zhang authored
      After guest live migration on xen, steal time in /proc/stat
      (cpustat[CPUTIME_STEAL]) might decrease because steal returned by
      xen_steal_lock() might be less than this_rq()->prev_steal_time which is
      derived from previous return value of xen_steal_clock().
      
      For instance, steal time of each vcpu is 335 before live migration.
      
      cpu  198 0 368 200064 1962 0 0 1340 0 0
      cpu0 38 0 81 50063 492 0 0 335 0 0
      cpu1 65 0 97 49763 634 0 0 335 0 0
      cpu2 38 0 81 50098 462 0 0 335 0 0
      cpu3 56 0 107 50138 374 0 0 335 0 0
      
      After live migration, steal time is reduced to 312.
      
      cpu  200 0 370 200330 1971 0 0 1248 0 0
      cpu0 38 0 82 50123 500 0 0 312 0 0
      cpu1 65 0 97 49832 634 0 0 312 0 0
      cpu2 39 0 82 50167 462 0 0 312 0 0
      cpu3 56 0 107 50207 374 0 0 312 0 0
      
      Since runstate times are cumulative and cleared during xen live migration
      by xen hypervisor, the idea of this patch is to accumulate runstate times
      to global percpu variables before live migration suspend. Once guest VM is
      resumed, xen_get_runstate_snapshot_cpu() would always return the sum of new
      runstate times and previously accumulated times stored in global percpu
      variables.
      
      Comment above HYPERVISOR_suspend() has been removed as it is inaccurate:
      the call can return an error code (e.g., possibly -EPERM in the future).
      
      Similar and more severe issue would impact prior linux 4.8-4.10 as
      discussed by Michael Las at
      https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest,
      which would overflow steal time and lead to 100% st usage in top command
      for linux 4.8-4.10. A backport of this patch would fix that issue.
      
      [boris: added linux/slab.h to driver/xen/time.c, slightly reformatted
              commit message]
      
      References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guestSigned-off-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      5e25f5db
  3. 31 Oct, 2017 14 commits
  4. 29 Oct, 2017 21 commits
    • Linus Torvalds's avatar
      Linux 4.14-rc7 · 0b07194b
      Linus Torvalds authored
      0b07194b
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 19e12196
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix route leak in xfrm_bundle_create().
      
       2) In mac80211, validate user rate mask before configuring it. From
          Johannes Berg.
      
       3) Properly enforce memory limits in fair queueing code, from Toke
          Hoiland-Jorgensen.
      
       4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
      
       5) Fix TSO header allocation and management in mvpp2 driver, from Yan
          Markman.
      
       6) Don't take socket lock in BH handler in strparser code, from Tom
          Herbert.
      
       7) Don't show sockets from other namespaces in AF_UNIX code, from
          Andrei Vagin.
      
       8) Fix double free in error path of tap_open(), from Girish Moodalbail.
      
       9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
          and Alexander Duyck.
      
      10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
      
      11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
          Long.
      
      12) Properly align SKB head before building SKB in tuntap, from Jason
          Wang.
      
      13) Avoid matching qdiscs with a zero handle during lookups, from Cong
          Wang.
      
      14) Fix various endianness bugs in sctp, from Xin Long.
      
      15) Fix tc filter callback races and add selftests which trigger the
          problem, from Cong Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
        selftests: Introduce a new test case to tc testsuite
        selftests: Introduce a new script to generate tc batch file
        net_sched: fix call_rcu() race on act_sample module removal
        net_sched: add rtnl assertion to tcf_exts_destroy()
        net_sched: use tcf_queue_work() in tcindex filter
        net_sched: use tcf_queue_work() in rsvp filter
        net_sched: use tcf_queue_work() in route filter
        net_sched: use tcf_queue_work() in u32 filter
        net_sched: use tcf_queue_work() in matchall filter
        net_sched: use tcf_queue_work() in fw filter
        net_sched: use tcf_queue_work() in flower filter
        net_sched: use tcf_queue_work() in flow filter
        net_sched: use tcf_queue_work() in cgroup filter
        net_sched: use tcf_queue_work() in bpf filter
        net_sched: use tcf_queue_work() in basic filter
        net_sched: introduce a workqueue for RCU callbacks of tc filter
        sctp: fix some type cast warnings introduced since very beginning
        sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
        sctp: fix some type cast warnings introduced by transport rhashtable
        sctp: fix some type cast warnings introduced by stream reconf
        ...
      19e12196
    • David S. Miller's avatar
      Merge branch 'net_sched-fix-races-with-RCU-callbacks' · 6c325f4e
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net_sched: fix races with RCU callbacks
      
      Recently, the RCU callbacks used in TC filters and TC actions keep
      drawing my attention, they introduce at least 4 race condition bugs:
      
      1. A simple one fixed by Daniel:
      
      commit c78e1746
      Author: Daniel Borkmann <daniel@iogearbox.net>
      Date:   Wed May 20 17:13:33 2015 +0200
      
          net: sched: fix call_rcu() race on classifier module unloads
      
      2. A very nasty one fixed by me:
      
      commit 1697c4bb
      Author: Cong Wang <xiyou.wangcong@gmail.com>
      Date:   Mon Sep 11 16:33:32 2017 -0700
      
          net_sched: carefully handle tcf_block_put()
      
      3. Two more bugs found by Chris:
      https://patchwork.ozlabs.org/patch/826696/
      https://patchwork.ozlabs.org/patch/826695/
      
      Usually RCU callbacks are simple, however for TC filters and actions,
      they are complex because at least TC actions could be destroyed
      together with the TC filter in one callback. And RCU callbacks are
      invoked in BH context, without locking they are parallel too. All of
      these contribute to the cause of these nasty bugs.
      
      Alternatively, we could also:
      
      a) Introduce a spinlock to serialize these RCU callbacks. But as I
      said in commit 1697c4bb ("net_sched: carefully handle
      tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
      Potentially we need to do a lot of work to make it possible (if not
      impossible).
      
      b) Just get rid of these RCU callbacks, because they are not
      necessary at all, callers of these call_rcu() are all on slow paths
      and holding RTNL lock, so blocking is allowed in their contexts.
      However, David and Eric dislike adding synchronize_rcu() here.
      
      As suggested by Paul, we could defer the work to a workqueue and
      gain the permission of holding RTNL again without any performance
      impact, however, in tcf_block_put() we could have a deadlock when
      flushing workqueue while hodling RTNL lock, the trick here is to
      defer the work itself in workqueue and make it queued after all
      other works so that we keep the same ordering to avoid any
      use-after-free. Please see the first patch for details.
      
      Patch 1 introduces the infrastructure, patch 2~12 move each
      tc filter to the new tc filter workqueue, patch 13 adds
      an assertion to catch potential bugs like this, patch 14
      closes another rcu callback race, patch 15 and patch 16 add
      new test cases.
      ====================
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c325f4e
    • Chris Mi's avatar
      selftests: Introduce a new test case to tc testsuite · 31c2611b
      Chris Mi authored
      In this patchset, we fixed a tc bug. This patch adds the test case
      that reproduces the bug. To run this test case, user should specify
      an existing NIC device:
        # sudo ./tdc.py -d enp4s0f0
      
      This test case belongs to category "flower". If user doesn't specify
      a NIC device, the test cases belong to "flower" will not be run.
      
      In this test case, we create 1M filters and all filters share the same
      action. When destroying all filters, kernel should not panic. It takes
      about 18s to run it.
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: default avatarLucas Bates <lucasb@mojatatu.com>
      Signed-off-by: default avatarChris Mi <chrism@mellanox.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31c2611b
    • Chris Mi's avatar
      selftests: Introduce a new script to generate tc batch file · 7f071998
      Chris Mi authored
        # ./tdc_batch.py -h
        usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file
      
        TC batch file generator
      
        positional arguments:
          device                device name
          file                  batch file name
      
        optional arguments:
          -h, --help            show this help message and exit
          -n NUMBER, --number NUMBER
                                how many lines in batch file
          -o, --skip_sw         skip_sw (offload), by default skip_hw
          -s, --share_action    all filters share the same action
          -p, --prio            all filters have different prio
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: default avatarLucas Bates <lucasb@mojatatu.com>
      Signed-off-by: default avatarChris Mi <chrism@mellanox.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f071998
    • Cong Wang's avatar
      net_sched: fix call_rcu() race on act_sample module removal · 46e235c1
      Cong Wang authored
      Similar to commit c78e1746
      ("net: sched: fix call_rcu() race on classifier module unloads"),
      we need to wait for flying RCU callback tcf_sample_cleanup_rcu().
      
      Cc: Yotam Gigi <yotamg@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46e235c1
    • Cong Wang's avatar
      net_sched: add rtnl assertion to tcf_exts_destroy() · 2d132eba
      Cong Wang authored
      After previous patches, it is now safe to claim that
      tcf_exts_destroy() is always called with RTNL lock.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d132eba
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in tcindex filter · 27ce4f05
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27ce4f05
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in rsvp filter · d4f84a41
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4f84a41
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in route filter · c2f3f31d
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2f3f31d
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in u32 filter · c0d378ef
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c0d378ef
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in matchall filter · df2735ee
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df2735ee
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in fw filter · e071dff2
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e071dff2
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in flower filter · 0552c8af
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0552c8af
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in flow filter · 94cdb475
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94cdb475
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in cgroup filter · b1b5b04f
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1b5b04f
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in bpf filter · e910af67
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e910af67
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in basic filter · c96a4838
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c96a4838
    • Cong Wang's avatar
      net_sched: introduce a workqueue for RCU callbacks of tc filter · 7aa0045d
      Cong Wang authored
      This patch introduces a dedicated workqueue for tc filters
      so that each tc filter's RCU callback could defer their
      action destroy work to this workqueue. The helper
      tcf_queue_work() is introduced for them to use.
      
      Because we hold RTNL lock when calling tcf_block_put(), we
      can not simply flush works inside it, therefore we have to
      defer it again to this workqueue and make sure all flying RCU
      callbacks have already queued their work before this one, in
      other words, to ensure this is the last one to execute to
      prevent any use-after-free.
      
      On the other hand, this makes tcf_block_put() ugly and
      harder to understand. Since David and Eric strongly dislike
      adding synchronize_rcu(), this is probably the only
      solution that could make everyone happy.
      
      Please also see the code comments below.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7aa0045d
    • David S. Miller's avatar
      Merge branch 'sctp-endianness-fixes' · 8c83c885
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: a bunch of fixes for some sparse warnings
      
      As Eric noticed, when running 'make C=2 M=net/sctp/', a plenty of
      warnings or errors checked by sparse appear. They are all problems
      about Endian and type cast.
      
      Most of them are just warnings by which no issues could be caused
      while some might be bugs.
      
      This patchset fixes them with four patches basically according to
      how they are introduced.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c83c885
    • Xin Long's avatar
      sctp: fix some type cast warnings introduced since very beginning · 978aa047
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      They are there since very beginning.
      
      Note after this patch, there still one warning left in
      sctp_outq_flush():
        sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)
      
      Since it has been moved to sctp_stream_outq_migrate on net-next,
      to avoid the extra job when merging net-next to net, I will post
      the fix for it after the merging is done.
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      978aa047