1. 12 Apr, 2019 3 commits
    • net: sched: flower: fix filter net reference counting · 9994677c
      Vlad Buslov authored
      Fix net reference counting in fl_change() and remove the redundant call to
      tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net
      before releasing exts and deallocating a filter, so this code caused the
      flower classifier to obtain net twice for each filter being deleted.
      
      The implementation of __fl_delete() called tcf_exts_get_net() to pass its
      result as the 'async' flag to fl_mask_put(). However, the 'async' flag is
      redundant and only complicates the fl_mask_put() implementation. This
      functionality seems to have been copied from the filter cleanup code, where
      it was added by Cong with the following explanation:
      
          This patchset tries to fix the race between call_rcu() and
          cleanup_net() again. Without holding the netns refcnt the
          tc_action_net_exit() in netns workqueue could be called before
          filter destroy works in tc filter workqueue. This patchset
          moves the netns refcnt from tc actions to tcf_exts, without
          breaking per-netns tc actions.
      
      This doesn't apply to the flower mask, which doesn't call any tc action code
      during cleanup. Simplify fl_mask_put() by removing the flag parameter and
      always using tcf_queue_work() to free mask objects.
      
      Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
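
      The reference-counting rule described in this commit can be sketched in
      user-space C. This is a minimal illustration under stated assumptions,
      not the cls_flower code: struct net_ns, struct filter and the three
      functions are hypothetical names, the counters are plain integers, and
      the destroy step runs inline instead of being queued as deferred work.
      The point it shows is that the namespace reference is taken in exactly
      one place (the put path), so the delete path must not take it again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical namespace object with a plain reference counter. */
struct net_ns {
    int refs;
};

/* Hypothetical filter: its own refcount plus a pointer to its namespace. */
struct filter {
    int refcnt;
    struct net_ns *net;
    bool freed;
};

/* Final teardown: drops the namespace reference taken in filter_put(). */
static inline void filter_destroy(struct filter *f)
{
    f->net->refs--;
    f->freed = true;
}

/* Drop one filter reference; the last put takes the namespace reference
 * once and then tears the filter down. */
static inline void filter_put(struct filter *f)
{
    assert(f->refcnt > 0);
    if (--f->refcnt > 0)
        return;
    f->net->refs++;   /* the only place that takes the namespace reference */
    filter_destroy(f);
}

/* Delete path: no namespace get here, because filter_put() already does it.
 * A second get at this point would leave net->refs one too high. */
static inline void filter_delete(struct filter *f)
{
    filter_put(f);
}
```

      With a single get in the put path, the namespace counter returns to its
      starting value once the filter is gone.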
    • selftests: Add debugging options to pmtu.sh · 56490b62
      David Ahern authored
      The pmtu.sh script runs a number of tests and dumps a pass/fail summary.
      If a test fails, it is nearly impossible to debug why. For example:
      
          TEST: ipv6: PMTU exceptions                       [FAIL]
      
      There are a lot of commands run behind the scenes for this test. Which
      one is failing?
      
      Add a VERBOSE option to show the commands that are run and any output from
      those commands. Add a PAUSE_ON_FAIL option to halt the script if a test
      fails, allowing users to poke around with the setup in the failed state.
      
      In the process, rename tracing to TRACING and move its declaration to the
      top with the new variables.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · bb23581b
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2019-04-12
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Improve BPF verifier scalability for large programs through two
         optimizations: i) remove verifier states that are not useful in pruning,
         ii) stop walking the parentage chain once the first LIVE_READ is seen.
         Combined, these give an approx. 20x speedup. Increase limits for accepting
         large programs under root, and add various stress tests, from Alexei.
      
      2) Implement global data support in BPF. This enables static global variables
         for .data, .rodata and .bss sections to be properly handled which allows
         for more natural program development. This also opens up the possibility
         to optimize program workflow by compiling ELFs only once and later only
         rewriting section data before reload, from Daniel and with test cases and
         libbpf refactoring from Joe.
      
      3) Add config option to generate BTF type info for vmlinux as part of the
         kernel build process. DWARF debug info is converted via pahole to BTF.
         The latter relies on libbpf and uses the BTF deduplication algorithm,
         which results in 100x savings compared to DWARF data. The resulting .BTF
         section is typically about 2MB in size, from Andrii.
      
      4) Add BPF verifier support for stack access with variable offset from
         helpers and add various test cases along with it, from Andrey.
      
      5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
         so that L2 encapsulation can be used for tc tunnels, from Alan.
      
      6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
         users can define a subset of allowed __sk_buff fields that get fed into
         the test program, from Stanislav.
      
      7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
         various UBSAN warnings in bpftool, from Yonghong.
      
      8) Generate a pkg-config file for libbpf, from Luca.
      
      9) Dump program's BTF id in bpftool, from Prashant.
      
      10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
          program, from Magnus.
      
      11) kallsyms related fixes for the case when symbols are not present in
          BPF selftests and samples, from Daniel.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
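
      Item 2 of the pull request above mentions static global variables in the
      .data, .rodata and .bss sections. As a rough illustration, the plain C
      fragment below shows the three kinds of definitions involved. It is not
      a loadable BPF program, the names are hypothetical, and the section
      placement noted in the comments is the usual ELF toolchain behavior,
      which is an assumption here and can vary with compiler and optimization
      settings.

```c
#include <assert.h>

static const int limit = 3;  /* read-only data: usually placed in .rodata */
static int step = 1;         /* initialized, writable: usually placed in .data */
static int counter;          /* zero-initialized: usually placed in .bss */

/* Advance the counter by `step` up to `times` times, never passing `limit`.
 * Returns the counter value after the update. */
static inline int bump(int times)
{
    assert(times >= 0);
    for (int i = 0; i < times && counter + step <= limit; i++)
        counter += step;
    return counter;
}
```

      The function reads and writes the globals directly, which is the style
      of program development the pull request describes as more natural.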
  2. 11 Apr, 2019 34 commits
  3. 10 Apr, 2019 3 commits
    • net: strparser: fix comment · 93e21254
      Jakub Kicinski authored
      Fix comment.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Handle RTA_GATEWAY set to 0 · d73f80f9
      David Ahern authored
      Govindarajulu reported a regression with Network Manager, which sends an
      RTA_GATEWAY attribute with the address set to 0. Fix up the handling of
      RTA_GATEWAY to set fc_gw_family only if the gateway address is actually
      set.
      
      Fixes: f35b794b ("ipv4: Prepare fib_config for IPv6 gateway")
      Reported-by: Govindarajulu Varadarajan <govind.varadar@gmail.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
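
      The check described in this commit can be sketched in user-space C. This
      is a simplified illustration, not the kernel's netlink parsing code:
      struct route_cfg and set_gateway() are hypothetical stand-ins, and only
      the idea of leaving the gateway family unset when the address is 0 comes
      from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET */

/* Hypothetical, simplified route configuration; not the kernel's struct. */
struct route_cfg {
    uint32_t gw4;        /* IPv4 gateway address, 0 when none is given */
    uint8_t  gw_family;  /* stays AF_UNSPEC when no gateway is configured */
};

/* Record the gateway address, but mark the gateway family only when the
 * address is actually set (non-zero). */
static inline void set_gateway(struct route_cfg *cfg, uint32_t addr)
{
    assert(cfg != NULL);
    cfg->gw4 = addr;
    if (addr)
        cfg->gw_family = AF_INET;
}
```

      A caller that passes an all-zero address therefore ends up with the same
      family value as a caller that passes no gateway at all.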
    • Merge branch 'net-sched-move-back-qlen-to-per-CPU-accounting' · 44b9b6ca
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      net: sched: move back qlen to per CPU accounting
      
      Commit 46b1c18f ("net: sched: put back q.qlen into a single location")
      introduced a measurable regression in contended scenarios for
      lock qdisc.
      
      As Eric suggested, we could replace q.qlen access with calls to qdisc_is_empty()
      in the datapath and revert the above commit. The TC subsystem updates
      qdisc->is_empty in a somewhat loose way: notably, 'is_empty' is set only when
      the qdisc dequeue() call returns a NULL ptr, that is, on the invocation after
      the last packet is dequeued.
      
      The above is good enough for the BYPASS implementation (the only downside is
      that we end up skipping the optimization for a very small time-frame), but it
      will badly break things when internal structure consistency for classful
      qdiscs relies on the child qdisc_is_empty().
      
      A stricter 'is_empty' update adds significant complexity to its life-cycle, so
      this series takes a different approach: we allow lockless qdiscs to switch from
      per CPU accounting to global stats accounting when the NOLOCK bit is cleared.
      Since most pieces of infrastructure are already in place, this requires very
      few changes to the pfifo_fast qdisc, and any later NOLOCK qdisc can hook in
      there with little effort - no need to maintain two different implementations.
      
      The first 2 patches remove direct qlen access from non-core TC code, the 3rd
      and 4th patches put in place and use the infrastructure to allow stats
      accounting switching, and the 5th patch is the actual revert.
      
       v1 -> v2:
        - fixed build issues
        - more descriptive commit message for patch 5/5
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
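
      The accounting switch described in the cover letter above can be
      sketched in user-space C. This is a single-threaded illustration under
      stated assumptions, not the kernel implementation: struct queue_stats,
      NR_CPUS_SKETCH and the helper names are hypothetical, and real per-CPU
      data and locking are replaced by a plain array. It shows a queue length
      that is counted either in per-CPU slots or in one global counter,
      depending on a mode flag, and a reader that sums both so the total stays
      valid across a mode switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4   /* hypothetical fixed CPU count for this sketch */

struct queue_stats {
    bool percpu;                   /* true while the lockless mode is active */
    int  cpu_qlen[NR_CPUS_SKETCH]; /* per-CPU deltas; a single slot may go negative */
    int  qlen;                     /* single global counter */
};

/* Account one enqueued packet on the given CPU. */
static inline void qlen_inc(struct queue_stats *s, size_t cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    if (s->percpu)
        s->cpu_qlen[cpu]++;
    else
        s->qlen++;
}

/* Account one dequeued packet on the given CPU. */
static inline void qlen_dec(struct queue_stats *s, size_t cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    if (s->percpu)
        s->cpu_qlen[cpu]--;
    else
        s->qlen--;
}

/* Total queue length: global counter plus all per-CPU deltas. */
static inline int qlen_sum(const struct queue_stats *s)
{
    int sum = s->qlen;
    for (size_t i = 0; i < NR_CPUS_SKETCH; i++)
        sum += s->cpu_qlen[i];
    return sum;
}
```

      Because the reader adds both kinds of counters, packets accounted before
      the switch are still included after the flag changes, so one
      implementation can serve both modes.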