1. 08 Nov, 2017 19 commits
  2. 07 Nov, 2017 1 commit
    • ipv6: addrconf: fix a lockdep splat · fffcefe9
      Eric Dumazet authored
      Fix a case where a GFP_ATOMIC allocation must be used instead of a
      GFP_KERNEL one.
      
      [   54.891146]  lock_acquire+0xb3/0x2f0
      [   54.891153]  ? fs_reclaim_acquire.part.60+0x5/0x30
      [   54.891165]  fs_reclaim_acquire.part.60+0x29/0x30
      [   54.891170]  ? fs_reclaim_acquire.part.60+0x5/0x30
      [   54.891178]  kmem_cache_alloc_trace+0x3f/0x500
      [   54.891186]  ? cyc2ns_read_end+0x1e/0x30
      [   54.891196]  ipv6_add_addr+0x15a/0xc30
      [   54.891217]  ? ipv6_create_tempaddr+0x2ea/0x5d0
      [   54.891223]  ipv6_create_tempaddr+0x2ea/0x5d0
      [   54.891238]  ? manage_tempaddrs+0x195/0x220
      [   54.891249]  ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
      [   54.891255]  addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
      [   54.891268]  addrconf_prefix_rcv+0x2e5/0x9b0
      [   54.891279]  ? neigh_update+0x446/0xb90
      [   54.891298]  ? ndisc_router_discovery+0x5ab/0xf00
      [   54.891303]  ndisc_router_discovery+0x5ab/0xf00
      [   54.891311]  ? retint_kernel+0x2d/0x2d
      [   54.891331]  ndisc_rcv+0x1b6/0x270
      [   54.891340]  icmpv6_rcv+0x6aa/0x9f0
      [   54.891345]  ? ipv6_chk_mcast_addr+0x176/0x530
      [   54.891351]  ? do_csum+0x17b/0x260
      [   54.891360]  ip6_input_finish+0x194/0xb20
      [   54.891372]  ip6_input+0x5b/0x2c0
      [   54.891380]  ? ip6_rcv_finish+0x320/0x320
      [   54.891389]  ip6_mc_input+0x15a/0x250
      [   54.891396]  ipv6_rcv+0x772/0x1050
      [   54.891403]  ? consume_skb+0xbe/0x2d0
      [   54.891412]  ? ip6_make_skb+0x2a0/0x2a0
      [   54.891418]  ? ip6_input+0x2c0/0x2c0
      [   54.891425]  __netif_receive_skb_core+0xa0f/0x1600
      [   54.891436]  ? process_backlog+0xac/0x400
      [   54.891441]  process_backlog+0xfa/0x400
      [   54.891448]  ? net_rx_action+0x145/0x1130
      [   54.891456]  net_rx_action+0x310/0x1130
      [   54.891524]  ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta]
      [   54.891538]  __do_softirq+0x140/0xaba
      [   54.891553]  irq_exit+0x10b/0x160
      [   54.891561]  do_IRQ+0xbb/0x1b0
      
      Fixes: f3d9832e ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Acked-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fffcefe9
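The trace in this entry runs from do_IRQ through __do_softirq down to ipv6_add_addr, so the allocation happens in a context that must not sleep. The rule behind the fix can be sketched in userspace C as follows. The enum and both functions are hypothetical stand-ins for illustration; they are not the kernel's gfp_t API and not the code of the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL / GFP_ATOMIC flags. */
enum model_gfp {
    MODEL_GFP_KERNEL, /* may sleep and enter reclaim: process context only */
    MODEL_GFP_ATOMIC  /* never sleeps: usable in softirq or under a spinlock */
};

/* Pick the allocation mode from whether the caller is allowed to block. */
static enum model_gfp model_gfp_for(bool can_block)
{
    return can_block ? MODEL_GFP_KERNEL : MODEL_GFP_ATOMIC;
}

/* Mirrors the rule that lockdep flagged: no sleeping allocation in softirq. */
static void model_check_alloc(enum model_gfp gfp, bool in_softirq)
{
    assert(!(in_softirq && gfp == MODEL_GFP_KERNEL));
}
```

A caller on the receive path would pass can_block = false, so the check holds; a caller in process context may pass true and get the sleeping mode.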
  3. 05 Nov, 2017 20 commits
    • Merge branch 'eBPF-based-device-cgroup-controller' · 2798b80b
      David S. Miller authored
      Roman Gushchin says:
      
      ====================
      eBPF-based device cgroup controller
      
      This patchset introduces an eBPF-based device controller for cgroup v2.
      
      Patches (1) and (2) are preparatory work required to share some code
        with the existing device controller implementation.
      Patch (3) is the main patch, which introduces a new bpf prog type
        and all necessary infrastructure.
      Patch (4) moves cgroup_helpers.c/h so that patch (5) can use them.
      Patch (5) implements an example eBPF program which controls access
        to device files, and a corresponding userspace test.
      
      v3:
        Renamed constants introduced by patch (3) to BPF_DEVCG_*
      
      v2:
        Added patch (1).
      
      v1:
        https://lkml.org/lkml/2017/11/1/363
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2798b80b
    • selftests/bpf: add a test for device cgroup controller · 37f1ba09
      Roman Gushchin authored
      Add a test for device cgroup controller.
      
      The test loads a simple bpf program which logs all
      device access attempts using trace_printk() and forbids
      all operations except operations with /dev/zero and
      /dev/urandom.
      
      Then the test creates and joins a test cgroup, and attaches
      the bpf program to it.
      
      Then it tries to perform some simple device operations
      and checks the result:
      
        create /dev/null (should fail)
        create /dev/zero (should pass)
        copy data from /dev/urandom to /dev/zero (should pass)
        copy data from /dev/urandom to /dev/full (should fail)
        copy data from /dev/random to /dev/zero (should fail)
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37f1ba09
    • bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftests/bpf/ · 9d1f1594
      Roman Gushchin authored
      The purpose of this move is to use these files in bpf tests.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d1f1594
    • bpf, cgroup: implement eBPF-based device controller for cgroup v2 · ebc614f6
      Roman Gushchin authored
      Cgroup v2 lacks the device controller provided by cgroup v1.
      This patch adds a new eBPF program type which, in combination
      with the previously added ability to attach multiple eBPF programs
      to a cgroup, provides similar functionality with some
      additional flexibility.
      
      This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
      A program takes major and minor device numbers, device type
      (block/character) and access type (mknod/read/write) as parameters
      and returns an integer which defines if the operation should be
      allowed or terminated with -EPERM.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ebc614f6
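The decision such a program makes can be sketched in plain userspace C. This is only a model of the inputs and the allow/deny result described in this entry, not a loadable eBPF program. The struct, enum, and function names are hypothetical, and the device numbers are the conventional Linux assignments (character 1:5 for /dev/zero, 1:9 for /dev/urandom), chosen to mirror the policy of the selftest in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the series names its real constants BPF_DEVCG_*. */
enum dev_type { DEV_BLOCK, DEV_CHAR };
enum dev_access { ACC_MKNOD = 1, ACC_READ = 2, ACC_WRITE = 4 };

struct dev_request {
    enum dev_type type;
    uint32_t access; /* bitmask of enum dev_access values */
    uint32_t major;
    uint32_t minor;
};

/* Return true to allow; a real controller would turn false into -EPERM. */
static bool dev_policy_allow(const struct dev_request *req)
{
    assert(req);
    if (req->type != DEV_CHAR || req->major != 1)
        return false;
    /* Allow only /dev/zero (1:5) and /dev/urandom (1:9), any access type. */
    return req->minor == 5 || req->minor == 9;
}
```

With this policy, creating /dev/null (1:3), writing to /dev/full (1:7), and reading /dev/random (1:8) are all denied, while operations on /dev/zero and /dev/urandom pass.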
    • device_cgroup: prepare code for bpf-based device controller · ecf8fecb
      Roman Gushchin authored
      This is a non-functional change to prepare the device cgroup code
      for adding an eBPF-based controller for cgroup v2.
      
      The patch performs the following changes:
      1) __devcgroup_inode_permission() and devcgroup_inode_mknod()
         are moved to device_cgroup.h and converted to static inline.
      2) __devcgroup_check_permission() is exported.
      3) A devcgroup_check_permission() wrapper is introduced, to be used
         by both the existing and the new bpf-based implementations.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecf8fecb
    • device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants · 67e306fd
      Roman Gushchin authored
      Rename device type and access type constants defined in
      security/device_cgroup.c by adding the DEVCG_ prefix.
      
      The reason behind this renaming is to make them global namespace
      friendly, as they will be moved to the corresponding header file
      by following patches.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67e306fd
    • Merge tag 'mlx5-updates-2017-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 488e5b30
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2017-11-04
      
      This series includes:
      
      From Huy: dscp to priority mapping for Ethernet packets.
      
      ===================================================
      The first six patches enable differentiated services code point (dscp)
      to priority mapping for Ethernet packets. Once this feature is
      enabled, a packet is routed to the corresponding priority based on its
      dscp. Users can combine this feature with the priority flow control
      (pfc) feature to get priority flow control based on the dscp.
      
      Firmware interface:
      Mellanox firmware provides two control knobs for this feature:
        The QPTS register allows changing the trust state between dscp and
        pcp mode. The default is pcp mode. Once in dscp mode, firmware will
        route the packet based on its dscp value if the dscp field exists.
      
        The QPDPM register allows mapping a specific dscp (0 to 63) to a
        specific priority (0 to 7). By default, all the dscps are mapped to
        priority zero.
      
      Software interface:
      This feature is controlled via the application priority TLV. IEEE
      specification P802.1Qcd/D2.1 defines priority selector id 5 for the
      application priority TLV. This APP TLV selector defines the DSCP to
      priority map. The APP TLV can be sent by the switch or set locally
      using software such as lldptool. In the mlx5 driver, we add support
      for the net dcb getapp and setapp callbacks. The mlx5 driver only
      handles the selector id 5 application entry (the dscp application
      priority entry). If the user sends multiple dscp to priority APP TLV
      entries for the same dscp, the last one sent takes effect and all
      previously sent entries are deleted.
      
      The firmware trust state (in QPTS register) is changed based on the
      number of dscp to priority application entries. When the first dscp to
      priority application entry is added by the user, the trust state is
      changed to dscp. When the last dscp to priority application entry is
      deleted by the user, the trust state is changed to pcp.
      
      When the port is in DSCP trust state, the transmit queue is selected
      based on the dscp of the skb.
      
      When the port is in DSCP trust state and vport inline mode is not NONE,
      firmware requires the mlx5 driver to copy the IP header to the
      wqe ethernet segment inline header if the skb has it.
      This is done by changing the transmit queue sq's min inline mode to L3.
      Note that the min inline mode of sqs that belong to other features,
      such as xdpsq and icosq, is not modified.
      ===================================================
      
      In addition to the dscp series, some small misc changes are included as well:
      
      From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic
      From Or Gerlitz, Enlarge the NIC TC offload table size
      From Rabie, Initialize destination_flow struct to 0
      From Feras, Add inner TTC table to IPoIB flow steering
      From Tal, Enable CQE based moderation on TX CQ
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      488e5b30
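The bookkeeping described for the dscp series (a 64-entry dscp to priority map, last write wins, and a trust state that follows the number of entries) can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: all names are hypothetical, and it models only the behaviour described in the message, not the mlx5 driver code or the QPTS/QPDPM register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSCP_MAX 64 /* dscp values 0..63 */
#define PRIO_MAX 8  /* priorities 0..7 */

enum trust_state { TRUST_PCP, TRUST_DSCP };

struct dscp_map {
    uint8_t prio[DSCP_MAX]; /* every dscp maps to priority 0 by default */
    bool    set[DSCP_MAX];  /* which dscps currently have an app entry */
    int     entries;        /* number of app entries */
};

/* Trust state is dscp while at least one entry exists, otherwise pcp. */
static enum trust_state dscp_trust(const struct dscp_map *m)
{
    assert(m);
    return m->entries > 0 ? TRUST_DSCP : TRUST_PCP;
}

/* Add or overwrite the entry for one dscp; the last write wins. */
static int dscp_set(struct dscp_map *m, unsigned dscp, unsigned prio)
{
    assert(m);
    if (dscp >= DSCP_MAX || prio >= PRIO_MAX)
        return -1;
    if (!m->set[dscp]) {
        m->set[dscp] = true;
        m->entries++;
    }
    m->prio[dscp] = (uint8_t)prio;
    return 0;
}

/* Delete the entry for one dscp and fall back to priority 0. */
static int dscp_del(struct dscp_map *m, unsigned dscp)
{
    assert(m);
    if (dscp >= DSCP_MAX || !m->set[dscp])
        return -1;
    m->set[dscp] = false;
    m->prio[dscp] = 0;
    m->entries--;
    return 0;
}
```

Starting from a zero-initialised map, the first dscp_set() call moves the model to TRUST_DSCP and deleting the last entry moves it back to TRUST_PCP.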
    • Merge branch 'nfp-ethtool-and-related-improvements' · bfe26ba9
      David S. Miller authored
      Simon Horman says:
      
      ====================
      nfp: ethtool and related improvements
      
      Dirk van der Merwe says:
      
      This patch series throws a couple of loosely related items into a single
      series.
      
      Patch 1: Clang compilation fix reported by
        Matthias Kaehlcke <mka@chromium.org>
      
      Patch 2: Driver can now do MAC reinit on load when there has been a
        media override set in the NSP.
      
      Patch 3: Refactor the nfp_app_reprs_set API.
      
      Patch 4: Similar to vNICs, representors must be able to deal with media
        override changes in the NSP.
      
      Patch 5: Since representors can now handle media overrides, we can
        allocate the get/set link ndo's to them.
      
      Patch 6 & 7: Add support for FEC mode modification.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfe26ba9
    • nfp: implement ethtool FEC mode settings · 0d087093
      Dirk van der Merwe authored
      Add support in the driver ethtool ops to modify the NFP FEC modes.
      
      The FEC modes can be set for vNICs associated with physical ports or
      for MAC representor netdevs.
      Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d087093
    • nfp: add helpers for FEC support · b471232e
      Dirk van der Merwe authored
      Implement helpers to determine and modify FEC modes via the NSP.
      The NSP advertises FEC capabilities on a per port basis and provides
      support for:
      * Auto mode selection
      * Reed Solomon
      * BaseR
      * None/Off
      Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b471232e
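Per-port FEC capabilities of this kind are commonly modelled as a bitmask of advertised modes plus one active mode. The following userspace C sketch shows that idea under loud assumptions: the bit values, the struct, and both helpers are hypothetical, and nothing here reflects the NSP's actual encoding or the nfp helper signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit layout for the four modes listed in the message. */
enum fec_mode {
    FEC_AUTO         = 1 << 0,
    FEC_REED_SOLOMON = 1 << 1,
    FEC_BASER        = 1 << 2,
    FEC_NONE         = 1 << 3
};

struct fec_port {
    unsigned supported; /* bitmask of modes advertised for this port */
    unsigned active;    /* the one mode currently selected */
};

/* True if the port advertises the requested mode. */
static bool fec_is_supported(const struct fec_port *p, enum fec_mode mode)
{
    assert(p);
    return (p->supported & (unsigned)mode) != 0;
}

/* Switch modes only if the port advertises the requested one. */
static int fec_set_mode(struct fec_port *p, enum fec_mode mode)
{
    if (!fec_is_supported(p, mode))
        return -1;
    p->active = (unsigned)mode;
    return 0;
}
```

A port that advertises only auto and Reed Solomon would accept either of those two modes and reject BaseR, leaving the active mode unchanged.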
    • nfp: add get/set link settings ndos to representors · a564d30e
      Dirk van der Merwe authored
      Since it is now safe to modify link settings for representors, we can
      attach the get/set link settings ndos to them. The get/set link
      settings are nfp_port based operations.
      
      If a port becomes invalid, the representor will be removed in the same
      way a vnic would be.
      Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a564d30e
    • nfp: resync repr state when port table sync · 5fa27d59
      Dirk van der Merwe authored
      If the NSP port table has been refreshed, resync the representor state
      with the new port information. At the moment, this only entails looking
      for invalid ports and killing off representors associated with them.
      
      The repr instance becomes NULL which is safe since the app accessor
      function for reprs returns NULL when it cannot access a repr.
      Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5fa27d59
    • nfp: refactor nfp_app_reprs_set · 51ccc37d
      Dirk van der Merwe authored
      The criterion that reprs cannot be replaced with another new set of
      reprs has been removed. This check is not needed, since the only use
      case that could exercise it at the moment would be to modify the
      number of SRIOV VFs without first disabling them. That case is
      explicitly disallowed anyway, and subsequent patches in this series
      need to be able to replace the running set of reprs.
      
      All checks of the return code of the nfp_app_reprs_set function
      have been removed.
      As stated above, it is not possible for the current code to encounter a
      case where reprs exist and need to be replaced.
      Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51ccc37d
    • nfp: make use of MAC reinit · 7717c319
      Jakub Kicinski authored
      Recent management FW images can perform a full reinit of MAC cores
      without requiring a reboot.  When loading the driver, check if there
      are changes pending and if so call NSP MAC reinit.  A full application
      FW reload is still required, and all MACs need to be reinited at the
      same time (not only the ones which have been reconfigured), which can
      disrupt unrelated netdevs.  Therefore, changing the MAC config without
      reloading the driver remains future work for now.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7717c319
    • nfp: don't depend on compiler constant propagation · 4e595325
      Jakub Kicinski authored
      Matthias reports:
      
        nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
        identify the 'mask' parameter as known to be constant at compile time,
        which is required to use the FIELD_GET() macro.
      
        The forced inlining does the trick for gcc, but for kernel builds with
        clang it results in undefined symbols:
      
        drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function
          `__nfp_eth_set_aneg':
        drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787):
          undefined reference to `__compiletime_assert_492'
        drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1):
          undefined reference to `__compiletime_assert_496'
      
        These __compiletime_assert_xyx() calls would have been optimized away
        if the compiler had seen 'mask' as a constant.
      
      Add a macro to extract the mask and shift and pass those to
      nfp_eth_set_bit_config() separately.
      Reported-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e595325
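The approach described in this entry, where a macro derives the shift from the mask at the call site and passes both to an ordinary function, can be sketched in userspace C. The names set_field and SET_FIELD are hypothetical and this is not the nfp code. The sketch also assumes a GCC or Clang compiler, since it uses the __builtin_ctzll builtin, and a non-zero, contiguous mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Out-of-line helper: mask and shift arrive as ordinary arguments, so the
 * function never relies on the compiler proving that 'mask' is constant.
 */
static uint64_t set_field(uint64_t reg, uint64_t mask, unsigned shift,
                          uint64_t val)
{
    assert(mask != 0);
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Wrapper macro: the shift is derived from the mask where the macro is used. */
#define SET_FIELD(reg, mask, val) \
    set_field((reg), (mask), (unsigned)__builtin_ctzll(mask), (val))
```

Because the mask is a literal at each macro use, the compiler can still fold the shift to a constant, but correctness no longer depends on inlining.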
    • tcp: higher throughput under reordering with adaptive RACK reordering wnd · 1f255691
      Priyaranjan Jha authored
      Currently TCP RACK loss detection does not work well if packets are
      being reordered beyond its static reordering window (min_rtt/4). Under
      such reordering it may falsely trigger loss recoveries and reduce TCP
      throughput significantly.
      
      This patch improves that by increasing and reducing the reordering
      window based on DSACK, which is now supported in major TCP implementations.
      It makes RACK's reo_wnd adaptive based on DSACK and the number of
      recoveries.
      
      - If a DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded
        by srtt), since there is a possibility that the spurious retransmission
        was due to a reordering delay longer than reo_wnd.
      
      - Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16)
        successful recoveries (this accounts for full DSACK-based loss
        recovery undo). After that, reset it to the default (min_rtt/4).
      
      - At most, reo_wnd is incremented only once per rtt, so that the new
        DSACK we are reacting to is (approximately) due to a spurious retx
        sent after reo_wnd was last updated.
      
      - reo_wnd is tracked in terms of steps (of min_rtt/4) rather than as an
        absolute value, to account for changes in rtt.
      
      In our internal testing, we observed a significant increase in throughput
      in scenarios where reordering exceeds min_rtt/4 (the previous static value).
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f255691
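The four rules listed in this entry can be illustrated with a small userspace model. This is a simplified sketch, not the kernel implementation: the struct and function names are hypothetical, and the "once per rtt" rule is approximated here with timestamps, which is an assumption of this sketch rather than something the message specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RACK_RECOVERY_THRESH 16 /* value quoted in the commit message */

struct rack_model {
    uint32_t steps;        /* reo_wnd in units of min_rtt/4; default is 1 */
    uint32_t persist;      /* recoveries left before steps is reset */
    bool     bumped;       /* whether steps has ever been incremented */
    uint64_t last_bump_us; /* time of the last increment */
};

static void rack_init(struct rack_model *r)
{
    assert(r);
    r->steps = 1;
    r->persist = 0;
    r->bumped = false;
    r->last_bump_us = 0;
}

/* DSACK seen: widen the window by one step, at most once per RTT. */
static void rack_on_dsack(struct rack_model *r, uint64_t now_us,
                          uint32_t srtt_us)
{
    if (r->bumped && now_us - r->last_bump_us < srtt_us)
        return; /* already reacted within this RTT */
    r->steps++;
    r->bumped = true;
    r->last_bump_us = now_us;
    r->persist = RACK_RECOVERY_THRESH;
}

/* A loss recovery completed: count down, then fall back to one step. */
static void rack_on_recovery_done(struct rack_model *r)
{
    if (r->persist > 0 && --r->persist == 0)
        r->steps = 1;
}

/* Current window: steps * min_rtt/4, upper bounded by srtt. */
static uint32_t rack_reo_wnd_us(const struct rack_model *r,
                                uint32_t min_rtt_us, uint32_t srtt_us)
{
    uint64_t wnd = (uint64_t)(min_rtt_us / 4) * r->steps;
    return wnd < srtt_us ? (uint32_t)wnd : srtt_us;
}
```

Tracking steps instead of an absolute window means the effective reo_wnd follows min_rtt automatically when the path RTT changes.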
    • Merge branch 'dsa-parsing-stage' · 6c49b5e2
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: parsing stage
      
      When registering a DSA switch, there are basically two stages.
      
      The first stage is the parsing of the switch device, from either device
      tree or platform data. It fetches the DSA tree to which it belongs, and
      validates its ports. The switch device is then added to the tree, and
      the second stage is called if this was the last switch of the tree.
      
      The second stage is the setup of the tree, which validates that the tree
      is complete, sets up the routing tables, the default CPU port for user
      ports, sets up the switch drivers and finally the master interfaces,
      which makes the whole switch fabric functional.
      
      This patch series covers the first parsing stage. It fixes the type of
      the switch and tree indexes to unsigned int, simplifies the tree
      reference counting and the switch and CPU ports parsing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c49b5e2
    • net: dsa: resolve tagging protocol at parse time · 7354fcb0
      Vivien Didelot authored
      Extend the dsa_port_parse_cpu() function to resolve the tagging protocol
      at port parsing time, instead of waiting for the whole tree to be
      complete.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7354fcb0
    • net: dsa: add one port parsing function per type · 06e24d08
      Vivien Didelot authored
      Add dsa_port_parse_user, dsa_port_parse_dsa and dsa_port_parse_cpu
      functions to factorize the code shared by both OF and pdata parsing.
      
      They don't do much for the moment but will be extended later to support
      tagging protocol resolution for example.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06e24d08
    • net: dsa: only check presence of link property · 54df6fa9
      Vivien Didelot authored
      When parsing a port, simply use of_property_read_bool which checks the
      presence of a given property, instead of parsing the link phandle.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54df6fa9