1. 28 Aug, 2017 6 commits
    • bpf: more SK_SKB selftests · ed85054d
      John Fastabend authored
      Tests packet read/writes and additional skb fields.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: additional sockmap self tests · 6fd28865
      John Fastabend authored
      Add some more sockmap tests to cover:
      
       - forwarding to NULL entries
       - more than two maps to test list ops
       - forwarding to different map
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: sockmap add missing rcu_read_(un)lock in smap_data_ready · d26e597d
      John Fastabend authored
      References to psock must be done inside RCU critical section.
      
      Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: sockmap, remove STRPARSER map_flags and add multi-map support · 2f857d04
      John Fastabend authored
      The addition of map_flags BPF_SOCKMAP_STRPARSER flags was to handle a
      specific use case where we want to have BPF parse program disabled on
      an entry in a sockmap.
      
      However, Alexei found the API a bit cumbersome and I agreed. Let's
      remove the STRPARSER flag and support the use case by allowing socks
      to be in multiple maps. This allows users to create two maps one with
      programs attached and one without. When socks are added to maps they
      now inherit any programs attached to the map. This is a nice
      generalization and IMO improves the API.
      
      The API rules are less ambiguous and do not need a flag:
      
        - When a sock is added to a sockmap we have two cases,
      
           i. The sock map does not have any attached programs so
              we can add sock to map without inheriting bpf programs.
              The sock may exist in 0 or more other maps.
      
          ii. The sock map has an attached BPF program. To avoid duplicate
              bpf programs we only add the sock entry if it does not have
              an existing strparser/verdict attached, returning -EBUSY if
              a program is already attached. Otherwise attach the program
              and inherit strparser/verdict programs from the sock map.
      
      This allows for socks to be in multiple maps for redirects and
      inherit a BPF program from a single map.
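
      The two cases above can be sketched as a small userspace model. This
      is only an illustration under stated assumptions: every model_* name
      is hypothetical and none of these are the kernel's own structures or
      functions. It shows a sock joining any number of program-less maps,
      while a second map with attached programs is refused with -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
struct model_map {
    bool has_progs;   /* parser/verdict programs are attached to this map */
};

struct model_sock {
    const struct model_map *prog_owner; /* map the programs were inherited from */
    int nr_maps;                        /* number of map entries holding this sock */
};

/* Apply the two cases from the commit message when adding sk to map. */
static int model_sock_map_add(const struct model_map *map, struct model_sock *sk)
{
    assert(map && sk);

    if (map->has_progs) {
        /* Case ii: refuse a second set of strparser/verdict programs. */
        if (sk->prog_owner)
            return -EBUSY;
        sk->prog_owner = map;
    }

    /* Case i, and the success path of case ii: record the new entry. */
    sk->nr_maps++;
    return 0;
}
```

      In this model a sock added to one program-less map and one map with
      programs ends up with nr_maps == 2 and a single prog_owner.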
      
      Also this patch simplifies the logic around BPF_{EXIST|NOEXIST|ANY}
      flags. In the original patch I tried to be extra clever and only
      update map entries when necessary. Now I've decided the complexity
      is not worth it. If users constantly update an entry with the same
      sock for no reason (i.e. update an entry without actually changing
      any parameters on map or sock) we still do an alloc/release. Using
      this and allowing multiple entries of a sock to exist in a map the
      logic becomes much simpler.
      
      Note: Now that multiple maps are supported the "maps" pointer called
      when a socket is closed becomes a list of maps to remove the sock from.
      To keep the map up to date when a sock is added to the sockmap we must
      add the map/elem in the list. Likewise when it is removed we must
      remove it from the list. This results in searching the per psock list
      on delete operation. On TCP_CLOSE events we walk the list and remove
      the psock from all map/entry locations. I don't see any perf
      implications in this because at most I have a psock in two maps. If
      a psock were to be in many maps it is possible this might be noticeable
      on delete but I can't think of a reason to dup a psock in many maps.
      The sk_callback_lock is used to protect read/writes to the list. This
      was convenient because in all locations we were taking the lock
      anyways just after working on the list. Also the lock is per sock so
      in normal cases we shouldn't see any contention.
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: convert sockmap field attach_bpf_fd2 to type · 464bc0fd
      John Fastabend authored
      In the initial sockmap API we provided strparser and verdict programs
      using a single attach command by extending the attach API with the
      attach_bpf_fd2 field.
      
      However, if we add other programs in the future we will be adding a
      field for every new possible type, attach_bpf_fd(3,4,..). This
      seems a bit clumsy for an API. So let's push the programs using two
      new type fields.
      
         BPF_SK_SKB_STREAM_PARSER
         BPF_SK_SKB_STREAM_VERDICT
      
      This has the advantage of having a readable name and can easily be
      extended in the future.
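
      A minimal sketch of the design choice, assuming a hypothetical
      userspace model (the model_* names are illustrative and are not the
      kernel or libbpf API): each program is attached by its own call,
      selected by an attach type, so a new program kind only needs a new
      enum value rather than another attach_bpf_fdN field.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of "one attach call per program type". */
enum model_attach_type {
    MODEL_SK_SKB_STREAM_PARSER,
    MODEL_SK_SKB_STREAM_VERDICT,
    MODEL_ATTACH_TYPE_MAX,
};

struct model_sockmap {
    int prog_fd[MODEL_ATTACH_TYPE_MAX]; /* -1 means no program attached */
};

/* Start with no programs attached to the map. */
static void model_sockmap_init(struct model_sockmap *map)
{
    assert(map);
    for (int i = 0; i < MODEL_ATTACH_TYPE_MAX; i++)
        map->prog_fd[i] = -1;
}

/* Attach one program; the type selects which slot it fills. */
static int model_prog_attach(struct model_sockmap *map,
                             enum model_attach_type type, int prog_fd)
{
    assert(map);
    if ((unsigned int)type >= MODEL_ATTACH_TYPE_MAX)
        return -EINVAL;
    map->prog_fd[type] = prog_fd;
    return 0;
}
```

      Attaching a parser and a verdict program then takes two calls that
      differ only in the type argument.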
      
      Updates to samples and sockmap included here also generalize tests
      slightly to support upcoming patch for multiple map support.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: rk3228-evb: Fix the compiling error · 901c5d2f
      David Wu authored
      This patch solves the following error:
      arch/arm/boot/dts/rk3228-evb.dtb: ERROR (phandle_references): Reference to non-existent node or label "phy0"
      
      Fixes: db40f15b ("ARM: dts: rk3228-evb: Enable the integrated PHY for gmac")
      Signed-off-by: David Wu <david.wu@rock-chips.com>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Aug, 2017 18 commits
  3. 25 Aug, 2017 16 commits
    • i40e: synchronize nvmupdate command and adminq subtask · 2bf01935
      Sudheer Mogilappagari authored
      During NVM update, state machine gets into unrecoverable state because
      i40e_clean_adminq_subtask can get scheduled after the admin queue
      command but before other state variables are updated. This causes
      incorrect input to i40e_nvmupd_check_wait_event and state transitions
      don't happen.
      
      This issue existed before but surfaced after commit 373149fc
      ("i40e: Decrease the scope of rtnl lock")
      
      This fix adds locking around admin queue command and update of
      state variables so that adminq_subtask will have accurate information
      whenever it gets scheduled.
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: prevent changing ITR if adaptive-rx/tx enabled · 06b2decd
      Alan Brady authored
      Currently the driver allows the user to change (or even disable)
      interrupt moderation if adaptive-rx/tx is enabled when this should
      not be the case.
      
      Adaptive RX/TX will not respect the user's ITR settings so
      allowing the user to change it is weird.  This bug would also
      allow the user to disable interrupt moderation with adaptive-rx/tx
      enabled which doesn't make much sense either.
      
      This patch makes it such that if adaptive-rx/tx is enabled, the user
      cannot make any manual adjustments to interrupt moderation.  It also
      makes it so that if ITR is disabled but adaptive-rx/tx is then
      enabled, ITR will be re-enabled.
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: use cpumask_copy instead of direct assignment · 7e4d01e7
      Jacob Keller authored
      According to the header file cpumask.h, we shouldn't be directly copying
      a cpumask_t, since it's a bitmap and might not be copied correctly. Let's
      use the provided cpumask_copy() function instead.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40evf: use netdev variable in reset task · f0db7892
      Alan Brady authored
      If we're going to bother initializing a variable to reference it we might
      as well use it.
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: rename vf_offload_flags to vf_cap_flags in struct virtchnl_vf_resource · fbb113f7
      Stefan Assmann authored
      The current name of vf_offload_flags indicates that the bitmap is
      limited to offload related features. Make this more generic by renaming
      it to vf_cap_flags, which allows for other capabilities besides
      offloading to be added.
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: move check for avoiding VID=0 filters into i40e_vsi_add_vlan · fcf6cfc8
      Jacob Keller authored
      In i40e_vsi_add_vlan we treat attempting to add VID=0 as an error,
      because it does not do what the caller might expect. We already special
      case VID=0 in i40e_vlan_rx_add_vid so that we avoid this error when
      adding the VLAN.
      
      This special casing is necessary so that we do not add the VLAN=0 filter
      since we don't want to stop receiving untagged traffic. Unfortunately,
      not all callers of i40e_vsi_add_vlan are aware of this, including when
      we add VLANs from a VF device.
      
      Rather than special casing every single caller of i40e_vsi_add_vlan,
      let's just move this check internally. This makes the code simpler
      because the caller does not need to be aware of how VLAN=0 is special,
      and we don't forget to add this check in new places.
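
      As a rough illustration of moving the special case into the callee,
      here is a userspace sketch. The model_* names are hypothetical, and
      returning success for VID 0 without installing a filter is an
      assumption about the intended behavior, not a copy of the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VLAN_N_VID 4096 /* 12-bit VLAN ID space */

/* Hypothetical stand-in for a VSI's VLAN filter table. */
struct model_vsi {
    bool vlan_filter[MODEL_VLAN_N_VID];
};

/*
 * Add a VLAN filter. Callers no longer need to know that VID 0 is
 * special: the check lives here, in one place.
 */
static int model_vsi_add_vlan(struct model_vsi *vsi, uint16_t vid)
{
    assert(vsi);

    if (vid >= MODEL_VLAN_N_VID)
        return -EINVAL;

    /* VID 0: install nothing so untagged traffic keeps flowing. */
    if (vid == 0)
        return 0;

    vsi->vlan_filter[vid] = true;
    return 0;
}
```

      Every caller, including a hypothetical VF path, can now pass VID 0
      without triggering an error message.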
      
      This fixes a harmless error message displaying when adding a VLAN from
      within a VF. The message was meaningless but there is no reason to
      confuse end users and system administrators, and this is now avoided.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: use cmpxchg64 when updating private flags in ethtool · 841c950d
      Jacob Keller authored
      When a user gives an invalid command to change a private flag which is
      not supported, either because it is read-only, or the device is not
      capable of the feature, we simply ignore the request.
      
      A naive solution would simply be to report error codes when one of the
      flags was not supported. However, this causes problems because it makes
      the operation not atomic. If a user requests multiple private flags
      together at once we could end up changing one before failing at the
      second flag.
      
      We can do a bit better if we instead update a temporary copy of the
      flags variable in the loop, and then copy it into place after. If we
      aren't careful this has the pitfall of potentially silently overwriting
      any changes caused by other threads.
      
      Avoid this by using cmpxchg64 which will compare and swap the flags
      variable only if it currently matched the old value. We'll report
      -EAGAIN in the (hopefully rare!) case where the cmpxchg64 fails.
      
      This ensures that we can properly report when flags are not supported in
      an atomic fashion without the risk of overwriting other threads' changes.
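
      The approach described above can be sketched in userspace with C11
      atomics. This is an illustration only: the model_* names and flag
      bits are hypothetical, and atomic_compare_exchange_strong() stands in
      for the kernel's cmpxchg64(). The request is validated against a
      snapshot first, then published only if the flags still match it.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_FLAG_A        (1ULL << 0) /* hypothetical writable flag */
#define MODEL_FLAG_B        (1ULL << 1) /* hypothetical writable flag */
#define MODEL_FLAG_READONLY (1ULL << 2) /* hypothetical read-only flag */
#define MODEL_WRITABLE      (MODEL_FLAG_A | MODEL_FLAG_B)

/*
 * Try to move *flags from the snapshot `seen` to `wanted`.
 * Returns 0 on success, -EOPNOTSUPP if a non-writable bit would change,
 * and -EAGAIN if another thread changed the flags after the snapshot.
 */
static int model_set_priv_flags(_Atomic uint64_t *flags, uint64_t seen,
                                uint64_t wanted)
{
    uint64_t expected = seen;

    assert(flags);

    /* Reject the whole request up front, so nothing is half-applied. */
    if ((seen ^ wanted) & ~MODEL_WRITABLE)
        return -EOPNOTSUPP;

    /* Publish only if nobody modified the flags since the snapshot. */
    if (!atomic_compare_exchange_strong(flags, &expected, wanted))
        return -EAGAIN;

    return 0;
}
```

      A caller that receives -EAGAIN would take a fresh snapshot and retry,
      or pass the error back to the user.
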
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Detect ATR HW Evict NVM issue and disable the feature · 10a955ff
      Anjali Singhai Jain authored
      This patch fixes a problem with the HW ATR eviction feature where the
      NVM setting was incorrect.  This patch detects the issue on X720
      adapters and disables the feature if the NVM setting is incorrect.
      
      Without this patch, HW ATR Evict feature does not work on broken NVMs
      and is not detected either.  If the HW ATR Evict feature is disabled
      the SW Eviction feature will take effect.
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: remove workaround for Open Firmware MAC address · 28921a0c
      Jacob Keller authored
      Since commit b499ffb0 ("i40e: Look up MAC address in Open Firmware
      or IDPROM"), we've had support for obtaining the MAC address
      from Open Firmware or IDPROM.
      
      This code relied on sending the Open Firmware address directly to the
      device firmware instead of relying on our MAC/VLAN filter list. Thus,
      a work around was introduced in commit b1b15df5 ("i40e: Explicitly
      write platform-specific mac address after PF reset")
      
      We refactored the Open Firmware address enablement code in the ill-named
      commit 41c4c2b5 ("i40e: allow look-up of MAC address from Open
      Firmware or IDPROM")
      
      Since this refactor, we no longer even set I40E_FLAG_PF_MAC. Further, we
      don't need this work around, because we actually store the MAC address
      as part of the MAC/VLAN filter hash. Thus, we will restore the address
      correctly upon reset.
      
      The refactor above failed to revert the workaround, so do that now.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: separate hw_features from runtime changing flags · d36e41dc
      Jacob Keller authored
      The number of flags found in pf->flags has grown quite large, and there
      are a lot of different types of flags. Most of the flags are simply
      hardware features which are enabled on some firmware or some MAC types.
      Other flags are dynamic run-time flags which enable or disable certain
      features of the driver.
      
      Separate these two types of flags into pf->hw_features and pf->flags.
      The hw_features list will contain a set of features which are enabled at
      init time. This will not contain toggles or otherwise dynamically
      changing features. These flags should not need atomic protections, as
      they will be set once during init and then be essentially read only.
      
      Everything else will remain in the flags variable. These flags may be
      modified at any time during run time. A future patch may wish to convert
      these flags into set_bit/clear_bit/test_bit or similar approach to
      ensure atomic correctness.
      
      The I40E_FLAG_MFP_ENABLED flag may be a good fit for hw_features but
      currently is used by ethtool in the private flags settings, and thus has
      been left as part of flags.
      
      Additionally, I40E_FLAG_DCB_CAPABLE may be a good fit for the
      hw_features but this patch has not tried to untangle it yet.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix a bug with VMDq RSS queue allocation · 5a433199
      Anjali Singhai Jain authored
      The X722 pf flag setup should happen before the VMDq RSS queue count is
      initialized for VMDq VSI to get the right number of queues for RSS in
      case of X722 devices.
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40evf: prevent VF close returning before state transitions to DOWN · fe2647ab
      Sudheer Mogilappagari authored
      Currently i40evf_close() can return before state transitions to
      __I40EVF_DOWN because of the latency involved in processing and
      receiving response from PF driver and scheduling of VF watchdog_task.
      Due to this inconsistency an immediate call to i40evf_open() fails
      because state is still DOWN_PENDING.
      
      When a VF interface is in up state and we try to add it as a slave,
      the bonding driver calls dev_close() and dev_open() in quick succession,
      resulting in dev_open() returning an error. The ifenslave command needs
      to be run again for dev_open() to succeed.
      
      This fix ensures that watchdog timer is scheduled immediately after
      admin queue operations are scheduled in i40evf_down(). In addition a
      wait condition is added at the end of i40evf_close so that function
      won't return when state is still DOWN_PENDING. The timeout value is
      chosen after some profiling and includes some buffer.
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: adjust packet size to account for double VLANs · 1e3a5fd5
      Mitch Williams authored
      Now that the kernel supports double VLAN tags, we should at least play
      nice. Adjust the max packet size to account for two VLAN tags, not just
      one.
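
      The arithmetic behind this can be sketched as follows. The header
      sizes are the standard Ethernet values (14-byte header, 4-byte FCS,
      4 bytes per VLAN tag); the model_* names are hypothetical and the
      driver's actual macro is not shown in this log.

```c
#include <assert.h>

#define MODEL_ETH_HLEN    14 /* Ethernet header: dst + src + ethertype */
#define MODEL_ETH_FCS_LEN  4 /* frame check sequence */
#define MODEL_VLAN_HLEN    4 /* one 802.1Q tag */

/* Largest frame on the wire for a given MTU and number of VLAN tags. */
static unsigned int model_max_frame(unsigned int mtu, unsigned int vlan_tags)
{
    assert(vlan_tags <= 2); /* this sketch only covers single/double tagging */
    return mtu + MODEL_ETH_HLEN + MODEL_ETH_FCS_LEN +
           vlan_tags * MODEL_VLAN_HLEN;
}
```

      For a 1500-byte MTU this gives 1522 bytes with one tag and 1526 bytes
      with two, so a limit sized for a single tag is 4 bytes too small for
      double-tagged frames.
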
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • strparser: initialize all callbacks · 3fd87127
      Eric Biggers authored
      commit bbb03029 ("strparser: Generalize strparser") added more
      function pointers to 'struct strp_callbacks'; however, kcm_attach() was
      not updated to initialize them.  This could cause the ->lock() and/or
      ->unlock() function pointers to be set to garbage values, causing a
      crash in strp_work().
      
      Fix the bug by moving the callback structs into static memory, so
      unspecified members are zeroed.  Also constify them while we're at it.
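
      The C rule this fix relies on can be shown with a small standalone
      sketch. struct model_callbacks is a hypothetical stand-in and not the
      real struct strp_callbacks: in an object with static storage that is
      set up with designated initializers, every member that is not named
      is zero-initialized, so optional hooks read as NULL instead of
      whatever happened to be on the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a callback table with optional hooks. */
struct model_callbacks {
    int  (*parse_msg)(int len);
    void (*rcv_msg)(int len);
    void (*lock)(void);   /* optional */
    void (*unlock)(void); /* optional */
};

static int model_parse(int len) { return len; }
static void model_rcv(int len) { (void)len; }

/* Only two members are named; lock and unlock are implicitly NULL. */
static const struct model_callbacks model_cb = {
    .parse_msg = model_parse,
    .rcv_msg   = model_rcv,
};

/* Call the optional hook only when it was provided; report if it ran. */
static int model_lock_if_set(const struct model_callbacks *cb)
{
    assert(cb);
    if (!cb->lock)
        return 0;
    cb->lock();
    return 1;
}
```

      With an uninitialized stack variable the same check on cb->lock would
      read an indeterminate pointer, which matches the kind of crash
      described in this commit.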
      
      This bug was found by syzkaller, which encountered the following splat:
      
          IP: 0x55
          PGD 3b1ca067
          P4D 3b1ca067
          PUD 3b12f067
          PMD 0
      
          Oops: 0010 [#1] SMP KASAN
          Dumping ftrace buffer:
             (ftrace buffer empty)
          Modules linked in:
          CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: kstrp strp_work
          task: ffff88006bb0e480 task.stack: ffff88006bb10000
          RIP: 0010:0x55
          RSP: 0018:ffff88006bb17540 EFLAGS: 00010246
          RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000
          RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48
          RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000
          R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48
          R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000
          FS:  0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098
           worker_thread+0x223/0x1860 kernel/workqueue.c:2233
           kthread+0x35e/0x430 kernel/kthread.c:231
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
          Code:  Bad RIP value.
          RIP: 0x55 RSP: ffff88006bb17540
          CR2: 0000000000000055
          ---[ end trace f0e4920047069cee ]---
      
      Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and
      CONFIG_AF_KCM=y):
      
          #include <linux/bpf.h>
          #include <linux/kcm.h>
          #include <linux/types.h>
          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <sys/socket.h>
          #include <sys/syscall.h>
          #include <unistd.h>
      
          static const struct bpf_insn bpf_insns[3] = {
              { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */
              { .code = 0x95 }, /* BPF_EXIT_INSN() */
          };
      
          static const union bpf_attr bpf_attr = {
              .prog_type = 1,
              .insn_cnt = 2,
              .insns = (uintptr_t)&bpf_insns,
              .license = (uintptr_t)"",
          };
      
          int main(void)
          {
              int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD,
                                   &bpf_attr, sizeof(bpf_attr));
              int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
              int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0);
      
              ioctl(kcm_fd, SIOCKCMATTACH,
                    &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd });
          }
      
      Fixes: bbb03029 ("strparser: Generalize strparser")
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Fix rndis_filter_close error during netvsc_remove · c6f71c41
      Haiyang Zhang authored
      We now remove rndis filter before unregister_netdev(), which calls
      device close. It involves closing rndis filter already removed.
      
      This patch fixes this error.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2017-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 0cf3f4c3
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2017-08-24
      
      This series includes updates to mlx5 core driver.
      
      From Gal and Saeed, three cleanup patches.
      From Matan, Low level flow steering improvements and optimizations,
       - Use more efficient data structures for flow steering objects handling.
       - Add tracepoints to flow steering operations.
       - Overall these patches improve flow steering rule insertion rate by a
         factor of seven in large scales (~50K rules or more).
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>