1. 31 May, 2019 22 commits
    • net/mlx5e: Use termination table for VLAN push actions · 10caabda
      Oz Shlomo authored
      HW does not support push VLAN action in the RX direction (packets
      arriving from the wire). The FW works around this limitation by hairpinning
      the packet. The hairpin workaround applies only when the push VLAN action
      is specified in a termination table, assuring that there are no actions
      following the hairpin.
      
      Instantiate termination table for push VLAN actions. Re-use identical
      terminating tables for increased HW cache efficiency.
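
      The reuse of identical termination tables can be pictured as a
      refcounted cache keyed by the table's action and destination. The
      sketch below is a simplified userspace model, not the driver code:
      TermTableCache, the key layout and the integer handle are hypothetical
      names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical key: (action, vlan_id, destination vport).
Key = Tuple[str, int, int]


@dataclass
class TermTable:
    key: Key
    handle: int        # stand-in for a firmware table handle
    refcount: int = 0


class TermTableCache:
    """Hand out one shared table per identical (action, destination) key."""

    def __init__(self) -> None:
        self._tables: Dict[Key, TermTable] = {}
        self._next_handle = 1

    def get(self, key: Key) -> TermTable:
        """Return the existing table for this key, or create a new one."""
        table = self._tables.get(key)
        if table is None:
            table = TermTable(key, self._next_handle)
            self._next_handle += 1
            self._tables[key] = table
        table.refcount += 1
        return table

    def put(self, table: TermTable) -> None:
        """Drop one reference and forget the table once it is unused."""
        table.refcount -= 1
        if table.refcount == 0:
            del self._tables[table.key]

    def __len__(self) -> int:
        return len(self._tables)
```

      In this model, two rules that push the same VLAN towards the same
      destination share one table, so the number of distinct tables grows
      with the number of distinct keys rather than with the number of rules.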

      Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Eli Britstein <elibr@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Geneve, Add support for encap/decap flows offload · 9272e3df
      Yevgeny Kliteynik authored
      Add HW offloading support for flows with Geneve encap/decap.
      
      Notes about decap flows with Geneve TLV Options:
        - Support offloading of 32-bit options data only
        - At any given time, only one combination of class/type parameters
          can be offloaded, but the same class/type combination can have
          many different flows offloaded with different 32-bit option data
        - Options with value of 0 can't be offloaded
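
      The two per-option limits in the notes above (32-bit data only, and
      no zero value) can be expressed as a small validity check. This is an
      illustrative model only: GeneveOption and can_offload are hypothetical
      names, and the device-wide class/type limit is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneveOption:
    opt_class: int   # 16-bit option class
    opt_type: int    # 8-bit option type
    data: bytes      # option payload


def can_offload(opt: GeneveOption) -> bool:
    """Apply the per-option limits: 32-bit data only, and not all zero."""
    if len(opt.data) != 4:                     # only 32-bit option data
        return False
    if int.from_bytes(opt.data, "big") == 0:   # value 0 cannot be offloaded
        return False
    return True
```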

      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Rearrange tc tunnel code in a modular way · d386939a
      Yevgeny Kliteynik authored
      Rearrange tc tunnel code so that it would be easy to add future tunnels:
       - Define tc tunnel object with the fields and callbacks that any
         tunnel must implement.
       - Define tc UDP tunnel object for UDP tunnels, such as VXLAN
       - Move each tunnel code (GRE, VXLAN) to its own separate file
       - Rewrite tc tunnel implementation in a general way - using only
         the objects and their callbacks.

      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct · 1f6da306
      Yevgeny Kliteynik authored
      In mlx5e encap entry structure, IP tunnel info data structure is copied
      by value. This approach worked till now, but it breaks when there are
      encapsulation options, such as in case of Geneve.
      
      These options are stored in the structure that is allocated adjacent to
      the IP tunnel info struct, and not pointed at by any field in that struct.
      Therefore, when copying the struct by value, we lose the address of the
      original struct and can't get to the encapsulation options.
      
      Fix the problem by storing the pointer to the tunnel info data instead.

      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Geneve, Manage Geneve TLV options · 0ccc171e
      Yevgeny Kliteynik authored
      Use Geneve TLV Options object to manage the flex parser matching
      on the 32-bit options data.
      
      When the first flow with certain class/type values is requested to
      be offloaded, create a FW object with FW command (Geneve TLV Options
      general object) and start counting the number of flows using this object.

      During this time, any request with different class/type values will
      fail to be offloaded.
      Once the refcount reaches 0, destroy the TLV options general object;
      a flow with any class/type parameters can then be offloaded.
      
      Geneve TLV Options object is added to core device.
      It is currently used to manage Geneve TLV options general
      object allocation in FW and its reference counting only.
      In the future it will also be used for managing geneve ports
      by registering callbacks for ndo_udp_tunnel_add/del.
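
      The lifecycle described above (create on first use, reject other
      class/type pairs while in use, destroy at refcount 0) can be modeled
      as follows. This is a sketch under assumptions: GeneveTlvOptions and
      its get/put methods are hypothetical names, and the firmware object
      is represented only by the stored class/type pair.

```python
from typing import Optional, Tuple


class GeneveTlvOptions:
    """Tracks the single class/type pair currently programmed in the device."""

    def __init__(self) -> None:
        self._current: Optional[Tuple[int, int]] = None
        self._refcount = 0

    def get(self, opt_class: int, opt_type: int) -> bool:
        """Take a reference; return False if another class/type is in use."""
        key = (opt_class, opt_type)
        if self._refcount == 0:
            self._current = key      # stands in for creating the FW object
        elif self._current != key:
            return False             # a different class/type is active
        self._refcount += 1
        return True

    def put(self) -> None:
        """Drop a reference; the object goes away when the count hits 0."""
        if self._refcount == 0:
            raise RuntimeError("put() without a matching get()")
        self._refcount -= 1
        if self._refcount == 0:
            self._current = None     # stands in for destroying the FW object
```

      A second flow with the same class/type only bumps the count, while a
      flow with a different pair is refused until every earlier user has
      called put().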

      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Enable setting multiple match criteria for flow group · d4a18e16
      Yevgeny Kliteynik authored
      When filling in flow spec match criteria, to allow previous
      modifications of the match criteria, use "|=" rather than "=".
      
      Tunnel options are parsed before the match criteria of the offloaded
      flow are set. If the flow that we're about to offload has
      encapsulation options, the flow group might need to match on additional
      criteria.
      
      For Geneve, an additional flow group matching parameter should
      be used - misc3. The appropriate bit in the match criteria is set
      while parsing the tunnel options, so the criteria value shouldn't
      be overwritten.
      
      This is a pre-step for supporting Geneve TLV options offload.
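
      The difference between "=" and "|=" here can be shown with a tiny
      bitmask model. The constant names and bit positions below are
      illustrative assumptions, not values copied from the driver headers.

```python
# Illustrative criteria bits; names and positions are assumptions.
MATCH_OUTER_HEADERS = 1 << 0
MATCH_MISC_PARAMETERS_3 = 1 << 4


def parse_tunnel_options(criteria: int) -> int:
    """Tunnel option parsing runs first and requests misc3 matching."""
    return criteria | MATCH_MISC_PARAMETERS_3


def set_flow_criteria_assign(criteria: int) -> int:
    """Plain assignment: bits set by earlier parsing steps are discarded."""
    criteria = MATCH_OUTER_HEADERS
    return criteria


def set_flow_criteria_or(criteria: int) -> int:
    """OR-assignment: bits set by earlier parsing steps are preserved."""
    criteria |= MATCH_OUTER_HEADERS
    return criteria
```

      With plain assignment the misc3 bit requested during option parsing
      is lost; with OR-assignment both bits end up in the final criteria.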

      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Allow matching only enc_key_id/enc_dst_port for decapsulation action · d1bda7ee
      Tonghao Zhang authored
      In some cases we don't care about enc_src_ip and enc_dst_ip. If we
      don't match on these fields, we can use fewer flows in hardware when
      receiving tunnel packets. For example, when the tunnel packets are
      sent from different hosts, we currently must offload one rule for
      each host.
      
      	$ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \
      		flower dst_mac 00:11:22:33:44:00 \
      		enc_src_ip Host0_IP enc_dst_ip 2.2.2.100 \
      		enc_dst_port 4789 enc_key_id 100 \
      		action tunnel_key unset action mirred egress redirect dev eth0_1
      
      	$ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \
      		flower dst_mac 00:11:22:33:44:00 \
      		enc_src_ip Host1_IP enc_dst_ip 2.2.2.100 \
      		enc_dst_port 4789 enc_key_id 100 \
      		action tunnel_key unset action mirred egress redirect dev eth0_1
      
      If we support flows which match only enc_key_id and enc_dst_port,
      a single flow can process the packets sent to the VM (MAC 00:11:22:33:44:00).
      
      	$ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \
      		flower dst_mac 00:11:22:33:44:00 \
      		enc_dst_port 4789 enc_key_id 100 \
      		action tunnel_key unset action mirred egress redirect dev eth0_1
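
      Why one rule is enough once the outer IPs are left out can be seen
      in a toy matcher where unspecified fields act as wildcards. This is
      a simplified model of flow matching, not of the driver or of tc;
      rule_matches and packet_from are hypothetical helpers, and the source
      addresses used with them are placeholders.

```python
from typing import Dict


def rule_matches(rule: Dict[str, object], packet: Dict[str, object]) -> bool:
    """A rule matches when every field it specifies equals the packet's."""
    return all(packet.get(field) == value for field, value in rule.items())


# Rule keyed on tunnel id and UDP port only; the outer IPs are wildcards.
decap_rule = {
    "dst_mac": "00:11:22:33:44:00",
    "enc_dst_port": 4789,
    "enc_key_id": 100,
}


def packet_from(src_ip: str) -> Dict[str, object]:
    """Build a tunnel packet arriving from the given remote host."""
    return {
        "dst_mac": "00:11:22:33:44:00",
        "enc_src_ip": src_ip,
        "enc_dst_ip": "2.2.2.100",
        "enc_dst_port": 4789,
        "enc_key_id": 100,
    }
```

      A rule that also pins enc_src_ip matches traffic from one host only,
      so it has to be repeated per host; decap_rule matches all of them.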

      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Generalize vport type in vport representor · 9b81d5a9
      Vu Pham authored
      Besides the special vports (PF/uplink/ecpf), the rest of the vports
      are similar.
      Remove vf_ prefix from function and variable names.
      
      This patch does not change any functionality.

      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Bodong Wang <bodong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 7fe4d43e
      Saeed Mahameed authored
      This series provides some low level updates for mlx5 driver needed for
      both rdma and netdev trees.
      
      1) Termination flow steering table bits and hardware definitions.
      
      2) Introduce the core dump HW access registers definitions.
      
      3) Refactor and clean up VF representor function handlers.

      4) Rename host_params bits to function_changed bits and add
      support for the eswitch functions change event in the eswitch general case
      (for both legacy and switchdev modes).

      5) Fix a potential error pointer dereference in error handling.

      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • {IB,net}/mlx5: Constify rep ops functions pointers · 8693115a
      Parav Pandit authored
      Currently, for every representor type and for every single vport, a
      copy of the representor function pointers is stored, even though they
      don't change from one vport to another.

      Additionally, the priv data entry for the rep is not passed during
      registration, but it is copied. It is used (set and cleared) by the user
      of the reps.

      As we want to scale vports, to simplify and also to split constants
      from data:

      1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops to match the _ops
      suffix of other standard netdev and ibdev ops.
      2. Constify the IB and Ethernet rep ops structure.
      3. Instead of storing a copy of all rep function pointers, store a copy
      per eswitch rep type.
      4. Split data and function pointers to mlx5_eswitch_rep_ops and
      mlx5_eswitch_rep_data.

      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • {IB, net}/mlx5: No need to typecast from void* to mlx5_ib_dev* · c94ff748
      Parav Pandit authored
      Avoid typecasting from void* to mlx5_ib_dev* or mlx5e_rep_priv*
      as it is not needed.

      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-Switch, Honor eswitch functions changed event cap · 6706a3b9
      Vu Pham authored
      Whenever device supports eswitch functions changed event, honor
      such device setting. Do not limit it to ECPF.

      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-Switch, Replace host_params event with functions_changed event · cd56f929
      Vu Pham authored
      To support SR-IOV on an E-Switch manager, num_vfs is queried
      from the firmware whenever the E-Switch manager is notified by
      the esw_functions_changed event.
      
      Replace host_params event with esw_functions_changed event that reflects
      more appropriate naming.
      
      While at it, also correct num_vfs type from int to u16 as expected by
      the function mlx5_esw_query_functions().

      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Bodong Wang <bodong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Introduce termination table bits · c6d4e45d
      Eli Britstein authored
      Termination table is a flow table with a termination flag. The flag
      allows the firmware to assume that the specified actions are the last
      actions list. This assumption allows the FW to safely perform potential
      looping logic (e.g. hairpin). Introduce the bits for this attribute.

      Signed-off-by: Eli Britstein <elibr@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add core dump register access HW bits · 0b9055a1
      Moshe Shemesh authored
      Add Firmware core dump registers and HW definitions.

      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b4b12b0d
      David S. Miller authored
      The phylink conflict was between a bug fix by Russell King
      to make sure we have a consistent PHY interface mode, and
      a change in net-next to pull some code in phylink_resolve()
      into the helper functions phylink_mac_link_{up,down}()
      
      On the dp83867 side it's mostly overlapping changes, with
      the 'net' side removing a condition that was supposed to
      trigger for RGMII but because of how it was coded never
      actually could trigger.

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y · c9bb6165
      Pablo Neira Ayuso authored
      This patch fixes a few problems with CONFIG_IPV6=y and
      CONFIG_NF_CONNTRACK_BRIDGE=m:
      
      In file included from net/netfilter/utils.c:5:
      include/linux/netfilter_ipv6.h: In function 'nf_ipv6_br_defrag':
      include/linux/netfilter_ipv6.h:110:9: error: implicit declaration of function 'nf_ct_frag6_gather'; did you mean 'nf_ct_attach'? [-Werror=implicit-function-declaration]
      
      And these too:
      
      net/ipv6/netfilter.c:242:2: error: unknown field 'br_defrag' specified in initializer
      net/ipv6/netfilter.c:243:2: error: unknown field 'br_fragment' specified in initializer
      
      This patch includes an original chunk from wenxu.
      
      Fixes: 764dd163 ("netfilter: nf_conntrack_bridge: add support for IPv6")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Reported-by: Yuehaibing <yuehaibing@huawei.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 036e3431
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix OOPS during nf_tables rule dump, from Florian Westphal.
      
       2) Use after free in ip_vs_in, from Yue Haibing.
      
       3) Fix various kTLS bugs (NULL deref during device removal resync,
          netdev notification ignoring, etc.) From Jakub Kicinski.
      
       4) Fix ipv6 redirects with VRF, from David Ahern.
      
       5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet.
      
       6) Missing memory allocation failure check in ip6_ra_control(), from
          Gen Zhang. And likewise fix ip_ra_control().
      
       7) TX clean budget logic error in aquantia, from Igor Russkikh.
      
       8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet.
      
       9) Double frees in mlx5, from Parav Pandit.
      
      10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit.
      
      11) Fix botched register access in mvpp2, from Antoine Tenart.
      
      12) Use after free in napi_gro_frags(), from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits)
        net: correct zerocopy refcnt with udp MSG_MORE
        ethtool: Check for vlan etype or vlan tci when parsing flow_rule
        net: don't clear sock->sk early to avoid trouble in strparser
        net-gro: fix use-after-free read in napi_gro_frags()
        net: dsa: tag_8021q: Create a stable binary format
        net: dsa: tag_8021q: Change order of rx_vid setup
        net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
        ipv4: tcp_input: fix stack out of bounds when parsing TCP options.
        mlxsw: spectrum: Prevent force of 56G
        mlxsw: spectrum_acl: Avoid warning after identical rules insertion
        net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
        r8169: fix MAC address being lost in PCI D3
        net: core: support XDP generic on stacked devices.
        netvsc: unshare skb in VF rx handler
        udp: Avoid post-GRO UDP checksum recalculation
        net: phy: dp83867: Set up RGMII TX delay
        net: phy: dp83867: do not call config_init twice
        net: phy: dp83867: increase SGMII autoneg timer duration
        net: phy: dp83867: fix speed 10 in sgmii mode
        net: phy: marvell10g: report if the PHY fails to boot firmware
        ...
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · adc3f554
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The fixes are still trickling in for arm64, but the only really
        significant one here is actually fixing a regression in the botched
        module relocation range checking merged for -rc2.
      
        Hopefully we've nailed it this time.
      
         - Fix implementation of our set_personality() system call, which
           wasn't being wrapped properly
      
         - Fix system call function types to keep CFI happy
      
         - Fix siginfo layout when delivering SIGKILL after a kernel fault
      
         - Really fix module relocation range checking"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: use the correct function type for __arm64_sys_ni_syscall
        arm64: use the correct function type in SYSCALL_DEFINE0
        arm64: fix syscall_fn_t type
        signal/arm64: Use force_sig not force_sig_fault for SIGKILL
        arm64/module: revert to unsigned interpretation of ABS16/32 relocations
        arm64: Fix the arm64_personality() syscall wrapper redirection
    • Merge tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 318adf8e
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A few more fixes for bugs reported by users, fuzzing tools and
        regressions:
      
         - fix crashes in relocation:
             + resuming interrupted balance operation does not properly clean
               up orphan trees
             + with enabled qgroups, resuming needs to be more careful about
               block groups due to limited context when updating qgroups
      
         - fsync and logging fixes found by fuzzing
      
         - incremental send fixes for no-holes and clone
      
         - fix spin lock type used in timer function for zstd"
      
      * tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix race updating log root item during fsync
        Btrfs: fix wrong ctime and mtime of a directory after log replay
        Btrfs: fix fsync not persisting changed attributes of a directory
        btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference
        btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON()
        Btrfs: incremental send, fix emission of invalid clone operations
        Btrfs: incremental send, fix file corruption when no-holes feature is enabled
        btrfs: correct zstd workspace manager lock to use spin_lock_bh()
        btrfs: Ensure replaced device doesn't have pending chunk allocation
    • Merge tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfs · 8cb7104d
      Linus Torvalds authored
      Pull configfs fix from Christoph Hellwig:
      
       - fix a use after free in configfs_d_iput (Sahitya Tummala)
      
      * tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfs:
        configfs: Fix use-after-free when accessing sd->s_dentry
    • Merge tag 'sound-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c5ba1712
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "No big surprises here, just a few device-specific fixes.
      
        HD-audio received several fixes for Acer, Dell, Huawei and other
        laptops as well as the workaround for the new Intel chipset. One
        significant one-liner fix is the disablement of the node-power saving
        on Realtek codecs, which may potentially cover annoying bugs like the
        background noises or click noises on many devices.
      
        Other than that, a fix for FireWire bit definitions, and another fix
        for LINE6 USB audio bug that was discovered by syzkaller"
      
      * tag 'sound-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: fireface: Use ULL suffixes for 64-bit constants
        ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops
        ALSA: line6: Assure canceling delayed work at disconnection
        ALSA: hda - Force polling mode on CNL for fixing codec communication
        ALSA: hda/realtek - Enable micmute LED for Huawei laptops
        ALSA: hda/realtek - Set default power save node to 0
        ALSA: hda/realtek - Check headset type by unplug and resume
  2. 30 May, 2019 18 commits
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 20f94496
      Linus Torvalds authored
      Pull clk driver fixes from Stephen Boyd:
      
       - Don't expose the SiFive clk driver on non-RISCV architectures
      
       - Fix some bits describing clks in the imx8mm driver
      
       - Always call clk domain code in the TI driver so non-legacy platforms
         work
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: ti: clkctrl: Fix clkdm_clk handling
        clk: imx: imx8mm: fix int pll clk gate
        clk: sifive: restrict Kconfig scope for the FU540 PRCI driver
    • net: correct zerocopy refcnt with udp MSG_MORE · 100f6d8e
      Willem de Bruijn authored
      TCP zerocopy takes a uarg reference for every skb, plus one for the
      tcp_sendmsg_locked datapath temporarily, to avoid reaching refcnt zero
      as it builds, sends and frees skbs inside its inner loop.
      
      UDP and RAW zerocopy do not send inside the inner loop so do not need
      the extra sock_zerocopy_get + sock_zerocopy_put pair. Commit
      52900d22288ed ("udp: elide zerocopy operation in hot path") introduced
      extra_uref to pass the initial reference taken in sock_zerocopy_alloc
      to the first generated skb.
      
      But, sock_zerocopy_realloc takes this extra reference at the start of
      every call. With MSG_MORE, no new skb may be generated to attach the
      extra_uref to, so refcnt is incorrectly 2 with only one skb.
      
      Do not take the extra ref if uarg && !tcp, which implies MSG_MORE.
      Update extra_uref accordingly.
      
      This conditional assignment triggers a false-positive "may be used
      uninitialized" warning, so extra_uref has to be initialized at its definition.
      
      Changes v1->v2: fix typo in Fixes SHA1
      
      Fixes: 52900d22 ("udp: elide zerocopy operation in hot path")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Diagnosed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 7b3ed2a1
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2019-05-30
      
      This series contains updates to ice driver only.
      
      Brett continues his work with interrupt handling by fixing an issue
      where we were writing to the incorrect register to disable all VF
      interrupts.
      
      Tony consolidates the unicast and multicast MAC filters into a single
      new function.
      
      Anirudh adds support for virtual channel vector mapping to receive and
      transmit queues.  This uses a bitmap to associate indicated queues with
      the specified vector.  He also makes several cosmetic code cleanups and
      updates the driver to align with the current specification for managing
      MAC operation codes (opcodes).
      
      Paul adds support for Forward Error Correction (FEC) and also adds the
      ethtool get and set handlers to modify FEC parameters.
      
      Bruce cleans up the driver code to fix a number of issues, such as
      reducing the scope of some local variables, reducing the number of
      de-references by changing a local variable, and reordering the code to
      remove unnecessary "goto's".

      Dave adds switch rules to be able to handle LLDP packets and, in the
      process, fixes a couple of issues found, such as treating a DCBx state of
      "not started" as an error and hard coding the filter information
      flag to transmit.
      
      Jacob updates the driver to allow for more granular debugging by
      developers by using a distinct separate bit for dumping firmware logs.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: act_ctinfo: minor size optimisation · 84a32ede
      Kevin 'ldir' Darbyshire-Bryant authored
      Since the new parameter block is initialised to 0 by kzalloc, we don't
      need to mask & clear unused operational mode bits; they are already
      unset.
      
      Drop the pointless code.

      Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Check for vlan etype or vlan tci when parsing flow_rule · b73484b2
      Maxime Chevallier authored
      When parsing an ethtool flow spec to build a flow_rule, the code checks
      if both the vlan etype and the vlan tci are specified by the user to add
      a FLOW_DISSECTOR_KEY_VLAN match.
      
      However, when the user only specified a vlan etype or a vlan tci, this
      check silently ignores these parameters.
      
      For example, the following rule :
      
      ethtool -N eth0 flow-type udp4 vlan 0x0010 action -1 loc 0
      
      will result in no error being issued, but the equivalent rule will be
      created and passed to the NIC driver :
      
      ethtool -N eth0 flow-type udp4 action -1 loc 0
      
      In the end, neither the NIC driver using the rule nor the end user has
      a way to know that these keys were dropped along the way, or that
      incorrect parameters were entered.
      
      This kind of check should be left to either the driver, or the ethtool
      flow spec layer.
      
      This commit makes it so that ethtool parameters are forwarded as-is to the
      NIC driver.
      
      Since none of the users of ethtool_rx_flow_rule_create are using the
      VLAN dissector, I don't think this qualifies as a regression.
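
      The behaviour change can be sketched as the difference between an
      "and" and an "or" on the two VLAN masks. This is one plausible
      reading of the description above, not the kernel code: vlan_keys and
      the dictionary-based spec are hypothetical, and returning only the
      non-zero fields is a simplification made for the example.

```python
from typing import Dict, List


def vlan_keys(spec: Dict[str, int], require_both: bool) -> List[str]:
    """Return the VLAN fields that would become part of the flow_rule match.

    spec holds the masks given by the user; a mask of 0 means "not given".
    """
    etype = spec.get("vlan_etype", 0)
    tci = spec.get("vlan_tci", 0)
    present = (etype and tci) if require_both else (etype or tci)
    if not present:
        return []          # the VLAN match is silently left out
    fields = (("vlan_etype", etype), ("vlan_tci", tci))
    return [name for name, mask in fields if mask]
```

      With require_both=True a spec that carries only a vlan tci yields no
      VLAN match at all, which mirrors the silently dropped parameter in
      the example rule; with require_both=False the tci is forwarded.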
      
      Fixes: eca4205f ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Acked-by: Pablo Neira Ayuso <pablo@gnumonks.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'complex-c45-phys' · 655887fe
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: improve handling of more complex C45 PHY's
      
      This series tries to address a few problematic aspects raised by
      Russell. A concrete example is the Marvell 88x3310; the changes
      should be helpful for other complex C45 PHY's too.

      v2:
      - added patch enabling interrupts also if phylib state machine
        isn't started
      - removed patch dealing with the double link status read
        This one needs a little bit more thinking and will go separately.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: export phy_queue_state_machine · 97b33bdf
      Heiner Kallweit authored
      We face the issue that link change interrupt and link status may be
      reported by different PHY layers. As a result the link change
      interrupt may occur before the link status changes.
      Export phy_queue_state_machine to allow PHY drivers to specify a
      delay between link status change interrupt and link status check.
      
      v2:
      - change jiffies parameter type to unsigned long

      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add callback for custom interrupt handler to struct phy_driver · 49644e68
      Heiner Kallweit authored
      The phylib interrupt handler handles link change events only currently.
      However PHY drivers may want to use other interrupt sources too,
      e.g. to report temperature monitoring events. Therefore add a callback
      to struct phy_driver allowing PHY drivers to implement a custom
      interrupt handler.
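
      The dispatch idea can be sketched as an optional callback that, when
      present, takes over from the default link-change handling. This is a
      userspace model under assumptions, not phylib code: PhyDriver,
      phy_interrupt and the string log are hypothetical stand-ins.

```python
from typing import Callable, List, Optional


class PhyDriver:
    """Hypothetical stand-in for a PHY driver with an optional callback."""

    def __init__(self,
                 handle_interrupt: Optional[Callable[[], None]] = None) -> None:
        self.handle_interrupt = handle_interrupt


def phy_interrupt(driver: PhyDriver, log: List[str]) -> None:
    """Use the driver's custom handler if it has one, else the default path."""
    if driver.handle_interrupt is not None:
        driver.handle_interrupt()       # driver decides what the IRQ meant
        return
    log.append("link-change")           # default: treat it as a link change
```

      A driver that wants temperature events would supply its own handler;
      drivers without one keep the existing link-change behaviour.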

      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: enable interrupts when PHY is attached already · 07b09289
      Heiner Kallweit authored
      This patch is a step towards allowing PHY drivers to handle more
      interrupt sources than just link change. E.g. several PHY's have
      built-in temperature monitoring and can raise an interrupt if a
      temperature threshold is exceeded. We may be interested in such
      interrupts also if the phylib state machine isn't started.
      Therefore move enabling interrupts to phy_request_interrupt().
      
      v2:
      - patch added to series
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07b09289
    • qed: Fix static checker warning · 8e2ea3ea
      Michal Kalderon authored
      In some cases abs_ppfid could be printed without being initialized.
      
      Fixes: 79284ade ("qed: Add llh ppfid interface and 100g support for offload protocols")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e2ea3ea
    • net: dsa: Add error path handling in dsa_tree_setup() · e70c7aad
      Ioana Ciornei authored
      If a call to dsa_tree_setup() fails, an attempt to clean up is made
      by calling dsa_tree_remove_switch(), which should take care of
      removing/unregistering any resources previously allocated. This does not
      happen because it is conditional on dst->setup being true, which is set
      only after _all_ setup steps have been performed successfully.
      
      This is especially interesting when the internal MDIO bus is registered
      but afterwards, a port setup fails and the mdiobus_unregister() is never
      called. This leads to a BUG_ON() complaining about the fact that it's
      trying to free an MDIO bus that's still registered.
      
      Add proper error handling in all functions branching from
      dsa_tree_setup().
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e70c7aad
    • net: don't clear sock->sk early to avoid trouble in strparser · 2b81f816
      Jakub Kicinski authored
      af_inet sets sock->sk to NULL, which trips up strparser:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000012
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21
      RIP: 0010:tcp_peek_len+0x10/0x60
      RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051
      RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480
      RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000
      R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40
      R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020
      FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
       <IRQ>
       strp_data_ready+0x48/0x90
       tls_data_ready+0x22/0xd0 [tls]
       tcp_rcv_established+0x569/0x620
       tcp_v4_do_rcv+0x127/0x1e0
       tcp_v4_rcv+0xad7/0xbf0
       ip_protocol_deliver_rcu+0x2c/0x1c0
       ip_local_deliver_finish+0x41/0x50
       ip_local_deliver+0x6b/0xe0
       ? ip_protocol_deliver_rcu+0x1c0/0x1c0
       ip_rcv+0x52/0xd0
       ? ip_rcv_finish_core.isra.20+0x380/0x380
       __netif_receive_skb_one_core+0x7e/0x90
       netif_receive_skb_internal+0x42/0xf0
       napi_gro_receive+0xed/0x150
       nfp_net_poll+0x7a2/0xd30 [nfp]
       ? kmem_cache_free_bulk+0x286/0x310
       net_rx_action+0x149/0x3b0
       __do_softirq+0xe3/0x30a
       ? handle_irq_event_percpu+0x6a/0x80
       irq_exit+0xe8/0xf0
       do_IRQ+0x85/0xd0
       common_interrupt+0xf/0xf
       </IRQ>
      RIP: 0010:cpuidle_enter_state+0xbc/0x450
      
      To avoid this issue, clear sock->sk only after sk_prot->close.
      My grepping and testing did not discover any code which
      would depend on the current behaviour.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Reported-by: David Beckett <david.beckett@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b81f816
    • net-gro: fix use-after-free read in napi_gro_frags() · a4270d67
      Eric Dumazet authored
      If a network driver provides to napi_gro_frags() an
      skb with a page fragment of exactly 14 bytes, the call
      to gro_pull_from_frag0() will 'consume' the fragment
      by calling skb_frag_unref(skb, 0), and the page might
      be freed and reused.
      
      Reading eth->h_proto at the end of napi_frags_skb() might
      read mangled data, or crash under specific debugging features.
      
      BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
      BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
      Read of size 2 at addr ffff88809366840c by task syz-executor599/8957
      
      CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
       napi_frags_skb net/core/dev.c:5833 [inline]
       napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
       tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
      
      Fixes: a50e233c ("net-gro: restore frag0 optimization")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4270d67
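The hazard in the commit above is reading a header field through a pointer into a fragment that has already been consumed. One general way to avoid that class of bug is to read only from storage the packet owns after the pull. The sketch below illustrates that idea in userspace C with a 14-byte Ethernet header; the demo_* types are hypothetical and this is not claimed to be the change the patch makes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ETH_HLEN 14

/* Hypothetical fragment: a heap buffer that is released once fully consumed. */
struct demo_frag {
	uint8_t *base;	/* NULL once released */
	size_t off;	/* bytes already pulled */
	size_t len;	/* bytes remaining */
};

struct demo_pkt {
	uint8_t hdr[DEMO_ETH_HLEN];	/* storage owned by the packet */
	struct demo_frag frag;
};

/* Copy the header out of the fragment, release the fragment if the pull
 * consumed all of it, and read the protocol from the packet-owned copy.
 * The fragment memory is never touched after it may have been freed. */
static uint16_t demo_pull_and_read_proto(struct demo_pkt *pkt)
{
	struct demo_frag *f = &pkt->frag;

	assert(f->base != NULL && f->len >= DEMO_ETH_HLEN);

	memcpy(pkt->hdr, f->base + f->off, DEMO_ETH_HLEN);
	f->off += DEMO_ETH_HLEN;
	f->len -= DEMO_ETH_HLEN;
	if (f->len == 0) {
		free(f->base);
		f->base = NULL;
	}

	/* Bytes 12..13 of an Ethernet header hold the EtherType (big endian). */
	return (uint16_t)((pkt->hdr[12] << 8) | pkt->hdr[13]);
}
```

The case that matters is a fragment of exactly DEMO_ETH_HLEN bytes: the pull frees it, so any later read through a pointer into the fragment would be a use-after-free.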
    • Merge branch 'Fixes-for-DSA-tagging-using-802-1Q' · c3bc6deb
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fixes for DSA tagging using 802.1Q
      
      During the prototyping for the "Decoupling PHYLINK from struct
      net_device" patchset, the CPU port of the sja1105 driver was moved to a
      different spot.  This uncovered an issue in the tag_8021q DSA code,
      which used to work by mistake - the CPU port was the last hardware port
      numerically, and this was masking an ordering issue which is very likely
      to be seen in other drivers that make use of 802.1Q tags.
      
      A question was also raised whether the VID numbers bear any meaning, and
      the conclusion was that they don't, at least not in an absolute sense.
      The second patch defines bit fields inside the DSA 802.1Q VID so that
      tcpdump can decode it unambiguously (although the meaning is now clear
      even by visual inspection).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3bc6deb
    • net: dsa: tag_8021q: Create a stable binary format · 0471dd42
      Vladimir Oltean authored
      Tools like tcpdump need to be able to decode the significance of fake
      VLAN headers that DSA uses to separate switch ports.
      
      But currently these have no global significance - they are simply an
      ordered list of DSA_MAX_SWITCHES x DSA_MAX_PORTS numbers ending at 4095.
      
      The reason why this is submitted as a fix is that the existing mapping
      of VIDs should not enter into a stable kernel, so we can pretend that
      only the new format exists. This way tcpdump won't need to try to make
      something out of the VLAN tags on 5.2 kernels.
      
      Fixes: f9bbe447 ("net: dsa: Optional VLAN-based port separation for switches without tagging")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0471dd42
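To show what a stable binary format for a 12-bit VID can look like, here is a small sketch that packs a direction, a switch index and a port index into fixed bit fields. The field positions and widths below are illustrative assumptions chosen for this example and are not taken from the commit message; the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (illustrative only):
 *   bits 11:10 direction, bits 8:6 switch index, bits 3:0 port index. */
#define DEMO_DIR_SHIFT	10
#define DEMO_DIR_MASK	0x3u
#define DEMO_SW_SHIFT	6
#define DEMO_SW_MASK	0x7u
#define DEMO_PORT_SHIFT	0
#define DEMO_PORT_MASK	0xFu

enum { DEMO_DIR_RX = 1, DEMO_DIR_TX = 2 };

/* Pack the three fields into one VID value. */
static uint16_t demo_vid_encode(unsigned int dir, unsigned int sw,
				unsigned int port)
{
	assert(dir <= DEMO_DIR_MASK && sw <= DEMO_SW_MASK &&
	       port <= DEMO_PORT_MASK);

	return (uint16_t)((dir << DEMO_DIR_SHIFT) |
			  (sw << DEMO_SW_SHIFT) |
			  (port << DEMO_PORT_SHIFT));
}

/* Decoders a capture tool could use without knowing the driver's state. */
static unsigned int demo_vid_dir(uint16_t vid)
{
	return (vid >> DEMO_DIR_SHIFT) & DEMO_DIR_MASK;
}

static unsigned int demo_vid_switch(uint16_t vid)
{
	return (vid >> DEMO_SW_SHIFT) & DEMO_SW_MASK;
}

static unsigned int demo_vid_port(uint16_t vid)
{
	return (vid >> DEMO_PORT_SHIFT) & DEMO_PORT_MASK;
}
```

Because each field sits at a fixed position, a decoder needs no per-system table, unlike an ordered list of numbers counted back from 4095.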
    • net: dsa: tag_8021q: Change order of rx_vid setup · d34d2baa
      Ioana Ciornei authored
      The 802.1Q tagging performs an unbalanced setup in terms of RX VIDs on
      the CPU port. For the ingress path of an 802.1Q switch to work, the RX
      VID of a port needs to be seen as tagged egress on the CPU port.
      
      While configuring the other front-panel ports to be part of this VID,
      for bridge scenarios, the untagged flag is applied even on the CPU port
      in dsa_switch_vlan_add.  This happens because DSA applies the same flags
      on the CPU port as on the (bridge-controlled) slave ports, and the
      effect in this case is that the CPU port tagged settings get deleted.
      
      Instead of fixing DSA by introducing a way to control VLAN flags on the
      CPU port (and hence stop inheriting from the slave ports) - a hard,
      perhaps intractable problem - avoid this situation by moving the setup
      part of the RX VID on the CPU port after all the other front-panel ports
      have been added to the VID.
      
      Fixes: f9bbe447 ("net: dsa: Optional VLAN-based port separation for switches without tagging")
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d34d2baa
    • Merge branch 'r8169-fw' · 1b0b807d
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      r8169: decouple firmware handling code from actual driver code
      
      These two patches are a step towards eventually factoring out firmware
      handling code to a separate source file.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b0b807d
    • r8169: decouple rtl_phy_write_fw from actual driver code · ce8843ab
      Heiner Kallweit authored
      This patch is a further step towards decoupling firmware handling from
      the actual driver code. Firmware can be for PHY and/or MAC, and two
      pairs of read/write functions are needed for handling PHY firmware and
      MAC firmware respectively. Pass these functions via struct rtl_fw and
      avoid the ugly switching of mdio_ops behind the back of rtl_writephy().
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce8843ab
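The idea of passing both accessor pairs through the firmware structure, instead of swapping a global set of operations behind the caller's back, can be sketched as below. This is a userspace illustration under stated assumptions: struct demo_fw, struct demo_fw_op and the register arrays are hypothetical and do not mirror the layout of the kernel's struct rtl_fw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register file standing in for real hardware. */
struct demo_regs {
	int phy[8];
	int mac[8];
};

typedef void (*demo_write_fn)(struct demo_regs *regs, int reg, int val);
typedef int (*demo_read_fn)(struct demo_regs *regs, int reg);

/* Firmware context carrying one read/write pair per target. */
struct demo_fw {
	demo_write_fn phy_write;
	demo_read_fn phy_read;
	demo_write_fn mac_write;
	demo_read_fn mac_read;
	struct demo_regs *regs;
};

/* One firmware action: write val to reg on the PHY or the MAC. */
struct demo_fw_op {
	int is_mac;
	int reg;
	int val;
};

static void demo_phy_write(struct demo_regs *r, int reg, int val) { r->phy[reg] = val; }
static int demo_phy_read(struct demo_regs *r, int reg) { return r->phy[reg]; }
static void demo_mac_write(struct demo_regs *r, int reg, int val) { r->mac[reg] = val; }
static int demo_mac_read(struct demo_regs *r, int reg) { return r->mac[reg]; }

/* Apply the actions using only the accessors carried in fw, so the firmware
 * code needs no knowledge of the driver's own register helpers. */
static void demo_fw_apply(const struct demo_fw *fw,
			  const struct demo_fw_op *ops, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(ops[i].reg >= 0 && ops[i].reg < 8);
		if (ops[i].is_mac)
			fw->mac_write(fw->regs, ops[i].reg, ops[i].val);
		else
			fw->phy_write(fw->regs, ops[i].reg, ops[i].val);
	}
}
```

Since the firmware code sees only the function pointers it was given, it could later move to a separate source file without pulling in driver internals, which is the stated direction of the series.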