1. 04 Mar, 2019 40 commits
    • team: Free BPF filter when unregistering netdev · 692c31bd
      Ido Schimmel authored
      When team is used in loadbalance mode a BPF filter can be used to
      provide a hash which will determine the Tx port.
      
      When the netdev is later unregistered the filter is not freed which
      results in memory leaks [1].
      
      Fix by freeing the program and the corresponding filter when
      unregistering the netdev.
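
The ownership rule behind this fix can be sketched in user space. This is only a model, not the team driver code: `struct lb_model`, `lb_filter_set()`, `lb_filter_free()` and the allocation sizes are hypothetical placeholders. It shows two objects allocated when a filter is installed, both of which must be released on teardown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: installing a filter creates two objects. */
struct lb_model {
	void *orig_filter;	/* copy of the user-supplied filter */
	void *prog;		/* program built from that filter */
};

static int live_allocs;	/* allocations that have not been freed yet */

static void *tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Install a filter; sizes are arbitrary placeholders. */
static int lb_filter_set(struct lb_model *lb)
{
	lb->orig_filter = tracked_alloc(16);
	lb->prog = tracked_alloc(2048);
	if (!lb->orig_filter || !lb->prog) {
		tracked_free(lb->orig_filter);
		tracked_free(lb->prog);
		lb->orig_filter = lb->prog = NULL;
		return -1;
	}
	return 0;
}

/* Teardown: release both objects so nothing stays unreferenced. */
static void lb_filter_free(struct lb_model *lb)
{
	tracked_free(lb->prog);
	tracked_free(lb->orig_filter);
	lb->prog = NULL;
	lb->orig_filter = NULL;
}
```

Calling `lb_filter_set()` followed by `lb_filter_free()` should leave `live_allocs` at zero. Skipping the teardown call leaves two live allocations, which matches the shape of the two kmemleak reports.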
      
      [1]
      unreferenced object 0xffff8881dbc47cc8 (size 16):
        comm "teamd", pid 3068, jiffies 4294997779 (age 438.247s)
        hex dump (first 16 bytes):
          a3 00 6b 6b 6b 6b 6b 6b 88 a5 82 e1 81 88 ff ff  ..kkkkkk........
        backtrace:
          [<000000008a3b47e3>] team_nl_cmd_options_set+0x88f/0x11b0
          [<00000000c4f4f27e>] genl_family_rcv_msg+0x78f/0x1080
          [<00000000610ef838>] genl_rcv_msg+0xca/0x170
          [<00000000a281df93>] netlink_rcv_skb+0x132/0x380
          [<000000004d9448a2>] genl_rcv+0x29/0x40
          [<000000000321b2f4>] netlink_unicast+0x4c0/0x690
          [<000000008c25dffb>] netlink_sendmsg+0x929/0xe10
          [<00000000068298c5>] sock_sendmsg+0xc8/0x110
          [<0000000082a61ff0>] ___sys_sendmsg+0x77a/0x8f0
          [<00000000663ae29d>] __sys_sendmsg+0xf7/0x250
          [<0000000027c5f11a>] do_syscall_64+0x14d/0x610
          [<000000006cfbc8d3>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<00000000e23197e2>] 0xffffffffffffffff
      unreferenced object 0xffff8881e182a588 (size 2048):
        comm "teamd", pid 3068, jiffies 4294997780 (age 438.247s)
        hex dump (first 32 bytes):
          20 00 00 00 02 00 00 00 30 00 00 00 28 f0 ff ff   .......0...(...
          07 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00  ........(.......
        backtrace:
          [<000000002daf01fb>] lb_bpf_func_set+0x45c/0x6d0
          [<000000008a3b47e3>] team_nl_cmd_options_set+0x88f/0x11b0
          [<00000000c4f4f27e>] genl_family_rcv_msg+0x78f/0x1080
          [<00000000610ef838>] genl_rcv_msg+0xca/0x170
          [<00000000a281df93>] netlink_rcv_skb+0x132/0x380
          [<000000004d9448a2>] genl_rcv+0x29/0x40
          [<000000000321b2f4>] netlink_unicast+0x4c0/0x690
          [<000000008c25dffb>] netlink_sendmsg+0x929/0xe10
          [<00000000068298c5>] sock_sendmsg+0xc8/0x110
          [<0000000082a61ff0>] ___sys_sendmsg+0x77a/0x8f0
          [<00000000663ae29d>] __sys_sendmsg+0xf7/0x250
          [<0000000027c5f11a>] do_syscall_64+0x14d/0x610
          [<000000006cfbc8d3>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<00000000e23197e2>] 0xffffffffffffffff
      
      Fixes: 01d7f30a ("team: add loadbalance mode")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6mr: Do not call __IP6_INC_STATS() from preemptible context · 87c11f1d
      Ido Schimmel authored
      Similar to commit 44f49dd8 ("ipmr: fix possible race resulting from
      improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot
      assume preemption is disabled when incrementing the counter and
      accessing a per-CPU variable.
      
      Preemption can be enabled when we add a route in process context that
      corresponds to packets stored in the unresolved queue, which are then
      forwarded using this route [1].
      
      Fix this by using IP6_INC_STATS() which takes care of disabling
      preemption on architectures where it is needed.
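
The difference between the two counter variants can be illustrated with a small user-space model. It is a sketch under stated assumptions, not kernel code: `preempt_count`, `inc_stats_unsafe()` and `inc_stats_safe()` are hypothetical stand-ins, and the explicit disable/enable pair models only the generic fallback used where the architecture needs it.

```c
#include <assert.h>

static int preempt_count;		/* > 0 means preemption is disabled */
static unsigned long mib_counter;	/* stands in for one per-CPU counter */
static int splat_count;			/* how often the debug check would fire */

static void preempt_disable_model(void)
{
	preempt_count++;
}

static void preempt_enable_model(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Models the double-underscore variant: the caller must already have
 * disabled preemption, otherwise the debug check complains. */
static void inc_stats_unsafe(void)
{
	if (preempt_count == 0)
		splat_count++;	/* "BUG: using __this_cpu_add() in preemptible" */
	mib_counter++;
}

/* Models the plain variant: safe to call from preemptible context. */
static void inc_stats_safe(void)
{
	preempt_disable_model();
	mib_counter++;
	preempt_enable_model();
}
```

In this model, calling `inc_stats_unsafe()` from "process context" (count of zero) records a splat, while `inc_stats_safe()` does not and leaves the preemption count unchanged.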
      
      [1]
      [  157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314
      [  157.460409] caller is ip6mr_forward2+0x73e/0x10e0
      [  157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336
      [  157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  157.460461] Call Trace:
      [  157.460486]  dump_stack+0xf9/0x1be
      [  157.460553]  check_preemption_disabled+0x1d6/0x200
      [  157.460576]  ip6mr_forward2+0x73e/0x10e0
      [  157.460705]  ip6_mr_forward+0x9a0/0x1510
      [  157.460771]  ip6mr_mfc_add+0x16b3/0x1e00
      [  157.461155]  ip6_mroute_setsockopt+0x3cb/0x13c0
      [  157.461384]  do_ipv6_setsockopt.isra.8+0x348/0x4060
      [  157.462013]  ipv6_setsockopt+0x90/0x110
      [  157.462036]  rawv6_setsockopt+0x4a/0x120
      [  157.462058]  __sys_setsockopt+0x16b/0x340
      [  157.462198]  __x64_sys_setsockopt+0xbf/0x160
      [  157.462220]  do_syscall_64+0x14d/0x610
      [  157.462349]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 0912ea38 ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Amit Cohen <amitc@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: mISDN: Fix potential NULL pointer dereference of kzalloc · 38d22659
      Aditya Pakki authored
      Allocating memory for phi via kzalloc may fail and cause a
      NULL pointer dereference. This patch avoids such a scenario.
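
A minimal user-space sketch of the pattern, assuming the usual fix of checking the allocation before first use. `struct phi_model`, `phi_setup()`, the allocator hook and the `-ENOMEM` return value are hypothetical; the commit message does not say how the driver reports the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object the commit calls "phi". */
struct phi_model {
	int state;
};

/* Allocator hook so a failure can be simulated; defaults to calloc(). */
static void *(*zalloc_hook)(size_t n, size_t size) = calloc;

/* Always-failing allocator used to exercise the error path. */
static void *failing_zalloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

/* Allocate and initialise phi; return an error instead of touching NULL. */
static int phi_setup(struct phi_model **out)
{
	struct phi_model *phi = zalloc_hook(1, sizeof(*phi));

	if (!phi)
		return -ENOMEM;	/* bail out before the first dereference */

	phi->state = 1;
	*out = phi;
	return 0;
}
```

Pointing `zalloc_hook` at `failing_zalloc` makes `phi_setup()` return the error code without dereferencing the NULL pointer.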
      Signed-off-by: Aditya Pakki <pakki001@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs · 72d8b4fd
      Heiner Kallweit authored
      If an external PHY is connected via SGMII and uses in-band signalling
      then the auto-negotiated values aren't propagated to the port,
      resulting in a broken link. See discussion in [0]. This patch adds
      this propagation. We need to call mv88e6xxx_port_setup_mac(),
      therefore export it from chip.c.
      
      Successfully tested on a ZII DTU with 88E6390 switch and an
      Aquantia AQCS109 PHY connected via SGMII to port 9.
      
      [0] https://marc.info/?t=155130287200001&r=1&w=2

      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/chtls: Prefix adapter flags with CXGB4 · 80f61f19
      Arjun Vynipadath authored
      Some of these macros were conflicting with global namespace,
      hence prefixing them with CXGB4.
      Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-sysfs: Switch to bitmap_zalloc() · 29ca1c5a
      Andy Shevchenko authored
      Switch to bitmap_zalloc() to show clearly what we are allocating.
      In addition, it returns a pointer of bitmap type instead of an opaque void *.
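
For readers unfamiliar with the helper, here is a user-space approximation of what a zeroing bitmap allocator does. It is a sketch, not the kernel implementation: `bits_to_longs()`, `bitmap_zalloc_model()` and the bit helpers are hypothetical names, and the model takes no allocation-flags argument.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG_MODEL (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold nbits bits (rounded up). */
static size_t bits_to_longs(size_t nbits)
{
	return (nbits + BITS_PER_LONG_MODEL - 1) / BITS_PER_LONG_MODEL;
}

/* Zeroed bitmap sized in bits; the caller never computes a byte count
 * and gets back unsigned long * rather than void *. */
static unsigned long *bitmap_zalloc_model(size_t nbits)
{
	return calloc(bits_to_longs(nbits), sizeof(unsigned long));
}

static void bitmap_set_bit(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_LONG_MODEL] |= 1UL << (bit % BITS_PER_LONG_MODEL);
}

static int bitmap_test_bit(const unsigned long *map, size_t bit)
{
	return (map[bit / BITS_PER_LONG_MODEL] >> (bit % BITS_PER_LONG_MODEL)) & 1UL;
}
```

The call site then states its intent in bits, for example `bitmap_zalloc_model(100)`, instead of spelling out the rounding and the element size by hand.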
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mellanox: Switch to bitmap_zalloc() · 214fa1c4
      Andy Shevchenko authored
      Switch to bitmap_zalloc() to show clearly what we are allocating.
      In addition, it returns a pointer of bitmap type instead of an opaque void *.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · f7fb7c1a
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2019-03-04
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Add AF_XDP support to libbpf. Rationale is to facilitate writing
         AF_XDP applications by offering higher-level APIs that hide many
         of the details of the AF_XDP uapi. Sample programs are converted
         over to this new interface as well, from Magnus.
      
      2) Introduce a new cant_sleep() macro for annotation of functions
         that cannot sleep and use it in BPF_PROG_RUN() to assert that
         BPF programs run under preemption disabled context, from Peter.
      
      3) Introduce per BPF prog stats in order to monitor the usage
         of BPF; this is controlled by kernel.bpf_stats_enabled sysctl
         knob where monitoring tools can make use of this to efficiently
         determine the average cost of programs, from Alexei.
      
      4) Split up BPF selftest's test_progs similarly as we already
         did with test_verifier. This allows to further reduce merge
         conflicts in future and to get more structure into our
         quickly growing BPF selftest suite, from Stanislav.
      
      5) Fix a bug in BTF's dedup algorithm which can cause an infinite
         loop in some circumstances; also various BPF doc fixes and
         improvements, from Andrii.
      
      6) Various BPF sample cleanups and migration to libbpf in order
         to further isolate the old sample loader code (so we can get
         rid of it at some point), from Jakub.
      
      7) Add a new BPF helper for BPF cgroup skb progs that allows
         to set ECN CE code point and a Host Bandwidth Manager (HBM)
         sample program for limiting the bandwidth used by v2 cgroups,
         from Lawrence.
      
      8) Enable write access to skb->queue_mapping from tc BPF egress
         programs in order to let BPF pick TX queue, from Jesper.
      
      9) Fix a bug in BPF spinlock handling for map-in-map which did
         not propagate spin_lock_off to the meta map, from Yonghong.
      
      10) Fix a bug in the new per-CPU BPF prog counters to properly
          initialize stats for each CPU, from Eric.
      
      11) Add various BPF helper prototypes to selftest's bpf_helpers.h,
          from Willem.
      
      12) Fix various BPF samples bugs in XDP and tracing progs,
          from Toke, Daniel and Yonghong.
      
      13) Silence preemption splat in test_bpf after BPF_PROG_RUN()
          enforces it now everywhere, from Anders.
      
      14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to
          get error handling working, from Dan.
      
      15) Fix bpftool documentation and auto-completion with regards
          to stream_{verdict,parser} attach types, from Alban.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add test cases for non-pointer sanitiation logic · 87dab7c3
      Daniel Borkmann authored
      Add two additional tests for further asserting the
      BPF_ALU_NON_POINTER logic with cases that were missed
      previously.
      
      Cc: Marek Majkowski <marek@cloudflare.com>
      Cc: Arthur Fabre <afabre@cloudflare.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'mlxsw-minimal-Add-ethtool-and-resource-query-support' · 8c4238df
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: minimal: Add ethtool and resource query support
      
      Vadim says:
      
      The minimal driver is chip independent and uses I2C bus for chip access.
      Its purpose is to support chassis management on systems equipped with
      Mellanox switch ASICs. For example, from a BMC (Board Management
      Controller) device.
      
      Patches #1-#3 add ethtool support to the minimal driver so that QSFP/SFP
      module info could be retrieved by the driver. This is done by exposing a
      dummy netdev for each front panel port and implementing the required
      ethtool operations.
      
      Patches #4-#8 add resource query support. This allows the driver to
      query the firmware about values of certain resources (e.g., maximum
      number of ports). It is required on systems where the maximum number of
      ports is larger than the hard coded default (64).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: i2c: Extend initialization by querying resources data · 6a986993
      Vadim Pasternak authored
      Extend initialization flow by query requests for chip resources data in
      order to obtain chip's specific capabilities, like the number of ports.
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: i2c: Extend input parameters list of command API · 95b75cbd
      Vadim Pasternak authored
      Extend input parameters list of command API in mlxsw_i2c_cmd() in order
      to support initialization commands. Up until now, only access commands
      were supported by I2C driver.
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: i2c: Modify input parameter name in initialization API · f43d9d9b
      Vadim Pasternak authored
      Change input parameter name "resource" to "res" in mlxsw_i2c_init() in
      order to align it with mlxsw_pci_init().
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: i2c: Fix comment misspelling · 27758c80
      Vadim Pasternak authored
      Fix comment for mlxsw_i2c_write_cmd().
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Move resource query API to common location · e5ba7803
      Vadim Pasternak authored
      Move mlxsw_pci_resources_query() to a common location to allow reuse by
      the different drivers and over all the supported physical buses. Rename
      it to mlxsw_core_resources_query().
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: minimal: Add ethtool support · c100e47c
      Vadim Pasternak authored
      The minimal driver is chip independent and uses I2C bus for chip access.
      Its purpose is to support chassis management on systems equipped with
      Mellanox switch ASICs. For example from BMC (Board Management
      Controller) device.
      
      Expose a dummy netdev for each front panel port and implement basic
      ethtool operations to obtain QSFP/SFP module info through ethtool.
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: minimal: Make structures and variables names shorter · 1ded391d
      Vadim Pasternak authored
      Replace "mlxsw_minimal" by "mlxsw_m" in order to improve code
      readability.
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Move ethtool module callbacks to a common location · 1b1c6c1a
      Vadim Pasternak authored
      Move the implementation of ethtool module callbacks - .get_module_info()
      and .get_module_eeprom() - to a common location to allow reuse by the
      different mlxsw drivers.
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tls-Fix-issues-in-tls_device' · a9836336
      David S. Miller authored
      Boris Pismenny says:
      
      ====================
      tls: Fix issues in tls_device
      
      This series fixes issues encountered in tls_device code paths,
      which were introduced recently.
      
      Additionally, this series includes a fix for tls software only receive flow,
      which causes corruption of payload received by user space applications.
      
      This series was tested using the OpenSSL integration of KTLS -
      https://github.com/mellan
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: Fix tls_device receive · d069b780
      Boris Pismenny authored
      Currently, the receive function fails to handle records already
      decrypted by the device due to the commit mentioned below.
      
      This commit advances the TLS record sequence number and prepares the context
      to handle the next record.
      
      Fixes: fedf201e ("net: tls: Refactor control message handling on recv")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: Fix mixing between async capable and async · 7754bd63
      Eran Ben Elisha authored
      Today, tls_sw_recvmsg is capable of using asynchronous mode to handle
      application data TLS records. Moreover, it assumes that if the cipher
      can be handled asynchronously, then all packets will be processed
      asynchronously.
      
      However, this assumption is not always true. Specifically, for AES-GCM
      in TLS1.2, it causes data corruption, and breaks user applications.
      
      This patch fixes this problem by separating the async capability from
      the decryption operation result.
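
The distinction can be modelled in a few lines of user-space C. This is a sketch of the idea only, not the tls_sw code: `receive_records()`, the two decrypt callbacks and `struct rx_stats` are hypothetical. The point is that a record counts as pending only when the decrypt call actually reports `-EINPROGRESS`, regardless of whether async processing was allowed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct rx_stats {
	int inline_done;	/* records decrypted before the call returned */
	int async_pending;	/* records still in flight */
};

typedef int (*decrypt_fn)(int record, bool allow_async);

/* An async-capable cipher that nevertheless completes synchronously. */
static int decrypt_inline(int record, bool allow_async)
{
	(void)record;
	(void)allow_async;
	return 0;
}

/* A cipher that really defers the work when allowed to. */
static int decrypt_offloaded(int record, bool allow_async)
{
	(void)record;
	return allow_async ? -EINPROGRESS : 0;
}

/* Decide per record from the result, not from the capability flag. */
static int receive_records(decrypt_fn decrypt, bool async_capable,
			   int nrecords, struct rx_stats *st)
{
	int i;

	for (i = 0; i < nrecords; i++) {
		int err = decrypt(i, async_capable);

		if (err == -EINPROGRESS) {
			st->async_pending++;	/* data not ready yet */
			continue;
		}
		if (err)
			return err;
		st->inline_done++;		/* data usable immediately */
	}
	return 0;
}
```

With `decrypt_inline` and `async_capable` set to true, every record lands in `inline_done`. Treating those records as pending merely because the flag was set is the kind of mix-up the commit describes.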
      
      Fixes: c0ab4732 ("net/tls: Do not use async crypto for non-data records")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: Fix write space handling · 7463d3a2
      Boris Pismenny authored
      TLS device cannot use the sw context. This patch returns the original
      tls device write space handler and moves the sw/device specific portions
      to the relevant files.
      
      Also, we remove the write_space call for the tls_sw flow, because it
      handles partial records in its delayed tx work handler.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: Fix tls_device handling of partial records · 94850257
      Boris Pismenny authored
      Clean up the handling of partial records while fixing a bug where the
      tls_push_pending_closed_record function is using the software tls
      context instead of the hardware context.
      
      The bug resulted in the following crash:
      [   88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [   88.793271] #PF error: [normal kernel read fault]
      [   88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0
      [   88.795958] Oops: 0000 [#1] SMP PTI
      [   88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3
      [   88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [   88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls]
      [   88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd
      4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [   88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213
      [   88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000
      [   88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980
      [   88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700
      [   88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000
      [   88.814506] FS:  00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000
      [   88.816337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0
      [   88.819328] Call Trace:
      [   88.820123]  tls_push_data+0x628/0x6a0 [tls]
      [   88.821283]  ? remove_wait_queue+0x20/0x60
      [   88.822383]  ? n_tty_read+0x683/0x910
      [   88.823363]  tls_device_sendmsg+0x53/0xa0 [tls]
      [   88.824505]  sock_sendmsg+0x36/0x50
      [   88.825492]  sock_write_iter+0x87/0x100
      [   88.826521]  __vfs_write+0x127/0x1b0
      [   88.827499]  vfs_write+0xad/0x1b0
      [   88.828454]  ksys_write+0x52/0xc0
      [   88.829378]  do_syscall_64+0x5b/0x180
      [   88.830369]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   88.831603] RIP: 0033:0x7fcad1451680
      
      [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [ 1248.472564] #PF error: [normal kernel read fault]
      [ 1248.473790] PGD 0 P4D 0
      [ 1248.474642] Oops: 0000 [#1] SMP PTI
      [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G           OE 5.0.0-rc4+ #3
      [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls]
      [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c
      1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213
      [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006
      [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286
      [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1
      [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000
      [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000
      [ 1248.494923] FS:  0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000
      [ 1248.496847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0
      [ 1248.500136] Call Trace:
      [ 1248.500998]  ? tcp_check_oom+0xd0/0xd0
      [ 1248.502106]  tls_sk_proto_close+0x127/0x1e0 [tls]
      [ 1248.503411]  inet_release+0x3c/0x60
      [ 1248.504530]  __sock_release+0x3d/0xb0
      [ 1248.505611]  sock_close+0x11/0x20
      [ 1248.506612]  __fput+0xb4/0x220
      [ 1248.507559]  task_work_run+0x88/0xa0
      [ 1248.508617]  do_exit+0x2cb/0xbc0
      [ 1248.509597]  ? core_sys_select+0x17a/0x280
      [ 1248.510740]  do_group_exit+0x39/0xb0
      [ 1248.511789]  get_signal+0x1d0/0x630
      [ 1248.512823]  do_signal+0x36/0x620
      [ 1248.513822]  exit_to_usermode_loop+0x5c/0xc6
      [ 1248.515003]  do_syscall_64+0x157/0x180
      [ 1248.516094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1248.517456] RIP: 0033:0x7fb398bd3f53
      [ 1248.518537] Code: Bad RIP value.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-phy-clean-up-the-old-gen10g-functions' · 7d827379
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: clean up the old gen10g functions
      
      The old gen10g_ functions are mainly stubs and have been superseded
      by genphy_c45_ equivalents. So let's remove / hide the old functions
      as far as possible.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove gen10g_no_soft_reset · 7be3ad84
      Heiner Kallweit authored
      genphy_no_soft_reset and gen10g_no_soft_reset are both the same no-ops,
      one is enough.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: don't export gen10g_read_status · d81210c2
      Heiner Kallweit authored
      gen10g_read_status is deprecated, therefore stop exporting it.
      We don't want to encourage anybody to use it.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove gen10g_config_init · c5e91d39
      Heiner Kallweit authored
      ETHTOOL_LINK_MODE_10000baseT_Full_BIT is set anyway in the supported
      and advertising bitmap because it's part of PHY_10GBIT_FEATURES.
      And all users of gen10g_config_init use PHY_10GBIT_FEATURES.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove gen10g_suspend and gen10g_resume · a6d0aa97
      Heiner Kallweit authored
      phy_suspend() and phy_resume() are no-ops anyway if no callback is
      defined. Therefore we don't need these stubs.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: use genphy_c45_aneg_done in genphy_aneg_done · d7bed825
      Heiner Kallweit authored
      Now that we have it let's use genphy_c45_aneg_done() in phy_aneg_done().
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fsl/fman: Use vsprintf extension %pM · 6bfc1128
      Joe Perches authored
      Make logging of an ethernet address more consistent with
      the rest of the kernel.
      
      Miscellanea:
      
      The %02hx use also did not quite match the u8 definition
      of addr though that did not actually matter given normal
      integer promotion rules.
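
The promotion point can be shown in user space, where the kernel-only %pM extension is not available. The sketch below is an approximation: `format_mac()` is a hypothetical helper that produces the same colon-separated lowercase hex layout by hand, and the explicit casts make the promoted argument type visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAC_STR_LEN 18	/* "xx:xx:xx:xx:xx:xx" plus the terminating NUL */

/* Format a 6-byte Ethernet address as colon-separated lowercase hex. */
static int format_mac(char *buf, size_t len, const uint8_t addr[6])
{
	/* A uint8_t passed through "..." is promoted to int, so a plain
	 * %02x conversion is enough; no h length modifier is needed. */
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			(unsigned int)addr[0], (unsigned int)addr[1],
			(unsigned int)addr[2], (unsigned int)addr[3],
			(unsigned int)addr[4], (unsigned int)addr[5]);
}
```

In kernel code the whole format string collapses to a single %pM with the address pointer as its argument, which is the consistency gain the commit refers to.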
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE · 9036b2fe
      Francesco Ruggeri authored
      By default, an IPv6 socket with the IPV6_ROUTER_ALERT socket option set
      receives all IPv6 RA packets from all namespaces.
      The IPV6_ROUTER_ALERT_ISOLATE socket option restricts the packets received
      by the socket to those from the socket's own namespace.
      Signed-off-by: Maxim Martynov <maxim@arista.com>
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fixup address-space warnings in compat_mc_{get,set}sockopt() · 46d84110
      Ben Dooks authored
      Add __user attributes in some of the casts in this function to avoid
      the following sparse warnings:
      
      net/compat.c:592:57: warning: cast removes address space of expression
      net/compat.c:592:57: warning: incorrect type in initializer (different address spaces)
      net/compat.c:592:57:    expected struct compat_group_req [noderef] <asn:1>*gr32
      net/compat.c:592:57:    got void *<noident>
      net/compat.c:613:65: warning: cast removes address space of expression
      net/compat.c:613:65: warning: incorrect type in initializer (different address spaces)
      net/compat.c:613:65:    expected struct compat_group_source_req [noderef] <asn:1>*gsr32
      net/compat.c:613:65:    got void *<noident>
      net/compat.c:634:60: warning: cast removes address space of expression
      net/compat.c:634:60: warning: incorrect type in initializer (different address spaces)
      net/compat.c:634:60:    expected struct compat_group_filter [noderef] <asn:1>*gf32
      net/compat.c:634:60:    got void *<noident>
      net/compat.c:672:52: warning: cast removes address space of expression
      net/compat.c:672:52: warning: incorrect type in initializer (different address spaces)
      net/compat.c:672:52:    expected struct compat_group_filter [noderef] <asn:1>*gf32
      net/compat.c:672:52:    got void *<noident>
      Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Use prepare/commit phase in dsa_slave_vlan_rx_add_vid() · d6af21a4
      Florian Fainelli authored
      We were skipping the prepare phase which causes some problems with at
      least a couple of drivers:
      
      - mv88e6xxx chooses to skip programming VID = 0 with -EOPNOTSUPP in
        the prepare phase, but we would still try to force this VID since we
        would only call the commit phase and so we would get the driver to
        return -EINVAL instead
      
      - qca8k does not currently have a port_vlan_add() callback implemented,
        yet we would try to call it unconditionally, leading to a NULL
        pointer dereference
      
      Fix both issues by conforming to the current model and running both the
      prepare and commit phases. This makes us consistent with the rest of
      the code and its assumptions.
      Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reported-by: Michal Vokáč <michal.vokac@ysoft.com>
      Fixes: 061f6a50 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
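The prepare/commit model described in the commit above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `struct demo_switch_ops`, `demo_vlan_add()` and the example callbacks are hypothetical and do not mirror the real DSA structures or signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver operations; a driver may leave vlan_add unset. */
struct demo_switch_ops {
	int (*vlan_prepare)(unsigned short vid);
	int (*vlan_add)(unsigned short vid);
};

/*
 * Two-phase add: the prepare phase may veto the request, and the commit
 * phase only runs once prepare has succeeded.
 */
static int demo_vlan_add(const struct demo_switch_ops *ops, unsigned short vid)
{
	int err;

	assert(ops != NULL);

	/* A missing callback is reported instead of being dereferenced. */
	if (!ops->vlan_add)
		return -EOPNOTSUPP;

	if (ops->vlan_prepare) {
		err = ops->vlan_prepare(vid);
		if (err)
			return err;
	}

	return ops->vlan_add(vid);
}

/* Example prepare callback that refuses VID 0. */
static int demo_prepare_no_vid0(unsigned short vid)
{
	return vid == 0 ? -EOPNOTSUPP : 0;
}

/* Counts how many VIDs the example commit callback has programmed. */
static int demo_commit_count;

/* Example commit callback that only records the call. */
static int demo_commit(unsigned short vid)
{
	(void)vid;
	demo_commit_count++;
	return 0;
}
```

With these callbacks, a request for VID 0 stops in the prepare phase and never reaches the commit callback, while an operations table without `vlan_add` yields -EOPNOTSUPP rather than a crash.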
    • Merge branch 'dpaa2-eth-add-XDP_REDIRECT-support' · a5f1512d
      David S. Miller authored
      Ioana Ciornei says:
      
      ====================
      dpaa2-eth: add XDP_REDIRECT support
      
      The first patch adds different software annotation types for Tx frames
      depending on frame type while the second one actually adds support for basic
      XDP_REDIRECT.
      
      Changes in v2:
        - add missing xdp_do_flush_map() call
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-eth: add XDP_REDIRECT support · d678be1d
      Ioana Radulescu authored
      Implement support for the XDP_REDIRECT action.
      
      The redirected frame is transmitted and confirmed on the regular Tx/Tx
      conf queues. The frame is marked with the "XDP" type in the software
      annotation, since it requires special treatment.
      
      We don't have good hardware support for TX batching, so the
      XDP_XMIT_FLUSH flag doesn't make a difference for now; ndo_xdp_xmit
      performs the actual Tx operation on the spot.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-eth: Add software annotation types · e3fdf6ba
      Ioana Radulescu authored
      We write different metadata information in the software annotation
      area of Tx frames, depending on frame type. Make this more explicit
      by introducing a type field and separate structures for single buffer
      and scatter-gather frames.
      Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
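A type field plus one structure per frame type, as the commit above describes, is usually expressed as a tagged union. The sketch below is an assumption about the general shape only; `struct demo_swa`, its members and `demo_swa_buffers()` are hypothetical and are not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame types recorded in the software annotation. */
enum demo_swa_type {
	DEMO_SWA_SINGLE,
	DEMO_SWA_SG,
};

/* One annotation layout per frame type, selected by the type field. */
struct demo_swa {
	enum demo_swa_type type;
	union {
		struct {
			void *skb;
		} single;
		struct {
			void *skb;
			int num_sg;	/* number of scatter-gather entries */
		} sg;
	};
};

/* Return how many buffers a Tx confirmation path would have to release. */
static int demo_swa_buffers(const struct demo_swa *swa)
{
	assert(swa != NULL);

	switch (swa->type) {
	case DEMO_SWA_SINGLE:
		return 1;
	case DEMO_SWA_SG:
		return swa->sg.num_sg;
	}
	return 0;
}
```

Making the type explicit lets the consumer switch on it instead of inferring the layout from other frame properties, and leaves room for further types to be added later.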
    • Merge branch 'sched-Patches-from-out-of-tree-version-of-sch_cake' · 3cec12ce
      David S. Miller authored
      Toke Høiland-Jørgensen says:
      
      ====================
      sched: Patches from out-of-tree version of sch_cake
      
      This series includes three patches with updates from the out-of-tree
      version of sch_cake. The first one is a fix to the fairness scheduling when
      dual-mode fairness is enabled. The second patch is an additional feature flag
      that allows using fwmark as a tin selector, as a convenience for people who want
      to customise tin selection. The third patch is just a cleanup to the tin
      selection logic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_cake: Simplify logic in cake_select_tin() · 4976e3c6
      Toke Høiland-Jørgensen authored
      With more modes added the logic in cake_select_tin() was getting a bit
      hairy, and it turns out we can actually simplify it quite a bit. This also
      allows us to get rid of one of the two diffserv parsing functions, which
      has the added benefit that already-zeroed DSCP fields won't get re-written.
      Suggested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_cake: Permit use of connmarks as tin classifiers · 0b5c7efd
      Kevin Darbyshire-Bryant authored
      Add flag 'FWMARK' to enable use of firewall connmarks as tin selector.
      The connmark (skb->mark) needs to be in the range 1->tin_cnt, i.e.
      for diffserv3 the mark needs to be 1->3.
      
      Background
      
      Typically CAKE uses DSCP as the basis for tin selection.  DSCP values
      are relatively easily changed as part of the egress path, usually with
      iptables & the mangle table; ingress is more challenging.  CAKE is often
      used on the WAN interface of a residential gateway where passthrough of
      DSCP from the ISP is either missing or set to unhelpful values, so
      ingress DSCP values are of little use for tin selection in that
      environment.
      
      An approach to solving the ingress tin selection problem is to use
      CAKE's understanding of tc filters.  Naive tc filters could match on
      source/destination port numbers and force tin selection that way, but
      multiple filters don't scale particularly well as each filter must be
      traversed whether it matches or not. For example, to map 3
      firewall marks to tins:
      
      MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
      tc filter add dev $DEV parent $MAJOR protocol all handle 0x01 fw action skbedit priority ${MAJOR}1
      tc filter add dev $DEV parent $MAJOR protocol all handle 0x02 fw action skbedit priority ${MAJOR}2
      tc filter add dev $DEV parent $MAJOR protocol all handle 0x03 fw action skbedit priority ${MAJOR}3
      
      Another option is to use eBPF cls_act with tc filters e.g.
      
      MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
      tc filter add dev $DEV parent $MAJOR bpf da obj my-bpf-fwmark-to-class.o
      
      This has the disadvantages of a) needing someone to write & maintain
      the bpf program, b) needing a bpf toolchain to compile it and c) needing to
      hardcode the major number in the bpf program so it matches the cake
      instance (or forcing the cake instance to a particular major number)
      since the major number cannot be passed to the bpf program via tc
      command line.
      
      As already hinted at by the previous examples, it would be helpful
      to associate tins with something that survives the Internet path and
      ideally allows tin selection on both egress and ingress.  Netfilter's
      conntrack permits setting an identifying mark on a connection which
      can also be restored to an ingress packet with tc action connmark e.g.
      
      tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
      	match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev ifb1
      
      Since tc's connmark action has restored any connmark into skb->mark,
      all of the previous solutions build upon it and, in one form or
      another, copy that mark to the skb->priority field, where CAKE
      picks it up.
      
      This change cuts out at least one of the (less intuitive &
      non-scalable) middlemen and permits direct access to skb->mark.
      Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
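The range rule stated in the commit above (a mark of 1->tin_cnt selects a tin) might be sketched like this. It assumes the simplest possible mapping, mark N to tin index N-1, and a hypothetical sentinel for "no tin selected"; the real qdisc may route the value through additional tables, and `demo_tin_from_mark()` is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sentinel returned when the mark does not name a tin, so the
 * caller can fall back to its other classifiers.
 */
#define DEMO_NO_TIN (-1)

/* Map a firewall mark in the range 1..tin_cnt to a zero-based tin index. */
static int demo_tin_from_mark(uint32_t mark, uint32_t tin_cnt)
{
	assert(tin_cnt > 0);

	if (mark >= 1 && mark <= tin_cnt)
		return (int)(mark - 1);
	return DEMO_NO_TIN;
}
```

For diffserv3 (`tin_cnt` of 3), marks 1, 2 and 3 select a tin, while a mark of 0 or 4 leaves the decision to the fallback path.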
    • sch_cake: Make the dual modes fairer · 71263992
      George Amanakis authored
      CAKE host fairness does not work well with TCP flows in dual-srchost and
      dual-dsthost setups. The reason is that ACKs generated by TCP flows are
      classified as sparse flows, and affect flow isolation from other hosts. Fix
      this by calculating host_load based only on the bulk flows a host
      generates. In a hash collision the host_bulk_flow_count values must be
      decremented on the old hosts and incremented on the new ones *if* the queue
      is in the bulk set.
      Reported-by: Pete Heist <peteheist@gmail.com>
      Signed-off-by: George Amanakis <gamanakis@gmail.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
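The bookkeeping described in the commit above, where only bulk flows count towards a host's load and the counters move with the queue on a hash collision, could look roughly like the sketch below. All names (`struct demo_host`, `struct demo_flow`, `demo_rehash_flow()`, `demo_host_load()`) are hypothetical, and the floor of 1 on the load is an assumption made here so that an idle host never yields a zero divisor.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_HOSTS 4

/* Per-host state: only flows in the bulk set are counted. */
struct demo_host {
	unsigned int bulk_flow_count;
};

struct demo_flow {
	unsigned int host;	/* index into the host table */
	bool bulk;		/* true while the queue is in the bulk set */
};

/*
 * Move a queue to a new host after a hash collision. The counters are
 * adjusted only if the queue is currently in the bulk set.
 */
static void demo_rehash_flow(struct demo_host *hosts, struct demo_flow *flow,
			     unsigned int new_host)
{
	assert(flow->host < DEMO_HOSTS && new_host < DEMO_HOSTS);

	if (flow->bulk) {
		assert(hosts[flow->host].bulk_flow_count > 0);
		hosts[flow->host].bulk_flow_count--;
		hosts[new_host].bulk_flow_count++;
	}
	flow->host = new_host;
}

/* Host load counts bulk flows only, with an assumed floor of 1. */
static unsigned int demo_host_load(const struct demo_host *hosts,
				   unsigned int host)
{
	unsigned int n;

	assert(host < DEMO_HOSTS);
	n = hosts[host].bulk_flow_count;
	return n ? n : 1;
}
```

Because sparse queues such as those carrying only ACKs never touch `bulk_flow_count`, they no longer inflate the load attributed to their host.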