1. 31 Dec, 2019 19 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 31d518f3
      David S. Miller authored
      Simple overlapping changes in bpf land wrt. bpf_helper_defs.h
      handling.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 738d2902
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix big endian overflow in nf_flow_table, from Arnd Bergmann.
      
       2) Fix port selection on big endian in nft_tproxy, from Phil Sutter.
      
       3) Fix precision tracking for unbound scalars in bpf verifier, from
          Daniel Borkmann.
      
       4) Fix integer overflow in socket rcvbuf check in UDP, from Antonio
          Messina.
      
       5) Do not perform a neigh confirmation during a pmtu update over a
          tunnel, from Hangbin Liu.
      
       6) Fix DMA mapping leak in dpaa_eth driver, from Madalin Bucur.
      
       7) Various PTP fixes for sja1105 dsa driver, from Vladimir Oltean.
      
       8) Add missing inline to dummy definition of of_mdiobus_child_is_phy(),
          from Geert Uytterhoeven.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
        hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename()
        net/sched: add delete_empty() to filters and use it in cls_flower
        tcp: Fix highest_sack and highest_sack_seq
        ptp: fix the race between the release of ptp_clock and cdev
        net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/S
        Documentation: net: dsa: sja1105: Remove text about taprio base-time limitation
        net: dsa: sja1105: Remove restriction of zero base-time for taprio offload
        net: dsa: sja1105: Really make the PTP command read-write
        net: dsa: sja1105: Take PTP egress timestamp by port, not mgmt slot
        cxgb4/cxgb4vf: fix flow control display for auto negotiation
        mlxsw: spectrum: Use dedicated policer for VRRP packets
        mlxsw: spectrum_router: Skip loopback RIFs during MAC validation
        net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs
        net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device
        net_sched: sch_fq: properly set sk->sk_pacing_status
        bnx2x: Fix accounting of vlan resources among the PFs
        bnx2x: Use appropriate define for vlan credit
        of: mdio: Add missing inline to of_mdiobus_child_is_phy() dummy
        net: phy: aquantia: add suspend / resume ops for AQR105
        dpaa_eth: fix DMA mapping leak
        ...
    • Merge tag 'tomoyo-fixes-for-5.5' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 · c5c928c6
      Linus Torvalds authored
      Pull tomoyo fixes from Tetsuo Handa:
       "Two bug fixes:
      
         - Suppress RCU warning at list_for_each_entry_rcu()
      
         - Don't use fancy names on sockets"
      
      * tag 'tomoyo-fixes-for-5.5' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
        tomoyo: Suppress RCU warning at list_for_each_entry_rcu().
        tomoyo: Don't use nifty names on sockets.
    • hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() · 04b69426
      Taehee Yoo authored
      hsr slave interfaces don't have a debugfs directory, so
      hsr_debugfs_rename() shouldn't be called when the name of an hsr slave
      interface is changed.
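      The rule above (only the HSR master owns a debugfs directory, so only
      the master is renamed) can be modelled in a few lines of userspace C.
      This is a hypothetical sketch, not the kernel patch: hsr_port_model,
      HSR_ROLE_* and the model_* helpers are invented names standing in for
      the real hsr structures and hsr_debugfs_rename().

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of HSR ports; not the kernel's struct hsr_port. */
enum hsr_port_role { HSR_ROLE_MASTER, HSR_ROLE_SLAVE };

struct hsr_port_model {
	enum hsr_port_role role;
	char debugfs_name[16];	/* only meaningful for the master */
};

/* Stand-in for hsr_debugfs_rename(): only valid on the master, the one
 * port that actually owns a debugfs directory. */
static void model_debugfs_rename(struct hsr_port_model *port, const char *new_name)
{
	strncpy(port->debugfs_name, new_name, sizeof(port->debugfs_name) - 1);
	port->debugfs_name[sizeof(port->debugfs_name) - 1] = '\0';
}

/* Name-change notification: slaves are skipped instead of touching a
 * debugfs directory they never had. Returns true if a rename happened. */
static bool model_on_changename(struct hsr_port_model *port, const char *new_name)
{
	if (port->role != HSR_ROLE_MASTER)
		return false;
	model_debugfs_rename(port, new_name);
	return true;
}
```

      Renaming dummy0 to "ap" in the reproducer corresponds to the slave
      branch, which now returns early.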
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
          ip link set dummy0 name ap
      
      Splat looks like:
      [21071.899367][T22666] ap: renamed from dummy0
      [21071.914005][T22666] ==================================================================
      [21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666
      [21071.926941][T22666]
      [21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240
      [21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [21071.935094][T22666] Call Trace:
      [21071.935867][T22666]  dump_stack+0x96/0xdb
      [21071.936687][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.937774][T22666]  print_address_description.constprop.5+0x1be/0x360
      [21071.939019][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940081][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940949][T22666]  __kasan_report+0x12a/0x16f
      [21071.941758][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.942674][T22666]  kasan_report+0xe/0x20
      [21071.943325][T22666]  hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.944187][T22666]  hsr_netdev_notify+0x1fe/0x9b0 [hsr]
      [21071.945052][T22666]  ? __module_text_address+0x13/0x140
      [21071.945897][T22666]  notifier_call_chain+0x90/0x160
      [21071.946743][T22666]  dev_change_name+0x419/0x840
      [21071.947496][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.948600][T22666]  ? netdev_adjacent_rename_links+0x280/0x280
      [21071.949577][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.950672][T22666]  ? lock_downgrade+0x6e0/0x6e0
      [21071.951345][T22666]  ? do_setlink+0x811/0x2ef0
      [21071.951991][T22666]  do_setlink+0x811/0x2ef0
      [21071.952613][T22666]  ? is_bpf_text_address+0x81/0xe0
      [ ... ]
      
      Reported-by: syzbot+9328206518f08318a5fd@syzkaller.appspotmail.com
      Fixes: 4c2d5e33 ("hsr: rename debugfs file when interface name is changed")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Davide Caratti authored
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantics of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, like it was done on older Linux kernels.
      This fixes situations where walk() was wrongly detecting a non-empty
      filter, like it happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic into a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      if no IDRs have been allocated.
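      A hypothetical userspace model of the split described above: filters
      that do not support lockless insertion/removal decide emptiness from
      the refcount alone, while lockless filters provide their own
      delete_empty() callback (here an IDR-style "no IDs allocated" check,
      as described for cls_flower). filter_ops, filter and the helper names
      are invented for illustration and are not the kernel's tcf_proto_ops.

```c
#include <stdbool.h>
#include <stddef.h>

struct filter;

/* Hypothetical ops table; "unlocked" plays the role of
 * TCF_PROTO_OPS_DOIT_UNLOCKED in this model. */
struct filter_ops {
	bool unlocked;
	bool (*delete_empty)(const struct filter *f);	/* optional */
};

struct filter {
	const struct filter_ops *ops;
	int refcnt;		/* references held by rules */
	int ids_allocated;	/* e.g. IDR entries in use */
};

/* Flower-style check: empty when no IDs have been allocated. */
static bool idr_based_delete_empty(const struct filter *f)
{
	return f->ids_allocated == 0;
}

/* Locked filters cannot race with a concurrent insertion, so the
 * refcount is enough; lockless ones delegate to their own callback. */
static bool filter_is_empty(const struct filter *f)
{
	if (f->ops->unlocked && f->ops->delete_empty != NULL)
		return f->ops->delete_empty(f);
	return f->refcnt == 0;
}
```

      No walk() over the rules is needed on either path, which is the point
      of the change.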
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Fix gma flag setting after response · 9e860947
      Vijay Khemka authored
      gma_flag was set at the time of the GMA command request, but it should
      only be set after getting a successful response. Move this flag
      setting into the GMA response handler.

      This flag is mainly used to avoid repeating the GMA command once the
      MAC address has been received.
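      The intended ordering can be sketched as a small userspace state
      machine. This is a hypothetical model, not the ncsi driver code:
      ncsi_model, model_send_gma() and model_handle_gma_rsp() are invented
      names. The request path only checks the flag; the response handler
      sets it, and only on success.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the NC-SI "Get MAC Address" handshake. */
struct ncsi_model {
	bool gma_flag;		/* true once a MAC address was obtained */
	unsigned char mac[6];
	int gma_requests_sent;
};

/* Send a GMA request unless the MAC address is already known.
 * The flag is deliberately not set here. */
static bool model_send_gma(struct ncsi_model *nd)
{
	if (nd->gma_flag)
		return false;
	nd->gma_requests_sent++;
	return true;
}

/* Response handler: only a successful response marks GMA as done. */
static void model_handle_gma_rsp(struct ncsi_model *nd, bool success,
				 const unsigned char mac[6])
{
	if (!success)
		return;	/* flag stays clear, so the request can be retried */
	memcpy(nd->mac, mac, sizeof(nd->mac));
	nd->gma_flag = true;
}
```

      With the old ordering, a failed first response would have left the
      flag set and suppressed every retry.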
      Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
      Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add enabled check for path tracepoint loop. · f398efc1
      Kevin Kou authored
      sctp_outq_sack() is the main function that handles SACKs, and it is
      called very frequently. The commit "move trace_sctp_probe_path into
      sctp_outq_sack" added the code below to this function. The sctp
      tracepoint is disabled most of the time, but the loop over the
      transport list always runs even when the tracepoint is disabled, which
      is unnecessary.
      
      +	/* SCTP path tracepoint for congestion control debugging. */
      +	list_for_each_entry(transport, transport_list, transports) {
      +		trace_sctp_probe_path(transport, asoc);
      +	}
      
      This patch adds a tracepoint-enabled check outside the loop over the
      transport list, so the list is not traversed when tracing is disabled.
      It is a small optimization.
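      The shape of the optimization can be shown in userspace C. In the
      kernel the guard would be the trace_<name>_enabled() helper that every
      tracepoint generates; here it is modelled by a plain boolean, and the
      transport list, counters and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the association's transport list. */
struct transport {
	struct transport *next;
	int id;
};

static bool probe_path_enabled;	/* models the tracepoint's enabled state */
static int probe_calls;		/* trace events emitted */
static int nodes_visited;	/* list traversal work done */

static void trace_probe_path(const struct transport *t)
{
	(void)t;
	probe_calls++;
}

/* Check once, outside the loop, so the list is not walked at all when
 * nobody is listening to the tracepoint. */
static void outq_sack_trace(const struct transport *head)
{
	if (probe_path_enabled) {
		for (const struct transport *t = head; t != NULL; t = t->next) {
			nodes_visited++;
			trace_probe_path(t);
		}
	}
}
```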
      Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Improvements-to-SJA1105-DSA-RX-timestamping' · 9010ef57
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Improvements to SJA1105 DSA RX timestamping
      
      This series makes the sja1105 DSA driver use a dedicated kernel thread
      for RX timestamping, a process which is time-sensitive and otherwise a
      bit fragile. This allows users to customize their system (probably an
      embedded PTP switch) fully and allocate the CPU bandwidth for the driver
      to expedite the RX timestamps as quickly as possible.
      
      While doing this conversion, add a function to the PTP core for
      cancelling this kernel thread (function which I found rather strange to
      be missing).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Empty the RX timestamping queue on PTP settings change · 19d1f0ed
      Vladimir Oltean authored
      When disabling PTP timestamping, don't reset the switch with the new
      static config until all existing PTP frames have been timestamped on the
      RX path or dropped. There's nothing we can do with these afterwards.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Use PTP core's dedicated kernel thread for RX timestamping · 1e762bd2
      Vladimir Oltean authored
      And move the queue of skbs waiting for RX timestamps into the ptp_data
      structure, since it isn't needed if PTP is not compiled in.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: introduce ptp_cancel_worker_sync · 544fed47
      Vladimir Oltean authored
      In order to effectively use the PTP kernel thread for tasks such as
      timestamping packets, allow the user control over stopping it, which is
      needed e.g. when the timestamping queues must be drained.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Fix highest_sack and highest_sack_seq · 85369750
      Cambda Zhu authored
      Commit 50895b9d ("tcp: highest_sack fix") removed the logic that set
      tp->highest_sack to the head of the send queue. That logic is
      admittedly error prone, but it is logically correct. Before we remove
      the pointer to the highest SACKed skb and use its seq instead, we need
      to set tp->highest_sack to NULL when there is no skb after the last
      SACKed one, and then replace NULL with the real skb when a new skb is
      inserted into the rtx queue, because NULL means the highest SACK seq
      is tp->snd_nxt. If tp->highest_sack is NULL and new data is sent, the
      next ACK carrying a SACK option will increase tp->reordering
      unexpectedly.
      
      This patch sets tp->highest_sack to the tail of the rtx queue if it is
      NULL and new data is sent. The patch keeps the rule that highest_sack
      can only be maintained by SACK processing, except in this one case.
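      A simplified userspace model of the fix, assuming a singly linked rtx
      queue and ignoring the sacked_out bookkeeping the real
      tcp_highest_sack_seq() also considers. tcp_model, seg and the model_*
      helpers are hypothetical names. When new data is queued and the hint
      is NULL, the hint is pinned to the new tail skb instead of letting the
      "highest SACK seq" silently follow snd_nxt.

```c
#include <stddef.h>

/* Hypothetical, simplified segment and socket state. */
struct seg {
	unsigned int seq, end_seq;
	struct seg *next;
};

struct tcp_model {
	struct seg *rtx_head, *rtx_tail;
	struct seg *highest_sack;	/* NULL means "highest SACK seq is snd_nxt" */
	unsigned int snd_nxt;
};

/* New data sent: append to the rtx queue, advance snd_nxt and, if the
 * hint is NULL, point it at the newly queued skb. */
static void model_new_data_sent(struct tcp_model *tp, struct seg *s)
{
	s->next = NULL;
	if (tp->rtx_tail != NULL)
		tp->rtx_tail->next = s;
	else
		tp->rtx_head = s;
	tp->rtx_tail = s;
	tp->snd_nxt = s->end_seq;

	if (tp->highest_sack == NULL)
		tp->highest_sack = s;
}

/* seq of the hint skb, or snd_nxt when the hint is NULL. */
static unsigned int model_highest_sack_seq(const struct tcp_model *tp)
{
	return tp->highest_sack != NULL ? tp->highest_sack->seq : tp->snd_nxt;
}
```

      Without the NULL check, the value returned after sending would jump to
      the new snd_nxt, which is what confused the reordering estimate.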
      
      Fixes: 50895b9d ("tcp: highest_sack fix")
      Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: fix the race between the release of ptp_clock and cdev · a33121e5
      Vladis Dronov authored
      When a ptp chardev (like /dev/ptp0) is open but the underlying device
      is removed, closing this file leads to a race. This reproduces easily
      in a kvm virtual machine:
      
      ts# cat openptp0.c
      int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
      ts# uname -r
      5.5.0-rc3-46cf053e
      ts# cat /proc/cmdline
      ... slub_debug=FZP
      ts# modprobe ptp_kvm
      ts# ./openptp0 &
      [1] 670
      opened /dev/ptp0, sleeping 10s...
      ts# rmmod ptp_kvm
      ts# ls /dev/ptp*
      ls: cannot access '/dev/ptp*': No such file or directory
      ts# ...woken up
      [   48.010809] general protection fault: 0000 [#1] SMP
      [   48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
      [   48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
      [   48.016270] RIP: 0010:module_put.part.0+0x7/0x80
      [   48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
      [   48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
      [   48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
      [   48.019470] ...                                              ^^^ a slub poison
      [   48.023854] Call Trace:
      [   48.024050]  __fput+0x21f/0x240
      [   48.024288]  task_work_run+0x79/0x90
      [   48.024555]  do_exit+0x2af/0xab0
      [   48.024799]  ? vfs_write+0x16a/0x190
      [   48.025082]  do_group_exit+0x35/0x90
      [   48.025387]  __x64_sys_exit_group+0xf/0x10
      [   48.025737]  do_syscall_64+0x3d/0x130
      [   48.026056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   48.026479] RIP: 0033:0x7f53b12082f6
      [   48.026792] ...
      [   48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
      [   48.045001] Fixing recursive fault but reboot is needed!
      
      This happens in:
      
      static void __fput(struct file *file)
      {   ...
          if (file->f_op->release)
              file->f_op->release(inode, file); <<< cdev is kfree'd here
          if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
                   !(mode & FMODE_PATH))) {
              cdev_put(inode->i_cdev); <<< cdev fields are accessed here
      
      Namely:
      
      __fput()
        posix_clock_release()
          kref_put(&clk->kref, delete_clock) <<< the last reference
            delete_clock()
              delete_ptp_clock()
                kfree(ptp) <<< cdev is embedded in ptp
        cdev_put
          module_put(p->owner) <<< *p is kfree'd, bang!
      
      Here cdev is embedded in posix_clock which is embedded in ptp_clock.
      The race happens because ptp_clock's lifetime is controlled by two
      refcounts: kref and cdev.kobj in posix_clock. This is wrong.
      
      Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
      created especially for such cases. This way the parent device with its
      ptp_clock is not released until all references to the cdev are released.
      This adds a requirement that an initialized but not exposed struct
      device should be provided to posix_clock_register() by a caller instead
      of a simple dev_t.
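      The ownership rule behind the fix can be modelled in plain userspace C.
      This is a hypothetical sketch: clock_model, dev_model and the helper
      names are invented and only stand in for ptp_clock, its struct device
      and the cdev_device_add() parent relationship. It shows that with a
      single refcount, pinned by every open file, the memory embedding the
      cdev stays alive until the last close.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device: its refcount alone controls the lifetime. */
struct dev_model {
	int refcount;
	bool *freed_flag;	/* lets the caller observe the release */
};

/* Hypothetical clock embedding both the device and the cdev state. */
struct clock_model {
	struct dev_model dev;
	int open_count;		/* cdev state, embedded like in posix_clock */
};

static void dev_get(struct dev_model *d)
{
	d->refcount++;
}

static void clock_put(struct clock_model *clk)
{
	if (--clk->dev.refcount == 0) {
		*clk->dev.freed_flag = true;
		free(clk);
	}
}

static struct clock_model *clock_register(bool *freed_flag)
{
	struct clock_model *clk = calloc(1, sizeof(*clk));

	if (clk == NULL)
		return NULL;
	clk->dev.refcount = 1;	/* reference held by the driver */
	clk->dev.freed_flag = freed_flag;
	return clk;
}

/* open(): the cdev pins its parent device. */
static void clock_cdev_open(struct clock_model *clk)
{
	clk->open_count++;
	dev_get(&clk->dev);
}

/* close(): touch the cdev first, then drop the parent reference, so no
 * field is accessed after the memory is freed. */
static void clock_cdev_release(struct clock_model *clk)
{
	clk->open_count--;
	clock_put(clk);
}
```

      Unloading the driver while a file is open only drops the driver's
      reference; the final close performs the free, after the last cdev
      access.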
      
      This approach was adopted from the commit 72139dfa ("watchdog: Fix
      the race between the release of watchdog_core_data and cdev"). See
      details of the implementation in the commit 233ed09d ("chardev: add
      helper function to register char devs with a struct device").
      
      Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
      Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
      Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
      Signed-off-by: Vladis Dronov <vdronov@redhat.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/S · 54fa49ee
      Vladimir Oltean authored
      For first-generation switches (SJA1105E and SJA1105T):
      - TPID means C-Tag (typically 0x8100)
      - TPID2 means S-Tag (typically 0x88A8)
      
      While for the second generation switches (SJA1105P, SJA1105Q, SJA1105R,
      SJA1105S) it is the other way around:
      - TPID means S-Tag (typically 0x88A8)
      - TPID2 means C-Tag (typically 0x8100)
      
      In other words, E/T tags untagged traffic with TPID, and P/Q/R/S with
      TPID2.
      
      So the patch mentioned below fixed VLAN filtering for P/Q/R/S, but broke
      it for E/T.
      
      We strive for a common code path for all switches in the family, so for
      P/Q/R/S just lie in the static config packing functions and place TPID
      and TPID2 at each other's bit offsets. This makes both switch
      generations understand TPID to be ETH_P_8021Q and TPID2 to be
      ETH_P_8021AD. The meaning from the original E/T was chosen over P/Q/R/S
      because E/T is actually the one with public documentation available
      (UM10944.pdf).
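      The "lie in the packing function" approach can be sketched as follows.
      The register layout used here (hardware TPID field in bits 31..16,
      TPID2 in bits 15..0) is an assumption for illustration only, not the
      real SJA1105 static config layout, and general_params and
      pack_general_params() are hypothetical names. The driver-side values
      are the same for every switch; only the packing swaps offsets on
      P/Q/R/S.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100	/* C-Tag */
#define ETH_P_8021AD 0x88A8	/* S-Tag */

/* Common driver view: TPID = C-Tag, TPID2 = S-Tag (the E/T meaning). */
struct general_params {
	uint16_t tpid;
	uint16_t tpid2;
};

/* Assumed, illustrative offsets of the fields named TPID and TPID2. */
#define HW_TPID_SHIFT  16
#define HW_TPID2_SHIFT 0

static uint32_t pack_general_params(const struct general_params *p, bool is_pqrs)
{
	/* On P/Q/R/S the hardware "TPID" field means S-Tag, so the driver's
	 * TPID goes to the TPID2 offset and vice versa. */
	unsigned int tpid_shift  = is_pqrs ? HW_TPID2_SHIFT : HW_TPID_SHIFT;
	unsigned int tpid2_shift = is_pqrs ? HW_TPID_SHIFT  : HW_TPID2_SHIFT;

	return ((uint32_t)p->tpid << tpid_shift) |
	       ((uint32_t)p->tpid2 << tpid2_shift);
}
```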
      
      Fixes: f9a1a764 ("net: dsa: sja1105: Reverse TPID and TPID2")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: net: dsa: sja1105: Remove text about taprio base-time limitation · 3a323ed7
      Vladimir Oltean authored
      Since commit 86db36a3 ("net: dsa: sja1105: Implement state machine
      for TAS with PTP clock source"), this paragraph is no longer true. So
      remove it.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Remove restriction of zero base-time for taprio offload · d00bdc0a
      Vladimir Oltean authored
      The check originates from the initial implementation, which was not
      based on PTP time but on a standalone clock source. We can now program
      the PTPSCHTM register at runtime with the dynamic base time (actually
      with a value that is 200 ns smaller, to avoid writing DELTA=0 in the
      Schedule Entry Points Parameters Table). We also have logic for moving
      the actual base time into the future relative to the PHC's current
      time, so the check for zero serves no purpose: even if the user
      specifies zero, that is not what ends up in the static config table,
      where the limitation is.
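      The arithmetic described above can be sketched in C. Only the 200 ns
      offset comes from the text; the "advance by whole cycles until the
      base time is not in the past" rule, future_base_time() and
      ptpschtm_value() are assumptions made for illustration, not the
      driver's actual helpers.

```c
#include <stdint.h>

/* From the text: program PTPSCHTM 200 ns early so DELTA is never 0. */
#define SCHTM_EARLY_NS 200

/* Hypothetical helper: if base_time lies in the past, advance it by the
 * smallest whole number of cycles that puts it at or after now. */
static int64_t future_base_time(int64_t base_time, int64_t cycle_time, int64_t now)
{
	if (base_time >= now || cycle_time <= 0)
		return base_time;
	int64_t n = (now - base_time + cycle_time - 1) / cycle_time;	/* ceil */
	return base_time + n * cycle_time;
}

/* Value that would be written to PTPSCHTM under these assumptions. */
static int64_t ptpschtm_value(int64_t base_time, int64_t cycle_time, int64_t now)
{
	return future_base_time(base_time, cycle_time, now) - SCHTM_EARLY_NS;
}
```

      A user-supplied base time of zero is simply moved forward like any
      other past value, which is why rejecting zero up front adds nothing.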
      
      Fixes: 86db36a3 ("net: dsa: sja1105: Implement state machine for TAS with PTP clock source")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Really make the PTP command read-write · 5a47f588
      Vladimir Oltean authored
      When activating tc-taprio offload on the switch ports, the TAS state
      machine will try to check whether it is running or not, but will find
      both the STARTED and STOPPED bits as false in the
      sja1105_tas_check_running function. So the function will return -EINVAL
      (an abnormal situation) and the kernel will keep printing this from the
      TAS FSM workqueue:
      
      [   37.691971] sja1105 spi0.1: An operation returned -22
      
      The reason is that the underlying function that gets called,
      sja1105_ptp_commit, does not actually do a SPI_READ, but a SPI_WRITE. So
      the command buffer remains initialized with zeroes instead of retrieving
      the hardware state. Fix that.
      
      Fixes: 41603d78 ("net: dsa: sja1105: Make the PTP command read-write")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Take PTP egress timestamp by port, not mgmt slot · 9fcf024d
      Vladimir Oltean authored
      The PTP egress timestamp N must be captured from register PTPEGR_TS[n],
      where n = 2 * PORT + TSREG. There are 10 PTPEGR_TS registers, 2 per
      port. We are only using TSREG=0.
      
      As opposed to the management slots, which are 4 in number
      (SJA1105_NUM_PORTS, minus the CPU port). Any management frame (which
      includes PTP frames) can be sent to any non-CPU port through any
      management slot. When the CPU port is not the last port (#4), there will
      be a mismatch between the slot and the port number.
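      The register selection rule n = 2 * PORT + TSREG, with 10 registers and
      2 per port, is simple enough to write down directly. The macro and
      function names below are hypothetical; the point is that the index is
      derived from the egress port, never from the management slot.

```c
/* 10 PTPEGR_TS registers, 2 per port, as stated in the text. */
#define MODEL_NUM_PORTS       5
#define MODEL_TSREGS_PER_PORT 2

/* Index of the PTPEGR_TS register for a given egress port and TSREG,
 * or -1 if the arguments are out of range. */
static int ptpegr_ts_index(int port, int tsreg)
{
	if (port < 0 || port >= MODEL_NUM_PORTS ||
	    tsreg < 0 || tsreg >= MODEL_TSREGS_PER_PORT)
		return -1;
	return MODEL_TSREGS_PER_PORT * port + tsreg;
}
```

      If the CPU port is #0 instead of #4, a frame for port 1 may use slot 0,
      and indexing by slot would read register 0 instead of register 2.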
      
      Luckily, the only mainline occurrence with this switch
      (arch/arm/boot/dts/ls1021a-tsn.dts) does have the CPU port as #4, so the
      issue did not manifest itself thus far.
      
      Fixes: 47ed985e ("net: dsa: sja1105: Add logic for TX timestamping")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: avoid duplicate error handling code in 'efx_ef10_sriov_set_vf_mac()' · db99d512
      Christophe JAILLET authored
      'eth_zero_addr()' is already called in the error handling path. This is
      harmless, but there is no point in calling it twice, so remove one.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Dec, 2019 3 commits
    • tcp_cubic: refactor code to perform a divide only when needed · f278b99c
      Eric Dumazet authored
      Neal Cardwell suggested not changing ca->delay_min and applying the
      ack delay cushion only when the Hystart ACK train is still under
      consideration. This should avoid a 64-bit divide unless it is needed.
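      The "divide only when needed" structure can be sketched in userspace C.
      The cushion formula (time to send four maximum-size GSO packets at the
      pacing rate, capped at 1 ms), the 65536-byte GSO size and all names
      here are assumptions for illustration, not the tcp_cubic code. What the
      sketch shows is the ordering: delay_min is left untouched, and the
      division happens only once the ACK-train check is actually evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_USEC_PER_MSEC 1000ULL
#define MODEL_USEC_PER_SEC  1000000ULL
#define ASSUMED_GSO_MAX_SIZE 65536ULL	/* assumption for the sketch */

static unsigned int divides;	/* counts 64-bit divisions performed */

/* Hypothetical ack-delay cushion in usec from a pacing rate in bytes/s. */
static uint32_t ack_delay_us(uint64_t pacing_rate)
{
	if (pacing_rate == 0)
		return 0;
	divides++;
	uint64_t d = ASSUMED_GSO_MAX_SIZE * 4 * MODEL_USEC_PER_SEC / pacing_rate;
	return d < MODEL_USEC_PER_MSEC ? (uint32_t)d : (uint32_t)MODEL_USEC_PER_MSEC;
}

/* The cushion is added to a local threshold, not to delay_min, and only
 * when the ACK train is still being considered. */
static bool ack_train_detect(uint32_t delay_min_us, uint32_t train_len_us,
			     bool in_train_window, uint64_t pacing_rate)
{
	if (!in_train_window)
		return false;
	uint32_t threshold = delay_min_us + ack_delay_us(pacing_rate);
	return train_len_us > threshold;
}
```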
      
      Tested:
      
      40Gbit(mlx4) testbed (with sch_fq as packet scheduler)
      
      $ echo -n 'file tcp_cubic.c +p'  >/sys/kernel/debug/dynamic_debug/control
      $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart"
        14815
        16280
        15293
        15563
        11574
        15145
        14789
        18548
        16972
        12520
      TcpExtTCPHystartTrainDetect     10                 0.0
      TcpExtTCPHystartTrainCwnd       1396               0.0
      $ dmesg | tail -10
      [ 4873.951350] hystart_ack_train (116 > 93) delay_min 24 (+ ack_delay 69) cwnd 80
      [ 4875.155379] hystart_ack_train (55 > 50) delay_min 21 (+ ack_delay 29) cwnd 160
      [ 4876.333921] hystart_ack_train (69 > 62) delay_min 23 (+ ack_delay 39) cwnd 130
      [ 4877.519037] hystart_ack_train (69 > 60) delay_min 22 (+ ack_delay 38) cwnd 130
      [ 4878.701559] hystart_ack_train (87 > 63) delay_min 24 (+ ack_delay 39) cwnd 160
      [ 4879.844597] hystart_ack_train (93 > 50) delay_min 21 (+ ack_delay 29) cwnd 216
      [ 4880.956650] hystart_ack_train (74 > 67) delay_min 20 (+ ack_delay 47) cwnd 108
      [ 4882.098500] hystart_ack_train (61 > 57) delay_min 23 (+ ack_delay 34) cwnd 130
      [ 4883.262056] hystart_ack_train (72 > 67) delay_min 21 (+ ack_delay 46) cwnd 130
      [ 4884.418760] hystart_ack_train (74 > 67) delay_min 29 (+ ack_delay 38) cwnd 152
      
      10Gbit(bnx2x) testbed (with sch_fq as packet scheduler)
      
      $ echo -n 'file tcp_cubic.c +p'  >/sys/kernel/debug/dynamic_debug/control
      $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpk52 -l -4000000; done;nstat|egrep "Hystart"
         7050
         7065
         7100
         6900
         7202
         7263
         7189
         6869
         7463
         7034
      TcpExtTCPHystartTrainDetect     10                 0.0
      TcpExtTCPHystartTrainCwnd       3199               0.0
      $ dmesg | tail -10
      [  176.920012] hystart_ack_train (161 > 141) delay_min 83 (+ ack_delay 58) cwnd 264
      [  179.144645] hystart_ack_train (164 > 159) delay_min 120 (+ ack_delay 39) cwnd 444
      [  181.354527] hystart_ack_train (214 > 168) delay_min 125 (+ ack_delay 43) cwnd 436
      [  183.539565] hystart_ack_train (170 > 147) delay_min 96 (+ ack_delay 51) cwnd 326
      [  185.727309] hystart_ack_train (177 > 160) delay_min 61 (+ ack_delay 99) cwnd 128
      [  187.947142] hystart_ack_train (184 > 167) delay_min 123 (+ ack_delay 44) cwnd 367
      [  190.166680] hystart_ack_train (230 > 153) delay_min 116 (+ ack_delay 37) cwnd 444
      [  192.327285] hystart_ack_train (210 > 206) delay_min 86 (+ ack_delay 120) cwnd 152
      [  194.511392] hystart_ack_train (173 > 151) delay_min 94 (+ ack_delay 57) cwnd 239
      [  196.736023] hystart_ack_train (149 > 146) delay_min 105 (+ ack_delay 41) cwnd 399
      
      Fixes: 42f3a8aa ("tcp_cubic: tweak Hystart detection for short RTT flows")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Link: https://www.spinics.net/lists/netdev/msg621886.html
      Link: https://www.spinics.net/lists/netdev/msg621797.html
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/cxgb4vf: fix flow control display for auto negotiation · 0caeaf6a
      Rahul Lakkireddy authored
      As per 802.3-2005, Section Two, Annex 28B, Table 28B-2 [1], when
      _only_ Rx pause is enabled, both symmetric and asymmetric pause
      towards local device must be enabled. Also, firmware returns the local
      device's flow control pause params as part of advertised capabilities
      and negotiated params as part of current link attributes. So, fix up
      ethtool's flow control pause params fetch logic to read from acaps,
      instead of linkattr.
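      The pause encoding from Table 28B-2 (PAUSE=1/ASM_DIR=0: symmetric;
      PAUSE=0/ASM_DIR=1: asymmetric toward the link partner; both set:
      symmetric and asymmetric toward the local device) maps to ethtool-style
      rx/tx settings as sketched below. pause_adv, pause_to_adv() and
      adv_to_pause() are hypothetical names; this is not the cxgb4 code.

```c
#include <stdbool.h>

/* Advertised PAUSE / ASM_DIR bits, IEEE 802.3 Annex 28B, Table 28B-2. */
struct pause_adv {
	bool pause;
	bool asm_dir;
};

/* rx-only pause must advertise both bits; tx-only advertises ASM_DIR. */
static struct pause_adv pause_to_adv(bool rx, bool tx)
{
	struct pause_adv a;

	a.pause = rx;
	a.asm_dir = rx != tx;
	return a;
}

/* Inverse mapping used when displaying the advertised capabilities. */
static void adv_to_pause(struct pause_adv a, bool *rx, bool *tx)
{
	*rx = a.pause;
	*tx = a.pause != a.asm_dir;
}
```

      Reading the advertised bits back through adv_to_pause() recovers the
      configured rx/tx pair, which is why the display should use the
      advertised capabilities rather than the negotiated link attributes.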
      
      [1] https://standards.ieee.org/standard/802_3-2005.html
      
      Fixes: c3168cab ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities")
      Signed-off-by: Surendra Mobiya <surendra@chelsio.com>
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · ba402810
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next:
      
      1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner.
      
      2) Document ingress hook in netdevice, also from Lukas.
      
      3) Remove htons() in tunnel metadata port netlink attributes,
         from Xin Long.
      
      4) Missing erspan netlink attribute validation also from Xin Long.
      
      5) Missing erspan version in tunnel, from Xin Long.
      
       6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN},
          from Xin Long.
      
      7) Missing nla_nest_cancel() in tunnel netlink dump path,
         from Xin Long.
      
      8) Remove two exported conntrack symbols with no clients,
         from Florian Westphal.
      
      9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian.
      
      10) Add nft_meta_pkttype helper for loopback, also from Florian.
      
      11) Add nft_meta_socket uid helper, from Florian Westphal.
      
      12) Add nft_meta_cgroup helper, from Florian.
      
      13) Add nft_meta_ifkind helper, from Florian.
      
      14) Group all interface related meta selector, from Florian.
      
      15) Add nft_prandom_u32() helper, from Florian.
      
      16) Add nft_meta_rtclassid helper, from Florian.
      
      17) Add support for matching on the slave device index,
          from Florian.
      
      This batch, among other things, contains updates for the netfilter
      tunnel netlink interface: this extension is still incomplete and lacks
      proper userspace support, which is actually my fault, as I did not find
      the time to go back and finish it. This update breaks the tunnel UAPI
      in some aspects in order to fix it, but it is better to do so sooner
      rather than never.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 29 Dec, 2019 8 commits
  4. 28 Dec, 2019 10 commits
    • Merge branch 'DSA-TX-tstamp' · 1a1fda57
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      The DSA TX timestamping situation
      
      This series is the moral v2 of "[PATCH net] net: dsa: sja1105: Fix
      double delivery of TX timestamps to socket error queue" [0] which did
      not manage to convince public opinion (actually it didn't convince me
      either).
      
      This fixes PTP timestamping on one particular board, where the DSA
      switch is sja1105 and the master is gianfar. Unfortunately, there is no
      way to make the fix more general without committing logical
      inaccuracies: the SKBTX_IN_PROGRESS flag does serve a purpose, even if
      the sja1105 driver is not using it now; it prevents delivering a SW
      timestamp to the app socket when the HW timestamp will be provided. So
      not setting this flag (the approach from v1) might create avoidable
      complications in the future (not to mention that there isn't any
      satisfactory explanation of why that would be the correct solution).
      
      So the goal of this change set is to create a stricter framework for
      DSA master devices when attached to PTP switches, and to fix the first
      master driver that is overstepping its duties and is delivering
      unsolicited TX timestamps.
      
      [0]: https://www.spinics.net/lists/netdev/msg619699.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Deny PTP on master if switch supports it · f685e609
      Vladimir Oltean authored
      It is possible to kill PTP on a DSA switch completely and absolutely,
      until a reboot, with a simple command:
      
      tcpdump -i eth2 -j adapter_unsynced
      
      where eth2 is the switch's DSA master.
      
      Why? Well, in short, the PTP API in place today is a bit rudimentary and
      relies on applications to retrieve the TX timestamps by polling the
      error queue and looking at the cmsg structure. But there is no timestamp
      identification of any sort (except whether it's HW or SW): you don't
      know how many more timestamps are to come, which one this is, or whom
      it is from. In other words, the SO_TIMESTAMPING API is fundamentally
      limited in that you can get only a single HW timestamp from the
      stack.
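      To make the limitation concrete, here is a minimal userspace sketch,
      assuming Linux UAPI headers, of what an application does with each
      message it reads from the error queue via recvmsg(..., MSG_ERRQUEUE):
      it walks the control messages and copies out the SCM_TIMESTAMPING
      payload. The helper name extract_tx_tstamps() is hypothetical. The
      payload carries only the timestamps themselves (ts[0] software, ts[2]
      raw hardware), with nothing identifying which device took them or
      whether more are coming.

      ```c
      #include <string.h>
      #include <time.h>
      #include <sys/socket.h>
      #include <linux/errqueue.h> /* struct scm_timestamping */

      /*
       * Hypothetical helper: scan the control messages of one
       * recvmsg(fd, msg, MSG_ERRQUEUE) result for SCM_TIMESTAMPING.
       * ts[0] holds the software timestamp and ts[2] the raw hardware one.
       * Nothing in this payload says which device took ts[2], or how many
       * more timestamps for the same packet are still to come.
       * Returns 0 on success, -1 if no SCM_TIMESTAMPING cmsg was found.
       */
      static int extract_tx_tstamps(struct msghdr *msg, struct timespec *sw,
                                    struct timespec *hw)
      {
          struct cmsghdr *cm;

          for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
              if (cm->cmsg_level == SOL_SOCKET &&
                  cm->cmsg_type == SCM_TIMESTAMPING) {
                  struct scm_timestamping ts;

                  /* memcpy avoids unaligned access through CMSG_DATA() */
                  memcpy(&ts, CMSG_DATA(cm), sizeof(ts));
                  *sw = ts.ts[0];
                  *hw = ts.ts[2];
                  return 0;
              }
          }
          return -1;
      }
      ```

      If both the DSA master and the switch deliver a hardware timestamp,
      an application using a loop like this simply sees two SCM_TIMESTAMPING
      messages and has no field to tell them apart.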
      
      And the "-j adapter_unsynced" flag of tcpdump enables hardware
      timestamping.
      
      So let's imagine what happens when the DSA master decides it wants to
      deliver TX timestamps to the skb's socket too:
      - The timestamp that user space sees is taken by the DSA master,
        whereas the RX timestamp will eventually be overwritten by the DSA
        switch. So the RX and TX timestamps will be in different time bases
        (aka garbage).
      - User space applications have no way to deal with the second (real)
        TX timestamp finally delivered by the DSA switch, or even to know
        that they should wait for it.
      
      Take ptp4l from the linuxptp project, for example. This is its behavior
      after running tcpdump, before the patch:
      
      ptp4l[172]: [6469.594] Unexpected data on socket err queue:
      ptp4l[172]: [6469.693] rms    8 max   16 freq -21257 +/-  11 delay   748 +/-   0
      ptp4l[172]: [6469.711] Unexpected data on socket err queue:
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.721] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.838] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.848] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02
      ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f
      ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05
      ptp4l[172]: 0040 de 06 00 01
      ptp4l[172]: [6469.855] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.974] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      
      The ptp4l program itself is heavily patched to show this (more details
      here [0]). Otherwise, by default it just hangs.
      
      On the other hand, with the DSA patch to disallow HW timestamping
      applied:
      
      tcpdump -i eth2 -j adapter_unsynced
      tcpdump: SIOCSHWTSTAMP failed: Device or resource busy
      
      So it is a fact of life that PTP timestamping on the DSA master is
      incompatible with timestamping on the switch MAC, at least with the
      current API. And if the switch supports PTP, taking the timestamps from
      the switch MAC is highly preferable anyway, because those timestamps
      don't include the queuing latencies of the switch. So just disallow PTP
      on the DSA master if there is any PTP-capable switch attached.
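      The rule can be modeled in a few lines. The sketch below is a
      hypothetical userspace model, not the kernel code: sketch_switch and
      sketch_master are stand-ins for the real DSA structures, and treating
      "the switch driver implements hardware timestamping operations" as the
      PTP-capability test is an assumption. Refusing SIOCGHWTSTAMP as well
      as SIOCSHWTSTAMP is also a choice made for the sketch; only the ioctl
      numbers come from <linux/sockios.h>.

      ```c
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <linux/sockios.h> /* SIOCSHWTSTAMP, SIOCGHWTSTAMP */

      /* Hypothetical stand-in for a DSA switch below the master. */
      struct sketch_switch {
          bool has_hwtstamp_ops; /* switch driver implements HW timestamping */
      };

      /* Hypothetical stand-in for the DSA master and its attached switches. */
      struct sketch_master {
          const struct sketch_switch *switches[4];
          size_t num_switches;
      };

      /*
       * Returns -EBUSY for hardware timestamping ioctls when at least one
       * attached switch is PTP-capable; 0 means the ioctl may be passed on
       * to the master's own driver.
       */
      static int sketch_master_hwtstamp_check(const struct sketch_master *m,
                                              unsigned int cmd)
      {
          size_t i;

          if (cmd != SIOCSHWTSTAMP && cmd != SIOCGHWTSTAMP)
              return 0;
          for (i = 0; i < m->num_switches; i++)
              if (m->switches[i] && m->switches[i]->has_hwtstamp_ops)
                  return -EBUSY;
          return 0;
      }
      ```

      Returning -EBUSY is what makes tcpdump report "Device or resource
      busy" in the output shown above.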
      
      [0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/
      
      Fixes: 0336369d ("net: dsa: forward hardware timestamping ioctls to switch driver")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix TX timestamping with a stacked DSA driver · c26a2c2d
      Vladimir Oltean authored
      The driver wrongly assumes that it is the only entity that can set the
      SKBTX_IN_PROGRESS bit of the current skb. Therefore, in the
      gfar_clean_tx_ring function, where the TX timestamp is collected if
      necessary, the aforementioned bit is used to discriminate whether or not
      the TX timestamp should be delivered to the socket's error queue.
      
      But a stacked driver such as a DSA switch can also set the
      SKBTX_IN_PROGRESS bit, which is actually exactly what it should do in
      order to denote that the hardware timestamping process is underway.
      
      Therefore, gianfar would misinterpret the "in progress" bit as being its
      own, and deliver a second skb clone to the socket's error queue,
      completely throwing off a PTP process which is not expecting to receive
      it, _even though_ TX timestamping is not enabled for gianfar.
      
      There have been discussions [0] as to whether non-MAC drivers need to
      set SKBTX_IN_PROGRESS at all (its purpose is to avoid sending 2
      timestamps, a sw and a hw one, to applications which only expect one).
      But as of this patch, there are at least 2 PTP drivers that would break
      in conjunction with gianfar: the sja1105 DSA switch and the felix
      switch, by way of its ocelot core driver.
      
      So, regardless of that conclusion, fix the gianfar driver so that it
      does not act on flags set by other drivers and not intended for it.
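      The difference between the old and the fixed decision can be modeled
      as a hypothetical userspace sketch. It is not the actual gianfar code:
      the flag names mirror SKBTX_HW_TSTAMP and SKBTX_IN_PROGRESS from
      include/linux/skbuff.h but the values are local to the sketch, and
      hwts_tx_en stands for "TX timestamping was enabled on this MAC via
      SIOCSHWTSTAMP".

      ```c
      #include <stdbool.h>

      /* Flag names mirror SKBTX_HW_TSTAMP / SKBTX_IN_PROGRESS; the values
       * here are local to this sketch. */
      #define SKETCH_SKBTX_HW_TSTAMP   (1u << 0) /* app requested a HW TX timestamp */
      #define SKETCH_SKBTX_IN_PROGRESS (1u << 2) /* *some* driver will deliver it */

      struct sketch_skb {
          unsigned int tx_flags; /* models skb_shinfo(skb)->tx_flags */
      };

      struct sketch_priv {
          bool hwts_tx_en; /* TX timestamping enabled on *this* MAC */
      };

      /* Old behavior: trusts IN_PROGRESS, even when a stacked DSA driver set it. */
      static bool deliver_tstamp_buggy(const struct sketch_priv *priv,
                                       const struct sketch_skb *skb)
      {
          (void)priv; /* the MAC's own configuration is never consulted */
          return (skb->tx_flags & SKETCH_SKBTX_IN_PROGRESS) != 0;
      }

      /* Fixed behavior: act only if this MAC itself has TX timestamping on. */
      static bool deliver_tstamp_fixed(const struct sketch_priv *priv,
                                       const struct sketch_skb *skb)
      {
          return priv->hwts_tx_en &&
                 (skb->tx_flags & SKETCH_SKBTX_HW_TSTAMP) != 0;
      }
      ```

      With gianfar's own TX timestamping disabled, the buggy check still
      delivers a second timestamp for an skb that the DSA switch marked as
      in progress, while the fixed check does not.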
      
      [0]: https://www.spinics.net/lists/netdev/msg619699.html
      
      Fixes: f0ee7acf ("gianfar: Add hardware TX timestamping support")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/wan/fsl_ucc_hdlc: remove set but not used variables 'ut_info' and 'ret' · 270fe2ce
      Chen Zhou authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      drivers/net/wan/fsl_ucc_hdlc.c: In function ucc_hdlc_irq_handler:
      drivers/net/wan/fsl_ucc_hdlc.c:643:23:
      	warning: variable ut_info set but not used [-Wunused-but-set-variable]
      drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_suspend:
      drivers/net/wan/fsl_ucc_hdlc.c:880:23:
      	warning: variable ut_info set but not used [-Wunused-but-set-variable]
      drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_resume:
      drivers/net/wan/fsl_ucc_hdlc.c:925:6:
      	warning: variable ret set but not used [-Wunused-but-set-variable]
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • riscv: export flush_icache_all to modules · 1833e327
      Olof Johansson authored
      This is needed by LKDTM (crash dump test module): it calls
      flush_icache_range(), which on RISC-V turns into flush_icache_all(). On
      other architectures, the actual implementation is exported, so follow
      that precedent and export it here too.
      
      Fixes the build of CONFIG_LKDTM, which fails with:
      ERROR: "flush_icache_all" [drivers/misc/lkdtm/lkdtm.ko] undefined!
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • riscv: reject invalid syscalls below -1 · 556f47ac
      David Abdurachmanov authored
      Running "stress-ng --enosys 4 -t 20 -v" showed a large number of kernel
      oopses with the "Unable to handle kernel paging request at virtual
      address" message. This happens when the enosys stressor starts testing
      random invalid syscalls.
      
      I forgot to redirect any syscall below -1 to sys_ni_syscall.
      
      With the patch applied, the kernel oops messages are gone while running
      the stress-ng enosys stressor.
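      The essence of the fix is a range check that also catches negative
      syscall numbers. A common way to get that from one comparison is to
      compare the number as unsigned, so every negative value looks huge and
      lands in the "out of range" branch. The C model below is a sketch
      under that assumption: the real check lives in the RISC-V entry
      assembly, and the three-entry table, sketch_dispatch() and the
      treatment of -1 as "skipped" (e.g. by seccomp or a tracer) are
      illustrative only.

      ```c
      #include <errno.h>
      #include <stddef.h>

      typedef long (*sketch_syscall_fn)(void);

      /* Hypothetical three-entry syscall table, for illustration only. */
      static long sketch_sys_a(void) { return 100; }
      static long sketch_sys_b(void) { return 200; }
      static long sketch_sys_c(void) { return 300; }
      static long sketch_sys_ni_syscall(void) { return -ENOSYS; }

      static const sketch_syscall_fn sketch_table[] = {
          sketch_sys_a, sketch_sys_b, sketch_sys_c,
      };
      #define SKETCH_NR_SYSCALLS (sizeof(sketch_table) / sizeof(sketch_table[0]))

      /*
       * Returns 1 (and leaves *ret untouched) when nr == -1, i.e. the call
       * was skipped. Otherwise stores the result in *ret and returns 0.
       * The unsigned cast turns -2, -3, ... into huge values, so the single
       * bound check routes them to sys_ni_syscall instead of indexing the
       * table out of bounds.
       */
      static int sketch_dispatch(long nr, long *ret)
      {
          if (nr == -1)
              return 1;
          if ((unsigned long)nr >= SKETCH_NR_SYSCALLS)
              *ret = sketch_sys_ni_syscall();
          else
              *ret = sketch_table[nr]();
          return 0;
      }
      ```

      A signed comparison against the table size would accept -2 as "in
      range" and index before the start of the table, which is the kind of
      access that produces the paging-request oopses described above.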
      Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com>
      Fixes: 5340627e ("riscv: add support for SECCOMP and SECCOMP_FILTER")
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • riscv: fix compile failure with EXPORT_SYMBOL() & !MMU · 4d47ce15
      Luc Van Oostenryck authored
      When support for !MMU was added, the declarations of
      __asm_copy_to_user() & __asm_copy_from_user() were #ifdefed
      out, hence their EXPORT_SYMBOL() calls give an error message like:
        .../riscv_ksyms.c:13:15: error: '__asm_copy_to_user' undeclared here
        .../riscv_ksyms.c:14:15: error: '__asm_copy_from_user' undeclared here
      
      Since these symbols are not defined with !MMU, it's wrong to export them.
      Same for __clear_user() (even though this one is also declared in
      include/asm-generic/uaccess.h and thus doesn't give an error message).
      
      Fix this by doing the EXPORT_SYMBOL() directly where these symbols
      are defined: inside lib/uaccess.S itself.
      
      Fixes: 6bd33e1e ("riscv: add nommu support")
      Reported-by: kbuild test robot <lkp@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · bf8d1cd4
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Four fixes and one spelling update, all in drivers: two in lpfc and
        the rest in mpt3sas, cxgbi and target"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: target/iblock: Fix protection error with blocks greater than 512B
        scsi: libcxgbi: fix NULL pointer dereference in cxgbi_device_destroy()
        scsi: lpfc: fix spelling mistakes of asynchronous
        scsi: lpfc: fix build failure with DEBUGFS disabled
        scsi: mpt3sas: Fix double free in attach error handling
    • Merge branch 'ethtool-netlink-part-one' · 1b3b289f
      David S. Miller authored
      Michal Kubecek says:
      
      ====================
      ethtool netlink interface, part 1
      
      This is the first part of a netlink-based alternative userspace
      interface for ethtool. It aims to address some long-known issues with
      the ioctl interface, mainly the lack of extensibility, raciness, limited
      error reporting and the absence of notifications. The goal is to allow
      the userspace ethtool utility to provide all features it currently does
      but without using the ioctl interface. However, some features provided
      by the ethtool ioctl API will be available through other netlink
      interfaces (rtnetlink, devlink) where that is more appropriate.
      
      The interface uses the generic netlink family "ethtool" and provides the
      multicast group "monitor", which is used for notifications.
      Documentation for the interface is in the
      Documentation/networking/ethtool-netlink.rst file. The netlink
      interface is optional; it is built when the CONFIG_ETHTOOL_NETLINK
      (bool) option is enabled.
      
      There are three types of request messages, distinguished by the suffix
      "_GET" (query for information), "_SET" (modify parameters) and "_ACT"
      (perform an action). Kernel reply messages have names with an
      additional suffix "_REPLY" (e.g. ETHTOOL_MSG_SETTINGS_GET_REPLY). Most
      "_SET" and "_ACT" message types do not have a matching reply type, as
      only some of them need reply data beyond the numeric error code and
      extack. The kernel also broadcasts notification messages ("_NTF"
      suffix) on changes.
      
      Basic concepts:
      
      - make extensions easier not only by allowing new attributes but also by
        imposing as few artificial limits as possible, e.g. by using arbitrary
        size bit sets for most bitmap attributes or by not using fixed size
        strings
      - use extack for error reporting and warnings
      - send netlink notifications on changes (even if they were done using the
        ioctl interface) and actions
      - avoid the racy read/modify/write cycle between kernel and userspace by
        sending only attributes which userspace wants to change; there is still
        a read/modify/write cycle between generic kernel code and ethtool_ops
        handler in NIC driver but it is only in kernel and under RTNL lock
      - reduce the number of name lists that need to be kept in sync between
        kernel and userspace (e.g. recognized link modes)
      - where feasible, allow dump requests to query specific information for all
        network devices
      - as parsing and generating netlink messages is more complicated than
        simply copying data structures between userspace API and ethtool_ops
        handlers (which most ioctl commands do), split the code into multiple
        files in net/ethtool directory; move net/core/ethtool.c also to this
        directory and rename it to ioctl.c
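      The "send only the attributes userspace wants to change" rule can be
      sketched as a small update helper, loosely modeled on the
      ethnl_update_* helpers mentioned in the changelog below. The name
      sketch_update_u8() and its signature are hypothetical (a NULL pointer
      stands for "attribute not present in the request"); the point is that
      values userspace did not send keep what the driver reported, and *mod
      tells the caller whether the driver's set handler needs to run at all.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /*
       * Hypothetical helper: attr == NULL means userspace did not send the
       * attribute, so *dst keeps the value previously read from the driver.
       * *mod is only ever set (never cleared), so the caller can chain
       * several updates and then check once whether anything changed.
       */
      static void sketch_update_u8(uint8_t *dst, const uint8_t *attr, bool *mod)
      {
          if (attr == NULL || *dst == *attr)
              return;
          *dst = *attr;
          *mod = true;
      }

      /* Hypothetical subset of link settings, as read from the driver. */
      struct sketch_link_settings {
          uint8_t autoneg;
          uint8_t duplex;
      };
      ```

      In the flow described above, this merge happens in the kernel between
      the driver's get and set handlers under the RTNL lock, so no other
      writer can interleave the way it can with a userspace read/modify/write
      cycle over ioctl.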
      
      Changes between v8 and v9:
      
      - fix ethnl_update_u8()
      - fix description of ETHTOOL_A_LINKSTATE_LINK in rst file
      - add explanation of verbose vs. compact bitset usage to documentation
      - link ethtool-netlink.rst into toctree
      
      Main changes between v7 and v8:
      
      - preliminary patches sent as a separate series (already in net-next)
      - split notification related changes out of _SET patches
      - drop request specific flags from common header
      - use FLAG/flag rather than GFLAG/gflag for global flags (as there are
        only global flags now)
      - allow device names up to ALTIFNAMSIZ characters
      - rename ETHTOOL_A_BITSET_LIST to ETHTOOL_A_BITSET_NOMASK
      - rename ETHTOOL_A_BIT{,S}_* to ETHTOOL_A_BITSET_BIT{,S}_*
      - use standard bitset helpers for link modes (rather than in-place
        conversion)
      - use "default" rather than "standard" for unified _GET handlers
      - fix 64-bit big endian bitset code
      
      Main changes between v6 and v7:
      
      - split complex messages into small single purpose ones (drop info and
        request masks and one level of nesting)
      - separate request information and reply data into two structures
      - refactor bitset handling (no simultaneous u32/ulong handling but avoid
        kmalloc() except for long bitmaps on 64-bit big endian architectures)
      - use only fixed size strings internally (will be replaced by char *
        eventually but that will require rewriting also existing ioctl code)
      - rework ethnl_update_* helpers to return error code
      - rename request flag constants (to ETHTOOL_[GR]FLAG_ prefix)
      - convert documentation to rst
      
      Main changes between v5 and v6:
      
      - use ETHTOOL_MSG_ prefix for message types
      - replace ETHA_ prefix for netlink attributes by ETHTOOL_A_
      - replace ETH_x_IM_y for infomask bits by ETHTOOL_IM_x_y
      - split GET reply types from SET requests and notifications
      - split kernel and userspace message types into different enums
      - remove INFO_GET requests from submitted part
      - drop EVENT notifications (use rtnetlink and on-demand string set load)
      - reorganize patches to reduce the number of intermittent warnings
      - unify request/reply header and its processing
      - another nest around strings in a string set for consistency
      - more consistent identifier naming
      - coding style cleanup
      - get rid of some of the helpers
      - set bad attribute in extack where applicable
      - various bug fixes
      - improve documentation and code comments, more kerneldoc comments
      - more verbose commit messages
      
      Changes between v4 and v5:
      
      - do not panic on failed initialization, only WARN()
      
      Main changes between RFC v3 and v4:
      
      - use more kerneldoc style comments
      - strict attribute policy checking
      - use macros for tables of link mode names and parameters
      - provide permanent hardware address in rtnetlink
      - coding style cleanup
      - split too long patches, reorder
      - wrap more ETHA_SETTINGS_* attributes in nests
      - add also some SET_* implementation into submitted part
      
      Main changes between RFC v2 and RFC v3:
      
      - do not allow building as a module (no netdev notifiers needed)
      - drop some obsolete fields
      - add permanent hw address, timestamping and private flags support
      - rework bitset handling to get rid of variable length arrays
      - notify monitor on device renames
      - restructure GET_SETTINGS/SET_SETTINGS messages
      - split too long patches and submit only first part of the series
      
      Main changes between RFC v1 and RFC v2:
      
      - support dumps for all "get" requests
      - provide notifications for changes related to supported request types
      - support getting string sets (both global and per device)
      - support getting/setting device features
      - get rid of family specific header, everything passed as attributes
      - split netlink code into multiple files in net/ethtool/ directory
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: provide link state with LINKSTATE_GET request · 3d2b847f
      Michal Kubecek authored
      Implement the LINKSTATE_GET netlink request to get link state
      information.

      At the moment, only the link up flag, as provided by the ETHTOOL_GLINK
      ioctl command, is returned.

      The LINKSTATE_GET request can be used with NLM_F_DUMP (without device
      identification) to request the information for all devices in the
      current network namespace that provide the data.
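      A minimal sketch of the request side, assuming the standard netlink
      and generic netlink UAPI headers: the message carries NLM_F_DUMP and no
      device identification. The family ID of the "ethtool" genetlink family
      must be resolved first (CTRL_CMD_GETFAMILY, not shown). The command
      and version values are defined locally as assumptions taken from the
      ethtool netlink UAPI rather than included from
      <linux/ethtool_netlink.h>, so the sketch also builds against older
      headers. Whether the kernel additionally expects a request header nest
      is not covered here, and sketch_build_linkstate_dump() is a
      hypothetical name.

      ```c
      #include <stdint.h>
      #include <string.h>
      #include <linux/netlink.h>
      #include <linux/genetlink.h>

      /* Assumed values from the ethtool netlink UAPI, defined locally. */
      #define SKETCH_ETHTOOL_MSG_LINKSTATE_GET 6
      #define SKETCH_ETHTOOL_GENL_VERSION      1

      /*
       * Fill buf with a LINKSTATE_GET dump request for all devices in the
       * current netns. Returns the message length, or 0 if buf is too small.
       */
      static size_t sketch_build_linkstate_dump(void *buf, size_t len,
                                                uint16_t family_id, uint32_t seq)
      {
          size_t msg_len = NLMSG_LENGTH(GENL_HDRLEN);
          struct nlmsghdr *nlh = buf;
          struct genlmsghdr *genl;

          if (len < msg_len)
              return 0;
          memset(buf, 0, msg_len);
          nlh->nlmsg_len = msg_len;
          nlh->nlmsg_type = family_id;                    /* resolved "ethtool" family */
          nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;  /* dump: no device given */
          nlh->nlmsg_seq = seq;
          genl = NLMSG_DATA(nlh);
          genl->cmd = SKETCH_ETHTOOL_MSG_LINKSTATE_GET;
          genl->version = SKETCH_ETHTOOL_GENL_VERSION;
          return msg_len;
      }
      ```

      The kernel would then answer with one reply message per device
      followed by NLMSG_DONE, as for other netlink dumps.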
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>