1. 08 Nov, 2018 8 commits
  2. 07 Nov, 2018 3 commits
  3. 06 Nov, 2018 23 commits
    • igb: shorten maximum PHC timecounter update interval · 4c9b658e
      Miroslav Lichvar authored
      The timecounter needs to be updated at least once every ~550 seconds in
      order to avoid a 40-bit SYSTIM timestamp being misinterpreted as an old
      timestamp.
      
      Since commit 500462a9 ("timers: Switch to a non-cascading wheel"),
      scheduling of delayed work seems to be less accurate and a requested
      delay of 540 seconds may actually be longer than 550 seconds. Also, the
      PHC may be adjusted to run up to 6% faster than real time and the system
      clock up to 10% slower. Shorten the delay to 360 seconds to be sure the
      timecounter is updated in time.
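
      Where the ~550 second figure comes from: a 40-bit nanosecond counter wraps every 2^40 ns (about 1099.5 s), and the kernel's timecounter_cyc2time() treats a raw timestamp that is more than half the range (2^39 ns, about 549.8 s) ahead of the last update as an old timestamp. The C sketch below illustrates that rule and the margin arithmetic under stated assumptions: the helper names are hypothetical, SYSTIM is assumed to count nanoseconds, and the 6% and 10% rate errors are assumed to compound multiplicatively.

```c
#include <stdint.h>

/* Assumption: a 40-bit SYSTIM counter that counts nanoseconds. */
#define SYSTIM_BITS	40
#define SYSTIM_MASK	((UINT64_C(1) << SYSTIM_BITS) - 1)
/* Half the counter range in seconds: 2^39 ns, about 549.76 s. */
#define HALF_RANGE_S	((double)(SYSTIM_MASK / 2 + 1) / 1e9)

/*
 * Hypothetical helper: signed distance in ns of a raw timestamp from the
 * last timecounter update. A delta larger than half the range is taken
 * to be a timestamp from *before* the update (the half-range rule).
 */
static int64_t systim_delta_ns(uint64_t cycle_last, uint64_t cycle_tstamp)
{
	uint64_t delta = (cycle_tstamp - cycle_last) & SYSTIM_MASK;

	if (delta > SYSTIM_MASK / 2)
		return -(int64_t)((cycle_last - cycle_tstamp) & SYSTIM_MASK);
	return (int64_t)delta;
}

/*
 * Hypothetical helper: worst-case PHC time elapsed for a requested work
 * delay, assuming the system clock runs 10% slow and the PHC 6% fast.
 */
static double worst_case_phc_elapsed_s(double delay_s)
{
	return delay_s / 0.90 * 1.06;
}
```

      Under those assumptions a 540 second delay can let about 636 s of PHC time pass, beyond the half-range limit, while 360 seconds allows about 424 s. A misread timestamp is off by one full wrap, 2^40 ns, consistent with the ~1100 second error this commit describes.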
      
      This fixes an issue with HW timestamps on 82580/I350/I354 being off by
      ~1100 seconds for a few seconds every ~9 minutes.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Acked-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix the bytecount sent to netdev_tx_sent_queue · d944b469
      Brett Creeley authored
      Currently, if the driver does a TSO offload, the bytecount sent to
      netdev_tx_sent_queue will be incorrect. This is because in ice_tso we
      overwrite the initial value that we set in ice_tx_map. This creates a
      mismatch between the Tx and Tx clean flows. In the Tx clean flow we
      calculate the bytecount (called total_bytes) as we clean the
      descriptors, so the value used in the Tx clean path is correct. Fix this
      by using += in ice_tso instead of =. This fixes the mismatch in
      bytecount mentioned above.
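
      A small worked example of why = loses data: with TSO every segment carries its own copy of the headers, so the wire byte count is the skb length plus (gso_segs - 1) extra header copies. The C sketch below uses a hypothetical struct and made-up sizes (a 9000-byte skb, 54-byte headers, 7 segments); it is not the ice driver's code.

```c
/* Hypothetical, simplified stand-in for the first Tx buffer's bookkeeping. */
struct tx_buf_sketch {
	unsigned int bytecount;	/* bytes reported to netdev_tx_sent_queue */
	unsigned int gso_segs;	/* number of segments TSO will produce */
};

/* Initial value: the length of the skb as handed down by the stack. */
static void tx_init(struct tx_buf_sketch *first, unsigned int skb_len,
		    unsigned int gso_segs)
{
	first->bytecount = skb_len;
	first->gso_segs = gso_segs;
}

/* Buggy accounting: '=' throws away the skb length set earlier. */
static void tso_account_buggy(struct tx_buf_sketch *first, unsigned int hdr_len)
{
	first->bytecount = (first->gso_segs - 1) * hdr_len;
}

/* Fixed accounting: '+=' adds the replicated headers to the skb length. */
static void tso_account_fixed(struct tx_buf_sketch *first, unsigned int hdr_len)
{
	first->bytecount += (first->gso_segs - 1) * hdr_len;
}
```

      With these numbers the buggy version reports 324 bytes and the fixed one 9324, which matches the on-wire total in this model.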
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix tx_timeout in PF driver · c585ea42
      Brett Creeley authored
      Prior to this commit the driver was running into tx_timeouts when a
      queue was stressed enough. This was happening because the HW tail
      and SW tail (NTU) were incorrectly out of sync. Consequently this was
      causing the HW head to collide with the HW tail, which to the hardware
      means that all descriptors posted for Tx have been processed.
      
      Due to the Tx logic used in the driver, SW tail and HW tail are allowed
      to be out of sync. This is done as an optimization because it allows the
      driver to write HW tail as infrequently as possible, while still
      updating the SW tail index to keep track. However, there are situations
      where this results in the tail never getting updated, resulting in Tx
      timeouts.
      
      Tx HW tail write condition:
      	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more)
      		writel(sw_tail, tx_ring->tail);
      
      An issue was found in the Tx logic that caused the aforementioned
      condition for updating HW tail to never be met, causing tx_timeouts.
      
      In ice_xmit_frame_ring we calculate how many descriptors we need for the
      Tx transaction based on the skb the kernel hands us. This is then passed
      into ice_maybe_stop_tx along with some extra padding to determine if we
      have enough descriptors available for this transaction. If we don't then
      we return -EBUSY to the stack, otherwise we move on and eventually
      prepare the Tx descriptors accordingly in ice_tx_map and set
      next_to_watch. In ice_tx_map we make another call to ice_maybe_stop_tx
      with a value of MAX_SKB_FRAGS + 4. The key here is that this value is
      possibly less than the value we sent in the first call to
      ice_maybe_stop_tx in ice_xmit_frame_ring. Now, if the number of unused
      descriptors is between MAX_SKB_FRAGS + 4 and the value used in the first
      call to ice_maybe_stop_tx in ice_xmit_frame_ring then we do not update
      the HW tail because of the "Tx HW tail write condition" above. This is
      because ice_maybe_stop_tx returns success instead of calling
      __ice_maybe_stop_tx and subsequently netif_stop_subqueue, which sets
      the __QUEUE_STATE_DEV_XOFF bit. The "Tx HW tail write condition" checks
      this bit by calling netif_xmit_stopped and updates HW tail if the
      aforementioned bit is set.
      
      In ice_clean_tx_irq, if next_to_watch is not NULL, we clean the
      descriptors on which HW has set the DD bit, as long as we have budget.
      As described in the paragraph above, the HW head will eventually run
      into the HW tail.
      
      The next time through ice_xmit_frame_ring we make the initial call to
      ice_maybe_stop_tx with another skb from the stack. This time we do not
      have enough descriptors available and we return NETDEV_TX_BUSY to the
      stack and end up setting next_to_watch to NULL.
      
      This is where we are stuck. In ice_clean_tx_irq we never clean anything
      because next_to_watch is always NULL and in ice_xmit_frame_ring we never
      update HW tail because we already return NETDEV_TX_BUSY to the stack and
      eventually we hit a tx_timeout.
      
      This issue was fixed by making sure that the second call to
      ice_maybe_stop_tx in ice_tx_map is passed a value that is >= the value
      that was used on the initial call to ice_maybe_stop_tx in
      ice_xmit_frame_ring. This was done by adding the following defines to
      make the logic more clear and to reduce the chance of mucking this up
      again:
      
      #define ICE_CACHE_LINE_BYTES		64
      #define ICE_DESCS_PER_CACHE_LINE	(ICE_CACHE_LINE_BYTES / \
      					 sizeof(struct ice_tx_desc))
      #define ICE_DESCS_FOR_CTX_DESC		1
      #define ICE_DESCS_FOR_SKB_DATA_PTR	1
      
      ICE_CACHE_LINE_BYTES being 64 is an assumption made so that we don't
      have to figure this out on every pass through the Tx path. Instead, I
      added a sanity check in ice_probe that verifies the cache line size and
      prints a message if it's not 64 bytes. This will make it easier to file
      issues seen on systems whose cache line size is not 64 bytes, as read
      from the GLPCI_CNF2 register.
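
      The invariant can be checked with a little arithmetic. The C sketch below is a simplified model, not the ice code: it assumes MAX_SKB_FRAGS is 17 (typical with 4 KiB pages), a 16-byte Tx descriptor, that an skb needs at most MAX_SKB_FRAGS + 1 data descriptors, and that the new second-check threshold is the sum of MAX_SKB_FRAGS and the three per-packet defines. The helper names are hypothetical.

```c
#include <stdbool.h>

/* Assumed values for this sketch only. */
#define MAX_SKB_FRAGS			17
#define ICE_TX_DESC_SIZE		16	/* assumed sizeof(struct ice_tx_desc) */
#define ICE_CACHE_LINE_BYTES		64
#define ICE_DESCS_PER_CACHE_LINE	(ICE_CACHE_LINE_BYTES / ICE_TX_DESC_SIZE)
#define ICE_DESCS_FOR_CTX_DESC		1
#define ICE_DESCS_FOR_SKB_DATA_PTR	1

/* Worst-case reservation of the first check in ice_xmit_frame_ring. */
#define FIRST_CHECK_MAX		(MAX_SKB_FRAGS + ICE_DESCS_FOR_SKB_DATA_PTR + \
				 ICE_DESCS_PER_CACHE_LINE + ICE_DESCS_FOR_CTX_DESC)
/* Threshold of the second check in ice_tx_map, before and after the fix. */
#define SECOND_CHECK_OLD	(MAX_SKB_FRAGS + 4)
#define SECOND_CHECK_NEW	(MAX_SKB_FRAGS + ICE_DESCS_FOR_CTX_DESC + \
				 ICE_DESCS_PER_CACHE_LINE + \
				 ICE_DESCS_FOR_SKB_DATA_PTR)

/* The "Tx HW tail write condition" from the commit message. */
static bool tail_written(bool queue_stopped, bool xmit_more)
{
	return queue_stopped || !xmit_more;
}

/*
 * After ice_tx_map posts descriptors, the second check stops the queue
 * only if fewer than 'second' descriptors are unused. Returns true when
 * HW tail is not written and yet the next frame's first check (needing
 * 'first_need') fails, i.e. the hang described above.
 */
static bool tx_would_hang(unsigned int unused, unsigned int second,
			  unsigned int first_need, bool xmit_more)
{
	bool stopped = unused < second;

	return !tail_written(stopped, xmit_more) && unused < first_need;
}
```

      With 22 unused descriptors, the old threshold of 21 lets ice_tx_map continue without stopping the queue, so with xmit_more set HW tail is not written, yet a worst-case next frame needs 23 and gets NETDEV_TX_BUSY. With the new threshold of 23 the queue is stopped and tail is written.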
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix napi delete calls for remove · 25525b69
      Dave Ertman authored
      In the remove path, the vsi->netdev is being set to NULL before the call
      to free vectors. This is causing the netif_napi_del call to never be made.
      
      Add a call to ice_napi_del in the same location as the calls to
      unregister_netdev, just prior to them. This tears things down in the
      reverse order of the register_netdev and netif_napi_add calls.
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix typo in error message · 31082519
      Anirudh Venkataramanan authored
      The print should say "Enabling" instead of "Enaabling".
      Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix flags for port VLAN · 58297dd1
      Md Fahad Iqbal Polash authored
      According to the spec, whenever the insert PVID field is set, the VLAN
      driver insertion mode should be set to 01b, which isn't currently done.
      Fix it.
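
      A minimal C sketch of the flag fix, assuming a hypothetical flags layout (a 2-bit insertion mode in bits 1:0 and an insert-PVID flag in bit 2). The real bit positions and macro names are defined by the device spec and the ice driver headers and may differ.

```c
#include <stdint.h>

/* Hypothetical layout of a VSI VLAN flags byte (assumed, not from the spec). */
#define VLAN_MODE_M		0x03u	/* bits 1:0: driver insertion mode */
#define VLAN_MODE_UNTAGGED	0x01u	/* insertion mode 01b */
#define VLAN_INSERT_PVID	0x04u	/* bit 2: insert the port VLAN ID */

/*
 * Enable port VLAN insertion and, because insert PVID is being set,
 * force the insertion mode to 01b. Other bits are preserved.
 */
static uint8_t set_port_vlan_flags(uint8_t flags)
{
	flags &= (uint8_t)~VLAN_MODE_M;		/* clear any previous mode */
	flags |= VLAN_MODE_UNTAGGED | VLAN_INSERT_PVID;
	return flags;
}
```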
      Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Remove duplicate addition of VLANs in replay path · 9ecd25c2
      Anirudh Venkataramanan authored
      ice_restore_vlan and active_vlans were originally put in place to
      reprogram VLAN filters in the replay path. This is now done as part
      of the much broader VSI rebuild/replay framework. So remove both
      ice_restore_vlan and active_vlans.
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Free VSI contexts during for unload · 33e055fc
      Victor Raj authored
      In the unload path, all VSIs are freed. Also free the related VSI
      contexts to prevent memory leaks.
      Signed-off-by: Victor Raj <victor.raj@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix dead device link issue with flow control · 0f5d4c21
      Akeem G Abodunrin authored
      Setting Rx or Tx pause parameter currently results in link loss on the
      interface, requiring the platform/host to be cold power cycled. Fix it.
      Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Check for reset in progress during remove · afd9d4ab
      Anirudh Venkataramanan authored
      The remove path does not currently check to see if a
      reset is in progress before proceeding.  This can cause
      a resource collision resulting in various types of errors.
      
      Check for reset in progress and wait for a reasonable
      amount of time before allowing the remove to progress.
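
      A bounded wait like this is usually a poll-and-sleep loop. The C sketch below is a user-space illustration with hypothetical names; the predicate, retry count and interval are assumptions, and kernel code would use its own sleep primitives (for example msleep()) rather than nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical bounded wait: poll 'reset_in_progress' up to 'max_tries'
 * times, sleeping 'interval_ms' between polls, then check once more.
 * Returns true if the reset finished, false if it is still in progress.
 */
static bool wait_for_reset_done(bool (*reset_in_progress)(void *), void *ctx,
				unsigned int max_tries, unsigned int interval_ms)
{
	struct timespec ts = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (long)(interval_ms % 1000) * 1000000L,
	};

	for (unsigned int i = 0; i < max_tries; i++) {
		if (!reset_in_progress(ctx))
			return true;
		nanosleep(&ts, NULL);
	}
	return !reset_in_progress(ctx);
}

/* Demo predicate: a "reset" that stays in progress for *ctx more polls. */
static bool countdown_reset(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

      What the caller does after a timeout (proceed or bail out) is a policy choice left to the driver.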
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Set carrier state and start/stop queues in rebuild · ce317dd9
      Anirudh Venkataramanan authored
      Set the carrier state post rebuild by querying the link status. Also
      start/stop queues based on link status.
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • net: phy: Allow BCM54616S PHY to setup internal TX/RX clock delay · 042cb564
      Tao Ren authored
      This patch allows users to enable/disable internal TX and/or RX clock
      delay for BCM54616S PHYs so as to satisfy RGMII timing specifications.
      
      On a particular platform, whether TX and/or RX clock delay is required
      depends on how the PHY is connected to the MAC IP. This requirement can be
      specified through "phy-mode" property in the platform device tree.
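
      The phy-mode values follow the standard Linux convention for which side adds the RGMII delays: "rgmii" means the PHY adds none, "rgmii-id" both, "rgmii-rxid" only RX and "rgmii-txid" only TX. The C sketch below maps those strings to hypothetical delay flags; it is not the Broadcom PHY driver's code.

```c
#include <stdbool.h>
#include <string.h>

/* Internal clock delays the PHY should enable (hypothetical type). */
struct rgmii_delays {
	bool tx_delay;
	bool rx_delay;
};

/*
 * Map a device tree "phy-mode" string to the delays the PHY is expected
 * to add itself. Returns false for modes that are not RGMII variants.
 */
static bool rgmii_delays_from_phy_mode(const char *mode, struct rgmii_delays *d)
{
	if (strcmp(mode, "rgmii") == 0)
		*d = (struct rgmii_delays){ false, false };
	else if (strcmp(mode, "rgmii-id") == 0)
		*d = (struct rgmii_delays){ true, true };
	else if (strcmp(mode, "rgmii-rxid") == 0)
		*d = (struct rgmii_delays){ false, true };
	else if (strcmp(mode, "rgmii-txid") == 0)
		*d = (struct rgmii_delays){ true, false };
	else
		return false;
	return true;
}
```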
      
      The patch is inspired by commit 73333626 ("net: phy: Allow BCM5481x
      PHYs to setup internal TX/RX clock delay").
      Signed-off-by: Tao Ren <taoren@fb.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'trace-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 8053e5b9
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Masami found a slight bug in his code where he transposed the
        arguments of a call to strpbrk.
      
        The reason this wasn't detected in our tests is that the only way this
        would transpire is when a kprobe event with a symbol offset is
        attached to a function that belongs to a module that isn't loaded yet.
        When the kprobe trace event is added, the offset would be truncated
        after it was parsed, and when the module is loaded, it would use the
        symbol without the offset (as the nul character added by the parsing
        would not be replaced with the original character)"
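
      For reference, strpbrk(s, accept) returns a pointer to the first character of s that occurs in accept, so the string being searched must come first. The C sketch below splits a "symbol+offset" spec the correct way and leaves the caller able to restore the separator; the helper name is hypothetical and this is not the kprobes code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split "symbol+offset" in place, e.g. "mod_func+0x10". On success
 * *offset holds the parsed offset and *sep points at the separator that
 * was overwritten with '\0', so the caller can put it back later.
 */
static int split_symbol_offset(char *spec, long *offset, char **sep)
{
	/* Search spec for any of '+' or '-': string first, accept set second. */
	char *p = strpbrk(spec, "+-");

	*offset = 0;
	*sep = NULL;
	if (!p)
		return 0;
	*offset = strtol(p, NULL, 0);	/* handles the sign and 0x prefix */
	*sep = p;
	*p = '\0';			/* truncate spec to the bare symbol */
	return 1;
}
```

      Swapping the arguments searches the two-character "+-" set for characters of the symbol instead, so the pointer returned lies outside the spec buffer.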
      
      * tag 'trace-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/kprobes: Fix strpbrk() argument order
    • Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm · 4581aa96
      Linus Torvalds authored
      Pull ARM fix from Russell King:
       "Ard spotted a typo in one of the assembly files which leads to a
        kernel oops when that code path is executed. Fix this"
      
      * 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 8809/1: proc-v7: fix Thumb annotation of cpu_v7_hvc_switch_mm
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a13511df
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Handle errors mid-stream of an all dump, from Alexey Kodanev.
      
       2) Fix build of openvswitch with certain combinations of netfilter
          options, from Arnd Bergmann.
      
       3) Fix interactions between GSO and BQL, from Eric Dumazet.
      
       4) Don't put a '/' in RTL8201F's sysfs file name, from Holger
          Hoffstätte.
      
       5) S390 qeth driver fixes from Julian Wiedmann.
      
       6) Allow ipv6 link local addresses for netconsole when both source and
          destination are link local, from Matwey V. Kornilov.
      
       7) Fix the BPF program address seen in /proc/kallsyms, from Song Liu.
      
       8) Initialize mutex before use in dsa microchip driver, from Tristram
          Ha.
      
       9) Out-of-bounds access in hns3, from Yunsheng Lin.
      
      10) Various netfilter fixes from Stefano Brivio, Jozsef Kadlecsik, Jiri
          Slaby, Florian Westphal, Eric Westbrook, Andrey Ryabinin, and Pablo
          Neira Ayuso.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (50 commits)
        net: alx: make alx_drv_name static
        net: bpfilter: fix iptables failure if bpfilter_umh is disabled
        sock_diag: fix autoloading of the raw_diag module
        net: core: netpoll: Enable netconsole IPv6 link local address
        ipv6: properly check return value in inet6_dump_all()
        rtnetlink: restore handling of dumpit return value in rtnl_dump_all()
        net/ipv6: Move anycast init/cleanup functions out of CONFIG_PROC_FS
        bonding/802.3ad: fix link_failure_count tracking
        net: phy: realtek: fix RTL8201F sysfs name
        sctp: define SCTP_SS_DEFAULT for Stream schedulers
        sctp: fix strchange_flags name for Stream Change Event
        mlxsw: spectrum: Fix IP2ME CPU policer configuration
        openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS
        qed: fix link config error handling
        net: hns3: Fix for out-of-bounds access when setting pfc back pressure
        net/mlx4_en: use __netdev_tx_sent_queue()
        net: do not abort bulk send on BQL status
        net: bql: add __netdev_tx_sent_queue()
        s390/qeth: report 25Gbit link speed
        s390/qeth: sanitize ARP requests
        ...
    • ARM: 8809/1: proc-v7: fix Thumb annotation of cpu_v7_hvc_switch_mm · 6282e916
      Ard Biesheuvel authored
      Due to what appears to be a copy/paste error, the opening ENTRY()
      of cpu_v7_hvc_switch_mm() lacks a matching ENDPROC(), and instead,
      the one for cpu_v7_smc_switch_mm() is duplicated.
      
      Given that it is ENDPROC() that emits the Thumb annotation, the
      cpu_v7_hvc_switch_mm() routine will be called in ARM mode on a
      Thumb2 kernel, resulting in the following splat:
      
        Internal error: Oops - undefined instruction: 0 [#1] SMP THUMB2
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc1-00030-g4d28ad89189d-dirty #488
        Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        PC is at cpu_v7_hvc_switch_mm+0x12/0x18
        LR is at flush_old_exec+0x31b/0x570
        pc : [<c0316efe>]    lr : [<c04117c7>]    psr: 00000013
        sp : ee899e50  ip : 00000000  fp : 00000001
        r10: eda28f34  r9 : eda31800  r8 : c12470e0
        r7 : eda1fc00  r6 : eda53000  r5 : 00000000  r4 : ee88c000
        r3 : c0316eec  r2 : 00000001  r1 : eda53000  r0 : 6da6c000
        Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      
      Note the 'ISA ARM' in the last line.
      
      Fix this by using the correct name in ENDPROC().
      
      Cc: <stable@vger.kernel.org>
      Fixes: 10115105 ("ARM: spectre-v2: add firmware based hardening")
      Reviewed-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · a422757e
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains the first batch of Netfilter fixes for
      your net tree:
      
      1) Fix splat with IPv6 defragmenting locally generated fragments,
         from Florian Westphal.
      
      2) Fix incorrect check for missing attribute in nft_osf.
      
      3) Missing INT_MIN & INT_MAX definition for netfilter bridge uapi
         header, from Jiri Slaby.
      
      4) Revert map lookup in nft_numgen, this is already possible with
         the existing infrastructure without this extension.
      
      5) Fix wrong listing of set reference counter, make counter
         synchronous again, from Stefano Brivio.
      
      6) Fix CIDR 0 in hash:net,port,net, from Eric Westbrook.
      
      7) Fix allocation failure with large set, use kvcalloc().
         From Andrey Ryabinin.
      
      8) No need to disable BH when fetching the ip set comment, patch from
         Jozsef Kadlecsik.
      
      9) Sanity check for valid sysfs entry in xt_IDLETIMER, from
         Taehee Yoo.
      
      10) Fix suspicious rcu usage via ip_set() macro at netlink dump,
          from Jozsef Kadlecsik.
      
      11) Fix setting default timeout via nfnetlink_cttimeout, this
          comes with preparation patch to add nf_{tcp,udp,...}_pernet()
          helper.
      
      12) Allow ebtables table nat to be of filter type via nft_compat.
          From Florian Westphal.
      
      13) Incorrect calculation of next bucket in early_drop: do not bump
          hash value, update bucket counter instead. From Vasily Khoruzhick.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: alx: make alx_drv_name static · 71311931
      Rasmus Villemoes authored
      alx_drv_name is not used outside main.c, so there's no reason for it to
      have external linkage.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpfilter: fix iptables failure if bpfilter_umh is disabled · 97adadda
      Taehee Yoo authored
      When the iptables command is executed, ip_{set/get}sockopt() try to load
      bpfilter.ko if bpfilter is enabled. If it can't find bpfilter.ko, the
      command fails.
      bpfilter.ko is generated only if CONFIG_BPFILTER_UMH is enabled, but
      ip_{set/get}sockopt() only check CONFIG_BPFILTER.
      So if CONFIG_BPFILTER is enabled and CONFIG_BPFILTER_UMH is disabled,
      the iptables command always fails.
      
      test config:
         CONFIG_BPFILTER=y
         # CONFIG_BPFILTER_UMH is not set
      
      test command:
         %iptables -L
         iptables: No chain/target/match by that name.
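
      The gating mistake can be modelled with two booleans standing in for the Kconfig options. This C sketch is illustrative only: the struct and function names are hypothetical, and kernel code would typically express such a check with IS_ENABLED() in a preprocessor conditional.

```c
#include <stdbool.h>

/* Hypothetical model of the two Kconfig options. */
struct kcfg {
	bool bpfilter;		/* CONFIG_BPFILTER */
	bool bpfilter_umh;	/* CONFIG_BPFILTER_UMH: bpfilter.ko is built */
};

/* Gate on CONFIG_BPFILTER alone: fails when the module was never built. */
static bool use_bpfilter_old(struct kcfg c)
{
	return c.bpfilter;
}

/* Gate that also requires the module that will be requested to exist. */
static bool use_bpfilter_fixed(struct kcfg c)
{
	return c.bpfilter && c.bpfilter_umh;
}
```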
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock_diag: fix autoloading of the raw_diag module · c34c1287
      Andrei Vagin authored
      IPPROTO_RAW isn't registered as an inet protocol, so
      inet_protos[protocol] is always NULL for it.
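
      The autoload decision can be sketched as follows. The table and function below are hypothetical stand-ins for inet_protos[] and the kernel's check; the point is only that a "request the module only if the protocol is registered" test never passes for IPPROTO_RAW, so one way to handle it is to special-case that protocol.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for inet_protos[]: non-NULL means registered. */
static const void *inet_protos_sketch[256];

/*
 * Decide whether requesting an inet _diag module for 'protocol' makes
 * sense. IPPROTO_RAW never has an entry in the table, so without the
 * special case raw_diag would never be autoloaded.
 */
static bool should_request_inet_diag(int protocol)
{
	if (protocol == IPPROTO_RAW)
		return true;
	return protocol >= 0 && protocol < 256 &&
	       inet_protos_sketch[protocol] != NULL;
}
```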
      
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Fixes: bf2ae2e4 ("sock_diag: request _diag module only when the family or proto has been registered")
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: netpoll: Enable netconsole IPv6 link local address · d016b4a3
      Matwey V. Kornilov authored
      There is no reason to refuse a link-local source address when the
      remote netconsole IPv6 address is also link-local.
      
      The patch allows administrators to use IPv6 netconsole without
      explicitly configuring source address:
      
          netconsole=@/,@fe80::5054:ff:fe2f:6012/
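
      One plausible form of the rule is that a local source address is acceptable when its link-local status matches the remote address's. The C sketch below checks that with the POSIX inet_pton() and IN6_IS_ADDR_LINKLOCAL(); the helper name is hypothetical and the exact kernel logic may differ.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical check: a candidate local IPv6 address is usable as the
 * netconsole source only if it is link-local exactly when the remote
 * address is. Inputs are textual IPv6 addresses.
 */
static bool source_matches_remote(const char *local, const char *remote)
{
	struct in6_addr l, r;

	if (inet_pton(AF_INET6, local, &l) != 1 ||
	    inet_pton(AF_INET6, remote, &r) != 1)
		return false;
	/* '!!' normalises the macro results to 0 or 1 before comparing. */
	return !!IN6_IS_ADDR_LINKLOCAL(&l) == !!IN6_IS_ADDR_LINKLOCAL(&r);
}
```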
      Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: properly check return value in inet6_dump_all() · e22d0bfa
      Alexey Kodanev authored
      Make sure we call fib6_dump_end() if it happens that skb->len
      is zero. rtnl_dump_all() can reset cb->args on the next loop
      iteration there.
      
      Fixes: 08e814c9 ("net/ipv6: Bail early if user only wants cloned entries")
      Fixes: ae677bbb ("net: Don't return invalid table id error when dumping all families")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: restore handling of dumpit return value in rtnl_dump_all() · 5e1acb4a
      Alexey Kodanev authored
      For a non-zero return from dumpit() we should break the loop
      in rtnl_dump_all() and return the result. Otherwise, e.g.,
      we could get a memory leak in inet6_dump_fib() [1]: the
      pointer to the allocated struct fib6_walker there (saved
      in cb->args) can be lost when it is reset on the next iteration.
      
      Fix it by partially restoring the previous behavior before
      commit c63586dc ("net: rtnl_dump_all needs to propagate
      error from dumpit function"). The returned error from
      dumpit() is still passed further.
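
      The loop behaviour being restored can be sketched in C. The callback convention assumed here (0 when a family is finished, positive when the buffer filled and the dump must resume, negative on error) and all names are hypothetical simplifications of rtnl_dump_all().

```c
#include <stddef.h>

/* Hypothetical per-family dump callback. */
typedef int (*dumpit_fn)(void *cb_state);

/*
 * Walk the families starting at *resume_idx. Stop on the first non-zero
 * return and propagate it, remembering where to resume, so per-family
 * state kept in cb_state is not reset by moving on to the next family.
 */
static int dump_all_sketch(dumpit_fn *fns, size_t n, size_t *resume_idx,
			   void *cb_state)
{
	int ret = 0;

	for (size_t i = *resume_idx; i < n; i++) {
		if (!fns[i])
			continue;
		ret = fns[i](cb_state);
		if (ret) {
			*resume_idx = i;	/* resume this family next time */
			return ret;
		}
	}
	*resume_idx = n;
	return ret;
}

/* Demo callbacks that record how often each one ran. */
static int calls[3];
static int dump_done(void *s) { (void)s; calls[0]++; return 0; }
static int dump_more(void *s) { (void)s; calls[1]++; return 42; }
static int dump_other(void *s) { (void)s; calls[2]++; return 0; }
```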
      
      [1]:
      unreferenced object 0xffff88001322a200 (size 96):
        comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s)
        hex dump (first 32 bytes):
          00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
          18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff  ..A6......A6....
        backtrace:
          [<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220
          [<000000007d12709f>] inet6_dump_fib+0x68d/0x940
          [<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0
          [<00000000d7cd302b>] netlink_dump+0x945/0x11a0
          [<000000002f43485f>] __netlink_dump_start+0x55d/0x800
          [<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00
          [<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420
          [<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20
          [<00000000691b703b>] netlink_unicast+0x4e3/0x6c0
          [<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0
          [<0000000096d2aa60>] sock_sendmsg+0xba/0xf0
          [<000000008c1b786f>] __sys_sendto+0x1e4/0x330
          [<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0
          [<00000000071f4d56>] do_syscall_64+0x9f/0x300
          [<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [<0000000057587684>] 0xffffffffffffffff
      
      Fixes: c63586dc ("net: rtnl_dump_all needs to propagate error from dumpit function")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 05 Nov, 2018 5 commits
  5. 04 Nov, 2018 1 commit