1. 22 Aug, 2020 1 commit
    • net: nexthop: don't allow empty NHA_GROUP · eeaac363
      Nikolay Aleksandrov authored
      Currently the nexthop code will use an empty NHA_GROUP attribute, but it
      requires at least 1 entry in order to function properly. Otherwise we
      end up dereferencing null or random pointers all over the place due to not
      having any nh_grp_entry members allocated; the nexthop code relies on having at
      least the first member present. Empty NHA_GROUP doesn't make any sense so
      just disallow it.
      Also add a WARN_ON for any future users of nexthop_create_group().
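
The check described above can be sketched in plain C. This is a minimal user-space illustration, not the kernel patch: `struct group_entry` and `validate_group_payload()` are hypothetical names, and the only rule taken from the commit message is that an empty group must be rejected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one group entry carried in the attribute payload. */
struct group_entry {
	uint32_t id;      /* nexthop id */
	uint8_t  weight;  /* relative weight */
	uint8_t  resvd1;
	uint16_t resvd2;
};

/*
 * Return 0 if the payload holds one or more whole entries, -EINVAL otherwise.
 * payload_len is the attribute payload length in bytes.
 */
int validate_group_payload(size_t payload_len)
{
	if (payload_len == 0)
		return -EINVAL;  /* empty group: reject before anything is allocated */
	if (payload_len % sizeof(struct group_entry) != 0)
		return -EINVAL;  /* trailing partial entry */
	return 0;
}
```

Rejecting the request at validation time means later code can keep assuming that the first group member exists.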
      
       BUG: kernel NULL pointer dereference, address: 0000000000000080
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       RIP: 0010:fib_check_nexthop+0x4a/0xaa
       Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
       RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
       RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
       RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
       RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
       R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
       R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
       FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
       Call Trace:
        fib_create_info+0x64d/0xaf7
        fib_table_insert+0xf6/0x581
        ? __vma_adjust+0x3b6/0x4d4
        inet_rtm_newroute+0x56/0x70
        rtnetlink_rcv_msg+0x1e3/0x20d
        ? rtnl_calcit.isra.0+0xb8/0xb8
        netlink_rcv_skb+0x5b/0xac
        netlink_unicast+0xfa/0x17b
        netlink_sendmsg+0x334/0x353
        sock_sendmsg_nosec+0xf/0x3f
        ____sys_sendmsg+0x1a0/0x1fc
        ? copy_msghdr_from_user+0x4c/0x61
        ___sys_sendmsg+0x63/0x84
        ? handle_mm_fault+0xa39/0x11b5
        ? sockfd_lookup_light+0x72/0x9a
        __sys_sendmsg+0x50/0x6e
        do_syscall_64+0x54/0xbe
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f10dacc0bb7
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
       RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
       RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
       RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
       R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
       Modules linked in:
       CR2: 0000000000000080
      
      CC: David Ahern <dsahern@gmail.com>
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Aug, 2020 3 commits
  3. 20 Aug, 2020 16 commits
  4. 19 Aug, 2020 11 commits
    • net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe() · cf96d977
      Wang Hai authored
      Replace alloc_etherdev_mq with devm_alloc_etherdev_mqs. In this way,
      when probe fails, netdev can be freed automatically.
      
      Fixes: 4d5ae32f ("net: ethernet: Add a driver for Gemini gigabit ethernet")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: atlantic: Use readx_poll_timeout() for large timeout · 9553b62c
      Sebastian Andrzej Siewior authored
      Commit
         8dcf2ad3 ("net: atlantic: add hwmon getter for MAC temperature")
      
      implemented a read callback with a udelay(10000U). This fails to
      compile on ARM because the delay is >1ms. I doubt that it is needed to
      spin for 10ms even if possible on x86.
      
      From looking at the code, the context appears to be preemptible so using
      usleep() should work and avoid busy spinning.
      
      Use readx_poll_timeout() in the poll loop.
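
The difference between busy spinning and a sleeping poll loop can be sketched in user space. This is only an analogue of the idea, not the kernel's readx_poll_timeout() macro: `poll_until_clear()`, `read_fn`, and the fake device are hypothetical names introduced for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical read callback: returns the current register value. */
typedef uint32_t (*read_fn)(void *ctx);

/* Fake device for demonstration: reports busy (bit 0 set) for the first few reads. */
struct fake_dev { int busy_reads; };

uint32_t fake_read(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->busy_reads > 0) {
		d->busy_reads--;
		return 1;
	}
	return 0;
}

static int64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Read until (value & mask) == 0, sleeping sleep_us between reads instead of
 * spinning. Returns 0 on success or -ETIMEDOUT once timeout_us has elapsed.
 */
int poll_until_clear(read_fn rd, void *ctx, uint32_t mask,
		     long sleep_us, long timeout_us, uint32_t *val)
{
	int64_t deadline = now_us() + timeout_us;

	for (;;) {
		struct timespec req = { sleep_us / 1000000, (sleep_us % 1000000) * 1000 };

		*val = rd(ctx);
		if ((*val & mask) == 0)
			return 0;
		if (now_us() > deadline) {
			/* One final read so a late success is not reported as a timeout. */
			*val = rd(ctx);
			return (*val & mask) == 0 ? 0 : -ETIMEDOUT;
		}
		nanosleep(&req, NULL);  /* yield the CPU between polls */
	}
}
```

The sleep between reads is what lets other work run while the device is busy, which is only acceptable in a context that is allowed to sleep.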
      
      Fixes: 8dcf2ad3 ("net: atlantic: add hwmon getter for MAC temperature")
      Cc: Mark Starovoytov <mstarovoitov@marvell.com>
      Cc: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ptp_clockmatrix: use i2c_master_send for i2c write · 957ff427
      Min Li authored
      The old code for i2c write would break on some controllers, which fail
      to handle the Repeated Start Condition. So we will just use i2c_master_send
      to handle the write in one transaction.
      
      Changes since v1:
      - Remove indentation change
      Signed-off-by: Min Li <min.li.xe@renesas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: fix state reallocation in policy export · d1fb5559
      Johannes Berg authored
      Evidently, when I did this previously, we didn't have more than
      10 policies and didn't run into the reallocation path, because
      it's missing a memset() for the unused policies. Fix that.
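
The class of bug described above can be illustrated with a small user-space sketch. The names `struct slot`, `struct state`, and `state_grow()` are hypothetical and this is not the kernel code; it only shows that memory added by a reallocation has to be zeroed explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-policy slot; an all-zero slot means "unused". */
struct slot {
	const void *policy;
	unsigned int maxtype;
};

struct state {
	unsigned int n_alloc;
	struct slot *slots;
};

/* Grow the slot array to new_alloc entries, zeroing only the new tail. */
int state_grow(struct state *st, unsigned int new_alloc)
{
	struct slot *tmp;

	if (new_alloc <= st->n_alloc)
		return 0;

	tmp = realloc(st->slots, new_alloc * sizeof(*tmp));
	if (!tmp)
		return -1;

	/* realloc() leaves the added memory uninitialized; clear it. */
	memset(&tmp[st->n_alloc], 0, (new_alloc - st->n_alloc) * sizeof(*tmp));

	st->slots = tmp;
	st->n_alloc = new_alloc;
	return 0;
}
```

Without the memset(), code that scans for an unused slot would read whatever bytes happened to be in the newly added region.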
      
      Fixes: d07dcf9a ("netlink: add infrastructure to expose policies to userspace")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Bug-fixes-for-ENA-ethernet-driver' · b4c8998b
      David S. Miller authored
      Shay Agroskin says:
      
      ====================
      Bug fixes for ENA ethernet driver
      
      This series adds the following:
      - Fix undesired call to ena_restore after returning from suspend
      - Fix condition inside a WARN_ON
      - Fix overriding previous value when updating missed_tx statistic
      
      v1->v2:
      - fix bug when calling reset routine after device resources are freed (Jakub)
      
      v2->v3:
      - fix wrong hash in 'Fixes' tag
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: Make missed_tx stat incremental · ccd143e5
      Shay Agroskin authored
      Most statistics in ena driver are incremented, meaning that a stat's
      value is a sum of all increases done to it since driver/queue
      initialization.
      
      This patch makes all statistics this way, effectively making missed_tx
      statistic incremental.
      Also added a comment regarding rx_drops and tx_drops to make it
      clearer how these counters are calculated.
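
The difference between an overwritten and an incremental counter is small but easy to get wrong. The sketch below uses hypothetical names (`struct tx_stats` and both helpers) and is not the driver code.

```c
#include <stdint.h>

/* Hypothetical per-queue statistics block. */
struct tx_stats {
	uint64_t missed_tx;
};

/* Overwriting: the value from every earlier check is lost. */
void record_missed_overwrite(struct tx_stats *s, uint32_t missed_now)
{
	s->missed_tx = missed_now;
}

/* Incremental: the counter is the sum of all increases since initialization. */
void record_missed_incremental(struct tx_stats *s, uint32_t missed_now)
{
	s->missed_tx += missed_now;
}
```

With two checks that find 3 and then 2 missed packets, the first helper ends at 2 while the second ends at 5.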
      
      Fixes: 11095fdb ("net: ena: add statistics for missed tx packets")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: Change WARN_ON expression in ena_del_napi_in_range() · 8b147f6f
      Shay Agroskin authored
      The ena_del_napi_in_range() function unregisters the napi handler for
      rings in a given range.
      This function had the following WARN_ON macro:
      
          WARN_ON(ENA_IS_XDP_INDEX(adapter, i) &&
      	    adapter->ena_napi[i].xdp_ring);
      
      This macro prints the call stack if the expression inside of it is
      true [1], but the expression inside of it is the wanted situation.
      The expression checks whether the ring has an XDP queue and its index
      corresponds to an XDP one.
      
      This patch changes the expression to
          !ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring
      which indicates an unwanted situation.
      
      Also, change the structure of the function. The napi handler is
      unregistered for all rings, and so there's no need to check whether the
      index is an XDP index or not. By removing this check the code becomes
      much more readable.
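
The two conditions can be compared as plain boolean predicates. This is a sketch with hypothetical helper names; the two inputs stand for ENA_IS_XDP_INDEX(adapter, i) and adapter->ena_napi[i].xdp_ring being non-NULL.

```c
#include <stdbool.h>

/* Old condition: true in the expected case (XDP index that has an XDP ring). */
bool old_warn_condition(bool is_xdp_index, bool has_xdp_ring)
{
	return is_xdp_index && has_xdp_ring;
}

/* New condition: true only when a non-XDP index carries an XDP ring. */
bool new_warn_condition(bool is_xdp_index, bool has_xdp_ring)
{
	return !is_xdp_index && has_xdp_ring;
}
```

For the normal case of an XDP index with an XDP ring, the old predicate is true (so the warning fired) and the new one is false.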
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: Prevent reset after device destruction · 63d4a4c1
      Shay Agroskin authored
      The reset work is scheduled by the timer routine whenever it
      detects that a device reset is required (e.g. when a keep_alive signal
      is missing).
      When releasing device resources in ena_destroy_device() the driver
      cancels the scheduling of the timer routine without destroying the reset
      work explicitly.
      
      This creates the following bug:
          The driver is suspended and the ena_suspend() function is called
      	-> This function calls ena_destroy_device() to free the net device
      	   resources
      	    -> The driver waits for the timer routine to finish
      	    its execution and then cancels it, thus preventing it
      	    from being called again.
      
          If, in its final execution, the timer routine schedules a reset,
          the reset routine might be called afterwards, and a redundant call to
          ena_restore_device() would be made.
      
      By changing the reset routine we allow it to read the device's state
      accurately.
      This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set
      before resetting the device and making both the destruction function and
      the flag check run under rtnl lock.
      The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction
      routine. Also surround the flag check with 'likely' because
      we expect that the reset routine would be called only when
      ENA_FLAG_TRIGGER_RESET flag is set.
      
      The destruction of the timer and reset services in __ena_shutoff() has to
      stay, even though the timer routine is destroyed in ena_destroy_device().
      This is to avoid a case in which the reset routine is scheduled after
      free_netdev() in __ena_shutoff(), which would create an access to freed
      memory in adapter->flags.
      
      Fixes: 8c5c7abd ("net: ena: add power management ops to the ENA driver")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpftool: Handle EAGAIN error code properly in pids collection · 00fa1d83
      Yonghong Song authored
      When the error code is EAGAIN, the kernel signals that user
      space should retry the read() operation for bpf iterators.
      Let us do it.
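
A retry loop of this kind can be sketched as follows. `read_retry_eagain()` is a hypothetical helper, not the bpftool code, and it assumes that EAGAIN means "call again right away", as the commit message describes for bpf iterators. On an ordinary non-blocking descriptor the same loop would busy-wait.

```c
#include <errno.h>
#include <unistd.h>

/* Call read() again for as long as it fails with EAGAIN. */
ssize_t read_retry_eagain(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EAGAIN);

	return n;  /* bytes read, 0 at end of file, or -1 with another errno */
}
```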
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200818222312.2181675-1-yhs@fb.com
    • bpf: Avoid visit same object multiple times · e60572b8
      Yonghong Song authored
      Currently when traversing all tasks, the next tid
      is always increased by one. This may result in
      visiting the same task multiple times in a
      pid namespace.
      
      This patch fixed the issue by setting the next
      tid as pid_nr_ns(pid, ns) + 1, similar to
      function next_tgid().
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lore.kernel.org/bpf/20200818222310.2181500-1-yhs@fb.com
    • bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator · e679654a
      Yonghong Song authored
      In our production system, we observed rcu stalls when
      `bpftool prog` is running.
        rcu: INFO: rcu_sched self-detected stall on CPU
        rcu: 	7-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
        	(t=21031 jiffies g=2534773 q=179750)
        NMI backtrace for cpu 7
        CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
        Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
        Call Trace:
        <IRQ>
        dump_stack+0x57/0x70
        nmi_cpu_backtrace.cold+0x14/0x53
        ? lapic_can_unplug_cpu.cold+0x39/0x39
        nmi_trigger_cpumask_backtrace+0xb7/0xc7
        rcu_dump_cpu_stacks+0xa2/0xd0
        rcu_sched_clock_irq.cold+0x1ff/0x3d9
        ? tick_nohz_handler+0x100/0x100
        update_process_times+0x5b/0x90
        tick_sched_timer+0x5e/0xf0
        __hrtimer_run_queues+0x12a/0x2a0
        hrtimer_interrupt+0x10e/0x280
        __sysvec_apic_timer_interrupt+0x51/0xe0
        asm_call_on_stack+0xf/0x20
        </IRQ>
        sysvec_apic_timer_interrupt+0x6f/0x80
        asm_sysvec_apic_timer_interrupt+0x12/0x20
        RIP: 0010:task_file_seq_get_next+0x71/0x220
        Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38
        RSP: 0018:ffffc90006223e10 EFLAGS: 00000297
        RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0
        RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0
        RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b
        R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118
        R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff
        task_file_seq_next+0x52/0xa0
        bpf_seq_read+0xb9/0x320
        vfs_read+0x9d/0x180
        ksys_read+0x5f/0xe0
        do_syscall_64+0x38/0x60
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f8815f4f76e
        Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
        RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
        RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e
        RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007
        RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010
        R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8
        R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e
      
      Note that `bpftool prog` actually calls a task_file bpf iterator
      program to establish an association between prog/map/link/btf anon
      files and processes.
      
      In the case where the above rcu stall occurred, we had a process
      having 1587 tasks and each task having roughly 81305 files.
      This implied 129 million bpf prog invocations. Unfortunately none of
      these files are prog/map/link/btf files so the bpf iterator/prog needs
      to traverse all these files and is not able to return to user space
      since there is no seq_file buffer overflow.
      
      This patch fixed the issue in bpf_seq_read() to limit the number
      of visited objects. If the maximum number of visited objects is
      reached, no more objects will be visited in the current syscall.
      If there is nothing written in the seq_file buffer, -EAGAIN is
      returned to the user so the user can try again.
      
      The maximum number of visited objects is set at 1 million.
      In our Intel Xeon D-2191 2.3GHz 18-core server, bpf_seq_read()
      visiting 1 million files takes around 0.18 seconds.
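
The limit can be sketched as a bounded loop. This is a user-space illustration with hypothetical names (`struct obj_iter`, `bounded_read()`), not the bpf_seq_read() implementation. For scale, 1587 tasks times 81305 files is about 129 million objects; assuming the quoted 0.18 seconds per million scales linearly, an unbounded pass would take roughly 23 seconds in one syscall.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical iterator over `total` objects; every match_every-th one produces output. */
struct obj_iter {
	size_t pos;
	size_t total;
	size_t match_every;  /* 0 means no object ever produces output */
};

/*
 * Visit at most max_visit objects per call. Returns the number of objects
 * that produced output, 0 once the iterator is exhausted with no output,
 * or -EAGAIN if the limit was reached before anything was produced.
 */
long bounded_read(struct obj_iter *it, size_t max_visit)
{
	size_t visited = 0;
	long emitted = 0;

	while (it->pos < it->total) {
		if (visited == max_visit)
			return emitted > 0 ? emitted : -EAGAIN;
		if (it->match_every && it->pos % it->match_every == 0)
			emitted++;
		it->pos++;
		visited++;
	}
	return emitted;
}
```

Because the position is kept in the iterator, a caller that sees -EAGAIN can simply call again and continue where the previous call stopped.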
      
      We did not use cond_resched() because for some iterators, e.g., the
      netlink iterator, an rcu read_lock critical section spans
      consecutive seq_ops->next() calls, which makes it impossible to call
      cond_resched() in the key while loop of function bpf_seq_read().
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lore.kernel.org/bpf/20200818222309.2181348-1-yhs@fb.com
  5. 18 Aug, 2020 9 commits
    • net: ipv4: remove duplicate "the the" phrase in Kconfig text · ad664118
      Colin Ian King authored
      The Kconfig help text contains the phrase "the the" in the help
      text. Fix this and reformat the block of help text.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: remove duplicate "the the" phrase in Kconfig text · 17340552
      Colin Ian King authored
      The Kconfig help text contains the phrase "the the" in the help
      text. Fix this.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ethtool-netlink-bug-fixes' · 0df55a03
      David S. Miller authored
      Maxim Mikityanskiy says:
      
      ====================
      ethtool-netlink bug fixes
      
      This series contains a few bug fixes for ethtool-netlink. These bugs are
      specific to the netlink interface, and the legacy ioctl interface is
      not affected. These patches aim to have the same behavior in
      ethtool-netlink as in the legacy ethtool.
      
      Please also see the sibling series for the userspace tool.
      
      v2 changes: Added Fixes tags.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Don't omit the netlink reply if no features were changed · f01204ec
      Maxim Mikityanskiy authored
      The legacy ethtool userspace tool shows an error when no features could
      be changed. It's useful to have a netlink reply to be able to show this
      error when __netdev_update_features wasn't called, for example:
      
      1. ethtool -k eth0
         large-receive-offload: off
      2. ethtool -K eth0 rx-fcs on
      3. ethtool -K eth0 lro on
         Could not change any device features
         rx-lro: off [requested on]
      4. ethtool -K eth0 lro on
         # The output should be the same, but without this patch the kernel
         # doesn't send the reply, and ethtool is unable to detect the error.
      
      This commit makes ethtool-netlink always return a reply when requested,
      and it still avoids unnecessary calls to __netdev_update_features if the
      wanted features haven't changed.
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Account for hw_features in netlink interface · 2847bfed
      Maxim Mikityanskiy authored
      ethtool-netlink ignores dev->hw_features and may confuse the drivers by
      asking them to enable features not in the hw_features bitmask. For
      example:
      
      1. ethtool -k eth0
         tls-hw-tx-offload: off [fixed]
      2. ethtool -K eth0 tls-hw-tx-offload on
         tls-hw-tx-offload: on
      3. ethtool -k eth0
         tls-hw-tx-offload: on [fixed]
      
      Filter out dev->hw_features from req_wanted to fix it and to resemble
      the legacy ethtool behavior.
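
One way to picture the filtering is as a bitmask update in which only bits present in hw_features may take the requested value. This is a sketch under that assumption, using 64-bit masks and a hypothetical helper name; it is not the kernel implementation.

```c
#include <stdint.h>

/*
 * Bits outside hw_features keep their old wanted value; bits inside
 * hw_features take the requested value.
 */
uint64_t apply_wanted(uint64_t old_wanted, uint64_t req_wanted,
		      uint64_t hw_features)
{
	return (old_wanted & ~hw_features) | (req_wanted & hw_features);
}
```

A request to turn on a feature marked [fixed] (a bit outside hw_features) then leaves that bit unchanged instead of passing it on to the driver.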
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Fix preserving of wanted feature bits in netlink interface · 840110a4
      Maxim Mikityanskiy authored
      Currently, ethtool-netlink calculates new wanted bits as:
      (req_wanted & req_mask) | (old_active & ~req_mask)
      
      It completely discards the old wanted bits, so they are forgotten with
      the next ethtool command. Sample steps to reproduce:
      
      1. ethtool -k eth0
         tx-tcp-segmentation: on # TSO is on from the beginning
      2. ethtool -K eth0 tx off
         tx-tcp-segmentation: off [not requested]
      3. ethtool -k eth0
         tx-tcp-segmentation: off [requested on]
      4. ethtool -K eth0 rx off # Some change unrelated to TSO
      5. ethtool -k eth0
         tx-tcp-segmentation: off # "Wanted on" is forgotten
      
      This commit fixes it by changing the formula to:
      (req_wanted & req_mask) | (old_wanted & ~req_mask),
      where old_active was replaced by old_wanted to account for the wanted
      bits.
      
      The shortcut condition for the case where nothing was changed now
      compares wanted bitmasks, instead of wanted to active.
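
The two formulas can be compared directly on small bitmasks. The helper names are hypothetical and 64-bit integers stand in for the kernel's feature bitmaps; the expressions themselves are the ones quoted above.

```c
#include <stdint.h>

/* Formula before the fix: bits outside req_mask are taken from the active set. */
uint64_t new_wanted_before(uint64_t req_wanted, uint64_t req_mask,
			   uint64_t old_active)
{
	return (req_wanted & req_mask) | (old_active & ~req_mask);
}

/* Formula after the fix: bits outside req_mask are taken from the old wanted set. */
uint64_t new_wanted_after(uint64_t req_wanted, uint64_t req_mask,
			  uint64_t old_wanted)
{
	return (req_wanted & req_mask) | (old_wanted & ~req_mask);
}
```

Take bit 0 as TSO and bit 1 as rx, with TSO wanted but currently inactive (old_wanted = 0x3, old_active = 0x2). A request for "rx off" (req_mask = 0x2, req_wanted = 0x0) yields 0x0 with the old formula, which forgets TSO, and 0x1 with the new one.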
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: some fixes for ipv6_dev_find() · 4ef1a7cb
      Xin Long authored
      This patch is to do 3 things for ipv6_dev_find():
      
        As David A. noticed,
      
        - rt6_lookup() is not really needed. Different from __ip_dev_find(),
          ipv6_dev_find() doesn't have a compatibility problem, so remove it.
      
        As Hideaki suggested,
      
        - "valid" (non-tentative) check for the address is also needed.
          ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will
          traverse the address hash list, but it's heavy to be called
          inside ipv6_dev_find(). This patch is to reuse the code of
          ipv6_chk_addr_and_flags() for ipv6_dev_find().
      
        - dev parameter is passed into ipv6_dev_find(), as link-local
          addresses from user space have sin6_scope_id set and the dev
          lookup needs it.
      
      Fixes: 81f6cb31 ("ipv6: add ipv6_dev_find()")
      Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Reported-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix active-backup failover for current ARP slave · 0410d071
      Jiri Wiesner authored
      When the ARP monitor is used for link detection, ARP replies are
      validated for all slaves (arp_validate=3) and fail_over_mac is set to
      active, two slaves of an active-backup bond may get stuck in a state
      where both of them are active and pass packets that they receive to
      the bond. This state makes IPv6 duplicate address detection fail. The
      state is reached thus:
      1. The current active slave goes down because the ARP target
         is not reachable.
      2. The current ARP slave is chosen and made active.
      3. A new slave is enslaved. This new slave becomes the current active
         slave and can reach the ARP target.
      As a result, the current ARP slave stays active after the enslave
      action has finished and the log is littered with "PROBE BAD" messages:
      > bond0: PROBE: c_arp ens10 && cas ens11 BAD
      The workaround is to remove the slave with "going back" status from
      the bond and re-enslave it. This issue was encountered when DPDK PMD
      interfaces were being enslaved to an active-backup bond.
      
      It would be possible to fix the issue in bond_enslave() or
      bond_change_active_slave() but the ARP monitor was fixed instead to
      keep most of the actions changing the current ARP slave in the ARP
      monitor code. The current ARP slave is set as inactive and backup
      during the commit phase. A new state, BOND_LINK_FAIL, has been
      introduced for slaves in the context of the ARP monitor. This allows
      administrators to see how slaves are rotated for sending ARP requests
      and attempts are made to find a new active slave.
      
      Fixes: b2220cad ("bonding: refactor ARP active-backup monitor")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: handle the return value of pskb_carve_frag_list() correctly · eabe8618
      Miaohe Lin authored
      pskb_carve_frag_list() may return -ENOMEM in pskb_carve_inside_nonlinear().
      We should handle this correctly or we would get a wrong sk_buff.
      
      Fixes: 6fa01ccd ("skbuff: Add pskb_extract() helper function")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>