Commits · 290af86629b25ffd1ed6232c4e9107da031705cb · nexedi / linux

09 Jan, 2018 2 commits

bpf: introduce BPF_JIT_ALWAYS_ON config · 290af866

Alexei Starovoitov authored Jan 09, 2018

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

290af866

bpf: avoid false sharing of map refcount with max_entries · be95a845

Daniel Borkmann authored Jan 09, 2018

In addition to commit b2157399 ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.

Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.htmlSigned-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

be95a845

08 Jan, 2018 1 commit

bpf: prevent out-of-bounds speculation · b2157399

Alexei Starovoitov authored Jan 07, 2018

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

b2157399

06 Jan, 2018 2 commits

selftests/bpf: fix test_align · 2b36047e

Alexei Starovoitov authored Jan 05, 2018

since commit 82abbf8d the verifier rejects the bit-wise
arithmetic on pointers earlier.
The test 'dubious pointer arithmetic' now has less output to match on.
Adjust it.

Fixes: 82abbf8d ("bpf: do not allow root to mangle valid pointers")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2b36047e

bpf: sockmap missing NULL psock check · 5731a879

John Fastabend authored Jan 04, 2018

Add psock NULL check to handle a racing sock event that can get the
sk_callback_lock before this case but after xchg happens causing the
refcnt to hit zero and sock user data (psock) to be null and queued
for garbage collection.

Also add a comment in the code because this is a bit subtle and
not obvious in my opinion.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5731a879

05 Jan, 2018 4 commits

sh_eth: fix SH7757 GEther initialization · 51335502

Sergei Shtylyov authored Jan 04, 2018

Renesas  SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the
'sh_eth' driver can only reset and initialize TSU of the first controller
pair. Shimoda-san tried to solve that adding the 'needs_init' member to the
'struct sh_eth_plat_data', however the platform code still never sets this
flag. I think  that we can infer this information from the 'devno' variable
(set  to 'platform_device::id') and reset/init the Ether controller pair
only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be
removed...

Fixes: 150647fb ("net: sh_eth: change the condition of initialization")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51335502

Merge tag 'linux-can-fixes-for-4.15-20180104' of... · 3e6e867a

David S. Miller authored Jan 05, 2018

Merge tag 'linux-can-fixes-for-4.15-20180104' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2018-01-04

this is a pull request for net/master consisting of 4 patches.

The first patch is by Oliver Hartkopp, it improves the error checking
during the creation of a vxcan link. Wolfgang Grandegger's patch for the
gs_usb driver fixes the return value of the "set_bittiming" callback.
Luu An Phu provides a patch for the flexcan driver to fix the frame
length check in the flexcan_start_xmit() function. The last patch is by
Martin Lederhilger for the ems_usb driver and improves the error
reporting for error warning and passive frames.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3e6e867a

net: fec: free/restore resource in related probe error pathes · d1616f07

Fugang Duan authored Jan 04, 2018

Fixes in probe error path:
- Restore dev_id before failed_ioremap path.
  Fixes: ("net: fec: restore dev_id in the cases of probe error")
- Call of_node_put(phy_node) before failed_phy path.
  Fixes: ("net: fec: Support phys probed from devicetree and fixed-link")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1616f07

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f737be8d

David S. Miller authored Jan 05, 2018

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix chain filtering when dumping rules via nf_tables_dump_rules().

2) Fix accidental change in NF_CT_STATE_UNTRACKED_BIT through uapi,
   introduced when removing the untracked conntrack object, from
   Florian Westphal.

3) Fix potential nul-dereference when releasing dump filter in
   nf_tables_dump_obj_done(), patch from Hangbin Liu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f737be8d

04 Jan, 2018 15 commits

uapi/if_ether.h: prevent redefinition of struct ethhdr · 6926e041

Hauke Mehrtens authored Jan 03, 2018

Musl provides its own ethhdr struct definition. Add a guard to prevent
its definition of the appropriate musl header has already been included.

glibc does not implement this header, but when glibc will implement this
they can just define __UAPI_DEF_ETHHDR 0 to make it work with the
kernel.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

6926e041

ipv6: fix general protection fault in fib6_add() · 7bbfe00e

Wei Wang authored Jan 03, 2018

In fib6_add(), pn could be NULL if fib6_add_1() failed to return a fib6
node. Checking pn != fn before accessing pn->leaf makes sure pn is not
NULL.
This fixes the following GPF reported by syzkaller:
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 3201 Comm: syzkaller001778 Not tainted 4.15.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fib6_add+0x736/0x15a0 net/ipv6/ip6_fib.c:1244
RSP: 0018:ffff8801c7626a70 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000020 RCX: ffffffff84794465
RDX: 0000000000000004 RSI: ffff8801d38935f0 RDI: 0000000000000282
RBP: ffff8801c7626da0 R08: 1ffff10038ec4c35 R09: 0000000000000000
R10: ffff8801c7626c68 R11: 0000000000000000 R12: 00000000fffffffe
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000009
FS:  0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:0000000009b70840
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000020be1000 CR3: 00000001d585a006 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1006
 ip6_route_multipath_add+0xd14/0x16c0 net/ipv6/route.c:3833
 inet6_rtm_newroute+0xdc/0x160 net/ipv6/route.c:3957
 rtnetlink_rcv_msg+0x733/0x1020 net/core/rtnetlink.c:4411
 netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4423
 netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
 netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
 sock_sendmsg_nosec net/socket.c:636 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:646
 sock_write_iter+0x31a/0x5d0 net/socket.c:915
 call_write_iter include/linux/fs.h:1772 [inline]
 do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
 do_iter_write+0x154/0x540 fs/read_write.c:932
 compat_writev+0x225/0x420 fs/read_write.c:1246
 do_compat_writev+0x115/0x220 fs/read_write.c:1267
 C_SYSC_writev fs/read_write.c:1278 [inline]
 compat_SyS_writev+0x26/0x30 fs/read_write.c:1274
 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
 do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
 entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:125
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7bbfe00e

RDS: null pointer dereference in rds_atomic_free_op · 7d11f77f

Mohamed Ghannam authored Jan 03, 2018

set rm->atomic.op_active to 0 when rds_pin_pages() fails
or the user supplied address is invalid,
this prevents a NULL pointer usage in rds_atomic_free_op()
Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d11f77f

sh_eth: fix TSU resource handling · dfe8266b

Sergei Shtylyov authored Jan 03, 2018

When switching the driver to the managed device API, I managed to break
the case of a dual Ether devices sharing a single TSU: the 2nd Ether port
wouldn't probe. Iwamatsu-san has tried to fix this but his patch was buggy
and he then dropped the ball...

The solution is to limit calling devm_request_mem_region() to the first
of the two ports sharing the same TSU, so devm_ioremap_resource() can't
be used anymore for the TSU resource...

Fixes: d5e07e69 ("sh_eth: use managed device API")
Reported-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfe8266b

net: stmmac: enable EEE in MII, GMII or RGMII only · 879626e3

Jerome Brunet authored Jan 03, 2018

Note in the databook - Section 4.4 - EEE :
" The EEE feature is not supported when the MAC is configured to use the
TBI, RTBI, SMII, RMII or SGMII single PHY interface. Even if the MAC
supports multiple PHY interfaces, you should activate the EEE mode only
when the MAC is operating with GMII, MII, or RGMII interface."

Applying this restriction solves a stability issue observed on Amlogic
gxl platforms operating with RMII interface and the internal PHY.

Fixes: 83bf79b6 ("stmmac: disable at run-time the EEE if not supported")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

879626e3

rtnetlink: give a user socket to get_target_net() · f428fe4a

Andrei Vagin authored Jan 02, 2018

This function is used from two places: rtnl_dump_ifinfo and
rtnl_getlink. In rtnl_getlink(), we give a request skb into
get_target_net(), but in rtnl_dump_ifinfo, we give a response skb
into get_target_net().
The problem here is that NETLINK_CB() isn't initialized for the response
skb. In both cases we can get a user socket and give it instead of skb
into get_target_net().

This bug was found by syzkaller with this call-trace:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3149 Comm: syzkaller140561 Not tainted 4.15.0-rc4-mm1+ #47
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__netlink_ns_capable+0x8b/0x120 net/netlink/af_netlink.c:868
RSP: 0018:ffff8801c880f348 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8443f900
RDX: 000000000000007b RSI: ffffffff86510f40 RDI: 00000000000003d8
RBP: ffff8801c880f360 R08: 0000000000000000 R09: 1ffff10039101e4f
R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff86510f40
R13: 000000000000000c R14: 0000000000000004 R15: 0000000000000011
FS:  0000000001a1a880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020151000 CR3: 00000001c9511005 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  netlink_ns_capable+0x26/0x30 net/netlink/af_netlink.c:886
  get_target_net+0x9d/0x120 net/core/rtnetlink.c:1765
  rtnl_dump_ifinfo+0x2e5/0xee0 net/core/rtnetlink.c:1806
  netlink_dump+0x48c/0xce0 net/netlink/af_netlink.c:2222
  __netlink_dump_start+0x4f0/0x6d0 net/netlink/af_netlink.c:2319
  netlink_dump_start include/linux/netlink.h:214 [inline]
  rtnetlink_rcv_msg+0x7f0/0xb10 net/core/rtnetlink.c:4485
  netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2441
  rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540
  netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
  netlink_unicast+0x4be/0x6a0 net/netlink/af_netlink.c:1334
  netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897

Cc: Jiri Benc <jbenc@redhat.com>
Fixes: 79e1ad14 ("rtnetlink: use netnsid to query interface")
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f428fe4a

MAINTAINERS: Update my email address. · fb32dd3a

Pravin B Shelar authored Jan 02, 2018

Signed-off-by: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb32dd3a

Merge tag 'mac80211-for-davem-2018-01-04' of... · af8530cb

David S. Miller authored Jan 04, 2018

Merge tag 'mac80211-for-davem-2018-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Two fixes:
 * drop mesh frames appearing to be from ourselves
 * check another netlink attribute for existence
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

af8530cb

can: ems_usb: improve error reporting for error warning and error passive · 6ebc5e8f

Martin Lederhilger authored Dec 21, 2017

This patch adds the missing CAN_ERR_CRTL to cf->can_id in case of
CAN_STATE_ERROR_WARNING or CAN_STATE_ERROR_PASSIVE
Signed-off-by: Martin Lederhilger <m.lederhilger@ds-automotion.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

6ebc5e8f

can: flex_can: Correct the checking for frame length in flexcan_start_xmit() · 13454c14

Luu An Phu authored Jan 02, 2018

The flexcan_start_xmit() function compares the frame length with data
register length to write frame content into data[0] and data[1]
register. Data register length is 4 bytes and frame maximum length is 8
bytes.

Fix the check that compares frame length with 3. Because the register
length is 4.
Signed-off-by: Luu An Phu <phu.luuan@nxp.com>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

13454c14

can: gs_usb: fix return value of the "set_bittiming" callback · d5b42e66

Wolfgang Grandegger authored Dec 13, 2017

The "set_bittiming" callback treats a positive return value as error!
For that reason "can_changelink()" will quit silently after setting
the bittiming values without processing ctrlmode, restart-ms, etc.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

d5b42e66

can: vxcan: improve handling of missing peer name attribute · b4c2951a

Oliver Hartkopp authored Dec 02, 2017

Picking up the patch from Serhey Popovych (commit 191cdb38,
"veth: Be more robust on network device creation when no attributes").

When the peer name attribute is not provided the former implementation tries
to register the given device name twice ... which leads to -EEXIST.
If only one device name is given apply an automatic generated and valid name
for the peer.

Cc: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

b4c2951a

net: dsa: b53: Turn off Broadcom tags for more switches · 54e98b5d

Florian Fainelli authored Jan 03, 2018

Models such as BCM5395/97/98 and BCM53125/24/53115 and compatible require that
we turn on managed mode to actually act on Broadcom tags, otherwise they just
pass them through on ingress (host -> switch) and don't insert them in egress
(switch -> host). Turning on managed mode is simple, but requires us to
properly support ARL misses on multicast addresses which is a much more
involved set of changes not suitable for a bug fix for this release.
Reported-by: Jochen Friedrich <jochen@scram.de>
Fixes: 7edc58d6 ("net: dsa: b53: Turn on Broadcom tags")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

54e98b5d

mac80211: mesh: drop frames appearing to be from us · 736a80bb

Johannes Berg authored Jan 04, 2018

If there are multiple mesh stations with the same MAC address,
they will both get confused and start throwing warnings.

Obviously in this case nothing can actually work anyway, so just
drop frames that look like they're from ourselves early on.
Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

736a80bb

nl80211: Check for the required netlink attribute presence · 3ea15452

Hao Chen authored Jan 03, 2018

nl80211_nan_add_func() does not check if the required attribute
NL80211_NAN_FUNC_FOLLOW_UP_DEST is present when processing
NL80211_CMD_ADD_NAN_FUNCTION request. This request can be issued
by users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attribute presence.
Signed-off-by: Hao Chen <flank3rsky@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

3ea15452

03 Jan, 2018 16 commits

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 820d1d5e

David S. Miller authored Jan 03, 2018

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-01-03

This series contains fixes for i40e and i40evf.

Amritha removes the UDP support for big buffer cloud filters since it is
not supported and having UDP enabled is a bug.

Alex fixes a bug in the __i40e_chk_linearize() which did not take into
account large (16K or larger) fragments that are split over 2 descriptors,
which could result in a transmit hang.

Jake fixes an issue where a devices own MAC address could be removed from
the unicast address list, so force a check on every address sync to ensure
removal does not happen.

Jiri Pirko fixes the return value when a filter configuration is not
supported, do not return "invalid" but return "not supported" so that
the core can react correctly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

820d1d5e

3c59x: fix missing dma_mapping_error check and bad ring refill logic · ee4aa8df

Neil Horman authored Jan 03, 2018

A few spots in 3c59x missed calls to dma_mapping_error checks, casuing
WARN_ONS to trigger.  Clean those up.  While we're at it, refactor the
refill code a bit so that if skb allocation or dma mapping fails, we
recycle the existing buffer.  This prevents holes in the rx ring, and
makes for much simpler logic

Note: This is compile only tested.  Ted, if you could run this and
confirm that it continues to work properly, I would appreciate it, as I
currently don't have access to this hardware
Signed-off-by: Neil Horman <nhorman@redhat.com>
CC: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
CC: "David S. Miller" <davem@davemloft.net>
Reported-by: tedheadster@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>

ee4aa8df

Merge branch 'ena-fixes' · 74c88af5

David S. Miller authored Jan 03, 2018

Netanel Belgazal says:

====================
bug fixes for ENA Ethernet driver

Changes from V1:
Revome incorrect "ena: invoke netif_carrier_off() only after netdev
  registered" patch

This patchset contains 2 bug fixes:
* handle rare race condition during MSI-X initialization
* fix error processing in ena_down()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

74c88af5

net: ena: fix error handling in ena_down() sequence · ee4552aa

Netanel Belgazal authored Jan 03, 2018

ENA admin command queue errors are not handled as part of ena_down().
As a result, in case of error admin queue transitions to non-running
state and aborts all subsequent commands including those coming from
ena_up(). Reset scheduled by the driver from the timer service
context would not proceed due to sharing rtnl with ena_up()/ena_down()
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee4552aa

net: ena: unmask MSI-X only after device initialization is completed · 7853b49c

Netanel Belgazal authored Jan 03, 2018

Under certain conditions MSI-X interrupt might arrive right after it
was unmasked in ena_up(). There is a chance it would be processed by
the driver before device ENA_FLAG_DEV_UP flag is set. In such a case
the interrupt is ignored.
ENA device operates in auto-masked mode, therefore ignoring
interrupt leaves it masked for good.
Moving unmask of interrupt to be the last step in ena_up().
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7853b49c

cxgb4: Fix FW flash errors · 15962a18

Arjun Vynipadath authored Jan 03, 2018

commit 96ac18f1 ("cxgb4: Add support for new flash parts")
removed initialization of adapter->params.sf_fw_start causing issues
while flashing firmware to card. We no longer need sf_fw_start
in adapter->params as we already have macros defined for FW flash
addresses.

Fixes: 96ac18f1 ("cxgb4: Add support for new flash parts")
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15962a18

i40e: flower: Fix return value for unsupported offload · bc4244c6

Jiri Pirko authored Dec 22, 2017

When filter configuration is not supported, drivers should return
-EOPNOTSUPP so the core can react correctly.

Fixes: 2f4b411a ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

bc4244c6

i40e: don't remove netdev->dev_addr when syncing uc list · 458867b2

Jacob Keller authored Dec 20, 2017

In some circumstances, such as with bridging, it is possible that the
stack will add a devices own MAC address to its unicast address list.

If, later, the stack deletes this address, then the i40e driver will
receive a request to remove this address.

The driver stores its current MAC address as part of the MAC/VLAN hash
array, since it is convenient and matches exactly how the hardware
expects to be told which traffic to receive.

This causes a problem, since for more devices, the MAC address is stored
separately, and requests to delete a unicast address should not have the
ability to remove the filter for the MAC address.

Fix this by forcing a check on every address sync to ensure we do not
remove the device address.

There is a very narrow possibility of a race between .set_mac and
.set_rx_mode, if we don't change netdev->dev_addr before updating our
internal MAC list in .set_mac. This might be possible if .set_rx_mode is
going to remove MAC "XYZ" from the list, at the same time as .set_mac
changes our dev_addr to MAC "XYZ", we might possibly queue a delete,
then an add in .set_mac, then queue a delete in .set_rx_mode's
dev_uc_sync and then update netdev->dev_addr. We can avoid this by
moving the copy into dev_addr prior to the changes to the MAC filter
list.

A similar race on the other side does not cause problems, as if we're
changing our MAC form A to B, and we race with .set_rx_mode, it could
queue a delete from A, we'd update our address, and allow the delete.
This seems like a race, but in reality we're about to queue a delete of
A anyways, so it would not cause any issues.

A race in the initialization code is unlikely because the netdevice has
not yet been fully initialized and the stack should not be adding or
removing addresses yet.

Note that we don't (yet) need similar code for the VF driver because it
does not make use of __dev_uc_sync and __dev_mc_sync, but instead roles
its own method for handling updates to the MAC/VLAN list, which already
has code to protect against removal of the hardware address.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

458867b2

i40e/i40evf: Account for frags split over multiple descriptors in check linearize · 248de22e

Alexander Duyck authored Dec 08, 2017

The original code for __i40e_chk_linearize didn't take into account the
fact that if a fragment is 16K in size or larger it has to be split over 2
descriptors and the smaller of those 2 descriptors will be on the trailing
edge of the transmit. As a result we can get into situations where we didn't
catch requests that could result in a Tx hang.

This patch takes care of that by subtracting the length of all but the
trailing edge of the stale fragment before we test for sum. By doing this
we can guarantee that we have all cases covered, including the case of a
fragment that spans multiple descriptors. We don't need to worry about
checking the inner portions of this since 12K is the maximum aligned DMA
size and that is larger than any MSS will ever be since the MTU limit for
jumbos is something on the order of 9K.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

248de22e

Merge branch 'fec-clean-up-in-the-cases-of-probe-error' · 5f0850e1

David S. Miller authored Jan 03, 2018

Fugang Duan says:

====================
net: fec: clean up in the cases of probe error

The simple patches just clean up in the cases of probe error like restore dev_id and
handle the defer probe when regulator is still not ready.

v2:
* Fabio Estevam's comment to suggest split v1 to separate patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5f0850e1

net: fec: defer probe if regulator is not ready · 3f38c683

Fugang Duan authored Jan 03, 2018

Defer probe if regulator is not ready. E.g. some regulator is fixed
regulator controlled by i2c expander gpio, the i2c device may be probed
after the driver, then it should handle the case of defer probe error.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f38c683

net: fec: restore dev_id in the cases of probe error · e90f686b

Fugang Duan authored Jan 03, 2018

The static variable dev_id always plus one before netdev registerred.
It should restore the dev_id value in the cases of probe error.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e90f686b

i40e: Remove UDP support for big buffer · 64e711ca

Amritha Nambiar authored Nov 17, 2017

Since UDP based filters are not supported via big buffer cloud
filters, remove UDP support.  Also change a few return types to
indicate unsupported vs invalid configuration.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

64e711ca

vxlan: trivial indenting fix. · f1c8d372

William Tu authored Jan 02, 2018

Fix indentation of reserved_flags2 field in vxlanhdr_gpe.

Fixes: e1e5314d ("vxlan: implement GPE")
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1c8d372

sctp: fix error path in sctp_stream_init · 79d08951

Marcelo Ricardo Leitner authored Jan 02, 2018

syzbot noticed a NULL pointer dereference panic in sctp_stream_free()
which was caused by an incomplete error handling in sctp_stream_init().
By not clearing stream->outcnt, it made a for() in sctp_stream_free()
think that it had elements to free, but not, leading to the panic.

As suggested by Xin Long, this patch also simplifies the error path by
moving it to the only if() that uses it.

See-also: https://www.spinics.net/lists/netdev/msg473756.html
See-also: https://www.spinics.net/lists/netdev/msg465024.htmlReported-by: syzbot <syzkaller@googlegroups.com>
Fixes: f952be79 ("sctp: introduce struct sctp_stream_out_ext")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79d08951

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · ba779198

David S. Miller authored Jan 03, 2018

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-01-02

This series contains fixes for e1000 and e1000e.

Tushar Dave adds a check to the driver so that it won't attempt to disable a
device that is already disabled for e1000.

Benjamin Poirier provides a fix to e1000e, where a previous commit that
Benjamin submitted changed the meaning of the return value for
"check_for_link" for copper media and not all the instances were properly
updated.  Benjamin fixes the remaining instances that needed the change.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ba779198