1. 24 Aug, 2021 16 commits
    • samples: bpf: Add basic infrastructure for XDP samples · 156f886c
      Kumar Kartikeya Dwivedi authored
      This file implements some common helpers to consolidate differences in
      features and functionality between the various XDP samples and give them
      a consistent look, feel, and reporting capabilities.
      
      This commit only adds support for receive statistics, which does not
      rely on any tracepoint, but on the XDP program installed on the device
      by each XDP redirect sample.
      
      Some of the key features are:
       * A concise output format accompanied by helpful text explaining its
         fields.
       * An elaborate output format building upon the concise one, and folding
         out details in case of errors and staying out of view otherwise.
       * Printing driver names for devices redirecting packets.
       * Getting mac address for interface.
       * Printing summarized total statistics for the entire session.
       * Ability to dynamically switch between concise and verbose mode, using
         SIGQUIT (Ctrl + \).
      
      In later patches, the support will be extended for each tracepoint with
      its own custom output in concise and verbose mode.
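      The mode switch on SIGQUIT can be pictured with a minimal sketch. It
      assumes a plain sigaction() handler that flips a flag; the names
      verbose_mode, install_mode_switch and is_verbose are hypothetical, and
      the sample itself may wire the signal up differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: nonzero means verbose output is enabled. */
static volatile sig_atomic_t verbose_mode;

/* Async-signal-safe handler: it only flips the flag. */
static void toggle_verbose(int signo)
{
	(void)signo;
	verbose_mode = !verbose_mode;
}

/* Install the handler for SIGQUIT (Ctrl + \); returns 0 on success. */
int install_mode_switch(void)
{
	struct sigaction sa = { 0 };

	sa.sa_handler = toggle_verbose;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGQUIT, &sa, NULL);
}

/* The reporting loop would poll this before printing each interval. */
bool is_verbose(void)
{
	return verbose_mode != 0;
}
```

      Keeping the handler down to a single flag update avoids calling
      non-reentrant functions such as printf() from signal context.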
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-4-memxor@gmail.com
    • tools: include: Add ethtool_drvinfo definition to UAPI header · f2e85d4a
      Kumar Kartikeya Dwivedi authored
      Instead of copying the whole header in, just add the struct definitions
      we need for now. In the future it can be synced as a copy of in-tree
      header if required.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-3-memxor@gmail.com
    • samples: bpf: Fix a couple of warnings · 50b796e6
      Kumar Kartikeya Dwivedi authored
      cookie_uid_helper_example.c: In function ‘main’:
      cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive
      	writing 10 bytes into a region of size between 8 and 58
      	[-Wformat-overflow=]
        178 |  sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
            |								       ^~~~~~~~~~
      /home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note:
      	‘sprintf’ output between 53 and 103 bytes into a destination of size 100
        178 |  sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
            |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        179 |         file);
            |         ~~~~~
      
      Fix by using snprintf and a sufficiently sized buffer.
      
      tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of
      	size 11 [-Wstringop-overread]
         35 |         key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
            |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Use size as 11.
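      Both fixes follow the same idea: let the compiler and the library
      bound the access instead of hard-coding a length. The sketch below is
      illustrative only; RULES_LEN, build_rule and clear_seq_size are
      hypothetical names, and the buffer size is an assumption rather than
      the value chosen in the patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical size: roomy enough for the fixed text plus a pinned path. */
#define RULES_LEN 256

/* Format the iptables command into out. Returns 0 on success and -1 if
 * the result was truncated; snprintf never writes past out_len bytes. */
int build_rule(char *out, size_t out_len, const char *file)
{
	int n = snprintf(out, out_len,
			 "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
			 file);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Clear-screen escape sequence: 10 visible bytes plus the trailing NUL. */
static const char clear_seq[] = "\033[1;1H\033[2J";

/* Size of the array holding the sequence, which is the 11-byte region
 * named in the warning; reading 12 bytes would overrun it. */
size_t clear_seq_size(void)
{
	return sizeof(clear_seq);
}
```

      Deriving the length from sizeof() keeps the byte count in step with
      the literal if the escape sequence is ever edited.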
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-2-memxor@gmail.com
    • bpf: Fix possible out of bound write in narrow load handling · d7af7e49
      Andrey Ignatov authored
      Fix a verifier bug found by smatch static checker in [0].
      
      This problem has never been seen in prod to the best of my knowledge. Fixing it
      still seems to be a good idea since it's hard to say for sure whether
      it's possible or not to have a scenario where a combination of
      convert_ctx_access() and a narrow load would lead to an out of bound
      write.
      
      When a narrow load is handled, one or two new instructions are added to
      the insn_buf array, but previously the only case rejected was

      	cnt >= ARRAY_SIZE(insn_buf)

      That check makes it safe to add a new instruction to insn_buf[cnt++]
      only once. A second append would be an out of bound write, and this is
      what can happen if `shift` is set.
      
      Fix it by making sure that if the BPF_RSH instruction has to be added in
      addition to BPF_AND then there is enough space for two more instructions
      in insn_buf.
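      The arithmetic behind the extra check can be sketched in isolation.
      This is a standalone illustration, not the verifier code:
      narrow_load_fits and INSN_BUF_LEN are hypothetical names, and the
      capacity is an arbitrary value picked for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define INSN_BUF_LEN 16	/* hypothetical capacity, for illustration only */

/* cnt is the number of slots already filled. Returns true when the
 * narrow-load fixup can be appended without leaving the buffer. */
bool narrow_load_fits(size_t cnt, size_t buf_len, unsigned int shift)
{
	if (cnt >= buf_len)
		return false;	/* pre-existing check: room for one append */
	if (shift && cnt + 1 >= buf_len)
		return false;	/* extra check: BPF_RSH and BPF_AND need two slots */
	return true;
}
```

      With a 16-entry buffer, cnt == 15 passes the first check and leaves
      one free slot, which is enough for BPF_AND alone but not for BPF_RSH
      followed by BPF_AND.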
      
      The full report [0] is below:
      
      kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
      kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
      
      kernel/bpf/verifier.c
          12282
          12283 			insn->off = off & ~(size_default - 1);
          12284 			insn->code = BPF_LDX | BPF_MEM | size_code;
          12285 		}
          12286
          12287 		target_size = 0;
          12288 		cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
          12289 					 &target_size);
          12290 		if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      Bounds check.
      
          12291 		    (ctx_field_size && !target_size)) {
          12292 			verbose(env, "bpf verifier is misconfigured\n");
          12293 			return -EINVAL;
          12294 		}
          12295
          12296 		if (is_narrower_load && size < target_size) {
          12297 			u8 shift = bpf_ctx_narrow_access_offset(
          12298 				off, size, size_default) * 8;
          12299 			if (ctx_field_size <= 4) {
          12300 				if (shift)
          12301 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
                                                               ^^^^^
      increment beyond end of array
      
          12302 									insn->dst_reg,
          12303 									shift);
      --> 12304 				insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
                                                       ^^^^^
      out of bounds write
      
          12305 								(1 << size * 8) - 1);
          12306 			} else {
          12307 				if (shift)
          12308 					insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
          12309 									insn->dst_reg,
          12310 									shift);
          12311 				insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
                                              ^^^^^^^^^^^^^^^
      Same.
      
          12312 								(1ULL << size * 8) - 1);
          12313 			}
          12314 		}
          12315
          12316 		new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
          12317 		if (!new_prog)
          12318 			return -ENOMEM;
          12319
          12320 		delta += cnt - 1;
          12321
          12322 		/* keep walking new program and skip insns we just inserted */
          12323 		env->prog = new_prog;
          12324 		insn      = new_prog->insnsi + i + delta;
          12325 	}
          12326
          12327 	return 0;
          12328 }
      
      [0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/
      
      v1->v2:
      - clarify that problem was only seen by static checker but not in prod;
      
      Fixes: 46f53a65 ("bpf: Allow narrow loads with offset > 0")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com
    • Merge branch 'bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG' · f63693e3
      Alexei Starovoitov authored
      Xu Liu says:
      
      ====================
      
      We'd like to be able to identify netns from sk_msg hooks
      to accelerate local process communication from different netns.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test for get_netns_cookie · 6cbca1ee
      Xu Liu authored
      Add test to use get_netns_cookie() from BPF_PROG_TYPE_SK_MSG.
      Signed-off-by: Xu Liu <liuxu623@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210820071712.52852-3-liuxu623@gmail.com
    • bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG · fab60e29
      Xu Liu authored
      We'd like to be able to identify netns from sk_msg hooks
      to accelerate local process communication from different netns.
      Signed-off-by: Xu Liu <liuxu623@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210820071712.52852-2-liuxu623@gmail.com
    • Merge branch 'selftests/bpf: minor fixups' · 8c0bb89e
      Alexei Starovoitov authored
      Li Zhijian says:
      
      ====================
      Fix a few issues reported by 0Day/LKP while running selftests/bpf.
      
      Changelog:
      V2:
      - folded previous similar standalone patch to [1/5], and add acked tag
        from Song Liu
      - add acked tag to [2/5], [3/5] from Song Liu
      - [4/5]: move test_bpftool.py to TEST_PROGS_EXTENDED, files in TEST_GEN_PROGS_EXTENDED
      are generated by make. Otherwise, it will break out-of-tree install:
      'make O=/kselftest-build SKIP_TARGETS= V=1 -C tools/testing/selftests install INSTALL_PATH=/kselftest-install'
      - [5/5]: new patch
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Exit with KSFT_SKIP if no Makefile found · 00e11160
      Li Zhijian authored
      This would happen when we run the tests after installing kselftests:
       root@lkp-skl-d01 ~# /kselftests/run_kselftest.sh -t bpf:test_doc_build.sh
       TAP version 13
       1..1
       # selftests: bpf: test_doc_build.sh
       perl: warning: Setting locale failed.
       perl: warning: Please check that your locale settings:
               LANGUAGE = (unset),
               LC_ALL = (unset),
               LC_ADDRESS = "en_US.UTF-8",
               LC_NAME = "en_US.UTF-8",
               LC_MONETARY = "en_US.UTF-8",
               LC_PAPER = "en_US.UTF-8",
               LC_IDENTIFICATION = "en_US.UTF-8",
               LC_TELEPHONE = "en_US.UTF-8",
               LC_MEASUREMENT = "en_US.UTF-8",
               LC_TIME = "en_US.UTF-8",
               LC_NUMERIC = "en_US.UTF-8",
               LANG = "en_US.UTF-8"
           are supported and installed on your system.
       perl: warning: Falling back to the standard locale ("C").
       # skip:    bpftool files not found!
       #
       ok 1 selftests: bpf: test_doc_build.sh # SKIP
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210820025549.28325-1-lizhijian@cn.fujitsu.com
    • selftests/bpf: Add missing files required by test_bpftool.sh for installing · 404bd9ff
      Li Zhijian authored
      test_bpftool.sh relies on bpftool and test_bpftool.py.
      
      'make install' will install bpftool to INSTALL_PATH/bpf/bpftool, and
      export it to PATH so that it can be used after installing.
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210820015556.23276-5-lizhijian@cn.fujitsu.com
    • selftests/bpf: Add default bpftool built by selftests to PATH · 7a3bdca2
      Li Zhijian authored
      For 'make run_tests':
      selftests will build bpftool into tools/testing/selftests/bpf/tools/sbin/bpftool
      by default.
      
      ==================
      root@lkp-skl-d01 /opt/rootfs/v5.14-rc4# make -C tools/testing/selftests/bpf run_tests
      make: Entering directory '/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf'
        MKDIR    include
        MKDIR    libbpf
        MKDIR    bpftool
      [...]
        GEN     /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/profiler.skel.h
        CC      /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/prog.o
        GEN     /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pid_iter.skel.h
        CC      /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pids.o
        LINK    /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/bpftool
        INSTALL bpftool
        GEN      vmlinux.h
      [...]
       # test_feature_dev_json (test_bpftool.TestBpftool) ... ERROR
       # test_feature_kernel (test_bpftool.TestBpftool) ... ERROR
       # test_feature_kernel_full (test_bpftool.TestBpftool) ... ERROR
       # test_feature_kernel_full_vs_not_full (test_bpftool.TestBpftool) ... ERROR
       # test_feature_macros (test_bpftool.TestBpftool) ... Error: bug: failed to retrieve CAP_BPF status: Invalid argument
       # ERROR
       #
       # ======================================================================
       # ERROR: test_feature_dev_json (test_bpftool.TestBpftool)
       # ----------------------------------------------------------------------
       # Traceback (most recent call last):
       #   File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 57, in wrapper
       #     return f(*args, iface, **kwargs)
       #   File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 82, in test_feature_dev_json
       #     res = bpftool_json(["feature", "probe", "dev", iface])
       #   File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 42, in bpftool_json
       #     res = _bpftool(args)
       #   File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 34, in _bpftool
       #     return subprocess.check_output(_args)
       #   File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
       #     **kwargs).stdout
       #   File "/usr/lib/python3.7/subprocess.py", line 487, in run
       #     output=stdout, stderr=stderr)
       # subprocess.CalledProcessError: Command '['bpftool', '-j', 'feature', 'probe', 'dev', 'dummy0']' returned non-zero exit status 255.
       #
      ==================
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210820015556.23276-4-lizhijian@cn.fujitsu.com
    • selftests/bpf: Make test_doc_build.sh work from script directory · 5a980b5b
      Li Zhijian authored
      Previously, it failed as below:
      -------------
      root@lkp-skl-d01 /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf# ./test_doc_build.sh
      ++ realpath --relative-to=/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf ./test_doc_build.sh
      + SCRIPT_REL_PATH=test_doc_build.sh
      ++ dirname test_doc_build.sh
      + SCRIPT_REL_DIR=.
      ++ realpath /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/./../../../../
      + KDIR_ROOT_DIR=/opt/rootfs/v5.14-rc4
      + cd /opt/rootfs/v5.14-rc4
      + for tgt in docs docs-clean
      + make -s -C /opt/rootfs/v5.14-rc4/. docs
      make: *** No rule to make target 'docs'.  Stop.
      + for tgt in docs docs-clean
      + make -s -C /opt/rootfs/v5.14-rc4/. docs-clean
      make: *** No rule to make target 'docs-clean'.  Stop.
      -----------
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210820015556.23276-3-lizhijian@cn.fujitsu.com
    • selftests/bpf: Enlarge select() timeout for test_maps · 2d82d73d
      Li Zhijian authored
      The 0Day robot observed that the test easily times out on a heavily loaded host.
      -------------------
       # selftests: bpf: test_maps
       # Fork 1024 tasks to 'test_update_delete'
       # Fork 1024 tasks to 'test_update_delete'
       # Fork 100 tasks to 'test_hashmap'
       # Fork 100 tasks to 'test_hashmap_percpu'
       # Fork 100 tasks to 'test_hashmap_sizes'
       # Fork 100 tasks to 'test_hashmap_walk'
       # Fork 100 tasks to 'test_arraymap'
       # Fork 100 tasks to 'test_arraymap_percpu'
       # Failed sockmap unexpected timeout
       not ok 3 selftests: bpf: test_maps # exit=1
       # selftests: bpf: test_lru_map
       # nr_cpus:8
      -------------------
      Since this test will be scheduled by 0Day to a random host that could have
      only a few CPUs (2-8), enlarge the timeout to avoid a false NG report.

      In practice, I tried pinning it to a single CPU with 'taskset 0x01 ./test_maps'
      and found that 10s is likely enough, but I still prefer a larger value of 30.
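      For reference, a select() call with a configurable timeout looks
      roughly like the sketch below. wait_readable is a hypothetical helper
      written for this illustration; it is not taken from test_maps.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_sec)
{
	struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
	fd_set rfds;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

      A call such as wait_readable(fd, 30) returns as soon as data arrives,
      so a generous timeout only costs time when the peer really is stuck.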
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210820015556.23276-2-lizhijian@cn.fujitsu.com
    • selftests/bpf: Reduce flakyness in timer_mim · a6258837
      Yucong Sun authored
      This patch extends wait time in timer_mim. As observed in slow CI environment,
      it is possible to have interrupt/preemption long enough to cause the test to
      fail, almost 1 failure in 5 runs.
      Signed-off-by: Yucong Sun <fallentree@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210823213629.3519641-1-fallentree@fb.com
    • Merge branch 'Refactor cgroup_bpf internals to use more specific attach_type' · 4ed589a2
      Alexei Starovoitov authored
      Dave Marchevsky says:
      
      ====================
      
      The cgroup_bpf struct has a few arrays (effective, progs, and flags) of
      size MAX_BPF_ATTACH_TYPE. These are meant to separate progs by their
      attach type, currently represented by the bpf_attach_type enum.
      
      There are some bpf_attach_type values which are not valid attach types
      for cgroup bpf programs. Programs with these attach types will never be
      handled by cgroup_bpf_{attach,detach} and thus will never be held in
      cgroup_bpf structs. Even if such programs did make it into their
      reserved slot in those arrays, they would never be executed.
      
      Accordingly we can migrate to a new internal cgroup_bpf-specific enum
      for these arrays, saving some bytes per cgroup and making it more
      obvious which BPF programs belong there. netns_bpf_attach_type is an
      existing example of this pattern, let's do similar for cgroup_bpf.
      
      v1->v2: Address Daniel's comments
      	* Reverse xmas tree ordering for def changes
      	* Helper macro to reduce to_cgroup_bpf_attach_type boilerplate
      		* checkpatch.pl complains: "ERROR: Macros with complex values should
      		be enclosed in parentheses". Found some existing macros (do 'git grep
      		"define case"') which get same complaint. Think it's fine to keep
      		as-is since it's immediately undef'd.
      	* Remove CG_BPF_ prefix from cgroup_bpf_attach_type
      		* Although I agree that the prefix is redundant, the de-prefixed
      		names feel a bit too 'general' given the internal use of the enum.
      		e.g. when someone sees CGROUP_INET6_BIND it's not obvious that it
      		should only be used in certain ways internally.
      		* Don't feel strongly about this, just my thoughts as a noob to the
      		internals.
      	* Rebase onto latest bpf-next/master
      		* No significant conflicts, some small boilerplate adjustments
      		needed to catch up to Andrii's "bpf: Refactor BPF_PROG_RUN_ARRAY
      		family of macros into functions" change
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum · 6fc88c35
      Dave Marchevsky authored
      Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf
      attach types and a function to map bpf_attach_type values to the new
      enum. Inspired by netns_bpf_attach_type.
      
      Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever
      possible.  Functionality is unchanged as attach_type_to_prog_type
      switches in bpf/syscall.c were preventing non-cgroup programs from
      making use of the invalid cgroup_bpf array slots.
      
      As a result struct cgroup_bpf uses 504 fewer bytes relative to when its
      arrays were sized using MAX_BPF_ATTACH_TYPE.
      
      bpf_cgroup_storage is notably not migrated as struct
      bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type
      member which is not meant to be opaque. Similarly, bpf_cgroup_link
      continues to report its bpf_attach_type member to userspace via fdinfo
      and bpf_link_info.
      
      To ease disambiguation, bpf_attach_type variables are renamed from
      'type' to 'atype' when changed to cgroup_bpf_attach_type.
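      The pattern can be illustrated with a trimmed-down sketch. Every
      enumerator and type name below is a hypothetical stand-in chosen for
      the example; the real enums are much larger and the byte savings
      quoted above come from the kernel structs, not from this code.

```c
/* Hypothetical wide enum: mixes cgroup and non-cgroup attach types. */
enum wide_attach_type {
	WIDE_CGROUP_INET_INGRESS,
	WIDE_SK_SKB_STREAM_PARSER,	/* not a cgroup attach type */
	WIDE_CGROUP_INET_EGRESS,
	WIDE_LIRC_MODE2,		/* not a cgroup attach type */
	WIDE_CGROUP_INET6_BIND,
	MAX_WIDE_ATTACH_TYPE
};

/* Hypothetical compact enum: only the values valid for cgroups. */
enum compact_cgroup_attach_type {
	COMPACT_ATTACH_TYPE_INVALID = -1,
	COMPACT_INET_INGRESS,
	COMPACT_INET_EGRESS,
	COMPACT_INET6_BIND,
	MAX_COMPACT_ATTACH_TYPE
};

/* Map a wide value to its compact slot, or to INVALID if it has none. */
enum compact_cgroup_attach_type to_compact(enum wide_attach_type type)
{
	switch (type) {
#define CASE(name) case WIDE_CGROUP_##name: return COMPACT_##name
	CASE(INET_INGRESS);
	CASE(INET_EGRESS);
	CASE(INET6_BIND);
#undef CASE
	default:
		return COMPACT_ATTACH_TYPE_INVALID;
	}
}

/* Per-type arrays sized by each enum, to compare footprints. */
struct wide_slots    { void *progs[MAX_WIDE_ATTACH_TYPE]; };
struct compact_slots { void *progs[MAX_COMPACT_ATTACH_TYPE]; };
```

      Sizing the arrays by the compact enum drops the slots that could
      never be populated, which is where the per-cgroup saving comes from.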
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com
  2. 23 Aug, 2021 1 commit
    • af_unix: Fix NULL pointer bug in unix_shutdown · d359902d
      Jiang Wang authored
      Commit 94531cfc ("af_unix: Add unix_stream_proto for sockmap")
      introduced a bug for the af_unix SEQPACKET type. In unix_shutdown, the
      unhash function calls prot->unhash(), which is NULL for SEQPACKET, so
      the kernel panics. On ARM32 it shows the messages below (it likely
      affects x86 too).

      Fix the bug by first checking whether prot->unhash is NULL.
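
      The guard amounts to testing an optional callback before calling it.
      The sketch below uses hypothetical, minimal stand-ins
      (proto_ops_sketch, sock_sketch, shutdown_unhash) rather than the
      kernel's own structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for a protocol's operations table. */
struct proto_ops_sketch {
	void (*unhash)(void *sk);	/* may be NULL for some socket types */
};

/* Hypothetical stand-in for a socket. */
struct sock_sketch {
	const struct proto_ops_sketch *prot;
	int unhashed;
};

static void stream_unhash(void *sk)
{
	((struct sock_sketch *)sk)->unhashed = 1;
}

/* One protocol provides the callback, the other leaves it NULL. */
const struct proto_ops_sketch stream_proto = { .unhash = stream_unhash };
const struct proto_ops_sketch seqpacket_proto = { .unhash = NULL };

/* Call the optional callback only when the protocol provides one. */
void shutdown_unhash(struct sock_sketch *sk)
{
	if (sk->prot->unhash)
		sk->prot->unhash(sk);
}
```

      Without the check, the second protocol would jump to address 0, which
      matches the "PC is at 0x0" line in the log.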
      
      Kernel log:
      <--- cut here ---
       Unable to handle kernel NULL pointer dereference at virtual address
      00000000
       pgd = 2fba1ffb
       *pgd=00000000
       Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2
       Modules linked in:
       CPU: 1 PID: 1999 Comm: falkon Tainted: G        W
      5.14.0-rc5-01175-g94531cfc-dirty #9240
       Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
       PC is at 0x0
       LR is at unix_shutdown+0x81/0x1a8
       pc : [<00000000>]    lr : [<c08f3311>]    psr: 600f0013
       sp : e45aff70  ip : e463a3c0  fp : beb54f04
       r10: 00000125  r9 : e45ae000  r8 : c4a56664
       r7 : 00000001  r6 : c4a56464  r5 : 00000001  r4 : c4a56400
       r3 : 00000000  r2 : c5a6b180  r1 : 00000000  r0 : c4a56400
       Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
       Control: 50c5387d  Table: 05aa804a  DAC: 00000051
       Register r0 information: slab PING start c4a56400 pointer offset 0
       Register r1 information: NULL pointer
       Register r2 information: slab task_struct start c5a6b180 pointer offset 0
       Register r3 information: NULL pointer
       Register r4 information: slab PING start c4a56400 pointer offset 0
       Register r5 information: non-paged memory
       Register r6 information: slab PING start c4a56400 pointer offset 100
       Register r7 information: non-paged memory
       Register r8 information: slab PING start c4a56400 pointer offset 612
       Register r9 information: non-slab/vmalloc memory
       Register r10 information: non-paged memory
       Register r11 information: non-paged memory
       Register r12 information: slab filp start e463a3c0 pointer offset 0
       Process falkon (pid: 1999, stack limit = 0x9ec48895)
       Stack: (0xe45aff70 to 0xe45b0000)
       ff60:                                     e45ae000 c5f26a00 00000000 00000125
       ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4
       ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc
       ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04
       ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000
       [<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50)
       [<c07f7fa3>] (__sys_shutdown) from [<c010024b>]
      (__sys_trace_return+0x1/0x16)
       Exception stack(0xe45affa8 to 0xe45afff0)
      
      Fixes: 94531cfc ("af_unix: Add unix_stream_proto for sockmap")
      Reported-by: Dmitry Osipenko <digetx@gmail.com>
      Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Dmitry Osipenko <digetx@gmail.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com
  3. 19 Aug, 2021 7 commits
    • selftests/bpf: Add tests for {set|get} socket option from setsockopt BPF · f2a6ee92
      Prankur Gupta authored
      Adding selftests for the newly added functionality to call bpf_setsockopt()
      and bpf_getsockopt() from setsockopt BPF programs.
      
      Test Details:
      
      1. BPF Program
      
         Checks for changes in IPV6_TCLASS(SOL_IPV6) via setsockopt
         If the cca for the socket is not cubic do nothing
         If the newly set value for IPV6_TCLASS is 45 (0x2d) (as per our use-case)
         then change the cc from cubic to reno
      
      2. User Space Program
      
         Creates an AF_INET6 socket and sets the cca for it to "cubic"
         Attaches the program and sets IPV6_TCLASS to 0x2d using setsockopt
         Verifies that the cca for the socket changed to reno
      Signed-off-by: Prankur Gupta <prankgup@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210817224221.3257826-3-prankgup@fb.com
    • bpf: Add support for {set|get} socket options from setsockopt BPF · 2c531639
      Prankur Gupta authored
      Add logic to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF
      programs. An example use case is when the user sets the IPV6_TCLASS socket
      option, we would also like to change the tcp-cc for that socket.
      
      We don't have any use case for calling bpf_setsockopt() from supposedly read-
      only sys_getsockopt(), so it is made available to BPF_CGROUP_SETSOCKOPT only
      at this point.
      Signed-off-by: Prankur Gupta <prankgup@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210817224221.3257826-2-prankgup@fb.com
    • bpf: Use kvmalloc for map keys in syscalls · 44779a4b
      Stanislav Fomichev authored
      Same as previous patch but for the keys. memdup_bpfptr is renamed
      to kvmemdup_bpfptr (and converted to kvmalloc).
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210818235216.1159202-2-sdf@google.com
    • bpf: Use kvmalloc for map values in syscall · f0dce1d9
      Stanislav Fomichev authored
      Use kvmalloc/kvfree for temporary value when manipulating a map via
      syscall. kmalloc might not be sufficient for percpu maps where the value
      is big (and further multiplied by hundreds of CPUs).
      
      Can be reproduced with netcnt test on qemu with "-smp 255".
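      To see why the temporary buffer grows so quickly, here is a small
      sketch of the size arithmetic. It assumes each per-CPU slot is the
      value size rounded up to a multiple of 8 bytes; percpu_value_buf_size
      is a hypothetical helper and the figures in the note after it are
      made-up inputs, not measurements from the netcnt test.

```c
#include <stddef.h>

/* Bytes needed to hold one value for every CPU, assuming each per-CPU
 * slot is value_size rounded up to the next multiple of 8. */
size_t percpu_value_buf_size(size_t value_size, size_t nr_cpus)
{
	size_t slot = (value_size + 7) & ~(size_t)7;

	return slot * nr_cpus;
}
```

      With an 8192-byte value and 255 CPUs this comes to 2088960 bytes, a
      large physically contiguous request that kmalloc may be unable to
      satisfy, whereas kvmalloc can fall back to vmalloc.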
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210818235216.1159202-1-sdf@google.com
    • selftests/bpf: Adding delay in socketmap_listen to reduce flakyness · 3666b167
      Yucong Sun authored
      This patch adds a 1ms delay to reduce flakyness of the test.
      Signed-off-by: Yucong Sun <fallentree@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210819163609.2583758-1-fallentree@fb.com
    • bpf: Fix NULL event->prog pointer access in bpf_overflow_handler · 594286b7
      Yonghong Song authored
      Andrii reported that libbpf CI hit the following oops when
      running selftest send_signal:
        [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030
        [ 1243.161066] #PF: supervisor read access in kernel mode
        [ 1243.161066] #PF: error_code(0x0000) - not-present page
        [ 1243.161066] PGD 0 P4D 0
        [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI
        [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G           O      5.14.0-rc5 #1
        [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
        [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0
        [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f
        [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046
        [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8
        [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80
        [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000
        [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000
        [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082
        [ 1243.161066] FS:  00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
        [ 1243.161066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0
        [ 1243.161066] Call Trace:
        [ 1243.161066]  <IRQ>
        [ 1243.161066]  __perf_event_overflow+0x4f/0xf0
        [ 1243.161066]  perf_swevent_hrtimer+0x116/0x130
        [ 1243.161066]  ? __lock_acquire+0x378/0x2730
        [ 1243.161066]  ? __lock_acquire+0x372/0x2730
        [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
        [ 1243.161066]  ? find_held_lock+0x2b/0x80
        [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
        [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
        [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
        [ 1243.161066]  __hrtimer_run_queues+0x1a3/0x460
        [ 1243.161066]  hrtimer_interrupt+0x110/0x220
        [ 1243.161066]  __sysvec_apic_timer_interrupt+0x8a/0x260
        [ 1243.161066]  sysvec_apic_timer_interrupt+0x89/0xc0
        [ 1243.161066]  </IRQ>
        [ 1243.161066]  asm_sysvec_apic_timer_interrupt+0x12/0x20
        [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250
        [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0
        [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282
        [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000
        [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e
        [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001
        [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80
        [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000
        [ 1243.161066]  ? finish_task_switch+0xab/0x250
        [ 1243.161066]  ? finish_task_switch+0x70/0x250
        [ 1243.161066]  __schedule+0x36b/0xbb0
        [ 1243.161066]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
        [ 1243.161066]  ? lockdep_hardirqs_on+0x79/0x100
        [ 1243.161066]  schedule+0x43/0xe0
        [ 1243.161066]  pipe_read+0x30b/0x450
        [ 1243.161066]  ? wait_woken+0x80/0x80
        [ 1243.161066]  new_sync_read+0x164/0x170
        [ 1243.161066]  vfs_read+0x122/0x1b0
        [ 1243.161066]  ksys_read+0x93/0xd0
        [ 1243.161066]  do_syscall_64+0x35/0x80
        [ 1243.161066]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The oops can also be reproduced with the following steps:
        ./vmtest.sh -s
        # at qemu shell
        cd /root/bpf && while true; do ./test_progs -t send_signal; done
      
      Further analysis showed that the failure is introduced with
      commit b89fbfbb ("bpf: Implement minimal BPF perf link").
      With the above commit, the following scenario becomes possible:
          cpu1                        cpu2
                                      hrtimer_interrupt -> bpf_overflow_handler
          (due to closing link_fd)
          bpf_perf_link_release ->
          perf_event_free_bpf_prog ->
          perf_event_free_bpf_handler ->
            WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler)
            event->prog = NULL
                                      bpf_prog_run(event->prog, &ctx)
      
      In the above case, event->prog is already NULL when it is passed to
      bpf_prog_run, which causes the oops.
      
      To fix the issue, check whether event->prog is NULL and, if it is,
      do not call bpf_prog_run. This appears to work: the reproducer
      above ran for more than an hour without any failures.
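
      The check described above can be sketched as a small user-space model.
      This is not the kernel patch itself: struct prog, struct event,
      sample_run, release_prog and overflow_handler are hypothetical stand-ins,
      and a C11 atomic load stands in for the kernel's READ_ONCE. The point is
      only that the handler reads the pointer once and skips the call when it
      is NULL. Program lifetime, which the kernel has to guarantee by other
      means, is not modelled here.

```c
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a BPF program: just a callable entry point. */
struct prog {
	int (*run)(void *ctx);
};

/* Hypothetical stand-in for a perf event with an attached program. */
struct event {
	_Atomic(struct prog *) prog;
};

/* Example program body used for illustration; always returns 1. */
static int sample_run(void *ctx)
{
	(void)ctx;
	return 1;
}

/* Models the release path: the program pointer is cleared and may be
 * observed as NULL by a handler running concurrently on another CPU. */
static void release_prog(struct event *event)
{
	atomic_store_explicit(&event->prog, NULL, memory_order_relaxed);
}

/* Models the overflow handler: load the pointer exactly once into a
 * local, then run the program only if that local is non-NULL. */
static int overflow_handler(struct event *event, void *ctx)
{
	struct prog *prog;
	int ret = 0;

	prog = atomic_load_explicit(&event->prog, memory_order_relaxed);
	if (prog)
		ret = prog->run(ctx);

	return ret;
}
```

      Loading into a local matters: testing event->prog and then reading it
      again for the call would leave a window in which the second read could
      still return NULL.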
      
      Fixes: b89fbfbb ("bpf: Implement minimal BPF perf link")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210819155209.1927994-1-yhs@fb.com
    • bpf: Undo off-by-one in interpreter tail call count limit · f9dabe01
      Daniel Borkmann authored
      The BPF interpreter as well as x86-64 BPF JIT were both in line by allowing
      up to 33 tail calls (however odd that number may be!). Recently, this was
      changed for the interpreter to reduce it down to 32 with the assumption that
      this should have been the actual limit "which is in line with the behavior of
      the x86 JITs" according to b61a28cf ("bpf: Fix off-by-one in tail call
      count limiting").
      
      Paul recently reported:
      
        I'm a bit surprised by this because I had previously tested the tail call
        limit of several JIT compilers and found it to be 33 (i.e., allowing chains
        of up to 34 programs). I've just extended a test program I had to validate
        this again on the x86-64 JIT, and found a limit of 33 tail calls again [1].
      
        Also note we had previously changed the RISC-V and MIPS JITs to allow up to
        33 tail calls [2, 3], for consistency with other JITs and with the interpreter.
        We had decided to increase these two to 33 rather than decrease the other
        JITs to 32 for backward compatibility, though that probably doesn't matter
        much as I'd expect few people to actually use 33 tail calls.
      
        [1] https://github.com/pchaigno/tail-call-bench/commit/ae7887482985b4b1745c9b2ef7ff9ae506c82886
        [2] 96bc4432 ("bpf, riscv: Limit to 33 tail calls")
        [3] e49e6f6d ("bpf, mips: Limit to 33 tail calls")
      
      Therefore, revert b61a28cf to re-align the interpreter to a maximum of
      33 tail calls. While the vast majority of programs are unlikely to hit the
      limit, programs in the wild could one way or another depend on it, so let's
      rather be a bit more conservative and align the small remainder of JITs to 33.
      If needed in the future, this limit could be slightly increased, but not decreased.
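
      The difference between the two limits comes down to the comparison used
      before the counter is incremented. The user-space sketch below counts how
      many tail calls succeed under each form of the check. It assumes a limit
      constant of 32, as the kernel defined it at the time; count_tail_calls is
      a hypothetical helper, not interpreter code.

```c
#include <stdio.h>

/* Assumed value of the kernel's limit constant at the time. */
#define MAX_TAIL_CALL_CNT 32

/* Count how many tail calls are allowed in an endless chain.
 * strict == 0: reject when the counter is >  the limit (restored behavior).
 * strict == 1: reject when the counter is >= the limit (reverted behavior). */
static unsigned int count_tail_calls(int strict)
{
	unsigned int tail_call_cnt = 0;
	unsigned int performed = 0;

	for (;;) {
		int over = strict ? tail_call_cnt >= MAX_TAIL_CALL_CNT
				  : tail_call_cnt > MAX_TAIL_CALL_CNT;
		if (over)
			break;

		/* The counter is bumped only after the check passes. */
		tail_call_cnt++;
		performed++;
	}

	return performed;
}

/* Print both results; a chain has one more program than tail calls. */
static void print_limits(void)
{
	printf("'>'  check: %u tail calls, chain of %u programs\n",
	       count_tail_calls(0), count_tail_calls(0) + 1);
	printf("'>=' check: %u tail calls, chain of %u programs\n",
	       count_tail_calls(1), count_tail_calls(1) + 1);
}
```

      With the '>' form the counter values 0 through 32 all pass, giving 33
      tail calls and a chain of 34 programs, which matches the numbers Paul
      reported. The '>=' form stops one call earlier, at 32.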
      
      Fixes: b61a28cf ("bpf: Fix off-by-one in tail call count limiting")
      Reported-by: Paul Chaignon <paul@cilium.io>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/CAO5pjwTWrC0_dzTbTHFPSqDwA56aVH+4KFGVqdq8=ASs0MqZGQ@mail.gmail.com
  4. 18 Aug, 2021 3 commits
  5. 17 Aug, 2021 13 commits