1. 18 May, 2021 17 commits
  2. 17 May, 2021 3 commits
    • Kumar Kartikeya Dwivedi's avatar
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Add low level TC-BPF management API · 715c5ce4
      Kumar Kartikeya Dwivedi authored
      This adds functions that wrap the netlink API used for adding, manipulating,
      and removing traffic control filters.
      
      The API summary:
      
      A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
      This means that creating a hook leads to creation of the backing qdisc,
      while destruction either removes all filters attached to a hook, or destroys
      qdisc if requested explicitly (as discussed below).
      
      The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
      query, and detach tc filters. All functions return 0 on success, and a
      negative error code on failure.
      
      bpf_tc_hook_create - Create a hook
      Parameters:
      	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
      		proper enum constant. Note that parent must be unset when
      		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
      		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
      		valid value for attach_point.
      
      		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
      
      bpf_tc_hook_destroy - Destroy a hook
      Parameters:
      	@hook - Cannot be NULL. The behaviour depends on value of
      		attach_point. If BPF_TC_INGRESS, all filters attached to
      		the ingress hook will be detached. If BPF_TC_EGRESS, all
      		filters attached to the egress hook will be detached. If
      		BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
      		deleted, also detaching all filters. As before, parent must
      		be unset for these attach_points, and set for BPF_TC_CUSTOM.
      
      		It is advised that if the qdisc is operated on by many programs,
      		then the program at least check that there are no other existing
      		filters before deleting the clsact qdisc. An example is shown
      		below:
      
      		DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
      				    .attach_point = BPF_TC_INGRESS);
      		/* set opts as NULL, as we're not really interested in
      		 * getting any info for a particular filter, but just
      	 	 * detecting its presence.
      		 */
      		r = bpf_tc_query(&hook, NULL);
      		if (r == -ENOENT) {
      			/* no filters */
      			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
      			return bpf_tc_hook_destroy(&hook);
      		} else {
      			/* failed or r == 0, the latter means filters do exist */
      			return r;
      		}
      
      		Note that there is a small race between checking for no
      		filters and deleting the qdisc. This is currently unavoidable.
      
      		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
      
      bpf_tc_attach - Attach a filter to a hook
      Parameters:
      	@hook - Cannot be NULL. Represents the hook the filter will be
      		attached to. Requirements for ifindex and attach_point are
      		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
      		is also supported.  In that case, parent must be set to the
      		handle where the filter will be attached (using BPF_TC_PARENT).
      		E.g. to set parent to 1:16 like in tc command line, the
      		equivalent would be BPF_TC_PARENT(1, 16).
      
      	@opts - Cannot be NULL. The following opts are optional:
      		* handle   - The handle of the filter
      		* priority - The priority of the filter
      			     Must be >= 0 and <= UINT16_MAX
      		Note that when left unset, they will be auto-allocated by
      		the kernel. The following opts must be set:
      		* prog_fd - The fd of the loaded SCHED_CLS prog
      		The following opts must be unset:
      		* prog_id - The ID of the BPF prog
      		The following opts are optional:
      		* flags - Currently only BPF_TC_F_REPLACE is allowed. It
      			  allows replacing an existing filter instead of
      			  failing with -EEXIST.
      		The following opts will be filled by bpf_tc_attach on a
      		successful attach operation if they are unset:
      		* handle   - The handle of the attached filter
      		* priority - The priority of the attached filter
      		* prog_id  - The ID of the attached SCHED_CLS prog
      		This way, the user can know what the auto allocated values
      		for optional opts like handle and priority are for the newly
      		attached filter, if they were unset.
      
      		Note that some other attributes are set to fixed default
      		values listed below (this holds for all bpf_tc_* APIs):
      		protocol as ETH_P_ALL, direct action mode, chain index of 0,
      		and class ID of 0 (this can be set by writing to the
      		skb->tc_classid field from the BPF program).
      
      bpf_tc_detach
      Parameters:
      	@hook - Cannot be NULL. Represents the hook the filter will be
      		detached from. Requirements are same as described above
      		in bpf_tc_attach.
      
      	@opts - Cannot be NULL. The following opts must be set:
      		* handle, priority
      		The following opts must be unset:
      		* prog_fd, prog_id, flags
      
      bpf_tc_query
      Parameters:
      	@hook - Cannot be NULL. Represents the hook where the filter lookup will
      		be performed. Requirements are same as described above in
      		bpf_tc_attach().
      
      	@opts - Cannot be NULL. The following opts must be set:
      		* handle, priority
      		The following opts must be unset:
      		* prog_fd, prog_id, flags
      		The following fields will be filled by bpf_tc_query upon a
      		successful lookup:
      		* prog_id
      
      Some usage examples (using BPF skeleton infrastructure):
      
      BPF program (test_tc_bpf.c):
      
      	#include <linux/bpf.h>
      	#include <bpf/bpf_helpers.h>
      
      	SEC("classifier")
      	int cls(struct __sk_buff *skb)
      	{
      		return 0;
      	}
      
      Userspace loader:
      
      	struct test_tc_bpf *skel = NULL;
      	int fd, r;
      
      	skel = test_tc_bpf__open_and_load();
      	if (!skel)
      		return -ENOMEM;
      
      	fd = bpf_program__fd(skel->progs.cls);
      
      	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
      			    if_nametoindex("lo"), .attach_point =
      			    BPF_TC_INGRESS);
      	/* Create clsact qdisc */
      	r = bpf_tc_hook_create(&hook);
      	if (r < 0)
      		goto end;
      
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
      	r = bpf_tc_attach(&hook, &opts);
      	if (r < 0)
      		goto end;
      	/* Print the auto allocated handle and priority */
      	printf("Handle=%u", opts.handle);
      	printf("Priority=%u", opts.priority);
      
      	opts.prog_fd = opts.prog_id = 0;
      	bpf_tc_detach(&hook, &opts);
      end:
      	test_tc_bpf__destroy(skel);
      
      This is equivalent to doing the following using tc command line:
        # tc qdisc add dev lo clsact
        # tc filter add dev lo ingress bpf obj foo.o sec classifier da
        # tc filter del dev lo ingress handle <h> prio <p> bpf
      ... where the handle and priority can be found using:
        # tc filter show dev lo ingress
      
      Another example replacing a filter (extending prior example):
      
      	/* We can also choose both (or one), let's try replacing an
      	 * existing filter.
      	 */
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
      			    opts.handle, .priority = opts.priority,
      			    .prog_fd = fd);
      	r = bpf_tc_attach(&hook, &replace_opts);
      	if (r == -EEXIST) {
      		/* Expected, now use BPF_TC_F_REPLACE to replace it */
      		replace_opts.flags = BPF_TC_F_REPLACE;
      		return bpf_tc_attach(&hook, &replace_opts);
      	} else if (r < 0) {
      		return r;
      	}
      	/* There must be no existing filter with these
      	 * attributes, so cleanup and return an error.
      	 */
      	replace_opts.prog_fd = replace_opts.prog_id = 0;
      	bpf_tc_detach(&hook, &replace_opts);
      	return -1;
      
      To obtain info of a particular filter:
      
      	/* Find info for filter with handle 1 and priority 50 */
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
      			    .priority = 50);
      	r = bpf_tc_query(&hook, &info_opts);
      	if (r == -ENOENT)
      		printf("Filter not found");
      	else if (r < 0)
      		return r;
      	printf("Prog ID: %u", info_opts.prog_id);
      	return 0;
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
      [ Daniel: also did major patch cleanup ]
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
      715c5ce4
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Add various netlink helpers · 8bbb77b7
      Kumar Kartikeya Dwivedi authored
      This change introduces a few helpers to wrap open coded attribute
      preparation in netlink.c. It also adds a libbpf_netlink_send_recv() that
      is useful to wrap send + recv handling in a generic way. Subsequent patch
      will also use this function for sending and receiving a netlink response.
      The libbpf_nl_get_link() helper has been removed instead, moving socket
      creation into the newly named libbpf_netlink_send_recv().
      
      Every nested attribute's closure must happen using the helper
      nlattr_end_nested(), which sets its length properly. NLA_F_NESTED is
      enforced using nlattr_begin_nested() helper. Other simple attributes
      can be added directly.
      
      The maxsz parameter corresponds to the size of the request structure
      which is being filled in, so for instance with req being:
      
        struct {
      	struct nlmsghdr nh;
      	struct tcmsg t;
      	char buf[4096];
        } req;
      
      Then, maxsz should be sizeof(req).
      
      This change also converts the open coded attribute preparation with these
      helpers. Note that the only failure the internal call to nlattr_add()
      could result in the nested helper would be -EMSGSIZE, hence that is what
      we return to our caller.
      
      The libbpf_netlink_send_recv() call takes care of opening the socket,
      sending the netlink message, receiving the response, potentially invoking
      callbacks, and return errors if any, and then finally close the socket.
      This allows users to avoid identical socket setup code in different places.
      The only user of libbpf_nl_get_link() has been converted to make use of it.
      __bpf_set_link_xdp_fd_replace() has also been refactored to use it.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      [ Daniel: major patch cleanup ]
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210512103451.989420-2-memxor@gmail.com
      8bbb77b7
  3. 14 May, 2021 3 commits
  4. 12 May, 2021 2 commits
  5. 11 May, 2021 6 commits
  6. 10 May, 2021 4 commits
  7. 08 May, 2021 5 commits
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · b7415964
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A fix to avoid over-allocating the kernel's mapping on !MMU systems,
         which could lead to up to 2MiB of lost memory
      
       - The SiFive address extension errata only manifest on rv64, they are
         now disabled on rv32 where they are unnecessary
      
       - A pair of late-landing cleanups
      
      * tag 'riscv-for-linus-5.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: remove unused handle_exception symbol
        riscv: Consistify protect_kernel_linear_mapping_text_rodata() use
        riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=y
        riscv: Only extend kernel reservation if mapped read-only
      b7415964
    • Linus Torvalds's avatar
      drm/i915/display: fix compiler warning about array overrun · fec4d427
      Linus Torvalds authored
      intel_dp_check_mst_status() uses a 14-byte array to read the DPRX Event
      Status Indicator data, but then passes that buffer at offset 10 off as
      an argument to drm_dp_channel_eq_ok().
      
      End result: there are only 4 bytes remaining of the buffer, yet
      drm_dp_channel_eq_ok() wants a 6-byte buffer.  gcc-11 correctly warns
      about this case:
      
        drivers/gpu/drm/i915/display/intel_dp.c: In function ‘intel_dp_check_mst_status’:
        drivers/gpu/drm/i915/display/intel_dp.c:3491:22: warning: ‘drm_dp_channel_eq_ok’ reading 6 bytes from a region of size 4 [-Wstringop-overread]
         3491 |                     !drm_dp_channel_eq_ok(&esi[10], intel_dp->lane_count)) {
              |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        drivers/gpu/drm/i915/display/intel_dp.c:3491:22: note: referencing argument 1 of type ‘const u8 *’ {aka ‘const unsigned char *’}
        In file included from drivers/gpu/drm/i915/display/intel_dp.c:38:
        include/drm/drm_dp_helper.h:1466:6: note: in a call to function ‘drm_dp_channel_eq_ok’
         1466 | bool drm_dp_channel_eq_ok(const u8 link_status[DP_LINK_STATUS_SIZE],
              |      ^~~~~~~~~~~~~~~~~~~~
             6:14 elapsed
      
      This commit just extends the original array by 2 zero-initialized bytes,
      avoiding the warning.
      
      There may be some underlying bug in here that caused this confusion, but
      this is at least no worse than the existing situation that could use
      random data off the stack.
      
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fec4d427
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 07db0563
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "This is a set of minor fixes in various drivers (qla2xxx, ufs,
        scsi_debug, lpfc) one doc fix and a fairly large update to the fnic
        driver to remove the open coded iteration functions in favour of the
        scsi provided ones"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: fnic: Use scsi_host_busy_iter() to traverse commands
        scsi: fnic: Kill 'exclude_id' argument to fnic_cleanup_io()
        scsi: scsi_debug: Fix cmd_per_lun, set to max_queue
        scsi: ufs: core: Narrow down fast path in system suspend path
        scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
        scsi: ufs: core: Do not put UFS power into LPM if link is broken
        scsi: qla2xxx: Prevent PRLI in target mode
        scsi: qla2xxx: Add marginal path handling support
        scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found
        scsi: ufs: core: Fix a typo in ufs-sysfs.c
        scsi: lpfc: Fix bad memory access during VPD DUMP mailbox command
        scsi: lpfc: Fix DMA virtual address ptr assignment in bsg
        scsi: lpfc: Fix illegal memory access on Abort IOCBs
        scsi: blk-mq: Fix build warning when making htmldocs
      07db0563
    • Linus Torvalds's avatar
      Merge tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 0f979d81
      Linus Torvalds authored
      Pull more Kbuild updates from Masahiro Yamada:
      
       - Convert sh and sparc to use generic shell scripts to generate the
         syscall headers
      
       - refactor .gitignore files
      
       - Update kernel/config_data.gz only when the content of the .config
         is really changed, which avoids the unneeded re-link of vmlinux
      
       - move "remove stale files" workarounds to scripts/remove-stale-files
      
       - suppress unused-but-set-variable warnings by default for Clang
         as well
      
       - fix locale setting LANG=C to LC_ALL=C
      
       - improve 'make distclean'
      
       - always keep intermediate objects from scripts/link-vmlinux.sh
      
       - move IF_ENABLED out of <linux/kconfig.h> to make it self-contained
      
       - misc cleanups
      
      * tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
        linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h>
        kbuild: Don't remove link-vmlinux temporary files on exit/signal
        kbuild: remove the unneeded comments for external module builds
        kbuild: make distclean remove tag files in sub-directories
        kbuild: make distclean work against $(objtree) instead of $(srctree)
        kbuild: refactor modname-multi by using suffix-search
        kbuild: refactor fdtoverlay rule
        kbuild: parameterize the .o part of suffix-search
        arch: use cross_compiling to check whether it is a cross build or not
        kbuild: remove ARCH=sh64 support from top Makefile
        .gitignore: prefix local generated files with a slash
        kbuild: replace LANG=C with LC_ALL=C
        Makefile: Move -Wno-unused-but-set-variable out of GCC only block
        kbuild: add a script to remove stale generated files
        kbuild: update config_data.gz only when the content of .config is changed
        .gitignore: ignore only top-level modules.builtin
        .gitignore: move tags and TAGS close to other tag files
        kernel/.gitgnore: remove stale timeconst.h and hz.bc
        usr/include: refactor .gitignore
        genksyms: fix stale comment
        ...
      0f979d81
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ab159ac5
      Linus Torvalds authored
      Pull powerpc updates and fixes from Michael Ellerman:
       "A bit of a mixture of things, tying up some loose ends.
      
        There's the removal of the nvlink code, which dependend on a commit in
        the vfio tree. Then the enablement of huge vmalloc which was in next
        for a few weeks but got dropped due to conflicts. And there's also a
        few fixes.
      
        Summary:
      
         - Remove the nvlink support now that it's only user has been removed.
      
         - Enable huge vmalloc mappings for Radix MMU (P9).
      
         - Fix KVM conversion to gfn-based MMU notifier callbacks.
      
         - Fix a kexec/kdump crash with hot plugged CPUs.
      
         - Fix boot failure on 32-bit with CONFIG_STACKPROTECTOR.
      
         - Restore alphabetic order of the selects under CONFIG_PPC.
      
        Thanks to: Christophe Leroy, Christoph Hellwig, Nicholas Piggin,
        Sandipan Das, and Sourabh Jain"
      
      * tag 'powerpc-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks
        powerpc/kconfig: Restore alphabetic order of the selects under CONFIG_PPC
        powerpc/32: Fix boot failure with CONFIG_STACKPROTECTOR
        powerpc/powernv/memtrace: Fix dcache flushing
        powerpc/kexec_file: Use current CPU info while setting up FDT
        powerpc/64s/radix: Enable huge vmalloc mappings
        powerpc/powernv: remove the nvlink support
      ab159ac5