1. 13 Oct, 2023 3 commits
  2. 12 Oct, 2023 7 commits
    • Merge branch 'Add cgroup sockaddr hooks for unix sockets' · d2dc885b
      Martin KaFai Lau authored
      Daan De Meyer says:
      
      ====================
      Changes since v10:
      
      * Removed extra check from bpf_sock_addr_set_sun_path() again in favor of
        calling unix_validate_addr() everywhere in af_unix.c before calling the hooks.
      
      Changes since v9:
      
      * Renamed bpf_sock_addr_set_unix_addr() to bpf_sock_addr_set_sun_path() and
        renamed arguments to match the new name.
      * Added an extra check to bpf_sock_addr_set_sun_path() to disallow changing the
        address of an unnamed unix socket.
      * Removed unnecessary NULL check on uaddrlen in
        __cgroup_bpf_run_filter_sock_addr().
      
      Changes since v8:
      
      * Added missing test programs to last patch
      
      Changes since v7:
      
      * Fixed formatting nit in comment
      * Renamed from cgroup/connectun to cgroup/connect_unix (and similar for all
        other hooks)
      
      Changes since v6:
      
      * Actually removed bpf_bind() helper for AF_UNIX hooks.
      * Fixed merge conflict
      * Updated comment to mention uaddrlen is read-only for AF_INET[6]
      * Removed unnecessary forward declaration of struct sock_addr_test
      * Removed unused BPF_CGROUP_RUN_PROG_UNIX_CONNECT()
      * Fixed formatting nit reported by checkpatch
      * Added more information to commit message about recvmsg() on connected socket
      
      Changes since v5:
      
      * Fixed kernel version in bpftool documentation (6.3 => 6.7).
      * Added connection mode socket recvmsg() test.
      * Removed bpf_bind() helper for AF_UNIX hooks.
      * Added missing getpeernameun and getsocknameun BPF test programs.
      * Added note for bind() test being unused currently.
      
      Changes since v4:
      
      * Dropped support for intercepting bind() as when using bind() with unix sockets
        and a pathname sockaddr, bind() will create an inode in the filesystem that
        needs to be cleaned up. If the address is rewritten, users might try to clean
        up the wrong file and leak the actual socket file in the filesystem.
      * Changed bpf_sock_addr_set_unix_addr() to use BTF_KFUNC_HOOK_CGROUP_SKB instead
        of BTF_KFUNC_HOOK_COMMON.
      * Removed unix socket related changes from BPF_CGROUP_PRE_CONNECT_ENABLED() as
        unix sockets do not support pre-connect.
      * Added tests for getpeernameun and getsocknameun hooks.
      * We now disallow an empty sockaddr in bpf_sock_addr_set_unix_addr() similar to
        unix_validate_addr().
      * Removed unnecessary cgroup_bpf_enabled() checks
      * Removed unnecessary error checks
      
      Changes since v3:
      
      * Renamed bpf_sock_addr_set_addr() to bpf_sock_addr_set_unix_addr() and
        made it only operate on AF_UNIX sockaddrs. This is because for the other
        families, users usually want to configure more than just the address so
        a generic interface will not fit the bill here. e.g. for AF_INET and AF_INET6,
        users would generally also want to be able to configure the port which the
        current interface doesn't support. So we expose an AF_UNIX specific function
        instead.
      * Made the tests in the new sock addr tests more generic (similar to test_sock_addr.c),
        this should make it easier to migrate the other sock addr tests in the future.
      * Removed the new kfunc hook and attached to BTF_KFUNC_HOOK_COMMON instead
      * Set uaddrlen to 0 when the family is AF_UNSPEC
      * Pass in the addrlen to the hook from IPv6 code
      * Fixed mount directory mkdir() to ignore EEXIST
      
      Changes since v2:
      
      * Configuring the sock addr is now done via a new kfunc bpf_sock_addr_set()
      * The addrlen is exposed as u32 in bpf_sock_addr_kern
      * Selftests are updated to use the new kfunc
      * Selftests are now added as a new sock_addr test in prog_tests/
      * Added BTF_KFUNC_HOOK_SOCK_ADDR for BPF_PROG_TYPE_CGROUP_SOCK_ADDR
      * __cgroup_bpf_run_filter_sock_addr() now returns the modified addrlen
      
      Changes since v1:
      
      * Split into multiple patches instead of one single patch
      * Added unix support for all socket address hooks instead of only connect()
      * Switched approach to expose the socket address length to the bpf hook
        instead of recalculating the socket address length in kernelspace to
        properly support abstract unix socket addresses
      * Modified socket address hook tests to calculate the socket address length
        once and pass it around everywhere instead of recalculating the actual unix
        socket address length on demand.
      * Added some missing section name tests for getpeername()/getsockname()
      
      This patch series extends the cgroup sockaddr hooks to include support for unix
      sockets. To add support for unix sockets, struct bpf_sock_addr_kern is extended
      to expose the socket address length to the bpf program. Along with that, a new
      kfunc bpf_sock_addr_set_sun_path() is added to safely allow modifying an
      AF_UNIX sockaddr from bpf programs.
      
      I intend to use these new hooks in systemd to reimplement the LogNamespace=
      feature, which allows running multiple instances of systemd-journald to
      process the logs of different services. systemd-journald also processes
      syslog messages, so currently, using log namespaces means all services running
      in the same log namespace have to live in the same private mount namespace
      so that systemd can mount the journal namespace's associated syslog socket
      over /dev/log to properly direct syslog messages from all services running
      in that log namespace to the correct systemd-journald instance. We want to
      relax this requirement so that processes running in disjoint mount namespaces
      can still run in the same log namespace. To achieve this, we can use these
      new hooks to rewrite the socket address of any connect(), sendto(), ...
      syscalls to /dev/log to the socket address of the journal namespace's syslog
      socket instead, which will transparently do the redirection without requiring
      use of a mount namespace and mounting over /dev/log.
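The redirection described above can be modelled in plain userspace C. This is only a sketch of the rewrite logic (match on the requested pathname, swap in the replacement, update the address length), not the kernel hook or the kfunc itself. The function name `redirect_unix_addr()` and the replacement path used in the usage note are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Userspace model only (hypothetical helper, not a kernel API):
 * if *addr names the pathname socket `from`, rewrite it to `to` and
 * update *addrlen to match the new path.
 * Returns 1 if rewritten, 0 if left untouched, -1 if `to` does not fit.
 */
static int redirect_unix_addr(struct sockaddr_un *addr, socklen_t *addrlen,
			      const char *from, const char *to)
{
	const size_t off = offsetof(struct sockaddr_un, sun_path);
	size_t from_len = strlen(from);
	size_t to_len = strlen(to);
	size_t max, path_len;
	const char *nul;

	if (addr->sun_family != AF_UNIX || *addrlen <= off)
		return 0;

	/* Only look at the sun_path bytes that addrlen actually covers. */
	max = *addrlen - off;
	if (max > sizeof(addr->sun_path))
		max = sizeof(addr->sun_path);
	nul = memchr(addr->sun_path, '\0', max);
	path_len = nul ? (size_t)(nul - addr->sun_path) : max;

	/* Abstract addresses start with NUL, so path_len is 0 and never matches. */
	if (path_len != from_len || memcmp(addr->sun_path, from, from_len) != 0)
		return 0;
	if (to_len >= sizeof(addr->sun_path))
		return -1;

	memset(addr->sun_path, 0, sizeof(addr->sun_path));
	memcpy(addr->sun_path, to, to_len);
	/* The length changes with the path, so it must be written back. */
	*addrlen = (socklen_t)(off + to_len + 1);
	return 1;
}
```

A caller would pass the address and length it was about to hand to connect() or sendto(), for example with `from` set to "/dev/log" and `to` set to an illustrative per-namespace socket path such as "/run/example/dev-log".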
      
      Aside from the above use case, these hooks can more generally be used to
      transparently redirect unix sockets to different addresses as required by
      services.
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      d2dc885b
    • selftests/bpf: Add tests for cgroup unix socket address hooks · 82ab6b50
      Daan De Meyer authored
      These selftests are written in prog_tests style instead of adding
      them to the existing test_sock_addr tests. Migrating the existing
      sock addr tests to prog_tests style is left for future work. This
      commit adds support for testing bind() sockaddr hooks, even though
      there's no unix socket sockaddr hook for bind(). We leave this code
      intact for when the INET and INET6 tests are migrated in the future
      which do support intercepting bind().
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-10-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      82ab6b50
    • selftests/bpf: Make sure mount directory exists · af2752ed
      Daan De Meyer authored
      The mount directory for the selftests cgroup tree might
      not exist so let's make sure it does exist by creating
      it ourselves if it doesn't exist.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-9-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      af2752ed
    • documentation/bpf: Document cgroup unix socket address hooks · 3243fef6
      Daan De Meyer authored
      Update the documentation to mention the new cgroup unix sockaddr
      hooks.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-8-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      3243fef6
    • bpftool: Add support for cgroup unix socket address hooks · 8b3cba98
      Daan De Meyer authored
      Add the necessary plumbing to hook up the new cgroup unix sockaddr
      hooks into bpftool.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-7-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      8b3cba98
    • libbpf: Add support for cgroup unix socket address hooks · bf90438c
      Daan De Meyer authored
      Add the necessary plumbing to hook up the new cgroup unix sockaddr
      hooks into libbpf.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-6-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      bf90438c
    • bpf: Implement cgroup sockaddr hooks for unix sockets · 859051dd
      Daan De Meyer authored
      These hooks allow intercepting connect(), getsockname(),
      getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
      socket hooks get write access to the address length because the
      address length is not fixed when dealing with unix sockets and
      needs to be modified when a unix socket address is modified by
      the hook. Because abstract unix socket addresses start with a
      NUL byte, we cannot recalculate the socket address length in kernelspace
      after running the hook by calculating the length of the unix socket
      path using strlen().
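The length problem described above is easy to see from userspace. The following is a minimal sketch, assuming the usual Linux `struct sockaddr_un` layout; the two helper names are hypothetical and are not part of this patch series. It builds one pathname address and one abstract address and returns the address length for each.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Pathname address: sun_path holds a NUL-terminated string, so the
 * length can be derived with strlen(). (Hypothetical helper.)
 */
static socklen_t unix_pathname_addr(struct sockaddr_un *addr, const char *path)
{
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) +
			   strlen(addr->sun_path) + 1);
}

/*
 * Abstract address: sun_path[0] is NUL and the name follows it.
 * strlen(sun_path) is 0 here, so the caller has to carry the length
 * explicitly. (Hypothetical helper.)
 */
static socklen_t unix_abstract_addr(struct sockaddr_un *addr,
				    const void *name, size_t name_len)
{
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	if (name_len > sizeof(addr->sun_path) - 1)
		name_len = sizeof(addr->sun_path) - 1;
	memcpy(addr->sun_path + 1, name, name_len);
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) +
			   1 + name_len);
}
```

For the abstract case, `strlen(addr->sun_path)` returns 0 regardless of the name that follows the leading NUL byte, which is why a hook that rewrites the address also needs to be able to write the length.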
      
      These hooks can be used when users want to multiplex syscalls to a
      single unix socket to multiple different processes behind the scenes
      by redirecting the connect() and other syscalls to process specific
      sockets.
      
      We do not implement support for intercepting bind() because when
      using bind() with unix sockets with a pathname address, this creates
      an inode in the filesystem which must be cleaned up. If we rewrite
      the address, the user might try to clean up the wrong file, leaking
      the socket in the filesystem where it is never cleaned up. Until we
      figure out a solution for this (and a use case for intercepting bind()),
      we opt to not allow rewriting the sockaddr in bind() calls.
      
      We also implement recvmsg() support for connected streams so that
      after a connect() that is modified by a sockaddr hook, any corresponding
      recvmsg() on the connected socket can also be modified to make the
      connected program think it is connected to the "intended" remote.
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      859051dd
  3. 11 Oct, 2023 3 commits
  4. 09 Oct, 2023 7 commits
  5. 06 Oct, 2023 8 commits
  6. 04 Oct, 2023 12 commits