1. 14 Feb, 2017 32 commits
  2. 09 Jan, 2017 8 commits
    • David S. Miller's avatar
      Merge tag 'mlx5-4kuar-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · bda65b42
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 4K UAR
      
      The following series of patches optimizes the usage of the UAR area which is
      contained within the BAR 0-1. Previous versions of the firmware and the driver
      assumed each system page contains a single UAR. This patch set will query the
      firmware for a new capability that if published, means that the firmware can
      support UARs of fixed 4K regardless of system page size. In the case of
      powerpc, where page size equals 64KB, this means we can utilize 16 UARs per
      system page. Since user space processes by default consume eight UARs per
      context this means that with this change a process will need a single system
      page to fulfill that requirement and in fact make use of more UARs which is
      better in terms of performance.
      
      In addition to optimizing user-space processes, we introduce an allocator
      that can be used by kernel consumers to allocate blue flame registers
      (which are areas within a UAR that are used to write doorbells). This provides
      further optimization on using the UAR area since the Ethernet driver makes
      use of a single blue flame register per system page and now it will use two
      blue flame registers per 4K.
      
      The series also makes changes to naming conventions and now the terms used in
      the driver code match the terms used in the PRM (programmers reference manual).
      Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
      register).
      
      In order to support compatibility between different versions of
      library/driver/firmware, the library has now means to notify the kernel driver
      that it supports the new scheme and the kernel can notify the library if it
      supports this extension. So mixed versions of libraries can run concurrently
      without any issues.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bda65b42
    • Eric Dumazet's avatar
      tcp: make TCP_INFO more consistent · b369e7fd
      Eric Dumazet authored
      tcp_get_info() has to lock the socket, so lets lock it
      for an extended critical section, so that various fields
      have consistent values.
      
      This solves an annoying issue that some applications
      reported when multiple counters are updated during one
      particular rx/rx event, and TCP_INFO was called from
      another cpu.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b369e7fd
    • David S. Miller's avatar
      Merge branch 'bpf-verifier-improvements' · c22e5c12
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf: verifier improvements
      
      A number of bpf verifier improvements from Gianluca.
      See individual patches for details.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c22e5c12
    • Alexei Starovoitov's avatar
      bpf: rename ARG_PTR_TO_STACK · 39f19ebb
      Alexei Starovoitov authored
      since ARG_PTR_TO_STACK is no longer just pointer to stack
      rename it to ARG_PTR_TO_MEM and adjust comment.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      39f19ebb
    • Gianluca Borello's avatar
      bpf: allow helpers access to variable memory · 06c1c049
      Gianluca Borello authored
      Currently, helpers that read and write from/to the stack can do so using
      a pair of arguments of type ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE.
      ARG_CONST_STACK_SIZE accepts a constant register of type CONST_IMM, so
      that the verifier can safely check the memory access. However, requiring
      the argument to be a constant can be limiting in some circumstances.
      
      Since the current logic keeps track of the minimum and maximum value of
      a register throughout the simulated execution, ARG_CONST_STACK_SIZE can
      be changed to also accept an UNKNOWN_VALUE register in case its
      boundaries have been set and the range doesn't cause invalid memory
      accesses.
      
      One common situation when this is useful:
      
      int len;
      char buf[BUFSIZE]; /* BUFSIZE is 128 */
      
      if (some_condition)
      	len = 42;
      else
      	len = 84;
      
      some_helper(..., buf, len & (BUFSIZE - 1));
      
      The compiler can often decide to assign the constant values 42 or 48
      into a variable on the stack, instead of keeping it in a register. When
      the variable is then read back from stack into the register in order to
      be passed to the helper, the verifier will not be able to recognize the
      register as constant (the verifier is not currently tracking all
      constant writes into memory), and the program won't be valid.
      
      However, by allowing the helper to accept an UNKNOWN_VALUE register,
      this program will work because the bitwise AND operation will set the
      range of possible values for the UNKNOWN_VALUE register to [0, BUFSIZE),
      so the verifier can guarantee the helper call will be safe (assuming the
      argument is of type ARG_CONST_STACK_SIZE_OR_ZERO, otherwise one more
      check against 0 would be needed). Custom ranges can be set not only with
      ALU operations, but also by explicitly comparing the UNKNOWN_VALUE
      register with constants.
      
      Another very common example happens when intercepting system call
      arguments and accessing user-provided data of variable size using
      bpf_probe_read(). One can load at runtime the user-provided length in an
      UNKNOWN_VALUE register, and then read that exact amount of data up to a
      compile-time determined limit in order to fit into the proper local
      storage allocated on the stack, without having to guess a suboptimal
      access size at compile time.
      
      Also, in case the helpers accepting the UNKNOWN_VALUE register operate
      in raw mode, disable the raw mode so that the program is required to
      initialize all memory, since there is no guarantee the helper will fill
      it completely, leaving possibilities for data leak (just relevant when
      the memory used by the helper is the stack, not when using a pointer to
      map element value or packet). In other words, ARG_PTR_TO_RAW_STACK will
      be treated as ARG_PTR_TO_STACK.
      Signed-off-by: default avatarGianluca Borello <g.borello@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      06c1c049
    • Gianluca Borello's avatar
      bpf: allow adjusted map element values to spill · f0318d01
      Gianluca Borello authored
      commit 48461135 ("bpf: allow access into map value arrays")
      introduces the ability to do pointer math inside a map element value via
      the PTR_TO_MAP_VALUE_ADJ register type.
      
      The current support doesn't handle the case where a PTR_TO_MAP_VALUE_ADJ
      is spilled into the stack, limiting several use cases, especially when
      generating bpf code from a compiler.
      
      Handle this case by explicitly enabling the register type
      PTR_TO_MAP_VALUE_ADJ to be spilled. Also, make sure that min_value and
      max_value are reset just for BPF_LDX operations that don't result in a
      restore of a spilled register from stack.
      Signed-off-by: default avatarGianluca Borello <g.borello@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f0318d01
    • Gianluca Borello's avatar
      bpf: allow helpers access to map element values · 5722569b
      Gianluca Borello authored
      Enable helpers to directly access a map element value by passing a
      register type PTR_TO_MAP_VALUE (or PTR_TO_MAP_VALUE_ADJ) to helper
      arguments ARG_PTR_TO_STACK or ARG_PTR_TO_RAW_STACK.
      
      This enables several use cases. For example, a typical tracing program
      might want to capture pathnames passed to sys_open() with:
      
      struct trace_data {
      	char pathname[PATHLEN];
      };
      
      SEC("kprobe/sys_open")
      void bpf_sys_open(struct pt_regs *ctx)
      {
      	struct trace_data data;
      	bpf_probe_read(data.pathname, sizeof(data.pathname), ctx->di);
      
      	/* consume data.pathname, for example via
      	 * bpf_trace_printk() or bpf_perf_event_output()
      	 */
      }
      
      Such a program could easily hit the stack limit in case PATHLEN needs to
      be large or more local variables need to exist, both of which are quite
      common scenarios. Allowing direct helper access to map element values,
      one could do:
      
      struct bpf_map_def SEC("maps") scratch_map = {
      	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
      	.key_size = sizeof(u32),
      	.value_size = sizeof(struct trace_data),
      	.max_entries = 1,
      };
      
      SEC("kprobe/sys_open")
      int bpf_sys_open(struct pt_regs *ctx)
      {
      	int id = 0;
      	struct trace_data *p = bpf_map_lookup_elem(&scratch_map, &id);
      	if (!p)
      		return;
      	bpf_probe_read(p->pathname, sizeof(p->pathname), ctx->di);
      
      	/* consume p->pathname, for example via
      	 * bpf_trace_printk() or bpf_perf_event_output()
      	 */
      }
      
      And wouldn't risk exhausting the stack.
      
      Code changes are loosely modeled after commit 6841de8b ("bpf: allow
      helpers access the packet directly"). Unlike with PTR_TO_PACKET, these
      changes just work with ARG_PTR_TO_STACK and ARG_PTR_TO_RAW_STACK (not
      ARG_PTR_TO_MAP_KEY, ARG_PTR_TO_MAP_VALUE, ...): adding those would be
      trivial, but since there is not currently a use case for that, it's
      reasonable to limit the set of changes.
      
      Also, add new tests to make sure accesses to map element values from
      helpers never go out of boundary, even when adjusted.
      Signed-off-by: default avatarGianluca Borello <g.borello@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5722569b
    • Gianluca Borello's avatar
      bpf: split check_mem_access logic for map values · dbcfe5f7
      Gianluca Borello authored
      Move the logic to check memory accesses to a PTR_TO_MAP_VALUE_ADJ from
      check_mem_access() to a separate helper check_map_access_adj(). This
      enables to use those checks in other parts of the verifier as well,
      where boundaries on PTR_TO_MAP_VALUE_ADJ might need to be checked, for
      example when checking helper function arguments. The same thing is
      already happening for other types such as PTR_TO_PACKET and its
      check_packet_access() helper.
      
      The code has been copied verbatim, with the only difference of removing
      the "off += reg->max_value" statement and moving the sum into the call
      statement to check_map_access(), as that was only needed due to the
      earlier common check_map_access() call.
      Signed-off-by: default avatarGianluca Borello <g.borello@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dbcfe5f7