1. 28 Sep, 2020 1 commit
    • xsk: Fix possible crash in socket_release when out-of-memory · 1fd17c8c
      Magnus Karlsson authored
      Fix a possible crash in socket_release when an out-of-memory error has
      occurred in the bind call. If a socket using the XDP_SHARED_UMEM flag
      encountered an error in xp_create_and_assign_umem, the bind code
      jumped to the exit routine without setting the err value first. The
      exit routine therefore assumed that the setup had succeeded and set
      the state of the socket to XSK_BOUND. At application exit, the xsk
      socket release code then treated the socket as properly set up, which
      led to a crash because not all fields in the socket had been
      initialized. Fix this by setting the err variable in xsk_bind so that
      the socket is not set to XSK_BOUND, which in turn means that the
      clean-up in xsk_release is not triggered.
      
      Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
      Reported-by: syzbot+ddc7b4944bc61da19b81@syzkaller.appspotmail.com
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/1601112373-10595-1-git-send-email-magnus.karlsson@gmail.com
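The error path described in the xsk commit above can be illustrated with a small user-space sketch. This is not the kernel code: struct fake_sock, create_pool(), bind_buggy() and bind_fixed() are hypothetical names made up for illustration, and the sketch only assumes the general shape "allocation fails, code jumps to the exit label, exit label marks the socket as bound when err is 0".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the socket state (not kernel types). */
enum sock_state { SOCK_READY, SOCK_BOUND };

struct fake_sock {
    enum sock_state state;
    void *pool;             /* stays NULL when pool creation fails */
};

/* Simulates an allocation step that returns NULL on out-of-memory. */
static void *create_pool(bool simulate_oom)
{
    static int dummy_pool;

    return simulate_oom ? NULL : (void *)&dummy_pool;
}

/* Buggy shape: the failure path jumps to the exit label without setting err. */
int bind_buggy(struct fake_sock *s, bool simulate_oom)
{
    int err = 0;

    assert(s != NULL);
    s->pool = create_pool(simulate_oom);
    if (!s->pool)
        goto out;           /* err is still 0 here */
out:
    if (!err)
        s->state = SOCK_BOUND;  /* wrongly reached after a failure */
    return err;
}

/* Fixed shape: err is set before the jump, so the state is left untouched. */
int bind_fixed(struct fake_sock *s, bool simulate_oom)
{
    int err = 0;

    assert(s != NULL);
    s->pool = create_pool(simulate_oom);
    if (!s->pool) {
        err = -ENOMEM;
        goto out;
    }
out:
    if (!err)
        s->state = SOCK_BOUND;
    return err;
}
```

With a simulated allocation failure, bind_buggy() returns 0 and leaves the socket marked as bound with a NULL pool, while bind_fixed() returns -ENOMEM and leaves the state unchanged, so a later release routine would have nothing to tear down.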
  2. 26 Sep, 2020 1 commit
    • bpf: Add comment to document BTF type PTR_TO_BTF_ID_OR_NULL · ba5f4cfe
      John Fastabend authored
      The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types
      denoted with the *_OR_NULL type. For example the types PTR_TO_SOCKET
      and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the
      type PTR_TO_SOCKET is guaranteed to _not_ have a null value.
      
      In contrast PTR_TO_BTF_ID and PTR_TO_BTF_ID_OR_NULL have slightly
      different meanings. A PTR_TO_BTF_ID may be a NULL pointer,
      but it is safe to read this pointer in the program context because
      the program context will handle any faults. The consequence is that for
      PTR_TO_BTF_ID the verifier can assume reads are safe, but cannot
      use the type in branch analysis. Additionally, authors need to be
      extra careful when passing PTR_TO_BTF_ID into helpers. In general,
      helpers consuming type PTR_TO_BTF_ID will need to assume it may
      be null.
      
      Since the above is not obvious to readers without the background
      knowledge, let's add a comment in the type definition.
      
      Editorial comment: as networking and tracing programs become more
      tightly merged, we may need to consider a new type that we can
      ensure is non-null for branch analysis and also for passing into
      helpers.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
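The point about helpers in the bpf commit above, namely that code consuming a PTR_TO_BTF_ID must assume the pointer may be null, can be sketched in plain C. This is only an illustration of the defensive pattern: struct task_info and helper_read_pid() are hypothetical and do not correspond to any real BPF helper or kernel structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical object type; stands in for whatever the pointer refers to. */
struct task_info {
    int pid;
};

/*
 * Helper-style function that receives a pointer which may be NULL.
 * It checks the pointer itself instead of relying on the caller (or on
 * a type name without an _OR_NULL suffix) to guarantee it is non-null.
 */
int helper_read_pid(const struct task_info *task, int *pid_out)
{
    assert(pid_out != NULL);    /* output slot is owned by the caller */

    if (task == NULL)
        return -EINVAL;         /* reject instead of dereferencing */

    *pid_out = task->pid;
    return 0;
}
```

Returning an error for a null argument keeps the output untouched, so callers can tell a missing object apart from a valid one.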
  3. 25 Sep, 2020 16 commits
  4. 24 Sep, 2020 17 commits
  5. 23 Sep, 2020 5 commits
    • Merge branch 'net-bridge-mcast-IGMPv3-MLDv2-fast-path-part-2' · 68d4fd30
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: mcast: IGMPv3/MLDv2 fast-path (part 2)
      
      This is the second part of the IGMPv3/MLDv2 support, which adds support
      for the fast-path. In order to handle source entries we add mdb
      support for S,G entries (i.e. we add source address support to
      br_ip). That requires extending the current mdb netlink API;
      fortunately we only need one new attribute, which will contain nested
      future mdb attributes, and we then use it to add support for S,G user
      add, del and dump. The lookup sequence is simple: when IGMPv3/MLDv2 are
      enabled, do the S,G lookup first and, if it fails, fall back to *,G. The
      more complex part is handling source lists, auto-installing S,G
      entries and *,G filter mode transitions. We have the following cases:
       1) *,G INCLUDE -> EXCLUDE transition: we need to install the port in
          all of *,G's installed S,G entries for proper replication (except
          the ones explicitly blocked), this is also necessary when adding a
          new *,G EXCLUDE port group
      
       2) *,G EXCLUDE -> INCLUDE transition: we need to remove the port from
          all of *,G's installed S,G entries, this is also necessary when
          removing a *,G port group
      
       3) New S,G port entry: we need to install all current *,G EXCLUDE ports
      
       4) Remove S,G port entry: if all other port groups were auto-installed we
          can safely remove them and delete the whole S,G entry
      
      Currently we compute these operations from the available ports, their
      source lists and their filter mode. In the future we can extend the port
      group structure and reduce the running time of these ops. One current
      limitation is that host-joined S,G entries are not supported, i.e. one
      cannot add "dev bridge port bridge" mdb S,G entries. The host join is
      currently considered an EXCLUDE {} join, so it's reflected in all of
      *,G's installed S,G entries. If an S,G,port entry is added as temporary
      then the kernel can take it over if a source shows up from a report;
      permanent entries are skipped. In order to properly handle blocked
      sources we add a new port group blocked flag to avoid forwarding to
      that port group in the S,G. Finally, when forwarding, two things decide
      whether a port should be skipped: the port group filter mode (if it's
      INCLUDE and the port group is from a *,G then don't replicate to it; if
      it's EXCLUDE then forward) and the blocked flag (if it's set, skip that
      port unless it's a router port). Another limitation is that we can't do
      some of the above transitions without a small traffic drop while
      installing/removing entries. That will be taken care of when we add
      atomic swap of port replication lists later.
      
      Patch break down:
       patches 1-3: prepare the mdb code for better extack support which is
                    used in future patches to return a more meaningful error
       patches 4-6: add the source address field to struct br_ip, and do minor
                    cleanups around it
       patches 7-8: extend the mdb netlink API so we can send new mdb
                    attributes and use the new API for S,G entry add/del/dump
                    support
       patch     9: takes care of S,G entries when doing a lookup (first S,G
                    then *,G lookup)
       patch    10: adds a new port group field and attribute for origin protocol;
                    we use the already available RTPROT_ definitions:
                    currently user-space entries are added as RTPROT_STATIC and
                    kernel entries are added as RTPROT_KERNEL, we may allow
                    user-space to set custom values later (e.g. for FRR, clag)
       patch    11: adds an internal S,G,port rhashtable to speed up filter
                    mode transitions
       patch    12: initial automatic install of S,G entries based on port
                    groups' source lists
       patch    13: handles port group modes on transitions or when new
                    port group entries are added
       patch    14: self-explanatory - adds support for blocked port group
                    entries needed to stop forwarding to particular S,G,port
                    entries
       patch    15: handles host-join/leave state changes, treats host-joins
                    as EXCLUDE {} groups (reflected in all *,G's S,G entries)
       patch    16: finally adds the fast-path filter mode and block flag
                    support
      
      Here're the sets that will come next (in order):
       - iproute2 support for IGMPv3/MLDv2
       - selftests for all mode transitions and group flags
       - explicit host tracking for proper fast-leave support
       - atomic port replication lists (these are also needed for broadcast
         forwarding optimizations)
       - mode transition optimization and removal of open-coded sorted lists
      
      Not implemented yet:
       - Host IGMPv3/MLDv2 filter support (currently we handle only join/leave
         as before)
       - Proper other querier source timer and value updates
       - IGMPv3/v2 MLDv2/v1 compat (I have a few rough patches for this one)
      
      v2: fix build with CONFIG_BATMAN_ADV_MCAST in patch 6
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
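The lookup sequence from the cover letter above (S,G first, then fall back to *,G) can be sketched as follows. This is a simplified illustration and not the bridge code: struct mdb_entry, find_entry() and mdb_lookup() are hypothetical, a linear array scan is used in place of the kernel's hash table, and a source of 0 is assumed to mark a *,G entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mdb entry; src == 0 is assumed to mean a *,G entry. */
struct mdb_entry {
    uint32_t src;
    uint32_t group;
};

/* Exact-match search over a small table (stand-in for a hash lookup). */
static const struct mdb_entry *find_entry(const struct mdb_entry *tbl,
                                          size_t n, uint32_t src,
                                          uint32_t group)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].src == src && tbl[i].group == group)
            return &tbl[i];
    return NULL;
}

/*
 * Try the S,G entry first when IGMPv3/MLDv2 handling is enabled,
 * then fall back to the *,G entry for the same group.
 */
const struct mdb_entry *mdb_lookup(const struct mdb_entry *tbl, size_t n,
                                   uint32_t src, uint32_t group,
                                   bool v3_enabled)
{
    const struct mdb_entry *e = NULL;

    assert(tbl != NULL || n == 0);

    if (v3_enabled && src != 0)
        e = find_entry(tbl, n, src, group);
    if (e == NULL)
        e = find_entry(tbl, n, 0, group);
    return e;
}
```

With a table holding both a *,G and an S,G entry for the same group, a packet from the listed source resolves to the S,G entry, and any other source (or a lookup with the newer protocol versions disabled) resolves to the *,G entry.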
    • net: bridge: mcast: when forwarding handle filter mode and blocked flag · 36cfec73
      Nikolay Aleksandrov authored
      We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the
      mdst entry is a *,G or when the port has the blocked flag.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
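The forwarding rule in the commit above can be written as a single predicate. This is a sketch under stated assumptions: enum filter_mode, struct port_group and skip_port() are hypothetical names, and the router-port exception mentioned in the cover letter is deliberately left out to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port group state. */
enum filter_mode { MODE_INCLUDE, MODE_EXCLUDE };

struct port_group {
    enum filter_mode mode;
    bool blocked;
};

/*
 * Returns true when traffic should NOT be replicated to this port group:
 *  - the port group has the blocked flag set, or
 *  - the matched entry is a *,G and the port group is in INCLUDE mode.
 */
bool skip_port(const struct port_group *pg, bool entry_is_star_g)
{
    assert(pg != NULL);

    if (pg->blocked)
        return true;
    return entry_is_star_g && pg->mode == MODE_INCLUDE;
}
```

An INCLUDE-mode port group is therefore skipped only on *,G matches and still receives traffic through S,G entries, while a blocked port group is skipped in both cases.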
    • net: bridge: mcast: handle host state · 094b82fd
      Nikolay Aleksandrov authored
      Since host joins are considered as EXCLUDE {} joins we need to reflect
      that in all of *,G ports' S,G entries. Since host_joined == true can
      only be set automatically on S,G entries, we can safely set it to false
      when removing all automatically added entries upon S,G delete.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: mcast: add support for blocked port groups · 9116ffbf
      Nikolay Aleksandrov authored
      When excluding S,G entries we need a way to block a particular S,G,port.
      The new port group flag is managed based on the source's timer as per
      RFCs 3376 and 3810. When a source expires and its port group is in
      EXCLUDE mode, it will be blocked.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: mcast: handle port group filter modes · 8266a049
      Nikolay Aleksandrov authored
      We need to handle group filter mode transitions and initial state.
      To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
      a new port group in EXCLUDE mode) we need to add that port to all of
      *,G ports' S,G entries for proper replication. When the EXCLUDE state is
      changed from an IGMPv3 report, br_multicast_fwd_filter_exclude() must be
      called after the source list processing because the assumption is that
      all of the group's S,G entries will be created before transitioning to
      EXCLUDE mode, i.e. most importantly its blocked entries will already be
      added so it will not get automatically added to them.
      The transition EXCLUDE -> INCLUDE happens only when a port group timer
      expires; it requires us to remove that port from all of *,G ports' S,G
      entries where it was automatically added previously.
      Finally when we are adding a new S,G entry we must add all of *,G's
      EXCLUDE ports to it.
      In order to distinguish automatically added *,G EXCLUDE ports we have a
      new port group flag - MDB_PG_FLAGS_STAR_EXCL.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
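The two filter mode transitions described in the commit above can be sketched with a small table of S,G entries. This is an illustration only: struct sg_entry, the PG_* flags and both functions are hypothetical and merely mirror the idea of marking automatically added ports (MDB_PG_FLAGS_STAR_EXCL in the commit); the real code works on the bridge's own data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical per-port flags inside one S,G entry. */
enum {
    PG_PRESENT   = 1,   /* port is a member of this S,G entry */
    PG_STAR_EXCL = 2,   /* port was added automatically from *,G EXCLUDE */
    PG_BLOCKED   = 4    /* port is explicitly blocked for this S,G */
};

struct sg_entry {
    unsigned int flags[MAX_PORTS];
};

/* INCLUDE -> EXCLUDE: add the port to every S,G entry it is not in yet. */
void fwd_filter_exclude(struct sg_entry *sg, size_t n, unsigned int port)
{
    assert(port < MAX_PORTS);

    for (size_t i = 0; i < n; i++) {
        /* Existing entries, including blocked ones, are left untouched. */
        if (sg[i].flags[port] & PG_PRESENT)
            continue;
        sg[i].flags[port] = PG_PRESENT | PG_STAR_EXCL;
    }
}

/* EXCLUDE -> INCLUDE: remove the port only where it was added automatically. */
void fwd_filter_include(struct sg_entry *sg, size_t n, unsigned int port)
{
    assert(port < MAX_PORTS);

    for (size_t i = 0; i < n; i++)
        if (sg[i].flags[port] & PG_STAR_EXCL)
            sg[i].flags[port] = 0;
}
```

The ordering requirement from the commit message shows up here as well: because blocked entries are created before fwd_filter_exclude() runs, the port is never auto-added on top of them, and the later INCLUDE transition removes only the entries carrying the automatic flag.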