1. 12 Aug, 2019 16 commits
  2. 11 Aug, 2019 11 commits
    • David S. Miller's avatar
      Merge branch 'drop_monitor-Capture-dropped-packets-and-metadata' · 6e5ee483
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      drop_monitor: Capture dropped packets and metadata
      
      So far drop monitor supported only one mode of operation in which a
      summary of recent packet drops is periodically sent to user space as a
      netlink event. The event only includes the drop location (program
      counter) and number of drops in the last interval.
      
      While this mode of operation allows one to understand if the system is
      dropping packets, it is not sufficient if a more detailed analysis is
      required. Both the packet itself and related metadata are missing.
      
      This patchset extends drop monitor with another mode of operation where
      the packet - potentially truncated - and metadata (e.g., drop location,
      timestamp, netdev) are sent to user space as a netlink event. Thanks to
      the extensible nature of netlink, more metadata can be added in the
      future.
      
      To avoid performing expensive operations in the context in which
      kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU
      skb drop list. The list is then processed in process context (using a
      workqueue), where the netlink messages are allocated, prepared and
      finally sent to user space.
      
      A follow-up patchset will integrate drop monitor with devlink and allow
      the latter to call into drop monitor to report hardware drops. In the
      future, XDP drops can be added as well, thereby making drop monitor the
      go-to netlink channel for diagnosing all packet drops.
      
      Example usage with patched dropwatch [1] can be found here [2]. Example
      dissection of drop monitor netlink events with patched wireshark [3] can
      be found here [4]. I will submit both changes upstream after the kernel
      changes are accepted. Another change worth making is adding a dropmon
      pseudo interface to libpcap, similar to the nflog interface [5]. This
      will allow users to specifically listen on dropmon traffic instead of
      capturing all netlink packets via the nlmon netdev.
      
      Patches #1-#5 prepare the code towards the actual changes in later
      patches.
      
      Patch #6 adds another mode of operation to drop monitor in which the
      dropped packet itself is notified to user space along with metadata.
      
      Patch #7 allows users to truncate reported packets to a specific length,
      in case only the headers are of interest. The original length of the
      packet is added as metadata to the netlink notification.
      
      Patch #8 allows user to query the current configuration of drop monitor
      (e.g., alert mode, truncation length).
      
      Patches #9-#10 allow users to tune the length of the per-CPU skb drop
      list according to their needs.
      
      Changes since v1 [6]:
      * Add skb protocol as metadata. This allows user space to correctly
        dissect the packet instead of blindly assuming it is an Ethernet
        packet
      
      Changes since RFC [7]:
      * Limit the length of the per-CPU skb drop list and make it configurable
      * Do not use the hysteresis timer in packet alert mode
      * Introduce alert mode operations in a separate patch and only then
        introduce the new alert mode
      * Use 'skb->skb_iif' instead of 'skb->dev' because the latter is inside
        a union with 'dev_scratch' and therefore not guaranteed to point to a
        valid netdev
      * Return '-EBUSY' instead of '-EOPNOTSUPP' when trying to configure drop
        monitor while it is monitoring
      * Did not change schedule_work() in favor of schedule_work_on() as I did
        not observe a change in number of tail drops
      
      [1] https://github.com/idosch/dropwatch/tree/packet-mode
      [2] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile1-txt
      [3] https://github.com/idosch/wireshark/tree/drop-monitor-v2
      [4] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile2-txt
      [5] https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-netfilter-linux.c
      [6] https://patchwork.ozlabs.org/cover/1143443/
      [7] https://patchwork.ozlabs.org/cover/1135226/
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e5ee483
    • Ido Schimmel's avatar
      drop_monitor: Expose tail drop counter · e9feb580
      Ido Schimmel authored
      Previous patch made the length of the per-CPU skb drop list
      configurable. Expose a counter that shows how many packets could not be
      enqueued to this list.
      
      This allows users determine the desired queue length.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9feb580
    • Ido Schimmel's avatar
      drop_monitor: Make drop queue length configurable · 30328d46
      Ido Schimmel authored
      In packet alert mode, each CPU holds a list of dropped skbs that need to
      be processed in process context and sent to user space. To avoid
      exhausting the system's memory the maximum length of this queue is
      currently set to 1000.
      
      Allow users to tune the length of this queue according to their needs.
      The configured length is reported to user space when drop monitor
      configuration is queried.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30328d46
    • Ido Schimmel's avatar
      drop_monitor: Add a command to query current configuration · 444be061
      Ido Schimmel authored
      Users should be able to query the current configuration of drop monitor
      before they start using it. Add a command to query the existing
      configuration which currently consists of alert mode and packet
      truncation length.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      444be061
    • Ido Schimmel's avatar
      drop_monitor: Allow truncation of dropped packets · 57986617
      Ido Schimmel authored
      When sending dropped packets to user space it is not always necessary to
      copy the entire packet as usually only the headers are of interest.
      
      Allow user to specify the truncation length and add the original length
      of the packet as additional metadata to the netlink message.
      
      By default no truncation is performed.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      57986617
    • Ido Schimmel's avatar
      drop_monitor: Add packet alert mode · ca30707d
      Ido Schimmel authored
      So far drop monitor supported only one alert mode in which a summary of
      locations in which packets were recently dropped was sent to user space.
      
      This alert mode is sufficient in order to understand that packets were
      dropped, but lacks information to perform a more detailed analysis.
      
      Add a new alert mode in which the dropped packet itself is passed to
      user space along with metadata: The drop location (as program counter
      and resolved symbol), ingress netdevice and drop timestamp. More
      metadata can be added in the future.
      
      To avoid performing expensive operations in the context in which
      kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and
      queued on per-CPU skb drop list. Then, in process context the netlink
      message is allocated, prepared and finally sent to user space.
      
      The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting
      the system's memory. Subsequent patches will make this limit
      configurable and also add a counter that indicates how many skbs were
      tail dropped.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca30707d
    • Ido Schimmel's avatar
      drop_monitor: Add alert mode operations · 28315f79
      Ido Schimmel authored
      The next patch is going to add another alert mode in which the dropped
      packet is notified to user space, instead of only a summary of recent
      drops.
      
      Abstract the differences between the modes by adding alert mode
      operations. The operations are selected based on the currently
      configured mode and associated with the probes and the work item just
      before tracing starts.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28315f79
    • Ido Schimmel's avatar
      drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration · c5ab9b1c
      Ido Schimmel authored
      Currently, the configure command does not do anything but return an
      error. Subsequent patches will enable the command to change various
      configuration options such as alert mode and packet truncation.
      
      Similar to other netlink-based configuration channels, make sure only
      users with the CAP_NET_ADMIN capability set can execute this command.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5ab9b1c
    • Ido Schimmel's avatar
      drop_monitor: Reset per-CPU data before starting to trace · 44075f56
      Ido Schimmel authored
      The function reset_per_cpu_data() allocates and prepares a new skb for
      the summary netlink alert message ('NET_DM_CMD_ALERT'). The new skb is
      stored in the per-CPU 'data' variable and the old is returned.
      
      The function is invoked during module initialization and from the
      workqueue, before an alert is sent. This means that it is possible to
      receive an alert with stale data, if we stopped tracing when the
      hysteresis timer ('data->send_timer') was pending.
      
      Instead of invoking the function during module initialization, invoke it
      just before we start tracing and ensure we get a fresh skb.
      
      This also allows us to remove the calls to initialize the timer and the
      work item from the module initialization path, since both could have
      been triggered by the error paths of reset_per_cpu_data().
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44075f56
    • Ido Schimmel's avatar
      drop_monitor: Initialize timer and work item upon tracing enable · 70c69274
      Ido Schimmel authored
      The timer and work item are currently initialized once during module
      init, but subsequent patches will need to associate different functions
      with the work item, based on the configured alert mode.
      
      Allow subsequent patches to make that change by initializing and
      de-initializing these objects during tracing enable and disable.
      
      This also guarantees that once the request to disable tracing returns,
      no more netlink notifications will be generated.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      70c69274
    • Ido Schimmel's avatar
      drop_monitor: Split tracing enable / disable to different functions · 7c747838
      Ido Schimmel authored
      Subsequent patches will need to enable / disable tracing based on the
      configured alerting mode.
      
      Reduce the nesting level and prepare for the introduction of this
      functionality by splitting the tracing enable / disable operations into
      two different functions.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c747838
  3. 10 Aug, 2019 13 commits