Commit a7635872 authored by Kirill Smelkov

Merge branch 'master' into x/pinglat

* master: (609 commits)
  docs: references_guide.md: add/fix search examples/tools links (#2186)
  Fix misc file permissions (#2185)
  sync with latest bpf (#2184)
  sync with latest libbpf repo (#2183)
  docs: fix broken link of bpf_log2l(#2176)
  examples/tracing: some minor fixes
  Fix tools/syscount -l (#2180)
  examples/tracing/bitehist.py: add example of linear histogram (#2177)
  cachestat: bring back HITRATIO column
  Fix debuginfo search on Ubuntu
  Add installation instructions for Amazon Linux 1 AMI Sign-Off-By Travis Davies <trdavies@amazon.com>
  [iovisor/bcc] trace: Incorrect symbol offsets when using build_id (#2161) (#2162)
  profile: exclude CPU idle stacks by default (#2166)
  fix cpuunclaimed.py with cfs_rq structure change (#2164)
  tools: rename "deadlock_detector" to "deadlock" (#2152) (#2160)
  use libbpf api in bpf_attach_xdp (#2158)
  support symbol resolution of short-lived process.  (#2144)
  profile.py: return kernel annotations for folded stacks
  use libbpf APIs from libbpf.c (#2156)
  ddos_detector.py to monitor DDOS attacks (#2140)
  ...
parents 9fd77679 518bd445
......@@ -3,6 +3,7 @@
*.swo
*.pyc
.idea
*~
# Build artifacts
/build/
......
[submodule "src/cc/libbpf"]
path = src/cc/libbpf
url = https://github.com/libbpf/libbpf.git
......@@ -3,4 +3,6 @@ install:
- sudo apt-get install -y python-pip
- sudo pip install pep8
script:
- find tools/ -type f -name "*.py" | xargs pep8 -r --show-source --ignore=E123,E125,E126,E127,E128,E302
- set -euo pipefail
- ./scripts/check-helpers.sh
- ./scripts/py-style-check.sh
......@@ -9,6 +9,12 @@ endif()
enable_testing()
# populate submodules (libbpf)
if(NOT EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/src/cc/libbpf/src)
execute_process(COMMAND git submodule update --init --recursive
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
endif()
include(cmake/GetGitRevisionDescription.cmake)
include(cmake/version.cmake)
include(CMakeDependentOption)
......@@ -16,6 +22,9 @@ include(GNUInstallDirs)
include(CheckCXXCompilerFlag)
include(cmake/FindCompilerFlag.cmake)
option(ENABLE_LLVM_NATIVECODEGEN "Enable use of llvm nativecodegen module (needed by rw-engine)" ON)
option(ENABLE_RTTI "Enable compiling with real time type information" OFF)
option(ENABLE_LLVM_SHARED "Enable linking LLVM as a shared library" OFF)
option(ENABLE_CLANG_JIT "Enable Loading BPF through Clang Frontend" ON)
option(ENABLE_USDT "Enable User-level Statically Defined Tracing" ON)
CMAKE_DEPENDENT_OPTION(ENABLE_CPP_API "Enable C++ API" ON "ENABLE_USDT" OFF)
......@@ -26,7 +35,7 @@ if(NOT PYTHON_ONLY AND ENABLE_CLANG_JIT)
find_package(BISON)
find_package(FLEX)
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM: ${LLVM_INCLUDE_DIRS}")
message(STATUS "Found LLVM: ${LLVM_INCLUDE_DIRS} ${LLVM_PACKAGE_VERSION}")
find_package(LibElf REQUIRED)
# clang is linked as a library, but the library path searching is
......@@ -44,6 +53,7 @@ find_library(libclangParse NAMES clangParse HINTS ${CLANG_SEARCH})
find_library(libclangRewrite NAMES clangRewrite HINTS ${CLANG_SEARCH})
find_library(libclangSema NAMES clangSema HINTS ${CLANG_SEARCH})
find_library(libclangSerialization NAMES clangSerialization HINTS ${CLANG_SEARCH})
find_library(libclangASTMatchers NAMES clangASTMatchers HINTS ${CLANG_SEARCH})
if(libclangBasic STREQUAL "libclangBasic-NOTFOUND")
message(FATAL_ERROR "Unable to find clang libraries")
endif()
......@@ -83,6 +93,7 @@ set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall ${CXX_ISYSTEM_DIRS}")
add_subdirectory(src)
add_subdirectory(introspection)
if(ENABLE_CLANG_JIT)
add_subdirectory(examples)
add_subdirectory(man)
......
......@@ -9,7 +9,7 @@ _(Written by Brendan Gregg.)_
bcc has 2 types of scripts, in different directories:
- **/examples**: intended as short examples of bcc & eBPF code. You should focus on keeping it short, neat, and documented (code comments). A submission can just be the example code.
- **/tools**: intended as production safe performance and troubleshooting tools. You should focus on it being useful, tested, low overhead, documented (incl. all caveats), and easy to use. A submission should involve 4 changes: the tool, a man page, an example file, and an addition to README.md. Follow [my lead](https://github.com/brendangregg/bcc/commit/9fa156273b395cfc5505f0fff5d6b7b1396f7daa), and see the checklist below. These will be run in mission critical environments as root, so if spending hours testing isn't for you, please submit your idea as an issue instead, or chat with us on irc.
- **/tools**: intended as production safe performance and troubleshooting tools. You should focus on it being useful, tested, low overhead, documented (incl. all caveats), and easy to use. A submission should involve 4 changes: the tool, a man page, an example file, and an addition to README.md. Follow [my lead](https://github.com/brendangregg/bcc/commit/9fa156273b395cfc5505f0fff5d6b7b1396f7daa), and see the checklist below. These are run in mission critical environments as root (tech companies, financial institutions, government agencies), so if spending hours testing isn't for you, please submit your idea as an issue instead, or chat with us on irc.
More detail for each below.
......@@ -31,7 +31,9 @@ A checklist for bcc tool development:
1. **Measure the overhead of the tool**. If you are running a micro-benchmark, how much slower is it with the tool running? Is more CPU consumed? Try to determine the worst case: run the micro-benchmark so that CPU headroom is exhausted, and then run the bcc tool. Can overhead be lowered?
1. **Test again, and stress test**. You want to discover and fix all the bad things before others hit them.
1. **Consider command line options**. Should it have -p for filtering on a PID? -T for timestamps? -i for interval? See other tools for examples, and copy the style: the usage message should list example usage at the end. Remember to keep the tool doing one thing and doing it well. Also, if there's one option that seems to be the common case, perhaps it should just be the first argument and not need a switch (no -X). A special case of this is *stat tools, like iostat/vmstat/etc, where the convention is [interval [count]].
1. **Concise, intuitive, self-explanatory output**. The default output should meet the common need concisely. Leave much less useful fields and data to be shown with options: -v for verbose, etc. Consider including a startup message that's self-explanatory, eg "Tracing block I/O. Output every 1 seconds. Ctrl-C to end.". Also, try hard to keep the output less than 80 characters wide, especially the default output of the tool. That way, the output not only fits on the smallest reasonable terminal, it also fits well in slide decks, blog posts, articles, and printed material, all of which help education and adoption. Publishers of technical books often have templates they require books to conform to: it may not be an option to shrink or narrow the font to fit your output.
1. **Concise, intuitive, self-explanatory output**. The default output should meet the common need concisely. Leave much less useful fields and data to be shown with options: -v for verbose, etc. Consider including a startup message that's self-explanatory, eg "Tracing block I/O. Output every 1 seconds. Ctrl-C to end.".
1. **Default output <80 chars wide**. Try hard to keep the output less than 80 characters wide, especially the default output of the tool. That way, the output not only fits on the smallest reasonable terminal, it also fits well in slide decks, blog posts, articles, and printed material, all of which help education and adoption. Publishers of technical books often have templates they require books to conform to: it may not be an option to shrink or narrow the font to fit your output.
1. **Short tool name**. Follow the style of the other tools, which follow the style of other /usr/bin utilities. They are short and easy to type. No underscores.
1. **Use pep8 to check Python style**: `pep8 --show-source --ignore=E123,E125,E126,E127,E128,E302 filename`. Note that it misses some things, like consistent usage, so you'll still need to double check your script.
1. **Make sure your script is Python3-ready**: Adding `from __future__ import absolute_import, division, print_function, unicode_literals` helps make your script Python3-ready.
1. **Write an _example.txt file**. Copy the style in tools/biolatency_example.txt: start with an intro sentence, then have examples, and finish with the USAGE message. Explain everything: the first example should explain what we are seeing, even if this seems obvious. For some people it won't be obvious. Also explain why we are running the tool: what problems it's solving. It can take a long time (hours) to come up with good examples, but it's worth it. These will get copied around (eg, presentations, articles).
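The command-line and output conventions in this checklist can be sketched with Python's `argparse`. Those conventions are `-p` for PID filtering, `-T` for timestamps, example usage at the end of the help text, the `[interval [count]]` positional style, a self-explanatory startup message, and output well under 80 characters. This is a minimal, hypothetical skeleton, not code from an existing bcc tool. The tool name `exampletool`, its options, and the column layout are illustrative only, and the skeleton does no tracing.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import argparse

# Hypothetical tool name and options, for illustration only.
# Example usage is listed at the end of the help text (argparse "epilog").
examples = """examples:
    ./exampletool            # trace until Ctrl-C
    ./exampletool 1 10       # print 1 second summaries, 10 times
    ./exampletool -T 1       # 1 second summaries, with timestamps
    ./exampletool -p 185     # only trace PID 185
"""


def build_parser():
    parser = argparse.ArgumentParser(
        description="Summarize events (hypothetical example tool)",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=examples)
    parser.add_argument("-T", "--timestamp", action="store_true",
                        help="include timestamp on output")
    parser.add_argument("-p", "--pid", type=int,
                        help="trace this PID only")
    # *stat-style positional arguments: [interval [count]]
    parser.add_argument("interval", nargs="?", type=int,
                        help="output interval, in seconds")
    parser.add_argument("count", nargs="?", type=int,
                        help="number of outputs")
    return parser


def startup_message(args):
    # A short, self-explanatory message printed before tracing starts.
    if args.interval:
        return ("Tracing... Output every %d second(s). Hit Ctrl-C to end." %
                args.interval)
    return "Tracing... Hit Ctrl-C to end."


# Fixed-width columns keep the default output well under 80 characters.
HEADER = "%-16s %-7s %8s" % ("COMM", "PID", "COUNT")


def main():
    args = build_parser().parse_args()
    print(startup_message(args))
    print(HEADER)


if __name__ == "__main__":
    main()
```

Running `./exampletool -h` prints the option help followed by the examples block. Keeping the examples in the epilog means users see them last, right where they finish reading the options.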
......
Copyright 2015 PLUMgrid
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
FROM ubuntu:xenial
FROM ubuntu:bionic
MAINTAINER Brenden Blanco <bblanco@gmail.com>
......
......@@ -38,3 +38,10 @@ A: You need to obtain a recent version of the Linux source code
Q: hello_world.py fails with:
ImportError: No module named past.builtins
A: sudo pip install future
Q: Running one of the bcc tools produces an import error:
Traceback (most recent call last):
File "./execsnoop", line 20, in <module>
from bcc import BPF
ImportError: No module named bcc
A: Make sure the python bcc bindings package (python2-bcc) is installed.
......@@ -2,16 +2,19 @@
* [Kernel Configuration](#kernel-configuration)
* [Packages](#packages)
- [Ubuntu](#ubuntu-xenial---binary)
- [Ubuntu](#ubuntu---binary)
- [Fedora](#fedora---binary)
- [Arch](#arch---aur)
- [Gentoo](#gentoo---portage)
- [openSUSE](#opensuse---binary)
- [RHEL](#rhel---binary)
- [Amazon Linux 1](#Amazon-Linux-1---Binary)
* [Source](#source)
- [Debian](#debian---source)
- [Ubuntu](#ubuntu---source)
- [Fedora](#fedora---source)
- [openSUSE](#opensuse---source)
- [Amazon Linux](#amazon-linux---source)
* [Older Instructions](#older-instructions)
## Kernel Configuration
......@@ -48,70 +51,39 @@ Kernel compile flags can usually be checked by looking at `/proc/config.gz` or
# Packages
## Ubuntu Xenial - Binary
## Ubuntu - Binary
Only the nightly packages are built for Ubuntu 16.04, but the steps are very straightforward. No need to upgrade the kernel or compile from source!
The stable and the nightly packages are built for Ubuntu Xenial (16.04), Ubuntu Artful (17.10) and Ubuntu Bionic (18.04). The steps are very straightforward: no need to upgrade the kernel or compile from source!
```bash
echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list
sudo apt-get update
sudo apt-get install bcc-tools libbcc-examples
```
## Ubuntu Trusty - Binary
**Kernel**
Install a 4.3+ kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline,
for example:
```bash
VER=4.5.1-040501
PREFIX=http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5.1-wily/
REL=201604121331
wget ${PREFIX}/linux-headers-${VER}-generic_${VER}.${REL}_amd64.deb
wget ${PREFIX}/linux-headers-${VER}_${VER}.${REL}_all.deb
wget ${PREFIX}/linux-image-${VER}-generic_${VER}.${REL}_amd64.deb
sudo dpkg -i linux-*${VER}.${REL}*.deb
# reboot
```
Update PREFIX to the latest date, and you can browse the files in the PREFIX url to find the REL number.
**Stable and Signed Packages**
**Signed Packages**
Tagged and signed bcc binary packages are built for Ubuntu Trusty (14.04) and
hosted at https://repo.iovisor.org/apt/.
To install:
```bash
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys D4284CDD
echo "deb https://repo.iovisor.org/apt trusty main" | sudo tee /etc/apt/sources.list.d/iovisor.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 4052245BD4284CDD
echo "deb https://repo.iovisor.org/apt/$(lsb_release -cs) $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/iovisor.list
sudo apt-get update
sudo apt-get install binutils bcc bcc-tools libbcc-examples python-bcc
sudo apt-get install bcc-tools libbcc-examples linux-headers-$(uname -r)
```
(replace `xenial` with `artful` or `bionic` as appropriate). Tools will be installed under /usr/share/bcc/tools.
**Nightly Packages**
```bash
echo "deb [trusted=yes] https://repo.iovisor.org/apt/trusty trusty-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list
echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list
sudo apt-get update
sudo apt-get install bcc-tools libbcc-examples
sudo apt-get install bcc-tools libbcc-examples linux-headers-$(uname -r)
```
(replace `xenial` with `artful` or `bionic` as appropriate)
Test it:
```
sudo python /usr/share/bcc/examples/hello_world.py
sudo python /usr/share/bcc/examples/tracing/task_switch.py
```
**Ubuntu Packages**
The previous commands will install the latest bcc from the iovisor repositories. It is also available from the standard Ubuntu multiverse repository, under the package name `bpfcc-tools`.
(Optional) Install pyroute2 for additional networking features
```bash
git clone https://github.com/svinota/pyroute2
cd pyroute2; sudo make install
sudo python /usr/share/bcc/examples/networking/simple_tc.py
sudo apt-get install bpfcc-tools linux-headers-$(uname -r)
```
The tools are installed in /sbin with a -bpfcc extension. Try running `sudo opensnoop-bpfcc`.
## Fedora - Binary
Ensure that you are running a 4.2+ kernel with `uname -r`. If not, install a 4.2+ kernel from
......@@ -123,12 +95,24 @@ sudo dnf update
# reboot
```
Nightly bcc binary packages for Fedora 23, 24, and 25 are hosted at
`https://repo.iovisor.org/yum/nightly/f{23,24,25}`.
**Nightly Packages**
Nightly bcc binary packages for Fedora 25, 26, 27, and 28 are hosted at
`https://repo.iovisor.org/yum/nightly/f{25,26,27}`.
To install:
```bash
echo -e '[iovisor]\nbaseurl=https://repo.iovisor.org/yum/nightly/f25/$basearch\nenabled=1\ngpgcheck=0' | sudo tee /etc/yum.repos.d/iovisor.repo
echo -e '[iovisor]\nbaseurl=https://repo.iovisor.org/yum/nightly/f27/$basearch\nenabled=1\ngpgcheck=0' | sudo tee /etc/yum.repos.d/iovisor.repo
sudo dnf install bcc-tools kernel-headers kernel-devel
```
**Stable and Signed Packages**
Stable bcc binary packages for Fedora 25, 26, 27, and 28 are hosted at
`https://repo.iovisor.org/yum/main/f{25,26,27}`.
```bash
echo -e '[iovisor]\nbaseurl=https://repo.iovisor.org/yum/main/f27/$basearch\nenabled=1' | sudo tee /etc/yum.repos.d/iovisor.repo
sudo dnf install bcc-tools kernel-devel-$(uname -r) kernel-headers-$(uname -r)
```
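The `echo -e ... | sudo tee` one-liners above each write a small repository file, `/etc/yum.repos.d/iovisor.repo`. Here is a hedged sketch that generates the same text in Python for either channel. The helper name `iovisor_repo` is hypothetical, and the helper only builds the file contents; it does not write to `/etc/yum.repos.d/`. It mirrors the two commands shown above: the nightly file sets `gpgcheck=0`, and the stable (`main`) file omits that line.

```python
def iovisor_repo(fedora_release, channel="main"):
    """Return the text of an iovisor .repo file for one Fedora release.

    channel is "nightly" or "main" (stable), matching the two URL layouts
    used by the echo commands above. Only the nightly variant disables
    gpgcheck, as in the original nightly command.
    """
    if channel not in ("nightly", "main"):
        raise ValueError("channel must be 'nightly' or 'main'")
    lines = [
        "[iovisor]",
        # $basearch is kept literal; dnf expands it (for example to x86_64).
        "baseurl=https://repo.iovisor.org/yum/%s/f%d/$basearch"
        % (channel, fedora_release),
        "enabled=1",
    ]
    if channel == "nightly":
        lines.append("gpgcheck=0")
    # echo adds a trailing newline, so the file ends with one too.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(iovisor_repo(27, "nightly"), end="")
```

The output can be piped into `sudo tee /etc/yum.repos.d/iovisor.repo`, just like the `echo -e` version.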
......@@ -173,6 +157,30 @@ sudo zypper ref
sudo zypper in bcc-tools bcc-examples
```
## RHEL - Binary
For RHEL 7.6, bcc is already included in the official yum repository as bcc-tools. As part of the install, the following dependencies are installed: bcc.x86_64 0:0.6.1-2.el7, llvm-private.x86_64 0:6.0.1-2.el7, python-bcc.x86_64 0:0.6.1-2.el7, python-netaddr.noarch 0:0.7.5-9.el7
```
yum install bcc-tools
```
## Amazon Linux 1 - Binary
Use case 1. Install BCC for the latest kernel available in the repo:
Tested on Amazon Linux AMI release 2018.03 (kernel 4.14.88-72.73.amzn1.x86_64)
```
sudo yum update kernel
sudo yum install bcc
sudo reboot
```
Use case 2. Install BCC for your AMI's default kernel (no reboot required):
Tested on Amazon Linux AMI release 2018.03 (kernel 4.14.77-70.59.amzn1.x86_64)
```
sudo yum install kernel-headers-$(uname -r | cut -d'.' -f1-5)
sudo yum install kernel-devel-$(uname -r | cut -d'.' -f1-5)
sudo yum install bcc
```
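Use case 2 runs `uname -r` through `cut -d'.' -f1-5`, which strips the trailing architecture before the result is appended to the `kernel-headers-` and `kernel-devel-` package names. The Python sketch below reproduces that string transformation. The function names are hypothetical, and the code assumes the Amazon Linux 1 release layout quoted above (five dot-separated fields before the architecture).

```python
def kernel_pkg_version(release):
    """Mimic `uname -r | cut -d'.' -f1-5`.

    Keeps the first five dot-separated fields of the kernel release
    string. Like cut, it returns every field when there are fewer than five.
    """
    return ".".join(release.split(".")[:5])


def kernel_packages(release):
    # Package names passed to `yum install` in use case 2.
    version = kernel_pkg_version(release)
    return ["kernel-headers-" + version, "kernel-devel-" + version]


if __name__ == "__main__":
    print(kernel_packages("4.14.77-70.59.amzn1.x86_64"))
```

For the tested AMI kernel `4.14.77-70.59.amzn1.x86_64`, this yields `kernel-headers-4.14.77-70.59.amzn1` and `kernel-devel-4.14.77-70.59.amzn1`.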
# Source
......@@ -222,7 +230,7 @@ apt-get -t jessie-backports install linux-base linux-image-4.9.0-0.bpo.2-amd64 l
apt-get install debhelper cmake libllvm3.8 llvm-3.8-dev libclang-3.8-dev \
libelf-dev bison flex libedit-dev clang-format-3.8 python python-netaddr \
python-pyroute2 luajit libluajit-5.1-dev arping iperf netperf ethtool \
devscripts zlib1g-dev
devscripts zlib1g-dev libfl-dev
```
#### Sudo
......@@ -265,7 +273,7 @@ sudo dpkg -i *bcc*.deb
To build the toolchain from source, one needs:
* LLVM 3.7.1 or newer, compiled with BPF support (default=on)
* Clang, built from the same tree as LLVM
* cmake, gcc (>=4.7), flex, bison
* cmake (>=3.1), gcc (>=4.7), flex, bison
* LuaJIT, if you want Lua support
### Install build dependencies
......@@ -278,7 +286,11 @@ deb-src http://llvm.org/apt/$VER/ llvm-toolchain-$VER-3.7 main" | \
wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-get update
# All versions
# For bionic
sudo apt-get -y install bison build-essential cmake flex git libedit-dev \
libllvm6.0 llvm-6.0-dev libclang-6.0-dev python zlib1g-dev libelf-dev
# For other versions
sudo apt-get -y install bison build-essential cmake flex git libedit-dev \
libllvm3.7 llvm-3.7-dev libclang-3.7-dev python zlib1g-dev libelf-dev
......@@ -305,7 +317,7 @@ sudo dnf install -y bison cmake ethtool flex git iperf libstdc++-static \
elfutils-libelf-devel
sudo dnf install -y luajit luajit-devel # for Lua support
sudo dnf install -y \
http://pkgs.repoforge.org/netperf/netperf-2.6.0-1.el6.rf.x86_64.rpm
http://repo.iovisor.org/yum/extra/mageia/cauldron/x86_64/netperf-2.7.0-1.mga6.x86_64.rpm
sudo pip install pyroute2
```
......@@ -339,7 +351,7 @@ sudo make install
```
sudo zypper in bison cmake flex gcc gcc-c++ git libelf-devel libstdc++-devel \
llvm-devel pkg-config python-devel python-setuptools python3-devel \
llvm-devel clang-devel pkg-config python-devel python-setuptools python3-devel \
python3-setuptools
sudo zypper in luajit-devel # for lua support in openSUSE Leap 42.2 or later
sudo zypper in lua51-luajit-devel # for lua support in openSUSE Tumbleweed
......@@ -361,6 +373,51 @@ sudo make install
popd
```
## Amazon Linux - Source
Tested on Amazon Linux AMI release 2018.03 (kernel 4.14.47-56.37.amzn1.x86_64)
### Install packages required for building
```
# enable epel to get iperf, luajit, luajit-devel, cmake3 (cmake3 is required to support c++11)
sudo yum-config-manager --enable epel
sudo yum install -y bison cmake3 ethtool flex git iperf libstdc++-static python-netaddr gcc gcc-c++ make zlib-devel elfutils-libelf-devel
sudo yum install -y luajit luajit-devel
sudo yum install -y http://repo.iovisor.org/yum/extra/mageia/cauldron/x86_64/netperf-2.7.0-1.mga6.x86_64.rpm
sudo pip install pyroute2
sudo yum install -y ncurses-devel
```
### Install clang 3.7.1 pre-built binaries
```
wget http://releases.llvm.org/3.7.1/clang+llvm-3.7.1-x86_64-fedora22.tar.xz
tar xf clang*
(cd clang* && sudo cp -R * /usr/local/)
```
### Build bcc
```
git clone https://github.com/iovisor/bcc.git
pushd .
mkdir bcc/build; cd bcc/build
cmake3 .. -DCMAKE_INSTALL_PREFIX=/usr
time make
sudo make install
popd
```
### Setup required to run the tools
```
sudo yum -y install kernel-devel-$(uname -r)
sudo mount -t debugfs debugfs /sys/kernel/debug
```
### Test
```
sudo /usr/share/bcc/tools/execsnoop
```
# Older Instructions
## Build LLVM and Clang development libs
......
- 2018-05-03: [Linux System Monitoring with eBPF](https://www.circonus.com/2018/05/linux-system-monitoring-with-ebpf)
- 2018-02-22: [Some advanced BCC topics](https://lwn.net/Articles/747640)
- 2018-01-23: [BPFd: Running BCC tools remotely across systems and architectures](https://lwn.net/Articles/744522)
- 2017-12-22: [An introduction to the BPF Compiler Collection](https://lwn.net/Articles/742082)
- 2017-09-13: [Performance Analysis Superpowers with Linux BPF](https://www.slideshare.net/brendangregg/ossna-2017-performance-analysis-superpowers-with-linux-bpf)
- 2017-07-28: [Tracing a packet journey using Linux tracepoints, perf and eBPF](https://blog.yadutaf.fr/2017/07/28/tracing-a-packet-journey-using-linux-tracepoints-perf-ebpf/)
- 2017-07-13: [Performance Superpowers with Enhanced BPF](https://www.usenix.org/conference/atc17/program/presentation/gregg-superpowers)
......
......@@ -82,6 +82,7 @@ pair of .c and .py files, and some are directories of files.
#### Tools:
<center><a href="images/bcc_tracing_tools_2017.png"><img src="images/bcc_tracing_tools_2017.png" border=0 width=700></a></center>
- tools/[argdist](tools/argdist.py): Display function parameter values as a histogram or frequency count. [Examples](tools/argdist_example.txt).
- tools/[bashreadline](tools/bashreadline.py): Print entered bash commands system wide. [Examples](tools/bashreadline_example.txt).
- tools/[biolatency](tools/biolatency.py): Summarize block device I/O latency as a histogram. [Examples](tools/biolatency_example.txt).
......@@ -96,11 +97,12 @@ pair of .c and .py files, and some are directories of files.
- tools/[cachetop](tools/cachetop.py): Trace page cache hit/miss ratio by processes. [Examples](tools/cachetop_example.txt).
- tools/[cpudist](tools/cpudist.py): Summarize on- and off-CPU time per task as a histogram. [Examples](tools/cpudist_example.txt)
- tools/[cpuunclaimed](tools/cpuunclaimed.py): Sample CPU run queues and calculate unclaimed idle CPU. [Examples](tools/cpuunclaimed_example.txt)
- tools/[criticalstat](tools/criticalstat.py): Trace and report long atomic critical sections in the kernel. [Examples](tools/criticalstat_example.txt)
- tools/[dbslower](tools/dbslower.py): Trace MySQL/PostgreSQL queries slower than a threshold. [Examples](tools/dbslower_example.txt).
- tools/[dbstat](tools/dbstat.py): Summarize MySQL/PostgreSQL query latency as a histogram. [Examples](tools/dbstat_example.txt).
- tools/[dcsnoop](tools/dcsnoop.py): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
- tools/[dcstat](tools/dcstat.py): Directory entry cache (dcache) stats. [Examples](tools/dcstat_example.txt).
- tools/[deadlock_detector](tools/deadlock_detector.py): Detect potential deadlocks on a running process. [Examples](tools/deadlock_detector_example.txt).
- tools/[deadlock](tools/deadlock.py): Detect potential deadlocks on a running process. [Examples](tools/deadlock_example.txt).
- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[ext4dist](tools/ext4dist.py): Summarize ext4 operation latency distribution as a histogram. [Examples](tools/ext4dist_example.txt).
- tools/[ext4slower](tools/ext4slower.py): Trace slow ext4 operations. [Examples](tools/ext4slower_example.txt).
......@@ -112,6 +114,7 @@ pair of .c and .py files, and some are directories of files.
- tools/[funcslower](tools/funcslower.py): Trace slow kernel or user function calls. [Examples](tools/funcslower_example.txt).
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[inject](tools/inject.py): Targeted error injection with call chain and predicates. [Examples](tools/inject_example.txt).
- tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[llcstat](tools/llcstat.py): Summarize CPU cache references and misses by process. [Examples](tools/llcstat_example.txt).
- tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush_example.txt).
......@@ -128,6 +131,9 @@ pair of .c and .py files, and some are directories of files.
- tools/[reset-trace](tools/reset-trace.sh): Reset the state of tracing. Maintenance tool only. [Examples](tools/reset-trace_example.txt).
- tools/[runqlat](tools/runqlat.py): Run queue (scheduler) latency as a histogram. [Examples](tools/runqlat_example.txt).
- tools/[runqlen](tools/runqlen.py): Run queue length as a histogram. [Examples](tools/runqlen_example.txt).
- tools/[runqslower](tools/runqslower.py): Trace long process scheduling delays. [Examples](tools/runqslower_example.txt).
- tools/[shmsnoop](tools/shmsnoop.py): Trace System V shared memory syscalls. [Examples](tools/shmsnoop_example.txt).
- tools/[sofdsnoop](tools/sofdsnoop.py): Trace FDs passed through unix sockets. [Examples](tools/sofdsnoop_example.txt).
- tools/[slabratetop](tools/slabratetop.py): Kernel SLAB/SLUB memory cache allocation rate top. [Examples](tools/slabratetop_example.txt).
- tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
- tools/[solisten](tools/solisten.py): Trace TCP socket listen. [Examples](tools/solisten_example.txt).
......@@ -138,8 +144,11 @@ pair of .c and .py files, and some are directories of files.
- tools/[tcpaccept](tools/tcpaccept.py): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt).
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
- tools/[tcpconnlat](tools/tcpconnlat.py): Trace TCP active connection latency (connect()). [Examples](tools/tcpconnlat_example.txt).
- tools/[tcpdrop](tools/tcpdrop.py): Trace kernel-based TCP packet drops with details. [Examples](tools/tcpdrop_example.txt).
- tools/[tcplife](tools/tcplife.py): Trace TCP sessions and summarize lifespan. [Examples](tools/tcplife_example.txt).
- tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
- tools/[tcpstates](tools/tcpstates.py): Trace TCP session state changes with durations. [Examples](tools/tcpstates_example.txt).
- tools/[tcpsubnet](tools/tcpsubnet.py): Summarize and aggregate TCP send by subnet. [Examples](tools/tcpsubnet_example.txt).
- tools/[tcptop](tools/tcptop.py): Summarize TCP send/recv throughput by host. Top for TCP. [Examples](tools/tcptop_example.txt).
- tools/[tcptracer](tools/tcptracer.py): Trace TCP established connections (connect(), accept(), close()). [Examples](tools/tcptracer_example.txt).
- tools/[tplist](tools/tplist.py): Display kernel tracepoints or USDT probes and their formats. [Examples](tools/tplist_example.txt).
......@@ -171,6 +180,12 @@ Examples:
- examples/networking/[tunnel_monitor/](examples/networking/tunnel_monitor): Efficiently monitor traffic flows. [Example video](https://www.youtube.com/watch?v=yYy3Cwce02k).
- examples/networking/vlan_learning/[vlan_learning.py](examples/networking/vlan_learning/vlan_learning.py) examples/[vlan_learning.c](examples/networking/vlan_learning/vlan_learning.c): Demux Ethernet traffic into worker veth+namespaces.
### BPF Introspection:
Tools that help to introspect BPF programs.
- introspection/[bps.c](introspection/bps.c): List all BPF programs loaded into the kernel. 'ps' for BPF programs. [Examples](introspection/bps_example.txt).
## Motivation
BPF guarantees that the programs loaded into the kernel cannot crash, and
......
......@@ -96,5 +96,6 @@ Command line tools for BPF Compiler Collection (BCC)
%exclude /usr/share/bcc/examples/*/*/*.pyo
%files -n bcc-tools
/usr/share/bcc/introspection/*
/usr/share/bcc/tools/*
/usr/share/bcc/man/*
......@@ -5,6 +5,31 @@
%else
%{!?with_lua: %global with_lua 1}
%endif
# use --with shared to only link against libLLVM.so
%if 0%{?fedora} >= 28 || 0%{?rhel} > 7
%bcond_without llvm_shared
%else
%bcond_with llvm_shared
%endif
# Build python3 support for distributions that have it
%if 0%{?fedora} >= 28 || 0%{?rhel} > 7
%bcond_without python3
%else
%bcond_with python3
%endif
%if %{with python3}
%global __python %{__python3}
%global python_bcc python3-bcc
%global python_cmds python2;python3
%else
%global __python %{__python2}
%global python_bcc python2-bcc
%global python_cmds python2
%endif
%define debug_package %{nil}
Name: bcc
......@@ -20,12 +45,18 @@ Source0: bcc.tar.gz
ExclusiveArch: x86_64 ppc64 aarch64 ppc64le
BuildRequires: bison cmake >= 2.8.7 flex make
BuildRequires: gcc gcc-c++ python2-devel elfutils-libelf-devel-static
%if %{with python3}
BuildRequires: python3-devel
%endif
%if %{with_lua}
BuildRequires: luajit luajit-devel
%endif
%if %{without local_clang_static}
BuildRequires: llvm-devel llvm-static
BuildRequires: llvm-devel
BuildRequires: clang-devel
%if %{without llvm_shared}
BuildRequires: llvm-static
%endif
%endif
BuildRequires: pkgconfig ncurses-devel
......@@ -48,13 +79,19 @@ mkdir build
pushd build
cmake .. -DREVISION_LAST=%{version} -DREVISION=%{version} \
-DCMAKE_INSTALL_PREFIX=/usr \
%{?lua_config}
%{?lua_config} \
-DPYTHON_CMD="%{python_cmds}" \
%{?with_llvm_shared:-DENABLE_LLVM_SHARED=1}
make %{?_smp_mflags}
popd
%install
pushd build
make install/strip DESTDIR=%{buildroot}
# mangle shebangs
find %{buildroot}/usr/share/bcc/{tools,examples} -type f -exec \
sed -i -e '1 s|^#!/usr/bin/python$|#!'%{__python}'|' \
-e '1 s|^#!/usr/bin/env python$|#!'%{__python}'|' {} \;
%package -n libbcc
Summary: Shared Library for BPF Compiler Collection (BCC)
......@@ -62,12 +99,22 @@ Requires: elfutils-libelf
%description -n libbcc
Shared Library for BPF Compiler Collection (BCC)
%package -n python-bcc
Summary: Python bindings for BPF Compiler Collection (BCC)
%package -n python2-bcc
Summary: Python2 bindings for BPF Compiler Collection (BCC)
Requires: libbcc = %{version}-%{release}
%description -n python-bcc
%{?python_provide:%python_provide python2-bcc}
%description -n python2-bcc
Python bindings for BPF Compiler Collection (BCC)
%if %{with python3}
%package -n python3-bcc
Summary: Python3 bindings for BPF Compiler Collection (BCC)
Requires: libbcc = %{version}-%{release}
%{?python_provide:%python_provide python3-bcc}
%description -n python3-bcc
Python bindings for BPF Compiler Collection (BCC)
%endif
%if %{with_lua}
%package -n bcc-lua
Summary: Standalone tool to run BCC tracers written in Lua
......@@ -78,7 +125,7 @@ Standalone tool to run BCC tracers written in Lua
%package -n libbcc-examples
Summary: Examples for BPF Compiler Collection (BCC)
Requires: python-bcc = %{version}-%{release}
Requires: %{python_bcc} = %{version}-%{release}
%if %{with_lua}
Requires: bcc-lua = %{version}-%{release}
%endif
......@@ -87,7 +134,7 @@ Examples for BPF Compiler Collection (BCC)
%package -n bcc-tools
Summary: Command line tools for BPF Compiler Collection (BCC)
Requires: python-bcc = %{version}-%{release}
Requires: %{python_bcc} = %{version}-%{release}
%description -n bcc-tools
Command line tools for BPF Compiler Collection (BCC)
......@@ -95,8 +142,13 @@ Command line tools for BPF Compiler Collection (BCC)
/usr/lib64/*
/usr/include/bcc/*
%files -n python-bcc
%{python_sitelib}/bcc*
%files -n python2-bcc
%{python2_sitelib}/bcc*
%if %{with python3}
%files -n python3-bcc
%{python3_sitelib}/bcc*
%endif
%if %{with_lua}
%files -n bcc-lua
......@@ -113,6 +165,7 @@ Command line tools for BPF Compiler Collection (BCC)
%exclude /usr/share/bcc/examples/*/*/*.pyo
%files -n bcc-tools
/usr/share/bcc/introspection/*
/usr/share/bcc/tools/*
/usr/share/bcc/man/*
......@@ -121,6 +174,11 @@ Command line tools for BPF Compiler Collection (BCC)
%postun -n libbcc -p /sbin/ldconfig
%changelog
* Wed Jul 18 2018 Brenden Blanco <bblanco@gmail.com> - 0.6.0-1
- Make python3 the default when possible
- Add with llvm_shared conditional
- Add python2/python3 package targets
* Mon Nov 21 2016 William Cohen <wcohen@redhat.com> - 0.2.0-1
- Revise bcc.spec to address rpmlint issues and build properly in Fedora koji.
......
......@@ -15,3 +15,16 @@ else()
endif()
set(CMAKE_REQUIRED_FLAGS "${_backup_c_flags}")
endif()
# check whether reallocarray is available
# this is used to satisfy reallocarray usage under src/cc/libbpf/
CHECK_CXX_SOURCE_COMPILES(
"
#define _GNU_SOURCE
#include <stdlib.h>
int main(void)
{
return !!reallocarray(NULL, 1, 1);
}
" HAVE_REALLOCARRAY_SUPPORT)
set(llvm_raw_libs bitwriter bpfcodegen irreader linker
mcjit objcarcopts option passes nativecodegen lto)
if(ENABLE_LLVM_SHARED)
set(llvm_libs "LLVM")
else()
set(llvm_raw_libs bitwriter bpfcodegen debuginfodwarf irreader linker
mcjit objcarcopts option passes lto)
if(ENABLE_LLVM_NATIVECODEGEN)
set(llvm_raw_libs ${llvm_raw_libs} nativecodegen)
endif()
list(FIND LLVM_AVAILABLE_LIBS "LLVMCoverage" _llvm_coverage)
if (${_llvm_coverage} GREATER -1)
list(APPEND llvm_raw_libs coverage)
......@@ -8,14 +14,25 @@ list(FIND LLVM_AVAILABLE_LIBS "LLVMCoroutines" _llvm_coroutines)
if (${_llvm_coroutines} GREATER -1)
list(APPEND llvm_raw_libs coroutines)
endif()
if (${LLVM_PACKAGE_VERSION} VERSION_EQUAL 6 OR ${LLVM_PACKAGE_VERSION} VERSION_GREATER 6)
list(APPEND llvm_raw_libs bpfasmparser)
list(APPEND llvm_raw_libs bpfdisassembler)
endif()
llvm_map_components_to_libnames(_llvm_libs ${llvm_raw_libs})
llvm_expand_dependencies(llvm_libs ${_llvm_libs})
endif()
# order is important
set(clang_libs
${libclangFrontend}
${libclangSerialization}
${libclangDriver}
${libclangDriver})
if (${LLVM_PACKAGE_VERSION} VERSION_EQUAL 8 OR ${LLVM_PACKAGE_VERSION} VERSION_GREATER 8)
list(APPEND clang_libs ${libclangASTMatchers})
endif()
list(APPEND clang_libs
${libclangParse}
${libclangSema}
${libclangCodeGen}
......
usr/share/bcc/introspection/*
usr/share/bcc/tools/*
usr/share/bcc/man/*
bcc (0.8.0-1) unstable; urgency=low
* Support for kernel up to 5.0
-- Brenden Blanco <bblanco@gmail.com> Fri, 11 Jan 2019 17:00:00 +0000
bcc (0.7.0-1) unstable; urgency=low
* Support for kernel up to 4.18
-- Brenden Blanco <bblanco@gmail.com> Tue, 04 Sep 2018 17:00:00 +0000
bcc (0.6.1-1) unstable; urgency=low
* Build support for Fedora 28 and Ubuntu 18.04
* Add option to change license
* Optimizations for some uses of bpf_probe_reads
-- Brenden Blanco <bblanco@gmail.com> Mon, 23 Jul 2018 17:00:00 +0000
bcc (0.6.0-1) unstable; urgency=low
* Support for kernel up to 4.17
* Many bugfixes
* Many new tools
* Improved python3 support
-- Brenden Blanco <bblanco@gmail.com> Wed, 13 Jun 2018 17:00:00 +0000
bcc (0.5.0-1) unstable; urgency=low
* Support for USDT in ARM64
* Bugfixes for 4.14 in some tools
* Fixes for smoke test failures
* Runtime memory usage reductions
-- Brenden Blanco <bblanco@gmail.com> Wed, 29 Nov 2017 17:00:00 +0000
bcc (0.4.0-1) unstable; urgency=low
* Bugfixes
* Support for kernel up to 4.14
-- Brenden Blanco <bblanco@gmail.com> Fri, 20 Oct 2017 17:00:00 +0000
bcc (0.3.0-1) unstable; urgency=low
* Many bugfixes
......
......@@ -3,16 +3,19 @@ Maintainer: Brenden Blanco <bblanco@plumgrid.com>
Section: misc
Priority: optional
Standards-Version: 3.9.5
Build-Depends: debhelper (>= 9), cmake, libllvm3.7 | libllvm3.8,
llvm-3.7-dev | llvm-3.8-dev, libclang-3.7-dev | libclang-3.8-dev,
Build-Depends: debhelper (>= 9), cmake,
libllvm3.7 [!arm64] | libllvm3.8 [!arm64] | libllvm6.0,
llvm-3.7-dev [!arm64] | llvm-3.8-dev [!arm64] | llvm-6.0-dev,
libclang-3.7-dev [!arm64] | libclang-3.8-dev [!arm64] | libclang-6.0-dev,
clang-format | clang-format-3.7 [!arm64] | clang-format-3.8 [!arm64] | clang-format-6.0,
libelf-dev, bison, flex, libfl-dev, libedit-dev, zlib1g-dev, git,
clang-format | clang-format-3.7 | clang-format-3.8, python (>= 2.7),
python-netaddr, python-pyroute2, luajit, libluajit-5.1-dev, arping,
inetutils-ping | iputils-ping, iperf, netperf, ethtool, devscripts
python (>= 2.7), python-netaddr, python-pyroute2, luajit,
libluajit-5.1-dev, arping, inetutils-ping | iputils-ping, iperf, netperf,
ethtool, devscripts, python3, dh-python
Homepage: https://github.com/iovisor/bcc
Package: libbcc
Architecture: amd64
Architecture: all
Depends: libc6, libstdc++6, libelf1
Description: Shared Library for BPF Compiler Collection (BCC)
Shared Library for BPF Compiler Collection to control BPF programs
......@@ -20,20 +23,25 @@ Description: Shared Library for BPF Compiler Collection (BCC)
Package: libbcc-examples
Architecture: any
Depends: libbcc
Description: Shared Library for BPF Compiler Collection (BCC)
Depends: libbcc (= ${binary:Version})
Description: Examples for BPF Compiler Collection (BCC)
Package: python-bcc
Architecture: all
Depends: libbcc, python, binutils
Depends: libbcc (= ${binary:Version}), python, binutils
Description: Python wrappers for BPF Compiler Collection (BCC)
Package: python3-bcc
Architecture: all
Depends: libbcc (= ${binary:Version}), python3, binutils
Description: Python3 wrappers for BPF Compiler Collection (BCC)
Package: bcc-tools
Architecture: all
Depends: python-bcc
Depends: python-bcc (= ${binary:Version})
Description: Command line tools for BPF Compiler Collection (BCC)
Package: bcc-lua
Architecture: all
Depends: libbcc
Depends: libbcc (= ${binary:Version})
Description: Standalone tool to run BCC tracers written in Lua
usr/lib/python*
usr/lib/python2*
......@@ -9,7 +9,7 @@ DEBIAN_REVISION := $(shell dpkg-parsechangelog | sed -rne "s,^Version: ([0-9.]+)
UPSTREAM_VERSION := $(shell dpkg-parsechangelog | sed -rne "s,^Version: ([0-9.]+)(~|-)(.*),\1,p")
%:
dh $@ --buildsystem=cmake --parallel
dh $@ --buildsystem=cmake --parallel --with python2,python3
# tests cannot be run in parallel
override_dh_auto_test:
......@@ -17,4 +17,4 @@ override_dh_auto_test:
# FIXME: LLVM_DEFINITIONS is broken somehow in LLVM cmake upstream
override_dh_auto_configure:
dh_auto_configure -- -DREVISION_LAST=$(UPSTREAM_VERSION) -DREVISION=$(UPSTREAM_VERSION) -DLLVM_DEFINITIONS="-D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS"
dh_auto_configure -- -DREVISION_LAST=$(UPSTREAM_VERSION) -DREVISION=$(UPSTREAM_VERSION) -DLLVM_DEFINITIONS="-D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS" -DPYTHON_CMD="python2;python3"
......@@ -41,7 +41,7 @@ Here is a generic checklist for performance investigations with bcc, first as a
These tools may be installed on your system under /usr/share/bcc/tools, or you can run them from the bcc github repo under /tools where they have a .py extension. Browse the 50+ tools available for more analysis options.
#### 1. execsnoop
#### 1.1. execsnoop
```
# ./execsnoop
......@@ -59,7 +59,7 @@ It works by tracing exec(), not the fork(), so it will catch many types of new p
More [examples](../tools/execsnoop_example.txt).
#### 2. opensnoop
#### 1.2. opensnoop
```
# ./opensnoop
......@@ -82,7 +82,7 @@ Files that are opened can tell you a lot about how applications work: identifyin
More [examples](../tools/opensnoop_example.txt).
#### 3. ext4slower (or btrfs\*, xfs\*, zfs\*)
#### 1.3. ext4slower (or btrfs\*, xfs\*, zfs\*)
```
# ./ext4slower
......@@ -103,7 +103,7 @@ Similar tools exist in bcc for other file systems: btrfsslower, xfsslower, and z
More [examples](../tools/ext4slower_example.txt).
#### 4. biolatency
#### 1.4. biolatency
```
# ./biolatency
......@@ -135,7 +135,7 @@ This is great for understanding disk I/O latency beyond the average times given
More [examples](../tools/biolatency_example.txt).
#### 5. biosnoop
#### 1.5. biosnoop
```
# ./biosnoop
......@@ -155,7 +155,7 @@ This allows you to examine disk I/O in more detail, and look for time-ordered pa
More [examples](../tools/biosnoop_example.txt).
#### 6. cachestat
#### 1.6. cachestat
```
# ./cachestat
......@@ -175,7 +175,7 @@ Use this to identify a low cache hit ratio, and a high rate of misses: which giv
More [examples](../tools/cachestat_example.txt).
#### 7. tcpconnect
#### 1.7. tcpconnect
```
# ./tcpconnect
......@@ -194,7 +194,7 @@ Look for unexpected connections that may point to inefficiencies in application
More [examples](../tools/tcpconnect_example.txt).
#### 8. tcpaccept
#### 1.8. tcpaccept
```
# ./tcpaccept
......@@ -211,7 +211,7 @@ Look for unexpected connections that may point to inefficiencies in application
More [examples](../tools/tcpaccept_example.txt).
#### 9. tcpretrans
#### 1.9. tcpretrans
```
# ./tcpretrans
......@@ -224,11 +224,11 @@ TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
tcpretrans prints one line of output for every TCP retransmit packet, with details including source and destination addresses, and the kernel state of the TCP connection.
TCP retransmissions cause latency and throughput issues. For ESTABLESHID retransmits, look for patterns with networks. For SYN_SENT, this may point to target kernel CPU saturation and kernel packet drops.
TCP retransmissions cause latency and throughput issues. For ESTABLISHED retransmits, look for patterns with networks. For SYN_SENT, this may point to target kernel CPU saturation and kernel packet drops.
More [examples](../tools/tcpretrans_example.txt).
#### 10. runqlat
#### 1.10. runqlat
```
# ./runqlat
......@@ -259,7 +259,7 @@ This can help quantify time lost waiting for a turn on CPU, during periods of CP
More [examples](../tools/runqlat_example.txt).
#### 11. profile
#### 1.11. profile
```
# ./profile
......@@ -306,6 +306,117 @@ Use this tool to understand the code paths that are consuming CPU resources.
More [examples](../tools/profile_example.txt).
### 2. Observability with Generic Tools
In addition to the above tools for performance tuning, below is a checklist for bcc generic tools, first as a list and then in detail:
1. trace
1. argdist
1. funccount
These generic tools can provide the visibility needed to solve your specific problems.
#### 2.1. trace
##### Example 1
Suppose you want to track file ownership changes. There are three syscalls, `chown`, `fchown` and `lchown`, which users can use to change file ownership. The corresponding syscall entry functions are `SyS_chown`, `SyS_fchown` and `SyS_lchown`. The following command prints the syscall parameters and the user id of the calling process. You can use the `id` command to find the uid of a particular user.
```
$ trace.py \
'p::SyS_chown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
'p::SyS_fchown "fd = %d, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
'p::SyS_lchown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid'
PID TID COMM FUNC -
1269255 1269255 python3.6 SyS_lchown file = /tmp/dotsync-usisgezu/tmp, to_uid = 128203, to_gid = 100, from_uid = 128203
1269441 1269441 zstd SyS_chown file = /tmp/dotsync-vic7ygj0/dotsync-package.zst, to_uid = 128203, to_gid = 100, from_uid = 128203
1269255 1269255 python3.6 SyS_lchown file = /tmp/dotsync-a40zd7ev/tmp, to_uid = 128203, to_gid = 100, from_uid = 128203
1269442 1269442 zstd SyS_chown file = /tmp/dotsync-gzp413o_/dotsync-package.zst, to_uid = 128203, to_gid = 100, from_uid = 128203
1269255 1269255 python3.6 SyS_lchown file = /tmp/dotsync-whx4fivm/tmp/.bash_profile, to_uid = 128203, to_gid = 100, from_uid = 128203
```
##### Example 2
Suppose you want to count nonvoluntary context switches (`nvcsw`) in your BPF-based performance monitoring tools, but you are not sure which method is correct. `/proc/<pid>/status` already reports this number (`nonvoluntary_ctxt_switches`) for a pid, so you can use `trace.py` to run a quick experiment that verifies your method. In the kernel source, `nvcsw` is counted in function `__schedule` in file `linux/kernel/sched/core.c`, under the condition
```
!(!preempt && prev->state) // i.e., preempt || !prev->state
```
The `__schedule` function is marked as `notrace`, so the best place to evaluate the above condition seems to be the `sched/sched_switch` tracepoint, which is called inside `__schedule` and defined in `linux/include/trace/events/sched.h`. In `trace.py`, `args` is already a pointer to the tracepoint's `TP_STRUCT__entry`. The above condition in `__schedule` can be represented as
```
args->prev_state == TASK_STATE_MAX || args->prev_state == 0
```
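The rewrite of the condition relies on De Morgan's law, and the tracepoint filter relies on how `sched_switch` reports a preempted task. The following Python sketch checks both for a few task states. `TASK_STATE_MAX = 1024` is a placeholder value, and the model in which the tracepoint reports `TASK_STATE_MAX` as `prev_state` for a preempted task is an assumption mirroring the filter above; check both against your kernel version.

```python
# Sketch: check that the trace.py filter matches the nvcsw condition from
# __schedule. TASK_STATE_MAX below is a hypothetical placeholder value.
TASK_STATE_MAX = 1024  # hypothetical; the real value depends on the kernel


def nvcsw_condition(preempt, prev_state):
    """The condition from __schedule: !(!preempt && prev->state)."""
    return not ((not preempt) and prev_state)


def reported_prev_state(preempt, prev_state):
    """Assumed model of sched_switch: a preempted task is reported as
    TASK_STATE_MAX, otherwise its real state is reported."""
    return TASK_STATE_MAX if preempt else prev_state


def tracepoint_filter(reported):
    """The trace.py filter: prev_state == TASK_STATE_MAX || prev_state == 0."""
    return reported == TASK_STATE_MAX or reported == 0


# Compare both forms for representative states
# (0 = running, 1 = interruptible, 2 = uninterruptible).
for preempt in (False, True):
    for state in (0, 1, 2):
        # De Morgan: !(!a && b) == a || !b
        assert nvcsw_condition(preempt, state) == (preempt or not state)
        assert nvcsw_condition(preempt, state) == tracepoint_filter(
            reported_prev_state(preempt, state))
print("filter matches the __schedule condition")
```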
The command below counts involuntary context switches per process (`-p`) or per thread (`-L`); compare the results with `/proc/<pid>/status` or `/proc/<pid>/task/<task_id>/status` to check correctness. In typical cases involuntary context switches are not very common, which keeps the output manageable.
```
$ trace.py -p 1134138 't:sched:sched_switch (args->prev_state == TASK_STATE_MAX || args->prev_state == 0)'
PID TID COMM FUNC
1134138 1134140 contention_test sched_switch
1134138 1134142 contention_test sched_switch
...
$ trace.py -L 1134140 't:sched:sched_switch (args->prev_state == TASK_STATE_MAX || args->prev_state == 0)'
PID TID COMM FUNC
1134138 1134140 contention_test sched_switch
1134138 1134140 contention_test sched_switch
...
```
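To compare the `trace.py` output against the kernel's own counter, you can read `nonvoluntary_ctxt_switches` from the status file. The helpers `parse_nvcsw()` and `read_nvcsw()` below are hypothetical, not part of bcc; a minimal sketch:

```python
# Sketch: read the kernel's involuntary context switch counter from /proc.
# parse_nvcsw() and read_nvcsw() are hypothetical helpers, not bcc APIs.
import os


def parse_nvcsw(status_text):
    """Return nonvoluntary_ctxt_switches from the text of a status file."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "nonvoluntary_ctxt_switches":
            return int(value.strip())
    raise ValueError("nonvoluntary_ctxt_switches not found")


def read_nvcsw(pid, tid=None):
    """Read /proc/<pid>/status, or /proc/<pid>/task/<tid>/status for a thread."""
    if tid is None:
        path = "/proc/%d/status" % pid
    else:
        path = "/proc/%d/task/%d/status" % (pid, tid)
    with open(path) as f:
        return parse_nvcsw(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    # Our own process always exists, so this runs without arguments.
    print("nvcsw for this process:", read_nvcsw(os.getpid()))
```

Run it with the pid (and tid) you traced, before and after the `trace.py` session, and compare the difference with the number of lines `trace.py` printed.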
##### Example 3
This example is related to issues [1231](https://github.com/iovisor/bcc/issues/1231) and [1516](https://github.com/iovisor/bcc/issues/1516), where uprobes do not work at all in certain cases. First, run `strace` as below:
```
$ strace trace.py 'r:bash:readline "%s", retval'
...
perf_event_open(0x7ffd968212f0, -1, 0, -1, 0x8 /* PERF_FLAG_??? */) = -1 EIO (Input/output error)
...
```
The `perf_event_open` syscall returns `-EIO`. Searching the kernel uprobe-related code in the `/kernel/trace` and `/kernel/events` directories for `EIO` points to `uprobe_register` as the most suspicious function. Let us find out whether this function is called and, if so, what it returns. In one terminal, use the following command to print out the return value of `uprobe_register`:
```
$ trace.py 'r::uprobe_register "ret = %d", retval'
```
In another terminal, run the same bash uretprobe tracing example; you should get:
```
$ trace.py 'r::uprobe_register "ret = %d", retval'
PID TID COMM FUNC -
1041401 1041401 python2.7 uprobe_register ret = -5
```
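Kernel functions report failure as a negative errno value, so `ret = -5` maps to errno 5. A small Python sketch for decoding such values (`describe_kernel_ret()` is a hypothetical helper):

```python
# Sketch: turn a negative kernel return value (e.g. retval printed by
# trace.py) into an errno name and message.
import errno
import os


def describe_kernel_ret(ret):
    """Kernel functions return -errno on failure; map it back to a name."""
    if ret >= 0:
        return "success (%d)" % ret
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))


print(describe_kernel_ret(-5))
```

On Linux this reports `EIO` together with the C library's message for it.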
The `-5` error code is EIO. This confirms that the following code in function `uprobe_register` is the most likely culprit.
```
if (!inode->i_mapping->a_ops->readpage && !shmem_mapping(inode->i_mapping))
return -EIO;
```
The `shmem_mapping` function is defined as
```
bool shmem_mapping(struct address_space *mapping)
{
return mapping->a_ops == &shmem_aops;
}
```
To confirm the theory, find out what `inode->i_mapping->a_ops` is with the following command:
```
$ trace.py -I 'linux/fs.h' 'p::uprobe_register(struct inode *inode) "a_ops = %llx", inode->i_mapping->a_ops'
PID TID COMM FUNC -
814288 814288 python2.7 uprobe_register a_ops = ffffffff81a2adc0
^C$ grep ffffffff81a2adc0 /proc/kallsyms
ffffffff81a2adc0 R empty_aops
```
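The `grep` against `/proc/kallsyms` above only finds exact matches. A small Python sketch that also reports the nearest preceding symbol and offset may help when an address falls inside an object. `load_symbols()` and `resolve()` are hypothetical helpers; the sample input reuses the `empty_aops` line from above plus two made-up placeholder symbols. On a real system, reading non-zero addresses from `/proc/kallsyms` typically requires root.

```python
# Sketch: resolve a kernel address to "symbol + offset" from kallsyms text.
# load_symbols() and resolve() are hypothetical helpers, not bcc APIs.
import bisect


def load_symbols(kallsyms_text):
    """Parse 'address type name [module]' lines into a sorted list."""
    syms = []
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms


def resolve(syms, addr):
    """Return (name, offset) of the closest symbol at or below addr."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = syms[i]
    return name, addr - base


# placeholder_* names are made up; only the empty_aops line comes from above.
sample = """\
ffffffff81a2ad00 r placeholder_before
ffffffff81a2adc0 R empty_aops
ffffffff81a2ae80 r placeholder_after
"""
syms = load_symbols(sample)
print(resolve(syms, 0xffffffff81a2adc0))
```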
The kernel symbol `empty_aops` does not have `readpage` defined, and hence the above suspicious condition is true. Further examining the kernel source code shows that `overlayfs` does not provide its own `a_ops`, while some other file systems (e.g., ext4) define their own `a_ops` (e.g., `ext4_da_aops`), and `ext4_da_aops` defines `readpage`. Hence, uprobes work fine on ext4 but not on overlayfs.
More [examples](../tools/trace_example.txt).
#### 2.2. argdist
More [examples](../tools/argdist_example.txt).
#### 2.3. funccount
More [examples](../tools/funccount_example.txt).
## Networking
To do.
......@@ -74,7 +74,7 @@ int hello(void *ctx) {
# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event="sys_clone", fn_name="hello")
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
......@@ -94,7 +94,7 @@ This is similar to hello_world.py, and traces new processes via sys_clone() agai
1. ```hello()```: Now we're just declaring a C function, instead of the ```kprobe__``` shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a ```struct pt_regs *ctx``` as first argument. If you need to define some helper function that will not be executed on a probe, it needs to be defined as ```static inline``` in order to be inlined by the compiler. Sometimes you also need to add the ```__always_inline``` function attribute to it.
1. ```b.attach_kprobe(event="sys_clone", fn_name="hello")```: Creates a kprobe for the sys_clone() kernel function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions.
1. ```b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")```: Creates a kprobe for the kernel clone system call function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions.
1. ```b.trace_fields()```: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT().
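The reason for `get_syscall_fnname()` is that the kernel function behind a syscall is not always `sys_<name>`; on x86_64 kernels from roughly 4.17 onward it is `__x64_sys_<name>`. bcc resolves this at runtime from kernel symbols. The sketch below is a simplified model of that idea, not bcc's implementation: the prefix list is an assumed subset and the symbol sets are made up.

```python
# Simplified model (not bcc's code) of mapping a syscall name to the kernel
# function to probe. CANDIDATE_PREFIXES is an assumed subset for illustration.
CANDIDATE_PREFIXES = ("sys_", "__x64_sys_")


def syscall_fnname(name, kernel_symbols):
    """Return the first '<prefix><name>' present in kernel_symbols."""
    for prefix in CANDIDATE_PREFIXES:
        candidate = prefix + name
        if candidate in kernel_symbols:
            return candidate
    # Fall back to the classic naming when nothing matches.
    return CANDIDATE_PREFIXES[0] + name


# Made-up symbol sets standing in for an older and a newer x86_64 kernel.
old_kernel = {"sys_clone", "sys_sync"}
new_kernel = {"__x64_sys_clone", "__x64_sys_sync"}
print(syscall_fnname("clone", old_kernel))
print(syscall_fnname("clone", new_kernel))
```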
......@@ -114,12 +114,12 @@ At time 0.10 s: multiple syncs detected, last 96 ms ago
This program is [examples/tracing/sync_timing.py](../examples/tracing/sync_timing.py):
```Python
from __future__ import print_function
from bcc import BPF
# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
BPF_HASH(last);
......@@ -144,7 +144,7 @@ int do_trace(struct pt_regs *ctx) {
}
""")
b.attach_kprobe(event="sys_sync", fn_name="do_trace")
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace")
print("Tracing for quick sync's... Ctrl-C to end")
# format output
......@@ -168,7 +168,7 @@ Things to learn:
### Lesson 5. sync_count.py
Modify the sync_timing.py program (prior lesson) to store the count of all sys_sync() calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash.
Modify the sync_timing.py program (prior lesson) to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash.
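One way to think about this exercise, sketched in plain Python rather than BPF C: the hash keeps the last timestamp at key 0, as in sync_timing.py, and the count at a new key 1. The key choice and the `on_sync()` function are assumptions for illustration; timestamps are in nanoseconds, like `bpf_ktime_get_ns()`.

```python
# Plain-Python model of the Lesson 5 idea (not BPF code). A dict stands in
# for BPF_HASH(last); keys 0 and 1 are an assumed layout.
TS_KEY = 0      # last sync timestamp, as in sync_timing.py
COUNT_KEY = 1   # new key index: total number of sync calls
ONE_SECOND_NS = 1000000000


def on_sync(last, now_ns):
    """Handle one sync call; return the delta in ms if it was a quick one."""
    last[COUNT_KEY] = last.get(COUNT_KEY, 0) + 1
    quick_ms = None
    prev = last.get(TS_KEY)
    if prev is not None:
        delta = now_ns - prev
        if delta < ONE_SECOND_NS:
            quick_ms = delta // 1000000
    last[TS_KEY] = now_ns
    return quick_ms


last = {}
for ts in (0, 500000000, 2000000000):
    quick = on_sync(last, ts)
    if quick is not None:
        print("quick sync: last %d ms ago, count %d" % (quick, last[COUNT_KEY]))
```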
### Lesson 6. disksnoop.py
......@@ -279,12 +279,12 @@ int hello(struct pt_regs *ctx) {
# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event="sys_clone", fn_name="hello")
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
# define output data structure in Python
TASK_COMM_LEN = 16 # linux/sched.h
class Data(ct.Structure):
_fields_ = [("pid", ct.c_ulonglong),
_fields_ = [("pid", ct.c_uint),
("ts", ct.c_ulonglong),
("comm", ct.c_char * TASK_COMM_LEN)]
......@@ -305,7 +305,7 @@ def print_event(cpu, data, size):
# loop with callback to print_event
b["events"].open_perf_buffer(print_event)
while 1:
b.kprobe_poll()
b.perf_buffer_poll()
```
Things to learn:
......@@ -319,7 +319,7 @@ Things to learn:
1. ```class Data(ct.Structure)```: Now define the Python version of the C data structure.
1. ```def print_event()```: Define a Python function that will handle reading events from the ```events``` stream.
1. ```b["events"].open_perf_buffer(print_event)```: Associate the Python ```print_event``` function with the ```events``` stream.
1. ```while 1: b.kprobe_poll()```: Block waiting for events.
1. ```while 1: b.perf_buffer_poll()```: Block waiting for events.
This may be improved in future bcc versions. E.g., the Python data struct could be auto-generated from the C code.
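The switch from `ct.c_ulonglong` to `ct.c_uint` for `pid` matters because ctypes reads each field by offset and size, so the Python class must mirror the C struct exactly. The sketch below assumes the C-side struct declares `pid` as a 32-bit `u32`, packs a fake event, and decodes it again. It uses `Data.from_buffer_copy()` on a bytes object instead of the tutorial's `ct.cast()` on the pointer bcc hands to `print_event()`, only because there is no real perf buffer here.

```python
# Sketch: round-trip a fake event through the ctypes layout used above.
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h


class Data(ct.Structure):
    # Must mirror the C struct field by field (assumed: u32 pid, u64 ts,
    # char comm[TASK_COMM_LEN]).
    _fields_ = [("pid", ct.c_uint),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN)]


def decode(raw):
    """Interpret raw event bytes as a Data record (copying the bytes)."""
    if len(raw) < ct.sizeof(Data):
        raise ValueError("short event: %d bytes" % len(raw))
    return Data.from_buffer_copy(raw)


# Fake event standing in for what the kernel would write into the buffer.
raw = bytes(Data(pid=4242, ts=123456789, comm=b"bash"))
event = decode(raw)
print(event.pid, event.ts, event.comm.decode())
```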
......@@ -349,6 +349,7 @@ Tracing... Hit Ctrl-C to end.
Code is [examples/tracing/bitehist.py](../examples/tracing/bitehist.py):
```Python
from __future__ import print_function
from bcc import BPF
from time import sleep
......@@ -373,7 +374,7 @@ print("Tracing... Hit Ctrl-C to end.")
try:
sleep(99999999)
except KeyboardInterrupt:
print
print()
# output
b["dist"].print_log2_hist("kbytes")
......@@ -388,7 +389,7 @@ A recap from earlier lessons:
New things to learn:
1. ```BPF_HISTOGRAM(dist)```: Defines a BPF map object that is a histogram, and names it "dist".
1. ```dist.increment()```: Increments the histogram bucket index provided as an argument by one.
1. ```dist.increment()```: Increments the histogram bucket whose index is provided as the first argument, by one by default. Optionally, a custom increment can be passed as the second argument.
1. ```bpf_log2l()```: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram.
1. ```b["dist"].print_log2_hist("kbytes")```: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient.
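To make the histogram mechanics concrete, here is a plain-Python model of `BPF_HISTOGRAM`, `dist.increment()` and `print_log2_hist()`. It is not bcc code: `log2_index()` is a stand-in for `bpf_log2l()` that returns floor(log2(v)) + 1 for v >= 1 and may differ from the real helper for 0, and the printed layout only approximates bcc's output.

```python
# Plain-Python model of a power-of-2 histogram (not bcc code).
from collections import defaultdict


def log2_index(v):
    """Bucket index: 1 -> 1, 2..3 -> 2, 4..7 -> 3, ... (0 -> 0)."""
    return v.bit_length()


class Histogram(object):
    """Toy model of a histogram map: bucket index -> count."""

    def __init__(self):
        self.buckets = defaultdict(int)

    def increment(self, index, inc=1):
        # Mirrors dist.increment(idx) and dist.increment(idx, inc).
        self.buckets[index] += inc

    def print_log2_hist(self, label):
        if not self.buckets:
            return
        print("%20s : count" % label)
        for i in range(1, max(self.buckets) + 1):
            low, high = 1 << (i - 1), (1 << i) - 1
            print("%8d -> %-8d : %d" % (low, high, self.buckets.get(i, 0)))


dist = Histogram()
for kbytes in (4, 4, 8, 16, 20, 64, 1):
    dist.increment(log2_index(kbytes))
dist.increment(log2_index(128), 5)  # custom increment as second argument
dist.print_log2_hist("kbytes")
```

Only the bucket counts live in the map; the ranges are reconstructed from the indexes when printing, which is why this approach transfers so little data from kernel to user space.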
......@@ -462,6 +463,7 @@ TIME(s) COMM PID GOTBITS
Hah! I caught smtp by accident. Code is [examples/tracing/urandomread.py](../examples/tracing/urandomread.py):
```Python
from __future__ import print_function
from bcc import BPF
# load BPF program
......@@ -470,7 +472,7 @@ TRACEPOINT_PROBE(random, urandom_read) {
// args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
bpf_trace_printk("%d\\n", args->got_bits);
return 0;
};
}
""")
# header
......@@ -544,6 +546,7 @@ These are various strings that are being processed by this library function whil
Code is [examples/tracing/strlen_count.py](../examples/tracing/strlen_count.py):
```Python
from __future__ import print_function
from bcc import BPF
from time import sleep
......@@ -564,6 +567,7 @@ int count(struct pt_regs *ctx) {
u64 zero = 0, *val;
bpf_probe_read(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
// could also use `counts.increment(key)`
val = counts.lookup_or_init(&key, &zero);
(*val)++;
return 0;
......@@ -607,17 +611,22 @@ TIME(s) COMM PID ARGS
Relevant code from [examples/tracing/nodejs_http_server.py](../examples/tracing/nodejs_http_server.py):
```Python
from __future__ import print_function
from bcc import BPF, USDT
import sys
if len(sys.argv) < 2:
print("USAGE: nodejs_http_server PID")
exit()
pid = sys.argv[1]
debug = 0
# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
int do_trace(struct pt_regs *ctx) {
uint64_t addr;
char path[128];
char path[128]={0};
bpf_usdt_readarg(6, ctx, &addr);
bpf_probe_read(&path, sizeof(path), (void *)addr);
bpf_trace_printk("path:%s\\n", path);
......@@ -628,6 +637,9 @@ int do_trace(struct pt_regs *ctx) {
# enable USDT probe from given PID
u = USDT(pid=int(pid))
u.enable_probe(probe="http__server__request", fn_name="do_trace")
if debug:
print(u.get_text())
print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text, usdt_contexts=[u])
......@@ -671,9 +683,6 @@ struct key_t {
};
// map_type, key_type, leaf_type, table_name, num_entry
BPF_HASH(stats, struct key_t, u64, 1024);
// attach to finish_task_switch in kernel/sched/core.c, which has the following
// prototype:
// struct rq *finish_task_switch(struct task_struct *prev)
int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
struct key_t key = {};
u64 zero = 0, *val;
......@@ -681,6 +690,7 @@ int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
key.curr_pid = bpf_get_current_pid_tgid();
key.prev_pid = prev->pid;
// could also use `stats.increment(key);`
val = stats.lookup_or_init(&key, &zero);
(*val)++;
return 0;
......
......@@ -3,6 +3,7 @@
include_directories(${CMAKE_SOURCE_DIR}/src/cc)
include_directories(${CMAKE_SOURCE_DIR}/src/cc/api)
include_directories(${CMAKE_SOURCE_DIR}/src/cc/libbpf/include/uapi)
option(INSTALL_CPP_EXAMPLES "Install C++ examples. Those binaries are statically linked and can take plenty of disk space" OFF)
......@@ -27,6 +28,9 @@ target_link_libraries(LLCStat bcc-static)
add_executable(FollyRequestContextSwitch FollyRequestContextSwitch.cc)
target_link_libraries(FollyRequestContextSwitch bcc-static)
add_executable(UseExternalMap UseExternalMap.cc)
target_link_libraries(UseExternalMap bcc-static)
if(INSTALL_CPP_EXAMPLES)
install (TARGETS HelloWorld DESTINATION share/bcc/examples/cpp)
install (TARGETS CPUDistribution DESTINATION share/bcc/examples/cpp)
......@@ -35,4 +39,5 @@ if(INSTALL_CPP_EXAMPLES)
install (TARGETS RandomRead DESTINATION share/bcc/examples/cpp)
install (TARGETS LLCStat DESTINATION share/bcc/examples/cpp)
install (TARGETS FollyRequestContextSwitch DESTINATION share/bcc/examples/cpp)
install (TARGETS UseExternalMap DESTINATION share/bcc/examples/cpp)
endif(INSTALL_CPP_EXAMPLES)
......@@ -12,6 +12,7 @@
*/
#include <signal.h>
#include <functional>
#include <iostream>
#include <vector>
......@@ -59,35 +60,50 @@ void handle_output(void* cb_cookie, void* data, int data_size) {
<< event->new_addr << std::endl;
}
ebpf::BPF* bpf;
std::function<void(int)> shutdown_handler;
void signal_handler(int s) {
std::cerr << "Terminating..." << std::endl;
delete bpf;
exit(0);
}
void signal_handler(int s) { shutdown_handler(s); }
int main(int argc, char** argv) {
if (argc != 2) {
std::cout << "USAGE: FollyRequestContextSwitch PATH_TO_BINARY" << std::endl;
std::string binary;
pid_t pid = -1;
for (int i = 0; i < argc; i++) {
if (strncmp(argv[i], "--pid", 5) == 0 && i + 1 < argc) {
pid = std::stoi(argv[i + 1]);
i++;
continue;
}
if (strncmp(argv[i], "--binary", 8) == 0 && i + 1 < argc) {
binary = argv[i + 1];
i++;
continue;
}
}
if (pid <= 0 && binary.empty()) {
std::cout << "Must specify at least one of binary or PID:" << std::endl
<< "FollyRequestContextSwitch [--pid PID] [--binary BINARY]"
<< std::endl;
exit(1);
}
std::string binary_path(argv[1]);
bpf = new ebpf::BPF();
std::vector<ebpf::USDT> u;
u.emplace_back(binary_path, "folly", "request_context_switch_before",
ebpf::USDT u(binary, pid, "folly", "request_context_switch_before",
"on_context_switch");
auto init_res = bpf->init(BPF_PROGRAM, {}, u);
ebpf::BPF* bpf = new ebpf::BPF();
auto init_res = bpf->init(BPF_PROGRAM, {}, {u});
if (init_res.code() != 0) {
std::cerr << init_res.msg() << std::endl;
return 1;
}
auto attach_res = bpf->attach_usdt(u[0]);
auto attach_res = bpf->attach_usdt(u);
if (attach_res.code() != 0) {
std::cerr << attach_res.msg() << std::endl;
return 1;
} else {
std::cout << "Attached to USDT " << u;
}
auto open_res = bpf->open_perf_buffer("events", &handle_output);
......@@ -96,10 +112,20 @@ int main(int argc, char** argv) {
return 1;
}
shutdown_handler = [&](int s) {
std::cerr << "Terminating..." << std::endl;
bpf->detach_usdt(u);
delete bpf;
exit(0);
};
signal(SIGINT, signal_handler);
std::cout << "Started tracing, hit Ctrl-C to terminate." << std::endl;
auto perf_buffer = bpf->get_perf_buffer("events");
if (perf_buffer)
while (true)
bpf->poll_perf_buffer("events");
// 100ms timeout
perf_buffer->poll(100);
return 0;
}
......@@ -27,8 +27,9 @@ int main() {
std::ifstream pipe("/sys/kernel/debug/tracing/trace_pipe");
std::string line;
std::string clone_fnname = bpf.get_syscall_fnname("clone");
auto attach_res = bpf.attach_kprobe("sys_clone", "on_sys_clone");
auto attach_res = bpf.attach_kprobe(clone_fnname, "on_sys_clone");
if (attach_res.code() != 0) {
std::cerr << attach_res.msg() << std::endl;
return 1;
......@@ -38,7 +39,7 @@ int main() {
if (std::getline(pipe, line)) {
std::cout << line << std::endl;
// Detach the probe if we got at least one line.
auto detach_res = bpf.detach_kprobe("sys_clone");
auto detach_res = bpf.detach_kprobe(clone_fnname);
if (detach_res.code() != 0) {
std::cerr << detach_res.msg() << std::endl;
return 1;
......
......@@ -19,6 +19,10 @@ const std::string BPF_PROGRAM = R"(
#include <linux/sched.h>
#include <uapi/linux/ptrace.h>
#ifndef CGROUP_FILTER
#define CGROUP_FILTER 0
#endif
struct urandom_read_args {
// See /sys/kernel/debug/tracing/events/random/urandom_read/format
uint64_t common__unused;
......@@ -35,8 +39,12 @@ struct event_t {
};
BPF_PERF_OUTPUT(events);
BPF_CGROUP_ARRAY(cgroup, 1);
int on_urandom_read(struct urandom_read_args* attr) {
if (CGROUP_FILTER && (cgroup.check_current_task(0) != 1))
return 0;
struct event_t event = {};
event.pid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&event.comm, sizeof(event.comm));
......@@ -72,12 +80,29 @@ void signal_handler(int s) {
}
int main(int argc, char** argv) {
if (argc != 1 && argc != 2) {
std::cerr << "USAGE: RandomRead [cgroup2_path]" << std::endl;
return 1;
}
std::vector<std::string> cflags = {};
if (argc == 2)
cflags.emplace_back("-DCGROUP_FILTER=1");
bpf = new ebpf::BPF();
auto init_res = bpf->init(BPF_PROGRAM);
auto init_res = bpf->init(BPF_PROGRAM, cflags, {});
if (init_res.code() != 0) {
std::cerr << init_res.msg() << std::endl;
return 1;
}
if (argc == 2) {
auto cgroup_array = bpf->get_cgroup_array("cgroup");
auto update_res = cgroup_array.update_value(0, argv[1]);
if (update_res.code() != 0) {
std::cerr << update_res.msg() << std::endl;
return 1;
}
}
auto attach_res =
bpf->attach_tracepoint("random:urandom_read", "on_urandom_read");
......@@ -92,6 +117,12 @@ int main(int argc, char** argv) {
return 1;
}
// done with all initial work, free bcc memory
if (bpf->free_bcc_memory()) {
std::cerr << "Failed to free llvm/clang memory" << std::endl;
return 1;
}
signal(SIGINT, signal_handler);
std::cout << "Started tracing, hit Ctrl-C to terminate." << std::endl;
while (true)
......
......@@ -83,8 +83,9 @@ int main(int argc, char** argv) {
auto table_handle = bpf.get_hash_table<query_probe_t, int>("queries");
auto table = table_handle.get_table_offline();
std::sort(table.begin(), table.end(), [](std::pair<query_probe_t, int> a,
std::pair<query_probe_t, int> b) {
std::sort(
table.begin(), table.end(),
[](std::pair<query_probe_t, int> a, std::pair<query_probe_t, int> b) {
return a.first.ts < b.first.ts;
});
std::cout << table.size() << " queries recorded:" << std::endl;
......
......@@ -27,17 +27,15 @@ struct stack_key_t {
int kernel_stack;
};
BPF_STACK_TRACE(stack_traces, 10240)
BPF_STACK_TRACE(stack_traces, 16384);
BPF_HASH(counts, struct stack_key_t, uint64_t);
int on_tcp_send(struct pt_regs *ctx) {
struct stack_key_t key = {};
key.pid = bpf_get_current_pid_tgid() >> 32;
bpf_get_current_comm(&key.name, sizeof(key.name));
key.kernel_stack = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID);
key.user_stack = stack_traces.get_stackid(
ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK
);
key.kernel_stack = stack_traces.get_stackid(ctx, 0);
key.user_stack = stack_traces.get_stackid(ctx, BPF_F_USER_STACK);
u64 zero = 0, *val;
val = counts.lookup_or_init(&key, &zero);
......@@ -76,39 +74,56 @@ int main(int argc, char** argv) {
std::cout << "Probing for " << probe_time << " seconds" << std::endl;
sleep(probe_time);
auto detach_res = bpf.detach_kprobe("tcp_sendmsg");
if (detach_res.code() != 0) {
std::cerr << detach_res.msg() << std::endl;
return 1;
}
auto table =
bpf.get_hash_table<stack_key_t, uint64_t>("counts").get_table_offline();
std::sort(table.begin(), table.end(), [](std::pair<stack_key_t, uint64_t> a,
std::pair<stack_key_t, uint64_t> b) {
return a.second < b.second;
});
std::sort(
table.begin(), table.end(),
[](std::pair<stack_key_t, uint64_t> a,
std::pair<stack_key_t, uint64_t> b) { return a.second < b.second; });
auto stacks = bpf.get_stack_table("stack_traces");
int lost_stacks = 0;
for (auto it : table) {
std::cout << "PID: " << it.first.pid << " (" << it.first.name << ") "
<< "made " << it.second
<< " TCP sends on following stack: " << std::endl;
std::cout << " Kernel Stack:" << std::endl;
if (it.first.kernel_stack >= 0) {
std::cout << " Kernel Stack:" << std::endl;
auto syms = stacks.get_stack_symbol(it.first.kernel_stack, -1);
for (auto sym : syms)
std::cout << " " << sym << std::endl;
} else
std::cout << " " << it.first.kernel_stack << std::endl;
std::cout << " User Stack:" << std::endl;
} else {
// -EFAULT normally means the stack is not available and not an error
if (it.first.kernel_stack != -EFAULT) {
lost_stacks++;
std::cout << " [Lost Kernel Stack " << it.first.kernel_stack << "]"
<< std::endl;
}
}
if (it.first.user_stack >= 0) {
std::cout << " User Stack:" << std::endl;
auto syms = stacks.get_stack_symbol(it.first.user_stack, it.first.pid);
for (auto sym : syms)
std::cout << " " << sym << std::endl;
} else
std::cout << " " << it.first.user_stack << std::endl;
} else {
// -EFAULT normally means the stack is not available and not an error
if (it.first.user_stack != -EFAULT) {
lost_stacks++;
std::cout << " [Lost User Stack " << it.first.user_stack << "]"
<< std::endl;
}
auto detach_res = bpf.detach_kprobe("tcp_sendmsg");
if (detach_res.code() != 0) {
std::cerr << detach_res.msg() << std::endl;
return 1;
}
}
if (lost_stacks > 0)
std::cout << "Total " << lost_stacks << " stack-traces lost due to "
<< "hash collision or stack table full" << std::endl;
return 0;
}
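The output loop above handles stack ids in three ways. Non-negative ids are symbolized, -EFAULT is treated as "no stack available", and any other negative id is counted as lost. The following is a minimal Python sketch of that bookkeeping. It assumes stack ids are plain integers taken from the `counts` keys. `classify_stack_id` and `count_lost` are hypothetical helpers, not bcc APIs.

```python
import errno


def classify_stack_id(stack_id):
    """Mirror the C++ loop: 'ok' ids can be symbolized, 'unavailable' is the
    benign -EFAULT case, and any other negative id is a lost stack."""
    if stack_id >= 0:
        return "ok"
    if stack_id == -errno.EFAULT:
        return "unavailable"
    return "lost"


def count_lost(keys):
    """Count lost stacks over (kernel_stack, user_stack) id pairs."""
    return sum(1 for pair in keys for stack_id in pair
               if classify_stack_id(stack_id) == "lost")


# Arbitrary sample ids, chosen only to exercise each branch.
samples = [(12, 7), (-errno.EFAULT, 3), (-errno.EEXIST, -errno.ENOMEM)]
print(count_lost(samples))  # 2
```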
/*
* UseExternalMap shows how to access an external map through the
* C++ interface. The external map could be a pinned map.
* This example simulates the pinned map with a locally
* created map, obtained by calling the libbpf function bcc_create_map.
*
* Copyright (c) Facebook, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*/
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <iostream>
#include "BPF.h"
// Key type used when reading the "counts" hash table from C++
struct sched_switch_info {
int prev_pid;
int next_pid;
char prev_comm[16];
char next_comm[16];
};
#define CHECK(condition, msg) \
({ \
if (condition) { \
std::cerr << msg << std::endl; \
return 1; \
} \
})
const std::string BPF_PROGRAM = R"(
#include <linux/sched.h>
struct sched_switch_info {
int prev_pid;
int next_pid;
char prev_comm[16];
char next_comm[16];
};
BPF_TABLE("extern", u32, u32, control, 1);
BPF_HASH(counts, struct sched_switch_info, u32);
int on_sched_switch(struct tracepoint__sched__sched_switch *args) {
struct sched_switch_info key = {};
u32 zero = 0, *val;
/* only do something when control is on */
val = control.lookup(&zero);
if (!val || *val == 0)
return 0;
/* record sched_switch info in counts table */
key.prev_pid = args->prev_pid;
key.next_pid = args->next_pid;
__builtin_memcpy(&key.prev_comm, args->prev_comm, 16);
__builtin_memcpy(&key.next_comm, args->next_comm, 16);
val = counts.lookup_or_init(&key, &zero);
(*val)++;
return 0;
}
)";
static void print_counts(ebpf::BPF *bpfp, std::string msg) {
auto counts_table_hdl =
bpfp->get_hash_table<struct sched_switch_info, uint32_t>("counts");
printf("%s\n", msg.c_str());
printf("%-8s %-16s %-8s %-16s %-4s\n", "PREV_PID", "PREV_COMM",
"CURR_PID", "CURR_COMM", "CNT");
for (auto it : counts_table_hdl.get_table_offline()) {
printf("%-8d (%-16s) ==> %-8d (%-16s): %-4d\n", it.first.prev_pid,
it.first.prev_comm, it.first.next_pid, it.first.next_comm,
it.second);
}
}
int main() {
int ctrl_map_fd;
uint32_t val;
// create a map through bcc_create_map, bcc knows nothing about this map.
ctrl_map_fd = bcc_create_map(BPF_MAP_TYPE_ARRAY, "control", sizeof(uint32_t),
sizeof(uint32_t), 1, 0);
CHECK(ctrl_map_fd < 0, "bcc_create_map failure");
// populate control map into TableStorage
std::unique_ptr<ebpf::TableStorage> local_ts =
ebpf::createSharedTableStorage();
ebpf::Path global_path({"control"});
ebpf::TableDesc table_desc("control", ebpf::FileDesc(ctrl_map_fd),
BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
sizeof(uint32_t), 1, 0);
local_ts->Insert(global_path, std::move(table_desc));
// constructor with the pre-populated table storage
ebpf::BPF bpf(0, &*local_ts);
auto res = bpf.init(BPF_PROGRAM);
CHECK(res.code(), res.msg());
// attach to the tracepoint sched:sched_switch
res = bpf.attach_tracepoint("sched:sched_switch", "on_sched_switch");
CHECK(res.code(), res.msg());
// wait for some scheduling events
sleep(1);
auto control_table_hdl = bpf.get_array_table<uint32_t>("control");
res = control_table_hdl.get_value(0, val);
CHECK(res.code() || val != 0, res.msg());
// we should not see any events here
print_counts(&bpf, "events with control off:");
printf("\n");
// change the control to on so bpf program starts to count events
val = 1;
res = control_table_hdl.update_value(0, val);
CHECK(res.code(), res.msg());
// read the control value back to verify that it is now on
val = 0;
res = control_table_hdl.get_value(0, val);
CHECK(res.code() || val != 1, res.msg());
// wait for some scheduling events
sleep(1);
// we should see a bunch of events here
print_counts(&bpf, "events with control on:");
return 0;
}
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......@@ -8,4 +8,5 @@
from bcc import BPF
# This may not work on 4.17 x64 kernels; there you need to replace kprobe__sys_clone with kprobe____x64_sys_clone
BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
......@@ -27,5 +27,5 @@ return function(BPF)
b:get_table("events"):open_perf_buffer(print_readline, "struct { uint64_t pid; char str[80]; }", nil)
print("%-9s %-6s %s" % {"TIME", "PID", "COMMAND"})
b:perf_buffer_poll_loop()
end
......@@ -26,7 +26,7 @@ struct alloc_info_t {
BPF_HASH(sizes, u64);
BPF_HASH(allocs, u64, struct alloc_info_t);
BPF_STACK_TRACE(stack_traces, 10240);
int alloc_enter(struct pt_regs *ctx, size_t size)
{
......
......@@ -27,7 +27,7 @@ struct key_t {
};
BPF_HASH(counts, struct key_t);
BPF_HASH(start, u32);
BPF_STACK_TRACE(stack_traces, 10240);
int oncpu(struct pt_regs *ctx, struct task_struct *prev) {
u32 pid;
......
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......
......@@ -43,7 +43,7 @@ struct dns_char_t
} BPF_PACKET_HEADER;
struct Key {
unsigned char p[255];
};
struct Leaf {
......@@ -70,10 +70,6 @@ int dns_matching(struct __sk_buff *skb)
struct udp_t *udp = cursor_advance(cursor, sizeof(*udp));
if(udp->dport == 53){
struct dns_hdr_t *dns_hdr = cursor_advance(cursor, sizeof(*dns_hdr));
// Do nothing if packet is not a request.
......@@ -84,22 +80,24 @@ int dns_matching(struct __sk_buff *skb)
u16 i = 0;
struct dns_char_t *c;
#pragma unroll
for(i = 0; i<255;i++){
c = cursor_advance(cursor, 1);
if (c->c == 0)
break;
key.p[i] = c->c;
}
struct Leaf * lookup_leaf = cache.lookup(&key);
// If DNS name is contained in our map, keep the packet
if(lookup_leaf) {
bpf_trace_printk("Matched1\n");
return -1;
}
}
}
}
// Drop the packet
return 0;
}
......@@ -4,30 +4,46 @@ from __future__ import print_function
from bcc import BPF
from ctypes import *
import socket
import os
import struct
import sys
import fcntl
import dnslib
import argparse
def encode_dns(name):
# Wire format needs one length octet per label plus a terminating zero
# octet, i.e. len(name) + 2 bytes, and DNS names are limited to 255 bytes.
if len(name) + 2 > 255:
raise Exception("DNS Name too long.")
b = bytearray()
for element in name.split('.'):
sublen = len(element)
if sublen > 63:
raise ValueError('DNS label %s is too long' % element)
b.append(sublen)
b.extend(element.encode('ascii'))
b.append(0) # Terminate with the zero-length root label
return b
def add_cache_entry(cache, name):
key = cache.Key()
key_len = len(key.p)
name_buffer = encode_dns(name)
# Pad the buffer with null bytes if it is too short
name_buffer.extend((0,) * (key_len - len(name_buffer)))
key.p = (c_ubyte * key_len).from_buffer(name_buffer)
leaf = cache.Leaf()
leaf.p = (c_ubyte * 4).from_buffer(bytearray(4))
cache[key] = leaf
parser = argparse.ArgumentParser(usage='For detailed information about usage, '
'try with -h option')
req_args = parser.add_argument_group("Required arguments")
req_args.add_argument("-i", "--interface", type=str, default="",
help="Interface name, defaults to all if unspecified.")
req_args.add_argument("-d", "--domains", type=str, required=True, nargs="+",
help='List of domain names separated by space. For example: -d abc.def xyz.mno')
args = parser.parse_args()
# initialize BPF - load source code from dns_matching.c
bpf = BPF(src_file = "dns_matching.c", debug=0)
......@@ -39,19 +55,49 @@ bpf = BPF(src_file = "dns_matching.c", debug=0)
function_dns_matching = bpf.load_func("dns_matching", BPF.SOCKET_FILTER)
#create raw socket, bind it to user provided interface
#attach bpf program to socket created
BPF.attach_raw_socket(function_dns_matching, args.interface)
# Get the table.
cache = bpf.get_table("cache")
# Add cache entries
for e in args.domains:
print(">>>> Adding map entry: ", e)
add_cache_entry(cache, e)
print("\nTry to look up some domain names using nslookup from another terminal.")
print("For example: nslookup foo.bar")
print("\nBPF program will filter in DNS packets that match map entries.")
print("Packets received by the user space program will be printed here")
print("\nHit Ctrl+C to end...")
socket_fd = function_dns_matching.sock
fl = fcntl.fcntl(socket_fd, fcntl.F_GETFL)
fcntl.fcntl(socket_fd, fcntl.F_SETFL, fl & (~os.O_NONBLOCK))
while 1:
#retrieve raw packet from socket
try:
packet_str = os.read(socket_fd, 2048)
except KeyboardInterrupt:
sys.exit(0)
packet_bytearray = bytearray(packet_str)
ETH_HLEN = 14
UDP_HLEN = 8
#IP HEADER
#calculate ip header length
ip_header_length = packet_bytearray[ETH_HLEN] #load Byte
ip_header_length = ip_header_length & 0x0F #mask bits 0..3
ip_header_length = ip_header_length << 2 #shift to obtain length
#calculate payload offset
payload_offset = ETH_HLEN + ip_header_length + UDP_HLEN
payload = packet_bytearray[payload_offset:]
# pass the payload to dnslib for parsing
dnsrec = dnslib.DNSRecord.parse(payload)
print (dnsrec.questions, "\n")
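encode_dns() and add_cache_entry() build the map key in DNS wire format. Each label is prefixed with its length, a zero octet ends the name, and the rest of the 255-byte key is zero padding. Query names normally appear on the wire in the same length-prefixed form, which is why the BPF loop can copy bytes up to the zero octet and look the result up directly. The sketch below round-trips that layout. `encode_labels` and `decode_dns_key` are hypothetical helpers. Unlike encode_dns, this encoder also rejects empty labels.

```python
def encode_labels(name, key_len=255):
    """Encode name as length-prefixed labels plus a zero octet, zero-padded."""
    buf = bytearray()
    for label in name.split("."):
        if not 0 < len(label) <= 63:
            raise ValueError("bad DNS label: %r" % label)
        buf.append(len(label))
        buf.extend(label.encode("ascii"))
    buf.append(0)                      # zero-length root label ends the name
    if len(buf) > key_len:
        raise ValueError("encoded name does not fit in the key")
    buf.extend(b"\x00" * (key_len - len(buf)))
    return bytes(buf)


def decode_dns_key(key):
    """Follow the length prefixes up to the zero octet; return the dotted name."""
    labels = []
    i = 0
    while i < len(key) and key[i] != 0:
        length = key[i]
        labels.append(key[i + 1:i + 1 + length].decode("ascii"))
        i += 1 + length
    return ".".join(labels)


key = encode_labels("foo.bar")
print(key[:9])              # b'\x03foo\x03bar\x00'
print(len(key))             # 255
print(decode_dns_key(key))  # foo.bar
```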
......@@ -56,6 +56,19 @@ int http_filter(struct __sk_buff *skb) {
struct Key key;
struct Leaf zero = {0};
//calculate ip header length
//value to multiply * 4
//e.g. ip->hlen = 5 ; IP Header Length = 5 x 4 byte = 20 byte
ip_header_length = ip->hlen << 2; //SHL 2 -> *4 multiply
//check ip header length against minimum
if (ip_header_length < sizeof(*ip)) {
goto DROP;
}
//shift cursor forward for dynamic ip header size
void *_ = cursor_advance(cursor, (ip_header_length-sizeof(*ip)));
struct tcp_t *tcp = cursor_advance(cursor, sizeof(*tcp));
//retrieve ip src/dest and port src/dest of current packet
......@@ -65,17 +78,12 @@ int http_filter(struct __sk_buff *skb) {
key.dst_port = tcp->dst_port;
key.src_port = tcp->src_port;
//calculate tcp header length
//value to multiply *4
//e.g. tcp->offset = 5 ; TCP Header Length = 5 x 4 byte = 20 byte
tcp_header_length = tcp->offset << 2; //SHL 2 -> *4 multiply
//calculate payload offset and length
payload_offset = ETH_HLEN + ip_header_length + tcp_header_length;
payload_length = ip->tlen - ip_header_length - tcp_header_length;
......@@ -91,11 +99,8 @@ int http_filter(struct __sk_buff *skb) {
//direct access to skb not allowed
unsigned long p[7];
int i = 0;
for (i = 0; i < 7; i++) {
p[i] = load_byte(skb , payload_offset + i);
}
//find a match with an HTTP message
......
......@@ -34,19 +34,27 @@ int http_filter(struct __sk_buff *skb) {
u32 payload_offset = 0;
u32 payload_length = 0;
//calculate ip header length
//value to multiply * 4
//e.g. ip->hlen = 5 ; IP Header Length = 5 x 4 byte = 20 byte
ip_header_length = ip->hlen << 2; //SHL 2 -> *4 multiply
//check ip header length against minimum
if (ip_header_length < sizeof(*ip)) {
goto DROP;
}
//shift cursor forward for dynamic ip header size
void *_ = cursor_advance(cursor, (ip_header_length-sizeof(*ip)));
struct tcp_t *tcp = cursor_advance(cursor, sizeof(*tcp));
//calculate tcp header length
//value to multiply *4
//e.g. tcp->offset = 5 ; TCP Header Length = 5 x 4 byte = 20 byte
tcp_header_length = tcp->offset << 2; //SHL 2 -> *4 multiply
//calculate payload offset and length
payload_offset = ETH_HLEN + ip_header_length + tcp_header_length;
payload_length = ip->tlen - ip_header_length - tcp_header_length;
......@@ -62,11 +70,8 @@ int http_filter(struct __sk_buff *skb) {
//direct access to skb not allowed
unsigned long p[7];
int i = 0;
for (i = 0; i < 7; i++) {
p[i] = load_byte(skb , payload_offset + i);
}
//find a match with an HTTP message
......
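Both http_filter hunks compute header lengths the same way. The IPv4 IHL and TCP data-offset fields count 32-bit words, so `<< 2` converts them to bytes. The new check drops packets whose IP header is shorter than the 20-byte minimum. The following Python sketch applies that arithmetic to a raw Ethernet frame. `ip_tcp_payload_bounds` is a hypothetical helper, and it reads multi-byte fields in network (big-endian) byte order.

```python
import struct

ETH_HLEN = 14
IP_MIN_HLEN = 20  # smallest legal IPv4 header (IHL = 5)


def ip_tcp_payload_bounds(frame):
    """Return (payload_offset, payload_length) for an Ethernet/IPv4/TCP frame,
    or None if the IP header length is below the minimum (the C code's DROP)."""
    ip_header_length = (frame[ETH_HLEN] & 0x0F) << 2      # IHL words -> bytes
    if ip_header_length < IP_MIN_HLEN:
        return None
    (total_length,) = struct.unpack_from("!H", frame, ETH_HLEN + 2)
    tcp_start = ETH_HLEN + ip_header_length
    tcp_header_length = (frame[tcp_start + 12] >> 4) << 2  # data offset -> bytes
    payload_offset = ETH_HLEN + ip_header_length + tcp_header_length
    payload_length = total_length - ip_header_length - tcp_header_length
    return payload_offset, payload_length


# Usage: Ethernet + IPv4 (IHL=5) + TCP (data offset=5) + 4-byte payload.
ip = bytearray(20)
ip[0] = 0x45                                  # version 4, IHL 5
struct.pack_into("!H", ip, 2, 20 + 20 + 4)    # IPv4 total length
ip[9] = 6                                     # protocol: TCP
tcp = bytearray(20)
tcp[12] = 5 << 4                              # data offset 5
frame = b"\x00" * 12 + b"\x08\x00" + bytes(ip) + bytes(tcp) + b"GET "
print(ip_tcp_payload_bounds(frame))           # (54, 4)
```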
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......
#!/usr/bin/python
#
# tc_perf_event.py Output skb and metadata through perf event
#
......@@ -54,8 +54,8 @@ def print_skb_event(cpu, data, size):
# Only print for echo request
if icmp_type == 128:
src_ip = bytes(bytearray(skb_event.raw[22:38]))
dst_ip = bytes(bytearray(skb_event.raw[38:54]))
print("%-3s %-32s %-12s 0x%08x" %
(cpu, socket.inet_ntop(socket.AF_INET6, src_ip),
socket.inet_ntop(socket.AF_INET6, dst_ip),
......@@ -77,9 +77,9 @@ try:
parent="ffff:fff3", classid=1, direct_action=True)
b["skb_events"].open_perf_buffer(print_skb_event)
print('Try: "ping6 ff02::1%me"\n')
print("%-3s %-32s %-12s %-10s" % ("CPU", "SRC IP", "DST IP", "Magic"))
while True:
b.perf_buffer_poll()
finally:
if "me" in locals(): ipr.link("del", index=me)
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......
......@@ -17,7 +17,7 @@ struct counters {
};
BPF_HASH(stats, struct ipkey, struct counters, 1024);
BPF_PROG_ARRAY(parser, 10);
enum cb_index {
CB_FLAGS = 0,
......
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......
# VLAN Filter #
This is an eBPF application that parses VXLAN packets and extracts the encapsulated VLAN packets to monitor traffic from each VLAN. The extracted packet header fields can be stored in a file or sent to a remote server via the Apache Kafka messaging system.
This example also includes a simulation of a multi-host environment, which can be set up with the test_setup.sh script. The sample script traffic.sh then sends traffic from a client on VLAN 100 on host1 to another client on host2, and from a client on VLAN 200 on host2 to another client on host1, while the vlan_filter application runs in parallel via the command 'python data-plane-tracing.py -i veth7'.
![picture](scenario.jpg)
### Usage Example ###
* $ sudo python data-plane-tracing.py
Timestamp | Host Name | Host IP | IP Version | Source Host IP | Dest Host IP | Source Host Port | Dest Host Port | VNI | Source VM MAC | Dest VM MAC | VLAN ID | Source VM IP | Dest VM IP | Protocol | Source VM Port | Dest VM Port | Packet Length |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2018-05-24 18:43:30.386228 | Box1 | x.x.x.x | 4 | 10.1.1.11 | 10.1.1.12 | 54836 | 4789 | 10 | fa:16:3e:ec:22:99 | fa:16:3e:1c:6f:2d | 100 | 192.168.100.11 | 192.168.100.12 | 6 | 1285 | 20302 | 1200
# Implementation overview #
The example application is split into two parts: an eBPF program that filters packets in the kernel, and a Python wrapper that performs additional processing in user space.
### First part: eBPF Filter ###
This component filters VXLAN packets.
The program is loaded as PROG_TYPE_SOCKET_FILTER and attached to a raw socket bound to eth0 (or to the interface given with -i).
Packets matching VXLAN filter are forwarded to the user space, while other packets are dropped.
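The filter's KEEP/DROP decision comes down to three checks: EtherType 0x0800 (IPv4), IP protocol 17 (UDP) and UDP destination port 4789 (VXLAN). The following user-space Python sketch makes the same decision, which can be handy for testing captured frames offline. `keep_packet` is a hypothetical helper. Unlike the C program, which advances the cursor by a fixed `sizeof(*ip)`, this sketch honors the IHL field.

```python
import struct

ETH_HLEN = 14
ETH_P_IP = 0x0800    # EtherType for IPv4
IPPROTO_UDP = 17
VXLAN_PORT = 4789    # IANA-assigned VXLAN UDP port


def keep_packet(frame):
    """Return True if the frame is IPv4/UDP to port 4789 (sketch of vlan_filter)."""
    if len(frame) < ETH_HLEN + 20:
        return False                                # too short for Eth + IPv4
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    if eth_type != ETH_P_IP:
        return False                                # DROP: not IPv4
    if frame[ETH_HLEN + 9] != IPPROTO_UDP:
        return False                                # DROP: not UDP
    # Honor the IHL field instead of assuming a 20-byte IP header.
    udp_start = ETH_HLEN + ((frame[ETH_HLEN] & 0x0F) << 2)
    if len(frame) < udp_start + 8:
        return False                                # truncated UDP header
    (dport,) = struct.unpack_from("!H", frame, udp_start + 2)
    return dport == VXLAN_PORT                      # KEEP only VXLAN


# Usage: minimal Ethernet + IPv4 (IHL=5, UDP) + UDP header to port 4789.
ip = bytearray(20)
ip[0], ip[9] = 0x45, IPPROTO_UDP
frame = b"\x00" * 12 + b"\x08\x00" + bytes(ip) + struct.pack("!HHHH", 40000, 4789, 8, 0)
print(keep_packet(frame))  # True
```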
### Python code in user space ###
The Python script reads the filtered raw packets from the socket and extracts the useful header fields. By default, each extracted record is appended to a file. Alternatively, records can be sent to a remote server via the Apache Kafka messaging system.
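The VXLAN, inner Ethernet and 802.1Q fields are packed big-endian, so multi-byte values must be combined with shifts and masks rather than added together. The VNI is a 24-bit field, the inner Ethernet header lists the destination MAC before the source MAC, and the VLAN ID is the low 12 bits of the TCI. The following Python sketch makes the same fixed-offset assumption as data-plane-tracing.py (a 20-byte outer IPv4 header). `parse_vxlan_vlan` is a hypothetical helper. The sample reuses the VNI and MACs from the usage example above. VLAN 300 is chosen because it does not fit in one byte.

```python
import struct

# Offsets assume a 20-byte outer IPv4 header without options.
VXLAN_OFF = 14 + 20 + 8          # VXLAN header follows Eth + IPv4 + UDP
INNER_ETH_OFF = VXLAN_OFF + 8    # encapsulated (VM) Ethernet frame
ETH_P_8021Q = 0x8100


def fmt_mac(raw):
    return ":".join("%02x" % b for b in raw)


def parse_vxlan_vlan(pkt):
    """Return (vni, src_mac, dst_mac, vlan_id) from a VXLAN-encapsulated frame."""
    # VNI: 24-bit big-endian field in VXLAN header bytes 4..6
    vni = int.from_bytes(bytes(pkt[VXLAN_OFF + 4:VXLAN_OFF + 7]), "big")
    # Ethernet puts the destination MAC before the source MAC
    dst_mac = fmt_mac(pkt[INNER_ETH_OFF:INNER_ETH_OFF + 6])
    src_mac = fmt_mac(pkt[INNER_ETH_OFF + 6:INNER_ETH_OFF + 12])
    # 802.1Q tag: TPID (0x8100) then TCI; the VLAN ID is the TCI's low 12 bits
    tpid, tci = struct.unpack_from("!HH", pkt, INNER_ETH_OFF + 12)
    vlan_id = tci & 0x0FFF if tpid == ETH_P_8021Q else None
    return vni, src_mac, dst_mac, vlan_id


# Usage: VNI 10, VLAN 300 with priority bits set.
pkt = bytearray(INNER_ETH_OFF + 18)
pkt[VXLAN_OFF] = 0x08                                   # "I" flag: VNI is valid
pkt[VXLAN_OFF + 4:VXLAN_OFF + 7] = (10).to_bytes(3, "big")
pkt[INNER_ETH_OFF:INNER_ETH_OFF + 6] = bytes.fromhex("fa163e1c6f2d")
pkt[INNER_ETH_OFF + 6:INNER_ETH_OFF + 12] = bytes.fromhex("fa163eec2299")
struct.pack_into("!HH", pkt, INNER_ETH_OFF + 12, ETH_P_8021Q, (3 << 13) | 300)
print(parse_vxlan_vlan(pkt))
# (10, 'fa:16:3e:ec:22:99', 'fa:16:3e:1c:6f:2d', 300)
```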
# How to execute this example application #
The VLAN Filter application can be run with one of the following commands:
* $ sudo python data-plane-tracing.py
* $ sudo python data-plane-tracing.py -i eth2 -k vc.manage.overcloud:9092
# How to install required dependencies #
* $ pip install kafka-python
* $ pip install netifaces
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
#define IP_TCP 6
#define IP_UDP 17
#define IP_ICMP 1
/*
In 802.3, both the source and destination addresses are 48-bit (6-byte) MAC addresses.
6 bytes (src) + 6 bytes (dst) + 2 bytes (type) = 14 bytes
*/
#define ETH_HLEN 14
/*eBPF program.
Keep only VXLAN packets (IPv4, UDP, destination port 4789)
if the program is loaded as PROG_TYPE_SOCKET_FILTER
and attached to a socket
return 0 -> DROP the packet
return -1 -> KEEP the packet and return it to user space (userspace can read it from the socket_fd )
*/
int vlan_filter(struct __sk_buff *skb) {
u8 *cursor = 0;
struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet));
//filter IP packets (ethernet type = 0x0800) 0x0800 is IPv4 packet
switch(ethernet->type){
case 0x0800: goto IP;
default: goto DROP;
}
IP: ;
struct ip_t *ip = cursor_advance(cursor, sizeof(*ip)); // IP header (datagram)
switch (ip->nextp){
case 17: goto UDP;
default: goto DROP;
}
UDP: ;
struct udp_t *udp = cursor_advance(cursor, sizeof(*udp));
switch (udp->dport) {
case 4789: goto KEEP;
default: goto DROP;
}
//keep the packet and send it to userspace by returning -1
KEEP:
return -1;
//drop the packet returning 0
DROP:
return 0;
}
\ No newline at end of file
#!/usr/bin/python
from __future__ import print_function
from bcc import BPF
import sys
import socket
import os
import argparse
import time
import netifaces as ni
from sys import argv
from kafka import KafkaProducer
from kafka.errors import KafkaError
from datetime import datetime
#args
def usage():
print("USAGE: %s [-i <if_name>]" % argv[0])
print("")
print("Try '%s -h' for more options." % argv[0])
exit()
#help
def help():
print("USAGE: %s [-i <if_name>][-k <kafka_server_name:kafka_port>]" % argv[0])
print("")
print("optional arguments:")
print(" -h print this help")
print(" -i if_name select interface if_name. Default is eth0")
print(" -k kafka_server_name select kafka server name. Default is to save to a file")
print(" If -k option is not specified, data will be saved to a file.")
print("")
print("examples:")
print(" data-plane-tracing # bind socket to eth0")
print(" data-plane-tracing -i eno2 -k vc.manage.overcloud:9092 # bind socket to eno2 and send data to kafka server in iovisor-topic.")
exit()
#arguments
interface="eth0"
kafkaserver=''
#check provided arguments
if len(argv) == 2:
if str(argv[1]) == '-h':
help()
else:
usage()
if len(argv) == 3:
if str(argv[1]) == '-i':
interface = argv[2]
elif str(argv[1]) == '-k':
kafkaserver = argv[2]
else:
usage()
if len(argv) == 5:
if str(argv[1]) == '-i':
interface = argv[2]
kafkaserver = argv[4]
elif str(argv[1]) == '-k':
kafkaserver = argv[2]
interface = argv[4]
else:
usage()
if len(argv) > 5:
usage()
print ("binding socket to '%s'" % interface)
#initialize BPF - load source code from data-plane-tracing.c
bpf = BPF(src_file = "data-plane-tracing.c", debug = 0)
#load eBPF program vlan_filter of type SOCKET_FILTER into the kernel eBPF vm
#more info about eBPF program types http://man7.org/linux/man-pages/man2/bpf.2.html
function_vlan_filter = bpf.load_func("vlan_filter", BPF.SOCKET_FILTER)
#create raw socket, bind it to the selected interface (eth0 by default)
#attach bpf program to socket created
BPF.attach_raw_socket(function_vlan_filter, interface)
#get file descriptor of the socket previously created inside BPF.attach_raw_socket
socket_fd = function_vlan_filter.sock
#create python socket object, from the file descriptor
sock = socket.fromfd(socket_fd,socket.PF_PACKET,socket.SOCK_RAW,socket.IPPROTO_IP)
#set it as blocking socket
sock.setblocking(True)
#get the interface IPv4 address; fall back to 127.0.0.1 if none is assigned
ni.ifaddresses(interface)
try:
ip = ni.ifaddresses(interface)[ni.AF_INET][0]['addr']
except (KeyError, IndexError):
ip = '127.0.0.1'
print("| Timestamp | Host Name | Host IP | IP Version | Source Host IP | Dest Host IP | Source Host Port | Dest Host Port | VNI | Source VM MAC | Dest VM MAC | VLAN ID | Source VM IP | Dest VM IP | Protocol | Source VM Port | Dest VM Port | Packet Length |")
while 1:
#retrieve raw packet from socket
packet_str = os.read(socket_fd, 2048)
#convert packet into bytearray
packet_bytearray = bytearray(packet_str)
#ethernet header length
ETH_HLEN = 14
#VXLAN header length
VXLAN_HLEN = 8
#VLAN header length
VLAN_HLEN = 4
#Inner TCP/UDP header length
TCP_HLEN = 20
UDP_HLEN = 8
#calculate packet total length
total_length = packet_bytearray[ETH_HLEN + 2] #load MSB
total_length = total_length << 8 #shift MSB
total_length = total_length + packet_bytearray[ETH_HLEN+3] #add LSB
#calculate ip header length
ip_header_length = packet_bytearray[ETH_HLEN] #load Byte
ip_header_length = ip_header_length & 0x0F #mask bits 0..3
ip_header_length = ip_header_length << 2 #shift to obtain length
#calculate payload offset
payload_offset = ETH_HLEN + ip_header_length + UDP_HLEN + VXLAN_HLEN
#parsing ip version from ip packet header
ipversion = str(bin(packet_bytearray[14])[2:5])
#parsing source ip address, destination ip address from ip packet header
src_host_ip = str(packet_bytearray[26]) + "." + str(packet_bytearray[27]) + "." + str(packet_bytearray[28]) + "." + str(packet_bytearray[29])
dest_host_ip = str(packet_bytearray[30]) + "." + str(packet_bytearray[31]) + "." + str(packet_bytearray[32]) + "." + str(packet_bytearray[33])
#parsing source port and destination port
src_host_port = packet_bytearray[34] << 8 | packet_bytearray[35]
dest_host_port = packet_bytearray[36] << 8 | packet_bytearray[37]
#parsing VNI from VXLAN header (24-bit big-endian field in bytes 46..48)
VNI = str((packet_bytearray[46] << 16) | (packet_bytearray[47] << 8) | packet_bytearray[48])
#parsing destination and source mac address (the inner Ethernet header carries the destination first)
dest_vm_mac = ":".join(map(lambda b: format(b, "02x"), packet_bytearray[50:56]))
src_vm_mac = ":".join(map(lambda b: format(b, "02x"), packet_bytearray[56:62]))
#parsing VLAN ID: low 12 bits of the 802.1Q TCI (bytes 64..65)
VLANID = str(((packet_bytearray[64] & 0x0F) << 8) | packet_bytearray[65])
#parsing source vm ip address, destination vm ip address from encapsulated ip packet header
src_vm_ip = str(packet_bytearray[80]) + "." + str(packet_bytearray[81]) + "." + str(packet_bytearray[82]) + "." + str(packet_bytearray[83])
dest_vm_ip = str(packet_bytearray[84]) + "." + str(packet_bytearray[85]) + "." + str(packet_bytearray[86]) + "." + str(packet_bytearray[87])
#parsing source port and destination port
if (packet_bytearray[77]==6 or packet_bytearray[77]==17):
src_vm_port = packet_bytearray[88] << 8 | packet_bytearray[89]
dest_vm_port = packet_bytearray[90] << 8 | packet_bytearray[91]
elif (packet_bytearray[77]==1):
src_vm_port = -1
dest_vm_port = -1
icmp_type = str(packet_bytearray[88])
else:
continue
timestamp = str(datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S.%f'))
#send data to remote server via Kafka Messaging Bus
if kafkaserver:
MESSAGE = (timestamp, socket.gethostname(),ip, str(int(ipversion, 2)), str(src_host_ip), str(dest_host_ip), str(src_host_port), str(dest_host_port), str(int(VNI)), str(src_vm_mac), str(dest_vm_mac), str(int(VLANID)), src_vm_ip, dest_vm_ip, str(packet_bytearray[77]), str(src_vm_port), str(dest_vm_port), str(total_length))
print (MESSAGE)
MESSAGE = ','.join(MESSAGE)
MESSAGE = MESSAGE.encode()
producer = KafkaProducer(bootstrap_servers=[kafkaserver])
producer.send('iovisor-topic', key=b'iovisor', value=MESSAGE)
#save data to files
else:
MESSAGE = timestamp+","+socket.gethostname()+","+ip+","+str(int(ipversion, 2))+","+src_host_ip+","+dest_host_ip+","+str(src_host_port)+","+str(dest_host_port)+","+str(int(VNI))+","+str(src_vm_mac)+","+str(dest_vm_mac)+","+str(int(VLANID))+","+src_vm_ip+","+dest_vm_ip+","+str(packet_bytearray[77])+","+str(src_vm_port)+","+str(dest_vm_port)+","+str(total_length)
print (MESSAGE)
#save data to a file on hour basis
filename = "./vlan-data-"+time.strftime("%Y-%m-%d-%H")+"-00"
with open(filename, "a") as f:
f.write("%s\n" % MESSAGE)
#!/bin/bash
# This script must be executed by root user
if [ "$(id -u)" != "0" ]; then
echo "This script must be run as root" 1>&2
exit 1
fi
# add namespaces
ip netns add netns11
ip netns add netns12
ip netns add netns21
ip netns add netns22
ip netns add netns3
ip netns add netns4
# set up veth devices in netns11 and netns21 with connection to netns3
ip link add veth11 type veth peer name veth13
ip link add veth21 type veth peer name veth23
ip link set veth11 netns netns11
ip link set veth21 netns netns21
ip link set veth13 netns netns3
ip link set veth23 netns netns3
# set up veth devices in netns12 and netns22 with connection to netns4
ip link add veth12 type veth peer name veth14
ip link add veth22 type veth peer name veth24
ip link set veth12 netns netns12
ip link set veth22 netns netns22
ip link set veth14 netns netns4
ip link set veth24 netns netns4
# assign IP addresses and set the devices up
ip netns exec netns11 ifconfig veth11 192.168.100.11/24 up
ip netns exec netns11 ip link set lo up
ip netns exec netns12 ifconfig veth12 192.168.100.12/24 up
ip netns exec netns12 ip link set lo up
ip netns exec netns21 ifconfig veth21 192.168.200.21/24 up
ip netns exec netns21 ip link set lo up
ip netns exec netns22 ifconfig veth22 192.168.200.22/24 up
ip netns exec netns22 ip link set lo up
# set up bridge brx and its ports
ip netns exec netns3 brctl addbr brx
ip netns exec netns3 ip link set brx up
ip netns exec netns3 ip link set veth13 up
ip netns exec netns3 ip link set veth23 up
ip netns exec netns3 brctl addif brx veth13
ip netns exec netns3 brctl addif brx veth23
# set up bridge bry and its ports
ip netns exec netns4 brctl addbr bry
ip netns exec netns4 ip link set bry up
ip netns exec netns4 ip link set veth14 up
ip netns exec netns4 ip link set veth24 up
ip netns exec netns4 brctl addif bry veth14
ip netns exec netns4 brctl addif bry veth24
# create veth devices to connect the bridges
ip link add vethx type veth peer name vethx11
ip link add vethy type veth peer name vethy11
ip link set vethx netns netns3
ip link set vethx11 netns netns3
ip link set vethy netns netns4
ip link set vethy11 netns netns4
ip netns exec netns3 brctl addif brx vethx
ip netns exec netns3 ip link set vethx up
ip netns exec netns3 bridge vlan add vid 100 tagged dev vethx
ip netns exec netns3 bridge vlan add vid 200 tagged dev vethx
ip netns exec netns3 bridge vlan del vid 1 dev vethx
ip netns exec netns3 bridge vlan show
ip netns exec netns4 brctl addif bry vethy
ip netns exec netns4 ip link set vethy up
ip netns exec netns4 bridge vlan add vid 100 tagged dev vethy
ip netns exec netns4 bridge vlan add vid 200 tagged dev vethy
ip netns exec netns4 bridge vlan del vid 1 dev vethy
ip netns exec netns4 bridge vlan show
ip netns exec netns3 ip link set dev brx type bridge vlan_filtering 1
ip netns exec netns4 ip link set dev bry type bridge vlan_filtering 1
ip netns exec netns3 bridge vlan del vid 1 dev brx self
ip netns exec netns4 bridge vlan del vid 1 dev bry self
ip netns exec netns3 bridge vlan show
ip netns exec netns4 bridge vlan show
ip netns exec netns3 bridge vlan add vid 100 pvid untagged dev veth13
ip netns exec netns3 bridge vlan add vid 200 pvid untagged dev veth23
ip netns exec netns4 bridge vlan add vid 100 pvid untagged dev veth14
ip netns exec netns4 bridge vlan add vid 200 pvid untagged dev veth24
ip netns exec netns3 bridge vlan del vid 1 dev veth13
ip netns exec netns3 bridge vlan del vid 1 dev veth23
ip netns exec netns4 bridge vlan del vid 1 dev veth14
ip netns exec netns4 bridge vlan del vid 1 dev veth24
# set up bridge brvx and its ports
ip netns exec netns3 brctl addbr brvx
ip netns exec netns3 ip link set brvx up
ip netns exec netns3 ip link set vethx11 up
ip netns exec netns3 brctl addif brvx vethx11
# set up bridge brvy and its ports
ip netns exec netns4 brctl addbr brvy
ip netns exec netns4 ip link set brvy up
ip netns exec netns4 ip link set vethy11 up
ip netns exec netns4 brctl addif brvy vethy11
# create veth devices to connect the vxlan bridges
ip link add veth3 type veth peer name veth4
ip link add veth5 type veth peer name veth6
ip link set veth3 netns netns3
ip link set veth5 netns netns4
ip netns exec netns3 ip link set veth3 up
ip netns exec netns4 ip link set veth5 up
ip link set veth4 up
ip link set veth6 up
ip netns exec netns3 ifconfig veth3 10.1.1.11/24 up
ip netns exec netns4 ifconfig veth5 10.1.1.12/24 up
# add vxlan ports
ip netns exec netns3 ip link add vxlan-10 type vxlan id 10 remote 10.1.1.12 dstport 4789 dev veth3
ip netns exec netns4 ip link add vxlan-10 type vxlan id 10 remote 10.1.1.11 dstport 4789 dev veth5
ip netns exec netns3 ip link set vxlan-10 up
ip netns exec netns4 ip link set vxlan-10 up
ip netns exec netns3 brctl addif brvx vxlan-10
ip netns exec netns4 brctl addif brvy vxlan-10
# create veth devices to connect the brjx and brjy bridges
ip link add veth7 type veth peer name veth8
ip link set veth7 up
ip link set veth8 up
# set up bridge brjx and its ports
brctl addbr brjx
ip link set brjx up
ip link set veth4 up
brctl addif brjx veth4
brctl addif brjx veth7
# set up bridge brjy and its ports
brctl addbr brjy
ip link set brjy up
ip link set veth6 up
brctl addif brjy veth6
brctl addif brjy veth8
#!/bin/bash
ip netns exec netns11 ping 192.168.100.12 -c 10
ip netns exec netns22 ping 192.168.200.21 -c 10
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......
#!/usr/bin/python
#
# xdp_drop_count.py Drop incoming packets on XDP layer and count for which
# protocol type
......@@ -96,6 +96,9 @@ int xdp_prog1(struct CTXTYPE *ctx) {
h_proto = eth->h_proto;
// parse double vlans
#pragma unroll
for (int i=0; i<2; i++) {
if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
struct vlan_hdr *vhdr;
......@@ -105,14 +108,6 @@ int xdp_prog1(struct CTXTYPE *ctx) {
return rc;
h_proto = vhdr->h_vlan_encapsulated_proto;
}
}
if (h_proto == htons(ETH_P_IP))
......
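The new loop replaces two copied if-blocks with a bounded loop. The loop skips at most two VLAN tags (802.1Q 0x8100 or 802.1ad 0x88A8) and checks bounds before each read, as the verifier requires. The following Python sketch performs the same walk over a raw frame. `inner_proto` is a hypothetical helper that returns the inner EtherType and network-header offset. It returns None on a truncated frame, the case where the XDP program returns rc.

```python
import struct

ETH_HLEN = 14
ETH_P_8021Q = 0x8100   # 802.1Q VLAN tag
ETH_P_8021AD = 0x88A8  # 802.1ad (QinQ) service tag
VLAN_HLEN = 4          # struct vlan_hdr: TCI + encapsulated protocol


def inner_proto(frame, max_tags=2):
    """Skip up to max_tags VLAN tags; return (ethertype, nh_off) or None."""
    if len(frame) < ETH_HLEN:
        return None
    (h_proto,) = struct.unpack_from("!H", frame, 12)
    nh_off = ETH_HLEN
    for _ in range(max_tags):          # fixed bound, like the unrolled C loop
        if h_proto not in (ETH_P_8021Q, ETH_P_8021AD):
            break
        if len(frame) < nh_off + VLAN_HLEN:
            return None                # truncated tag: the C code returns rc
        (h_proto,) = struct.unpack_from("!H", frame, nh_off + 2)
        nh_off += VLAN_HLEN
    return h_proto, nh_off


# Usage: QinQ frame (802.1ad outer tag, 802.1Q inner tag) carrying IPv4.
qinq = b"\x00" * 12 + struct.pack("!5H", ETH_P_8021AD, 100, ETH_P_8021Q, 200, 0x0800)
print(inner_proto(qinq))  # (2048, 22)
```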
#!/usr/bin/python
#
# xdp_macswap_count.py Swap Source and Destination MAC addresses on
# incoming packets and transmit packets back on
# same interface in XDP layer and count for which
# protocol type
#
# Copyright (c) 2016 PLUMgrid
# Copyright (c) 2016 Jan Ruth
# Copyright (c) 2018 Andy Gospodarek
# Licensed under the Apache License, Version 2.0 (the "License")
from bcc import BPF
import pyroute2
import time
import sys
flags = 0
def usage():
print("Usage: {0} [-S] <ifdev>".format(sys.argv[0]))
print(" -S: use skb mode\n")
print("e.g.: {0} eth0\n".format(sys.argv[0]))
exit(1)
if len(sys.argv) < 2 or len(sys.argv) > 3:
usage()
if len(sys.argv) == 2:
device = sys.argv[1]
if len(sys.argv) == 3:
if "-S" in sys.argv:
# XDP_FLAGS_SKB_MODE
flags |= 2 << 0
if "-S" == sys.argv[1]:
device = sys.argv[2]
else:
device = sys.argv[1]
mode = BPF.XDP
#mode = BPF.SCHED_CLS
if mode == BPF.XDP:
ret = "XDP_TX"
ctxtype = "xdp_md"
else:
ret = "TC_ACT_SHOT"
ctxtype = "__sk_buff"
# load BPF program
b = BPF(text = """
#define KBUILD_MODNAME "foo"
#include <uapi/linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <linux/if_vlan.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
BPF_TABLE("percpu_array", uint32_t, long, dropcnt, 256);
static inline int parse_ipv4(void *data, u64 nh_off, void *data_end) {
struct iphdr *iph = data + nh_off;
if ((void*)&iph[1] > data_end)
return 0;
return iph->protocol;
}
static inline int parse_ipv6(void *data, u64 nh_off, void *data_end) {
struct ipv6hdr *ip6h = data + nh_off;
if ((void*)&ip6h[1] > data_end)
return 0;
return ip6h->nexthdr;
}
static void swap_src_dst_mac(void *data)
{
unsigned short *p = data;
unsigned short dst[3];
dst[0] = p[0];
dst[1] = p[1];
dst[2] = p[2];
p[0] = p[3];
p[1] = p[4];
p[2] = p[5];
p[3] = dst[0];
p[4] = dst[1];
p[5] = dst[2];
}
int xdp_prog1(struct CTXTYPE *ctx) {
void* data_end = (void*)(long)ctx->data_end;
void* data = (void*)(long)ctx->data;
struct ethhdr *eth = data;
// default verdict, passed in from Python via -DRETURNCODE
int rc = RETURNCODE; // XDP_PASS lets packets through, XDP_TX sends them back out
long *value;
uint16_t h_proto;
uint64_t nh_off = 0;
uint32_t index;
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
return rc;
h_proto = eth->h_proto;
if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
struct vlan_hdr *vhdr;
vhdr = data + nh_off;
nh_off += sizeof(struct vlan_hdr);
if (data + nh_off > data_end)
return rc;
h_proto = vhdr->h_vlan_encapsulated_proto;
}
if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
struct vlan_hdr *vhdr;
vhdr = data + nh_off;
nh_off += sizeof(struct vlan_hdr);
if (data + nh_off > data_end)
return rc;
h_proto = vhdr->h_vlan_encapsulated_proto;
}
if (h_proto == htons(ETH_P_IP))
index = parse_ipv4(data, nh_off, data_end);
else if (h_proto == htons(ETH_P_IPV6))
index = parse_ipv6(data, nh_off, data_end);
else
index = 0;
// index holds the IP protocol number parsed above; h_proto is the ethertype
if (index == IPPROTO_UDP) {
swap_src_dst_mac(data);
rc = XDP_TX;
}
value = dropcnt.lookup(&index);
if (value)
*value += 1;
return rc;
}
""", cflags=["-w", "-DRETURNCODE=%s" % ret, "-DCTXTYPE=%s" % ctxtype])
fn = b.load_func("xdp_prog1", mode)
if mode == BPF.XDP:
b.attach_xdp(device, fn, flags)
else:
ip = pyroute2.IPRoute()
ipdb = pyroute2.IPDB(nl=ip)
idx = ipdb.interfaces[device].index
ip.tc("add", "clsact", idx)
ip.tc("add-filter", "bpf", idx, ":1", fd=fn.fd, name=fn.name,
parent="ffff:fff2", classid=1, direct_action=True)
dropcnt = b.get_table("dropcnt")
prev = [0] * 256
print("Printing packets per IP protocol number, hit CTRL+C to stop")
while 1:
try:
for k in dropcnt.keys():
val = dropcnt.sum(k).value
i = k.value
if val:
delta = val - prev[i]
prev[i] = val
print("{}: {} pkt/s".format(i, delta))
time.sleep(1)
except KeyboardInterrupt:
print("Removing filter from device")
break
if mode == BPF.XDP:
b.remove_xdp(device, flags)
else:
ip.tc("del", "clsact", idx)
ipdb.release()
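The per-packet logic of xdp_macswap_count.py can be modeled in plain Python. This makes it possible to check the header offsets without loading a BPF program. The following is a minimal sketch, not part of bcc, and the function names are hypothetical. Like the BPF code, it walks at most two VLAN tags. It decides on UDP using the L4 protocol number, not the ethertype.

```python
import struct

ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
ETH_P_8021Q = 0x8100
ETH_P_8021AD = 0x88A8
IPPROTO_UDP = 17
ETH_HLEN = 14   # sizeof(struct ethhdr)
VLAN_HLEN = 4   # sizeof(struct vlan_hdr): TCI + encapsulated proto


def l4_protocol(frame: bytes) -> int:
    """Return the IPv4 protocol / IPv6 next-header byte, or 0 if unknown."""
    if len(frame) < ETH_HLEN:
        return 0
    (h_proto,) = struct.unpack_from("!H", frame, 12)
    nh_off = ETH_HLEN
    # Skip at most two VLAN tags, like the two checks in the BPF program.
    for _ in range(2):
        if h_proto in (ETH_P_8021Q, ETH_P_8021AD):
            if len(frame) < nh_off + VLAN_HLEN:
                return 0
            (h_proto,) = struct.unpack_from("!H", frame, nh_off + 2)
            nh_off += VLAN_HLEN
    if h_proto == ETH_P_IP and len(frame) >= nh_off + 20:
        return frame[nh_off + 9]    # iphdr->protocol
    if h_proto == ETH_P_IPV6 and len(frame) >= nh_off + 40:
        return frame[nh_off + 6]    # ipv6hdr->nexthdr
    return 0


def swap_src_dst_mac(frame: bytearray) -> None:
    """Swap the 6-byte destination and source MAC addresses in place."""
    frame[0:6], frame[6:12] = frame[6:12], frame[0:6]
```

In the BPF version, the counter index is the value of `l4_protocol()`. The MAC swap happens only when that value equals `IPPROTO_UDP`.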
#!/usr/bin/python
#
# xdp_redirect_cpu.py Redirect the incoming packet to the specific CPU
#
# Copyright (c) 2018 Gary Lin
# Licensed under the Apache License, Version 2.0 (the "License")
from bcc import BPF
import time
import sys
from multiprocessing import cpu_count
import ctypes as ct
flags = 0
def usage():
print("Usage: {0} <in ifdev> <CPU id>".format(sys.argv[0]))
print("e.g.: {0} eth0 2\n".format(sys.argv[0]))
exit(1)
if len(sys.argv) != 3:
usage()
in_if = sys.argv[1]
cpu_id = int(sys.argv[2])
max_cpu = cpu_count()
# valid CPU ids are 0 .. max_cpu - 1 (cpumap has max_cpu entries)
if cpu_id < 0 or cpu_id >= max_cpu:
print("Invalid CPU id")
exit(1)
# load BPF program
b = BPF(text = """
#define KBUILD_MODNAME "foo"
#include <uapi/linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
BPF_CPUMAP(cpumap, __MAX_CPU__);
BPF_ARRAY(dest, uint32_t, 1);
BPF_PERCPU_ARRAY(rxcnt, long, 1);
int xdp_redirect_cpu(struct xdp_md *ctx) {
void* data_end = (void*)(long)ctx->data_end;
void* data = (void*)(long)ctx->data;
struct ethhdr *eth = data;
uint32_t key = 0;
long *value;
uint32_t *cpu;
uint64_t nh_off;
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
return XDP_DROP;
cpu = dest.lookup(&key);
if (!cpu)
return XDP_PASS;
value = rxcnt.lookup(&key);
if (value)
*value += 1;
return cpumap.redirect_map(*cpu, 0);
}
int xdp_dummy(struct xdp_md *ctx) {
return XDP_PASS;
}
""", cflags=["-w", "-D__MAX_CPU__=%u" % max_cpu], debug=0)
dest = b.get_table("dest")
dest[0] = ct.c_uint32(cpu_id)
cpumap = b.get_table("cpumap")
# the cpumap value is the queue size for the target CPU
cpumap[cpu_id] = ct.c_uint32(192)
in_fn = b.load_func("xdp_redirect_cpu", BPF.XDP)
b.attach_xdp(in_if, in_fn, flags)
rxcnt = b.get_table("rxcnt")
prev = 0
print("Printing redirected packets, hit CTRL+C to stop")
while 1:
try:
val = rxcnt.sum(0).value
if val:
delta = val - prev
prev = val
print("{} pkt/s to CPU {}".format(delta, cpu_id))
time.sleep(1)
except KeyboardInterrupt:
print("Removing filter from device")
break
b.remove_xdp(in_if, flags)
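The reporting loop turns a cumulative per-CPU counter into a rate. It sums the per-CPU slots (the value `rxcnt.sum(0)` returns), subtracts the previous total, and prints the difference once per iteration. Below is a sketch of just that arithmetic. It assumes each snapshot is a list of per-CPU values, and `report_rates` is a hypothetical name.

```python
from typing import Iterable, List, Sequence


def report_rates(snapshots: Iterable[Sequence[int]], cpu_id: int) -> List[str]:
    """Reproduce the script's output lines for a series of per-CPU snapshots."""
    prev = 0
    lines = []
    for per_cpu in snapshots:
        val = sum(per_cpu)          # like rxcnt.sum(0).value
        if val:                     # the script prints nothing while the total is 0
            delta = val - prev
            prev = val
            lines.append("{} pkt/s to CPU {}".format(delta, cpu_id))
    return lines
```

The delta equals packets per second only insofar as `time.sleep(1)` matches the real polling interval. The time spent reading the map is not accounted for.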
#!/usr/bin/python
#
# xdp_redirect_map.py Redirect the incoming packet to another interface
# with the helper: bpf_redirect_map()
#
# Copyright (c) 2018 Gary Lin
# Licensed under the Apache License, Version 2.0 (the "License")
from bcc import BPF
import pyroute2
import time
import sys
import ctypes as ct
flags = 0
def usage():
print("Usage: {0} <in ifdev> <out ifdev>".format(sys.argv[0]))
print("e.g.: {0} eth0 eth1\n".format(sys.argv[0]))
exit(1)
if len(sys.argv) != 3:
usage()
in_if = sys.argv[1]
out_if = sys.argv[2]
ip = pyroute2.IPRoute()
out_idx = ip.link_lookup(ifname=out_if)[0]
# load BPF program
b = BPF(text = """
#define KBUILD_MODNAME "foo"
#include <uapi/linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
BPF_DEVMAP(tx_port, 1);
BPF_PERCPU_ARRAY(rxcnt, long, 1);
static inline void swap_src_dst_mac(void *data)
{
unsigned short *p = data;
unsigned short dst[3];
dst[0] = p[0];
dst[1] = p[1];
dst[2] = p[2];
p[0] = p[3];
p[1] = p[4];
p[2] = p[5];
p[3] = dst[0];
p[4] = dst[1];
p[5] = dst[2];
}
int xdp_redirect_map(struct xdp_md *ctx) {
void* data_end = (void*)(long)ctx->data_end;
void* data = (void*)(long)ctx->data;
struct ethhdr *eth = data;
uint32_t key = 0;
long *value;
uint64_t nh_off;
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
return XDP_DROP;
value = rxcnt.lookup(&key);
if (value)
*value += 1;
swap_src_dst_mac(data);
return tx_port.redirect_map(0, 0);
}
int xdp_dummy(struct xdp_md *ctx) {
return XDP_PASS;
}
""", cflags=["-w"])
tx_port = b.get_table("tx_port")
tx_port[0] = ct.c_int(out_idx)
in_fn = b.load_func("xdp_redirect_map", BPF.XDP)
out_fn = b.load_func("xdp_dummy", BPF.XDP)
b.attach_xdp(in_if, in_fn, flags)
b.attach_xdp(out_if, out_fn, flags)
rxcnt = b.get_table("rxcnt")
prev = 0
print("Printing redirected packets, hit CTRL+C to stop")
while 1:
try:
val = rxcnt.sum(0).value
if val:
delta = val - prev
prev = val
print("{} pkt/s".format(delta))
time.sleep(1)
except KeyboardInterrupt:
print("Removing filter from device")
break
b.remove_xdp(in_if, flags)
b.remove_xdp(out_if, flags)
......@@ -3,16 +3,17 @@
# bitehist.py Block I/O size histogram.
# For Linux, uses BCC, eBPF. Embedded C.
#
# Written as a basic example of using a histogram to show a distribution.
# Written as a basic example of using histograms to show a distribution.
#
# The default interval is 5 seconds. A Ctrl-C will print the partially
# gathered histogram then exit.
# A Ctrl-C will print the gathered histogram then exit.
#
# Copyright (c) 2015 Brendan Gregg.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 15-Aug-2015 Brendan Gregg Created this.
# 03-Feb-2019 Xiaozhou Liu added linear histogram.
from __future__ import print_function
from bcc import BPF
from time import sleep
......@@ -22,10 +23,12 @@ b = BPF(text="""
#include <linux/blkdev.h>
BPF_HISTOGRAM(dist);
BPF_HISTOGRAM(dist_linear);
int kprobe__blk_account_io_completion(struct pt_regs *ctx, struct request *req)
{
dist.increment(bpf_log2l(req->__data_len / 1024));
dist_linear.increment(req->__data_len / 1024);
return 0;
}
""")
......@@ -37,7 +40,13 @@ print("Tracing... Hit Ctrl-C to end.")
try:
sleep(99999999)
except KeyboardInterrupt:
print
print()
# output
print("log2 histogram")
print("~~~~~~~~~~~~~~")
b["dist"].print_log2_hist("kbytes")
print("\nlinear histogram")
print("~~~~~~~~~~~~~~~~")
b["dist_linear"].print_linear_hist("kbytes")
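bitehist.py now fills two histograms from the same I/O size. `dist` is keyed by a log2 slot, and `dist_linear` is keyed by the size in kbytes itself. The bucketing can be sketched in Python under two assumptions:
- bcc's `bpf_log2l()` yields floor(log2(v)) + 1, with 0 landing in the lowest slot.
- `print_log2_hist` labels slot i as the range [2^(i-1), 2^i - 1], shown as 0 -> 1 for the first slot.

Treat both as approximations of bcc's behaviour; the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def log2_slot(v: int) -> int:
    """Assumed model of bpf_log2l(): floor(log2(v)) + 1, with 0 in slot 1."""
    return max(1, v.bit_length())


def slot_range(slot: int) -> Tuple[int, int]:
    """Assumed value range printed for a log2 slot."""
    low, high = (1 << slot) >> 1, (1 << slot) - 1
    if low == high:
        low -= 1
    return low, high


def bitehist(sizes_bytes: Iterable[int]) -> Tuple[Counter, Counter]:
    """Bucket request sizes the way the two BPF histograms do."""
    log2_hist, linear_hist = Counter(), Counter()
    for size in sizes_bytes:
        kbytes = size // 1024          # same unit conversion as the BPF program
        log2_hist[log2_slot(kbytes)] += 1
        linear_hist[kbytes] += 1
    return log2_hist, linear_hist
```

The log2 histogram keeps the output short for widely spread sizes. The linear one shows exact kbyte values, at the cost of one row per distinct size.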
#!/usr/bin/python
#
# dddos.py DDOS detection system.
#
# Written as a basic tracing example of using eBPF
# to detect a potential DDOS attack against a system.
#
# Copyright (c) 2019 Jugurtha BELKALEM.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 14-Jan-2019 Jugurtha BELKALEM Created this.
from bcc import BPF
import ctypes as ct
import datetime
prog = """
#include <linux/skbuff.h>
#include <uapi/linux/ip.h>
#define MAX_NB_PACKETS 1000
#define LEGAL_DIFF_TIMESTAMP_PACKETS 1000000
BPF_HASH(rcv_packets);
struct detectionPackets {
u64 nb_ddos_packets;
};
BPF_PERF_OUTPUT(events);
int detect_ddos(struct pt_regs *ctx, void *skb){
struct detectionPackets detectionPacket = {};
// Used to count number of received packets
u64 rcv_packets_nb_index = 0, rcv_packets_nb_inter=1, *rcv_packets_nb_ptr;
// Used to measure elapsed time between 2 successive received packets
u64 rcv_packets_ts_index = 1, rcv_packets_ts_inter=0, *rcv_packets_ts_ptr;
/* The algorithm analyses packets received by the ip_rcv function
* and measures the difference in reception time between successive packets.
* DDOS flooders send millions of packets, so the difference in
* timestamp between 2 successive packets is very small
* (unlike the behaviour of regular applications).
* This script looks for this difference in time and, if it sees
* more than MAX_NB_PACKETS successive packets with a difference
* of timestamp between each one of them less than
* LEGAL_DIFF_TIMESTAMP_PACKETS ns,
* ------------------ It Triggers an ALERT -----------------
* Those settings must be adapted depending on regular network traffic
* -------------------------------------------------------------------
* Important: this is a rudimentary intrusion detection system; one can
* test a real attack using hping3. However, if regular network
* traffic increases above the predefined detection settings, a false
* positive alert will be triggered (an example would be the
* case of large file downloads).
*/
rcv_packets_nb_ptr = rcv_packets.lookup(&rcv_packets_nb_index);
rcv_packets_ts_ptr = rcv_packets.lookup(&rcv_packets_ts_index);
if(rcv_packets_nb_ptr != 0 && rcv_packets_ts_ptr != 0){
rcv_packets_nb_inter = *rcv_packets_nb_ptr;
rcv_packets_ts_inter = bpf_ktime_get_ns() - *rcv_packets_ts_ptr;
if(rcv_packets_ts_inter < LEGAL_DIFF_TIMESTAMP_PACKETS){
rcv_packets_nb_inter++;
} else {
rcv_packets_nb_inter = 0;
}
if(rcv_packets_nb_inter > MAX_NB_PACKETS){
detectionPacket.nb_ddos_packets = rcv_packets_nb_inter;
events.perf_submit(ctx, &detectionPacket, sizeof(detectionPacket));
}
}
rcv_packets_ts_inter = bpf_ktime_get_ns();
rcv_packets.update(&rcv_packets_nb_index, &rcv_packets_nb_inter);
rcv_packets.update(&rcv_packets_ts_index, &rcv_packets_ts_inter);
return 0;
}
"""
# Loads eBPF program
b = BPF(text=prog)
# Attach a kprobe to the kernel function and set detect_ddos as the kprobe handler
b.attach_kprobe(event="ip_rcv", fn_name="detect_ddos")
class DetectionTimestamp(ct.Structure):
_fields_ = [("nb_ddos_packets", ct.c_ulonglong)]
# Show message when eBPF starts
print("DDOS detector started ... Hit Ctrl-C to end!")
print("%-26s %-10s" % ("TIME(s)", "MESSAGE"))
def trigger_alert_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(DetectionTimestamp)).contents
print("%-26s %s %ld" % (datetime.datetime.now(),
"DDOS Attack => nb of packets up to now : ", event.nb_ddos_packets))
# loop with callback to trigger_alert_event
b["events"].open_perf_buffer(trigger_alert_event)
while 1:
try:
b.perf_buffer_poll()
except KeyboardInterrupt:
exit()
Demonstrations of dddos.py, the Linux eBPF/bcc version.
This traces the ip_rcv function (using a kprobe) and measures the elapsed time
between received packets to detect potential DDOS attacks.
The following steps illustrate the usage of dddos:
1 - Start dddos.py :
# ./dddos.py
DDOS detector started ... Hit Ctrl-C to end!
TIME(s) MESSAGE
2 - Launch hping3 (or any other flooder) in another terminal as shown below:
# hping3 localhost -S -A -V -p 443 -i u100
3 - dddos.py triggers alerts and reports a DDOS attack:
DDOS detector started ... Hit Ctrl-C to end!
TIME(s) MESSAGE
2019-01-16 11:55:12.600734 DDOS Attack => nb of packets up to now : 1001
2019-01-16 11:55:12.600845 DDOS Attack => nb of packets up to now : 1002
2019-01-16 11:55:12.600887 DDOS Attack => nb of packets up to now : 1003
2019-01-16 11:55:12.600971 DDOS Attack => nb of packets up to now : 1004
2019-01-16 11:55:12.601009 DDOS Attack => nb of packets up to now : 1005
2019-01-16 11:55:12.601062 DDOS Attack => nb of packets up to now : 1006
2019-01-16 11:55:12.601096 DDOS Attack => nb of packets up to now : 1007
2019-01-16 11:55:12.601195 DDOS Attack => nb of packets up to now : 1008
2019-01-16 11:55:12.601228 DDOS Attack => nb of packets up to now : 1009
2019-01-16 11:55:12.601331 DDOS Attack => nb of packets up to now : 1010
2019-01-16 11:55:12.601364 DDOS Attack => nb of packets up to now : 1011
2019-01-16 11:55:12.601470 DDOS Attack => nb of packets up to now : 1012
2019-01-16 11:55:12.601505 DDOS Attack => nb of packets up to now : 1013
2019-01-16 11:55:12.601621 DDOS Attack => nb of packets up to now : 1014
2019-01-16 11:55:12.601656 DDOS Attack => nb of packets up to now : 1015
2019-01-16 11:55:12.601757 DDOS Attack => nb of packets up to now : 1016
2019-01-16 11:55:12.601790 DDOS Attack => nb of packets up to now : 1017
2019-01-16 11:55:12.601892 DDOS Attack => nb of packets up to now : 1018
2019-01-16 11:55:12.601925 DDOS Attack => nb of packets up to now : 1019
2019-01-16 11:55:12.602028 DDOS Attack => nb of packets up to now : 1020
Remark : Use Ctrl-C to stop dddos.py
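The detection rule in dddos.py reduces to a counter of closely spaced packets. Below is a user-space Python sketch of the same state machine. It can help tune MAX_NB_PACKETS and LEGAL_DIFF_TIMESTAMP_PACKETS against recorded timestamps before changing the BPF code. `DdosDetector` is a hypothetical class. Like the `rcv_packets` map, it keeps a single global state, with no per-CPU or per-flow separation.

```python
from typing import Optional


class DdosDetector:
    """Model of detect_ddos(): count successive packets closer than legal_diff_ns."""

    def __init__(self, max_nb_packets: int = 1000, legal_diff_ns: int = 1000000):
        self.max_nb_packets = max_nb_packets
        self.legal_diff_ns = legal_diff_ns
        self.count: Optional[int] = None   # None until the first packet is seen
        self.last_ts = 0

    def on_packet(self, ts_ns: int) -> Optional[int]:
        """Return the running count when an alert fires, else None."""
        alert = None
        if self.count is None:
            count = 1  # first packet: the C code starts from rcv_packets_nb_inter = 1
        else:
            count = self.count
            if ts_ns - self.last_ts < self.legal_diff_ns:
                count += 1
            else:
                count = 0
            if count > self.max_nb_packets:
                alert = count
        self.count = count
        self.last_ts = ts_ns
        return alert
```

With the default settings, the first alert fires on the 1001st closely spaced packet and reports 1001. This matches the first line of the sample output above.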
......@@ -51,6 +51,7 @@ print("%-18s %-2s %-7s %8s" % ("TIME(s)", "T", "BYTES", "LAT(ms)"))
# format output
while 1:
try:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
(bytes_s, bflags_s, us_s) = msg.split()
......@@ -63,3 +64,5 @@ while 1:
ms = float(int(us_s, 10)) / 1000
print("%-18.9f %-2s %-7s %8.2f" % (ts, type_s, bytes_s, ms))
except KeyboardInterrupt:
exit()
#!/usr/bin/env python
#!/usr/bin/python
#
# This is a Hello World example that formats output as fields.
......@@ -14,7 +14,7 @@ int hello(void *ctx) {
# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event="sys_clone", fn_name="hello")
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
......
#!/usr/bin/env python
#!/usr/bin/python
#
# This is a Hello World example that uses BPF_PERF_OUTPUT.
......@@ -32,7 +32,7 @@ int hello(struct pt_regs *ctx) {
# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event="sys_clone", fn_name="hello")
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
# define output data structure in Python
TASK_COMM_LEN = 16 # linux/sched.h
......@@ -58,4 +58,7 @@ def print_event(cpu, data, size):
# loop with callback to print_event
b["events"].open_perf_buffer(print_event)
while 1:
b.kprobe_poll()
try:
b.perf_buffer_poll()
except KeyboardInterrupt:
exit()
#!/usr/bin/env python
#!/usr/bin/python
#
# kvm_hypercall.py
#
......
......@@ -25,7 +25,7 @@ b = BPF(text="""
#include <uapi/linux/ptrace.h>
BPF_HASH(calls, int);
BPF_STACK_TRACE(stack_traces, 1024)
BPF_STACK_TRACE(stack_traces, 1024);
int alloc_enter(struct pt_regs *ctx, size_t size) {
int key = stack_traces.get_stackid(ctx,
......@@ -33,6 +33,7 @@ int alloc_enter(struct pt_regs *ctx, size_t size) {
if (key < 0)
return 0;
// could also use `calls.increment(key, size);`
u64 zero = 0, *val;
val = calls.lookup_or_init(&key, &zero);
(*val) += size;
......
#!/usr/bin/python
#
# An example usage of stack_build_id
# Most of the code here is borrowed from tools/profile.py
#
# Steps for using this code
# 1) Start a ping program in one terminal, e.g.: ping google.com -i0.001
# 2) Change the path of libc specified in b.add_module() below
# 3) Invoke the script as 'python stack_buildid_example.py'
# 4) The output of the tool looks like this:
# python examples/tracing/stack_buildid_example.py
# sendto
# - ping (5232)
# 2
#
# REQUIRES: Linux 4.17+ (BPF_BUILD_ID support)
# Licensed under the Apache License, Version 2.0 (the "License")
# 03-Jan-2019 Vijay Nag
from __future__ import print_function
from bcc import BPF, PerfType, PerfSWConfig
from sys import stderr
from time import sleep
import argparse
import signal
import os
import subprocess
import errno
import multiprocessing
import ctypes as ct
def Get_libc_path():
# A small helper function that returns full path
# of libc in the system
cmd = 'cat /proc/self/maps | grep libc | awk \'{print $6}\' | uniq'
output = subprocess.check_output(cmd, shell=True)
if not isinstance(output, str):
output = output.decode()
return output.split('\n')[0]
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <uapi/linux/bpf_perf_event.h>
#include <linux/sched.h>
struct key_t {
u32 pid;
int user_stack_id;
char name[TASK_COMM_LEN];
};
BPF_HASH(counts, struct key_t);
BPF_STACK_TRACE_BUILDID(stack_traces, 128);
int do_perf_event(struct bpf_perf_event_data *ctx) {
u32 pid = bpf_get_current_pid_tgid() >> 32;
// create map key
struct key_t key = {.pid = pid};
bpf_get_current_comm(&key.name, sizeof(key.name));
key.user_stack_id = stack_traces.get_stackid(&ctx->regs, BPF_F_USER_STACK);
if (key.user_stack_id >= 0) {
counts.increment(key);
}
return 0;
}
"""
b = BPF(text=bpf_text)
b.attach_perf_event(ev_type=PerfType.SOFTWARE,
ev_config=PerfSWConfig.CPU_CLOCK, fn_name="do_perf_event",
sample_period=0, sample_freq=49, cpu=0)
# Add the list of libraries/executables to the build sym cache for sym resolution
# Change the libc path if it is different on a different machine.
# libc.so and ping are added here so that any symbols pertaining to
# libc or ping are resolved. More executables/libraries can be added here.
b.add_module(Get_libc_path())
b.add_module("/usr/sbin/sshd")
b.add_module("/bin/ping")
counts = b.get_table("counts")
stack_traces = b.get_table("stack_traces")
duration = 2
def signal_ignore(signal, frame):
print()
try:
sleep(duration)
except KeyboardInterrupt:
# as cleanup can take some time, trap Ctrl-C:
signal.signal(signal.SIGINT, signal_ignore)
user_stack=[]
for k,v in sorted(counts.items(), key=lambda counts: counts[1].value):
user_stack = [] if k.user_stack_id < 0 else \
stack_traces.walk(k.user_stack_id)
user_stack=list(user_stack)
for addr in user_stack:
print(" %s" % b.sym(addr, k.pid).decode('utf-8', 'replace'))
print(" %-16s %s (%d)" % ("-", k.name.decode('utf-8', 'replace'), k.pid))
print(" %d\n" % v.value)
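Get_libc_path() spawns a shell to run `cat | grep | awk | uniq` over /proc/self/maps. The sketch below does the same parsing in Python, and `libc_path_from_maps` is a hypothetical name. When Python reads /proc/self/maps, it sees the interpreter's own mappings rather than those of `cat`. These can include libraries whose names merely begin with "libc" (libcrypto, for example). For that reason, the sketch assumes the libc file name starts with "libc." or "libc-".

```python
import os
import re
from typing import Optional

# Matches e.g. "libc.so.6" or "libc-2.27.so", but not "libcrypto.so.1.1".
_LIBC_RE = re.compile(r"libc[.-]")


def libc_path_from_maps(maps_text: str) -> Optional[str]:
    """Return the first mapped file in a /proc/<pid>/maps dump that looks like libc."""
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; keep the path whole.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if _LIBC_RE.match(os.path.basename(path)):
            return path
    return None
```

In the script, the result of reading /proc/self/maps could be passed to this function, and its return value then handed to `b.add_module()`.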
......@@ -52,7 +52,7 @@ struct data_t {
char comm[TASK_COMM_LEN];
};
BPF_STACK_TRACE(stack_traces, 128)
BPF_STACK_TRACE(stack_traces, 128);
BPF_PERF_OUTPUT(events);
void trace_stack(struct pt_regs *ctx) {
......@@ -120,4 +120,7 @@ def print_event(cpu, data, size):
b["events"].open_perf_buffer(print_event)
while 1:
b.kprobe_poll()
try:
b.perf_buffer_poll()
except KeyboardInterrupt:
exit()
......@@ -31,6 +31,7 @@ int count(struct pt_regs *ctx) {
u64 zero = 0, *val;
bpf_probe_read(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
// could also use `counts.increment(key)`
val = counts.lookup_or_init(&key, &zero);
(*val)++;
return 0;
......
......@@ -14,7 +14,6 @@ from bcc import BPF
# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
BPF_HASH(last);
......@@ -39,7 +38,7 @@ int do_trace(struct pt_regs *ctx) {
}
""")
b.attach_kprobe(event="sys_sync", fn_name="do_trace")
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace")
print("Tracing for quick sync's... Ctrl-C to end")
# format output
......
......@@ -22,6 +22,7 @@ int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
key.curr_pid = bpf_get_current_pid_tgid();
key.prev_pid = prev->pid;
// could also use `stats.increment(key);`
val = stats.lookup_or_init(&key, &zero);
(*val)++;
return 0;
......
......@@ -55,11 +55,9 @@ int kretprobe__tcp_v4_connect(struct pt_regs *ctx)
// pull in details
struct sock *skp = *skpp;
u32 saddr = 0, daddr = 0;
u16 dport = 0;
bpf_probe_read(&saddr, sizeof(saddr), &skp->__sk_common.skc_rcv_saddr);
bpf_probe_read(&daddr, sizeof(daddr), &skp->__sk_common.skc_daddr);
bpf_probe_read(&dport, sizeof(dport), &skp->__sk_common.skc_dport);
u32 saddr = skp->__sk_common.skc_rcv_saddr;
u32 daddr = skp->__sk_common.skc_daddr;
u16 dport = skp->__sk_common.skc_dport;
// output
bpf_trace_printk("trace_tcp4connect %x %x %d\\n", saddr, daddr, ntohs(dport));
......
#!/usr/bin/env python
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......@@ -6,6 +6,7 @@
# run in project examples directory with:
# sudo ./trace_fields.py"
from __future__ import print_function
from bcc import BPF
prog = """
......@@ -15,6 +16,6 @@ int hello(void *ctx) {
}
"""
b = BPF(text=prog)
b.attach_kprobe(event="sys_clone", fn_name="hello")
print "PID MESSAGE"
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
print("PID MESSAGE")
b.trace_print(fmt="{1} {5}")
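Several examples replace the hard-coded "sys_clone" or "sys_sync" with `b.get_syscall_fnname(...)`. The reason is that newer x86-64 kernels expose syscall entry points under prefixed names such as `__x64_sys_clone`. The idea can be sketched as a lookup in a /proc/kallsyms listing. `syscall_fnname` and its prefix list are illustrative assumptions, not bcc's actual implementation.

```python
from typing import Sequence


def syscall_fnname(name: str, kallsyms_text: str,
                   prefixes: Sequence[str] = ("sys_", "__x64_sys_")) -> str:
    """Hypothetical helper: return the first '<prefix><name>' present in kallsyms."""
    symbols = set()
    for line in kallsyms_text.splitlines():
        fields = line.split()          # address type name [module]
        if len(fields) >= 3:
            symbols.add(fields[2])
    for prefix in prefixes:
        if prefix + name in symbols:
            return prefix + name
    return prefixes[0] + name          # fall back to the traditional name
```

Resolving the name at run time lets the same script attach its kprobe on kernels with and without the prefixed syscall wrappers.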
#!/usr/bin/env python
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
......@@ -25,7 +25,7 @@ def cb(cpu, data, size):
prog = """
BPF_PERF_OUTPUT(events);
BPF_ARRAY(counters, u64, 10);
int kprobe__sys_clone(void *ctx) {
int do_sys_clone(void *ctx) {
struct {
u64 ts;
u64 magic;
......@@ -40,6 +40,8 @@ int kprobe__sys_clone(void *ctx) {
}
"""
b = BPF(text=prog)
event_name = b.get_syscall_fnname("clone")
b.attach_kprobe(event=event_name, fn_name="do_sys_clone")
b["events"].open_perf_buffer(cb)
@atexit.register
......@@ -48,7 +50,10 @@ def print_counter():
global b
print("counter = %d vs %d" % (counter, b["counters"][ct.c_int(0)].value))
print("Tracing sys_write, try `dd if=/dev/zero of=/dev/null`")
print("Tracing " + event_name + ", try `dd if=/dev/zero of=/dev/null`")
print("Tracing... Hit Ctrl-C to end.")
while 1:
b.kprobe_poll()
try:
b.perf_buffer_poll()
except KeyboardInterrupt:
exit()
......@@ -33,7 +33,7 @@ struct urandom_read_args {
int printarg(struct urandom_read_args *args) {
bpf_trace_printk("%d\\n", args->got_bits);
return 0;
};
}
"""
# load BPF program
......
......@@ -20,7 +20,7 @@ TRACEPOINT_PROBE(random, urandom_read) {
// args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
bpf_trace_printk("%d\\n", args->got_bits);
return 0;
};
}
""")
# header
......
File mode changed from 100755 to 100644
......@@ -15,6 +15,7 @@
#
# 15-Aug-2015 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from ctypes import c_ushort, c_int, c_ulonglong
from time import sleep
......@@ -58,7 +59,7 @@ while (1):
except KeyboardInterrupt:
pass; do_exit = 1
print
print()
b["dist"].print_log2_hist("usecs")
b["dist"].clear()
if do_exit:
......
File mode changed from 100755 to 100644
#!/usr/bin/python
import argparse
from time import sleep, strftime
from sys import argv
......
#!/usr/bin/python
import argparse
from time import sleep, strftime
from sys import argv
......
#!/usr/bin/python
import argparse
from time import sleep
from sys import argv
......@@ -114,4 +115,4 @@ print("%-18s %-10s %-32s %-32s %16s %16s %16s" % ("time(s)", "id", "input", "out
# Output latency events
bpf_ctx["operation_event"].open_perf_buffer(print_event)
while 1:
bpf_ctx.kprobe_poll()
bpf_ctx.perf_buffer_poll()
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
# Copyright (c) Facebook, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
include_directories(${CMAKE_SOURCE_DIR}/src/cc)
include_directories(${CMAKE_SOURCE_DIR}/src/cc/api)
include_directories(${CMAKE_SOURCE_DIR}/src/cc/libbpf/include/uapi)
option(INSTALL_INTROSPECTION "Install BPF introspection tools" ON)
add_executable(bps bps.c)
target_link_libraries(bps bpf-static)
install (TARGETS bps DESTINATION share/bcc/introspection)
#include <time.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <ctype.h>
#include <sysexits.h>
#include "libbpf.h"
// TODO: Remove this when CentOS 6 support is not needed anymore
#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7
#endif
static const char * const prog_type_strings[] = {
[BPF_PROG_TYPE_UNSPEC] = "unspec",
[BPF_PROG_TYPE_SOCKET_FILTER] = "socket filter",
[BPF_PROG_TYPE_KPROBE] = "kprobe",
[BPF_PROG_TYPE_SCHED_CLS] = "sched cls",
[BPF_PROG_TYPE_SCHED_ACT] = "sched act",
[BPF_PROG_TYPE_TRACEPOINT] = "tracepoint",
[BPF_PROG_TYPE_XDP] = "xdp",
[BPF_PROG_TYPE_PERF_EVENT] = "perf event",
[BPF_PROG_TYPE_CGROUP_SKB] = "cgroup skb",
[BPF_PROG_TYPE_CGROUP_SOCK] = "cgroup sock",
[BPF_PROG_TYPE_LWT_IN] = "lwt in",
[BPF_PROG_TYPE_LWT_OUT] = "lwt out",
[BPF_PROG_TYPE_LWT_XMIT] = "lwt xmit",
[BPF_PROG_TYPE_SOCK_OPS] = "sock ops",
[BPF_PROG_TYPE_SK_SKB] = "sk skb",
[BPF_PROG_TYPE_CGROUP_DEVICE] = "cgroup_device",
[BPF_PROG_TYPE_SK_MSG] = "sk_msg",
[BPF_PROG_TYPE_RAW_TRACEPOINT] = "raw_tracepoint",
[BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
[BPF_PROG_TYPE_LIRC_MODE2] = "lirc_mode2",
[BPF_PROG_TYPE_SK_REUSEPORT] = "sk_reuseport",
[BPF_PROG_TYPE_FLOW_DISSECTOR] = "flow_dissector",
};
static const char * const map_type_strings[] = {
[BPF_MAP_TYPE_UNSPEC] = "unspec",
[BPF_MAP_TYPE_HASH] = "hash",
[BPF_MAP_TYPE_ARRAY] = "array",
[BPF_MAP_TYPE_PROG_ARRAY] = "prog array",
[BPF_MAP_TYPE_PERF_EVENT_ARRAY] = "perf-ev array",
[BPF_MAP_TYPE_PERCPU_HASH] = "percpu hash",
[BPF_MAP_TYPE_PERCPU_ARRAY] = "percpu array",
[BPF_MAP_TYPE_STACK_TRACE] = "stack trace",
[BPF_MAP_TYPE_CGROUP_ARRAY] = "cgroup array",
[BPF_MAP_TYPE_LRU_HASH] = "lru hash",
[BPF_MAP_TYPE_LRU_PERCPU_HASH] = "lru percpu hash",
[BPF_MAP_TYPE_LPM_TRIE] = "lpm trie",
[BPF_MAP_TYPE_ARRAY_OF_MAPS] = "array of maps",
[BPF_MAP_TYPE_HASH_OF_MAPS] = "hash of maps",
[BPF_MAP_TYPE_DEVMAP] = "devmap",
[BPF_MAP_TYPE_SOCKMAP] = "sockmap",
[BPF_MAP_TYPE_CPUMAP] = "cpumap",
[BPF_MAP_TYPE_SOCKHASH] = "sockhash",
[BPF_MAP_TYPE_CGROUP_STORAGE] = "cgroup_storage",
[BPF_MAP_TYPE_REUSEPORT_SOCKARRAY] = "reuseport_sockarray",
[BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE] = "percpu_cgroup_storage",
[BPF_MAP_TYPE_QUEUE] = "queue",
[BPF_MAP_TYPE_STACK] = "stack",
};
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
#define LAST_KNOWN_PROG_TYPE (ARRAY_SIZE(prog_type_strings) - 1)
#define LAST_KNOWN_MAP_TYPE (ARRAY_SIZE(map_type_strings) - 1)
#define min(x, y) ((x) < (y) ? (x) : (y))
static inline uint64_t ptr_to_u64(const void *ptr)
{
return (uint64_t) (unsigned long) ptr;
}
static inline void * u64_to_ptr(uint64_t ptr)
{
return (void *) (unsigned long ) ptr;
}
static int handle_get_next_errno(int eno)
{
switch (eno) {
case ENOENT:
return 0;
case EINVAL:
fprintf(stderr, "Kernel does not support BPF introspection\n");
return EX_UNAVAILABLE;
case EPERM:
fprintf(stderr,
"Require CAP_SYS_ADMIN capability. Please retry as root\n");
return EX_NOPERM;
default:
fprintf(stderr, "%s\n", strerror(eno));
return 1;
}
}
static void print_prog_hdr(void)
{
printf("%9s %-15s %8s %6s %-12s %-15s\n",
"BID", "TYPE", "UID", "#MAPS", "LoadTime", "NAME");
}
static void print_prog_info(const struct bpf_prog_info *prog_info)
{
struct timespec real_time_ts, boot_time_ts;
time_t wallclock_load_time = 0;
char unknown_prog_type[16];
const char *prog_type;
char load_time[16];
struct tm load_tm;
if (prog_info->type > LAST_KNOWN_PROG_TYPE) {
snprintf(unknown_prog_type, sizeof(unknown_prog_type), "<%u>",
prog_info->type);
unknown_prog_type[sizeof(unknown_prog_type) - 1] = '\0';
prog_type = unknown_prog_type;
} else {
prog_type = prog_type_strings[prog_info->type];
}
if (!clock_gettime(CLOCK_REALTIME, &real_time_ts) &&
!clock_gettime(CLOCK_BOOTTIME, &boot_time_ts) &&
real_time_ts.tv_sec >= boot_time_ts.tv_sec)
wallclock_load_time =
(real_time_ts.tv_sec - boot_time_ts.tv_sec) +
prog_info->load_time / 1000000000;
if (wallclock_load_time && localtime_r(&wallclock_load_time, &load_tm))
strftime(load_time, sizeof(load_time), "%b%d/%H:%M", &load_tm);
else
snprintf(load_time, sizeof(load_time), "<%llu>",
prog_info->load_time / 1000000000);
load_time[sizeof(load_time) - 1] = '\0';
if (prog_info->jited_prog_len)
printf("%9u %-15s %8u %6u %-12s %-15s\n",
prog_info->id, prog_type, prog_info->created_by_uid,
prog_info->nr_map_ids, load_time, prog_info->name);
else
printf("%8u- %-15s %8u %6u %-12s %-15s\n",
prog_info->id, prog_type, prog_info->created_by_uid,
prog_info->nr_map_ids, load_time, prog_info->name);
}
static void print_map_hdr(void)
{
printf("%8s %-15s %-10s %8s %8s %8s %-15s\n",
"MID", "TYPE", "FLAGS", "KeySz", "ValueSz", "MaxEnts",
"NAME");
}
static void print_map_info(const struct bpf_map_info *map_info)
{
char unknown_map_type[16];
const char *map_type;
if (map_info->type > LAST_KNOWN_MAP_TYPE) {
snprintf(unknown_map_type, sizeof(unknown_map_type),
"<%u>", map_info->type);
unknown_map_type[sizeof(unknown_map_type) - 1] = '\0';
map_type = unknown_map_type;
} else {
map_type = map_type_strings[map_info->type];
}
printf("%8u %-15s 0x%-8x %8u %8u %8u %-15s\n",
map_info->id, map_type, map_info->map_flags, map_info->key_size,
map_info->value_size, map_info->max_entries,
map_info->name);
}
static int print_one_prog(uint32_t prog_id)
{
const uint32_t usual_nr_map_ids = 64;
uint32_t nr_map_ids = usual_nr_map_ids;
struct bpf_prog_info prog_info;
uint32_t *map_ids = NULL;
uint32_t info_len;
int ret = 0;
int prog_fd;
uint32_t i;
prog_fd = bpf_prog_get_fd_by_id(prog_id);
if (prog_fd == -1) {
if (errno == ENOENT) {
fprintf(stderr, "BID:%u not found\n", prog_id);
return EX_DATAERR;
} else {
return handle_get_next_errno(errno);
}
}
/* Retry at most one time for larger map_ids array */
for (i = 0; i < 2; i++) {
bzero(&prog_info, sizeof(prog_info));
prog_info.map_ids = ptr_to_u64(realloc(map_ids,
nr_map_ids * sizeof(*map_ids)));
if (!prog_info.map_ids) {
fprintf(stderr,
"Cannot allocate memory for %u map_ids for BID:%u\n",
nr_map_ids, prog_id);
close(prog_fd);
free(map_ids);
return 1;
}
map_ids = u64_to_ptr(prog_info.map_ids);
prog_info.nr_map_ids = nr_map_ids;
info_len = sizeof(prog_info);
ret = bpf_obj_get_info(prog_fd, &prog_info, &info_len);
if (ret) {
fprintf(stderr, "Cannot get info for BID:%u. %s(%d)\n",
prog_id, strerror(errno), errno);
close(prog_fd);
free(map_ids);
return ret;
}
if (prog_info.nr_map_ids <= nr_map_ids)
break;
nr_map_ids = prog_info.nr_map_ids;
}
close(prog_fd);
print_prog_hdr();
print_prog_info(&prog_info);
printf("\n");
/* Print all map_info used by the prog */
print_map_hdr();
nr_map_ids = min(prog_info.nr_map_ids, nr_map_ids);
for (i = 0; i < nr_map_ids; i++) {
struct bpf_map_info map_info = {};
info_len = sizeof(map_info);
int map_fd;
map_fd = bpf_map_get_fd_by_id(map_ids[i]);
if (map_fd == -1) {
if (errno == ENOENT)
continue;
fprintf(stderr,
"Cannot get fd for map:%u. %s(%d)\n",
map_ids[i], strerror(errno), errno);
ret = map_fd;
break;
}
ret = bpf_obj_get_info(map_fd, &map_info, &info_len);
close(map_fd);
if (ret) {
fprintf(stderr, "Cannot get info for map:%u. %s(%d)\n",
map_ids[i], strerror(errno), errno);
break;
}
print_map_info(&map_info);
}
free(map_ids);
return ret;
}
int print_all_progs(void)
{
uint32_t next_id = 0;
print_prog_hdr();
while (!bpf_prog_get_next_id(next_id, &next_id)) {
struct bpf_prog_info prog_info = {};
uint32_t prog_info_len = sizeof(prog_info);
int prog_fd;
int ret;
prog_fd = bpf_prog_get_fd_by_id(next_id);
if (prog_fd < 0) {
if (errno == ENOENT)
continue;
fprintf(stderr,
"Cannot get fd for BID:%u. %s(%d)\n",
next_id, strerror(errno), errno);
return 1;
}
ret = bpf_obj_get_info(prog_fd, &prog_info, &prog_info_len);
close(prog_fd);
if (ret) {
fprintf(stderr,
"Cannot get bpf_prog_info for BID:%u. %s(%d)\n",
next_id, strerror(errno), errno);
return ret;
}
print_prog_info(&prog_info);
}
return handle_get_next_errno(errno);
}
void usage(void)
{
	printf("BPF Program Snapshot (bps):\n"
	       "List of all BPF programs loaded into the system.\n\n");
	printf("Usage: bps [bpf-prog-id]\n");
	printf("    [bpf-prog-id] If specified, show the details of that BPF program and its maps\n");
	printf("\n");
}
int main(int argc, char **argv)
{
	if (argc > 1) {
		/* Only the first character is checked; atoi() stops at the first non-digit. */
		if (!isdigit((unsigned char)*argv[1])) {
			usage();
			return EX_USAGE;
		}
		return print_one_prog((uint32_t)atoi(argv[1]));
	}

	return print_all_progs();
}
* List all BPF programs *
# bps
      BID TYPE                 UID  #MAPS LoadTime     NAME
       82 kprobe                 0      1 Oct19/23:52  map_perf_test
       83 kprobe                 0      1 Oct19/23:52  map_perf_test
       84 kprobe                 0      1 Oct19/23:52  map_perf_test
       85 kprobe                 0      1 Oct19/23:52  map_perf_test
       86 kprobe                 0      4 Oct19/23:52  map_perf_test
       87 kprobe                 0      1 Oct19/23:52  map_perf_test
       88 kprobe                 0      1 Oct19/23:52  map_perf_test
       89 kprobe                 0      1 Oct19/23:52  map_perf_test
* List a particular BPF program and its maps *
# bps 86
      BID TYPE                 UID  #MAPS LoadTime     NAME
       86 kprobe                 0      4 Oct19/23:52  map_perf_test

      MID TYPE            FLAGS         KeySz      ValueSz      MaxEnts NAME
      120 lru hash          0x0             4            8        10000 lru_hash_map
      129 lru hash          0x0             4            8           43 lru_hash_lookup
      123 array of maps     0x0             4            4         1024 array_of_lru_ha
      121 lru hash          0x2             4
find_program(GZIP gzip)
file(GLOB FILES *.8)
install(FILES ${FILES} DESTINATION share/bcc/man/man8)
set(GZFILES "")
foreach(FIL ${FILES})
  get_filename_component(NAME ${FIL} NAME)
  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.gz
    COMMAND ${GZIP} -c ${FIL} > ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.gz
    DEPENDS ${FIL})
  list(APPEND GZFILES "${CMAKE_CURRENT_BINARY_DIR}/${NAME}.gz")
endforeach()
add_custom_target(man ALL DEPENDS ${GZFILES})
install(FILES ${GZFILES} DESTINATION share/bcc/man/man8)
......@@ -2,7 +2,7 @@
.SH NAME
argdist \- Trace a function and display a histogram or frequency count of its parameter values. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v] [-T TOP] [-H specifier] [-C specifier] [-I header]
.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-d DURATION] [-n COUNT] [-v] [-T TOP] [-H specifier] [-C specifier] [-I header]
.SH DESCRIPTION
argdist attaches to function entry and exit points, collects specified parameter
values, and stores them in a histogram or a frequency collection that counts
......@@ -27,6 +27,9 @@ characters. Longer strings will be truncated.
\-i INTERVAL
Print the collected data every INTERVAL seconds. The default is 1 second.
.TP
\-d DURATION
Total duration of trace in seconds.
.TP
\-n COUNT
Print the collected data COUNT times and then exit.
.TP
......
......@@ -30,9 +30,6 @@ Don't clear the screen.
\-r MAXROWS
Maximum number of rows to print. Default is 20.
.TP
\-p PID
Trace this PID only.
.TP
interval
Interval between updates, seconds.
.TP
......
.TH bps 8 "2017-10-19" "USER COMMANDS"
.SH NAME
bps \- List all BPF programs. 'ps' for BPF programs.
.SH SYNOPSIS
.B bps [bpf-prog-id]
.SH DESCRIPTION
.B bps
lists all BPF programs loaded into the kernel. It is similar
to the ps command but for the BPF programs.
Each loaded BPF program is identified by a unique integer (i.e.
.B bpf-prog-id
or simply BID). If
a
.B bpf-prog-id
is specified, the maps used by
.B bpf-prog-id
will also be listed.
.SH EXAMPLES
.TP
List all BPF programs loaded into the kernel:
.B bps
.TP
Show the details and maps of BID 6:
.B bps 6
.SH BPF PROGRAM FIELDS
.TP
.B BID
BPF program ID. It ends with '-' if it is not jitted.
.TP
.B TYPE
The type of a BPF program, e.g. kprobe, tracepoint, xdp, etc.
.TP
.B UID
The user ID that loaded the BPF program.
.TP
.B #MAPS
Total number of maps used by a BPF program.
.TP
.B LoadTime
The time at which the BPF program was loaded.
.TP
.B NAME
The name of a BPF program. The user space library (like
.BR bcc )
usually uses the C function name from the original BPF source code as
the program name. It could be empty if the user space did not
provide a name.
.SH BPF MAP FIELDS
.TP
.B MID
BPF map ID.
.TP
.B TYPE
The type of a BPF map, e.g. hash, array, stack trace, etc.
.TP
.B FLAGS
The flags used to create the BPF map.
.TP
.B KeySz
The key size of a BPF map.
.TP
.B ValueSz
The value size of a BPF map.
.TP
.B MaxEnts
The maximum number of entries in a BPF map.
.TP
.B NAME
The name of a BPF map. The user space library (like
.BR bcc )
usually uses the C variable name of the BPF map as its name.
It could be empty if the user space did not provide a name.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Martin Lau
......@@ -18,18 +18,14 @@ Since this uses BPF, only the root user can use this tool.
CONFIG_BPF and bcc.
.SH EXAMPLES
.TP
Print summaries every five second:
Print summaries every second:
#
.B cachestat
.TP
Print summaries every five seconds with timestamp:
Print summaries every second with timestamp:
#
.B cachestat -T
.TP
Print summaries each second:
#
.B cachestat 1
.TP
Print output every five seconds, three times:
#
.B cachestat 5 3
......@@ -51,6 +47,9 @@ Number of page cache misses.
DIRTIES
Number of dirty pages added to the page cache.
.TP
HITRATIO
The hit ratio as a percentage.
.TP
READ_HIT%
Read hit percent of page cache usage.
.TP
......
......@@ -2,7 +2,7 @@
.SH NAME
capable \- Trace security capability checks (cap_capable()).
.SH SYNOPSIS
.B capable [\-h] [\-v] [\-p PID]
.B capable [\-h] [\-v] [\-p PID] [\-K] [\-U]
.SH DESCRIPTION
This traces security capability checks in the kernel, and prints details for
each call. This can be useful for general debugging, and also security
......@@ -19,6 +19,12 @@ USAGE message.
Include non-audit capability checks. These are those deemed not interesting and
not necessary to audit, such as CAP_SYS_ADMIN checks on memory allocation to
affect the behavior of overcommit.
.TP
\-K
Include kernel stack traces in the output.
.TP
\-U
Include user-space stack traces in the output.
.SH EXAMPLES
.TP
Trace all capability checks system-wide:
......
.TH criticalstat 8 "2018-06-07" "USER COMMANDS"
.SH NAME
criticalstat \- A tracer to find and report long atomic critical sections in the kernel
.SH SYNOPSIS
.B criticalstat [\-h] [\-p] [\-i] [\-d DURATION]
.SH DESCRIPTION
criticalstat traces and reports occurrences of atomic critical sections in the
kernel, with stack traces showing where they originate. Such critical
sections frequently occur due to the use of spinlocks, or when interrupts or
preemption are explicitly disabled by a driver; IRQ routines in Linux also
execute with interrupts disabled, among other causes. Such critical sections
are a source of latency and responsiveness issues for real-time systems.
This works by probing the preempt/irq and cpuidle tracepoints in the kernel.
Since this uses BPF, only the root user can use this tool. Further, the kernel
has to be built with certain CONFIG options enabled. See below.
.SH REQUIREMENTS
Enable CONFIG_PREEMPTIRQ_EVENTS and CONFIG_DEBUG_PREEMPT. Additionally, the
following options should be DISABLED on older kernels: CONFIG_PROVE_LOCKING,
CONFIG_LOCKDEP.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-p
Find long sections where preemption was disabled on the local CPU.
.TP
\-i
Find long sections where interrupts were disabled on the local CPU.
.TP
\-d DURATION
Only identify sections that are longer than DURATION in microseconds.
.SH EXAMPLES
.TP
Run with default options: irq disabled for more than 100 uS
#
.B criticalstat
.TP
Find sections with preemption disabled for more than 100 uS.
#
.B criticalstat -p
.TP
Find sections with IRQ disabled for more than 500 uS.
#
.B criticalstat -d 500
.TP
Find sections with preemption disabled for more than 500 uS.
#
.B criticalstat -p -d 500
.SH OVERHEAD
This tool can cause overhead if the application spends a lot of time in
kernel mode. The overhead varies, but can degrade performance by 2-4%. If the
overhead is too high, pass a larger DURATION to the -d option to filter more
aggressively.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Joel Fernandes
.SH SEE ALSO
Linux kernel's preemptoff and irqoff tracers.
uthreads.8
\ No newline at end of file
......@@ -2,7 +2,7 @@
.SH NAME
dbslower \- Trace MySQL/PostgreSQL server queries slower than a threshold.
.SH SYNOPSIS
.B dbslower [-v] [-p PID [PID ...]] [-m THRESHOLD] {mysql,postgres}
.B dbslower [-v] [-p PID [PID ...]] [-x PATH] [-m THRESHOLD] {mysql,postgres}
.SH DESCRIPTION
This traces queries served by a MySQL or PostgreSQL server, and prints
those that exceed a latency (query time) threshold. By default a threshold of
......@@ -11,6 +11,8 @@ those that exceed a latency (query time) threshold. By default a threshold of
This uses User Statically-Defined Tracing (USDT) probes, a feature added to
MySQL and PostgreSQL for DTrace support, but which may not be enabled on a
given installation. See requirements.
Alternatively, MySQL queries can be traced without USDT support using the
-x option.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
......@@ -25,6 +27,10 @@ Print usage message.
Trace this PID. If no PID is specified, the tool will attempt to automatically
detect the MySQL or PostgreSQL processes running on the system.
.TP
\-x PATH
Path to the MySQL binary. This option allows tracing MySQL queries even when
USDT probes aren't enabled on the MySQL server.
.TP
\-m THRESHOLD
Minimum query latency (duration) to trace, in milliseconds. Default is 1 ms.
.TP
......
This diff is collapsed.
......@@ -30,11 +30,18 @@ Include a timestamp column.
\-x
Include failed exec()s
.TP
\-q
Add "quotemarks" around arguments, escaping quotemarks within arguments with a
backslash. This is useful for tracing empty arguments or arguments that
contain whitespace.
.TP
\-n NAME
Only print command lines matching this name (regex)
.TP
\-l LINE
Only print commands where an argument contains this line (regex)
.TP
\--max-args MAXARGS
Maximum number of arguments parsed and displayed. Defaults to 20.
.SH EXAMPLES
.TP
Trace all exec() syscalls:
......@@ -49,6 +56,10 @@ Include failed exec()s:
#
.B execsnoop \-x
.TP
Put quotemarks around arguments:
#
.B execsnoop \-q
.TP
Only trace exec()s where the filename contains "mount":
#
.B execsnoop \-n mount
......
......@@ -2,7 +2,7 @@
.SH NAME
funccount \- Count function, tracepoint, and USDT probe calls matching a pattern. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B funccount [\-h] [\-p PID] [\-i INTERVAL] [\-T] [\-r] [\-d] pattern
.B funccount [\-h] [\-p PID] [\-i INTERVAL] [\-d DURATION] [\-T] [\-r] [\-D] pattern
.SH DESCRIPTION
This tool is a quick way to determine which functions are being called,
and at what rate. It uses in-kernel eBPF maps to count function calls.
......@@ -36,7 +36,7 @@ Include timestamps on output.
\-r
Use regular expressions for the search pattern.
.TP
\-d
\-D
Print the BPF program before starting (for debugging purposes).
.SH EXAMPLES
.TP
......
This diff is collapsed.