generate proper usdt code to prevent llvm meddling with ctx->#fields
Qin reported a test case where llvm still interferes with ctx->#field accesses.
For code like below:

  switch (ctx->ip) {
  case 0x7fdf2ede9820ULL: *((int64_t *)dest) = *(volatile int64_t *)&ctx->r12; return 0;
  case 0x7fdf2edecd9cULL: *((int64_t *)dest) = *(volatile int64_t *)&ctx->bx; return 0;
  }
The compiler still generates:

  # r1 is the pointer to the ctx
  r1 += 24                # offset of ctx->r12
  goto LBB0_4
LBB0_3:
  r1 += 40                # offset of ctx->bx
LBB0_4:
  r3 = *(u64 *)(r1 + 0)   # single sunk load through a variable pointer
The verifier will reject the above code since the last load is not in "ctx + field_offset"
form: the ctx pointer is modified (r1 += 24 or r1 += 40) before the load, so the verifier
can no longer statically determine which context field is being accessed.
The responsible llvm optimization pass is CFGSimplifyPass, implemented mainly in
llvm/lib/Transforms/Utils/SimplifyCFG.cpp. The routine that performs this particular
optimization is SinkThenElseCodeToEnd, and the routine canSinkInstructions decides
whether an instruction is a candidate for sinking.
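To illustrate the transformation, here is a hypothetical C-level sketch (pick, pick_sunk
and cond are made-up names; the 24/40 offsets match ctx->r12 and ctx->bx in the
assembly above, per the x86-64 struct pt_regs layout):

  /* Before sinking: each branch loads through "ctx + constant offset",
     a form the verifier accepts, even with the volatile annotation. */
  static uint64_t pick(struct pt_regs *ctx, int cond)
  {
      if (cond)
          return *(volatile uint64_t *)&ctx->r12;   /* ctx + 24 */
      return *(volatile uint64_t *)&ctx->bx;        /* ctx + 40 */
  }

  /* After SinkThenElseCodeToEnd: the two loads are merged into a single
     load through a branch-selected pointer, i.e. the "modified ctx
     pointer" shape the verifier rejects. */
  static uint64_t pick_sunk(struct pt_regs *ctx, int cond)
  {
      volatile uint64_t *p = cond ? (volatile uint64_t *)&ctx->r12
                                  : (volatile uint64_t *)&ctx->bx;
      return *p;
  }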
Unfortunately, a volatile load/store is not among the conditions that prevent this
optimization. Inline assembly, however, is: canSinkInstructions refuses to sink
inline-assembly instructions, which blocks any further sinking.
In this patch, instead of using volatile to annotate the ctx->#field access, we emit a
normal ctx->#field access but place a compiler inline assembly memory barrier

  __asm__ __volatile__("" : : : "memory");

after the field access.
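With this change, each generated case has a shape along the following lines (a minimal
sketch of the intent, not the verbatim output of the usdt code generator):

  switch (ctx->ip) {
  case 0x7fdf2ede9820ULL:
          *((int64_t *)dest) = ctx->r12;
          /* Inline asm is never a sinking candidate, so the backward
             scan in SinkThenElseCodeToEnd stops at the barrier and the
             load above keeps its "ctx + constant offset" form. */
          __asm__ __volatile__("" : : : "memory");
          return 0;
  case 0x7fdf2edecd9cULL:
          *((int64_t *)dest) = ctx->bx;
          __asm__ __volatile__("" : : : "memory");
          return 0;
  }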
Tested with the usdt unit test case, the usdt_samples example, and a couple of usdt unit
tests developed in the past.
Signed-off-by: Yonghong Song <yhs@fb.com>