1. 25 Jul, 2012 17 commits
    • Namhyung Kim's avatar
      tools lib traceevent: Ignore TRACEEVENT-CFLAGS file · 52f18a2f
      Namhyung Kim authored
      The TRACEEVENT-CFLAGS file is used to detect any change on compiler
      flags. Just ignore it.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1341559297-25725-3-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      52f18a2f
    • Namhyung Kim's avatar
      tools lib traceevent: Detect build environment changes · 52b5c0d4
      Namhyung Kim authored
      Cross compiling perf requires setting ARCH and CROSS_COMPILE variables,
      but libtraceevent couldn't detect the changes so it ends up believing no
      recompiling is required. Thus the linker failed like:
      
           LINK perf
       ../lib/traceevent//libtraceevent.a: member ../lib/traceevent//libtraceevent.a(event-parse.o) in archive is not an object
       collect2: ld returned 1 exit status
       make: *** [perf] Error 1
      
      This patch fixes this by adding TRACEEVENT-CFLAGS file like
      PERF-CFLAGS to track those changes.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1341559297-25725-2-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      52b5c0d4
    • Kirill A. Shutemov's avatar
      perf tools: Fix build error with bison 2.6 · 043d1a5c
      Kirill A. Shutemov authored
      Bison 2.6 started to generate parse_events_parse() declaration in header. In
      this case we have redundant redeclaration:
      
      util/parse-events.c:29:5: error: redundant redeclaration of ‘parse_events_parse’ [-Werror=redundant-decls]
      In file included from util/parse-events.c:14:0:
      util/parse-events-bison.h:99:5: note: previous declaration of ‘parse_events_parse’ was here
      cc1: all warnings being treated as errors
      
      Let's disable -Wredundant-decls for util/parse-events.c since it includes
      header we can't control.
      Signed-off-by: default avatarKirill A. Shutemov <kirill@shutemov.name>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20120723210407.GA25186@shutemov.nameSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      043d1a5c
    • Kirill A. Shutemov's avatar
      perf tools: use XSI-complaint version of strerror_r() instead of GNU-specific · 4cc49d4d
      Kirill A. Shutemov authored
      Perf uses GNU-specific version of strerror_r(). The GNU-specific strerror_r()
      returns a pointer to a string containing the error message.  This may be either
      a pointer to a string that the function stores in buf, or a pointer to some
      (immutable) static string (in which case buf is unused).
      
      In glibc-2.16 GNU version was marked with attribute warn_unused_result.  It
      triggers few warnings in perf:
      
      util/target.c: In function ‘perf_target__strerror’:
      util/target.c:114:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]
      ui/browsers/hists.c: In function ‘hist_browser__dump’:
      ui/browsers/hists.c:981:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]
      
      They are bugs.
      
      Let's fix strerror_r() usage.
      Signed-off-by: default avatarKirill A. Shutemov <kirill@shutemov.name>
      Acked-by: default avatarUlrich Drepper <drepper@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/20120723210654.GA25248@shutemov.name
      [ committer note: s/assert/BUG_ON/g ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4cc49d4d
    • Jovi Zhang's avatar
      perf tools: Make the breakpoint events sample period default to 1 · 4a841d65
      Jovi Zhang authored
      There have one problem about hw_breakpoint perf event, as watched, the
      events reported to userspace is not correctly, sometime one trigger
      bp_event report several events, sometime bp_event cannot go through to
      user.
      
      The root cause is attr->freq is 1 passed to kernel defaultly in bp
      events, this make kernel calculate event period not as expect, make
      sample period to 1 will change attr->freq to 0, to fix this problem.
      
      This patch is similar with commit f92128 about tracepoint events:
          perf: Make the trace events sample period default to 1
      Signed-off-by: default avatarJovi Zhang <bookjovi@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/CACV3sbLF8taiCq_VYW-sgRJyupeMzg58C7ZXfMe3xZUiH_Mx6w@mail.gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4a841d65
    • Jiri Olsa's avatar
      perf test: Add dso data caching tests · f7add556
      Jiri Olsa authored
      Adding automated test for DSO data reading. Testing raw/cached reads
      from different file/cache locations.
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1342959280-5361-18-git-send-email-jolsa@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f7add556
    • Jiri Olsa's avatar
      perf symbols: Add dso data caching · 4dff624a
      Jiri Olsa authored
      Adding dso data caching so we don't need to open/read/close, each time
      we want dso data.
      
      The DSO data caching affects following functions:
        dso__data_read_offset
        dso__data_read_addr
      
      Each DSO read tries to find the data (based on offset) inside the cache.
      If it's not present it fills the cache from file, and returns the data.
      If it is present, data are returned with no file read.
      
      Each data read is cached by reading cache page sized/aligned amount of
      DSO data. The cache page size is hardcoded to 4096.  The cache is using
      RB tree with file offset as a sort key.
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1342959280-5361-17-git-send-email-jolsa@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4dff624a
    • Jiri Olsa's avatar
      perf symbols: Add interface to read DSO image data · 949d160b
      Jiri Olsa authored
      Adding following interface for DSO object to allow
      reading of DSO image data:
      
        dso__data_fd
          - opens DSO and returns file descriptor
            Binary types are used to locate/open DSO in following order:
              DSO_BINARY_TYPE__BUILD_ID_CACHE
              DSO_BINARY_TYPE__SYSTEM_PATH_DSO
            In other word we first try to open DSO build-id path,
            and if that fails we try to open DSO system path.
      
        dso__data_read_offset
          - reads DSO data from specified offset
      
        dso__data_read_addr
          - reads DSO data from specified address/map.
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1342959280-5361-11-git-send-email-jolsa@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      949d160b
    • Jiri Olsa's avatar
      perf symbols: Factor DSO symtab types to generic binary types · 44f24cb3
      Jiri Olsa authored
      Adding interface to access DSOs so it could be used
      from another place.
      
      New DSO binary type is added - making current SYMTAB__*
      types more general:
         DSO_BINARY_TYPE__* = SYMTAB__*
      
      Following function is added to return path based on the specified
      binary type:
         dso__binary_type_file
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1342959280-5361-10-git-send-email-jolsa@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      44f24cb3
    • Frederic Weisbecker's avatar
      perf hists: Print newline between hists callchains · 6654f5d8
      Frederic Weisbecker authored
      Tiny cosmetic fix. The lack of a newline between hists callchains was
      looking slightly messy.
      
      Before:
      
           0.24%      swapper  [kernel.kallsyms]  [k] _raw_spin_lock_irq
                      |
                      --- _raw_spin_lock_irq
                          run_timer_softirq
                          __do_softirq
                          call_softirq
                          do_softirq
                          irq_exit
                          smp_apic_timer_interrupt
                          apic_timer_interrupt
                          default_idle
                          amd_e400_idle
                          cpu_idle
                          start_secondary
           0.10%         perf  [kernel.kallsyms]  [k] lock_is_held
                         |
                         --- lock_is_held
                             __might_sleep
                             mutex_lock_nested
                             perf_event_for_each_child
                             perf_ioctl
                             do_vfs_ioctl
                             sys_ioctl
                             system_call_fastpath
                             ioctl
                             cmd_record
                             run_builtin
                             main
                             __libc_start_main
      
      After:
      
           0.24%      swapper  [kernel.kallsyms]  [k] _raw_spin_lock_irq
                      |
                      --- _raw_spin_lock_irq
                          run_timer_softirq
                          __do_softirq
                          call_softirq
                          do_softirq
                          irq_exit
                          smp_apic_timer_interrupt
                          apic_timer_interrupt
                          default_idle
                          amd_e400_idle
                          cpu_idle
                          start_secondary
      
           0.10%         perf  [kernel.kallsyms]  [k] lock_is_held
                         |
                         --- lock_is_held
                             __might_sleep
                             mutex_lock_nested
                             perf_event_for_each_child
                             perf_ioctl
                             do_vfs_ioctl
                             sys_ioctl
                             system_call_fastpath
                             ioctl
                             cmd_record
                             run_builtin
                             main
                             __libc_start_main
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1342631456-7233-3-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6654f5d8
    • Frederic Weisbecker's avatar
      perf tools: Fix trace events storms due to weight demux · 0983cc0d
      Frederic Weisbecker authored
      Trace events have a period (weight) of 1 by default. This can be
      overriden on events definition by using the __perf_count() macro.
      
      For example, the sched_stat_runtime() is weighted with the runtime of
      the task that fired the event.
      
      By default, perf handles such weighted event by dividing it into
      individual events carrying a weight of 1. For example if
      sched_stat_runtime is fired and the task has run 5000000 nsecs, perf
      divides it into 5000000 events in the buffer.
      
      This behaviour makes weighted events unusable because they quickly
      fullfill the buffers and we lose most events.
      
      The commit 5d81e5cf ("events: Don't
      divide events if it has field period") solves this problem by sending
      only one event when PERF_SAMPLE_PERIOD flag is set. The weight is
      carried in the sample itself such that we don't need to demultiplex it
      anymore.
      
      This patch provides the last missing piece to use this feature by
      setting PERF_SAMPLE_PERIOD from perf tools when we deal with trace
      events.
      
      Before:
      	$ ./perf record -e sched:* -a sleep 1
      	[ perf record: Woken up 3 times to write data ]
      	[ perf record: Captured and wrote 1.619 MB perf.data (~70749 samples) ]
      	Warning:
      	Processed 16909 events and lost 1 chunks!
      
      	Check IO/CPU overload!
      
      	$ ./perf script
      	perf  1894 [003]   824.898327: sched_migrate_task: comm=perf pid=1898 prio=120 orig_cpu=2 dest_cpu=0
      	perf  1894 [003]   824.898335: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
      	perf  1894 [003]   824.898336: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
      	perf  1894 [003]   824.898337: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
      	perf  1894 [003]   824.898338: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
      	perf  1894 [003]   824.898339: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
      	perf  1894 [003]   824.898340: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
      	perf  1894 [003]   824.898341: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
      	[...]
      
      After:
      	$ ./perf record -e sched:* -a sleep 1
      	[ perf record: Woken up 1 times to write data ]
      	[ perf record: Captured and wrote 0.074 MB perf.data (~3228 samples) ]
      
      	$ ./perf script
      
      	perf  1461 [000]   554.286957: sched_migrate_task: comm=perf pid=1465 prio=120 orig_cpu=3 dest_cpu=1
      	perf  1461 [000]   554.286964: sched_stat_sleep: comm=perf pid=1465 delay=133047190 [ns]
      	perf  1461 [000]   554.286967: sched_wakeup: comm=perf pid=1465 prio=120 success=1 target_cpu=001
      	swapper     0 [001]   554.286976: sched_stat_wait: comm=perf pid=1465 delay=0 [ns]
      	swapper     0 [001]   554.286983: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf
      	[...]
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1342631456-7233-1-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0983cc0d
    • Frederic Weisbecker's avatar
      perf hists: Return correct number of characters printed in callchain · 8760db72
      Frederic Weisbecker authored
      Include the omitted number of characters printed for the first entry.
      
      Not that it really matters because nobody seem to care about the number
      of printed characters for now. But just in case.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1342631456-7233-2-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8760db72
    • David Ahern's avatar
      perf tools: Dump exclude_{guest,host}, precise_ip header info too · 78b961ff
      David Ahern authored
      Adds the attributes to the event line in the header dump.
      From:
       event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
            config2 = 0x0, excl_usr = 0, excl_kern = 0, ...
      to
        event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
            config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0,
            excl_guest = 0, precise_ip = 0, ...
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1342826756-64663-8-git-send-email-dsahern@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      78b961ff
    • David Ahern's avatar
      perf kvm: Limit repetitive guestmount message to once per directory · c80c3c26
      David Ahern authored
      After 7ed97ad4 use of the guestmount option without a subdir for *each*
      VM generates an error message for each sample related to that VM. Once
      per VM is enough.
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342826756-64663-7-git-send-email-dsahern@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c80c3c26
    • David Ahern's avatar
      perf kvm: Fix bug resolving guest kernel syms · adb5d2a4
      David Ahern authored
      Guest kernel symbols are not resolved despite passing the information
      needed to resolve them. e.g.,
      
      perf kvm --guest --guestmount=/tmp/guest-mount record -a -- sleep 1
      perf kvm --guest --guestmount=/tmp/guest-mount report --stdio
      
          36.55%  [guest/11399]  [unknown]         [g] 0xffffffff81600bc8
          33.19%  [guest/10474]  [unknown]         [g] 0x00000000c0116e00
          30.26%  [guest/11094]  [unknown]         [g] 0xffffffff8100a288
          43.69%  [guest/10474]  [unknown]         [g] 0x00000000c0103d90
          37.38%  [guest/11399]  [unknown]         [g] 0xffffffff81600bc8
          12.24%  [guest/11094]  [unknown]         [g] 0xffffffff810aa91d
           6.69%  [guest/11094]  [unknown]         [u] 0x00007fa784d721c3
      
      which is just pathetic.
      
      After a maddening 2 days sifting through perf minutia I found it --
      id_hdr_size is not initialized for guest machines. This shows up on the
      report side as random garbage for the cpu and timestamp, e.g.,
      
      29816 7310572949125804849 0x1ac0 [0x50]: PERF_RECORD_MMAP ...
      
      That messes up the sample sorting such that synthesized guest maps are
      processed last.
      
      With this patch you get a much more helpful report:
      
        12.11%  [guest/11399]  [guest.kernel.kallsyms.11399]  [g] irqtime_account_process_tick
        10.58%  [guest/11399]  [guest.kernel.kallsyms.11399]  [g] run_timer_softirq
         6.95%  [guest/11094]  [guest.kernel.kallsyms.11094]  [g] printk_needs_cpu
         6.50%  [guest/11094]  [guest.kernel.kallsyms.11094]  [g] do_timer
         6.45%  [guest/11399]  [guest.kernel.kallsyms.11399]  [g] idle_balance
         4.90%  [guest/11094]  [guest.kernel.kallsyms.11094]  [g] native_read_tsc
          ...
      
      v2:
      - changed rbtree walk to use rb_first per Namhyung's suggestion
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342826756-64663-5-git-send-email-dsahern@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      adb5d2a4
    • David Ahern's avatar
      perf kvm: Guest userspace samples should not be lumped with host uspace · 7c0f4a41
      David Ahern authored
      e.g., perf kvm --host  --guest report -i perf.data --stdio -D
      shows:
      
      1 599127912065356 0x143b8 [0x48]: PERF_RECORD_SAMPLE(IP, 5): 5671/5676: 0x7fdf95a061c0 period: 1 addr: 0
      ... chain: nr:2
      .....  0: ffffffffffffff80
      .....  1: fffffffffffffe00
       ... thread: qemu-kvm:5671
       ...... dso: <not found>
      
      (IP, 5) means sample in guest userspace. Those samples should not be lumped
      into the VMM's host thread. i.e, the report output:
      
          56.86%  qemu-kvm  [unknown]         [u] 0x00007fdf95a061c0
      
      With this patch the output emphasizes it is a guest userspace hit:
      
          56.86%  [guest/5671]  [unknown]         [u] 0x00007fdf95a061c0
      
      Looking at 3 VMs (2 64-bit, 1 32-bit) with each running a CPU bound
      process (openssl speed), perf report currently shows:
      
        93.84%  117726   qemu-kvm  [unknown]   [u] 0x00007fd7dcaea8e5
      
      which is wrong. With this patch you get:
      
        31.50%   39258   [guest/18772]  [unknown]   [u] 0x00007fd7dcaea8e5
        31.50%   39236   [guest/11230]  [unknown]   [u] 0x0000000000a57340
        30.84%   39232   [guest/18395]  [unknown]   [u] 0x00007f66f641e107
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342826756-64663-4-git-send-email-dsahern@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7c0f4a41
    • David Ahern's avatar
      perf kvm: Set name for VM process in guest machine · 5cd95c2d
      David Ahern authored
      COMM events are not generated in the context of a guest machine, so the
      thread name is never set for the VMM process. For example, the qemu-kvm
      name applies to the process in the host machine, not the guest machine.
      So, samples for guest machines are currently displayed as:
      
          99.67%     :5671  [unknown]         [g] 0xffffffff81366b41
      
      where 5671 is the pid of the VMM. With this patch the samples in the guest
      machine are shown as:
      
          18.43%  [guest/5671]  [unknown]           [g] 0xffffffff810d68b7
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342826756-64663-3-git-send-email-dsahern@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5cd95c2d
  2. 23 Jul, 2012 1 commit
  3. 18 Jul, 2012 2 commits
  4. 17 Jul, 2012 15 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a0185401
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) IPVS oops'ers:
         a) Should not reset skb->nf_bridge in forwarding hook (Lin Ming)
         b) 3.4 commit can cause ip_vs_control_cleanup to be invoked after
            the ipvs_core_ops are unregistered during rmmod (Julian ANastasov)
      
       2) ixgbevf bringup failure can crash in TX descriptor cleanup
          (Alexander Duyck)
      
       3) AX25 switch missing break statement hoses ROSE sockets (Alan Cox)
      
       4) CAIF accesses freed per-net memory (Sjur Brandeland)
      
       5) Network cgroup code has out-or-bounds accesses (Eric DUmazet), and
          accesses freed memory (Gao Feng)
      
       6) Fix a crash in SCTP reported by Dave Jones caused by freeing an
          association still on a list (Neil HOrman)
      
       7) __netdev_alloc_skb() regresses on GFP_DMA using drivers because that
          GFP flag is not being retained for the allocation (Eric Dumazet).
      
       8) Missing NULL hceck in sch_sfb netlink message parsing (Alan Cox)
      
       9) bnx2 crashes because TX index iteration is not bounded correctly
          (Michael Chan)
      
      10) IPoIB generates warnings in TCP queue collapsing (via
          skb_try_coalesce) because it does not set skb->truesize correctly
          (Eric Dumazet)
      
      11) vlan_info objects leak for the implicit vlan with ID 0 (Amir
          Hanania)
      
      12) A fix for TX time stamp handling in gianfar does not transfer socket
          ownership from one packet to another correctly, resulting in a
          socket write space imbalance (Eric Dumazet)
      
      13) Julia Lawall found several cases where we do a list iteration, and
          then at the loop termination unconditionally assume we ended up with
          real list object, rather than the list head itself (CNIC, RXRPC,
          mISDN).
      
      14) The bonding driver handles procfs moving incorrectly when a device
          it manages is moved from one namespace to another (Eric Biederman)
      
      15) Missing memory barriers in stmmac descriptor accesses result in
          various crashes (Deepak Sikri)
      
      16) Fix handling of broadcast packets in batman-adv (Simon Wunderlich)
      
      17) Properly check the sanity of sendmsg() lengths in ieee802154's
          dgram_sendmsg().  Dave Jones and others have hit and reported this
          bug (Sasha Levin)
      
      18) Some drivers (b44 and b43legacy) on 64-bit machines stopped working
          because of how netdev_alloc_skb() was adjusted.  Such drivers should
          now use alloc_skb() for obtaining bounce buffers.  (Eric Dumazet)
      
      19) atl1c mis-managed it's link state in that it stops the queue by hand
          on link down.  The generic networking takes care of that and this
          double stop locks the queue down.  So simply removing the driver's
          queue stop call fixes the problem (Cloud Ren)
      
      20) Fix out-of-memory due to mis-accounting in net_em packet scheduler
          (Eric Dumazet)
      
      21) If DCB and SR-IOV are configured at the same time in IXGBE the chip
          will hang because this is not supported (Alexander Duyck)
      
      22) A commit to stop drivers using netdev->base_addr broke the CNIC
          driver (Michael Chan)
      
      23) Timeout regression in ipset caused by an attempt to fix an overflow
          bug (Jozsef Kadlecsik).
      
      24) mac80211 minstrel code allocates memory using incorrect size
          (Thomas Huehn)
      
      25) llcp_sock_getname() needs to check for a NULL device otherwise we
          OOPS (Sasha Levin)
      
      26) mwifiex leaks memory (Bing Zhao)
      
      27) Propagate iwlwifi fix to iwlegacy, even when we're not associated
          we need to monitor for stuck queues in the watchdog handler
          (Stanislaw Geuszka)
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
        ipvs: fix oops in ip_vs_dst_event on rmmod
        ipvs: fix oops on NAT reply in br_nf context
        ixgbevf: Fix panic when loading driver
        ax25: Fix missing break
        MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership
        caif: Fix access to freed pernet memory
        net: cgroup: fix access the unallocated memory in netprio cgroup
        ixgbevf: Prevent RX/TX statistics getting reset to zero
        sctp: Fix list corruption resulting from freeing an association on a list
        net: respect GFP_DMA in __netdev_alloc_skb()
        e1000e: fix test for PHY being accessible on 82577/8/9 and I217
        e1000e: Correct link check logic for 82571 serdes
        sch_sfb: Fix missing NULL check
        bnx2: Fix bug in bnx2_free_tx_skbs().
        IPoIB: fix skb truesize underestimatiom
        net: Fix memory leak - vlan_info struct
        gianfar: fix potential sk_wmem_alloc imbalance
        drivers/net/ethernet/broadcom/cnic.c: remove invalid reference to list iterator variable
        net/rxrpc/ar-peer.c: remove invalid reference to list iterator variable
        drivers/isdn/mISDN/stack.c: remove invalid reference to list iterator variable
        ...
      a0185401
    • Linus Torvalds's avatar
      Merge tag 'single-rpmsg-3.5-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg · 635ac119
      Linus Torvalds authored
      Pull rpmsg fix from Ohad Ben-Cohen:
       "A single rpmsg fix for 3.5, coming from Federico Fuga, which
        eliminates the dependency on arbitrary initialization orders."
      
      * tag 'single-rpmsg-3.5-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
        rpmsg: fix dependency on initialization order
      635ac119
    • Linus Torvalds's avatar
      Merge branch 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping · 5bb93f1a
      Linus Torvalds authored
      Pull CMA and DMA-mapping fixes from Marek Szyprowski:
       "Another set of minor fixups for recently merged Contiguous Memory
        Allocator and ARM DMA-mapping changes.  Those patches fix mysterious
        crashes on systems with CMA and Himem enabled as well as some corner
        cases caused by typical off-by-one bug."
      
      * 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
        ARM: dma-mapping: modify condition check while freeing pages
        mm: cma: fix condition check when setting global cma area
        mm: cma: don't replace lowmem pages with highmem
      5bb93f1a
    • David S. Miller's avatar
      Merge branch 'master' of git://1984.lsi.us.es/nf · 602e65a3
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      I know that we're in fairly late stage to request pulls, but the IPVS people
      pinged me with little patches with oops fixes last week.
      
      One of them was recently introduced (during the 3.4 development cycle) while
      cleaning up the IPVS netns support. They are:
      
      * Fix one regression introduced in 3.4 while cleaning up the
        netns support for IPVS, from Julian Anastasov.
      
      * Fix one oops triggered due to resetting the conntrack attached to the skb
        instead of just putting it in the forward hook, from Lin Ming. This problem
        seems to be there since 2.6.37 according to Simon Horman.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      602e65a3
    • Federico Fuga's avatar
      rpmsg: fix dependency on initialization order · 96342526
      Federico Fuga authored
      When rpmsg drivers are built into the kernel, they must not initialize
      before the rpmsg bus does, otherwise they'd trigger a BUG() in
      drivers/base/driver.c line 169 (driver_register()).
      
      To fix that, and to stop depending on arbitrary linkage ordering of
      those built-in rpmsg drivers, we make the rpmsg bus initialize at
      subsys_initcall.
      
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarFederico Fuga <fuga@studiofuga.com>
      [ohad: rewrite the commit log]
      Signed-off-by: default avatarOhad Ben-Cohen <ohad@wizery.com>
      96342526
    • Julian Anastasov's avatar
      ipvs: fix oops in ip_vs_dst_event on rmmod · 283283c4
      Julian Anastasov authored
      	After commit 39f618b4 (3.4)
      "ipvs: reset ipvs pointer in netns" we can oops in
      ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
      is called after the ipvs_core_ops subsys is unregistered and
      net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
      if ipvs is NULL. It is safe because all services and dests
      for the net are already freed.
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      283283c4
    • Lin Ming's avatar
      ipvs: fix oops on NAT reply in br_nf context · 9e33ce45
      Lin Ming authored
      IPVS should not reset skb->nf_bridge in FORWARD hook
      by calling nf_reset for NAT replies. It triggers oops in
      br_nf_forward_finish.
      
      [  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
      [  579.781792] PGD 218f9067 PUD 0
      [  579.781865] Oops: 0000 [#1] SMP
      [  579.781945] CPU 0
      [  579.781983] Modules linked in:
      [  579.782047]
      [  579.782080]
      [  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
      [  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
      [  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
      [  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
      [  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
      [  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
      [  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
      [  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
      [  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
      [  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
      [  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
      [  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
      [  579.783919] Stack:
      [  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
      [  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
      [  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
      [  579.784477] Call Trace:
      [  579.784523]  <IRQ>
      [  579.784562]
      [  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
      [  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
      [  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
      [  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
      [  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
      [  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
      [  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
      [  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
      [  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
      [  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
      [  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
      [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
      [  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
      [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
      [  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
      [  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
      [  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
      [  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
      [  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
      [  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
      [  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
      [  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
      [  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30
      
      The steps to reproduce as follow,
      
      1. On Host1, setup brige br0(192.168.1.106)
      2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
      3. Start IPVS service on Host1
         ipvsadm -A -t 192.168.1.106:80 -s rr
         ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
      4. Run apache benchmark on Host2(192.168.1.101)
         ab -n 1000 http://192.168.1.106/
      
      ip_vs_reply4
        ip_vs_out
          handle_response
            ip_vs_notrack
              nf_reset()
              {
                skb->nf_bridge = NULL;
              }
      
      Actually, IPVS wants in this case just to replace nfct
      with untracked version. So replace the nf_reset(skb) call
      in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
      Signed-off-by: default avatarLin Ming <mlin@ss.pku.edu.cn>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      9e33ce45
    • Alexander Duyck's avatar
      ixgbevf: Fix panic when loading driver · 10cc1bdd
      Alexander Duyck authored
      This patch addresses a kernel panic seen when setting up the interface.
      Specifically we see a NULL pointer dereference on the Tx descriptor cleanup
      path when enabling interrupts.  This change corrects that so it cannot
      occur.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: default avatarGreg Rose <gregory.v.rose@intel.com>
      Tested-by: default avatarSibai Li <sibai.li@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      10cc1bdd
    • Alan Cox's avatar
      ax25: Fix missing break · ef764a13
      Alan Cox authored
      At least there seems to be no reason to disallow ROSE sockets when
      NETROM is loaded.
      Signed-off-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef764a13
    • Dmitry Eremin-Solenikov's avatar
      MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership · 68653359
      Dmitry Eremin-Solenikov authored
      As the life flows, developers priorities shifts a bit. Reflect actual
      changes in the maintainership of IEEE 802.15.4 code: Sergey mostly
      stopped cared about this piece of code. Most of the work recently was
      done by Alexander, so put him to the MAINTAINERS file to reflect his
      status and to ease the life of respective patches.
      
      Also add new net/mac802154/ directory to the list of maintained files.
      Signed-off-by: default avatarDmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68653359
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · 5dcaba7e
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      This series contains fixes to e1000e.
       ...
      Bruce Allan (1):
        e1000e: fix test for PHY being accessible on 82577/8/9 and I217
      
      Tushar Dave (1):
        e1000e: Correct link check logic for 82571 serdes
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5dcaba7e
    • Sjur Brændeland's avatar
      caif: Fix access to freed pernet memory · 96f80d12
      Sjur Brændeland authored
      unregister_netdevice_notifier() must be called before
      unregister_pernet_subsys() to avoid accessing already freed
      pernet memory. This fixes the following oops when doing rmmod:
      
      Call Trace:
       [<ffffffffa0f802bd>] caif_device_notify+0x4d/0x5a0 [caif]
       [<ffffffff81552ba9>] unregister_netdevice_notifier+0xb9/0x100
       [<ffffffffa0f86dcc>] caif_device_exit+0x1c/0x250 [caif]
       [<ffffffff810e7734>] sys_delete_module+0x1a4/0x300
       [<ffffffff810da82d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
       [<ffffffff813517de>] ? trace_hardirqs_on_thunk+0x3a/0x3
       [<ffffffff81696bad>] system_call_fastpath+0x1a/0x1f
      
      RIP
       [<ffffffffa0f7f561>] caif_get+0x51/0xb0 [caif]
      Signed-off-by: default avatarSjur Brændeland <sjur.brandeland@stericsson.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96f80d12
    • Gao feng's avatar
      net: cgroup: fix access the unallocated memory in netprio cgroup · ef209f15
      Gao feng authored
      there are some out of bound accesses in netprio cgroup.
      
      now before accessing the dev->priomap.priomap array,we only check
      if the dev->priomap exist.and because we don't want to see
      additional bound checkings in fast path, so we should make sure
      that dev->priomap is null or array size of dev->priomap.priomap
      is equal to max_prioidx + 1;
      
      so in write_priomap logic,we should call extend_netdev_table when
      dev->priomap is null and dev->priomap.priomap_len < max_len.
      and in cgrp_create->update_netdev_tables logic,we should call
      extend_netdev_table only when dev->priomap exist and
      dev->priomap.priomap_len < max_len.
      
      and it's not needed to call update_netdev_tables in write_priomap,
      we can only allocate the net device's priomap which we change through
      net_prio.ifpriomap.
      
      this patch also add a return value for update_netdev_tables &
      extend_netdev_table, so when new_priomap is allocated failed,
      write_priomap will stop to access the priomap,and return -ENOMEM
      back to the userspace to tell the user what happend.
      
      Change From v3:
      1. add rtnl protect when reading max_prioidx in write_priomap.
      
      2. only call extend_netdev_table when map->priomap_len < max_len,
         this will make sure array size of dev->map->priomap always
         bigger than any prioidx.
      
      3. add a function write_update_netdev_table to make codes clear.
      
      Change From v2:
      1. protect extend_netdev_table by RTNL.
      2. when extend_netdev_table failed,call dev_put to reduce device's refcount.
      Signed-off-by: default avatarGao feng <gaofeng@cn.fujitsu.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef209f15
    • Narendra K's avatar
      ixgbevf: Prevent RX/TX statistics getting reset to zero · 93659763
      Narendra K authored
      The commit 4197aa7b implements 64 bit
      per ring statistics. But the driver resets the 'total_bytes' and
      'total_packets' from RX and TX rings in the RX and TX interrupt
      handlers to zero. This results in statistics being lost and user space
      reporting RX and TX statistics as zero. This patch addresses the
      issue by preventing the resetting of RX and TX ring statistics to
      zero.
      Signed-off-by: default avatarNarendra K <narendra_k@dell.com>
      Tested-by: default avatarSibai Li <sibai.li@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93659763
    • Neil Horman's avatar
      sctp: Fix list corruption resulting from freeing an association on a list · 2eebc1e1
      Neil Horman authored
      A few days ago Dave Jones reported this oops:
      
      [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
      [22766.295376] CPU 0
      [22766.295384] Modules linked in:
      [22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
      ffff880147c03a74
      [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
      [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
      [22766.387137] Stack:
      [22766.387140]  ffff880147c03a10
      [22766.387140]  ffffffffa169f2b6
      [22766.387140]  ffff88013ed95728
      [22766.387143]  0000000000000002
      [22766.387143]  0000000000000000
      [22766.387143]  ffff880003fad062
      [22766.387144]  ffff88013c120000
      [22766.387144]
      [22766.387145] Call Trace:
      [22766.387145]  <IRQ>
      [22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
      [sctp]
      [22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
      [22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
      [22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
      [22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
      [22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
      [22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
      [22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
      [22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
      [22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
      [22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
      [22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
      [22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
      [22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
      [22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
      [22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
      [22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
      [22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
      [22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
      [22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
      [22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
      [22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
      [22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
      [22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
      [22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
      [22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
      [22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
      [22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
      [22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
      [22766.387283]  <EOI>
      [22766.387284]
      [22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
      [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
      89 e5 48 83
      ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
      48 89 fb
      49 89 f5 66 c1 c0 08 66 39 46 02
      [22766.387307]
      [22766.387307] RIP
      [22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
      [22766.387311]  RSP <ffff880147c039b0>
      [22766.387142]  ffffffffa16ab120
      [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
      [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
      
      It appears from his analysis and some staring at the code that this is likely
      occuring because an association is getting freed while still on the
      sctp_assoc_hashtable.  As a result, we get a gpf when traversing the hashtable
      while a freed node corrupts part of the list.
      
      Nominally I would think that an mibalanced refcount was responsible for this,
      but I can't seem to find any obvious imbalance.  What I did note however was
      that the two places where we create an association using
      sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
      which free a newly created association after calling sctp_primitive_ASSOCIATE.
      sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
      issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
      the aforementioned hash table.  the sctp command interpreter that process side
      effects has not way to unwind previously processed commands, so freeing the
      association from the __sctp_connect or sctp_sendmsg error path would lead to a
      freed association remaining on this hash table.
      
      I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
      which allows us to proerly use hlist_unhashed to check if the node is on a
      hashlist safely during a delete.  That in turn alows us to safely call
      sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
      before freeing them, regardles of what the associations state is on the hash
      list.
      
      I noted, while I was doing this, that the __sctp_unhash_endpoint was using
      hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
      pointers to make that function work properly, so I fixed that up in a simmilar
      fashion.
      
      I attempted to test this using a virtual guest running the SCTP_RR test from
      netperf in a loop while running the trinity fuzzer, both in a loop.  I wasn't
      able to recreate the problem prior to this fix, nor was I able to trigger the
      failure after (neither of which I suppose is suprising).  Given the trace above
      however, I think its likely that this is what we hit.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: davej@redhat.com
      CC: davej@redhat.com
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: linux-sctp@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2eebc1e1
  5. 16 Jul, 2012 5 commits