1. 07 May, 2014 1 commit
  2. 20 Apr, 2014 5 commits
  3. 19 Apr, 2014 12 commits
    • Adrien BAK's avatar
      perf tools: Improve error reporting · ffa91880
      Adrien BAK authored
      In the current version, when using perf record, if something goes
      wrong in tools/perf/builtin-record.c:375
        session = perf_session__new(file, false, NULL);
      
      The error message:
      "Not enough memory for reading per file header"
      
      is issued. This error message seems to be outdated and is not very
      helpful. This patch proposes to replace this error message by
      "Perf session creation failed"
      
      I believe this issue has been brought to lkml:
      https://lkml.org/lkml/2014/2/24/458
      although this patch only tackles a (small) part of the issue.
      
      Additionnaly, this patch improves error reporting in
      tools/perf/util/data.c open_file_write.
      
      Currently, if the call to open fails, the user is unaware of it.
      This patch logs the error, before returning the error code to
      the caller.
      Reported-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAdrien BAK <adrien.bak@metascale.org>
      Link: http://lkml.kernel.org/r/1397786443.3093.4.camel@beast
      [ Reorganize the changelog into paragraphs ]
      [ Added empty line after fd declaration in open_file_write ]
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      ffa91880
    • Vladimir Nikulichev's avatar
      perf tools: Adjust symbols in VDSO · 922d0e4d
      Vladimir Nikulichev authored
      pert-report doesn't resolve function names in VDSO:
      
      $ perf report --stdio -g flat,0.0,15,callee --sort pid
      ...
                  8.76%
                     0x7fff6b1fe861
                     __gettimeofday
                     ACE_OS::gettimeofday()
      ...
      
      In this case symbol values should be adjusted the same way as for executables,
      relocatable objects and prelinked libraries.
      
      After fix:
      
      $ perf report --stdio -g flat,0.0,15,callee --sort pid
      ...
                  8.76%
                     __vdso_gettimeofday
                     __gettimeofday
                     ACE_OS::gettimeofday()
      Signed-off-by: default avatarVladimir Nikulichev <nvs@tbricks.com>
      Tested-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Reviewed-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/969812.163009436-sendEmail@nvsSigned-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      922d0e4d
    • Alexander Yarygin's avatar
      perf kvm: Fix 'Min time' counting in report command · acb61fc8
      Alexander Yarygin authored
      Every event in the perf-kvm has a 'stats' structure, which contains
      max/min/average/etc times of handling this event.
      The problem is that the 'perf-kvm stat report' command always shows
      that 'min time' is 0us for every event. Example:
      
       # perf kvm stat report
      
       Analyze events for all VCPUs:
      
          VM-EXIT    Samples  Samples%     Time%   Min Time   Max Time Avg time
        [..]
        0xB2 MSCH         12     0.07%     0.00%        0us        8us 7.31us ( +-   2.11% )
        0xB2 CHSC         12     0.07%     0.00%        0us       18us 9.39us ( +-   9.49% )
        0xB2 STPX          8     0.05%     0.00%        0us        2us 1.88us ( +-   7.18% )
        0xB2 STSI          7     0.04%     0.00%        0us       44us 16.49us ( +-  38.20% )
        [..]
      
      This happens because the 'stats' structure is not initialized and
      stats->min equals to 0. Lets initialize the structure for every
      event after its allocation using init_stats() function. This initializes
      stats->min to -1 and makes 'Min time' statistics counting work:
      
       # perf kvm stat report
      
       Analyze events for all VCPUs:
      
          VM-EXIT    Samples  Samples%     Time%   Min Time   Max Time Avg time
        [..]
        0xB2 MSCH         12     0.07%     0.00%        6us        8us 7.31us ( +-   2.11% )
        0xB2 CHSC         12     0.07%     0.00%        7us       18us 9.39us ( +-   9.49% )
        0xB2 STPX          8     0.05%     0.00%        1us        2us 1.88us ( +-   7.18% )
        0xB2 STSI          7     0.04%     0.00%        1us       44us 16.49us ( +-  38.20% )
        [..]
      Signed-off-by: default avatarAlexander Yarygin <yarygin@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Link: http://lkml.kernel.org/r/1397053319-2130-3-git-send-email-borntraeger@de.ibm.com
      [ Fixing the perf examples changelog output ]
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      acb61fc8
    • Eric Dumazet's avatar
      coredump: fix va_list corruption · 404ca80e
      Eric Dumazet authored
      A va_list needs to be copied in case it needs to be used twice.
      
      Thanks to Hugh for debugging this issue, leading to various panics.
      
      Tested:
      
        lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
      
      'produce_core' is simply : main() { *(int *)0 = 1;}
      
        lpq84:~# ./produce_core
        Segmentation fault (core dumped)
        lpq84:~# dmesg | tail -1
        [  614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed
      
      Notice the last argument was replaced by a NULL (we were lucky enough to
      not crash, but do not try this on your production machine !)
      
      After fix :
      
        lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
        lpq83:~# ./produce_core
        Segmentation fault
        lpq83:~# dmesg | tail -1
        [  740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed
      
      Fixes: 5fe9d8ca ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Diagnosed-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org # 3.11+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      404ca80e
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6d459690
      Linus Torvalds authored
      Pull x86 fix from Ingo Molnar:
       "This fixes the preemption-count imbalance crash reported by Owen
        Kibel"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Fix CMCI preemption bugs
      6d459690
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8f98f6f5
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Two fixes:
      
         - a SCHED_DEADLINE task selection fix
         - a sched/numa related lockdep splat fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Check for stop task appearance when balancing happens
        sched/numa: Fix task_numa_free() lockdep splat
      8f98f6f5
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8de3f7a7
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Two kernel side fixes:
      
         - an Intel uncore PMU driver potential crash fix
         - a kprobes/perf-call-graph interaction fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
        kprobes/x86: Fix page-fault handling logic
      8de3f7a7
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · b9312420
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Unfortunately this contains no easter eggs, its a bit larger than I'd
        like, but I included a patch that just moves code from one file to
        another and I'd like to avoid merge conflicts with that later, so it
        makes it seem worse than it is,
      
        Otherwise:
         - radeon: fixes to use new microcode to stabilise some cards, use
           some common displayport code, some runtime pm fixes, pll regression
           fixes
         - i915: fix for some context oopses, a warn in a used path, backlight
           fixes
         - nouveau: regression fix
         - omap: a bunch of fixes"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (51 commits)
        drm: bochs: drop unused struct fields
        drm: bochs: add power management support
        drm: cirrus: add power management support
        drm: Split out drm_probe_helper.c from drm_crtc_helper.c
        drm/plane-helper: Don't fake-implement primary plane disabling
        drm/ast: fix value check in cbr_scan2
        drm/nouveau/bios: fix a bit shift error introduced by 457e77b2
        drm/radeon/ci: make sure mc ucode is loaded before checking the size
        drm/radeon/si: make sure mc ucode is loaded before checking the size
        drm/radeon: improve PLL params if we don't match exactly v2
        drm/radeon: memory leak on bo reservation failure. v2
        drm/radeon: fix VCE fence command
        drm/radeon: re-enable mclk dpm on R7 260X asics
        drm/radeon: add support for newer mc ucode on CI (v2)
        drm/radeon: add support for newer mc ucode on SI (v2)
        drm/radeon: apply more strict limits for PLL params v2
        drm/radeon: update CI DPM powertune settings
        drm/radeon: fix runpm handling on APUs (v4)
        drm/radeon: disable mclk dpm on R7 260X
        drm/tegra: Remove gratuitous pad field
        ...
      b9312420
    • Dave Airlie's avatar
      Merge branch 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux into drm-next · a42892ed
      Dave Airlie authored
      Some i2c fixes over DisplayPort.
      
      * 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux:
        drm/radeon: Improve vramlimit module param documentation
        drm/radeon: fix audio pin counts for DCE6+ (v2)
        drm/radeon/dp: switch to the common i2c over aux code
        drm/dp/i2c: Update comments about common i2c over dp assumptions (v3)
        drm/dp/i2c: send bare addresses to properly reset i2c connections (v4)
        drm/radeon/dp: handle zero sized i2c over aux transactions (v2)
        drm/i915: support address only i2c-over-aux transactions
        drm/tegra: dp: Support address-only I2C-over-AUX transactions
      a42892ed
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ebfc45ee
      Linus Torvalds authored
      Pull more networking fixes from David Miller:
      
       1) Fix mlx4_en_netpoll implementation, it needs to schedule a NAPI
          context, not synchronize it.  From Chris Mason.
      
       2) Ipv4 flow input interface should never be zero, it should be
          LOOPBACK_IFINDEX instead.  From Cong Wang and Julian Anastasov.
      
       3) Properly configure MAC to PHY connection in mvneta devices, from
          Thomas Petazzoni.
      
       4) sys_recv should use SYSCALL_DEFINE.  From Jan Glauber.
      
       5) Tunnel driver ioctls do not use the correct namespace, fix from
          Nicolas Dichtel.
      
       6) Fix memory leak on seccomp filter attach, from Kees Cook.
      
       7) Fix lockdep warning for nested vlans, from Ding Tianhong.
      
       8) Crashes can happen in SCTP due to how the auth_enable value is
          managed, fix from Vlad Yasevich.
      
       9) Wireless fixes from John W Linville and co.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
        net: sctp: cache auth_enable per endpoint
        tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled
        vlan: Fix lockdep warning when vlan dev handle notification
        seccomp: fix memory leak on filter attach
        isdn: icn: buffer overflow in icn_command()
        ip6_tunnel: use the right netns in ioctl handler
        sit: use the right netns in ioctl handler
        ip_tunnel: use the right netns in ioctl handler
        net: use SYSCALL_DEFINEx for sys_recv
        net: mdio-gpio: Add support for separate MDI and MDO gpio pins
        net: mdio-gpio: Add support for active low gpio pins
        net: mdio-gpio: Use devm_ functions where possible
        ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
        ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
        mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
        net: mvneta: properly configure the MAC <-> PHY connection in all situations
        net: phy: add minimal support for QSGMII PHY
        sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
        mwifiex: fix hung task on command timeout
        mwifiex: process event before command response
        ...
      ebfc45ee
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 6e66d5da
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "A set of 5 small cifs fixes"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        cif: fix dead code
        cifs: fix error handling cifs_user_readv
        fs: cifs: remove unused variable.
        Return correct error on query of xattr on file with empty xattrs
        cifs: Wait for writebacks to complete before attempting write.
      6e66d5da
    • Linus Torvalds's avatar
      Merge tag 'char-misc-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 25bfe4f5
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are a few driver fixes for char/misc drivers that resolve
        reported issues.
      
        All have been in linux-next successfully for a few days"
      
      * tag 'char-misc-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts
        Tools: hv: Handle the case when the target file exists correctly
        vme_tsi148: Utilize to_pci_dev() macro
        vme_tsi148: Fix PCI address mapping assumption
        vme_tsi148: Fix typo in tsi148_slave_get()
        w1: avoid recursive device_add
        w1: fix netlink refcnt leak on error path
        misc: Grammar s/addition/additional/
        drivers: mcb: fix memory leak in chameleon_parse_cells() error path
        mei: ignore client writing state during cb completion
        mei: me: do not load the driver if the FW doesn't support MEI interface
        GenWQE: Increase driver version number
        GenWQE: Fix multithreading problems
        GenWQE: Ensure rc is not returning an uninitialized value
        GenWQE: Add wmb before DDCB is started
        GenWQE: Enable access to VPD flash area
      25bfe4f5
  4. 18 Apr, 2014 22 commits
    • Linus Torvalds's avatar
      Merge tag 'driver-core-3.15-rc2' of... · 60fbf2bd
      Linus Torvalds authored
      Merge tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fixes from Greg KH:
       "Here are some driver core fixes for 3.15-rc2.  Also in here are some
        documentation updates, as well as an API removal that had to wait for
        after -rc1 due to the cleanups coming into you from multiple developer
        trees (this one and the PPC tree.)
      
        All have been in linux next successfully"
      
      * tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        drivers/base/dd.c incorrect pr_debug() parameters
        Documentation: Update stable address in Chinese and Japanese translations
        topology: Fix compilation warning when not in SMP
        Chinese: add translation of io_ordering.txt
        stable_kernel_rules: spelling/word usage
        sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
        kernfs: protect lazy kernfs_iattrs allocation with mutex
        fs: Don't return 0 from get_anon_bdev
      60fbf2bd
    • Linus Torvalds's avatar
      Merge tag 'staging-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 8cb652bb
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are a few staging driver fixes for issues that have been reported
        for 3.15-rc2.
      
        Also dominating the diffstat for the pull request is the removal of
        the rtl8187se driver.  It's no longer needed in staging as a "real"
        driver for this hardware is now merged in the tree in the "correct"
        location in drivers/net/
      
        All of these patches have been tested in linux-next"
      
      * tag 'staging-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: r8188eu: Fix case where ethtype was never obtained and always be checked against 0
        staging: r8712u: Fix case where ethtype was never obtained and always be checked against 0
        staging: r8188eu: Calling rtw_get_stainfo() with a NULL sta_addr will return NULL
        staging: comedi: fix circular locking dependency in comedi_mmap()
        staging: r8723au: Add missing initialization of change_inx in sort algorithm
        Staging: unisys: use after free in list_for_each()
        staging: unisys: use after free in error messages
        staging: speakup: fix misuse of kstrtol() in handle_goto()
        staging: goldfish: Call free_irq in error path
        staging: delete rtl8187se wireless driver
        staging: rtl8723au: Fix buffer overflow in rtw_get_wfd_ie()
        staging: gs_fpgaboot: remove __TIMESTAMP__ macro
        staging: vme: fix memory leak in vme_user_probe()
        staging: fpgaboot: clean up Makefile
        staging/usbip: fix store_attach() sscanf return value check
        staging/usbip: userspace - fix usbipd SIGSEGV from refresh_exported_devices()
        staging: rtl8188eu: remove spaces, correct counts to unbreak P2P ioctls
        staging/rtl8821ae: Fix OOM handling in _rtl_init_deferred_work()
      8cb652bb
    • Linus Torvalds's avatar
      Merge tag 'tty-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 575a2929
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are a number of small tty/serial driver fixes for 3.15-rc2.  Also
        in here are some Documentation file removals for drivers that we
        removed a long time ago, no need to keep it around any longer.
      
        All of these have been in linux-next for a bit"
      
      * tag 'tty-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        Revert "serial: 8250, disable "too much work" messages"
        serial: amba-pl011: fix regression, causing an Oops on rmmod
        tty: Fix help text of SYNCLINK_CS
        tty: fix memleak in alloc_pid
        ttyprintk: Allow built as a module
        ttyprintk: Fix wrong tty_unregister_driver() call in the error path
        serial: 8250, disable "too much work" messages
        Documentation/serial: Delete obsolete driver documentation
        serial: omap: Fix missing pm_runtime_resume handling by simplifying code
        serial_core: Fix pm imbalance on unbind
        serial: pl011: change Rx burst size to half of trigger level
        serial: timberdale: Depend on X86_32
        serial: st-asc: Fix SysRq char handling
        Revert "serial: clps711x: Give a chance to perform useful tasks during wait loop"
        serial_core: Fix conditional start_tx on ring buffer not empty
        serial: efm32: use $vendor,$device scheme for compatible string
        serial: omap: free the wakeup settings in remove
      575a2929
    • Linus Torvalds's avatar
      Merge tag 'usb-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 7e55f81e
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of tiny USB fixes and new device ids for 3.15-rc2.
        Nothing major, just issues some people have reported.
      
        All of these have been in linux-next"
      
      * tag 'usb-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        uas: fix deadlocky memory allocations
        uas: fix error handling during scsi_scan()
        uas: fix GFP_NOIO under spinlock
        uwb: adds missing error handling
        USB: cdc-acm: Remove Motorola/Telit H24 serial interfaces from ACM driver
        USB: ohci-jz4740: FEAT_POWER is a port feature, not a hub feature
        USB: ohci-jz4740: Fix uninitialized variable warning
        USB: EHCI: tegra: set txfill_tuning
        usb: ehci-platform: Return immediately from suspend if ehci_suspend fails
        usb: ehci-exynos: Return immediately from suspend if ehci_suspend fails
        USB: fix crash during hotplug of PCI USB controller card
        USB: cdc-acm: fix double usb_autopm_put_interface() in acm_port_activate()
        usb: usb-common: fix typo for usb_state_string
        USB: usb_wwan: fix handling of missing bulk endpoints
        USB: pl2303: add ids for Hewlett-Packard HP POS pole displays
        USB: cp210x: Add 8281 (Nanotec Plug & Drive)
        usb: option driver, add support for Telit UE910v2
        Revert "USB: serial: add usbid for dell wwan card to sierra.c"
        USB: serial: ftdi_sio: add id for Brainboxes serial cards
      7e55f81e
    • Linus Torvalds's avatar
      Merge branch 'akpm' (incoming from Andrew) · ea2388f2
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        thp: close race between split and zap huge pages
        mm: fix new kernel-doc warning in filemap.c
        mm: fix CONFIG_DEBUG_VM_RB description
        mm: use paravirt friendly ops for NUMA hinting ptes
        mips: export flush_icache_range
        mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()
        wait: explain the shadowing and type inconsistencies
        Shiraz has moved
        Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt
        powerpc/mm: fix ".__node_distance" undefined
        kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()
        init/Kconfig: move the trusted keyring config option to general setup
        vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()
      ea2388f2
    • Kirill A. Shutemov's avatar
      thp: close race between split and zap huge pages · b5a8cad3
      Kirill A. Shutemov authored
      Sasha Levin has reported two THP BUGs[1][2].  I believe both of them
      have the same root cause.  Let's look to them one by one.
      
      The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".  It's
      BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().  From my
      testing I see that page_mapcount() is higher than mapcount here.
      
      I think it happens due to race between zap_huge_pmd() and
      page_check_address_pmd().  page_check_address_pmd() misses PMD which is
      under zap:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  /*
      	   * We check if PMD present without taking ptl: no
      	   * serialization against zap_huge_pmd(). We miss this PMD,
      	   * it's not accounted to 'mapcount' in __split_huge_page().
      	   */
      	  pmd_present(pmd) == 0
      
        BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
      
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      
      The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
      It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
      
      This happens in similar way:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  pmd_present(pmd) == 0	/* The same comment as above */
        /*
         * No crash this time since we already decremented page->_mapcount in
         * zap_huge_pmd().
         */
        BUG_ON(mapcount != page_mapcount(page))
      
        /*
         * We split the compound page here into small pages without
         * serialization against zap_huge_pmd()
         */
        __split_huge_page_refcount()
      						VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
      
      So my understanding the problem is pmd_present() check in mm_find_pmd()
      without taking page table lock.
      
      The bug was introduced by me commit with commit 117b0791. Sorry for
      that. :(
      
      Let's open code mm_find_pmd() in page_check_address_pmd() and do the
      check under page table lock.
      
      Note that __page_check_address() does the same for PTE entires
      if sync != 0.
      
      I've stress tested split and zap code paths for 36+ hours by now and
      don't see crashes with the patch applied. Before it took <20 min to
      trigger the first bug and few hours for second one (if we ignore
      first).
      
      [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
      [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[3.13+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5a8cad3
    • Randy Dunlap's avatar
      mm: fix new kernel-doc warning in filemap.c · b59b8cbc
      Randy Dunlap authored
      Fix new kernel-doc warning in mm/filemap.c:
      
        Warning(mm/filemap.c:2600): Excess function parameter 'ppos' description in '__generic_file_aio_write'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b59b8cbc
    • Davidlohr Bueso's avatar
      mm: fix CONFIG_DEBUG_VM_RB description · a663dad6
      Davidlohr Bueso authored
      This appears to be a copy/paste error.  Update the description to
      reflect extra rbtree debug and checks for the config option instead of
      duplicating CONFIG_DEBUG_VM.
      Signed-off-by: default avatarDavidlohr Bueso <davidlohr@hp.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a663dad6
    • Mel Gorman's avatar
      mm: use paravirt friendly ops for NUMA hinting ptes · 29c77870
      Mel Gorman authored
      David Vrabel identified a regression when using automatic NUMA balancing
      under Xen whereby page table entries were getting corrupted due to the
      use of native PTE operations.  Quoting him
      
      	Xen PV guest page tables require that their entries use machine
      	addresses if the preset bit (_PAGE_PRESENT) is set, and (for
      	successful migration) non-present PTEs must use pseudo-physical
      	addresses.  This is because on migration MFNs in present PTEs are
      	translated to PFNs (canonicalised) so they may be translated back
      	to the new MFN in the destination domain (uncanonicalised).
      
      	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
      	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
      	pte_clear_flags(), etc.
      
      	In a Xen PV guest, these functions must translate MFNs to PFNs
      	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
      	_PAGE_PRESENT.
      
      His suggested fix converted p[te|md]_[set|clear]_flags to using
      paravirt-friendly ops but this is overkill.  He suggested an alternative
      of using p[te|md]_modify in the NUMA page table operations but this is
      does more work than necessary and would require looking up a VMA for
      protections.
      
      This patch modifies the NUMA page table operations to use paravirt
      friendly operations to set/clear the flags of interest.  Unfortunately
      this will take a performance hit when updating the PTEs on
      CONFIG_PARAVIRT but I do not see a way around it that does not break
      Xen.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Tested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      29c77870
    • Kees Cook's avatar
      mips: export flush_icache_range · 8229f1a0
      Kees Cook authored
      The lkdtm module performs tests against executable memory ranges, so it
      needs to flush the icache for proper behaviors.  Other architectures
      already export this, so do the same for MIPS.
      
      [akpm@linux-foundation.org: relocate export sites]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8229f1a0
    • Mizuma, Masayoshi's avatar
      mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages() · 7848a4bf
      Mizuma, Masayoshi authored
      soft lockup in freeing gigantic hugepage fixed in commit 55f67141 "mm:
      hugetlb: fix softlockup when a large number of hugepages are freed." can
      happen in return_unused_surplus_pages(), so let's fix it.
      Signed-off-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7848a4bf
    • Peter Zijlstra's avatar
      wait: explain the shadowing and type inconsistencies · 8b32201d
      Peter Zijlstra authored
      Stick in a comment before someone else tries to fix the sparse warning
      this generates.
      Suggested-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.orgSigned-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8b32201d
    • Viresh Kumar's avatar
      Shiraz has moved · 9cc23682
      Viresh Kumar authored
      shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
      company.  Replace ST's id with shiraz.linux.kernel@gmail.com.
      
      It also updates .mailmap file to fix address for 'git shortlog'.
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9cc23682
    • Tang Chen's avatar
      Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt · 8f28ed92
      Tang Chen authored
      In document numa_memory_policy.txt, the following examples for flag
      MPOL_F_RELATIVE_NODES are incorrect.
      
      	For example, consider a task that is attached to a cpuset with
      	mems 2-5 that sets an Interleave policy over the same set with
      	MPOL_F_RELATIVE_NODES.  If the cpuset's mems change to 3-7, the
      	interleave now occurs over nodes 3,5-6.  If the cpuset's mems
      	then change to 0,2-3,5, then the interleave occurs over nodes
      	0,3,5.
      
      According to the comment of the patch adding flag MPOL_F_RELATIVE_NODES,
      the nodemasks the user specifies should be considered relative to the
      current task's mems_allowed.
      
       (https://lkml.org/lkml/2008/2/29/428)
      
      And according to numa_memory_policy.txt, if the user's nodemask includes
      nodes that are outside the range of the new set of allowed nodes, then
      the remap wraps around to the beginning of the nodemask and, if not
      already set, sets the node in the mempolicy nodemask.
      
      So in the example, if the user specifies 2-5, for a task whose
      mems_allowed is 3-7, the nodemasks should be remapped the third, fourth,
      fifth, sixth node in mems_allowed.  like the following:
      
      	mems_allowed:       3  4  5  6  7
      
      	relative index:     0  1  2  3  4
      	                    5
      
      So the nodemasks should be remapped to 3,5-7, but not 3,5-6.
      
      And for a task whose mems_allowed is 0,2-3,5, the nodemasks should be
      remapped to 0,2-3,5, but not 0,3,5.
      
      	mems_allowed:       0  2  3  5
      
              relative index:     0  1  2  3
                                  4  5
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f28ed92
    • Mike Qiu's avatar
      powerpc/mm: fix ".__node_distance" undefined · 12c743eb
      Mike Qiu authored
        CHK     include/config/kernel.release
        CHK     include/generated/uapi/linux/version.h
        CHK     include/generated/utsrelease.h
        ...
        Building modules, stage 2.
      WARNING: 1 bad relocations
      c0000000013d6a30 R_PPC64_ADDR64    uprobes_fetch_type_table
        WRAP    arch/powerpc/boot/zImage.pseries
        WRAP    arch/powerpc/boot/zImage.epapr
        MODPOST 1849 modules
      ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
      make[1]: *** [__modpost] Error 1
      make: *** [modules] Error 2
      make: *** Waiting for unfinished jobs....
      
      The reason is symbol "__node_distance" not been exported in powerpc.
      Signed-off-by: default avatarMike Qiu <qiudayu@linux.vnet.ibm.com>
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      12c743eb
    • Andrew Morton's avatar
      kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write() · 7861144b
      Andrew Morton authored
      Fix:
      
        BUG: using __this_cpu_write() in preemptible [00000000] code: systemd-udevd/497
        caller is __this_cpu_preempt_check+0x13/0x20
        CPU: 3 PID: 497 Comm: systemd-udevd Tainted: G        W     3.15.0-rc1 #9
        Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
        Call Trace:
          check_preemption_disabled+0xe1/0xf0
          __this_cpu_preempt_check+0x13/0x20
          touch_nmi_watchdog+0x28/0x40
      Reported-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Tested-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Cc: Eric Piel <eric.piel@tremplin-utc.net>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7861144b
    • Peter Foley's avatar
      init/Kconfig: move the trusted keyring config option to general setup · 82c04ff8
      Peter Foley authored
      The SYSTEM_TRUSTED_KEYRING config option is not in any menu, causing it
      to show up in the toplevel of the kernel configuration.  Fix this by
      moving it under the General Setup menu.
      Signed-off-by: default avatarPeter Foley <pefoley2@pefoley.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      82c04ff8
    • Christoph Lameter's avatar
      vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state() · 83da7510
      Christoph Lameter authored
      Seems to be called with preemption enabled.  Therefore it must use
      mod_zone_page_state instead.
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Reported-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      83da7510
    • Vlad Yasevich's avatar
      net: sctp: cache auth_enable per endpoint · b14878cc
      Vlad Yasevich authored
      Currently, it is possible to create an SCTP socket, then switch
      auth_enable via sysctl setting to 1 and crash the system on connect:
      
      Oops[#1]:
      CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
      task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
      [...]
      Call Trace:
      [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
      [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
      [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
      [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
      [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
      [<ffffffff8043af68>] sctp_rcv+0x588/0x630
      [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
      [<ffffffff803acb50>] ip6_input+0x2c0/0x440
      [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
      [<ffffffff80310650>] process_backlog+0xb4/0x18c
      [<ffffffff80313cbc>] net_rx_action+0x12c/0x210
      [<ffffffff80034254>] __do_softirq+0x17c/0x2ac
      [<ffffffff800345e0>] irq_exit+0x54/0xb0
      [<ffffffff800075a4>] ret_from_irq+0x0/0x4
      [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
      [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
      [<ffffffff805a88b0>] start_kernel+0x37c/0x398
      Code: dd0900b8  000330f8  0126302d <dcc60000> 50c0fff1  0047182a  a48306a0
      03e00008  00000000
      ---[ end trace b530b0551467f2fd ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      
      What happens while auth_enable=0 in that case is, that
      ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
      when endpoint is being created.
      
      After that point, if an admin switches over to auth_enable=1,
      the machine can crash due to NULL pointer dereference during
      reception of an INIT chunk. When we enter sctp_process_init()
      via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
      the INIT verification succeeds and while we walk and process
      all INIT params via sctp_process_param() we find that
      net->sctp.auth_enable is set, therefore do not fall through,
      but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
      dereference what we have set to NULL during endpoint
      initialization phase.
      
      The fix is to make auth_enable immutable by caching its value
      during endpoint initialization, so that its original value is
      being carried along until destruction. The bug seems to originate
      from the very first days.
      
      Fix in joint work with Daniel Borkmann.
      Reported-by: default avatarJoshua Kinard <kumba@gentoo.org>
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Tested-by: default avatarJoshua Kinard <kumba@gentoo.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b14878cc
    • David S. Miller's avatar
      Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 5a292f7b
      David S. Miller authored
      John W. Linville says:
      
      ====================
      pull request: wireless 2014-04-17
      
      Please pull this batch of fixes intended for the 3.15 stream...
      
      For the mac80211 bits, Johannes says:
      
      "We have a fix from Chun-Yeow to not look at management frame bitrates
      that are typically really low, two fixes from Felix for AP_VLAN
      interfaces, a fix from Ido to disable SMPS settings when a monitor
      interface is enabled, a radar detection fix from Michał and a fix from
      myself for a very old remain-on-channel bug."
      
      For the iwlwifi bits, Emmanuel says:
      
      "I have new device IDs and a new firmware API. These are the trivial
      ones. The less trivial ones are Johannes's fix that delays the
      enablement of an interrupt coalescing hardware until after association
      - this fixes a few connection problems seen in the field. Eyal has a
      bunch of rate control fixes. I decided to add these for 3.15 because
      they fix some disconnection and packet loss scenarios which were
      reported by the field. I also have a fix for a memory leak that
      happens only with a very new NIC."
      
      Along with those...
      
      Amitkumar Karwar fixes a couple of problems relating to driver/firmware
      interactions in mwifiex.
      
      Christian Engelmayer avoids a couple of potential memory leaks in
      the new rsi driver.
      
      Eliad Peller provides a wl18xx mailbox alignment fix for problems
      when using new firmware.
      
      Frederic Danis adds a couple of missing debugging strings to the
      cw1200 driver.
      
      Geert Uytterhoeven adds a variable initialization inside of the
      rsi driver.
      
      Luciano Coelho patches the wlcore code to ignore dummy packet events
      in PLT mode in order to work around a firmware bug.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5a292f7b
    • Ivan Vecera's avatar
      tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled · ba67b510
      Ivan Vecera authored
      The patch fixes a problem with dropped jumbo frames after usage of
      'ethtool -G ... rx'.
      
      Scenario:
      1. ip link set eth0 up
      2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo
      3. ip link set mtu 9000 dev eth0
      
      The ethtool command set rx_jumbo_pending to zero so any received jumbo
      packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N'
      to workaround the issue.
      The patch changes the logic so rx_jumbo_pending value is changed only if
      jumbo frames are enabled (MTU > 1500).
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Acked-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba67b510
    • dingtianhong's avatar
      vlan: Fix lockdep warning when vlan dev handle notification · dc8eaaa0
      dingtianhong authored
      When I open the LOCKDEP config and run these steps:
      
      modprobe 8021q
      vconfig add eth2 20
      vconfig add eth2.20 30
      ifconfig eth2 xx.xx.xx.xx
      
      then the Call Trace happened:
      
      [32524.386288] =============================================
      [32524.386293] [ INFO: possible recursive locking detected ]
      [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G           O
      [32524.386302] ---------------------------------------------
      [32524.386306] ifconfig/3103 is trying to acquire lock:
      [32524.386310]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
      [32524.386326]
      [32524.386326] but task is already holding lock:
      [32524.386330]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
      [32524.386341]
      [32524.386341] other info that might help us debug this:
      [32524.386345]  Possible unsafe locking scenario:
      [32524.386345]
      [32524.386350]        CPU0
      [32524.386352]        ----
      [32524.386354]   lock(&vlan_netdev_addr_lock_key/1);
      [32524.386359]   lock(&vlan_netdev_addr_lock_key/1);
      [32524.386364]
      [32524.386364]  *** DEADLOCK ***
      [32524.386364]
      [32524.386368]  May be due to missing lock nesting notation
      [32524.386368]
      [32524.386373] 2 locks held by ifconfig/3103:
      [32524.386376]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20
      [32524.386387]  #1:  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
      [32524.386398]
      [32524.386398] stack backtrace:
      [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G           O 3.14.0-rc2-0.7-default+ #35
      [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [32524.386414]  ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
      [32524.386421]  ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
      [32524.386428]  000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
      [32524.386435] Call Trace:
      [32524.386441]  [<ffffffff814f68a2>] dump_stack+0x6a/0x78
      [32524.386448]  [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940
      [32524.386454]  [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940
      [32524.386459]  [<ffffffff810a4874>] lock_acquire+0xe4/0x110
      [32524.386464]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
      [32524.386471]  [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40
      [32524.386476]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
      [32524.386481]  [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
      [32524.386489]  [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
      [32524.386495]  [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0
      [32524.386500]  [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40
      [32524.386506]  [<ffffffff8141b3cf>] __dev_open+0xef/0x150
      [32524.386511]  [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190
      [32524.386516]  [<ffffffff8141b292>] dev_change_flags+0x32/0x80
      [32524.386524]  [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830
      [32524.386532]  [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660
      [32524.386540]  [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0
      [32524.386550]  [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60
      [32524.386558]  [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0
      [32524.386568]  [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590
      [32524.386578]  [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50
      [32524.386586]  [<ffffffff811b39e5>] ? __fget_light+0x105/0x110
      [32524.386594]  [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0
      [32524.386604]  [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b
      
      ========================================================================
      
      The reason is that all of the addr_lock_key for vlan dev have the same class,
      so if we change the status for vlan dev, the vlan dev and its real dev will
      hold the same class of addr_lock_key together, so the warning happened.
      
      we should distinguish the lock depth for vlan dev and its real dev.
      
      v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which
      	could support to add 8 vlan id on a same vlan dev, I think it is enough for current
      	scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id,
      	and the vlan dev would not meet the same class key with its real dev.
      
      	The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan
      	dev could get a suitable class key.
      
      v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev
      	and its real dev, but it make no sense, because the difference for subclass in the
      	lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth
      	to distinguish the different depth for every vlan dev, the same depth of the vlan dev
      	could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h,
      	I think it is enough here, the lockdep should never exceed that value.
      
      v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method,
      	we could use _nested() variants to fix the problem, calculate the depth for every vlan dev,
      	and use the depth as the subclass for addr_lock_key.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc8eaaa0