1. 26 Feb, 2013 3 commits
    • Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux · fffddfd6
      Linus Torvalds authored
      Pull drm merge from Dave Airlie:
       "Highlights:
      
         - TI LCD controller KMS driver
      
         - TI OMAP KMS driver merged from staging
      
         - drop gma500 stub driver
      
         - the fbcon locking fixes
      
         - the vgacon dirty like zebra fix.
      
         - open firmware videomode and hdmi common code helpers
      
         - major locking rework for kms object handling - pageflip/cursor
           won't block on polling anymore!
      
         - fbcon helper and prime helper cleanups
      
         - i915: all over the map, haswell power well enhancements, valleyview
           macro horrors cleaned up, killing lots of legacy GTT code,
      
         - radeon: CS ioctl unification, deprecated UMS support, gpu reset
           rework, VM fixes
      
         - nouveau: reworked thermal code, external dp/tmds encoder support
           (anx9805), fences sleep instead of polling,
      
         - exynos: all over the driver fixes."
      
      Lovely conflict in radeon/evergreen_cs.c between commit de0babd6
      ("drm/radeon: enforce use of radeon_get_ib_value when reading user cmd")
      and the new changes that modified that evergreen_dma_cs_parse()
      function.
      
      * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (508 commits)
        drm/tilcdc: only build on arm
        drm/i915: Revert hdmi HDP pin checks
        drm/tegra: Add list of framebuffers to debugfs
        drm/tegra: Fix color expansion
        drm/tegra: Split DC_CMD_STATE_CONTROL register write
        drm/tegra: Implement page-flipping support
        drm/tegra: Implement VBLANK support
        drm/tegra: Implement .mode_set_base()
        drm/tegra: Add plane support
        drm/tegra: Remove bogus tegra_framebuffer structure
        drm: Add consistency check for page-flipping
        drm/radeon: Use generic HDMI infoframe helpers
        drm/tegra: Use generic HDMI infoframe helpers
        drm: Add EDID helper documentation
        drm: Add HDMI infoframe helpers
        video: Add generic HDMI infoframe helpers
        drm: Add some missing forward declarations
        drm: Move mode tables to drm_edid.c
        drm: Remove duplicate drm_mode_cea_vic()
        gma500: Fix n, m1 and m2 clock limits for sdvo and lvds
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 69086a78
      Linus Torvalds authored
      Pull vfs fix from Al Viro:
       "Fix for 3.8 breakage introduced by "vfs: Allow unprivileged
        manipulation of the mount namespace" - accessing mnt->mnt_ns is done
        there without needed locking *and* without any real need.
      
        Definite -stable fodder, fortunately not going too far back.
      
        This is *not* all - there will be much bigger vfs pull request
        tomorrow."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        get rid of unprotected dereferencing of mnt->mnt_ns
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 94f2f142
      Linus Torvalds authored
      Pull user namespace and namespace infrastructure changes from Eric W Biederman:
       "This set of changes starts with a few small enhancements to the user
        namespace: reboot support, allowing more arbitrary mappings, and
        support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
        user namespace root.
      
        I do my best to document that, if you care about limiting your
        unprivileged users, you will need to enable memory control groups
        when user namespace support is enabled.
      
        There is a minor bug fix to prevent overflowing the stack if someone
        creates way too many user namespaces.
      
        The bulk of the changes are a continuation of the kuid/kgid push down
        work through the filesystems.  These changes make using uids and gids
        typesafe which ensures that these filesystems are safe to use when
        multiple user namespaces are in use.  The filesystems converted for
        3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
        changes for these filesystems were a little more involved so I split
        the changes into smaller hopefully obviously correct changes.
      
        XFS is the only filesystem that remains.  I was hoping I could get
        that in this release so that user namespace support would be enabled
        with an allyesconfig or an allmodconfig but it looks like the xfs
        changes need another couple of days before they are ready."
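A minimal C sketch of why the kuid/kgid push-down makes ids typesafe, assuming a simplified model: kuid_t and kgid_t as single-member structs, with KUIDT_INIT() and uid_eq() in the style of the kernel's include/linux/uidgid.h. The one-extent struct uid_extent_sketch and the make_kuid_sketch()/from_kuid_sketch() helpers are hypothetical; the real make_kuid() and from_kuid() take a struct user_namespace * and consult that namespace's id map.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Distinct wrapper structs: passing a kgid_t where a kuid_t is expected,
 * or a raw integer where either is expected, no longer compiles. */
typedef struct { uid_t val; } kuid_t;
typedef struct { gid_t val; } kgid_t;

#define KUIDT_INIT(v) ((kuid_t){ (uid_t)(v) })
#define INVALID_UID   KUIDT_INIT(-1)

static inline bool uid_eq(kuid_t a, kuid_t b)
{
	return a.val == b.val;
}

/* Hypothetical single-extent map: ids [first, first + count) inside the
 * namespace correspond to [lower_first, lower_first + count) in the kernel. */
struct uid_extent_sketch {
	uid_t first;
	uid_t lower_first;
	uid_t count;
};

/* Namespace-local uid -> kernel kuid_t, or INVALID_UID if unmapped. */
static kuid_t make_kuid_sketch(const struct uid_extent_sketch *m, uid_t uid)
{
	/* uid_t is unsigned on Linux, so uid < first wraps and fails the test. */
	if (uid - m->first < m->count)
		return KUIDT_INIT(m->lower_first + (uid - m->first));
	return INVALID_UID;
}

/* Kernel kuid_t -> namespace-local uid, or (uid_t)-1 if unmapped. */
static uid_t from_kuid_sketch(const struct uid_extent_sketch *m, kuid_t kuid)
{
	if (kuid.val - m->lower_first < m->count)
		return m->first + (kuid.val - m->lower_first);
	return (uid_t)-1;
}
```

Because the wrappers are structs, comparing two kuid_t values with == or handing one to code that expects a plain uid_t fails at compile time, which is what forces each filesystem to be converted explicitly.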
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
        cifs: Enable building with user namespaces enabled.
        cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
        cifs: Convert struct cifs_sb_info to use kuids and kgids
        cifs: Modify struct smb_vol to use kuids and kgids
        cifs: Convert struct cifsFileInfo to use a kuid
        cifs: Convert struct cifs_fattr to use kuid and kgids
        cifs: Convert struct tcon_link to use a kuid.
        cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
        cifs: Convert from a kuid before printing current_fsuid
        cifs: Use kuids and kgids SID to uid/gid mapping
        cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
        cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
        cifs: Override unmappable incoming uids and gids
        nfsd: Enable building with user namespaces enabled.
        nfsd: Properly compare and initialize kuids and kgids
        nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
        nfsd: Modify nfsd4_cb_sec to use kuids and kgids
        nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
        nfsd: Convert nfsxdr to use kuids and kgids
        nfsd: Convert nfs3xdr to use kuids and kgids
        ...
  2. 25 Feb, 2013 17 commits
    • Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · 8d168f71
      Linus Torvalds authored
      Pull KVM ARM compile fixes from Gleb Natapov:
       "Fix ARM KVM compilation breakage due to changes from kvm.git"
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        ARM: KVM: fix compilation after removal of user_alloc from struct kvm_memory_slot
        ARM: KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
        ARM: KVM: fix kvm_arch_{prepare,commit}_memory_region
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 32dc43e4
      Linus Torvalds authored
      Pull crypto update from Herbert Xu:
       "Here is the crypto update for 3.9:
      
         - Added accelerated implementation of crc32 using pclmulqdq.
      
         - Added test vector for fcrypt.
      
         - Added support for OMAP4/AM33XX cipher and hash.
      
         - Fixed loose crypto_user input checks.
      
         - Misc fixes"
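For reference, a bitwise C sketch of the CRC-32 that the pclmulqdq code accelerates, assuming the common reflected form (polynomial 0x04C11DB7, processed LSB-first as 0xEDB88320). The function names are hypothetical, and this is neither the kernel's crc32_le() nor the carry-less-multiply folding itself; an accelerated implementation is expected to produce the same values, which is what test vectors check. Seeding and final-inversion conventions differ between users, so crc32_standard() shows only the usual ~0 seed with an inverted result.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise, reflected CRC-32: one bit per inner iteration. Slow, but it
 * serves as an easy-to-audit reference for faster implementations. */
static uint32_t crc32_bitwise(uint32_t crc, const uint8_t *data, size_t len)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			/* XOR in the polynomial only when the low bit is set. */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Conventional CRC-32: seed with all ones and invert the result. */
static uint32_t crc32_standard(const uint8_t *data, size_t len)
{
	return ~crc32_bitwise(0xFFFFFFFFu, data, len);
}
```

With that convention, the string "123456789" yields the standard check value 0xCBF43926.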
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (43 commits)
        crypto: user - ensure user supplied strings are nul-terminated
        crypto: user - fix empty string test in report API
        crypto: user - fix info leaks in report API
        crypto: caam - Added property fsl,sec-era in SEC4.0 device tree binding.
        crypto: use ERR_CAST
        crypto: atmel-aes - adjust duplicate test
        crypto: crc32-pclmul - Kill warning on x86-32
        crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels
        crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC
        crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets
        crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_*
        crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions
        crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC
        crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions
        crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
        crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
        crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
        crypto: aesni-intel - add ENDPROC statements for assembler functions
        crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets
        crypto: testmgr - add test vector for fcrypt
        ...
    • drm/tilcdc: only build on arm · be88298b
      Stephen Rothwell authored
      [airlied: hack for now until we fix cma helpers on other OF platforms]
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Dave Airlie <airlied@linux.ie>
    • Merge tag 'please-pull-vm_unwrapped' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · d414c104
      Linus Torvalds authored
      Pull ia64 update from Tony Luck:
       "ia64 vm patch series that was cooking in -mm tree"
      
      * tag 'please-pull-vm_unwrapped' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        mm: use vm_unmapped_area() in hugetlbfs on ia64 architecture
        mm: use vm_unmapped_area() on ia64 architecture
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · f6d43b93
      Linus Torvalds authored
      Pull security subsystem fixes from James Morris:
       "From Mimi:
      
          Both of these patches are bug fixes for patches, which were
          upstreamed in this open window.  The first patch addresses a merge
          issue.  The second patch addresses a CONFIG_BLOCK dependency."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        block: fix part_pack_uuid() build error
        ima: "remove enforce checking duplication" merge fix
    • Merge tag 'ktest-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · c69d0a15
      Linus Torvalds authored
      Pull ktest update from Steven Rostedt:
       "Added ability to have all builds test warnings.
      
        Fixed failing reboot when the reboot produces a non fatal error.
      
        Config reading fixes and other cleanups"
      
      * tag 'ktest-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Remove indexes from warnings check
        ktest: Ignore warnings during reboot
        ktest: Search for linux banner for successful reboot
        ktest: Add make_warnings_file and process full warnings
        ktest: Allow a test option to use its default option
        ktest: Strip off '\n' when reading which files were modified
        ktest: Do not require CONSOLE for build or install bisects
    • Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · 9043a265
      Linus Torvalds authored
      Pull module update from Rusty Russell:
       "The sweeping change is to make add_taint() explicitly indicate whether
        to disable lockdep, but it's a mechanical change."
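A minimal sketch of the shape of that change, modeled on the kernel's enum lockdep_ok (LOCKDEP_STILL_OK, LOCKDEP_NOW_UNRELIABLE): every caller now has to say whether lock-dependency checking can still be trusted after the taint. The flag numbers, the two globals, and add_taint_sketch() are stand-ins for this illustration; the real add_taint() turns lock debugging off through the kernel's own debug_locks machinery.

```c
#include <stdbool.h>

/* Each caller states explicitly whether lockdep output stays trustworthy. */
enum lockdep_ok {
	LOCKDEP_STILL_OK,
	LOCKDEP_NOW_UNRELIABLE,
};

/* Hypothetical flag numbers, arbitrary for this sketch. */
enum { SKETCH_TAINT_FORCED_RMMOD = 3, SKETCH_TAINT_WARN = 9 };

static unsigned long tainted_mask;  /* one bit per taint reason */
static bool debug_locks = true;     /* lock debugging still active */

static void add_taint_sketch(unsigned int flag, enum lockdep_ok lockdep_ok)
{
	/* Only taints that may corrupt lock state switch lock debugging off. */
	if (lockdep_ok == LOCKDEP_NOW_UNRELIABLE)
		debug_locks = false;
	tainted_mask |= 1UL << flag;
}
```

Making the second argument mandatory is what turns the patch into a mechanical sweep: every existing add_taint() call site has to pick one of the two values.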
      
      * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
        MODSIGN: Add option to not sign modules during modules_install
        MODSIGN: Add -s <signature> option to sign-file
        MODSIGN: Specify the hash algorithm on sign-file command line
        MODSIGN: Simplify Makefile with a Kconfig helper
        module: clean up load_module a little more.
        modpost: Ignore ARC specific non-alloc sections
        module: constify within_module_*
        taint: add explicit flag to show whether lock dep is still OK.
        module: printk message when module signature fail taints kernel.
    • block: fix part_pack_uuid() build error · 446d64e3
      Mimi Zohar authored
      Commit "85865c1f ima: add policy support for file system uuid"
      introduced a CONFIG_BLOCK dependency.  This patch defines a
      wrapper called blk_part_pack_uuid(), which returns -EINVAL,
      when CONFIG_BLOCK is not defined.
      
      security/integrity/ima/ima_policy.c:538:4: error: implicit declaration
      of function 'part_pack_uuid' [-Werror=implicit-function-declaration]
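The wrapper pattern described above can be sketched in C as follows. The signature of blk_part_pack_uuid() and the extern declaration of part_pack_uuid() are assumptions modeled on the description, not copied from the patch, and parse_fsuuid_sketch() is a hypothetical caller. With CONFIG_BLOCK left undefined, only the stub is compiled, so the caller builds and simply gets -EINVAL at runtime instead of hitting the implicit-declaration error.

```c
#include <errno.h>

typedef unsigned char u8;

/* CONFIG_BLOCK is intentionally left undefined to model a kernel
 * configured without the block layer. */
#ifdef CONFIG_BLOCK
void part_pack_uuid(const u8 *uuid_str, u8 *to); /* from the block layer */

static inline int blk_part_pack_uuid(const u8 *uuid_str, u8 *to)
{
	part_pack_uuid(uuid_str, to);
	return 0;
}
#else
/* Stub with the same signature: callers still compile, and get an
 * error back at runtime instead of a build failure. */
static inline int blk_part_pack_uuid(const u8 *uuid_str, u8 *to)
{
	(void)uuid_str;
	(void)to;
	return -EINVAL;
}
#endif

/* Hypothetical caller standing in for the fsuuid handling in ima_policy.c. */
static int parse_fsuuid_sketch(const char *arg, u8 uuid[16])
{
	return blk_part_pack_uuid((const u8 *)arg, uuid);
}
```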
      
      Changelog v2:
      - Reference commit number in patch description
      Changelog v1:
      - rename ima_part_pack_uuid() to blk_part_pack_uuid()
      - resolve scripts/checkpatch.pl warnings
      Changelog v0:
      - fix UUID scripts/Lindent msgs
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
    • ima: "remove enforce checking duplication" merge fix · a2c2c3a7
      Mimi Zohar authored
      Commit "750943a3 ima: remove enforce checking duplication" combined
      the 'in IMA policy' and 'enforcing file integrity' checks.  For
      the non-file, kernel module verification, a specific check for
      'enforcing file integrity' was not added.  This patch adds the
      check.
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
    • ARM: KVM: fix compilation after removal of user_alloc from struct kvm_memory_slot · 3b8cd8a0
      Marc Zyngier authored
      Commit 7a905b14 (KVM: Remove user_alloc from struct kvm_memory_slot)
      broke KVM/ARM by removing the user_alloc field from a public structure.
      
      As we only used this field to alert the user that we didn't support
      this operation mode, there is no harm in discarding this bit of code
      without any remorse.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
    • ARM: KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS · 2b5e1e47
      Marc Zyngier authored
      Commit bbacc0c1 (KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS)
      broke KVM/ARM by changing a global #define.
      
      Apply the same change to fix the compilation breakage.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
    • ARM: KVM: fix kvm_arch_{prepare,commit}_memory_region · bef103aa
      Marc Zyngier authored
      Commit f82a8cfe (KVM: struct kvm_memory_slot.user_alloc -> bool)
      broke the ARM KVM port by changing the prototype of two global
      functions.
      
      Apply the same change to fix the compilation breakage.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
    • Merge tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 · ab782659
      Linus Torvalds authored
      Pull MFD updates from Samuel Ortiz:
       "This is the MFD pull request for the 3.9 merge window.
      
        No new drivers this time, but a bunch of fairly big cleanups:
      
         - Roger Quadros worked on an OMAP USBHS and TLL platform data
           consolidation, OMAP5 support and clock management code cleanup.
      
         - The first step of a major sync for the ab8500 driver from Lee
           Jones.  In particular, the debugfs and the sysct interfaces got
           extended and improved.
      
         - Peter Ujfalusi sent a nice patchset for cleaning and fixing the
           twl-core driver, with a much needed module id lookup code
           improvement.
      
         - The regular wm5102 and arizona cleanups and fixes from Mark Brown.
      
         - Laxman Dewangan extended the palmas APIs in order to implement the
           palmas GPIO and rt drivers.
      
         - Laxman also added DT support for the tps65090 driver.
      
         - The Intel SCH and ICH drivers got a couple fixes from Aaron Sierra
           and Darren Hart.
      
         - Linus Walleij's patchset for the ab8500 driver allowed ab8500 and
           ab9540 based devices to switch to the new abx500 pin-ctrl driver.
      
         - The max8925 now has device tree and irqdomain support thanks to
           Qing Xu.
      
         - The recently added rtsx driver got a few cleanups and fixes for a
           better card detection code path and now also supports the RTS5227
           chipset, thanks to Wei Wang and Roger Tseng."
      
      * tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (109 commits)
        mfd: lpc_ich: Use devres API to allocate private data
        mfd: lpc_ich: Add Device IDs for Intel Wellsburg PCH
        mfd: lpc_sch: Accomodate partial population of the MFD devices
        mfd: da9052-i2c: Staticize da9052_i2c_fix()
        mfd: syscon: Fix sparse warning
        mfd: twl-core: Fix kernel panic on boot
        mfd: rtsx: Fix issue that booting OS with SD card inserted
        mfd: ab8500: Fix compile error
        mfd: Add missing GENERIC_HARDIRQS dependecies
        Documentation: Add docs for max8925 dt
        mfd: max8925: Add dts
        mfd: max8925: Support dt for backlight
        mfd: max8925: Fix onkey driver irq base
        mfd: max8925: Fix mfd device register failure
        mfd: max8925: Add irqdomain for dt
        mfd: vexpress: Allow vexpress-sysreg to self-initialise
        mfd: rtsx: Support RTS5227
        mfd: rtsx: Implement driving adjustment to device-dependent callbacks
        mfd: vexpress: Add pseudo-GPIO based LEDs
        mfd: ab8500: Rename ab8500 to abx500 for hwmon driver
        ...
    • Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 21fbd580
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
      
       - Some cleanups at V4L2 documentation
      
       - new drivers: ts2020 frontend, ov9650 sensor, s5c73m3 sensor,
         sh-mobile veu mem2mem driver, radio-ma901, davinci_vpfe staging
         driver
      
       - Lots of missing MAINTAINERS entries added
      
       - several em28xx driver improvements, including its conversion to
         videobuf2
      
       - several fixups on drivers to make them comply better with the API
      
       - DVB core: add support for DVBv5 stats, allowing the implementation of
         statistics for new standards like ISDB
      
       - mb86a20s: add statistics to the driver
      
       - lots of new board additions, cleanups, and driver improvements.
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (596 commits)
        [media] media: Add 0x3009 USB PID to ttusb2 driver (fixed diff)
        [media] rtl28xxu: Add USB IDs for Compro VideoMate U620F
        [media] em28xx: add usb id for terratec h5 rev. 3
        [media] media: rc: gpio-ir-recv: add support for device tree parsing
        [media] mceusb: move check earlier to make smatch happy
        [media] radio-si470x doc: add info about v4l2-ctl and sox+alsa
        [media] staging: media: Remove unnecessary OOM messages
        [media] sh_vou: Use vou_dev instead of vou_file wherever possible
        [media] sh_vou: Use video_drvdata()
        [media] drivers/media/platform/soc_camera/pxa_camera.c: use devm_ functions
        [media] mt9t112: mt9t111 format set up differs from mt9t112
        [media] sh-mobile-ceu-camera: fix SHARPNESS control default
        Revert "[media] fc0011: Return early, if the frequency is already tuned"
        [media] cx18/ivtv: fix regression: remove __init from a non-init function
        [media] em28xx: fix analog streaming with USB bulk transfers
        [media] stv0900: remove unnecessary null pointer check
        [media] fc0011: Return early, if the frequency is already tuned
        [media] fc0011: Add some sanity checks and cleanups
        [media] fc0011: Fix xin value clamping
        Revert "[media] [PATH,1/2] mxl5007 move reset to attach"
        ...
    • Merge tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · d9978ec5
      Linus Torvalds authored
      Pull libata updates from Jeff Garzik:
      
      1) apply, and then revert, the sysfs export of ATA host controller
         number.  Discussion was continuing after patch application, trying to
         figure out how to best mesh exported data with the installers,
         boot-time agents and other parties that want this info.
      
      2) Merge Zero-Power Optical Device Driver (ZPODD) support, bringing the
         wonderfulness of sane power management to your CD/DVD device.
      
         Includes one SCSI-subsystem patch (with appropriate ACKs), adding
         runtime PM support to 'sr' driver.  That is the ZPODD interaction
         bits.
      
         Patchset went through some 13 revisions before it got here; kudos to
         Intel for persistence.
      
      3) pata_samsung_cf: use devm_clk_get()
      
      4) more ata_piix, ahci PCI IDs
      
      5) Add SATA driver for R-Car SoC
      
      6) Convert libata to use devm_ioremap_resource (Note: I think Greg sent
         this to you, also)
      
      7) Set proper Sense Key (SK) in the SCSI simulator when ATA passthrough
         indicates check condition.  Google and specification hawks everywhere
         shall rejoice.
      
      * tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (22 commits)
        [libata] fix smatch warning for zpodd_wake_dev
        [libata] Set proper SK when CK_COND is set.
        [libata] Convert to devm_ioremap_resource()
        libata: add R-Car SATA driver
        ahci: Add Device IDs for Intel Wellsburg PCH
        ata_piix: Add Device IDs for Intel Wellsburg PCH
        [SCSI] remove can_power_off flag from scsi_device
        [libata] scsi: no poll when ODD is powered off
        [SCSI] sr: support runtime pm
        ahci: AHCI-mode SATA patch for Intel Avoton DeviceIDs
        ata_piix: IDE-mode SATA patch for Intel Avoton DeviceIDs
        [libata] PM code cleanup for ata port
        [libata] pm: differentiate system and runtime pm for ata port
        Revert "libata: export host controller number thru /sys"
        libata: do not suspend port if normal ODD is attached
        libata: expose pm qos flags for ata device
        libata: handle power transition of ODD
        libata: check zero power ready status for ZPODD
        libata: move acpi notification code to zpodd
        libata: identify and init ZPODD devices
        ...
    • tty vt: fix character insertion overflow · a883b70d
      Nicolas Pitre authored
      Commit 81732c3b ("tty vt: Fix line garbage in virtual console on
      command line edition") broke insert_char() in multiple ways.  Then
      commit b1a925f4 ("tty vt: Fix a regression in command line edition")
      partially fixed it.  However, the buffer being moved is still too large
      and overflowing beyond the end of the current line, corrupting existing
      characters on the next line.
      
      Example test case:
      
      echo -e "abc\nde\x1b[A\x1b[4h \x1b[4l\x1b[B"
      
      Expected result:
      
      ab c
      de
      
      Current result:
      
      ab c
       e
      
      Needless to say that this is very annoying when inserting words in the
      middle of paragraphs with certain text editors.
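A simplified C sketch of the bound that keeps an insertion on the current line, assuming a plain char row instead of the kernel's vc_data screen buffer of 16-bit cells; insert_blanks() is a hypothetical helper, not insert_char() itself. Only the cols - x - nr cells that still fit are moved, so characters pushed past the right edge are dropped instead of spilling into the next row.

```c
#include <string.h>

/* Insert `nr` blanks at column `x` of one screen row `cols` cells wide.
 * Nothing is ever written beyond row[cols - 1]. */
static void insert_blanks(char *row, int cols, int x, int nr)
{
	if (x < 0 || x >= cols || nr <= 0)
		return;
	if (nr > cols - x)
		nr = cols - x;	/* cannot insert more than the rest of the row */

	/* Shift right only the cells that still fit on this row. */
	memmove(row + x + nr, row + x, (size_t)(cols - x - nr));
	memset(row + x, ' ', (size_t)nr);
}
```

Run against a two-row, four-column buffer holding "abc " and "de  ", inserting one blank at column 2 gives "ab c" and leaves "de  " untouched, matching the expected result in the commit message.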
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Cc: Jean-François Moine <moinejf@free.fr>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'stable/for-linus-3.9-rc0-tag' of... · 77be36de
      Linus Torvalds authored
      Merge tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
      
      Pull Xen update from Konrad Rzeszutek Wilk:
       "This has two new ACPI drivers for Xen - a physical CPU offline/online
        and a memory hotplug.  The way this works is that ACPI kicks the
        drivers and they make the appropriate hypercall to the hypervisor to
        tell it that there is a new CPU or memory.  There are also some changes
        to the Xen ARM ABIs and a couple of fixes.  One particularly nasty bug
        in the Xen PV spinlock code was fixed by Stefan Bader - and has been
        there since 2.6.32!
      
        Features:
         - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor
           to be aware of new CPU and new DIMMs
         - Cleanups
        Bug-fixes:
         - Fixes a long-standing bug in the PV spinlock wherein we did not
           kick VCPUs that were in a tight loop.
         - Fixes in the error paths for the event channel machinery"
      
      Fix up a few semantic conflicts with the ACPI interface changes in
      drivers/xen/xen-acpi-{cpu,mem}hotplug.c.
      
      * tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
        xen: event channel arrays are xen_ulong_t and not unsigned long
        xen: Send spinlock IPI to all waiters
        xen: introduce xen_remap, use it instead of ioremap
        xen: close evtchn port if binding to irq fails
        xen-evtchn: correct comment and error output
        xen/tmem: Add missing %s in the printk statement.
        xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0
        xen/acpi: ACPI cpu hotplug
        xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h
        xen/stub: driver for CPU hotplug
        xen/acpi: ACPI memory hotplug
        xen/stub: driver for memory hotplug
        xen: implement updated XENMEM_add_to_physmap_range ABI
        xen/smp: Move the common CPU init code a bit to prep for PVH patch.
  3. 24 Feb, 2013 20 commits
    • Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 89f88337
      Linus Torvalds authored
      Pull KVM updates from Marcelo Tosatti:
       "KVM updates for the 3.9 merge window, including x86 real mode
        emulation fixes, stronger memory slot interface restrictions, mmu_lock
        spinlock hold time reduction, improved handling of large page faults
        on shadow, initial APICv HW acceleration support, s390 channel IO
        based virtio, amongst others"
      
      * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
        Revert "KVM: MMU: lazily drop large spte"
        x86: pvclock kvm: align allocation size to page size
        KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
        x86 emulator: fix parity calculation for AAD instruction
        KVM: PPC: BookE: Handle alignment interrupts
        booke: Added DBCR4 SPR number
        KVM: PPC: booke: Allow multiple exception types
        KVM: PPC: booke: use vcpu reference from thread_struct
        KVM: Remove user_alloc from struct kvm_memory_slot
        KVM: VMX: disable apicv by default
        KVM: s390: Fix handling of iscs.
        KVM: MMU: cleanup __direct_map
        KVM: MMU: remove pt_access in mmu_set_spte
        KVM: MMU: cleanup mapping-level
        KVM: MMU: lazily drop large spte
        KVM: VMX: cleanup vmx_set_cr0().
        KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
        KVM: VMX: disable SMEP feature when guest is in non-paging mode
        KVM: Remove duplicate text in api.txt
        Revert "KVM: MMU: split kvm_mmu_free_page"
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal · 9e2d59ad
      Linus Torvalds authored
      Pull signal handling cleanups from Al Viro:
       "This is the first pile; another one will come a bit later and will
        contain SYSCALL_DEFINE-related patches.
      
         - a bunch of signal-related syscalls (both native and compat)
           unified.
      
         - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
           (fixing several potential problems with missing argument
           validation, while we are at it)
      
         - a lot of now-pointless wrappers killed
      
         - a couple of architectures (cris and hexagon) forgot to save
           altstack settings into sigframe, even though they used the
           (uninitialized) values in sigreturn; fixed.
      
         - microblaze fixes for delivery of multiple signals arriving at once
      
         - saner set of helpers for signal delivery introduced, several
           architectures switched to using those."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
        x86: convert to ksignal
        sparc: convert to ksignal
        arm: switch to struct ksignal * passing
        alpha: pass k_sigaction and siginfo_t using ksignal pointer
        burying unused conditionals
        make do_sigaltstack() static
        arm64: switch to generic old sigaction() (compat-only)
        arm64: switch to generic compat rt_sigaction()
        arm64: switch compat to generic old sigsuspend
        arm64: switch to generic compat rt_sigqueueinfo()
        arm64: switch to generic compat rt_sigpending()
        arm64: switch to generic compat rt_sigprocmask()
        arm64: switch to generic sigaltstack
        sparc: switch to generic old sigsuspend
        sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
        sparc: kill sign-extending wrappers for native syscalls
        kill sparc32_open()
        sparc: switch to use of generic old sigaction
        sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
        mips: switch to generic sys_fork() and sys_clone()
        ...
    • Merge branch 'drm/hdmi-for-3.9' of git://anongit.freedesktop.org/tegra/linux into drm-next · 28ee4618
      Dave Airlie authored
      Thierry writes:
      "Remove a duplicate implementation of the CEA VIC lookup and move the CEA
      and other mode tables to drm_edid.c to make it more difficult to create
      duplicates of the tables.
      
      Add some helpers to pack CEA-861/HDMI AVI, audio and SPD infoframes into
      binary buffers that can easily be written into hardware registers. A new
      helper function makes it easy to construct an AVI infoframe from a DRM
      display mode.
      
      Convert the Tegra and Radeon drivers to use the new HDMI helpers."
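      The infoframe helpers pack a header, a checksum byte and a payload into a
      binary buffer. The sketch below shows only the checksum rule, assuming the
      CEA-861 convention that all bytes of an infoframe (header, checksum and
      payload) sum to zero modulo 256. pack_avi_infoframe() and the AVI_*
      constants are hypothetical names for illustration, not the API added by
      this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* AVI infoframe header values assumed for this sketch:
 * type 0x82, version 2, 13-byte payload. */
#define AVI_INFOFRAME_TYPE    0x82
#define AVI_INFOFRAME_VERSION 0x02
#define AVI_INFOFRAME_LENGTH  13
#define INFOFRAME_HEADER_SIZE 4 /* type, version, length, checksum */

/* Hypothetical helper: pack header and payload into buf and fill in the
 * checksum. Returns the number of bytes written, or 0 if buf is too small. */
size_t pack_avi_infoframe(uint8_t *buf, size_t size,
			  const uint8_t payload[AVI_INFOFRAME_LENGTH])
{
	size_t total = INFOFRAME_HEADER_SIZE + AVI_INFOFRAME_LENGTH;
	uint8_t sum = 0;
	size_t i;

	if (size < total)
		return 0;

	buf[0] = AVI_INFOFRAME_TYPE;
	buf[1] = AVI_INFOFRAME_VERSION;
	buf[2] = AVI_INFOFRAME_LENGTH;
	buf[3] = 0; /* checksum placeholder while summing */
	memcpy(buf + INFOFRAME_HEADER_SIZE, payload, AVI_INFOFRAME_LENGTH);

	/* Choose the checksum so that every byte, checksum included,
	 * adds up to 0 modulo 256. */
	for (i = 0; i < total; i++)
		sum += buf[i];
	buf[3] = (uint8_t)(256 - sum);

	return total;
}
```
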
      * 'drm/hdmi-for-3.9' of git://anongit.freedesktop.org/tegra/linux:
        drm/radeon: Use generic HDMI infoframe helpers
        drm/tegra: Use generic HDMI infoframe helpers
        drm: Add EDID helper documentation
        drm: Add HDMI infoframe helpers
        video: Add generic HDMI infoframe helpers
        drm: Add some missing forward declarations
        drm: Move mode tables to drm_edid.c
        drm: Remove duplicate drm_mode_cea_vic()
    • Dave Airlie's avatar
      Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next · a497bfe9
      Dave Airlie authored
      Two regression fixes from snowboarding land
      
      * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
        drm/i915: Revert hdmi HDP pin checks
        drm/i915: Handle untiled planes when computing their offsets
    • Dave Airlie's avatar
      Merge branch 'drm/tegra-for-3.9' of git://anongit.freedesktop.org/tegra/linux into drm-next · a3b1097c
      Dave Airlie authored
      Thierry writes:
      "Add support for 2 hardware overlays found on Tegra. These support YUV
      pixel formats and can be used as video overlays. .mode_set_base() is
      implemented and support for VBLANK and page-flipping is added.
      
      A few minor bug fixes are also included, and a new debugfs file allows
      inspecting the framebuffers attached to the Tegra DRM device."
      
      * 'drm/tegra-for-3.9' of git://anongit.freedesktop.org/tegra/linux:
        drm/tegra: Add list of framebuffers to debugfs
        drm/tegra: Fix color expansion
        drm/tegra: Split DC_CMD_STATE_CONTROL register write
        drm/tegra: Implement page-flipping support
        drm/tegra: Implement VBLANK support
        drm/tegra: Implement .mode_set_base()
        drm/tegra: Add plane support
        drm/tegra: Remove bogus tegra_framebuffer structure
        drm: Add consistency check for page-flipping
    • Linus Torvalds's avatar
      Merge branch 'akpm' (more incoming from Andrew) · 5ce1a70e
      Linus Torvalds authored
      Merge second patch-bomb from Andrew Morton:
      
       - A little DM fix
      
       - the MM queue
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
        ksm: allocate roots when needed
        mm: cleanup "swapcache" in do_swap_page
        mm,ksm: swapoff might need to copy
        mm,ksm: FOLL_MIGRATION do migration_entry_wait
        ksm: shrink 32-bit rmap_item back to 32 bytes
        ksm: treat unstable nid like in stable tree
        ksm: add some comments
        tmpfs: fix mempolicy object leaks
        tmpfs: fix use-after-free of mempolicy object
        mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
        mm: export mmu notifier invalidates
        mm: accelerate mm_populate() treatment of THP pages
        mm: use long type for page counts in mm_populate() and get_user_pages()
        mm: accurately document nr_free_*_pages functions with code comments
        HWPOISON: change order of error_states[]'s elements
        HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
        memcg: stop warning on memcg_propagate_kmem
        net: change type of virtio_chan->p9_max_pages
        vmscan: change type of vm_total_pages to unsigned long
        fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
        ...
    • Hugh Dickins's avatar
      ksm: allocate roots when needed · ef53d16c
      Hugh Dickins authored
      It is a pity to have MAX_NUMNODES+MAX_NUMNODES tree roots statically
      allocated, particularly when very few users will ever actually tune
      merge_across_nodes to 0 to use more than 1+1 of those trees.  Not a big
      deal (only 16kB wasted on each machine with CONFIG_MAXSMP), but a pity.
      
      Start off with 1+1 statically allocated, then if merge_across_nodes is
      ever tuned, allocate for nr_node_ids+nr_node_ids.  Do not attempt to
      free up the extra if it's tuned back: that would be a waste of effort.
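      A minimal user-space sketch of this allocation strategy, loosely modeled
      on the description above: one stable and one unstable root start out
      statically allocated, and room for nr_node_ids of each is allocated only
      the first time merge_across_nodes is set to 0, and never freed again.
      struct tree_root and tune_merge_across_nodes() are hypothetical
      stand-ins, not the ksm.c code.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct rb_root: an empty tree has a NULL node. */
struct tree_root { void *node; };

/* Start with one stable and one unstable root, statically allocated. */
static struct tree_root one_stable_tree[1];
static struct tree_root one_unstable_tree[1];
static struct tree_root *root_stable_tree = one_stable_tree;
static struct tree_root *root_unstable_tree = one_unstable_tree;

/* Hypothetical handler for a write to merge_across_nodes.
 * Returns 0 on success, -1 if the allocation fails. */
int tune_merge_across_nodes(int merge_across_nodes, int nr_node_ids)
{
	struct tree_root *buf;

	/* Merging across nodes needs only root [0]; and once the larger
	 * buffer exists it is kept, even if tuned back to 1. */
	if (merge_across_nodes || root_stable_tree != one_stable_tree)
		return 0;

	/* First switch to per-node trees: allocate stable and unstable
	 * roots together; calloc() leaves every root empty. */
	buf = calloc(2 * (size_t)nr_node_ids, sizeof(*buf));
	if (!buf)
		return -1;
	root_stable_tree = buf;
	root_unstable_tree = buf + nr_node_ids;
	/* Carry over whatever the static unstable root held. */
	root_unstable_tree[0] = one_unstable_tree[0];
	return 0;
}
```
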
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hugh Dickins's avatar
      mm: cleanup "swapcache" in do_swap_page · 56f31801
      Hugh Dickins authored
      I dislike the way in which "swapcache" gets used in do_swap_page():
      there is always a page from swapcache there (even if maybe uncached by
      the time we lock it), but tests are made according to "swapcache".
      Rework that with "page != swapcache", as has been done in unuse_pte().
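      The "page != swapcache" idiom can be sketched in user space as follows.
      struct page here is a toy with only a lock flag and a refcount, and
      might_need_to_copy() is a hypothetical stand-in for
      ksm_might_need_to_copy(). The point is that swapcache always names the
      page found in swap cache, and whether a private copy was mapped is
      decided by comparing the two pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy page: just a lock flag and a reference count. */
struct page { bool locked; int refcount; };

/* Hypothetical stand-in: returns the page itself when it can be mapped
 * as is, otherwise a new locked private copy (NULL if allocation fails). */
static struct page *might_need_to_copy(struct page *page, bool need_copy)
{
	struct page *copy;

	if (!need_copy)
		return page;
	copy = calloc(1, sizeof(*copy));
	if (copy) {
		copy->locked = true;
		copy->refcount = 1;
	}
	return copy;
}

/* Simplified swap-in: returns the page that ends up mapped, or NULL. */
struct page *swap_in(struct page *swapcache, bool need_copy)
{
	struct page *page = might_need_to_copy(swapcache, need_copy);

	if (!page)
		return NULL;            /* caller still holds swapcache */
	page->locked = false;           /* the mapped page is unlocked */
	if (page != swapcache) {
		/* A copy was mapped: drop our lock and reference on the
		 * original swap cache page as well. */
		swapcache->locked = false;
		swapcache->refcount--;
	}
	return page;
}
```
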
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hugh Dickins's avatar
      mm,ksm: swapoff might need to copy · 9e16b7fb
      Hugh Dickins authored
      Before establishing that KSM page migration was the cause of my
      WARN_ON_ONCE(page_mapped(page))s, I suspected that they came from the
      lack of a ksm_might_need_to_copy() in swapoff's unuse_pte() - which in
      many respects is equivalent to faulting in a page.
      
      In fact I've never caught that as the cause: but in theory it does at
      least need the KSM_RUN_UNMERGE check in ksm_might_need_to_copy(), to
      avoid bringing a KSM page back in when it's not supposed to be.
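      A hedged sketch of the check described above: a KSM page is mapped as
      is only while its stable node still exists and KSM is not unmerging;
      otherwise a private copy is needed. struct page_info, needs_copy() and
      the flag values are assumptions for this model. The kernel function
      also handles non-KSM anonymous pages, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values chosen for this model. */
#define KSM_RUN_MERGE   1
#define KSM_RUN_UNMERGE 2

/* Simplified view of a page being faulted or swapped back in. */
struct page_info {
	bool is_ksm;          /* page is a KSM (merged) page */
	bool has_stable_node; /* its stable tree node still exists */
};

/* Returns true if a private copy must be mapped instead of the page. */
bool needs_copy(const struct page_info *p, unsigned int ksm_run)
{
	if (p->is_ksm) {
		/* While unmerging, never bring a KSM page back in. */
		if (ksm_run & KSM_RUN_UNMERGE)
			return true;
		return !p->has_stable_node;
	}
	return false; /* non-KSM cases are not modeled here */
}
```
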
      
      I intended to copy how it's done in do_swap_page(), but have a strong
      aversion to how "swapcache" ends up being used there: rework it with
      "page != swapcache".
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hugh Dickins's avatar
      mm,ksm: FOLL_MIGRATION do migration_entry_wait · 5117b3b8
      Hugh Dickins authored
      In "ksm: remove old stable nodes more thoroughly" I said that I'd never
      seen its WARN_ON_ONCE(page_mapped(page)).  True at the time of writing,
      but it soon appeared once I tried fuller tests on the whole series.
      
      It turned out to be due to the KSM page migration itself: unmerge_and_remove_all_rmap_items()
      failed to locate and replace all the KSM pages,
      because of that hiatus in page migration when old pte has been replaced
      by migration entry, but not yet by new pte.  follow_page() finds no page
      at that instant, but a KSM page reappears shortly after, without a
      fault.
      
      Add FOLL_MIGRATION flag, so follow_page() can do migration_entry_wait()
      for KSM's break_cow().  I'd have preferred to avoid another flag, and do
      it every time, in case someone else makes the same easy mistake; but did
      not find another transgressor (the common get_user_pages() is of course
      safe), and cannot be sure that every follow_page() caller is prepared to
      sleep - ia64's xencomm_vtop()? Now, THP's wait_split_huge_page() can
      already sleep there, since anon_vma locking was changed to mutex, but
      maybe that's somehow excluded.
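      A toy model of the FOLL_MIGRATION behaviour described above: without the
      flag, a lookup that lands on a migration entry reports no page; with it,
      the lookup waits for the migration to finish and retries. The pte states,
      follow_page_model() and wait_for_migration() are hypothetical, and the
      flag value is illustrative only.

```c
#include <assert.h>

/* Hypothetical states of one page-table entry. */
enum pte_state { PTE_NONE, PTE_MIGRATION, PTE_PRESENT };

struct pte_model {
	enum pte_state state;
	int page;   /* page identifier once present */
	int waits;  /* how often we waited for migration */
};

#define FOLL_MIGRATION 0x1 /* illustrative value, not the kernel's */

/* Stand-in for migration_entry_wait(): in this model the migration
 * completes at once, replacing the migration entry by a present pte. */
static void wait_for_migration(struct pte_model *pte)
{
	pte->waits++;
	pte->state = PTE_PRESENT;
}

/* Returns the page, or -1 where follow_page() would return NULL. */
int follow_page_model(struct pte_model *pte, unsigned int flags)
{
	for (;;) {
		switch (pte->state) {
		case PTE_PRESENT:
			return pte->page;
		case PTE_MIGRATION:
			if (!(flags & FOLL_MIGRATION))
				return -1;      /* the "no page" hiatus */
			wait_for_migration(pte); /* may sleep in the kernel */
			continue;               /* retry the lookup */
		case PTE_NONE:
		default:
			return -1;
		}
	}
}
```
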
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hugh Dickins's avatar
      ksm: shrink 32-bit rmap_item back to 32 bytes · bc56620b
      Hugh Dickins authored
      Think of struct rmap_item as an extension of struct page (restricted to
      MADV_MERGEABLE areas): there may be a lot of them, we need to keep them
      small, especially on 32-bit architectures of limited lowmem.
      
      Siting "int nid" after "unsigned int checksum" works nicely on 64-bit,
      making no change to its 64-byte struct rmap_item; but bloats the 32-bit
      struct rmap_item from (nicely cache-aligned) 32 bytes to 36 bytes, which
      rounds up to 40 bytes once allocated from slab.  We'd better avoid that.
      
      Hey, I only just remembered that the anon_vma pointer in struct
      rmap_item has no purpose until the rmap_item is hung from a stable tree
      node (which has its own nid field); and rmap_item's nid field has no
      purpose other than to say which tree root to tell rb_erase() when
      unlinking from an unstable tree.
      
      Double them up in a union.  There's just one place where we set anon_vma
      early (when we already hold mmap_sem): now we must remove tree_rmap_item
      from its unstable tree there, before overwriting nid.  No need to
      spatter BUG()s around: we'd be seeing oopses if this were wrong.
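      The size argument can be checked with mock layouts. The types below only
      mimic the shapes of the kernel structures (pointer-sized fields, an
      rb_node of three words, an hlist_node of two) with the CONFIG_NUMA
      conditionals dropped; they are not the ksm.c definitions. Assuming a
      Linux target where unsigned long is pointer-sized, the union version is
      eight words (32 bytes on 32-bit, 64 on 64-bit), while a separate int nid
      grows the 32-bit layout to 36 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Mock shapes of the kernel helper types (not the real definitions). */
struct mock_rb_node { unsigned long parent_color; struct mock_rb_node *right, *left; };
struct mock_hlist_node { struct mock_hlist_node *next, **pprev; };
struct anon_vma;
struct mm_struct;
struct stable_node;

/* nid sited after the checksum, as a separate field. */
struct rmap_item_separate {
	struct rmap_item_separate *rmap_list;
	struct anon_vma *anon_vma;   /* only used once in the stable tree */
	struct mm_struct *mm;
	unsigned long address;
	unsigned int oldchecksum;
	int nid;                     /* only used while in an unstable tree */
	union {
		struct mock_rb_node node;
		struct { struct stable_node *head; struct mock_hlist_node hlist; };
	};
};

/* anon_vma and nid doubled up in a union. */
struct rmap_item_union {
	struct rmap_item_union *rmap_list;
	union {
		struct anon_vma *anon_vma;   /* when stable */
		int nid;                     /* when node of unstable tree */
	};
	struct mm_struct *mm;
	unsigned long address;
	unsigned int oldchecksum;
	union {
		struct mock_rb_node node;
		struct { struct stable_node *head; struct mock_hlist_node hlist; };
	};
};
```
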
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hugh Dickins's avatar
      ksm: treat unstable nid like in stable tree · b599cbdf
      Hugh Dickins authored
      An inconsistency emerged in reviewing the NUMA node changes to KSM: when
      meeting a page from the wrong NUMA node in a stable tree, we say that
      it's okay for comparisons, but not as a leaf for merging; whereas when
      meeting a page from the wrong NUMA node in an unstable tree, we bail out
      immediately.
      
      Now, it might be that a wrong NUMA node in an unstable tree is more
      likely to correlate with instability (different content, with rbnode
      now misplaced) than page migration; but even so, we are accustomed to
      instability in the unstable tree.
      
      Without strong evidence for which strategy is generally better, I'd
      rather be consistent with what's done in the stable tree: accept a page
      from the wrong NUMA node for comparison, but not as a leaf for merging.
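      A minimal sketch of the chosen strategy on a toy binary search tree keyed
      by page content: a node on the wrong NUMA node still steers the search,
      but an exact match on the wrong node is not returned as a merge
      candidate unless merging across nodes. struct unode and
      unstable_search() are hypothetical, and page contents are reduced to
      ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unstable-tree node, keyed by page content. */
struct unode {
	int content;   /* stand-in for the page contents (memcmp key) */
	int nid;       /* NUMA node the tree page currently lives on */
	struct unode *left, *right;
};

/* Look for a merge partner for a page with this content on node nid. */
struct unode *unstable_search(struct unode *root, int content, int nid,
			      bool merge_across_nodes)
{
	struct unode *n = root;

	while (n) {
		if (content < n->content)
			n = n->left;    /* wrong-node pages still guide us */
		else if (content > n->content)
			n = n->right;
		else if (!merge_across_nodes && n->nid != nid)
			return NULL;    /* equal, but on another node: no merge */
		else
			return n;       /* merge candidate */
	}
	return NULL;
}
```
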
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hugh Dickins's avatar
      ksm: add some comments · 8fdb3dbf
      Hugh Dickins authored
      Added slightly more detail to the Documentation of merge_across_nodes, a
      few comments in areas indicated by review, and renamed get_ksm_page()'s
      argument from "locked" to "lock_it".  No functional change.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Greg Thelen's avatar
      tmpfs: fix mempolicy object leaks · 49cd0a5c
      Greg Thelen authored
      Fix several mempolicy leaks in the tmpfs mount logic.  These leaks are
      slow - on the order of one object leaked per mount attempt.
      
      Leak 1 (umount doesn't free mpol allocated in mount):
          while true; do
              mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
              umount /mnt
          done
      
      Leak 2 (errors parsing remount options will leak mpol):
          mount -t tmpfs -o size=100M nodev /mnt
          while true; do
              mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
          done
          umount /mnt
      
      Leak 3 (multiple mpol per mount leak mpol):
          while true; do
              mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
              umount /mnt
          done
      
      This patch fixes all of the above.  I could have broken the patch into
      three pieces, but it seemed easier to review as one.
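      The three fixes can be illustrated with a hypothetical refcounted policy
      object and a toy option parser: a repeated mpol= drops the earlier
      policy (leak 3), a parse error frees whatever was parsed so far
      (leak 2), and unmounting releases the superblock's policy (leak 1).
      None of these names are the shmem.c functions, and the parser accepts
      only mpol= and size=<digits...>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted stand-in for struct mempolicy. */
struct model_mpol { int refcnt; };
static int live_mpols; /* policy objects currently allocated */

static struct model_mpol *model_mpol_new(void)
{
	struct model_mpol *p = malloc(sizeof(*p));

	if (p) {
		p->refcnt = 1;
		live_mpols++;
	}
	return p;
}

static void model_mpol_put(struct model_mpol *p)
{
	if (p && --p->refcnt == 0) {
		free(p);
		live_mpols--;
	}
}

/* Parse "opt,opt,...". On success stores the final policy (or NULL)
 * in *out and returns 0; on error frees any parsed policy, returns -1. */
int model_parse_options(const char *opts, struct model_mpol **out)
{
	struct model_mpol *mpol = NULL;
	const char *p = opts;

	while (*p) {
		size_t len = strcspn(p, ",");

		if (len >= 5 && strncmp(p, "mpol=", 5) == 0) {
			model_mpol_put(mpol);   /* leak 3: drop earlier mpol= */
			mpol = model_mpol_new();
			if (!mpol)
				goto bad;
		} else if (!(len > 5 && strncmp(p, "size=", 5) == 0 &&
			     p[5] >= '0' && p[5] <= '9')) {
			goto bad;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	*out = mpol;
	return 0;
bad:
	model_mpol_put(mpol);                   /* leak 2: free on error */
	return -1;
}

/* Leak 1: unmounting must release the superblock's policy. */
void model_unmount(struct model_mpol **sb_mpol)
{
	model_mpol_put(*sb_mpol);
	*sb_mpol = NULL;
}
```
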
      
      [akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Greg Thelen's avatar
      tmpfs: fix use-after-free of mempolicy object · 5f00110f
      Greg Thelen authored
      The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
      option is not specified in the remount request.  A new policy can be
      specified if mpol=M is given.
      
      Before this patch, remounting an mpol-bound tmpfs without specifying
      mpol= mount option in the remount request would set the filesystem's
      mempolicy object to a freed mempolicy object.
      
      To reproduce the problem, boot a DEBUG_PAGEALLOC kernel and run:
          # mkdir /tmp/x
      
          # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
      
          # mount -o remount,size=200M nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
              # note ? garbage in mpol=... output above
      
          # dd if=/dev/zero of=/tmp/x/f count=1
              # panic here
      
      Panic:
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          [...]
          Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
          Call Trace:
            mpol_shared_policy_init+0xa5/0x160
            shmem_get_inode+0x209/0x270
            shmem_mknod+0x3e/0xf0
            shmem_create+0x18/0x20
            vfs_create+0xb5/0x130
            do_last+0x9a1/0xea0
            path_openat+0xb3/0x4d0
            do_filp_open+0x42/0xa0
            do_sys_open+0xfe/0x1e0
            compat_sys_open+0x1b/0x20
            cstar_dispatch+0x7/0x1f
      
      Non-debug kernels will not crash immediately because referencing the
      dangling mpol will not cause a fault.  Instead the filesystem will
      reference a freed mempolicy object, which will cause unpredictable
      behavior.
      
      The problem boils down to a dropped mpol reference below if
      shmem_parse_options() does not allocate a new mpol:
      
          config = *sbinfo;
          shmem_parse_options(data, &config, true);
          mpol_put(sbinfo->mpol);
          sbinfo->mpol = config.mpol;  /* BUG: saves unreferenced mpol */
      
      This patch avoids the crash by not releasing the mempolicy if
      shmem_parse_options() doesn't create a new mpol.
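      A user-space sketch of the fixed remount logic, using hypothetical
      stand-ins for the superblock info, the option parser and the policy
      refcount: the old policy is released only when the parser actually
      produced a new one, so a remount without mpol= keeps the original
      policy alive.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted stand-in for struct mempolicy. */
struct model_mpol { int refcnt; };
static int live_mpols; /* policy objects currently allocated */

static struct model_mpol *model_mpol_new(void)
{
	struct model_mpol *p = malloc(sizeof(*p));

	if (p) {
		p->refcnt = 1;
		live_mpols++;
	}
	return p;
}

static void model_mpol_put(struct model_mpol *p)
{
	if (p && --p->refcnt == 0) {
		free(p);
		live_mpols--;
	}
}

/* Hypothetical superblock info: just a size and a policy. */
struct model_sb_info {
	long size;
	struct model_mpol *mpol;
};

/* Stand-in parser: always sets the size, and allocates a new policy
 * only when an mpol= option was given. */
static void model_parse(struct model_sb_info *config, long size, int has_mpol)
{
	config->size = size;
	if (has_mpol)
		config->mpol = model_mpol_new();
}

/* Remount, keeping the old policy unless mpol= was specified. */
void model_remount(struct model_sb_info *sbinfo, long size, int has_mpol)
{
	struct model_sb_info config = *sbinfo;

	config.mpol = NULL; /* non-NULL only if the parser made a new one */
	model_parse(&config, size, has_mpol);
	sbinfo->size = config.size;
	if (config.mpol) {
		model_mpol_put(sbinfo->mpol);
		sbinfo->mpol = config.mpol; /* transfers the initial reference */
	}
}
```
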
      
      How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
      not look back further.
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Mel Gorman's avatar
      mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages · 67d46b29
      Mel Gorman authored
      Rob van der Heij reported the following (paraphrased) in private mail.
      
      	The scenario is that I want to avoid backups filling up the page
      	cache and purging stuff that is more likely to be used again (this is
      	with s390x Linux on z/VM, so I don't give it so much memory that
      	we don't care anymore). So I have something with LD_PRELOAD that
      	intercepts the close() call (from tar, in this case) and issues
      	a posix_fadvise() just before closing the file.
      
      	This mostly works, except for small files (less than 14 pages)
      	that remain in page cache after the fact.
      
      Unfortunately Rob has not had a chance to test this exact patch, but the
      test program below should reproduce the problem he described.
      
      The issue is the per-cpu pagevecs for LRU additions.  If the pages are
      added by one CPU but fadvise() is called on another then the pages
      remain resident as the invalidate_mapping_pages() only drains the local
      pagevecs via its call to pagevec_release().  The user-visible effect is
      that a program that uses fadvise() properly is not obeyed.
      
      A possible fix for this is to put the necessary smarts into
      invalidate_mapping_pages() to globally drain the LRU pagevecs if a
      pagevec page could not be discarded.  The downside with this is that an
      inode cache shrink would send a global IPI and memory pressure
      potentially causing global IPI storms is very undesirable.
      
      Instead, this patch makes fadvise(POSIX_FADV_DONTNEED) check whether
      invalidate_mapping_pages() discarded all the requested pages.  If only a
      subset was discarded, it drains the LRU pagevecs on all CPUs and tries
      again.  If the second attempt fails, it assumes it is due to the pages
      being mapped, locked or dirty and does not care.  With this patch, an
      application using fadvise() correctly will be obeyed but there is a
      downside that a malicious application can force the kernel to send
      global IPIs and increase overhead.
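      A toy model of this drain-and-retry logic, assuming each page is either
      on the LRU or still held in one CPU's pagevec. All names are
      hypothetical. Invalidation drains only the calling CPU's pagevec,
      mirroring the description of invalidate_mapping_pages() above, and the
      fadvise model drains every CPU once if fewer pages than requested were
      discarded.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL  2
#define NR_PAGES_MODEL 8

/* A cached page is either on the LRU (pagevec_cpu == -1) or still in
 * the per-CPU pagevec of the CPU that added it. */
struct page_model {
	bool cached;
	int pagevec_cpu;
};

static void lru_add_drain_cpu_model(struct page_model *pages, int n, int cpu)
{
	for (int i = 0; i < n; i++)
		if (pages[i].pagevec_cpu == cpu)
			pages[i].pagevec_cpu = -1;
}

static void lru_add_drain_all_model(struct page_model *pages, int n)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		lru_add_drain_cpu_model(pages, n, cpu);
}

/* Drains only the local pagevec, then drops pages that are on the LRU.
 * Returns the number of pages discarded. */
static int invalidate_model(struct page_model *pages, int n, int this_cpu)
{
	int count = 0;

	lru_add_drain_cpu_model(pages, n, this_cpu);
	for (int i = 0; i < n; i++) {
		if (pages[i].cached && pages[i].pagevec_cpu == -1) {
			pages[i].cached = false;
			count++;
		}
	}
	return count;
}

/* POSIX_FADV_DONTNEED model: retry once after a global drain. */
int fadvise_dontneed_model(struct page_model *pages, int n, int this_cpu)
{
	int count = invalidate_model(pages, n, this_cpu);

	if (count < n) {
		lru_add_drain_all_model(pages, n); /* global IPI in the kernel */
		count += invalidate_model(pages, n, this_cpu);
	}
	return count;
}
```
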
      
      If accepted, I would like this to be considered as a -stable candidate.
      It's not an urgent issue but it's a system call that is not working as
      advertised which is weak.
      
      The following test program demonstrates the problem.  It should never
      report that pages are still resident but will without this patch.  It
      assumes that CPU 0 and 1 exist.
      
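      #define _GNU_SOURCE	/* cpu_set_t, CPU_ZERO(), sched_setaffinity() */
      #include <fcntl.h>
      #include <sched.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>
      
      /* Assumed size: a small file, below the 14-page case in the report */
      #ifndef FILESIZE_PAGES
      #define FILESIZE_PAGES 10
      #endif
      
      int main(void) {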
      	int fd;
      	int pagesize = getpagesize();
      	ssize_t written = 0, expected;
      	char *buf;
      	unsigned char *vec;
      	int resident, i;
      	cpu_set_t set;
      
      	/* Prepare a buffer for writing */
      	expected = FILESIZE_PAGES * pagesize;
      	buf = malloc(expected + 1);
      	if (buf == NULL) {
      		printf("ENOMEM\n");
      		exit(EXIT_FAILURE);
      	}
      	buf[expected] = 0;
      	memset(buf, 'a', expected);
      
      	/* Prepare the mincore vec */
      	vec = malloc(FILESIZE_PAGES);
      	if (vec == NULL) {
      		printf("ENOMEM\n");
      		exit(EXIT_FAILURE);
      	}
      
      	/* Bind ourselves to CPU 0 */
      	CPU_ZERO(&set);
      	CPU_SET(0, &set);
      	if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
      		perror("sched_setaffinity");
      		exit(EXIT_FAILURE);
      	}
      
      	/* open file, unlink and write buffer */
      	fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR, 0600);
      	if (fd == -1) {
      		perror("open");
      		exit(EXIT_FAILURE);
      	}
      	unlink("fadvise-test-file");
      	while (written < expected) {
      		ssize_t this_write;
      		this_write = write(fd, buf + written, expected - written);
      
      		if (this_write == -1) {
      			perror("write");
      			exit(EXIT_FAILURE);
      		}
      
      		written += this_write;
      	}
      	free(buf);
      
      	/*
      	 * Force ourselves to another CPU. If fadvise only flushes the local
      	 * CPUs pagevecs then the fadvise will fail to discard all file pages
      	 */
      	CPU_ZERO(&set);
      	CPU_SET(1, &set);
      	if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
      		perror("sched_setaffinity");
      		exit(EXIT_FAILURE);
      	}
      
      	/* sync and fadvise to discard the page cache */
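      	fsync(fd);
      	/* posix_fadvise() returns an error number rather than setting errno */
      	int err = posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED);
      	if (err != 0) {
      		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
      		exit(EXIT_FAILURE);
      	}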
      
      	/* map the file and use mincore to see which parts of it are resident */
      	buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
      	if (buf == MAP_FAILED) {
      		perror("mmap");
      		exit(EXIT_FAILURE);
      	}
      	if (mincore(buf, expected, vec) == -1) {
      		perror("mincore");
      		exit(EXIT_FAILURE);
      	}
      
      	/* Check residency */
      	for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
      		if (vec[i])
      			resident++;
      	}
      	if (resident != 0) {
      		printf("Nr unexpected pages resident: %d\n", resident);
      		exit(EXIT_FAILURE);
      	}
      
      	munmap(buf, expected);
      	close(fd);
      	free(vec);
      	exit(EXIT_SUCCESS);
      }
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Rob van der Heij <rvdheij@gmail.com>
      Tested-by: Rob van der Heij <rvdheij@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Cliff Wickman's avatar
      mm: export mmu notifier invalidates · fa794199
      Cliff Wickman authored
      We at SGI have a need to address some very high physical address ranges
      with our GRU (global reference unit), sometimes across partitioned
      machine boundaries and sometimes with larger addresses than the cpu
      supports.  We do this with the aid of our own 'extended vma' module
      which mimics the vma.  When something (either unmap or exit) frees an
      'extended vma' we use the mmu notifiers to clean them up.
      
      We had been able to mimic the functions
      __mmu_notifier_invalidate_range_start() and
      __mmu_notifier_invalidate_range_end() by locking the per-mm lock and
      walking the per-mm notifier list.  But with the change to a global srcu
      lock (static in mmu_notifier.c) we can no longer do that.  Our module has
      no access to that lock.
      
      So we request that these two functions be exported.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Acked-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Michel Lespinasse's avatar
      mm: accelerate mm_populate() treatment of THP pages · 240aadee
      Michel Lespinasse authored
      This change adds a follow_page_mask function which is equivalent to
      follow_page, but with an extra page_mask argument.
      
      follow_page_mask sets *page_mask to HPAGE_PMD_NR - 1 when it encounters
      a THP page, and to 0 in other cases.
      
      __get_user_pages() makes use of this in order to accelerate populating
      THP ranges - that is, when both the pages and vmas arrays are NULL, we
      don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and
      we also avoid taking mm->page_table_lock that many times).
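      A sketch of how a walk can skip the rest of a THP once a lookup reports a
      non-zero page mask. It assumes 4 kB base pages and 512-page (2 MB) THPs,
      works in page indices rather than addresses, and uses a hypothetical
      is_thp_model() in place of real page-table lookups. The expression
      1 + (~index & page_mask) counts the subpages from the current one to the
      end of the huge page.

```c
#include <assert.h>

#define HPAGE_PMD_NR_MODEL 512UL /* 2 MB THP / 4 kB pages, assumed */

/* Hypothetical: pages below thp_limit are THP-backed in this model. */
static int is_thp_model(unsigned long idx, unsigned long thp_limit)
{
	return idx < thp_limit;
}

/* Populate nr_pages starting at page index start; each lookup yields a
 * page_mask (HPAGE_PMD_NR - 1 for a THP, else 0) and the walk skips the
 * rest of the huge page. Returns the number of lookups performed. */
unsigned long populate_model(unsigned long start, unsigned long nr_pages,
			     unsigned long thp_limit)
{
	unsigned long lookups = 0;

	while (nr_pages) {
		unsigned long page_mask =
			is_thp_model(start, thp_limit) ? HPAGE_PMD_NR_MODEL - 1 : 0;
		/* Subpages left in this (huge) page, current one included. */
		unsigned long increm = 1 + (~start & page_mask);

		if (increm > nr_pages)
			increm = nr_pages;
		start += increm;
		nr_pages -= increm;
		lookups++;
	}
	return lookups;
}
```
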
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Michel Lespinasse's avatar
      mm: use long type for page counts in mm_populate() and get_user_pages() · 28a35716
      Michel Lespinasse authored
      Use long type for page counts in mm_populate() so as to avoid integer
      overflow when running the following test code:
      
      #include <stdio.h>
      #include <sys/mman.h>
      
      int main(void) {
        void *p = mmap(NULL, 0x100000000000, PROT_READ,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        printf("p: %p\n", p);
        mlockall(MCL_CURRENT);
        printf("done\n");
        return 0;
      }
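      The arithmetic behind the overflow, assuming 4 kB pages: 0x100000000000
      bytes is 2^44 bytes (16 TiB), i.e. 2^32 pages, which exceeds both
      INT_MAX and UINT_MAX but fits easily in a 64-bit long. pages_for() is a
      hypothetical helper used only for this illustration.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_SHIFT_MODEL 12 /* 4 kB pages, assumed */

/* Number of whole pages covered by a mapping of len bytes. */
unsigned long long pages_for(unsigned long long len)
{
	return len >> PAGE_SHIFT_MODEL;
}
```
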
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Zhang Yanfei's avatar
      mm: accurately document nr_free_*_pages functions with code comments · e0fb5815
      Zhang Yanfei authored
      nr_free_zone_pages(), nr_free_buffer_pages() and nr_free_pagecache_pages()
      are horribly badly named, so accurately document them with code comments
      to prevent their misuse.
      
      [akpm@linux-foundation.org: tweak comments]
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>