1. 17 Jun, 2013 5 commits
  2. 05 Jun, 2013 1 commit
    • ARM: 7746/1: mm: lazy cache flushing on non-mapped pages · 81f28946
      Ming Lei authored
      Currently flush_dcache_page() treats a page as non-mapped if
      mapping_mapped(mapping) returns false. This approach is very
      coarse:
      	- an mmap on part of a file may cause all pages backed by
      	the file to be treated as mmapped

      	- file-backed pages aren't actually mapped into user space
      	if the memory mmapped on the file isn't accessed

      This patch uses page_mapped() to decide whether the page has been
      mapped.
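
The difference between the two checks can be sketched in user space. This is only a toy model, not the kernel code: `toy_mapping`, `toy_page`, and all `toy_*` helpers are hypothetical names invented for illustration, and the real `mapping_mapped()` and `page_mapped()` work on kernel structures that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's structures. */
struct toy_mapping {
	int nr_vmas;		/* how many mmap()ed regions exist on the file */
};

struct toy_page {
	struct toy_mapping *mapping;
	int mapcount;		/* how many user page-table entries map this page */
};

/* Coarse check: true as soon as any part of the file is mmap()ed. */
static bool toy_mapping_mapped(const struct toy_mapping *m)
{
	return m->nr_vmas > 0;
}

/* Fine check: true only if this particular page is mapped somewhere. */
static bool toy_page_mapped(const struct toy_page *p)
{
	return p->mapcount > 0;
}

/*
 * Returns true if the cache flush for this page may be deferred (lazy),
 * using either the per-page check or the per-mapping check.
 */
static bool toy_can_defer_flush(const struct toy_page *p, bool per_page)
{
	assert(p != NULL && p->mapping != NULL);
	return per_page ? !toy_page_mapped(p)
			: !toy_mapping_mapped(p->mapping);
}
```

In this model, a page of an mmapped file that was never touched through the mapping has `mapcount == 0`, so only the per-page check allows the flush to be deferred.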
      
      With the attached test code, I see a significant performance
      improvement (>25%) when accessing the page cache via read() in this
      situation, so memcpy benefits a lot from not flushing the cache
      in this case.
      
      No.   read time without the patch	No. read time with the patch
      ================================================================
      No. 0, time  22615636 us		No. 0, time  22014717 us
      No. 1, time  4387851 us 		No. 1, time  3113184 us
      No. 2, time  4276535 us 		No. 2, time  3005244 us
      No. 3, time  42598219 us 		No. 3, time  3001565 us
      No. 4, time  4263811 us 		No. 4, time  3002748 us
      No. 5, time  4258486 us 		No. 5, time  3004104 us
      No. 6, time  4253009 us 		No. 6, time  3002188 us
      No. 7, time  4262809 us 		No. 7, time  2998196 us
      No. 8, time  4264525 us 		No. 8, time  3007255 us
      No. 9, time  4267795 us 		No. 9, time  3005094 us
      
      1) No. 0 reads the file from the storage device; the others
      mostly read it from the page cache.
      2) The file size is 512M, and the file is on ext4 over USB mass storage.
      3) The test was done on Pandaboard.
      
      #include <assert.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/mman.h>
      #include <sys/stat.h>
      #include <sys/time.h>
      #include <unistd.h>

      unsigned int  sum = 0;
      unsigned long sum_val = 0;

      /* Elapsed time from tv1 to tv2 in microseconds. */
      static unsigned long tv_diff(struct timeval *tv1, struct timeval *tv2)
      {
      	return (tv2->tv_sec - tv1->tv_sec) * 1000000 +
      		(tv2->tv_usec - tv1->tv_usec);
      }

      int main(int argc, char *argv[])
      {
      	char *mbuf;
      	int fd;
      	unsigned long i;
      	unsigned long page_size, size;
      	struct stat stat;
      	struct timeval t1, t2;
      	unsigned char *rbuf;

      	assert(argc > 1);

      	/* The page size must be known before the read buffer is sized. */
      	page_size = getpagesize();
      	rbuf = malloc(32 * page_size);
      	if (!rbuf) {
      		printf("\t%s\n", "malloc failed");
      		exit(-1);
      	}

      	fd = open(argv[1], O_RDWR);
      	assert(fd >= 0);

      	fstat(fd, &stat);
      	size = stat.st_size;
      	printf("%s: file %s, size %lu, page size %lu\n",
      		argv[0],
      		argv[1], size, page_size);

      	gettimeofday(&t1, NULL);
      	/* Map the file but never touch it, so no page gets mapped to user space. */
      	mbuf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      	if (mbuf == MAP_FAILED) {
      		printf("\t%s\n", "mmap failed");
      		exit(-1);
      	}

      	/* Read the whole file through read() in chunks of 32 pages. */
      	for (i = 0 ; i < size ; i += (page_size * 32)) {
      		int rcnt;
      		lseek(fd, i, SEEK_SET);
      		rcnt = read(fd, rbuf, page_size * 32);
      		if (rcnt != page_size * 32) {
      			printf("%s: read failed\n", __func__);
      			exit(-1);
      		}
      	}
      	free(rbuf);
      	munmap(mbuf, size);
      	gettimeofday(&t2, NULL);
      	printf("\tread mmaped time: %luus\n", tv_diff(&t1, &t2));

      	close(fd);
      	return 0;
      }
      
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  3. 22 May, 2013 2 commits
    • ARM: 7730/1: DMA-mapping: mark all !DMA_TO_DEVICE pages in unmapping as clean · b2a234ed
      Ming Lei authored
      It is common for one sg to include many pages, so mark all these
      pages as clean to avoid unnecessary flushing on them in
      set_pte_at() or update_mmu_cache().
      
      The patch might improve the loading performance of application code a bit.

      With the test code below, which reads a file (~1 GByte) from a USB mass
      storage disk into a buffer created with mmap(PROT_READ | PROT_EXEC) on
      Pandaboard, an average improvement of ~1% can be observed with the patch
      over 10 test runs.
      
      #include <assert.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/stat.h>
      #include <sys/time.h>
      #include <unistd.h>

      unsigned int sum = 0;

      /* Elapsed time from tv1 to tv2 in microseconds. */
      static unsigned long tv_diff(struct timeval *tv1, struct timeval *tv2)
      {
      	return (tv2->tv_sec - tv1->tv_sec) * 1000000 + (tv2->tv_usec - tv1->tv_usec);
      }

      int main(int argc, char *argv[])
      {
      	char *mbuffer;
      	int fd;
      	unsigned long i;
      	unsigned long page_size, size;
      	struct stat stat;
      	struct timeval t1, t2;

      	assert(argc > 1);
      	page_size = getpagesize();
      	fd = open(argv[1], O_RDONLY);
      	assert(fd >= 0);

      	fstat(fd, &stat);
      	size = stat.st_size;
      	printf("%s: file %s, file size %lu, page size %lu\n", argv[0],
      	        argv[1], size, page_size);

      	gettimeofday(&t1, NULL);
      	mbuffer = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
      	assert(mbuffer != MAP_FAILED);
      	/* Touch one byte per page so that every page is faulted in. */
      	for (i = 0 ; i < size ; i += page_size)
      	        sum += mbuffer[i];
      	/* Unmap the whole mapping, not only its first page. */
      	munmap(mbuffer, size);
      	gettimeofday(&t2, NULL);
      	printf("\tread mmaped time: %luus\n", tv_diff(&t1, &t2));

      	close(fd);
      	return 0;
      }
      Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 7728/1: mm: Use phys_addr_t properly for ioremap functions · 9b97173e
      Laura Abbott authored
      Several of the ioremap functions use unsigned long in places
      resulting in truncation if physical addresses greater than
      4G are passed in. Change the types of the functions and the
      callers accordingly.
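
The truncation described above can be reproduced in a small user-space sketch. The types and helpers here are stand-ins chosen for illustration: `toy_ulong32` models a 32-bit `unsigned long`, `toy_phys_addr` models a 64-bit `phys_addr_t`, and neither function is a real ioremap interface.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for illustration only, not kernel types. */
typedef uint32_t toy_ulong32;	/* a 32-bit unsigned long */
typedef uint64_t toy_phys_addr;	/* a 64-bit physical address type */

/* Carries the address through a 32-bit variable: bits 32..63 are lost. */
static toy_phys_addr toy_remap_narrow(toy_phys_addr phys)
{
	toy_ulong32 narrowed = (toy_ulong32)phys;

	return narrowed;
}

/* Keeps the full-width type end to end: the address survives intact. */
static toy_phys_addr toy_remap_wide(toy_phys_addr phys)
{
	return phys;
}
```

For an address of 4G + 4K (`0x100001000`), the narrow path yields `0x1000`, which silently points at a different physical location, while the wide path returns the address unchanged.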
      
      Cc: Krzysztof Halasa <khc@pm.waw.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  4. 15 May, 2013 4 commits
    • ARM: 7716/1: bcm281xx: Add L2 support for Rev A2 chips · 3b656fed
      Christian Daudt authored
      Rev A2 SoCs have an unorthodox memory re-mapping and this needs
      to be reflected in the cache operations.
      This patch adds new outer cache functions for the l2x0 driver
      to support this SoC revision. It also adds a new compatible
      value for the cache to enable this functionality.
      
      Updates from V1:
      - remove section 1 altogether and note that in comments
      - simplify section selection caused by section 1 removal
      - BUG_ON just in case section 1 shows up
      Signed-off-by: Christian Daudt <csd@broadcom.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 7722/1: zImage: Convert 32bits memory size and address from ATAG to 64bits DTB · faefd550
      Gregory CLEMENT authored
      When CONFIG_ARM_APPENDED_DTB is selected, if the bootloader provides
      an ATAG_MEM, it replaces the memory size and the memory address in the
      memory node of the device tree. In the case of a system which can
      handle more than 4GB, the memory node cell size is 4: each value
      (memory size and memory address) is 64 bits and therefore uses 2 cells.

      The current code in atags_to_fdt.c assumed a cell size
      of 2 (one cell for the memory size and one cell for the memory
      address), which leads to an improper write of the data and ends in a
      boot hang.

      This patch writes the memory size and the memory address to the memory
      node in the device tree depending on the size of the memory node (32
      bits or 64 bits).
      
      It has been tested in the 2 cases:
      - with a dtb using skeleton.dtsi
      - and with a dtb using skeleton64.dtsi
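
The cell layout at issue can be sketched as follows. This is a minimal illustration rather than the code in atags_to_fdt.c: `toy_put_cell()` and `toy_encode_mem_reg()` are hypothetical helpers, and the sketch assumes the ATAG values are 32 bits wide, so the upper cell of a 64-bit value is always zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 32-bit device tree cell in big-endian byte order. */
static void toy_put_cell(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v >> 24);
	dst[1] = (uint8_t)(v >> 16);
	dst[2] = (uint8_t)(v >> 8);
	dst[3] = (uint8_t)v;
}

/*
 * Encode a <start size> pair for a memory node.
 * cells_per_value is 1 for 32-bit values or 2 for 64-bit values.
 * buf must hold at least 16 bytes; returns the number of bytes written.
 */
static size_t toy_encode_mem_reg(uint8_t *buf, int cells_per_value,
				 uint32_t start, uint32_t size)
{
	const uint32_t vals[2] = { start, size };
	size_t off = 0;
	int i;

	assert(cells_per_value == 1 || cells_per_value == 2);
	for (i = 0; i < 2; i++) {
		if (cells_per_value == 2) {
			/* Upper 32 bits: zero, since the input is 32-bit. */
			toy_put_cell(buf + off, 0);
			off += 4;
		}
		toy_put_cell(buf + off, vals[i]);
		off += 4;
	}
	return off;
}
```

With one cell per value the pair occupies 8 bytes; with two cells per value it occupies 16 bytes, and writing the 8-byte form into a node that expects 16 puts the size where the low half of the address belongs.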
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 7705/1: use optimized do_div only for EABI · 049f3e84
      Arnd Bergmann authored
      In OABI configurations, some uses of the do_div function
      cause gcc to run out of registers. To work around that,
      we can force the use of the out-of-line version for
      configurations that build an OABI kernel.
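
For readers unfamiliar with do_div, its contract can be sketched in portable C. This is a hypothetical stand-in, not the ARM implementation: the kernel's do_div() is a macro that updates its first argument in place, which `toy_do_div()` imitates with a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the do_div() contract: divide the 64-bit
 * value *n by a 32-bit base, store the quotient back into *n, and
 * return the remainder.
 */
static uint32_t toy_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(n != NULL && base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}
```

Whether the division is done by inline assembly or by an out-of-line helper, callers would expect this same result; the patch only changes which variant is used on OABI builds.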
      
      Without this patch, building netx_defconfig results in:
      
      net/core/pktgen.c: In function 'pktgen_if_show':
      net/core/pktgen.c:682:2775: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
      net/core/pktgen.c:682:3153: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
      net/core/pktgen.c:682:2775: error: 'asm' operand has impossible constraints
      net/core/pktgen.c:682:3153: error: 'asm' operand has impossible constraints
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 7669/1: keep __my_cpu_offset consistent with generic one · 9394c1c6
      Ming Lei authored
      Commit 14318efb (ARM: 7587/1: implement optimized percpu variable access)
      introduced ARM's __my_cpu_offset to optimize percpu variable access,
      which works really well on hackbench, but causes __my_cpu_offset
      to return a garbage value before it is initialized in cpu_init(), called
      by setup_arch(), so accessing a percpu variable before setup_arch() may
      hang the kernel. The generic __my_cpu_offset, by contrast, always returns
      zero before the percpu area is brought up, and won't hang the kernel.

      So the patch clears __my_cpu_offset on the boot CPU early
      to avoid a boot hang.

      At present, a percpu variable is accessed by lockdep before
      setup_arch(), and enabling CONFIG_LOCK_STAT or CONFIG_DEBUG_LOCKDEP
      can trigger the kernel hang.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  5. 12 May, 2013 2 commits
    • Linux 3.10-rc1 · f722406f
      Linus Torvalds authored
    • Merge tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 26b840ae
      Linus Torvalds authored
      Pull tracing/kprobes update from Steven Rostedt:
       "The majority of these changes are from Masami Hiramatsu bringing
        kprobes up to par with the latest changes to ftrace (multi buffering
        and the new function probes).
      
        He also discovered and fixed some bugs in doing so.  When pulling in
        his patches, I also found a few minor bugs as well and fixed them.
      
        This also includes a compile fix for some archs that select the ring
        buffer but not tracing.
      
        I based this off of the last patch you took from me that fixed the
        merge conflict error, as that was the commit that had all the changes
        I needed for this set of changes."
      
      * tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/kprobes: Support soft-mode disabling
        tracing/kprobes: Support ftrace_event_file base multibuffer
        tracing/kprobes: Pass trace_probe directly from dispatcher
        tracing/kprobes: Increment probe hit-count even if it is used by perf
        tracing/kprobes: Use bool for retprobe checker
        ftrace: Fix function probe when more than one probe is added
        ftrace: Fix the output of enabled_functions debug file
        ftrace: Fix locking in register_ftrace_function_probe()
        tracing: Add helper function trace_create_new_event() to remove duplicate code
        tracing: Modify soft-mode only if there's no other referrer
        tracing: Indicate enabled soft-mode in enable file
        tracing/kprobes: Fix to increment return event probe hit-count
        ftrace: Cleanup regex_lock and ftrace_lock around hash updating
        ftrace, kprobes: Fix a deadlock on ftrace_regex_lock
        ftrace: Have ftrace_regex_write() return either read or error
        tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
        tracing: Don't succeed if event_enable_func did not register anything
        ring-buffer: Select IRQ_WORK
  6. 11 May, 2013 4 commits
    • Merge tag 'stable/for-linus-3.10-rc0-tag-two' of... · 607eeb0b
      Linus Torvalds authored
      Merge tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
      
      Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
       - More fixes in the vCPU PVHVM hotplug path.
       - Add more documentation.
       - Fix various ARM related issues in the Xen generic drivers.
       - Updates in the xen-pciback driver per Bjorn's updates.
       - Mask the x2APIC feature for PV guests.
      
      * tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
        xen/pci: Used cached MSI-X capability offset
        xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
        xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST
        xen: mask x2APIC feature in PV
        xen: SWIOTLB is only used on x86
        xen/spinlock: Fix check from greater than to be also be greater or equal to.
        xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info
        xen/vcpu: Document the xen_vcpu_info and xen_vcpu
        xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.
    • Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4c444501
      Linus Torvalds authored
      Pull second SCSI update from James "Jaj B" Bottomley:
       "This is the final round of SCSI patches for the merge window.  It
        consists mostly of driver updates (bnx2fc, ibmfc, fnic, lpfc,
        be2iscsi, pm80xx, qla4x and ipr).
      
        There's also the power management updates that complete the patches in
        Jens' tree, an iscsi refcounting problem fix from the last pull, some
        dif handling in scsi_debug fixes, a few nice code cleanups and an
        error handling busy bug fix."
      
      * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (92 commits)
        [SCSI] qla2xxx: Update firmware link in Kconfig file.
        [SCSI] iscsi class, qla4xxx: fix sess/conn refcounting when find fns are used
        [SCSI] sas: unify the pointlessly separated enums sas_dev_type and sas_device_type
        [SCSI] pm80xx: thermal, sas controller config and error handling update
        [SCSI] pm80xx: NCQ error handling changes
        [SCSI] pm80xx: WWN Modification for PM8081/88/89 controllers
        [SCSI] pm80xx: Changed module name and debug messages update
        [SCSI] pm80xx: Firmware flash memory free fix, with addition of new memory region for it
        [SCSI] pm80xx: SPC new firmware changes for device id 0x8081 alone
        [SCSI] pm80xx: Added SPCv/ve specific hardware functionalities and relevant changes in common files
        [SCSI] pm80xx: MSI-X implementation for using 64 interrupts
        [SCSI] pm80xx: Updated common functions common for SPC and SPCv/ve
        [SCSI] pm80xx: Multiple inbound/outbound queue configuration
        [SCSI] pm80xx: Added SPCv/ve specific ids, variables and modify for SPC
        [SCSI] lpfc: fix up Kconfig dependencies
        [SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmnd
        [SCSI] sd: change to auto suspend mode
        [SCSI] sd: use REQ_PM in sd's runtime suspend operation
        [SCSI] qla4xxx: Fix iocb_cnt calculation in qla4xxx_send_mbox_iocb()
        [SCSI] ufs: Correct the expected data transfersize
        ...
    • Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux · ac4e0109
      Linus Torvalds authored
      Pull idle update from Len Brown:
       "Add support for new Haswell-ULT CPU idle power states"
      
      * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
        intel_idle: initial C8, C9, C10 support
        tools/power turbostat: display C8, C9, C10 residency
    • Merge git://git.infradead.org/users/eparis/audit · c4cc75c3
      Linus Torvalds authored
      Pull audit changes from Eric Paris:
       "Al used to send pull requests every couple of years but he told me to
        just start pushing them to you directly.
      
        Our touching outside of core audit code is pretty straight forward.  A
        couple of interface changes which hit net/.  A simple argument bug
        calling audit functions in namei.c and the removal of some assembly
        branch prediction code on ppc"
      
      * git://git.infradead.org/users/eparis/audit: (31 commits)
        audit: fix message spacing printing auid
        Revert "audit: move kaudit thread start from auditd registration to kaudit init"
        audit: vfs: fix audit_inode call in O_CREAT case of do_last
        audit: Make testing for a valid loginuid explicit.
        audit: fix event coverage of AUDIT_ANOM_LINK
        audit: use spin_lock in audit_receive_msg to process tty logging
        audit: do not needlessly take a lock in tty_audit_exit
        audit: do not needlessly take a spinlock in copy_signal
        audit: add an option to control logging of passwords with pam_tty_audit
        audit: use spin_lock_irqsave/restore in audit tty code
        helper for some session id stuff
        audit: use a consistent audit helper to log lsm information
        audit: push loginuid and sessionid processing down
        audit: stop pushing loginid, uid, sessionid as arguments
        audit: remove the old depricated kernel interface
        audit: make validity checking generic
        audit: allow checking the type of audit message in the user filter
        audit: fix build break when AUDIT_DEBUG == 2
        audit: remove duplicate export of audit_enabled
        Audit: do not print error when LSMs disabled
        ...
  7. 10 May, 2013 22 commits