- 06 Mar, 2013 40 commits
-
-
Xiaowei.Hu authored
commit 309a85b6 upstream. ocfs2_block_group_alloc_discontig() disables chain relink by setting ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple cluster groups. It doesn't keep the credits for all chain relink,but ocfs2_claim_suballoc_bits overrides this in this call trace: ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()-> __ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits() ocfs2_claim_suballoc_bits set ac->ac_allow_chain_relink = 1; then call ocfs2_search_chain() one time and disable it again, and then we run out of credits. Fix is to allow relink by default and disable it in ocfs2_block_group_alloc_discontig. Without this patch, End-users will run into a crash due to run out of credits, backtrace like this: RIP: 0010:[<ffffffffa0808b14>] [<ffffffffa0808b14>] jbd2_journal_dirty_metadata+0x164/0x170 [jbd2] RSP: 0018:ffff8801b919b5b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0 RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8 RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0 R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800 FS: 00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0) Call Trace: ocfs2_journal_dirty+0x2f/0x70 [ocfs2] ocfs2_relink_block_group+0x111/0x480 [ocfs2] ocfs2_search_chain+0x455/0x9a0 [ocfs2] ... Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jeff Liu authored
commit 32918dd9 upstream. We need to re-initialize the security for a new reflinked inode with its parent dirs if it isn't specified to be preserved for ocfs2_reflink(). However, the code logic is broken at ocfs2_init_security_and_acl() although ocfs2_init_security_get() succeed. As a result, ocfs2_acl_init() does not involked and therefore the default ACL of parent dir was missing on the new inode. Note this was introduced by 9d8f13ba ("security: new security_inode_init_security API adds function callback") To reproduce: set default ACL for the parent dir(ocfs2 in this case): $ setfacl -m default:user:jeff:rwx ../ocfs2/ $ getfacl ../ocfs2/ # file: ../ocfs2/ # owner: jeff # group: jeff user::rwx group::r-x other::r-x default:user::rwx default:user:jeff:rwx default:group::r-x default:mask::rwx default:other::r-x $ touch a $ getfacl a # file: a # owner: jeff # group: jeff user::rw- group::rw- other::r-- Before patching, create reflink file b from a, the user default ACL entry(user:jeff:rwx)was missing: $ ./ocfs2_reflink a b $ getfacl b # file: b # owner: jeff # group: jeff user::rw- group::rw- other::r-- In this case, the end user can also observed an error message at syslog: (ocfs2_reflink,3229,2):ocfs2_init_security_and_acl:7193 ERROR: status = 0 After applying this patch, create reflink file c from a: $ ./ocfs2_reflink a c $ getfacl c # file: c # owner: jeff # group: jeff user::rw- user:jeff:rwx #effective:rw- group::r-x #effective:r-- mask::rw- other::r-- Test program: /* Usage: reflink <source> <dest> */ #include <stdio.h> #include <stdint.h> #include <stdbool.h> #include <string.h> #include <errno.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/ioctl.h> static int reflink_file(char const *src_name, char const *dst_name, bool preserve_attrs) { int fd; #ifndef REFLINK_ATTR_NONE # define REFLINK_ATTR_NONE 0 #endif #ifndef REFLINK_ATTR_PRESERVE # define REFLINK_ATTR_PRESERVE 1 #endif #ifndef OCFS2_IOC_REFLINK struct reflink_arguments { uint64_t old_path; uint64_t new_path; uint64_t preserve; }; # define OCFS2_IOC_REFLINK _IOW ('o', 4, struct reflink_arguments) #endif struct reflink_arguments args = { .old_path = (unsigned long) src_name, .new_path = (unsigned long) dst_name, .preserve = preserve_attrs ? REFLINK_ATTR_PRESERVE : REFLINK_ATTR_NONE, }; fd = open(src_name, O_RDONLY); if (fd < 0) { fprintf(stderr, "Failed to open %s: %s\n", src_name, strerror(errno)); return -1; } if (ioctl(fd, OCFS2_IOC_REFLINK, &args) < 0) { fprintf(stderr, "Failed to reflink %s to %s: %s\n", src_name, dst_name, strerror(errno)); return -1; } } int main(int argc, char *argv[]) { if (argc != 3) { fprintf(stdout, "Usage: %s source dest\n", argv[0]); return 1; } return reflink_file(argv[1], argv[2], 0); } Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Tao Ma <boyu.mt@taobao.com> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
H. Peter Anvin authored
commit 7c100936 upstream. On non-BIOS platforms it is possible that the BIOS data area contains garbage instead of being zeroed or something equivalent (firmware people: we are talking of 1.5K here, so please do the sane thing.) We need on the order of 20-30K of low memory in order to boot, which may grow up to < 64K in the future. We probably want to avoid the lowest of the low memory. At the same time, it seems extremely unlikely that a legitimate EBDA would ever reach down to the 128K (which would require it to be over half a megabyte in size.) Thus, pick 128K as the cutoff for "this is insane, ignore." We may still end up reserving a bunch of extra memory on the low megabyte, but that is not really a major issue these days. In the worst case we lose 512K of RAM. This code really should be merged with trim_bios_range() in arch/x86/kernel/setup.c, but that is a bigger patch for a later merge window. Reported-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.orgSigned-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit 9b171e0c upstream. Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Konrad Rzeszutek Wilk authored
commit a2fd6419 upstream. Both the PowerPC hypervisor and Xen hypervisor can utilize the hvc driver. Cc: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361825650-14031-3-git-send-email-konrad.wilk@oracle.comSigned-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Konrad Rzeszutek Wilk authored
commit 2482a92e upstream. The earlyprintk for Xen PV guests utilizes a simple hypercall (console_io) to provide output to Xen emergency console. Note that the Xen hypervisor should be booted with 'loglevel=all' to output said information. Reported-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361825650-14031-2-git-send-email-konrad.wilk@oracle.comSigned-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Shawn Guo authored
commit ef4d0888 upstream. When commit 95a2482a (mmc: sdhci-esdhc-imx: add basic imx6q usdhc support) works around host version issue on imx6q, it gets the register address fixup "reg ^= 2" lost for imx25/35/51/53 esdhc. Thus, the controller version on these SoCs is wrongly identified as v1 while it's actually v2. Add the address fixup back and take a different approach to correct imx6q host version, so that the host version read gets back to work for all SoCs. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Greg Thelen authored
commit 5f00110f upstream. The tmpfs remount logic preserves filesystem mempolicy if the mpol=M option is not specified in the remount request. A new policy can be specified if mpol=M is given. Before this patch remounting an mpol bound tmpfs without specifying mpol= mount option in the remount request would set the filesystem's mempolicy object to a freed mempolicy object. To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run: # mkdir /tmp/x # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0 # mount -o remount,size=200M nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0 # note ? garbage in mpol=... output above # dd if=/dev/zero of=/tmp/x/f count=1 # panic here Panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) [...] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC Call Trace: mpol_shared_policy_init+0xa5/0x160 shmem_get_inode+0x209/0x270 shmem_mknod+0x3e/0xf0 shmem_create+0x18/0x20 vfs_create+0xb5/0x130 do_last+0x9a1/0xea0 path_openat+0xb3/0x4d0 do_filp_open+0x42/0xa0 do_sys_open+0xfe/0x1e0 compat_sys_open+0x1b/0x20 cstar_dispatch+0x7/0x1f Non-debug kernels will not crash immediately because referencing the dangling mpol will not cause a fault. Instead the filesystem will reference a freed mempolicy object, which will cause unpredictable behavior. The problem boils down to a dropped mpol reference below if shmem_parse_options() does not allocate a new mpol: config = *sbinfo shmem_parse_options(data, &config, true) mpol_put(sbinfo->mpol) sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */ This patch avoids the crash by not releasing the mempolicy if shmem_parse_options() doesn't create a new mpol. How far back does this issue go? I see it in both 2.6.36 and 3.3. I did not look back further. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mel Gorman authored
commit 67d46b29 upstream. Rob van der Heij reported the following (paraphrased) on private mail. The scenario is that I want to avoid backups to fill up the page cache and purge stuff that is more likely to be used again (this is with s390x Linux on z/VM, so I don't give it as much memory that we don't care anymore). So I have something with LD_PRELOAD that intercepts the close() call (from tar, in this case) and issues a posix_fadvise() just before closing the file. This mostly works, except for small files (less than 14 pages) that remains in page cache after the face. Unfortunately Rob has not had a chance to test this exact patch but the test program below should be reproducing the problem he described. The issue is the per-cpu pagevecs for LRU additions. If the pages are added by one CPU but fadvise() is called on another then the pages remain resident as the invalidate_mapping_pages() only drains the local pagevecs via its call to pagevec_release(). The user-visible effect is that a program that uses fadvise() properly is not obeyed. A possible fix for this is to put the necessary smarts into invalidate_mapping_pages() to globally drain the LRU pagevecs if a pagevec page could not be discarded. The downside with this is that an inode cache shrink would send a global IPI and memory pressure potentially causing global IPI storms is very undesirable. Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to check if invalidate_mapping_pages() discarded all the requested pages. If a subset of pages are discarded it drains the LRU pagevecs and tries again. If the second attempt fails, it assumes it is due to the pages being mapped, locked or dirty and does not care. With this patch, an application using fadvise() correctly will be obeyed but there is a downside that a malicious application can force the kernel to send global IPIs and increase overhead. If accepted, I would like this to be considered as a -stable candidate. It's not an urgent issue but it's a system call that is not working as advertised which is weak. The following test program demonstrates the problem. It should never report that pages are still resident but will without this patch. It assumes that CPU 0 and 1 exist. int main() { int fd; int pagesize = getpagesize(); ssize_t written = 0, expected; char *buf; unsigned char *vec; int resident, i; cpu_set_t set; /* Prepare a buffer for writing */ expected = FILESIZE_PAGES * pagesize; buf = malloc(expected + 1); if (buf == NULL) { printf("ENOMEM\n"); exit(EXIT_FAILURE); } buf[expected] = 0; memset(buf, 'a', expected); /* Prepare the mincore vec */ vec = malloc(FILESIZE_PAGES); if (vec == NULL) { printf("ENOMEM\n"); exit(EXIT_FAILURE); } /* Bind ourselves to CPU 0 */ CPU_ZERO(&set); CPU_SET(0, &set); if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) { perror("sched_setaffinity"); exit(EXIT_FAILURE); } /* open file, unlink and write buffer */ fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR); if (fd == -1) { perror("open"); exit(EXIT_FAILURE); } unlink("fadvise-test-file"); while (written < expected) { ssize_t this_write; this_write = write(fd, buf + written, expected - written); if (this_write == -1) { perror("write"); exit(EXIT_FAILURE); } written += this_write; } free(buf); /* * Force ourselves to another CPU. If fadvise only flushes the local * CPUs pagevecs then the fadvise will fail to discard all file pages */ CPU_ZERO(&set); CPU_SET(1, &set); if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) { perror("sched_setaffinity"); exit(EXIT_FAILURE); } /* sync and fadvise to discard the page cache */ fsync(fd); if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) { perror("posix_fadvise"); exit(EXIT_FAILURE); } /* map the file and use mincore to see which parts of it are resident */ buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0); if (buf == NULL) { perror("mmap"); exit(EXIT_FAILURE); } if (mincore(buf, expected, vec) == -1) { perror("mincore"); exit(EXIT_FAILURE); } /* Check residency */ for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) { if (vec[i]) resident++; } if (resident != 0) { printf("Nr unexpected pages resident: %d\n", resident); exit(EXIT_FAILURE); } munmap(buf, expected); close(fd); free(vec); exit(EXIT_SUCCESS); } Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Rob van der Heij <rvdheij@gmail.com> Tested-by: Rob van der Heij <rvdheij@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Robin Holt authored
commit 751efd86 upstream. There is a race condition between mmu_notifier_unregister() and __mmu_notifier_release(). Assume two tasks, one calling mmu_notifier_unregister() as a result of a filp_close() ->flush() callout (task A), and the other calling mmu_notifier_release() from an mmput() (task B). A B t1 srcu_read_lock() t2 if (!hlist_unhashed()) t3 srcu_read_unlock() t4 srcu_read_lock() t5 hlist_del_init_rcu() t6 synchronize_srcu() t7 srcu_read_unlock() t8 hlist_del_rcu() <--- NULL pointer deref. Additionally, the list traversal in __mmu_notifier_release() is not protected by the by the mmu_notifier_mm->hlist_lock which can result in callouts to the ->release() notifier from both mmu_notifier_unregister() and __mmu_notifier_release(). -stable suggestions: The stable trees prior to 3.7.y need commits 21a92735 and 70400303 cherry-picked in that order prior to cherry-picking this commit. The 3.7.y tree already has those two commits. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Avi Kivity <avi@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Sagi Grimberg <sagig@mellanox.co.il> Cc: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andrea Arcangeli authored
commit 70400303 upstream. The variable must be static especially given the variable name. s/RCU/SRCU/ over a few comments. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Sagi Grimberg authored
commit 21a92735 upstream. With an RCU based mmu_notifier implementation, any callout to mmu_notifier_invalidate_range_{start,end}() or mmu_notifier_invalidate_page() would not be allowed to call schedule() as that could potentially allow a modification to the mmu_notifier structure while it is currently being used. Since srcu allocs 4 machine words per instance per cpu, we may end up with memory exhaustion if we use srcu per mm. So all mms share a global srcu. Note that during large mmu_notifier activity exit & unregister paths might hang for longer periods, but it is tolerable for current mmu_notifier clients. Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Haggai Eran <haggaie@mellanox.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Phileas Fogg authored
commit 8520e443 upstream. Disable hard IRQ before kexec a new kernel image. Not doing it can result in corrupted data in the memory segments reserved for the new kernel. Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit 54c807e7 upstream. Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. CC: Christoph Hellwig <hch@infradead.org> CC: Jens Axboe <axboe@kernel.dk> CC: Jeff Moyer <jmoyer@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Lukas Czerner authored
commit 304e220f upstream. ext4_has_free_clusters() should tell us whether there is enough free clusters to allocate, however number of free clusters in the file system is converted to blocks using EXT4_C2B() which is not only wrong use of the macro (we should have used EXT4_NUM_B2C) but it's also completely wrong concept since everything else is in cluster units. Moreover when calculating number of root clusters we should be using macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be off by one. However r_blocks_count should always be a multiple of the cluster ratio so doing a plain bit shift should be enough here. We avoid using EXT4_B2C() because it's confusing. As a result of the first problem number of free clusters is much bigger than it should have been and ext4_has_free_clusters() would return 1 even if there is really not enough free clusters available. Fix this by removing the EXT4_C2B() conversion of free clusters and using bit shift when calculating number of root clusters. This bug affects number of xfstests tests covering file system ENOSPC situation handling. With this patch most of the ENOSPC problems with bigalloc file system disappear, especially the errors caused by delayed allocation not having enough space when the actual allocation is finally requested. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Lars-Peter Clausen authored
commit 5eb02c01 upstream. Clearing the NSTBY bit in the control register also automatically clears the BLEN bit. So we need to make sure to set it again during resume, otherwise the backlight will stay off. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Junxiao Bi authored
commit 3278bb74 upstream. If lockres refresh failed, the super lock will never be released which will cause some processes on other cluster nodes hung forever. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
MITSUNARI Shigeo authored
commit 7630b661 upstream. We found that bdev->bd_invalidated was left set once revalidate_disk() is called, which results in page cache flush every time that device is open. Specifically, we found this problem in MD block device. Once we resize a MD device, mdadm --monitor periodically flush all page cache for that device every 60 or 1000 seconds when it opens the device. This bug lies since at least 3.2.0 till the latest kernel(3.6.2). Patch is attached. The following steps will reproduce the problem. 1. prepair a block device (eg /dev/sdb). 2. create two partitions: sudo parted /dev/sdb mklabel gpt mkpart primary 0% 50% mkpart primary 50% 100% 3. create a md device. sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2 4. create file system and mount it sudo mkfs.ext3 /dev/md/hoge sudo mkdir /mnt/test sudo mount /dev/md/hoge /mnt/test 5. try to resize the device sudo mdadm -G /dev/md/hoge --size=max 6. create a file to fill file cache. sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10 and verify the current status of file by free command. 7. mdadm monitor will open the md device every 1000 seconds and you will find all file cache on the device are cleared. The timing can be reduced by the following steps. a) kill mdadm and restart it with --delay option /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog or open the md device directly. sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1 Signed-off-by: MITSUNARI Shigeo <herumi@nifty.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jim Somerville authored
commit 676a0675 upstream. Running the command: inotifywait -e unmount /mnt/disk immediately aborts with a -EINVAL return code. This is however a valid parameter. This abort occurs only if unmount is the sole event parameter. If other event parameters are supplied, then the unmount event wait will work. The problem was introduced by commit 44b350fc ("inotify: Fix mask checks"). In that commit, it states: The mask checks in inotify_update_existing_watch() and inotify_new_watch() are useless because inotify_arg_to_mask() sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway. But instead of removing the useless checks, it did this: mask = inotify_arg_to_mask(arg); - if (unlikely(!mask)) + if (unlikely(!(mask & IN_ALL_EVENTS))) return -EINVAL; The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS. So the check should be: if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT)))) But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask anyway, so the check is always going to pass and thus should simply be removed. Also note that inotify_arg_to_mask completely controls what mask bits get set from arg, there's no way for invalid bits to get enabled there. Lets fix it by simply removing the useless broken checks. Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Tejun Heo authored
commit e182bb38 upstream. When idr_find() was fed a negative ID, it used to look up the ID ignoring the sign bit before recent ("idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c") patch. Now a negative ID triggers a WARN_ON_ONCE(). __lock_timer() feeds timer_id from userland directly to idr_find() without sanitizing it which can trigger the above malfunctions. Add a range check on @timer_id before invoking idr_find() in __lock_timer(). While timer_t is defined as int by all archs at the moment, Andrew worries that it may be defined as a larger type later on. Make the test cover larger integers too so that it at least is guaranteed to not return the wrong timer. Note that WARN_ON_ONCE() in idr_find() on id < 0 is transitional precaution while moving away from ignoring MSB. Once it's gone we can remove the guard as long as timer_t isn't larger than int. Signed-off-by: Tejun Heo <tj@kernel.org>nnn Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20130220232412.GL3570@htj.dyndns.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Pawel Moll authored
commit b531f81b upstream. Commit 99fc8645 "ALSA: usb-mixer: parse descriptors with structs" introduced a set of useful parsers for descriptors. Unfortunately the parses for the Processing Unit Descriptor came with a very subtle bug... Functions uac_processing_unit_iProcessing() and uac_processing_unit_specific() were indexing the baSourceID array forgetting the fields before the iProcessing and process-specific descriptors. The problem was observed with Sound Blaster Extigy mixer, where nNrModes in Up/Down-mix Processing Unit Descriptor was accessed at offset 10 of the descriptor (value 0) instead of offset 15 (value 7). In result the resulting control had interesting limit values: Simple mixer control 'Channel Routing Mode Select',0 Capabilities: volume volume-joined penum Playback channels: Mono Capture channels: Mono Limits: 0 - -1 Mono: -1 [100%] Fixed by starting from the bmControls, which was calculated correctly, instead of baSourceID. Now the mentioned control is fine: Simple mixer control 'Channel Routing Mode Select',0 Capabilities: volume volume-joined penum Playback channels: Mono Capture channels: Mono Limits: 0 - 6 Mono: 0 [0%] Signed-off-by: Pawel Moll <mail@pawelmoll.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Matt Fleming authored
commit fb834c7a upstream. commit 1de63d60 ("efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot parameter") attempted to make "noefi" true to its documentation and disable EFI runtime services to prevent the bricking bug described in commit e0094244 ("samsung-laptop: Disable on EFI hardware"). However, it's not possible to clear EFI_RUNTIME_SERVICES from an early param function because EFI_RUNTIME_SERVICES is set in efi_init() *after* parse_early_param(). This resulted in "noefi" effectively becoming a no-op and no longer providing users with a way to disable EFI, which is bad for those users that have buggy machines. Reported-by: Walt Nelson Jr <walt0924@gmail.com> Cc: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1361392572-25657-1-git-send-email-matt@console-pimps.orgSigned-off-by: H. Peter Anvin <hpa@linux.intel.com> [bwh: Backported to 3.2: efi_runtime_init() is not a separate function, so put a whole set of statements in an if (!disable_runtime) block] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Stefan Bader authored
commit 76eaca03 upstream. There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hyper- call (those not interrupted at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Wei Liu authored
commit e7e44e44 upstream. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Vetter authored
commit 210561ff upstream. We already have the quirk entry for the mobile platform, but also reports on some desktop versions. So be paranoid and set it everywhere. References: http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg33138.html Cc: David Woodhouse <dwmw2@infradead.org> Cc: "Sankaran, Rajesh" <rajesh.sankaran@intel.com> Reported-and-tested-by: Mihai Moldovan <ionic@ionic.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Patrik Jakobsson authored
commit 4f7dfb67 upstream. The Intel PRM says the M1 and M2 divisors must be in the range of 10-20 and 5-9. Since we do all calculations based on them being register values (which are subtracted by 2) we need to specify them accordingly. Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56359Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jani Nikula authored
commit fdfa175d upstream. Amending commit 4518f611 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jan 23 16:16:35 2013 +0100 drm/i915: dump UTS_RELEASE into the error_state Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mika Kuoppala authored
commit 24a1f16d upstream. If encoder is switched off by BIOS, but the panel fitter is left on, we never try to turn off the panel fitter and leave it still attached to the pipe - which can cause blurry output elsewhere. Based on work by Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58867Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Tested-by: Andreas Sturmlechner <andreas.sturmlechner@gmail.com> [danvet: Remove the redundant HAS_PCH_SPLIT check and add a tiny comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Paulo Zanoni authored
commit 196e077d upstream. If bit 0 of the features byte (0x18) is set to 0, then, according to the EDID spec, "the display is non-continuous frequency (multi-mode) and is only specified to accept the video timing formats that are listed in Base EDID and certain Extension Blocks". For more information, please see the EDID spec, check the notes of the table that explains the "Feature Support" byte (18h) and also the notes on the tables of the section that explains "Display Range Limits & Additional Timing Description Definition (tag #FDh)". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45729Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Beulich authored
commit 9d092603 upstream. "be->mode" is obtained from xenbus_read(), which does a kmalloc() for the message body. The short string is never released, so do it along with freeing "be" itself, and make sure the string isn't kept when backend_changed() doesn't complete successfully (which made it desirable to slightly re-structure that function, so that the error cleanup can be done in one place). Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David Henningsson authored
commit bbfd8a19 upstream. Currently, eld_valid is never set to false, except at kernel module load time. This patch makes sure that eld is no longer valid when the cable is (hot-)unplugged. Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Trond Myklebust authored
commit 666b3d80 upstream. Currently, nlmclnt_lock will break out of the for(;;) loop when the reclaimer wakes up the blocking lock thread by setting nlm_lck_denied_grace_period. This causes the lock request to fail with an ENOLCK error. The intention was always to ensure that we resend the lock request after the grace period has expired. Reported-by: Wangyuan Zhang <Wangyuan.Zhang@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Larry Finger authored
commit ccae0e50 upstream. Bastian Bittorf reported that some of the silent freezes on a Linksys WRT54G were due to overflow of the RX DMA ring buffer, which was created with 64 slots. That finding reminded me that I was seeing similar crashed on a netbook, which also has a relatively slow processor. After increasing the number of slots to 128, runs on the netbook that previously failed now worked; however, I found that 109 slots had been used in one test. For that reason, the number of slots is being increased to 256. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Bastian Bittorf <bittorf@bluebottle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steven Rostedt (Red Hat) authored
commit 8c189ea6 upstream. Commit: c1bf08ac "ftrace: Be first to run code modification on modules" changed ftrace module notifier's priority to INT_MAX in order to process the ftrace nops before anything else could touch them (namely kprobes). This was the correct thing to do. Unfortunately, the ftrace module notifier also contains the ftrace clean up code. As opposed to the set up code, this code should be run *after* all the module notifiers have run in case a module is doing correct clean-up and unregisters its ftrace hooks. Basically, ftrace needs to do clean up on module removal, as it needs to know about code being removed so that it doesn't try to modify that code. But after it removes the module from its records, if a ftrace user tries to remove a probe, that removal will fail due as the record of that code segment no longer exists. Nothing really bad happens if the probe removal is called after ftrace did the clean up, but the ftrace removal function will return an error. Correct code (such as kprobes) will produce a WARN_ON() if it fails to remove the probe. As people get annoyed by frivolous warnings, it's best to do the ftrace clean up after everything else. By splitting the ftrace_module_notifier into two notifiers, one that does the module load setup that is run at high priority, and the other that is called for module clean up that is run at low priority, the problem is solved. Reported-by: Frank Ch. Eigler <fche@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Nicholas Bellinger authored
commit fbbf8555 upstream. This patch adds missing bounds checking for the configfs provided mapped_lun value during target_fabric_make_mappedlun() setup ahead of se_lun_acl initialization. This addresses a potential OOPs when using a mapped_lun value that exceeds the hardcoded TRANSPORT_MAX_LUNS_PER_TPG-1 value within se_node_acl->device_list[]. Reported-by: Jan Engelhardt <jengelh@inai.de> Cc: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Nicholas Bellinger authored
commit fcf29481 upstream. This patch fixes a bug in core_tpg_check_initiator_node_acl() -> core_tpg_get_initiator_node_acl() where a dynamically created se_node_acl generated during session login would be skipped during subsequent lookup due to the '!acl->dynamic_node_acl' check, causing a new se_node_acl to be created with a duplicate ->initiatorname. This would occur when a fabric endpoint was configured with TFO->tpg_check_demo_mode()=1 + TPF->tpg_check_demo_mode_cache()=1 preventing the release of an existing se_node_acl during se_session shutdown. Also, drop the unnecessary usage of core_tpg_get_initiator_node_acl() within core_dev_init_initiator_node_lun_acl() that originally required the extra '!acl->dynamic_node_acl' check, and just pass the configfs provided se_node_acl pointer instead. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> [bwh: Backported to 3.2: adjust context, filename of header] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jussi Kivilinna authored
commit bc6b8923 upstream. rtlwifi allocates both setup_packet and data buffer of control message urb, using shared kmalloc in _usbctrl_vendorreq_async_write. Structure used for allocating is: struct { u8 data[254]; struct usb_ctrlrequest dr; }; Because 'struct usb_ctrlrequest' is __packed, setup packet is unaligned and DMA mapping of both 'data' and 'dr' confuses ARM/sunxi, leading to memory corruptions and freezes. Patch changes setup packet to be allocated separately. [v2]: - Use WARN_ON_ONCE instead of WARN_ON Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Linus Torvalds authored
commit 7c45512d upstream. Commit c060f943 ("mm: use aligned zone start for pfn_to_bitidx calculation") fixed out calculation of the index into the pageblock bitmap when a !SPARSEMEM zome was not aligned to pageblock_nr_pages. However, the _allocation_ of that bitmap had never taken this alignment requirement into accout, so depending on the exact size and alignment of the zone, the use of that index could then access past the allocation, resulting in some very subtle memory corruption. This was reported (and bisected) by Ingo Molnar: one of his random config builds would hang with certain very specific kernel command line options. In the meantime, commit c060f943 has been marked for stable, so this fix needs to be back-ported to the stable kernels that backported the commit to use the right alignment. Bisected-and-tested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Lukas Czerner authored
commit 1231b3a1 upstream. Currently when new xattr block is created or released we we would call dquot_free_block() or dquot_alloc_block() respectively, among the else decrementing or incrementing the number of blocks assigned to the inode by one block. This however does not work for bigalloc file system because we always allocate/free the whole cluster so we have to count with that in dquot_free_block() and dquot_alloc_block() as well. Use the clusters-to-blocks conversion EXT4_C2B() when passing number of blocks to the dquot_alloc/free functions to fix the problem. The problem has been revealed by xfstests #117 (and possibly others). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Li Zefan authored
commit 63f43f55 upstream. rename() will change dentry->d_name. The result of this race can be worse than seeing partially rewritten name, but we might access a stale pointer because rename() will re-allocate memory to hold a longer name. It's safe in the protection of dentry->d_lock. v2: check NULL dentry before acquiring dentry lock. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-