- 03 Apr, 2014 19 commits
-
-
Yan, Zheng authored
MDS handles LOOKUPHASH and LOOKUPINO MDS requests in the same way. So __cfh_to_dentry() is redundant. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Yunchuan Wen authored
The object store limit needs to be updated after writing, and this can be done provided the corresponding object has already been initialized. Current object initialization is done asynchrously, which introduce a race if a file is opened, then immediately followed by a writing, the initialization may have not completed, the code will reach the ASSERT in fscache_submit_exclusive_op() to cause kernel bug. Tested-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com> Signed-off-by: Min Chen <minchen@ubuntukylin.com> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
-
Yunchuan Wen authored
Synchronize object->store_limit[_l] with new inode->i_size after file writing. Tested-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com> Signed-off-by: Min Chen <minchen@ubuntukylin.com> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
-
Yunchuan Wen authored
Add an interface to explicitly synchronize object->store_limit[_l] with inode->i_size Tested-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com> Signed-off-by: Min Chen <minchen@ubuntukylin.com> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
-
Sage Weil authored
This is racy--we do not know whather d_parent has changed out from underneath us because i_mutex is not held on the source inode's directory. Also, taking this reference is useless. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
-
Sage Weil authored
Do not assume that r_old_dentry implies that r_old_dentry_dir is also true. Separate out the ref cleanup and make the debugs dump behave when it is NULL. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
-
Sage Weil authored
The fsync(dirfd) only covers namespace operations, not inode updates. We do not need to cover setattr variants or O_TRUNC. Reported-by: Al Viro <viro@xeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
-
Sage Weil authored
This is just old_dir; no reason to abuse the dcache pointers. Reported-by: Al Viro <viro.zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
-
Yan, Zheng authored
If readdir 'frag' is adjusted, readdir 'offset' should be reset. Otherwise some dentries may be lost when readdir and fragmenting directory happen at the some. Another way to fix this issue is let MDS adjust readdir 'frag'. The code that handles MDS reply reset the readdir 'offset' if the readdir reply is different than the requested one. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
-
Yan, Zheng authored
When changing readdir postion, fi->next_offset should be set to 0 if the new postion is not in the first dirfrag. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Yan, Zheng authored
Comparing offset with inode->i_sb->s_maxbytes doesn't make sense for directory. For a fragmented directory, offset (frag_t, off) can be larger than inode->i_sb->s_maxbytes. At the very beginning of ceph_dir_llseek(), local variable old_offset is initialized to parameter offset. This doesn't make sense neither. Old_offset should be ceph_make_fpos(fi->frag, fi->next_offset). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
In an effort to reduce fragmentation, prefix every rbd write with a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set to the object size (1 << order). Backwards compatibility is taken care of on the libceph/osd side. "The CEPH_OSD_OP_SETALLOCHINT hint is durable, in that it's enough to do it once. The reason every rbd write is prefixed is that rbd doesn't explicitly create objects and relies on writes creating them implicitly, so there is no place to stick a single hint op into. To get around that we decided to prefix every rbd write with a hint (just like write and setattr ops, hint op will create an object implicitly if it doesn't exist)." Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
In preparation for prefixing rbd writes with an allocation hint introduce a num_ops parameter for rbd_osd_req_create(). The rationale is that not every write request is a write op that needs to be prefixed (e.g. watch op), so the num_ops logic needs to be in the callers. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
Our longest osd request now contains 3 ops: copyup+hint+write. Also, CEPH_OSD_MAX_OP value in a BUG_ON in rbd_osd_req_callback() was hard-coded to 2. Fix it, and switch to rbd_assert while at it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
This is primarily for rbd's benefit and is supposed to combat fragmentation: "... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs allocation size ioctl on new files so that they are allocated in 1m or 4m chunks. We've seen cases where users with rbd workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit." SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set FAILOK flag for all SETALLOCHINT ops. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
Encode ceph_osd_op::flags field so that it gets sent over the wire. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
Doing rbd_obj_request_put() in rbd_img_request_fill() error paths is not only insufficient, but also triggers an rbd_assert() in rbd_obj_request_destroy(): Assertion failure in rbd_obj_request_destroy() at line 1867: rbd_assert(obj_request->img_request == NULL); rbd_img_obj_request_add() adds obj_requests to the img_request, the opposite is rbd_img_obj_request_del(). Use it. Fixes: http://tracker.ceph.com/issues/7327Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
Commit 03507db6 ("rbd: fix buffer size for writes to images with snapshots") moved the call to rbd_img_obj_request_add() up, making the out_partial label bogus. Remove it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
With the addition of erasure coding support in the future, scratch variable-length array in crush_do_rule_ary() is going to grow to at least 200 bytes on average, on top of another 128 bytes consumed by rawosd/osd arrays in the call chain. Replace it with a buffer inside struct osdmap and a mutex. This shouldn't result in any contention, because all osd requests were already serialized by request_mutex at that point; the only unlocked caller was ceph_ioctl_get_dataloc(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
- 31 Mar, 2014 6 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull vfs fixes from Al Viro: "Switch mnt_hash to hlist, turning the races between __lookup_mnt() and hash modifications into false negatives from __lookup_mnt() (instead of hangs)" On the false negatives from __lookup_mnt(): "The *only* thing we care about is not getting stuck in __lookup_mnt(). If it misses an entry because something in front of it just got moved around, etc, we are fine. We'll notice that mount_lock mismatch and that'll be it" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch mnt_hash to hlist don't bother with propagate_mnt() unless the target is shared keep shadowed vfsmounts together resizable namespace.c hashes
-
Randy Dunlap authored
I am the new kernel tree Documentation maintainer (except for parts that are handled by other people, of course). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Rob Landley <rob@landley.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input updates from Dmitry Torokhov: "Some more updates for the input subsystem. You will get a fix for race in mousedev that has been causing quite a few oopses lately and a small fixup for force feedback support in evdev" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: mousedev - fix race when creating mixed device Input: don't modify the id of ioctl-provided ff effect on upload failure
-
Eric Paris authored
It its possible to configure your PAM stack to refuse login if audit messages (about the login) were unable to be sent. This is common in many distros and thus normal configuration of many containers. The PAM modules determine if audit is enabled/disabled in the kernel based on the return value from sending an audit message on the netlink socket. If userspace gets back ECONNREFUSED it believes audit is disabled in the kernel. If it gets any other error else it refuses to let the login proceed. Just about ever since the introduction of namespaces the kernel audit subsystem has returned EPERM if the task sending a message was not in the init user or pid namespace. So many forms of containers have never worked if audit was enabled in the kernel. BUT if the container was not in net_init then the kernel network code would send ECONNREFUSED (instead of the audit code sending EPERM). Thus by pure accident/dumb luck/bug if an admin configured the PAM stack to reject all logins that didn't talk to audit, but then ran the login untility in the non-init_net namespace, it would work!! Clearly this was a bug, but it is a bug some people expected. With the introduction of network namespace support in 3.14-rc1 the two bugs stopped cancelling each other out. Now, containers in the non-init_net namespace refused to let users log in (just like PAM was configfured!) Obviously some people were not happy that what used to let users log in, now didn't! This fix is kinda hacky. We return ECONNREFUSED for all non-init relevant namespaces. That means that not only will the old broken non-init_net setups continue to work, now the broken non-init_pid or non-init_user setups will 'work'. They don't really work, since audit isn't logging things. But it's what most users want. In 3.15 we should have patches to support not only the non-init_net (3.14) namespace but also the non-init_pid and non-init_user namespace. So all will be right in the world. This just opens the doors wide open on 3.14 and hopefully makes users happy, if not the audit system... Reported-by: Andre Tomt <andre@tomt.net> Reported-by: Adam Richter <adam_richter2004@yahoo.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Theodore Ts'o authored
Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 Mar, 2014 4 commits
-
-
Al Viro authored
fixes RCU bug - walking through hlist is safe in face of element moves, since it's self-terminating. Cyclic lists are not - if we end up jumping to another hash chain, we'll loop infinitely without ever hitting the original list head. [fix for dumb braino folded] Spotted by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
If the dest_mnt is not shared, propagate_mnt() does nothing - there's no mounts to propagate to and thus no copies to create. Might as well don't bother calling it in that case. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
preparation to switching mnt_hash to hlist Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
* switch allocation to alloc_large_system_hash() * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=) * switch mountpoint_hashtable from list_head to hlist_head Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 29 Mar, 2014 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fix from Ingo Molnar: "A late breaking fix from John. (The bug fixed has a hard lockup potential, but that was not observed, warnings were)" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Revert to calling clock_was_set_delayed() while in irq context
-
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds authored
Pull Ceph fix from Sage Weil: "This drops a bad assert that a few users have been hitting but we've only recently been able to track down" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: drop an unsafe assertion
-
Dmitry Torokhov authored
We should not be using static variable mousedev_mix in methods that can be called before that singleton gets assigned. While at it let's add open and close methods to mousedev structure so that we do not need to test if we are dealing with multiplexor or normal device and simply call appropriate method directly. This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=71551Reported-by: GiulioDP <depasquale.giulio@gmail.com> Tested-by: GiulioDP <depasquale.giulio@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
Elias Vanderstuyft authored
If a new (id == -1) ff effect was uploaded from userspace, ff-core.c::input_ff_upload() will have assigned a positive number to the new effect id. Currently, evdev.c::evdev_do_ioctl() will save this new id to userspace, regardless of whether the upload succeeded or not. On upload failure, this can be confusing because the dev->ff->effects[] array will not contain an element at the index of that new effect id. This patch fixes this by leaving the id unchanged after upload fails. Note: Unfortunately applications should still expect changed effect id for quite some time. This has been discussed on: http://www.mail-archive.com/linux-input@vger.kernel.org/msg08513.html ("ff-core effect id handling in case of a failed effect upload") Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Elias Vanderstuyft <elias.vds@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
Alex Elder authored
Olivier Bonvalet reported having repeated crashes due to a failed assertion he was hitting in rbd_img_obj_callback(): Assertion failure in rbd_img_obj_callback() at line 2165: rbd_assert(which >= img_request->next_completion); With a lot of help from Olivier with reproducing the problem we were able to determine the object and image requests had already been completed (and often freed) at the point the assertion failed. There was a great deal of discussion on the ceph-devel mailing list about this. The problem only arose when there were two (or more) object requests in an image request, and the problem was always seen when the second request was being completed. The problem is due to a race in the window between setting the "done" flag on an object request and checking the image request's next completion value. When the first object request completes, it checks to see if its successor request is marked "done", and if so, that request is also completed. In the process, the image request's next_completion value is updated to reflect that both the first and second requests are completed. By the time the second request is able to check the next_completion value, it has been set to a value *greater* than its own "which" value, which caused an assertion to fail. Fix this problem by skipping over any completion processing unless the completing object request is the next one expected. Test only for inequality (not >=), and eliminate the bad assertion. Tested-by: Olivier Bonvalet <ob@daevel.fr> Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
-
- 28 Mar, 2014 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) We've discovered a common error in several networking drivers, they put VLAN offload features into ->vlan_features, which would suggest that they support offloading 2 or more levels of VLAN encapsulation. Not only do these devices not do that, but we don't have the infrastructure yet to handle that at all. Fixes from Vlad Yasevich. 2) Fix tcpdump crash with bridging and vlans, also from Vlad. 3) Some MAINTAINERS updates for random32 and bonding. 4) Fix late reseeds of prandom generator, from Sasha Levin. 5) Bridge doesn't handle stacked vlans properly, fix from Toshiaki Makita. 6) Fix deadlock in openvswitch, from Flavio Leitner. 7) get_timewait4_sock() doesn't report delay times correctly, fix from Eric Dumazet. 8) Duplicate address detection and addrconf verification need to run in contexts where RTNL can be obtained. Move them to run from a workqueue. From Hannes Frederic Sowa. 9) Fix route refcount leaking in ip tunnels, from Pravin B Shelar. 10) Don't return -EINTR from non-blocking recvmsg() on AF_UNIX sockets, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits) vlan: Warn the user if lowerdev has bad vlan features. veth: Turn off vlan rx acceleration in vlan_features ifb: Remove vlan acceleration from vlan_features qlge: Do not propaged vlan tag offloads to vlans bridge: Fix crash with vlan filtering and tcpdump net: Account for all vlan headers in skb_mac_gso_segment MAINTAINERS: bonding: change email address MAINTAINERS: bonding: change email address ipv6: move DAD and addrconf_verify processing to workqueue tcp: fix get_timewait4_sock() delay computation on 64bit openvswitch: fix a possible deadlock and lockdep warning bridge: Fix handling stacked vlan tags bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled vhost: validate vhost_get_vq_desc return value vhost: fix total length when packets are too short random32: avoid attempt to late reseed if in the middle of seeding random32: assign to network folks in MAINTAINERS net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors vlan: Set hard_header_len according to available acceleration ...
-
David S. Miller authored
Vlad Yasevich says: ==================== Audit all drivers for correct vlan_features. Some drivers set vlan acceleration features in vlan_features. This causes issues with Q-in-Q/802.1ad configurations. Audit all the drivers for correct vlan_features. Fix broken ones. Add a warning to vlan code to help catch future offenders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Some drivers incorrectly assign vlan acceleration features to vlan_features thus causing issues for Q-in-Q vlan configurations. Warn the user of such cases. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
For completeness, turn off vlan rx acceleration in vlan_features so that it doesn't show up on q-in-q setups. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Do not include vlan acceleration features in vlan_features as that precludes correct Q-in-Q operation. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
qlge driver turns off NETIF_F_HW_CTAG_FILTER, but forgets to turn off HW_CTAG_TX and HW_CTAG_RX on vlan devices. With the current settings, q-in-q will only generate a single vlan header. Remember to mask off CTAG_TX and CTAG_RX features in vlan_features. CC: Shahed Shaikh <shahed.shaikh@qlogic.com> CC: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> CC: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-