- 22 Mar, 2012 30 commits
-
-
Alex Elder authored
Since rbd_get_client() is only called in one place, move the acquisition of the mutex around that call inside that function. Furthermore, within rbd_get_client(), it appears the mutex only needs to be held while calling rbd_client_create(). (Moving the lock inside that function will wait for the next patch.) Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
In rbd_get_client(), if a client is reused, a number of things get done while still holding the list lock unnecessarily. This just moves a few things that need no lock protection outside the lock. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
It used to be that selecting a new unique identifier for an added rbd device required searching all existing ones to find the highest id is used. A recent change made that unnecessary, but made it so that id's used were monotonically non-decreasing. It's a bit more pleasant to have smaller rbd id's though, and this change makes ids get allocated as they were before--each new id is one more than the maximum currently in use. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
The only time entries are added to or removed from the global rbd_dev_list is exactly when a "put" or "get" operation is being performed on a rbd_dev's id. So just move the list management code into get/put routines. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
The rbd_dev_list is just a simple list of all the current rbd_devices. Using the ctl_mutex as a concurrency guard is overkill. Instead, use a spinlock for that specific purpose. This also reduces the window that the ctl_mutex needs to be held in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
In order to select a new unique identifier for an added rbd device, the list of all existing ones is searched and a value one greater than the highest id is used. The list search can be avoided by using an atomic variable that keeps track of the current highest id. Using a get/put model for id's we can limit the boundless growth of id numbers a bit by arranging to reuse the current highest id once it gets released. Add these calls to "put" the id when an rbd is getting removed. Note that this changes the pattern of device id's used--new values will never be below the highest one seen so far (even if there exists an unused lower one). I assert this is OK because the key property of an rbd id is its uniqueness, not its magnitude. Regardless, a follow-on patch will restore the old way of doing things, I just think this commit just makes the incremental change to atomics a little easier to understand. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Move the loop that finds a new unique rbd id to use into its own helper function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Josh Durgin authored
There's already a constant for this anyway. Since rbd_header_set_snap() is only used to set the rbd device snap_name field, just do that within that function rather than having it take the snap_name as an argument. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net> v2: Changed interface rbd_header_set_snap() so it explicitly updates the snap_name in the rbd_device. Also added a BUILD_BUG_ON() to verify the size of the snap_name field is sufficient for SNAP_HEAD_NAME.
-
Alex Elder authored
The rbd_device structure maintains a duplicate copy of the ceph_client pointer maintained in its rbd_client structure. There appears to be no good reason for this, and its presence presents a risk of them getting out of synch or otherwise misused. So kill it off, and use the rbd_client copy only. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Some minor cleanups in "drivers/block/rbd.c: - Use the more meaningful "RBD_MAX_OBJ_NAME_LEN" in place if "96" in the definition of RBD_MAX_MD_NAME_LEN. - Use DEFINE_SPINLOCK() to define and initialize node_lock. - Drop a needless (char *) cast in parse_rbd_opts_token(). - Make a few minor formatting changes. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
This patch just rearranges a few bits of code to make more portions of ceph_setxattr() and ceph_removexattr() identical. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
All names defined in the directory and file virtual extended attribute tables are constant, and the size of each is known at compile time. So there's no need to compute their length every time any file's attribute is listed. Record the length of each string and use it when needed to determine the space need to represent them. In addition, compute the aggregate size of strings in each table just once at initialization time. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
The names of the callback functions used for virtual extended attributes are based only on the last component of the attribute name. Because of the way these are defined, this precludes allowing a single (lowest) attribute name for different callbacks, dependent on the type of file being operated on. (For example, it might be nice to support both "ceph.dir.layout" and "ceph.file.layout".) Just change the callback names to avoid this problem. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
A struct ceph_vxattr_cb does not represent a callback at all, but rather a virtual extended attribute itself. Drop the "_cb" suffix from its name to reflect that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Entries in the ceph virtual extended attribute tables all follow a distinct pattern in their definition. Enforce this pattern through the use of a macro. Also, a null name field signals the end of the table, so make that be the first field in the ceph_vxattr_cb structure. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Use symbolic constants to define the top-level prefix for "ceph." extended attribute names. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
All callers of ceph_match_vxattr() determine what to pass as the first argument by calling ceph_inode_vxattrs(inode). Just do that inside ceph_match_vxattr() itself, changing it to take an inode rather than the vxattr pointer as its first argument. Also ensure the function works correctly for an empty table (i.e., containing only a terminating null entry). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
For some reason, ceph_setxattr() allocates an extra byte in which a '\0' is stored past the end of an extended attribute value. This is not needed, and is potentially misleading, so get rid of it. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
This fixes some spots where a type cast to (void *) was used as as a universal type hiding mechanism. Instead, properly cast the type to the intended target type. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
This eliminates type casts in some places where they are not required. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
A spinlock is used to protect a value used for selecting an array index for a string used for formatting a socket address for human consumption. The index is reset to 0 if it ever reaches the maximum index value. Instead, use an ever-increasing atomic variable as a sequence number, and compute the array index by masking off all but the sequence number's lowest bits. Make the number of entries in the array a power of two to allow the use of such a mask (to avoid jumps in the index value when the sequence number wraps). The length of these strings is somewhat arbitrarily set at 60 bytes. The worst-case length of a string produced is 54 bytes, for an IPv6 address that can't be shortened, e.g.: [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767 Change it so we arbitrarily use 64 bytes instead; if nothing else it will make the array of these line up better in hex dumps. Rename a few things to reinforce the distinction between the number of strings in the array and the length of individual strings. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Rearrange ceph_tcp_connect() a bit, making use of "else" rather than re-testing a value with consecutive "if" statements. Don't record a connection's socket pointer unless the connect operation is successful. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Each messenger allocates a page to be used when writing zeroes out in the event of error or other abnormal condition. Instead, use the kernel ZERO_PAGE() for that purpose. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Xi Wang authored
The overflow check for a + n * b should be (n > (ULONG_MAX - a) / b), rather than (n > ULONG_MAX / b - a). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Xi Wang authored
The existing overflow check (n > ULONG_MAX / b) didn't work, because n = ULONG_MAX / b would both bypass the check and still overflow the allocation size a + n * b. The correct check should be (n > (ULONG_MAX - a) / b). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Xi Wang authored
Return -EINVAL rather than panic if iinfo->symlink_len and inode->i_size do not match. Also use kstrndup rather than kmalloc/memcpy. Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>
-
Amon Ott authored
The root directory of the Ceph mount has inode number 1, so falling back to 1 always creates a collision. 2 is unused on my test systems and seems less likely to collide. Signed-off-by: Amon Ott <ao@m-privacy.de> Signed-off-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Avoid the need to check for a special zero s_cap_ttl value by just using (jiffies - 1) as the value assigned to indicate "sometime in the past." Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
Jim Schutt authored
The Ceph messenger would sometimes queue multiple work items to write data to a socket when the socket buffer was full. Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the same way that net/core/stream.c:sk_stream_write_space() does, i.e., clearing it only when sufficient space is available in the socket buffer. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@dreamhost.com>
-
- 18 Mar, 2012 3 commits
-
-
Linus Torvalds authored
-
Jason Baron authored
Commit 28d82dc1 ("epoll: limit paths") that I did to limit the number of possible wakeup paths in epoll is causing a few applications to longer work (dovecot for one). The original patch is really about limiting the amount of epoll nesting (since epoll fds can be attached to other fds). Thus, we probably can allow an unlimited number of paths of depth 1. My current patch limits it at 1000. And enforce the limits on paths that have a greater depth. This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking changes from David Miller: "1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to crashes, particularly during shutdown. Reported by Dave Jones and fixed by Eric Dumazet. 2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have already freed the SKB, which causes crashes as to the caller this means requeue the packet. Fixes from Eric Dumazet. 3) usbnet driver doesn't allocate the right amount of headroom on fresh RX SKBs, fix from Eric Dumazet. 4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it abolutely should not take a reference to 'dev', this leads to leaks. Fix from RonQing Li. 5) Fix netfilter ctnetlink race between delete and timeout expiration. From Pablo Neira Ayuso. 6) Revert SFQ change which causes regressions, specifically queueing to tail can lead to unavoidable flow starvation. From Eric Dumazet. 7) Fix a memory leak and a crash on corrupt firmware files in bnx2x, from Michal Schmidt." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: netfilter: ctnetlink: fix race between delete and timeout expiration ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu. wimax/i2400m: fix erroneous NETDEV_TX_BUSY use net/hyperv: fix erroneous NETDEV_TX_BUSY use net/usbnet: reserve headroom on rx skbs bnx2x: fix memory leak in bnx2x_init_firmware() bnx2x: fix a crash on corrupt firmware file sch_sfq: revert dont put new flow at the end of flows ipv6: fix icmp6_dst_alloc()
-
- 17 Mar, 2012 7 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools, x86: Build perf on older user-space as well perf tools: Use scnprintf where applicable perf tools: Incorrect use of snprintf results in SEGV
-
Pablo Neira Ayuso authored
Kerin Millar reported hardlockups while running `conntrackd -c' in a busy firewall. That system (with several processors) was acting as backup in a primary-backup setup. After several tries, I found a race condition between the deletion operation of ctnetlink and timeout expiration. This patch fixes this problem. Tested-by: Kerin Millar <kerframil@gmail.com> Reported-by: Kerin Millar <kerframil@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
RongQing.Li authored
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't need to dev_hold(). With dev_hold(), not corresponding dev_put(), will lead to leak. [ bug introduced in 96b52e61 (ipv6: mcast: RCU conversions) ] Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Torvalds authored
Merge some more email patches from Andrew Morton: "A couple of nilfs fixes" * emailed from Andrew Morton <akpm@linux-foundation.org>: nilfs2: fix NULL pointer dereference in nilfs_load_super_block() nilfs2: clamp ns_r_segments_percentage to [1, 99]
-
Ryusuke Konishi authored
According to the report from Slicky Devil, nilfs caused kernel oops at nilfs_load_super_block function during mount after he shrank the partition without resizing the filesystem: BUG: unable to handle kernel NULL pointer dereference at 00000048 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP ... Call Trace: [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2] [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2] [<c0226636>] mount_fs+0x36/0x180 [<c023d961>] vfs_kern_mount+0x51/0xa0 [<c023ddae>] do_kern_mount+0x3e/0xe0 [<c023f189>] do_mount+0x169/0x700 [<c023fa9b>] sys_mount+0x6b/0xa0 [<c04abd1f>] sysenter_do_call+0x12/0x28 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc CR2: 0000000000000048 This turned out due to a defect in an error path which runs if the calculated location of the secondary super block was invalid. This patch fixes it and eliminates the reported oops. Reported-by: Slicky Devil <slicky.dvl@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Slicky Devil <slicky.dvl@gmail.com> Cc: <stable@vger.kernel.org> [2.6.30+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Haogang Chen authored
ns_r_segments_percentage is read from the disk. Bogus or malicious value could cause integer overflow and malfunction due to meaningless disk usage calculation. This patch reports error when mounting such bogus volumes. Signed-off-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds authored
Pull maintainer update from James Morris: "Please pull this patch which adds Serge as maintainer of the capabilities code, as discussed on lwn and the lsm list. New capabilities must be signed off by the maintainer, and new uses of any capabilities should at be cc'd to the maintainer." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: MAINTAINERS: Add Serge as maintainer of capabilities
-