- 25 May, 2003 40 commits
-
Andrew Morton authored
From: Stephen Smalley <sds@epoch.ncsc.mil> This patch against 2.5.69-bk adds a hook to proc_pid_make_inode to allow security modules to set the security attributes on /proc/pid inodes based on the security attributes of the associated task. This is required by SELinux in order to control access to the process state accessible via /proc/pid inodes in accordance with the task's security label. An alternative approach that was considered was to implement an xattr handler for /proc/pid inodes. That approach would still require a hook call from the xattr handler to the security module to obtain an xattr value based on the task security attributes, so it would add a further level of indirection/translation. The only benefit of implementing an xattr handler for the /proc/pid inodes would be that the /proc/pid inode security labels could then be exported to userspace. However, the /proc/pid inode security labels are only used internally by the security module for access control purposes, and userspace access to the full range of process attributes is already provided via the /proc/pid/attr interface. Consequently, a simple hook in proc_pid_make_inode seemed preferable.
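The hook placement described above can be pictured with a small userspace sketch. This is a toy model under stated assumptions, not the patch itself: every `toy_` name is hypothetical, the "security label" is reduced to an integer, and the real hook signature is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects. Every name here is hypothetical. */
struct toy_task  { int pid; unsigned int sid; };            /* sid: label of the task */
struct toy_inode { unsigned long ino; unsigned int sid; };  /* sid: label put on the inode */

/* Hook table that a security module would fill in. */
struct toy_security_ops {
    void (*task_to_inode)(const struct toy_task *task, struct toy_inode *inode);
};

/* Example module hook: copy the task's label onto its /proc/pid inode. */
static void toy_label_from_task(const struct toy_task *task, struct toy_inode *inode)
{
    inode->sid = task->sid;
}

static struct toy_security_ops toy_ops = { .task_to_inode = toy_label_from_task };

/* Stand-in for proc_pid_make_inode(): build the inode, then call the hook. */
void toy_proc_pid_make_inode(const struct toy_task *task, struct toy_inode *inode,
                             unsigned long ino)
{
    assert(task != NULL && inode != NULL);
    inode->ino = ino;
    inode->sid = 0;                 /* unlabeled until a module says otherwise */
    if (toy_ops.task_to_inode)      /* no module loaded: inode stays unlabeled */
        toy_ops.task_to_inode(task, inode);
}
```

The point of the sketch is the single call site: the label is derived from the task at inode creation time, with no xattr translation layer in between.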
-
Andrew Morton authored
From: Stephen Smalley <sds@epoch.ncsc.mil> This patch, relative to the /proc/pid/attr patch against 2.5.69, fixes the mode values of the /proc/pid/attr nodes to avoid interference by the normal Linux access checks for these nodes (and also fixes the /proc/pid/attr/prev mode to reflect its read-only nature). Otherwise, when the dumpable flag is cleared by a set[ug]id or unreadable executable, a process will lose the ability to set its own attributes via writes to /proc/pid/attr due to a DAC failure (/proc/pid inodes are assigned the root uid/gid if the task is not dumpable, and the original mode only permitted the owner to write). The security module should implement appropriate permission checking in its [gs]etprocattr hook functions. In the case of SELinux, the setprocattr hook function only allows a process to write to its own /proc/pid/attr nodes as well as imposing other policy-based restrictions, and the getprocattr hook function performs a permission check between the security labels of the current process and target process to determine whether the operation is permitted.
-
Andrew Morton authored
From: Stephen Smalley <sds@epoch.ncsc.mil> This updated patch against 2.5.69 merges the readdir and lookup routines for proc_base and proc_attr, fixes the copy_to_user call in proc_attr_read and proc_info_read, moves the new data and code within CONFIG_SECURITY, and uses ARRAY_SIZE, per the comments from Al Viro and Andrew Morton. As before, this patch implements a process attribute API for security modules via a set of nodes in a /proc/pid/attr directory. Credit for the idea of implementing this API via /proc/pid/attr nodes goes to Al Viro. Jan Harkes provided a nice cleanup of the implementation to reduce the code bloat.
-
Andrew Morton authored
All slabs which can be reclaimed via VM pressure are marked as being shrinkable, so the core slab code will keep count of their pages. Except for the one in XFS. It has strange wrapper stuff.
-
Andrew Morton authored
We have a problem at present in vm_enough_memory(): it uses smoke-n-mirrors to try to work out how much memory can be reclaimed from dcache and icache. It sometimes gets it quite wrong, especially if the slab has internal fragmentation. And it often does. So here we take a new approach. Rather than trying to work out how many pages are reclaimable by counting up the number of inodes and dentries, we change the slab allocator to keep count of how many pages are currently used by slabs which can be shrunk by the VM. The creator of the slab marks the slab as being reclaimable at kmem_cache_create()-time. Slab keeps a global counter of pages which are currently in use by thus-tagged slabs. Of course, we now slightly overestimate the amount of reclaimable memory, because not _all_ of the icache, dcache, mbcache and quota caches are reclaimable. But I think it's better to be a bit permissive rather than bogusly failing brk() calls as we do at present.
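The accounting scheme can be sketched in userspace as follows. This is a minimal model, assuming a per-cache "reclaimable" flag set at creation time and one global page counter. All `toy_` names are hypothetical and stand in for the slab internals, which this log does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy cache descriptor. The flag is fixed when the cache is created. */
struct toy_cache {
    const char *name;
    bool reclaimable;       /* set by the cache's creator */
    unsigned long pages;    /* pages currently backing this cache */
};

/* Global count of pages held by caches tagged as reclaimable. */
static unsigned long toy_reclaimable_pages;

void toy_cache_create(struct toy_cache *c, const char *name, bool reclaimable)
{
    c->name = name;
    c->reclaimable = reclaimable;
    c->pages = 0;
}

/* Called when the cache takes pages from the page allocator. */
void toy_cache_grow(struct toy_cache *c, unsigned long pages)
{
    c->pages += pages;
    if (c->reclaimable)
        toy_reclaimable_pages += pages;
}

/* Called when the cache hands pages back. */
void toy_cache_shrink(struct toy_cache *c, unsigned long pages)
{
    assert(pages <= c->pages);
    c->pages -= pages;
    if (c->reclaimable)
        toy_reclaimable_pages -= pages;
}

/* What a vm_enough_memory()-style check would read: an upper bound,
 * since not every object in a tagged cache can really be freed. */
unsigned long toy_reclaimable_estimate(void)
{
    return toy_reclaimable_pages;
}
```

Counting pages rather than objects is what sidesteps internal fragmentation: a mostly empty slab page still counts as one page in use.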
-
Andrew Morton authored
From: Neil Brown <neilb@cse.unsw.edu.au> When an NFS request arrives, it contains a filehandle which needs to be converted to a dentry. Many filesystems use find_exported_dentry in fs/exportfs/expfs.c. A key part of this on filesystems where a 32bit inode number uniquely locates a file is export_iget which calls iget(sb, inum). iget will either: 1/ find the inode in the inode cache and return it or 2/ create a new inode and call ->read_inode to load it from the storage device. export_iget then verifies the inode is really a good inode (->read_inode didn't detect any problems) and the right inode (based on the generation number from the file handle). For this to work reliably, it is important that whenever an inode is *not* in the cache, the on-device version is up-to-date. Otherwise, when read_inode loads the inode it will get bad data. For a file that has not been deleted, this condition always holds: a dirty inode is always flushed to disc before the inode is unhashed. However for a file that is being deleted this condition doesn't (didn't) hold. When iput -> iput_final -> generic_drop_inode -> generic_delete_inode is called we would unhash the inode before calling into the filesystem through ->delete_inode. So there is a small window between when generic_delete_inode unhashes the inode, and when ->delete_inode writes something to disc, where a call to ->read_inode (for export_iget) might discover what it thinks is a valid inode, but is really one that is in the process of being destroyed. It is this window that I want to close by moving the unhashing to the end of generic_delete_inode.
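The ordering problem can be made concrete with a deterministic toy model. This is only a sketch: it collapses the inode cache and the on-disc state into two booleans, and it "probes" the window in the middle of each delete path instead of running a real concurrent lookup. All `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Two facts about one inode: is it in the inode hash, and does the
 * on-disc copy still look like a live inode? */
struct toy_inode_state {
    bool hashed;
    bool on_disc_valid;
};

/* Models an export_iget() arriving right now. It is fooled when the
 * inode is not in the cache (so ->read_inode runs) while the on-disc
 * copy still looks valid. */
bool toy_lookup_sees_doomed_inode(const struct toy_inode_state *s)
{
    return !s->hashed && s->on_disc_valid;
}

/* Old order: unhash first, then let the filesystem delete.
 * Returns whether a lookup in the window would be fooled. */
bool toy_delete_unhash_first(struct toy_inode_state *s)
{
    bool fooled;

    s->hashed = false;
    fooled = toy_lookup_sees_doomed_inode(s);   /* the window */
    s->on_disc_valid = false;                   /* ->delete_inode writes to disc */
    return fooled;
}

/* New order: the filesystem deletes first, unhashing happens last. */
bool toy_delete_unhash_last(struct toy_inode_state *s)
{
    bool fooled;

    s->on_disc_valid = false;                   /* ->delete_inode writes to disc */
    fooled = toy_lookup_sees_doomed_inode(s);   /* same probe point */
    s->hashed = false;
    return fooled;
}
```

In the second ordering a lookup in the window finds the inode still hashed. How the real cache treats an inode that is hashed but being freed is outside this model.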
-
Andrew Morton authored
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> There are a couple of places in the readdir code where it forgets to set the returned error code to -EFAULT, leaving it at the default -EINVAL. Fix that up, and rename getdents_callback64.count to "result", which makes more sense.
-
Andrew Morton authored
From zwane We shutdown the MAC part of the card and have interrupts disabled, interrupt gets queued, we reenable interrupts after shutting down device, service the interrupt, check status and get 0xff from powered down device. No idea what he's talking about here, but apparently the irq return handling isn't working out. Just return IRQ_HANDLED all the time.
-
Andrew Morton authored
From: Oleg Drokin <green@namesys.com> This is a forward port of 2.4's inode attributes support for reiserfs. Original implementation for 2.4 was performed by Nikita Danilov. In order to enable this support, one must use the "attrs" mount option, eg: mount /dev/hda1 /mount/point -t reiserfs -o attrs Also either the filesystem must have been created with a recent mkreiserfs or must have been modified by a recent version of reiserfsck with its "--clean-attributes" option. If that is not done, attributes support will not be enabled and a kernel message will be printed. This is necessary because old kernels left random garbage in the place where these attributes now live. These attributes are totally compatible with ext2's ones. You can manipulate them with chattr/lsattr etc. Additionally the chattr 'd' option may be used to disable tail packing on a specific file or a directory tree. (The 'd' option normally means "don't dump". reiserfs has overloaded it).
-
Andrew Morton authored
From: Zwane Mwaikambo <zwane@linuxpower.ca> kapmd does a conditional check in order to decide whether to set the task's cpu affinity mask. This can change during runtime, therefore we unconditionally set it. There is an early exit in set_cpus_allowed if the current processor is in the allowed mask anyway.
-
Andrew Morton authored
__unhash_process acquires the dcache_lock while holding the tasklist_lock for writing. This can deadlock. Additionally, fs/proc/base.c incorrectly assumed that p->pid would be set to 0 during release_task. The patch fixes that by adding a new spinlock to the task structure and fixing all references to (!p->pid). The alternative to the new spinlock would be to hold dcache_lock around __unhash_process. - fs/proc/base.c assumed that p->pid is reset to 0 during exit. This is not the case anymore. I now look at the count of the pid structure for PIDTYPE_PID. - de_thread now tested - as broken as it was before: open handles to /proc/<pid> are either stale or invalid after an exec of a nptl process, if the exec was called from a secondary thread. - a few lock_kernels removed - that part of /proc doesn't need it. - additional instances of 'if(current->pid)' replaced with pid_alive.
-
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com> mpc_apicid is a u8, and MAX_APICS can be 256.
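The arithmetic behind this one-liner is easy to check in userspace. The sketch below only assumes that the ID lives in an 8-bit unsigned field and that the limit is 256. The `toy_` names are hypothetical and the actual kernel change is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_APICS 256u

/* Storing an ID into an 8-bit field keeps only the low 8 bits. */
uint8_t toy_store_apicid(unsigned int id)
{
    return (uint8_t)id;
}

/* A range check is only meaningful on the wide value, before it is
 * narrowed: a uint8_t can never be >= 256. */
int toy_apicid_in_range(unsigned int id)
{
    return id < TOY_MAX_APICS;
}
```

Once the value is held in a u8, a comparison against 256 can only ever have one outcome.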
-
Andrew Morton authored
fs/compat.c: In function `compat_sys_ioctl': fs/compat.c:324: warning: implicit declaration of function `siocdevprivate_ioctl'
-
Andrew Morton authored
Don't assume the size of dev_t: on ppc64 it is unsigned long and this generates a printk warning.
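A common way to print a type whose width varies by architecture is to cast it to a fixed wide type that matches the format specifier. The userspace sketch below shows that idiom with `snprintf`. Whether the patch used this exact cast is an assumption, and `toy_format_dev` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a dev_t without assuming its width: widen it explicitly so the
 * argument always matches the %llu conversion. */
int toy_format_dev(char *buf, size_t len, dev_t dev)
{
    assert(buf != NULL);
    return snprintf(buf, len, "dev=%llu", (unsigned long long)dev);
}
```

The same pattern carries over to printk-style formatting, since the mismatch is between the argument's type and the conversion, not anything specific to the output function.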
-
Andrew Morton authored
arch/ppc64/kernel/htab.c:105: warning: implicit declaration of function `pSeries_lpar_hpte_insert' arch/ppc64/kernel/htab.c:109: warning: implicit declaration of function `pSeries_hpte_insert'
-
Andrew Morton authored
Fix a printk warning
-
Andrew Morton authored
two printk warnings
-
Andrew Morton authored
warning: assignment makes pointer from integer without a cast
-
Andrew Morton authored
It needs sched.h for `current'.
-
Andrew Morton authored
From: David Gibson <david@gibson.dropbear.id.au> This removes a bunch of unused variables in prom_init(), squashing the associated warnings.
-
Andrew Morton authored
From: David Gibson <david@gibson.dropbear.id.au> xics.c uses ppc64_boot_msg() without a prototype; this fixes it by including <asm/machdep.h>.
-
Andrew Morton authored
do_signal32() is used before it is defined; this prototype squashes the warning.
-
Andrew Morton authored
From: David Gibson <david@gibson.dropbear.id.au> Squash implicit declaration warning in ppc64 align.c
-
Andrew Morton authored
From: David Gibson <david@gibson.dropbear.id.au> addnote in arch/ppc64/boot (a userspace tool, not kernel code) uses exit() without including stdlib.h.
-
Andrew Morton authored
PPC64 irq return fix
-
Andrew Morton authored
Fix some warnings in the ppc64 build. Also declare a couple of AIO functions in aio.h rather than aio.c. They are needed for 32-bit emulation support.
-
Andrew Morton authored
From: Anton Blanchard <anton@samba.org> PPC64 32/64-bit emulation for AIO.
-
Linus Torvalds authored
Very early initialization (core_initcall) needs to have the cdev initialization done. So make it part of the pre-initcall sequence, the same way the bdev caches were done.
-
Linus Torvalds authored
-
Ingo Molnar authored
This addresses a futex-related SMP scalability problem of glibc. A number of regressions have been reported to the NPTL mailing list when going to many CPUs, for applications that use condition variables and the pthread_cond_broadcast() API call. Using this functionality, testcode shows a slowdown from 0.12 seconds runtime to over 237 seconds (!) runtime, on 4-CPU systems. pthread condition variables use two futex-backed mutex-alike locks: an internal one for the glibc CV state itself, and a user-supplied mutex which the API guarantees to take in certain codepaths. (Unfortunately the user-supplied mutex cannot be used to protect the CV state, so we've got to deal with two locks.) The cause of the slowdown is a 'swarm effect': if lots of threads are blocked on a condition variable, and pthread_cond_broadcast() is done, then glibc first does a FUTEX_WAKE on the cv-internal mutex, then does a mutex_down() on the user-supplied mutex. Ie. a swarm of threads is created which all race to serialize on the user-supplied mutex. The more threads are used, the more likely it becomes that the scheduler will balance them over to other CPUs - where they just schedule, try to lock the mutex, and go to sleep. This 'swarm effect' is purely technical, a side-effect of glibc's use of futexes, and the imperfect coupling of the two locks. The solution to this problem is to not wake up the swarm of threads, but 'requeue' them from the CV-internal mutex to the user-supplied mutex. The attached patch adds the FUTEX_REQUEUE feature. FUTEX_REQUEUE requeues N threads from futex address A to futex address B. This way glibc can wake up a single thread (which will take the user-mutex), and can requeue the rest, with a single system-call. Ulrich Drepper has implemented FUTEX_REQUEUE support in glibc, and a number of people have tested it over the past couple of weeks.
Here are the measurements done by Saurabh Desai:

System: 4xPIII 700MHz

./cond-perf -r 100 -n 200:          1p        2p        4p
Default NPTL:                       0.120s    0.211s    237.407s
requeue NPTL:                       0.124s    0.156s    0.040s

./cond-perf -r 1000 -n 100:
Default NPTL:                       0.276s    0.412s    0.530s
requeue NPTL:                       0.349s    0.503s    0.550s

./pp -v -n 128 -i 1000 -S 32768:
Default NPTL: 128 games in          1.111s    1.270s    16.894s
requeue NPTL: 128 games in          1.111s    1.959s    2.426s

./pp -v -n 1024 -i 10 -S 32768:
Default NPTL: 1024 games in         0.181s    0.394s    incompleted 2m+
requeue NPTL: 1024 games in         0.166s    0.254s    0.341s

The speedup with an increasing number of threads is quite significant: in the 128-thread case it's more than 8 times. In the cond-perf test, on 4 CPUs it's almost infinitely faster than the 'swarm of threads' catastrophe triggered by the old code.
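The wake-one, requeue-the-rest idea can be illustrated with a toy model of two wait queues. This is a userspace sketch, not the kernel implementation: queues are reduced to waiter counts, and the `toy_` names and the split into `nr_wake` and `nr_requeue` parameters are assumptions made for illustration.

```c
#include <assert.h>

/* A futex wait queue reduced to the number of threads sleeping on it. */
struct toy_futex_queue {
    unsigned int waiters;
};

static unsigned int toy_min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Wake up to nr_wake waiters on 'from', then move up to nr_requeue of
 * the remaining waiters onto 'to' without waking them.
 * Returns the number of waiters woken. */
unsigned int toy_futex_requeue(struct toy_futex_queue *from,
                               struct toy_futex_queue *to,
                               unsigned int nr_wake,
                               unsigned int nr_requeue)
{
    unsigned int woken, moved;

    assert(from != to);

    woken = toy_min(nr_wake, from->waiters);
    from->waiters -= woken;

    moved = toy_min(nr_requeue, from->waiters);
    from->waiters -= moved;
    to->waiters += moved;

    return woken;
}
```

With 200 threads blocked on the condition variable, a broadcast modelled as `toy_futex_requeue(&cv, &mutex, 1, 1000)` wakes one thread and parks the other 199 on the mutex queue, so nothing races for the mutex until the holder releases it.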
-
Alexander Viro authored
new fields in struct inode - i_cdev and i_cindex. When we do open() on a character device we cache result of cdev lookup in inode and put the inode on a cyclic list anchored in cdev. If we already have that done, we don't bother with any lookups. When inode disappears it's removed from the list. When cdev gets unregistered we remove all cached references to it (and remove such inodes from the list). cdev is held until final fput() now.
-
Alexander Viro authored
New object: struct cdev. It contains a kobject, a pointer to file_operations and a pointer to the owner module. These guys have a search structure of the same sort as gendisks and chrdev_open() picks file_operations from them. Intended use: embed such an animal in a driver-owned structure (e.g. tty_driver) and register it as associated with a given range of device numbers. Generic code will do the lookup for such an object and use it for the rest. The behaviour of register_chrdev() is _not_ changed - it allocates struct cdev and registers it; any old driver will work as if nothing had changed. At this stage we only use it during chrdev_open() to find file_operations. Later it will be cached in inode->i_cdev (and index in range - in inode->i_cindex) so that ->open() could get whatever objects it wants directly without any special-cased lookups, etc.
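The register-a-range, look-up-by-number pattern can be sketched with a small table. This is a toy model, not the kernel's search structure: it uses a fixed array and a linear scan, device numbers are plain integers, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RANGES 16

/* Stand-in for the object a driver embeds in its own structure. */
struct toy_cdev {
    const char *name;
};

/* One registered range of device numbers: [first, first + count). */
struct toy_range {
    unsigned int first;
    unsigned int count;
    struct toy_cdev *cdev;
};

static struct toy_range toy_ranges[TOY_MAX_RANGES];
static int toy_nr_ranges;

/* Register cdev for a range. Returns 0 on success, -1 if the range is
 * empty, the table is full, or the range overlaps an existing one. */
int toy_cdev_add(struct toy_cdev *cdev, unsigned int first, unsigned int count)
{
    int i;

    if (count == 0 || toy_nr_ranges == TOY_MAX_RANGES)
        return -1;
    for (i = 0; i < toy_nr_ranges; i++) {
        const struct toy_range *r = &toy_ranges[i];

        if (first >= r->first ? first - r->first < r->count
                              : r->first - first < count)
            return -1;
    }
    toy_ranges[toy_nr_ranges].first = first;
    toy_ranges[toy_nr_ranges].count = count;
    toy_ranges[toy_nr_ranges].cdev = cdev;
    toy_nr_ranges++;
    return 0;
}

/* Find the cdev covering 'dev'. On success, *index (if non-NULL) gets
 * the offset of 'dev' within its range, the analogue of i_cindex. */
struct toy_cdev *toy_cdev_lookup(unsigned int dev, unsigned int *index)
{
    int i;

    for (i = 0; i < toy_nr_ranges; i++) {
        const struct toy_range *r = &toy_ranges[i];

        if (dev >= r->first && dev - r->first < r->count) {
            if (index)
                *index = dev - r->first;
            return r->cdev;
        }
    }
    return NULL;
}
```

A driver-side caller would register once, for example `toy_cdev_add(&tty, 1024, 64)`, and an open path would call `toy_cdev_lookup(dev, &idx)` to recover both the object and the position inside the range.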
-
Alexander Viro authored
code responsible for gendisk lookups taken out into drivers/base and generalized - it now allows a range-based mapping from numbers to kobjects for a given struct subsystem.
-
Alexander Viro authored
register_chrdev_region() sanitized, code in tty_io.c that dealt with it cleaned up.
-
Alexander Viro authored
preparation to cdev-cidr - the lookup mechanism for gendisks is switched to dealing with disk->kobj instead of disk.
-
Alexander Viro authored
pt.c fed through Lindent
-
Alexander Viro authored
Remove cpp abuses - same as had been done for pd/pf/pcd.
-
Alexander Viro authored
Remove the rest of cpp abuse in pg.c
-
Alexander Viro authored
pg.c fed through Lindent
-
Alexander Viro authored
This removes cpp abuses - same as had been done for pd/pf/pcd.
-