Matt Reimer [Tue, 18 Nov 2008 18:54:32 +0000 (10:54 -0800)]
[MTD] [NAND] pxa3xx: convert from ns to clock ticks more accurately
The various fields in NDTR{01} are in units of clock ticks minus one, but the
ns2cycle macro mistakenly adds one, inflating the number of clock ticks and
making it impossible to set any of these fields to zero.
Signed-off-by: Matt Reimer <mreimer@vpop.net> Signed-off-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Matt Reimer [Tue, 18 Nov 2008 18:47:42 +0000 (10:47 -0800)]
[MTD] [NAND] pxa3xx: fix non-page-aligned reads
Reads from non-page-aligned addresses were broken because while the
address to read from was correctly written to NDCB*, a full page was
always read. Fix this by ignoring the column and only using the page
address.
I suspect this whole-page behavior is due to the controller's need to
read the entire page in order to generate correct ECC. In the non-ECC
case this could be optimized to use the column address, and to set the
read length to what is being requested rather than the length of an
entire page.
Signed-off-by: Matt Reimer <mreimer@vpop.net> Signed-off-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Randy Dunlap [Wed, 17 Dec 2008 20:46:17 +0000 (12:46 -0800)]
[MTD] [NAND] fix nandsim sched.h references
Fix sched.h references:
build-r7149.out:/local/linsrc/linux-next-20081215/drivers/mtd/nand/nandsim.c:1326: error: dereferencing pointer to incomplete type
build-r7149.out:/local/linsrc/linux-next-20081215/drivers/mtd/nand/nandsim.c:1326: error: 'PF_MEMALLOC' undeclared (first use in this function)
build-r7149.out:/local/linsrc/linux-next-20081215/drivers/mtd/nand/nandsim.c:1328: error: dereferencing pointer to incomplete type
build-r7149.out:/local/linsrc/linux-next-20081215/drivers/mtd/nand/nandsim.c:1335: error: dereferencing pointer to incomplete type
build-r7149.out:/local/linsrc/linux-next-20081215/drivers/mtd/nand/nandsim.c:1335: error: 'PF_MEMALLOC' undeclared (first use in this function)
build-r7149.out:make[4]: *** [drivers/mtd/nand/nandsim.o] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Kay Sievers [Fri, 7 Nov 2008 00:37:21 +0000 (01:37 +0100)]
avr: struct device - replace bus_id with dev_name(), dev_set_name()
(I did not compile or test it, please let me know, or help fixing
it, if something is wrong with the conversion)
This patch is part of a larger patch series which will remove
the "char bus_id[20]" name string from struct device. The device
name is managed in the kobject anyway, and without any size
limitation, and just needlessly copied into "struct device".
To set and read the device name dev_name(dev) and dev_set_name(dev)
must be used. If your code uses static kobjects, which it shouldn't
do, "const char *init_name" can be used to statically provide the
name the registered device should have. At registration time, the
init_name field is cleared, to enforce the use of dev_name(dev) to
access the device name at a later time.
We need to get rid of all occurrences of bus_id in the entire tree
to be able to enable the new interface. Please apply this patch,
and possibly convert any remaining remaining occurrences of bus_id.
We want to submit a patch to -next, which will remove bus_id from
"struct device", to find the remaining pieces to convert, and finally
switch over to the new api, which will remove the 20 bytes array
and does no longer have a size limitation.
Thanks,
Kay
From: Kay Sievers <kay.sievers@vrfy.org>
Subject: avr: struct device - replace bus_id with dev_name(), dev_set_name()
Ken Chen [Mon, 10 Nov 2008 08:26:08 +0000 (11:26 +0300)]
proc: add /proc/*/stack
/proc/*/stack adds the ability to query a task's stack trace. It is more
useful than /proc/*/wchan as it provides full stack trace instead of single
depth. Example output:
Alexey Dobriyan [Mon, 27 Oct 2008 19:48:36 +0000 (22:48 +0300)]
proc: stop using BKL
There are four BKL users in proc: de_put(), proc_lookup_de(),
proc_readdir_de(), proc_root_readdir(),
1) de_put()
-----------
de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
needed. BKL doesn't matter to possible refcount leak as well.
2) proc_lookup_de()
-------------------
Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
potentially blocking, all callers of proc_lookup_de() eventually end up
from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
doesn't protect anything.
3) proc_readdir_de()
--------------------
"." and ".." part doesn't need BKL, walking PDE list is under
proc_subdir_lock, calling filldir callback is potentially blocking
because it writes to luserspace. All proc_readdir_de() callers
eventually come from ->readdir hook which is under directory's
->i_mutex -- BKL doesn't protect anything.
4) proc_root_readdir_de()
-------------------------
proc_root_readdir_de is ->readdir hook, see (3).
Since readdir hooks doesn't use BKL anymore, switch to
generic_file_llseek, since it also takes directory's i_mutex.
Impact: fix kernel warnings [and potential crash] during suspend+resume
Kudos to both Dhaval Giani and Jens Axboe for finding a bug in treercu
that causes warnings after suspend-resume cycles in Dhaval's case and
during stress tests in Jens's case. It would also probably cause failures
if heavily stressed. The solution, ironically enough, is to revert to
rcupreempt's code for initializing the dynticks state. And the patch
even results in smaller code -- so what was I thinking???
This is 2.6.29 material, given that people really do suspend and resume
Linux these days. ;-)
rcu: fix rcutree grace-period-latency bug on small systems
Impact: fix delays during bootup
Kudos to Andi Kleen for finding a grace-period-latency problem! The
problem was that the special-case code for small machines never updated
the ->signaled field to indicate that grace-period initialization had
completed, which prevented force_quiescent_state() from ever expediting
grace periods. This problem resulted in grace periods extending for more
than 20 seconds. Not subtle. I introduced this bug during my inspection
process when I fixed a race between grace-period initialization and
force_quiescent_state() execution.
The following patch properly updates the ->signaled field for the
"small"-system case (no more than 32 CPUs for 32-bit kernels and no more
than 64 CPUs for 64-bit kernels).
Reported-by: Andi Kleen <andi@firstfloor.org> Tested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
David S. Miller [Mon, 5 Jan 2009 08:59:00 +0000 (00:59 -0800)]
tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
In splice TCP receive, the SPLICE_F_NONBLOCK flag is used
to compute the "timeo" value. So checking it again inside
of the main receive loop to trigger -EAGAIN processing is
entirely unnecessary.
Noticed by Jarek P. and Lennert Buytenhek.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Paris [Fri, 2 Jan 2009 22:40:06 +0000 (17:40 -0500)]
SELinux: shrink sizeof av_inhert selinux_class_perm and context
I started playing with pahole today and decided to put it against the
selinux structures. Found we could save a little bit of space on x86_64
(and no harm on i686) just reorganizing some structs.
Admittedly there aren't many of av_inherit or selinux_class_perm's in
the kernel (33 and 1 respectively) But the change to the size of struct
context reverberate out a bit. I can get some hard number if they are
needed, but I don't see why they would be. We do change which cacheline
context->len and context->str would be on, but I don't see that as a
problem since we are clearly going to have to load both if the context
is to be of any value. I've run with the patch and don't seem to be
having any problems.
An example of what's going on using struct av_inherit would be:
Julian Calaby [Mon, 5 Jan 2009 08:07:18 +0000 (00:07 -0800)]
sparc: Clean arch-specific code in prom_common.c
prom_nextprop() and prom_firstprop() have slightly different calling
conventions in 32 and 64 bit SPARC.
prom_common.c uses a ifdef guard to ensure that these functions are
called correctly.
Adjust code to eliminate this ifdef by using a calling convention that
is compatible with both 32 and 64 bit SPARC.
Signed-off-by: Julian Calaby <julian.calaby@gmail.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: don't mask EOF and socket errors on nonblocking splice receive
Currently, setting SPLICE_F_NONBLOCK on splice from a TCP socket
results in masking of EOF (RDHUP) and error conditions on the socket
by an -EAGAIN return. Move the NONBLOCK check in tcp_splice_read()
to be after the EOF and error checks to fix this.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes some unused code, and make the calculation
of the number of blocks required conditional in order to reduce
the number of times this (potentially expensive) calculation
is done.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Send useful information with uevent messages
In order to distinguish between two differing uevent messages
and to avoid using the (racy) method of reading status from
sysfs in future, this adds some status information to our
uevent messages.
Btw, before anybody says "sysfs isn't racy", I'm aware of that,
but the way that GFS2 was using it (send an ambiugous uevent and
then expect the receiver to read sysfs to find out the status
of the reported operation) was.
The additional benefit of using the new interface is that it
should be possible for a node to recover multiple journals
at the same time, since there is no longer any confusion as
to which journal the status belongs to.
At some future stage, when all the userland programs have been
converted, I intend to remove the old interface.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There was a use-after-free with the GFS2 super block during
umount. This patch moves almost all of the umount code from
->put_super into ->kill_sb, the only bit that cannot be moved
being the glock hash clearing which has to remain as ->put_super
due to umount ordering requirements. As a result its now obvious
that the kfree is the final operation, whereas before it was
hidden in ->put_super.
Also gfs2_jindex_free is then only referenced from a single file
so thats moved and marked static too.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The functions which are being moved can all be marked
static in their new locations, since they only have
a single caller each. Their new locations are more
logical than before and some of the functions are
small enough that the compiler might well inline them.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
gfs2_lock_fs_check_clean() should not be calling gfs2_jindex_hold()
since it doesn't work like rindex hold, despite the comment. That
allows gfs2_jindex_hold() to be moved into ops_fstype.c where it
can be made static.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes the two daemons, gfs2_scand and gfs2_glockd
and replaces them with a shrinker which is called from the VM.
The net result is that GFS2 responds better when there is memory
pressure, since it shrinks the glock cache at the same rate
as the VFS shrinks the dcache and icache. There are no longer
any time based criteria for shrinking glocks, they are kept
until such time as the VM asks for more memory and then we
demote just as many glocks as required.
There are potential future changes to this code, including the
possibility of sorting the glocks which are to be written back
into inode number order, to get a better I/O ordering. It would
be very useful to have an elevator based workqueue implementation
for this, as that would automatically deal with the read I/O cases
at the same time.
This patch is my answer to Andrew Morton's remark, made during
the initial review of GFS2, asking why GFS2 needs so many kernel
threads, the answer being that it doesn't :-) This patch is a
net loss of about 200 lines of code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
By moving gfs2_recoverd, we can make an additional function static
and it also leaves only (the already scheduled for removal) gfs2_glockd
in daemon.c.
At the same time the declaration of gfs2_quotad is moved to quota.h
to reflect the new location of gfs2_quotad in a previous patch. Also
the recovery.h and quota.h headers are cleaned up.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Following on from the recent clean up of gfs2_quotad, this patch moves
the processing of "truncate in progress" inodes from the glock workqueue
into gfs2_quotad. This fixes a hang due to the "truncate in progress"
processing requiring glocks in order to complete.
It might seem odd to use gfs2_quotad for this particular item, but
we have to use a pre-existing thread since creating a thread implies
a GFP_KERNEL memory allocation which is not allowed from the glock
workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
both scheduled for removal at some (hopefully not too distant) future
point. That leaves only gfs2_quotad whose workload is generally fairly
light and is easily adapted for this extra task.
Also, as a result of this change, it opens the way for a future patch to
make the reading of the inode's information asynchronous with respect to
the glock workqueue, which is another improvement that has been on the list
for some time now.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch is a clean up of gfs2_quotad prior to giving it an
extra job to do in addition to the current portfolio of updating
the quota and statfs information from time to time.
As a result it has been moved into quota.c allowing one of the
functions it calls to be made static. Also the clean up allows
the two existing functions to have separate timeouts and also
to coexist with its future role of dealing with the "truncate in
progress" inode flag.
The (pointless) setting of gfs2_quotad_secs is removed since we
arrange to only wake up quotad when one of the two timers expires.
In addition the struct gfs2_quota_data is moved into a slab cache,
mainly for easier debugging. It should also be possible to use
a shrinker in the future, rather than the current scheme of scanning
the quota data entries from time to time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Although the glock dumps print quite a lot of information about
the glocks themselves, there are more things which can be
usefully added to the dump realting to the objects themselves.
This patch adds a few more fields to the inode and resource
group lines, which should be useful for debugging.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch moves the final field so that we can get rid
of struct gfs2_rgrpd_host, as promised some time ago. Also
by rearranging the fields slightly, we are able to reduce
the size of the gfs2_rgrpd structure at the same time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The final field in gfs2_dinode_host was the i_flags field. Thats
renamed to i_diskflags in order to avoid confusion with the existing
inode flags, and moved into the inode proper at a suitable location
to avoid creating a "hole".
At that point struct gfs2_dinode_host is no longer needed and as
promised (quite some time ago!) it can now be removed completely.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This moves the directory entry count into the proper inode.
Potentially we could get this to share the space used by
something else in the future, but this is one more step
on the way to removing the gfs2_dinode_host structure.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Move generation number into "proper" part of inode
This moves the generation number from the gfs2_dinode_host
into the gfs2_inode structure. Eventually the plan is to get
rid of the gfs2_dinode_host structure completely.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There is a bug in writepage and delete_inode which allows jdata files to
invalidate pages from the address space without being in a transaction at
the time. This causes problems in case the pages are in the journal. This
patch fixes that case and prevents the resulting oops.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Move the contents of some headers which contained very
little into more sensible places, and remove the original
header files. This should make it easier to find things.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch implements the FIEMAP ioctl for GFS2. We can use the generic
code (aside from a lock order issue, solved as per Ted Tso's suggestion)
for which I've introduced a new variant of the generic function. We also
have one exception to deal with, namely stuffed files, so we do that
"by hand", setting all the required flags.
This has been tested with a modified (I could only find an old version) of
Eric's test program, and appears to work correctly.
This patch does not currently support FIEMAP of xattrs, but the plan is to add
that feature at some future point.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com>
Heiko Carstens [Mon, 5 Jan 2009 01:36:32 +0000 (17:36 -0800)]
qeth: get rid of extra argument after printk to dev_* conversion
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_setadapter_parms':
drivers/s390/net/qeth_l3_main.c:1049: warning: too many arguments for format
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The device driver qeth dos not support large send using EDDP for
HiperSockets.
Signed-off-by: Klaus-Dieter Wacker <kdwacker@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Blaschka [Mon, 5 Jan 2009 01:35:44 +0000 (17:35 -0800)]
qeth: do not spin for SETIP ip assist command
The ip assist hw command for setting an IP address last unacceptable
long so we can not spin while we waiting for the irq. Since we can
ensure process context for all occurrences of this command we can use
wait.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 5 Jan 2009 01:35:18 +0000 (17:35 -0800)]
qeth: avoid crash in case of layer mismatch for VSWITCH
For z/VM GuestLAN or VSWITCH devices the transport layer is
configured in z/VM. The layer2 attribute of a participating Linux
device has to match the z/VM definition. In case of a mismatch
Linux currently crashes in qeth recovery due to a reference to the
not yet existing net_device.
Solution: add a check for existence of net_device and add a message
pointing to the mismatch of layer definitions in Linux and z/VM.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 5 Jan 2009 01:34:52 +0000 (17:34 -0800)]
qeth: exploit source MAC address for inbound layer3 packets
OSA-devices operating in layer3 mode offer adding of the source MAC
address to the QDIO header of inbound packets. The qeth driver can
exploit this functionality to replace FAKELL-entries in the ethernet
header of received packets.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The pre z9 machines provide an mcl string in EBCDIC format,
z9 or later provide string in ASCII format.
Signed-off-by: Klaus-Dieter Wacker <kdwacker@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b47300168e770b60ab96c8924854c3b0eb4260eb "Do not fire linkwatch
events until the device is registered." was made as a workaround for
drivers that call netif_carrier_off before registering the device.
Unfortunately this causes these drivers to incorrectly report their
link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
flag when the interface is first brought up. This issues was
previously pointed out[1] but was dismissed saying that IFF_RUNNING is
not related to the link status. From my digging IFF_RUNNING, as
reported to userspace, is based on the link state. It is set based on
__LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
not reported to user space so it may well be independent of the link,
I don't know if and when it may get set.)
The end result depends slightly depending on the driver. The the two I
tested were e1000e and b44. With e1000e if the system is booted
without a network cable attached the interface will falsely report
RUNNING when it is brought up causing NetworkManager to attempt to
start it and eventually time out. With b44 when the system is booted
with a network cable attached and brought up with dhcpcd it will time
out the first time.
The attached patch that will still set the operstate variable
correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called
but then return rather than skipping the linkwatch_fire_event call
entirely as the previous fix did. (sorry it isn't inline, I don't have
a patch friendly email client at the moment)
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Mon, 5 Jan 2009 01:12:04 +0000 (17:12 -0800)]
e100: cosmetic cleanup
Add missing space after if, switch, for and while keywords.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 4dec9b807be757780ca3611a959ac22c28d292a7 ("rfkill: strip pointless
notifier chain") removed the only user of rfkill_led_trigger() that was not
guarded by #ifdef CONFIG_RFKILL_LEDS. Therefore, move rfkill_led_trigger()
completely inside #ifdef CONFIG_RFKILL_LEDS and avoid the compile time
warning:
net/rfkill/rfkill.c:59: warning: 'rfkill_led_trigger' defined but not used
Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Mercer [Mon, 5 Jan 2009 01:07:50 +0000 (17:07 -0800)]
qlge: bugfix: Fix shadow register endian issue.
Shadow registers are consistent memory locations to which the chip
echos ring indexes in little endian format. These values need to
be endian swapped before referencing.
Note:
The register pointer declaration uses the volatile modifier which
causes warnings in checkpatch.
Per Documentation/volatile-considered-harmful.txt:
- Pointers to data structures in coherent memory which might be modified
by I/O devices can, sometimes, legitimately be volatile. A ring buffer
used by a network adapter, where that adapter changes pointers to
indicate which descriptors have been processed, is an example of this
type of situation.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 5 Jan 2009 00:32:11 +0000 (16:32 -0800)]
Merge branch 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
audit: validate comparison operations, store them in sane form
clean up audit_rule_{add,del} a bit
make sure that filterkey of task,always rules is reported
audit rules ordering, part 2
fixing audit rule ordering mess, part 1
audit_update_lsm_rules() misses the audit_inode_hash[] ones
sanitize audit_log_capset()
sanitize audit_fd_pair()
sanitize audit_mq_open()
sanitize AUDIT_MQ_SENDRECV
sanitize audit_mq_notify()
sanitize audit_mq_getsetattr()
sanitize audit_ipc_set_perm()
sanitize audit_ipc_obj()
sanitize audit_socketcall
don't reallocate buffer in every audit_sockaddr()
Baruch Siach [Mon, 5 Jan 2009 00:23:01 +0000 (16:23 -0800)]
enc28j60: fix RX buffer overflow
The enc28j60 driver doesn't check whether the length of the packet as reported
by the hardware fits into the preallocated buffer. When stressed, the hardware
may report insanely large packets even tough the "Receive OK" bit is set. Fix
this.
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
CRED: Differentiate objective and effective subjective credentials on a task
The problem is that the above patch allows a process to have two sets of
credentials, and for the most part uses the subjective credentials when
accessing current's creds.
There is, however, one exception: cap_capable(), and thus capable(), uses the
real/objective credentials of the target task, whether or not it is the current
task.
Ordinarily this doesn't matter, since usually the two cred pointers in current
point to the same set of creds. However, sys_faccessat() makes use of this
facility to override the credentials of the calling process to make its test,
without affecting the creds as seen from other processes.
One of the things sys_faccessat() does is to make an adjustment to the
effective capabilities mask, which cap_capable(), as it stands, then ignores.
The affected capability check is in generic_permission():
if (!(mask & MAY_EXEC) || execute_ok(inode))
if (capable(CAP_DAC_OVERRIDE))
return 0;
This change splits capable() from has_capability() down into the commoncap and
SELinux code. The capable() security op now only deals with the current
process, and uses the current process's subjective creds. A new security op -
task_capable() - is introduced that can check any task's objective creds.
strictly the capable() security op is superfluous with the presence of the
task_capable() op, however it should be faster to call the capable() op since
two fewer arguments need be passed down through the various layers.
This can be tested by compiling the following program from the XFS testsuite:
/*
* t_access_root.c - trivial test program to show permission bug.
*
* Written by Michael Kerrisk - copyright ownership not pursued.
* Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html
*/
#include <limits.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>