[PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.
The cache reaper currently tries to free all alien caches and all remote
per cpu pages in each pass of cache_reap. For a machines with large number
of nodes (such as Altix) this may lead to sporadic delays of around ~10ms.
Interrupts are disabled while reclaiming creating unacceptable delays.
This patch changes that behavior by adding a per cpu reap_node variable.
Instead of attempting to free all caches, we free only one alien cache and
the per cpu pages from one remote node. That reduces the time spend in
cache_reap. However, doing so will lengthen the time it takes to
completely drain all remote per cpu pagesets and all alien caches. The
time needed will grow with the number of nodes in the system. All caches
are drained when they overflow their respective capacity. So the drawback
here is only that a bit of memory may be wasted for awhile longer.
Details:
1. Rename drain_remote_pages to drain_node_pages to allow the specification
of the node to drain of pcp pages.
2. Add additional functions init_reap_node, next_reap_node for NUMA
that manage a per cpu reap_node counter.
3. Add a reap_alien function that reaps only from the current reap_node.
For us this seems to be a critical issue. Holdoffs of an average of ~7ms
cause some HPC benchmarks to slow down significantly. F.e. NAS parallel
slows down dramatically. NAS parallel has a 12-16 seconds runtime w/o rotor
compared to 5.8 secs with the rotor patches. It gets down to 5.05 secs with
the additional interrupt holdoff reductions.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the code tries up to spin_retry times to grab a lock using the cs
instruction. The cs instruction has exclusive access to a memory region
and therefore invalidates the appropiate cache line of all other cpus. If
there is contention on a lock this leads to cache line trashing. This can
be avoided if we first check wether a cs instruction is likely to succeed
before the instruction gets actually executed.
Signed-off-by: Christian Ehrhardt <ehrhardt@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] vmscan: no zone_reclaim if PF_MALLOC is set
If the process has already set PF_MALLOC and is already using
current->reclaim_state then do not try to reclaim memory from the zone.
This is set by kswapd and/or synchrononous global reclaim which will not
take it lightly if we zap the reclaim_state.
Signed-off-by: Christoph Lameter <clameter@sig.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Fri, 10 Mar 2006 01:33:45 +0000 (17:33 -0800)]
[PATCH] xtensa must set RWSEM_GENERIC_SPINLOCK=y
/usr/src/ctest/git/kernel/mm/rmap.c: In function `page_referenced_one':
/usr/src/ctest/git/kernel/mm/rmap.c:354: warning: implicit declaration of function `rwsem_is_locked'
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Doug Warzecha [Fri, 10 Mar 2006 01:33:35 +0000 (17:33 -0800)]
[PATCH] dcdbas: dcdbas_pdev referenced after platform_device_unregister on exit
smi_data_buf_free() references dcdbas_pdev when calling
dma_free_coherent(). In dcdbas_exit(), smi_data_buf_free() is called after
platform_device_unregister(dcdbas_pdev).
This patch moves platform_device_unregister(dcdbas_pdev) after
smi_data_buf_free() in dcdbas_exit().
Hugh Dickins [Fri, 10 Mar 2006 01:33:34 +0000 (17:33 -0800)]
[PATCH] page_add_file_rmap(): remove BUG_ON()s
Remove two early-development BUG_ONs from page_add_file_rmap.
The pfn_valid test (originally useful for checking that nobody passed an
artificial struct page) comes too late, since we already have the struct
page.
The PageAnon test (useful when anon was first distinguished from file rmap)
prevents ->nopage implementations from reusing ->mapping, which would
otherwise be available.
Ralf Baechle [Wed, 8 Mar 2006 14:13:04 +0000 (14:13 +0000)]
[MIPS] Discard .exit.text at runtime.
At times gcc will place bits of __exit functions into .rodata. If
compiled into the kernle itself we used to discard .exit.text - but
not the bits left in .rodata. While harmless this did at times result
in a large number of warnings. So until gcc fixes this, discard
.exit.text at runtime.
Ralf Baechle [Thu, 2 Mar 2006 16:50:12 +0000 (16:50 +0000)]
[MIPS] Threaten removal of code for NEC DDB5074 and DDB5476 evaluation boards.
What: Support for NEC DDB5074 and DDB5476 evaluation boards.
When: June 2006
Why: Board specific code doesn't build anymore since ~2.6.0 and no
users have complained indicating there is no more need for these
boards. This should really be considered a last call.
Who: Ralf Baechle <ralf@linux-mips.org>
Andi Kleen [Thu, 9 Mar 2006 01:57:25 +0000 (17:57 -0800)]
[PATCH] i386: port ATI timer fix from x86_64 to i386 II
ATI chipsets tend to generate double timer interrupts for the local APIC
timer when both the 8254 and the IO-APIC timer pins are enabled. This is
because they route it to both and the result is anded together and the CPU
ends up processing it twice.
This patch changes check_timer to disable the 8254 routing for interrupt 0.
I think it would be safe on all chipsets actually (i tested it on a couple
and it worked everywhere) and Windows seems to do it in a similar way, but
to be conservative this patch only enables this mode on ATI (and adds
options to enable/disable too)
Ported over from a similar x86-64 change.
I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but
tweaked it a bit to work even without ACPI.
Inspired by a patch from Chuck Ebbert, but redone.
Randy Dunlap [Thu, 9 Mar 2006 00:46:08 +0000 (16:46 -0800)]
[NET] compat ifconf: fix limits
A recent change to compat. dev_ifconf() in fs/compat_ioctl.c
causes ifconf data to be truncated 1 entry too early when copying it
to userspace. The correct amount of data (length) is returned,
but the final entry is empty (zero, not filled in).
The for-loop 'i' check should use <= to allow the final struct
ifreq32 to be copied. I also used the ifconf-corruption program
in kernel bugzilla #4746 to make sure that this change does not
re-introduce the corruption.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Matz [Wed, 8 Mar 2006 05:55:48 +0000 (21:55 -0800)]
[PATCH] fix kexec asm
While testing kexec and kdump we hit problems where the new kernel would
freeze or instantly reboot. The easiest way to trigger it was to kexec a
kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7
instead would work fine.
The patch fixes a few problems with the kexec inline asm.
Signed-off-by: Chris Mason <mason@suse.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Mackall [Wed, 8 Mar 2006 05:55:47 +0000 (21:55 -0800)]
[PATCH] dac960: add disk entropy in request completions
Signed-off-by: Matt Mackall <mpm@selenic.com> Tested-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jack Steiner [Wed, 8 Mar 2006 05:55:46 +0000 (21:55 -0800)]
[PATCH] slab: allocate larger cache_cache if order 0 fails
kmem_cache_init() incorrectly assumes that the cache_cache object will fit
in an order 0 allocation. On very large systems, this is not true. Change
the code to try larger order allocations if order 0 fails.
Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Horst Hummel [Wed, 8 Mar 2006 05:55:39 +0000 (21:55 -0800)]
[PATCH] s390: dasd partition detection
DASD allows to open a device as soon as gendisk is registered, which means the
device is a fake device (capacity=0) and we do know nothing about blocksize
and partitions at that point of time. In case the device is opened by
someone, the bdev and inode creation is done with the fake device info and the
following partition detection code is just using the wrong data.
To avoid this modify the DASD state machine to make sure that the open is
rejected until the device analysis is either finished or an unformatted device
was detected.
Gerald Schaefer [Wed, 8 Mar 2006 05:55:37 +0000 (21:55 -0800)]
[PATCH] s390: fix strnlen_user return value
strnlen_user is supposed to return then length count + 1 if no terminating \0
is found, and it should return 0 on exception. Found by David Howells
<dhowells@redhat.com>.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dipankar Sarma [Wed, 8 Mar 2006 05:55:35 +0000 (21:55 -0800)]
[PATCH] fix file counting
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench. Tested on both x86_64 and powerpc.
The way we do file struct accounting is not very suitable for batched
freeing. For scalability reasons, file accounting was
constructor/destructor based. This meant that nr_files was decremented
only when the object was removed from the slab cache. This is susceptible
to slab fragmentation. With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -
At the same time, I see only a 2000+ objects in filp cache. The following
patch I fixes this problem.
This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api. In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.
Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dipankar Sarma [Wed, 8 Mar 2006 05:55:33 +0000 (21:55 -0800)]
[PATCH] rcu batch tuning
This patch adds new tunables for RCU queue and finished batches. There are
two types of controls - number of completed RCU updates invoked in a batch
(blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark,
qlowmark).
By default, the per-cpu batch limit is set to a small value. If the input
RCU rate exceeds the high watermark, we do two things - force quiescent
state on all cpus and set the batch limit of the CPU to INTMAX. Setting
batch limit to INTMAX forces all finished RCUs to be processed in one shot.
If we have more than INTMAX RCUs queued up, then we have bigger problems
anyway. Once the incoming queued RCUs fall below the low watermark, the
batch limit is set to the default.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Wed, 8 Mar 2006 05:55:31 +0000 (21:55 -0800)]
[PATCH] percpu_counter_sum()
Implement percpu_counter_sum(). This is a more accurate but slower version of
percpu_counter_read_positive().
We need this for Alex's speedup-ext3_statfs patch and for the nr_file
accounting fix. Otherwise these things would be too inaccurate on large CPU
counts.
Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Alex Tomas <alex@clusterfs.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GOTO Masanori [Wed, 8 Mar 2006 05:55:29 +0000 (21:55 -0800)]
[PATCH] x86: Fix i386 nmi_watchdog that does not trigger die_nmi
Fix i386 nmi_watchdog that does not meet watchdog timeout condition. It
does not hit die_nmi when it should be triggered, because the current
nmi_watchdog_tick in arch/i386/kernel/nmi.c never count up alert_counter
like this:
void nmi_watchdog_tick (struct pt_regs * regs) {
if (last_irq_sums[cpu] == sum) {
alert_counter[cpu]++; <- count up alert_counter, but
if (alert_counter[cpu] == 5*nmi_hz)
die_nmi(regs, "NMI Watchdog detected LOCKUP");
alert_counter[cpu] = 0; <- reset alert_counter
This patch changes it back to the previous and working version.
This was found and originally written by Kohta NAKASHIMA.
(akpm: also uninline write_watchdog_counter(), saving 184 byets)
Phillip Susi [Wed, 8 Mar 2006 05:55:24 +0000 (21:55 -0800)]
[PATCH] udf: fix uid/gid options and add uid/gid=ignore and forget options
Fix a bug in udf where it would write uid/gid = 0 to the disk for files
owned by the id given with the uid=/gid= mount options. It also adds 4 new
mount options: uid/gid=forget and uid/gid=ignore. Without any options the
id in core and on disk always match. Giving uid/gid=nnn specifies a
default ID to be used in core when the on disk ID is -1. uid/gid=ignore
forces the in core ID to allways be used no matter what the on disk ID is.
uid/gid=forget forces the on disk ID to always be written out as -1.
The use of these options allows you to override ownerships on a disk or
disable ownwership information from being written, allowing the media to be
used portably between different computers and possibly different users
without permissions issues that would require root to correct.
Signed-off-by: Phillip Susi <psusi@cfl.rr.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pavel Machek [Wed, 8 Mar 2006 05:55:20 +0000 (21:55 -0800)]
[PATCH] serial core: work around sub-driver bugs
We're presently getting oopses because Bluetooth (and possibly other) drivers
are calling core functions after things have been shut down.
So rather than oopsing, let's drop a warning then take avoiding action, so the
machine survives. Once all the sub-drivers are fixed up we can remove the
take-avoiding-action part.
Signed-off-by: Pavel Machek <pavel@suse.cz> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/powerpc/platforms/pseries/eeh.c: In function `eeh_add_device_tree_late':
arch/powerpc/platforms/pseries/eeh.c:901: warning: implicit declaration of function `eeh_add_device_late'
arch/powerpc/platforms/pseries/eeh.c: At top level:
arch/powerpc/platforms/pseries/eeh.c:918: error: conflicting types for 'eeh_add_device_late'
arch/powerpc/platforms/pseries/eeh.c:901: error: previous implicit declaration of 'eeh_add_device_late' was here
make[2]: *** [arch/powerpc/platforms/pseries/eeh.o] Error 1
But we forgot the !CONFIG_EEH stub.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Wed, 8 Mar 2006 21:20:46 +0000 (13:20 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] mca recovery return value when no bus check
[IA64] SGI SN drivers: don't report !sn2 hardware as an error
[IA64] don't report !sn2 or !summit hardware as an error
[IA64] gensparse_defconfig: turn on PNPACPI
[IA64] Increase severity of MCA recovery messages
Linus Torvalds [Wed, 8 Mar 2006 18:33:05 +0000 (10:33 -0800)]
slab: fix calculate_slab_order() for SLAB_RECLAIM_ACCOUNT
Instead of having a hard-to-read and confusing conditional in the
caller, just make the slab order calculation handle this special case,
since it's simple and obvious there.
Paul Mackerras [Wed, 8 Mar 2006 02:24:22 +0000 (13:24 +1100)]
powerpc: Fix various syscall/signal/swapcontext bugs
A careful reading of the recent changes to the system call entry/exit
paths revealed several problems, plus some things that could be
simplified and improved:
* 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit
path, so it was only doing anything with it once it saw some other
bit being set. In other words, the noerror behaviour would apply to
the next system call where we had to reschedule or deliver a signal,
which is not necessarily the current system call.
* 32-bit wasn't doing the call to ptrace_notify in the syscall exit
path when the _TIF_SINGLESTEP bit was set.
* _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
_TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set
by system calls. I took it out of _TIF_USER_WORK_MASK.
* On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers
to be restored (unless perhaps a signal was delivered or the syscall
was traced or single-stepped). Thus the non-volatile registers
weren't restored on exit from a signal handler. We probably got
away with it mostly because signal handlers written in C wouldn't
alter the non-volatile registers.
* On 32-bit I simplified the code and made it more like 64-bit by
making the syscall exit path jump to ret_from_except to handle
preemption and signal delivery.
* 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
set - but I think because of that 32-bit was actually restoring the
non-volatile registers on exit from a signal handler.
* I changed the order of enabling interrupts and saving the
non-volatile registers before calling do_syscall_trace_leave; now we
enable interrupts first.
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] ip22zilog: Fix oops on runlevel change with serial console
[SERIAL] Fix two bugs in parport_serial
Linus Torvalds [Wed, 8 Mar 2006 02:02:16 +0000 (18:02 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3353/1: NAS100d: protect nas100d_power_exit() with machine_is_nas100d()
[ARM] 3352/1: DSB required for the completion of a TLB maintenance operation
Bjorn Helgaas [Thu, 2 Mar 2006 23:59:50 +0000 (16:59 -0700)]
[IA64] gensparse_defconfig: turn on PNPACPI
Turn on CONFIG_PNPACPI. I recently removed 8250_acpi.c. All devices
previously claimed by 8250_acpi.c should now be claimed by 8250_pnp.c.
This depends on having CONFIG_PNPACPI so ACPI devices show up as PNP
devices.
All other ia64 defconfigs either have CONFIG_PNPACPI already, or
don't have 8250 support turned on at all.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Russ Anderson [Tue, 7 Mar 2006 23:23:25 +0000 (15:23 -0800)]
[IA64] Increase severity of MCA recovery messages
The MCA recovery messages are currently KERN_DEBUG,
so they don't show up in /var/log/messages (by default).
Increase the severity to KERN_ERR, for the initial
message (and also add the physical address to this
message). Leave the successful isolation message as
KERN_DEBUG, but increase the severity when isolation
fails to KERN_CRIT.
[Russ' patch made these all KERN_CRIT]
Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>
The size of the skb carrying the netlink message is not
equivalent to the length of the actual netlink message
due to padding. ip_queue matches the length of the payload
against the original packet size to determine if packet
mangling is desired, due to the above wrong assumption
arbitary packets may not be mangled depening on their
original size.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
[ARM] 3353/1: NAS100d: protect nas100d_power_exit() with machine_is_nas100d()
Patch from Alessandro Zummo
nas100d_power_exit(void) gets some protection
to avoid freeing an irq when it is not appropriate to do so.
Signed-off-by: Rod Whitby <rod@whitby.id.au> Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[SERIAL] ip22zilog: Fix oops on runlevel change with serial console
Incorrect uart_write_wakeup() calls cause reference to a NULL tty
pointer. This has been fixed in the sunsab and sunzilog serial drivers
in October 2005. Update the ip22zilog, which is based on sunzilog,
accordingly.
Signed-off-by: Martin Michlmayr <tbm@cyrius.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk
Catalin Marinas [Tue, 7 Mar 2006 14:42:27 +0000 (14:42 +0000)]
[ARM] 3352/1: DSB required for the completion of a TLB maintenance operation
Patch from Catalin Marinas
Chapter B2.7.3 in the latest ARM ARM (with v6 information) states that
the completion of a TLB maintenance operation is only guaranteed by
the execution of a DSB (Data Syncronization Barrier, formerly Data
Write Barrier or Drain Write Buffer).
Note that a DSB is only needed in the flush_tlb_kernel_* functions
since the completion is guaranteed by a mode change (i.e. switching
back to user mode) for the flush_tlb_user_* functions.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Michael Chan [Tue, 7 Mar 2006 03:28:35 +0000 (19:28 -0800)]
[TG3]: Add DMA address workaround
Add DMA workaround for chips that do not support full 64-bit DMA
addresses.
5714, 5715, and 5780 chips only support DMA addresses less than 40
bits. On 64-bit systems with IOMMU, set the dma_mask to 40-bit so
that pci_map_xxx() calls will map the DMA address below 40 bits if
necessary. On 64-bit systems without IOMMU, set the dma_mask to
64-bit and check for DMA addresses exceeding the limit in
tg3_start_xmit().
5788 only supports 32-bit DMA so need to set the mask appropriately
also.
Thanks to Chris Elmquist at SGI for reporting and helping to debug
the problem on 5714.
Thanks to David Miller for explaining the HIGHMEM and DMA stuff.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cornelia Huck [Mon, 6 Mar 2006 23:43:02 +0000 (15:43 -0800)]
[PATCH] s390: improve response code handling in chsc_enable_facility()
Rather than checking for some known failures, check positively for the
success response code 0x0001 and return -EIO for unrecognized failure
response codes.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Greg Smith <gsmith@nc.rr.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Mon, 6 Mar 2006 23:42:58 +0000 (15:42 -0800)]
[PATCH] smaps: shared fix
The point of the smaps "shared" is to count the number of pages that are
mapped by more than one process, according to Mauricio Lin. However, smaps
uses page_count for this, so it will return a false positive for every page
that is mapped by just that one process, which is also in pagecache or
swapcache. There are false positive situations for anonymous pages not in
swapcache as well: - page reclaim, migration - get_user_pages (eg.
direct-io, ptrace)
Use page_mapcount instead, to count the number of mappings to the page.
Use vm_normal_page so that weird things like /dev/mem aren't counted either.
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Peter Staubach [Mon, 6 Mar 2006 23:42:56 +0000 (15:42 -0800)]
[PATCH] ramfs needs to update directory m/ctime on symlink
ramfs neglects to update the directory mtime and ctime fields when creating
a new symbolic link. Ramfs was modified in 2.6.15 to update these fields
when other types of entries are created. The symlink support is separate
from that other support, so that change did not cover quite all of the
possibilities.
All of the directory content manipulation entry points now seem to be
covered with respect to these time field updates.
Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the format of numa_maps to be more compact and contain additional
information that is useful for managing and troubleshooting memory on a
NUMA system. Numa_maps can now also support huge pages.
Fixes:
1. More compact format. Only display fields if they contain additional
information.
2. Always display information for all vmas. The old numa_maps did not display
vma with no mapped entries. This was a bit confusing because page
migration removes ptes for file backed vmas. After page migration
a part of the vmas vanished.
3. Rename maxref to maxmap. This is the maximum mapcount of all the pages
in a vma and may be used as an indicator as to how many processes
may be using a certain vma.
4. Include the ability to scan over huge page vmas.
New items shown:
dirty
Number of pages in a vma that have either the dirty bit set in the
page_struct or in the pte.
file=<filename>
The file backing the pages if any
stack
Stack area
heap
Heap area
huge
Huge page area. The number of pages shows is the number of huge
pages not the regular sized pages.
swapcache
Number of pages with swap references. Must be >0 in order to
be shown.
active
Number of active pages. Only displayed if different from the number
of pages mapped.
writeback
Number of pages under writeback. Only displayed if >0.
Jack Steiner [Mon, 6 Mar 2006 23:42:50 +0000 (15:42 -0800)]
[PATCH] Increase max kmalloc size for very large systems
Systems with extemely large numbers of nodes or cpus need to kmalloc
structures larger than is currently supported. This patch increases the
maximum supported size for very large systems.
This patch should have no effect on current systems.
(akpm: why not just use alloc_pages() for sysfs_cpus?)
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Mon, 6 Mar 2006 23:42:47 +0000 (15:42 -0800)]
[PATCH] add missing pm_power_off's
Add the missing pm_power_off's for the h8300, v850 and xtensa
architectures.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tony Lindgren [Mon, 6 Mar 2006 23:42:45 +0000 (15:42 -0800)]
[PATCH] fix next_timer_interrupt() for hrtimer
Also from Thomas Gleixner <tglx@linutronix.de>
Function next_timer_interrupt() got broken with a recent patch 6ba1b91213e81aa92b5cf7539f7d2a94ff54947c as sys_nanosleep() was moved to
hrtimer. This broke things as next_timer_interrupt() did not check hrtimer
tree for next event.
Function next_timer_interrupt() is needed with dyntick (CONFIG_NO_IDLE_HZ,
VST) implementations, as the system can be in idle when next hrtimer event
was supposed to happen. At least ARM and S390 currently use
next_timer_interrupt().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Atsushi Nemoto [Mon, 6 Mar 2006 23:42:42 +0000 (15:42 -0800)]
[PATCH] x86: fix potential jiffies overflow in timer_resume()
i386 timer_resume is updating jiffies, not jiffies_64. It looks there is a
potential overflow problem. And jiffies_64 and wall_jiffies should be
protected by xtime_lock.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Karsten Keil [Mon, 6 Mar 2006 23:42:39 +0000 (15:42 -0800)]
[PATCH] i4l: fix refcounting problem with ttyIx devices
If the same ttyIx device was opened by two processes the module was not
released and so the usage count went never to zero again. This oneliner fixes
the issue.
Signed-off-by: Oskar Senft <o.senft@sirrix.com> Signed-off-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Johnson [Mon, 6 Mar 2006 23:42:36 +0000 (15:42 -0800)]
[PATCH] cramfs mounts provide corrupted content since 2.6.15
Fix handling of cramfs images created by util-linux containing empty
regular files. Images created by cramfstools 1.x were ok.
Fill out inode contents in cramfs_iget5_set() instead of get_cramfs_inode()
to prevent issues if cramfs_iget5_test() is called with I_LOCK|I_NEW still
set.
Signed-off-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> Cc: Olaf Hering <olh@suse.de> Cc: Chris Mason <mason@suse.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Tue, 7 Mar 2006 01:41:44 +0000 (17:41 -0800)]
Allocate 96 bytes for SCSI sense data reply
The SCSI layer uses SCSI_SENSE_BUFFERSIZE (96) for the sense buffer
size, even though some other code uses "sizeof(struct request_sense)"
(which is 64 bytes). Allocate the buffer using the bigger of the two
for safety.
Linus Torvalds [Tue, 7 Mar 2006 01:38:49 +0000 (17:38 -0800)]
Add early-boot-safety check to cond_resched()
Just to be safe, we should not trigger a conditional reschedule during
the early boot sequence. We've historically done some questionable
early on, and the safety warnings in __might_sleep() are generally
turned off during that period, so there might be problems lurking.
This affects CONFIG_PREEMPT_VOLUNTARY, which takes over might_sleep() to
cause a voluntary conditional reschedule.
[PATCH] USB Serial: fix use-after-free bug in usb-serial core
This fixes a use-after-free bug in the usb-serial core. It is simple to
trigger this (open a usb-serial port, then yank the device out before
closing the port.) Thanks to Stefan Seyfried <seife@suse.de> for
reporting this, and to the slab debugging code which enabled it to be
tracked down.
Linus Torvalds [Mon, 6 Mar 2006 20:10:07 +0000 (12:10 -0800)]
Fix "check_slabp" printout size calculation
We want to use the "struct slab" size, not the size of the pointer to
same. As it is, we'd not print out the last <n> entry pointers in the
slab (where <n> is ~10, depending on whether it's a 32-bit or 64-bit
kernel).
Gaah, that slab code was written by somebody who likes unreadable crud.
Russell King [Sun, 5 Mar 2006 00:31:22 +0000 (00:31 +0000)]
[SERIAL] Fix two bugs in parport_serial
Steinar H. Gunderson reported:
- For some reason, it detects the 9845 as a 9735 -- it appears this is
simply related to the ordering in parport_serial_pci_tbl[]. If we move
the 9845 up above the 9735, it prints out 9710:9845, but no change in
behaviour. (We didn't find out why this was the case; we left it alone
since it didn't affect our problem.)
- The card has no parallel port (at least no physical ones), yet it reports
(via its subsystem ID of 0x0014) one parallel port and four serial ports.
The probe for the parallel port fails, and the driver just aborts. Thus,
it doesn't find the serial ports.
Fix the debugging code to use dev_dbg, but don't bother displaying the
PCI ID of the detected board (that's accessible via other means.)
Also, arrange for parport_register() to return 0 even if it finds no
ports.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>