Hugh Dickins [Tue, 29 Nov 2005 16:55:48 +0000 (16:55 +0000)]
[PATCH] pfnmap: do_no_page BUG_ON again
Use copy_user_highpage directly instead of cow_user_page in do_no_page:
in the immediately following page_cache_release, and elsewhere, it is
assuming that new_page is normal. If any VM_PFNMAP driver can get to
do_no_page, it's just a BUG (but not in the case of do_anonymous_page).
David S. Miller [Tue, 29 Nov 2005 21:59:03 +0000 (13:59 -0800)]
[SPARC64]: Fix >8K I/O mappings.
Increment the PFN field of the PTE so that the tests
on vm_pfn in mm/memory.c match up. The TLB ignores these
lower bits for larger page sizes, so it's OK to set things
like this.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 29 Nov 2005 21:01:56 +0000 (13:01 -0800)]
Support strange discontiguous PFN remappings
These get created by some drivers that don't generally even want a pfn
remapping at all, but would really mostly prefer to just map pages
they've allocated individually instead.
For now, create a helper function that turns such an incomplete PFN
remapping call into a loop that does that explicit mapping. In the long
run we almost certainly want to export a totally different interface for
that, though.
Ben Collins [Tue, 29 Nov 2005 19:45:26 +0000 (11:45 -0800)]
[PATCH] Fix missing pfn variables caused by vm changes
I image this showed up because of "unused var..." when the changes
occured, because flush_cache_page() is a noop in most places. This
showed up for me on parisc however, where flush_cache_page() is a real
function.
Adrian Bunk [Tue, 29 Nov 2005 14:49:38 +0000 (14:49 +0000)]
[MTD] Make functions static, include header files with prototypes
This patch contains the following possible cleanups:
- every file should #include the headers containing the prototypes for
it's global functions
- make needlessly global functions static
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Richard Purdie [Tue, 29 Nov 2005 14:28:31 +0000 (14:28 +0000)]
[MTD] chips: make sharps driver usable again
Update the pre-CFI Sharp driver sharps.c so it compiles. map_read32 /
map_write32 no longer exist in the kernel so the driver is totally broken
as it stands. The replacement functions use different parameters resulting
in the other changes.
Change collie to use this driver until someone works out why the cfi driver
fails on that machine.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Tested-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David Woodhouse [Thu, 17 Nov 2005 08:20:31 +0000 (08:20 +0000)]
[MTD] Make some tables 'const' so they can live in .rodata
arjan: drivers/mtd/maps/sc520cdp.c:167: warning: par_table is never written to and should be declared 'const'
arjan: drivers/mtd/maps/pci.c:105: warning: mtd_pci_map is never written to and should be declared 'const'
arjan: mind fixing those up ?
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Todd Poynor [Tue, 15 Nov 2005 23:28:20 +0000 (23:28 +0000)]
[MTD] CFI: Use 16-bit access to autoselect/read device id data
Recent models of Intel/Sharp and Spansion CFI flash now have significant
bits in the upper byte of device ID codes, read via what Spansion calls
"autoselect" and Intel calls "read device identifier". Currently these
values are truncated to the low 8 bits in the mtd data structures, as
all CFI read query info has previously been read one byte at a time.
Add a new method for reading 16-bit info, currently just manufacturer
and device codes; datasheets hint at future uses for upper bytes in
other fields.
Signed-off-by: Todd Poynor <tpoynor@mvista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[IA64] Remove getting break_num by decoding instruction
break.b always sets cr.iim to 0 and the current code tries to
get the break_num by decoding instruction. However, their
seems to be a race condition while reading the regs->cr_iip,
as on other cpu the break.b at regs->cr_iip might have been
replaced with the original instruction as a result of
unregister_kprobe() and hence decoding instruction to
obtain break_num will result in wrong value in this case.
Also includes changes to kprobes.c which now has to handle
break number zero.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Dean Roe [Wed, 9 Nov 2005 20:25:06 +0000 (14:25 -0600)]
[IA64] - Make pfn_valid more precise for SGI Altix systems
A single SGI Altix system can be divided into multiple partitions,
each running their own instance of the Linux kernel. pfn_valid()
is currently not optimal for any but the first partition, since it
does not compare the pfn with min_low_pfn before calling the more
costly ia64_pfn_valid().
Signed-off-by: Dean Roe <roe@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Thomas Gleixner [Tue, 29 Nov 2005 15:57:17 +0000 (16:57 +0100)]
[JFFS2] Fix the slab cache constructor of 'struct jffs2_inode_info' objects.
JFFS2 initialize f->sem mutex as "locked" in the slab constructor which is a
bug. Objects are freed with unlocked f->sem mutex. So, when they allocated
again, f->sem is unlocked because the slab cache constructor is not called for
them. The constructor is called only once when memory pages are allocated for
objects (namely, when the slab layer allocates new slabs). So, sometimes
'struct jffs2_inode_info' are allocated with unlocked f->sem, sometimes with
locked. This is a bug. Instead, initialize f->sem as unlocked in the
constructor. I.e., in the "constructed" state f->sem must be unlocked.
From: Keijiro Yano <keijiro_yano@yahoo.co.jp> Acked-by: Artem B. Bityutskiy <dedekind@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Komal Shah [Tue, 29 Nov 2005 02:37:15 +0000 (18:37 -0800)]
[PATCH] ARM: OMAP: camera: build infrastructure
I have attached patch which updates omap camera build infrastructure.
Making a place for OMAP2 stuff. If we are agree on this then defconfigs
needs update, which I can submit next.
Tested with building h3 and h2 defconfigs.
Signed-off-by: Komal Shah <komal_shah802003@yahoo.com>
Brian Swetland [Tue, 29 Nov 2005 02:11:30 +0000 (18:11 -0800)]
[PATCH] ARM: OMAP: mux: simplify MUX_CFG_730 and add some usb client pins
As I continue to try to sort out USB gadget support on the 730, I find
that I need to add some pin definitions.
Since the 730 lacks PU_PD registers, and the pullup bit is always in
the config register on the 730, I simplified the MUX_CFG_730() macro
a little bit (to avoid typing a bunch of 0s and NAs on *every* line).
Andrea Arcangeli [Mon, 28 Nov 2005 21:44:15 +0000 (13:44 -0800)]
[PATCH] shrinker->nr = LONG_MAX means deadlock for icache
With Andrew Morton <akpm@osdl.org>
The slab scanning code tries to balance the scanning rate of slabs versus the
scanning rate of LRU pages. To do this, it retains state concerning how many
slabs have been scanned - if a particular slab shrinker didn't scan enough
objects, we remember that for next time, and scan more objects on the next
pass.
The problem with this is that with (say) a huge number of GFP_NOIO
direct-reclaim attempts, the number of objects which are to be scanned when we
finally get a GFP_KERNEL request can be huge. Because some shrinker handlers
just bail out if !__GFP_FS.
So the patch clamps the number of objects-to-be-scanned to 2* the total number
of objects in the slab cache.
Signed-off-by: Andrea Arcangeli <andrea@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 28 Nov 2005 21:44:13 +0000 (13:44 -0800)]
[PATCH] md: fix --re-add for raid1 and raid6
If you have an array with a write-intent-bitmap, and you remove a device, then
re-add it, a full recovery isn't needed. We detect a re-add by looking at
saved_raid_disk. For raid1, it doesn't matter which disk it was, only whether
or not it was an active device. The old code being removed set a value of
'mirror' which was then ignored, so it can go. The changed code performs the
correct check.
For raid6, if there are two missing devices, make sure we chose the right slot
on --re-add rather than always the first slot.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 28 Nov 2005 21:44:12 +0000 (13:44 -0800)]
[PATCH] md: set default_bitmap_offset properly in set_array_info
If an array is created using set_array_info, default_bitmap_offset isn't set
properly meaning that an internal bitmap cannot be hot-added until the array
is stopped and re-assembled.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 28 Nov 2005 21:44:11 +0000 (13:44 -0800)]
[PATCH] md: fix problem with raid6 intent bitmap
When doing a recovery, we need to know whether the array will still be
degraded after the recovery has finished, so we can know whether bits can be
clearred yet or not. This patch performs the required check.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 28 Nov 2005 21:44:09 +0000 (13:44 -0800)]
[PATCH] md: improve read speed to raid10 arrays using 'far copies'
raid10 has two different layouts. One uses near-copies (so multiple
copies of a block are at the same or similar offsets of different
devices) and the other uses far-copies (so multiple copies of a block
are stored a greatly different offsets on different devices). The point
of far-copies is that it allows the first section (normally first half)
to be layed out in normal raid0 style, and thus provide raid0 sequential
read performance.
Unfortunately, the read balancing in raid10 makes some poor decisions
for far-copies arrays and you don't get the desired performance. So
turn off that bad bit of read_balance for far-copies arrays.
With this patch, read speed of an 'f2' array is comparable with a raid0
with the same number of devices, though write speed is ofcourse still
very slow.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rik van Riel [Mon, 28 Nov 2005 21:44:07 +0000 (13:44 -0800)]
[PATCH] temporarily disable swap token on memory pressure
Some users (hi Zwane) have seen a problem when running a workload that
eats nearly all of physical memory - th system does an OOM kill, even
when there is still a lot of swap free.
The problem appears to be a very big task that is holding the swap
token, and the VM has a very hard time finding any other page in the
system that is swappable.
Instead of ignoring the swap token when sc->priority reaches 0, we could
simply take the swap token away from the memory hog and make sure we
don't give it back to the memory hog for a few seconds.
This patch resolves the problem Zwane ran into.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Mon, 28 Nov 2005 21:44:05 +0000 (13:44 -0800)]
[PATCH] cpuset fork locking fix
Move the cpuset_fork() call below the write_unlock_irq call in
kernel/fork.c copy_process().
Since the cpuset-dual-semaphore-locking-overhaul.patch, the cpuset_fork()
routine acquires task_lock(), so cannot be called while holding the
tasklist_lock for write.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now when pages_low is reached, we want to kick asynch reclaim, which gives us
an interval of "b" before we must start synch reclaim, and gives kswapd an
interval of "a" before it need go back to sleep.
When pages_min is reached, normal allocators must enter synch reclaim, but
PF_MEMALLOC, ALLOC_HARDER, and ALLOC_HIGH (ie. atomic allocations, recursive
allocations, etc.) get access to varying amounts of the reserve "c".
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Seth, Rohit" <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ext3: Wrong return value for EXT3_IOC_GROUP_ADD
This patch corrects the return value for the EXT3_IOC_GROUP_ADD in case it
fails due to the presence of multiple resizers at the filesystem.
The problem is a little bit more serious than a wrong return value in this
case, since the clause err=0 in the exit_journal path will lead to a call
to update_backups which in turns causes a NULL pointer dereference.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hirokazu Takata [Mon, 28 Nov 2005 21:43:58 +0000 (13:43 -0800)]
[PATCH] m32r: Fix sys_tas() syscall
This patch fixes a deadlock problem of the m32r SMP kernel.
In the m32r kernel, sys_tas() system call is provided as a test-and-set
function for userspace, for backward compatibility.
In some multi-threading application program, deadlocks were rarely caused
at sys_tas() funcion. Such a deadlock was caused due to a collision of
__pthread_lock() and __pthread_unlock() operations.
The "tas" syscall is repeatedly called by pthread_mutex_lock() to get a
lock, while a lock variable's value is not 0. On the other hand,
pthead_mutex_unlock() sets the lock variable to 0 for unlocking.
In the previous implementation of sys_tas() routine, there was a
possibility that a unlock operation was ignored in the following case:
- Assume a lock variable (*addr) was equal to 1 before sys_tas() execution.
- __pthread_unlock() operation is executed by the other processor
and the lock variable (*addr) is set to 0, between a read operation
("oldval = *addr;") and the following write operation ("*addr = 1;")
during a execution of sys_tas().
In this case, the following write operation ("*addr = 1;") overwrites the
__pthread_unlock() result, and sys_tas() fails to get a lock in the next
turn and after that.
According to the attatched patch, sys_tas() returns 0 value in the next
turn and deadlocks never happen.
Ben Collins [Mon, 28 Nov 2005 21:43:56 +0000 (13:43 -0800)]
[PATCH] Fix hardcoded cpu=0 in workqueue for per_cpu_ptr() calls
Tracked this down on an Ultra Enterprise 3000. It's a 6-way machine. Odd
thing about this machine (and it's good for finding bugs like this) is that
the CPU id's are not 0 based. For instance, on my machine the CPU's are
6/7/10/11/14/15.
This caused some NULL pointer dereference in kernel/workqueue.c because for
single_threaded workqueue's, it hardcoded the cpu to 0.
I changed the 0's to any_online_cpu(cpu_online_mask), which cpumask.h
claims is "First cpu in mask". So this fits the same usage.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Drokin [Mon, 28 Nov 2005 21:43:53 +0000 (13:43 -0800)]
[PATCH] reiserfs: fix 32-bit overflow in map_block_for_writepage()
I now see another overflow in reiserfs that should lead to data corruptions
with files that are bigger than 4G under certain circumstances when using
mmap.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove bogus usage of test/set_bit() from fbcon rotation code and just
manipulate the bits directly. This fixes an oops on powerpc among others
and should be faster. Seems to work fine on the G5 here.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Mon, 28 Nov 2005 21:43:47 +0000 (13:43 -0800)]
[PATCH] memory_sysdev_class is static
So don't define it as extern in the header file.
drivers/base/memory.c:28: error: static declaration of 'memory_sysdev_class' follows non-static declaration
include/linux/memory.h:88: error: previous declaration of 'memory_sysdev_class' was here
Ashok Raj [Mon, 28 Nov 2005 21:43:46 +0000 (13:43 -0800)]
[PATCH] clean up lock_cpu_hotplug() in cpufreq
There are some callers in cpufreq hotplug notify path that the lowest
function calls lock_cpu_hotplug(). The lock is already held during
cpu_up() and cpu_down() calls when the notify calls are broadcast to
registered clients.
Ideally if possible, we could disable_preempt() at the highest caller and
make sure we dont sleep in the path down in cpufreq->driver_target() calls
but the calls are so intertwined and cumbersome to cleanup.
Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
all places.
- Removed export of cpucontrol semaphore and made it static.
- removed explicit uses of up/down with lock_cpu_hotplug()
so we can keep track of the the callers in same thread context and
just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses
- Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug()
temporary workaround.
Tested with insmod of cpufreq_stat.ko, and logical online/offline
to make sure we dont have any hang situations.
Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Stern [Mon, 28 Nov 2005 21:43:44 +0000 (13:43 -0800)]
[PATCH] Workaround for gcc 2.96 (undefined references)
LD .tmp_vmlinux1
mm/built-in.o(.text+0x100d6): In function `copy_page_range':
: undefined reference to `__pud_alloc'
mm/built-in.o(.text+0x1010b): In function `copy_page_range':
: undefined reference to `__pmd_alloc'
mm/built-in.o(.text+0x11ef4): In function `__handle_mm_fault':
: undefined reference to `__pud_alloc'
fs/built-in.o(.text+0xc930): In function `install_arg_page':
: undefined reference to `__pud_alloc'
make: *** [.tmp_vmlinux1] Error 1
Those missing references in mm/memory.c arise from this code in
include/linux/mm.h, combined with the fact that __PGTABLE_PMD_FOLDED and
__PGTABLE_PUD_FOLDED are both set and __ARCH_HAS_4LEVEL_HACK is not:
/*
* The following ifdef needed to get the 4level-fixup.h header to work.
* Remove it when 4level-fixup.h has been removed.
*/
#if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
{
return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
NULL: pud_offset(pgd, address);
}
With my configuration the pgd_none and pud_none routines are inlines
returning a constant 0. Apparently the old compiler avoids generating
calls to __pud_alloc and __pmd_alloc but still lists them as undefined
references in the module's symbol table.
I don't know which change caused this problem. I think it was added
somewhere between 2.6.14 and 2.6.15-rc1, because I remember building
several 2.6.14-rc kernels without difficulty. However I can't point to an
individual culprit.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Mon, 28 Nov 2005 22:34:23 +0000 (14:34 -0800)]
mm: re-architect the VM_UNPAGED logic
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.
Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.
Trond Myklebust [Fri, 25 Nov 2005 22:10:11 +0000 (17:10 -0500)]
SUNRPC: Funny looking code in __rpc_purge_upcall
In __rpc_purge_upcall (net/sunrpc/rpc_pipe.c), the newer code to clean up
the in_upcall list has a typo.
Thanks to Vince Busam <vbusam@google.com> for spotting this!
Trond Myklebust [Fri, 25 Nov 2005 22:10:06 +0000 (17:10 -0500)]
NFS: Fix a spinlock recursion inside nfs_update_inode()
In cases where the server has gone insane, nfs_update_inode() may end
up calling nfs_invalidate_inode(), which again calls stuff that takes
the inode->i_lock that we're already holding.
In addition, given the sort of things we have in NFS these days that
need to be cleaned up on inode release, I'm not sure we should ever
be calling make_bad_inode().
Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing
the caches, and marking the inode as being stale.
Thanks to Steve Dickson <SteveD@redhat.com> for spotting this.