Currently we scale the mempool sizes depending on memory installed
in the machine, except for the bio pool itself which sits at a fixed
256 entry pre-allocation.
There's really no point in "optimizing" this OOM path; we just need
enough preallocated to make progress. A single unit is enough, so let's
scale it down to 2 just to be on the safe side.
This patch saves ~150KB of pinned kernel memory on a 32-bit box.
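The idea of a small fixed reserve can be sketched in userspace C as follows. This is only an illustration, not the kernel's mempool code: `struct reserve_pool`, `RESERVE_MIN`, the function names, and the `simulate_oom` hook are all hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mempool: a tiny stack of preallocated
 * elements that is only touched when the normal allocator fails. */
#define RESERVE_MIN 2	/* small reserve, enough to make progress */

struct reserve_pool {
	void *elems[RESERVE_MIN];
	size_t nr_free;		/* elements currently held in reserve */
	size_t elem_size;
};

/* Fill the reserve up front. Returns 0 on success, -1 on failure. */
static int reserve_pool_init(struct reserve_pool *p, size_t elem_size)
{
	p->nr_free = 0;
	p->elem_size = elem_size;
	while (p->nr_free < RESERVE_MIN) {
		void *e = malloc(elem_size);

		if (!e) {
			while (p->nr_free)
				free(p->elems[--p->nr_free]);
			return -1;
		}
		p->elems[p->nr_free++] = e;
	}
	return 0;
}

/* Normal path: ask the allocator. OOM path: hand out a reserved element.
 * simulate_oom is a hook for this sketch that forces the OOM path. */
static void *reserve_pool_alloc(struct reserve_pool *p, int simulate_oom)
{
	void *e = simulate_oom ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->nr_free)
		return p->elems[--p->nr_free];
	return NULL;	/* this sketch gives up; it does not wait for a free */
}

/* Freed elements refill the reserve first, so a later OOM allocation
 * can succeed again. */
static void reserve_pool_free(struct reserve_pool *p, void *e)
{
	assert(e != NULL);
	if (p->nr_free < RESERVE_MIN)
		p->elems[p->nr_free++] = e;
	else
		free(e);
}
```

With a reserve of 2, two allocations can be in flight under memory pressure; a third fails in this sketch until one of the first two is freed.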
The cfq hash is no longer necessary: we can always get the cfqq from the
io context. cfq_get_io_context_noalloc() is introduced because we don't
want to allocate a cic when merging or when checking may_queue. Sync
queues used to be identified through the hash key (CFQ_KEY_ASYNC). Since
the hash is eliminated we need another criterion, so a sync flag is
added to the queue. Everywhere we dig in the rb_tree we're in the
current context, so no additional locking is required.
Advantages of this patch: no additional memory for the hash, no seeking
in the hash, and cleaner code. The cic now has to be looked up in the
per-ioc rbtree, but that is faster:
- most processes work with only a few devices
- most systems have only a few block devices
- it is an rb-tree
Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Changes by me:
- Merge into CFQ devel branch
- Get rid of cfq_get_io_context_noalloc()
- Fix various bugs with dereferencing cic->cfqq[] with offset other
than 0 or 1.
- Fix bug in cfqq setup, is_sync condition was reversed.
- Fix bug where only bio_sync() is used, we need to check for a READ too
It's only used for preemption now that the IDLE and RT queues also
use the rbtree. If we pass an 'add_front' variable to
cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion
at the front of the tree.
Currently CFQ does a linked insert into the current list for RT
queues. We can just factor the class into the rb insertion,
and then we don't have to treat RT queues in a special way. It's
faster, too.
For cases where the rbtree is mainly used for sorting and min retrieval,
a nice speedup of the rbtree code is to maintain a cache of the leftmost
node in the tree.
Also spotted in the CFS CPU scheduler code.
Improved by Alan D. Brunelle <Alan.Brunelle@hp.com> by updating the
leftmost hint in cfq_rb_first() if it isn't set, instead of only
updating it on insert.
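A minimal sketch of the leftmost-node cache, including the lazy refresh in the "first" lookup described above. To stay short it uses a plain unbalanced binary search tree rather than the kernel rbtree, and every name in it (`struct tree`, `tree_insert()`, `tree_first()`, `tree_pop_first()`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Plain unbalanced BST standing in for an rbtree; names are made up. */
struct node {
	unsigned long key;
	struct node *left, *right;
};

struct tree {
	struct node *root;
	struct node *leftmost;	/* cached minimum, NULL when not known */
};

/* Insert n. If the descent only ever went left, n is the new minimum,
 * so the cache can be updated without another walk. */
static void tree_insert(struct tree *t, struct node *n)
{
	struct node **link = &t->root;
	int is_leftmost = 1;

	n->left = n->right = NULL;
	while (*link) {
		if (n->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			is_leftmost = 0;
		}
	}
	*link = n;
	if (is_leftmost)
		t->leftmost = n;
}

/* Return the minimum. If the hint is not set, walk down the left spine
 * once and remember the result instead of waiting for the next insert. */
static struct node *tree_first(struct tree *t)
{
	if (!t->leftmost && t->root) {
		struct node *n = t->root;

		while (n->left)
			n = n->left;
		t->leftmost = n;
	}
	return t->leftmost;
}

/* Unlink and return the minimum, invalidating the hint. */
static struct node *tree_pop_first(struct tree *t)
{
	struct node *min = tree_first(t);
	struct node **link = &t->root;

	if (!min)
		return NULL;
	while (*link != min) {
		assert(*link != NULL);
		link = &(*link)->left;
	}
	*link = min->right;	/* the minimum has no left child */
	t->leftmost = NULL;	/* tree_first() recomputes it lazily */
	return min;
}
```

Invalidating the hint on removal and refilling it on the next lookup keeps the removal path simple; a real implementation could instead advance the hint to the successor at removal time.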
cfq-iosched: rework the whole round-robin list concept
Drawing on some inspiration from the CFS CPU scheduler design, overhaul
the list management of pending cfq_queues. Currently CFQ uses a
doubly linked list per priority level for sorting and service.
Kill those lists and maintain an rbtree of cfq_queues, sorted by when
to service them.
This unfortunately means that the ionice levels aren't as strong
anymore; I will work on improving those later. We only scale the slice
time now, not the number of times we service. This means that latency
is better (for all priority levels), but that the distinction between
the highest and lower levels isn't as big.
- Move the queue_new flag clear to when the queue is selected
- Only select the non-first queue in cfq_get_best_queue(), if there's
a substantial difference between the best and first.
- Get rid of ->busy_rr
- Only select a close cooperator, if the current queue is known to take
a while to "think".
- Implement logic for detecting cooperating processes, so we
choose the best available queue whenever possible.
- Improve residual slice time accounting.
- Remove dead code: we no longer see async requests coming in on
sync queues. That part was removed a long time ago. That means
that we can also remove the difference between cfq_cfqq_sync()
and cfq_cfqq_class_sync(), they are now identical. And we can
kill the on_dispatch array, just make it a counter.
- Allow a process to go into the current list, if it hasn't been
serviced in this scheduler tick yet.
Possible future improvements include caching the cfqq lookup
in cfq_close_cooperator(), so we don't have to look it up twice.
cfq_get_best_queue() should just use that last decision instead
of doing it again.
Jens Axboe [Wed, 14 Feb 2007 18:59:49 +0000 (19:59 +0100)]
cfq-iosched: improve preemption for cooperating tasks
When testing the syslet async io approach, I discovered that CFQ
sometimes didn't perform as well as expected. cfq_should_preempt()
needs to better check for cooperating tasks, so fix that by allowing
preemption of an equal priority queue if the recently queued request
is as good a candidate for IO as the one we are currently waiting for.
Paul Mackerras [Mon, 30 Apr 2007 03:03:39 +0000 (13:03 +1000)]
[POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c
Commit 404d5b185b4eb56d6fa2f7bd27833f8df1c38ce4 changed the definition
of dev_dbg in the !DEBUG case from being a #define to being a static
inline. There was code in drivers/ps3/vuart.c to do exactly that,
which fails to compile now. This fixes it by removing the redefinition,
as the redefinition is now superfluous.
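For illustration, a userspace sketch of the two shapes of `dev_dbg` discussed here: a macro when DEBUG is defined, and a static inline no-op otherwise. `struct device` below is a hypothetical one-field stand-in and the bodies are written for this sketch, not copied from the kernel headers. A file that supplies its own static inline `dev_dbg` after a header has already defined one ends up with two definitions of the same function, which the compiler rejects.

```c
#include <stdio.h>

/* Hypothetical stand-in; the real struct device is far larger. */
struct device {
	const char *name;
};

#ifdef DEBUG
/* Debug build: a macro that actually prints (GNU ##__VA_ARGS__). */
#define dev_dbg(dev, fmt, ...) \
	fprintf(stderr, "%s: " fmt, (dev)->name, ##__VA_ARGS__)
#else
/* Non-debug build: a no-op function. Unlike an empty macro, the format
 * attribute lets the compiler still type-check the arguments. */
static inline int __attribute__((format(printf, 2, 3)))
dev_dbg(struct device *dev, const char *fmt, ...)
{
	(void)dev;
	(void)fmt;
	return 0;
}
#endif
```

The attribute syntax is a GCC extension (also accepted by Clang).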
Andrew Morton [Thu, 26 Apr 2007 07:07:05 +0000 (00:07 -0700)]
[POWERPC] ppc4xx_sgdma needs dma-mapping.h
For dma_alloc_*()
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
arch/powerpc/sysdev/timer.c:51: error: variable `timer_sysclass' has
initializer but incomplete type
arch/powerpc/sysdev/timer.c:52: error: unknown field `resume' specified in initializer
<etc>
Signed-off-by: Srinivasa Ds <srinivasa@in.ibm.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Adrian Bunk [Sat, 28 Apr 2007 19:19:56 +0000 (05:19 +1000)]
[POWERPC] Remove the unused HTDMSOUND driver
Recently, someone fixed a syntax error in the HTDMSOUND driver
introduced 4 years ago.
Unfortunately not by trying to compile this driver for his hardware but
by code inspection - which seems to be a strong indication that there
are no users left for this OSS sound driver.
This patch therefore removes it.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Dan Malek <dan@embeddedalley.com>
Acked-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Mark A. Greer [Fri, 27 Apr 2007 20:48:24 +0000 (06:48 +1000)]
[POWERPC] Add dt_xlate_addr() to bootwrapper
dt_xlate_reg() looks up the 'reg' property in the specified node
to get the address and size to translate. Add dt_xlate_addr()
which is passed in the address and size to translate.
Signed-off-by: Mark A. Greer <mgreer@mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Grant Likely [Fri, 27 Apr 2007 19:50:05 +0000 (05:50 +1000)]
[POWERPC] Don't define a custom bd_t for Xilinx Virtex based boards.
Why create a platform specific board_info structure that is hacked
together, ugly, and dangerous, when we've got a perfectly fine common
board_info structure that is hacked-together, ugly and dangerous.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Grant Likely [Fri, 27 Apr 2007 19:50:02 +0000 (05:50 +1000)]
[POWERPC] Stop using ppc_sys for Xilinx Virtex boards
The arch/ppc/syslib/ppc_sys.c infrastructure does not work well for the
virtex ports. Move the ml300 and ml403 board ports over to use the new
virtex_devices infrastructure.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Grant Likely [Fri, 27 Apr 2007 19:50:01 +0000 (05:50 +1000)]
[POWERPC] New registration for common Xilinx Virtex ppc405 platform devices
Currently virtex support in mainline makes use of the infrastructure in
arch/ppc/syslib/ppc_sys.c for registering common devices on virtex ppc405
platforms. The ppc_sys.c code is not well suited to the dynamic nature of
FPGA designs and makes adding new board ports more complex. This patch
adds a new listing of common devices which does not depend on the ppc_sys.c
infrastructure.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Grant Likely [Fri, 27 Apr 2007 19:49:59 +0000 (05:49 +1000)]
[POWERPC] Rework Kconfig dependencies for Xilinx Virtex ppc405 platform
Reverse dependency order for Xilinx Virtex parts. For these parts, it
makes more sense for boards/chips to specify which features they
provide instead of the features listing the parts they are implemented
in. I think it also makes adding new board ports simpler.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Paul Mackerras <paulus@samba.org>
> Paul, please discard this patch. The optional graphics card may have
> also device_type 'serial' if it is in VGA mode.
> I will send an updated patch later.
Simon Arlott [Tue, 24 Apr 2007 22:44:57 +0000 (23:44 +0100)]
ieee1394: ohci1394: Fix mistake in printk message.
Fix the "attempting to setting" message in ohci1394.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 3 Apr 2007 21:55:40 +0000 (23:55 +0200)]
ieee1394: eth1394: allow MTU bigger than 1500
RFC 2734 says: "IP-capable nodes may operate with an MTU size larger
than the default [1500 octets], but the means by which a larger MTU is
configured are beyond the scope of this document."
Allow users to set an MTU bigger than 1500.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Jean Delvare [Sun, 1 Apr 2007 08:06:33 +0000 (10:06 +0200)]
ieee1394: eth1394: Move common recv_init code to helper function
There is some common code between ether1394_open and ether1394_add_host
which can be moved to a separate helper function for a slightly smaller
eth1394 driver (-160 bytes on i386.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Mon, 26 Mar 2007 23:36:50 +0000 (01:36 +0200)]
ieee1394: eth1394: don't autoload by hotplug when ohci1394 starts
Until now, ieee1394 put an IP-over-1394 capability entry into each new
host's config ROM. As soon as the controller was initialized --- i.e.
right after modprobe ohci1394 --- this entry triggered a hotplug event
which typically caused auto-loading of eth1394.
This irritated or annoyed many users and distributors. Of course they
could blacklist eth1394, but then ieee1394 wrongly advertised
IP-over-1394 capability to the FireWire bus.
Therefore
- remove the offending kernel config option
IEEE1394_CONFIG_ROM_IP1394,
- let eth1394 add the ROM entry by itself, i.e. only after eth1394 was
loaded.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=7793 .
To emulate the behaviour of older kernels, simply add the following to
/etc/modprobe.conf:
Andrew Morton [Thu, 26 Apr 2007 07:16:04 +0000 (00:16 -0700)]
ieee1394: iso.c needs sched.h
alpha:
drivers/ieee1394/iso.c: In function 'hpsb_iso_xmit_sync':
drivers/ieee1394/iso.c:440: error: invalid use of undefined type 'struct task_struct'
drivers/ieee1394/iso.c:440: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function)
drivers/ieee1394/iso.c:440: error: (Each undeclared identifier is reported only once
drivers/ieee1394/iso.c:440: error: for each function it appears in.)
drivers/ieee1394/iso.c:440: warning: implicit declaration of function 'signal_pending'
drivers/ieee1394/iso.c:440: error: invalid use of undefined type 'struct task_struct'
drivers/ieee1394/iso.c:440: warning: implicit declaration of function 'schedule'
drivers/ieee1394/iso.c: In function 'hpsb_iso_wake':
drivers/ieee1394/iso.c:562: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (brought into alphabetic order)
Torsten Kaiser [Mon, 9 Apr 2007 19:03:15 +0000 (21:03 +0200)]
ieee1394: ieee1394_transactions needs sched.h
drivers/ieee1394/ieee1394_transactions.c fails for me if CONFIG_SMP=n
gcc complains:
CC drivers/ieee1394/ieee1394_transactions.o
drivers/ieee1394/ieee1394_transactions.c: In function 'hpsb_get_tlabel':
drivers/ieee1394/ieee1394_transactions.c:183: error:
'TASK_INTERRUPTIBLE' undeclared (first use in this function)
drivers/ieee1394/ieee1394_transactions.c:183: error: (Each undeclared
identifier is reported only once
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (added comment)
Stefan Richter [Sun, 11 Mar 2007 21:51:24 +0000 (22:51 +0100)]
ieee1394: replace vmalloc by kmalloc in csr1212
The biggest chunk ever allocated by CSR1212_MALLOC is 1024 Bytes +
sizeof(struct csr1212_csr_rom_cache) big. Most of the time much
smaller data structures are allocated. Therefore vmalloc is a waste.
The one exception is csr1212_append_new_cache() which is called to
append a chunk of CSR1212_EXTENDED_ROM_SIZE + sizeof(struct
csr1212_csr_rom_cache) if the currently allocated ROM cache is too
small. CSR1212_EXTENDED_ROM_SIZE is generously defined as 256 kBytes.
In SVN commit 1220, Steve Kinneberg lowered this to 2 kBytes in the
config_rom_2.4 branch. This same commit also switched CSR1212_MALLOC
from kmalloc to vmalloc in the SVN trunk branch:
> r1220 | kberg | 2004-05-31 01:51:44 +0200 (Mon, 31 May 2004) | 13 lines
>
> CSR1212 Extended ROM bug fixes:
> trunk line changes:
> - Use vmalloc instead of kmalloc
> - Change delayed_reset_bus() to operate in a work_queue instead of a
> timer interrupt.
> - Fix hpsb_allocate_and_register_addrspace() to not allocate space
> on top of already allocated space.
> - Fix problems in csr1212.c filling ConfigROM images when extend
> ROMs are present.
> config-rom-2.4 changes:
> - Changed extended rom allocation from 256K to 8K.
(It was actually 2 kB, not 8 kB.)
> - Fix hpsb_allocate_and_register_addrspace() to not allocate space
> on top of already allocated space.
> - Fix problems in csr1212.c filling ConfigROM images when extend
> ROMs are present.
I am now setting CSR1212_EXTENDED_ROM_SIZE to 2 kB minus the overhead of
struct csr1212_csr_rom_cache. Note, this code path is not used by the
in-kernel drivers though. raw1394 could trigger it, but the respective
libraw1394 functions don't exist yet.
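The sizing above is simple arithmetic, sketched here in C. The struct is a hypothetical header-only stand-in for struct csr1212_csr_rom_cache (its real fields are not given in this message), and `EXTENDED_ROM_SIZE` is a made-up name for the illustration. The point is only that payload plus header adds up to exactly 2 kB, so one append is a single 2 kB allocation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cache header; the real layout differs. */
struct rom_cache_hdr {
	struct rom_cache_hdr *next, *prev;
	size_t offset;
	size_t len;
	uint32_t data[];	/* ROM quadlets follow the header */
};

/* Payload size chosen so that header + payload is exactly 2 kB. */
#define EXTENDED_ROM_SIZE (2048 - sizeof(struct rom_cache_hdr))

/* Size of the single allocation made when a new cache chunk is appended. */
static size_t extended_cache_alloc_size(void)
{
	return sizeof(struct rom_cache_hdr) + EXTENDED_ROM_SIZE;
}
```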
Furthermore, userspace programs can replace the entire local ROM via
raw1394. If kmalloc does not fulfill their needs --- well, tough luck.
I decree that nobody needs such huge extended ROMs. (Extended ROMs are
defined by IEEE 1212 clause 7.7.18. The spec does not impose
practically relevant restrictions on the size of extended ROM chunks.)
Another potentially demanding use of CSR1212_MALLOC is if external
FireWire devices come with Extended ROM entries. If they are too big
for kmalloc (or have been too big for vmalloc) we just fail to read
their ROM. This is quite unlikely though, to my knowledge.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sun, 11 Mar 2007 21:49:05 +0000 (22:49 +0100)]
ieee1394: drop csr1212's support for external compilation
csr1212 was written to be compiled either as part of the ieee1394 kernel
driver or of an anticipated IEEE 1212 userspace library. We now drop
support for the latter. The costs in terms of code footprint and depth
of abstraction are not countered by any actual benefit.
Also remove some obsolete #includes.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Andrew Morton [Mon, 23 Apr 2007 18:50:56 +0000 (11:50 -0700)]
ieee1394: sbp2: include fixes
drivers/ieee1394/sbp2.c: In function 'sbp2util_access_timeout':
drivers/ieee1394/sbp2.c:399: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function)
drivers/ieee1394/sbp2.c:399: error: (Each undeclared identifier is reported only once
drivers/ieee1394/sbp2.c:399: error: for each function it appears in.)
drivers/ieee1394/sbp2.c:399: warning: implicit declaration of function 'signal_pending'
drivers/ieee1394/sbp2.c:399: warning: implicit declaration of function 'schedule_timeout'
drivers/ieee1394/sbp2.c: In function 'sbp2_prep_command_orb_sg':
drivers/ieee1394/sbp2.c:1438: warning: implicit declaration of function 'page_address'
drivers/ieee1394/sbp2.c:1438: warning: passing argument 2 of 'dma_map_single' makes pointer from integer without a cast
drivers/ieee1394/sbp2.c: In function 'sbp2_handle_status_write':
drivers/ieee1394/sbp2.c:1842: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function)
Possibly due to changes in -mm, but this file should explicitly include the
headers for the stuff it uses.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (brought into alphabetic order)
Stefan Richter [Sun, 4 Feb 2007 19:54:57 +0000 (20:54 +0100)]
ieee1394: sbp2: optimize DMA direction of s/g tables
Unlike the name suggests, "cmd->scatter_gather_element" holds only the
s/g table, not the actual s/g elements. Since the table is only read
but never written by the device, DMA_BIDIRECTIONAL can be replaced by
DMA_TO_DEVICE which may be cheaper on some architectures.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sun, 4 Feb 2007 19:25:43 +0000 (20:25 +0100)]
ieee1394: sbp2: enforce 32bit DMA mapping
In order to use OHCI-1394 physical DMA, all s/g elements, s/g tables,
ORBs, and response buffers have to reside within the first 4 GB of the
FireWire controller's physical address space. Set the correct mask for
DMA mappings.
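What the 32-bit constraint means for a single buffer can be sketched as a range check. This is plain userspace arithmetic, not the kernel DMA API: `DMA_MASK_32BIT` and `dma_range_ok()` are hypothetical names, and in the driver the restriction is expressed by setting a mask on the device rather than by checking each buffer by hand.

```c
#include <stdint.h>

/* Hypothetical name for a mask covering the first 4 GB. */
#define DMA_MASK_32BIT 0xffffffffULL

/* Return 1 if the whole range [addr, addr + len) is addressable under
 * mask, 0 otherwise. An empty range is rejected. */
static int dma_range_ok(uint64_t addr, uint64_t len, uint64_t mask)
{
	uint64_t last;

	if (len == 0)
		return 0;
	last = addr + len - 1;
	/* last < addr means the range wrapped around 2^64 */
	return last >= addr && last <= mask;
}
```

Note that the last byte of the buffer is what has to fit, so a buffer that starts below 4 GB but ends above it fails the check.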
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>