Luis Henriques [Tue, 24 Mar 2009 22:10:02 +0000 (22:10 +0000)]
sched: remove unused fields from struct rq
Impact: cleanup, new schedstat ABI
Since they are used on in statistics and are always set to zero, the
following fields from struct rq have been removed: yld_exp_empty,
yld_act_empty and yld_both_empty.
Both Sched Debug and SCHEDSTAT_VERSION versions has also been
incremented since ABIs have been changed.
The schedtop tool has been updated to properly handle new version of
schedstat:
Jody McIntyre [Tue, 24 Mar 2009 20:00:28 +0000 (16:00 -0400)]
tracing: Documentation / sample code fixes for tracepoints
Fix the tracepoint documentation to refer to "tracepoint-sample"
instead of "tracepoint-example" to match what actually exists;
fix the directory, and clarify how to compile.
Change every instance of "example" in the sample tracepoint code
to "sample" for consistency.
Josh Stone [Tue, 24 Mar 2009 09:44:28 +0000 (09:44 +0000)]
net: Add dependent headers to trace/skb.h
The tracing header needs to include definitions for the macros used and
the types referenced. This lets automated tracing tools like SystemTap
make use of the tracepoint without any specific knowledge of its
meaning (leaving that to the user).
Signed-off-by: Josh Stone <jistone@redhat.com> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yinghai Lu [Tue, 24 Mar 2009 20:23:16 +0000 (13:23 -0700)]
x86: fix set_extra_move_desc calling
Impact: fix bug with irq-descriptor moving when logical flat
Rusty observed:
> The effect of setting desc->affinity (ie. from userspace via sysfs) has varied
> over time. In 2.6.27, the 32-bit code anded the value with cpu_online_map,
> and both 32 and 64-bit did that anding whenever a cpu was unplugged.
>
> 2.6.29 consolidated this into one routine (and fixed hotplug) but introduced
> another variation: anding the affinity with cfg->domain. Is this right, or
> should we just set it to what the user said? Or as now, indicate that we're
> restricting it.
Eric pointed out that desc->affinity should be what the user requested,
if it is at all possible to honor the user space request.
This bug got introduced by commit 22f65d31b "x86: Update io_apic.c to use
new cpumask API".
Fix it by moving the masking to before the descriptor moving ...
Reported-by: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <49C94134.4000408@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anton Vorontsov [Mon, 16 Mar 2009 21:14:03 +0000 (00:14 +0300)]
sdhci: Add quirk for forcing maximum block size to 2048 bytes
FSL eSDHC controllers can support maximum block size up to 4096 bytes,
the MBL (Maximum Block Length) field in the capabilities register
extended by one bit, and is set to 0x3.
But the SDHCI core doesn't support blocks of 4096 bytes, and thus
forces blksz to the lowest value -- 512 bytes. With this patch we can
pin up the blksz to the maximum supported block size, i.e. 2048 bytes.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Anton Vorontsov [Mon, 16 Mar 2009 21:14:00 +0000 (00:14 +0300)]
sdhci: Add quirk for controllers that need small delays for PIO
Small udelay is needed to make eSDHC work in PIO mode. Without
the delay reading causes endless interrupt storm, and writing
corrupts data. The first guess would be that we must wait for
some bit in some register, but I didn't find any reliable bits
that change before and after the delay.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Anton Vorontsov [Mon, 16 Mar 2009 21:13:59 +0000 (00:13 +0300)]
sdhci: Add set_clock callback and a quirk for nonstandard clocks
FSL eSDHC hosts have incompatible register map to manage the SDCLK.
This patch adds set_clock callback so that drivers could overwrite
set_clock behaviour.
Similar patch[1] was posted by Ben Dooks, though in Ben's version the
callback is named change_clock, plus the patch has some unrelated bits
that makes the patch difficult to reuse.
[1] http://lkml.org/lkml/2008/12/2/160
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Ben Dooks [Mon, 16 Mar 2009 21:13:57 +0000 (00:13 +0300)]
sdhci: Add get_{max,timeout}_clock callbacks
Some controllers do not provide clock information in their capabilities
(in the Samsung case, it is because there are multiple clock sources
available to the controller). Add hooks to allow the system to supply
clock information.
p.s.
In the original Ben's patch there was a bug that makes sdhci_add_host()
return -ENODEV even if callbacks were specified. This is fixed now.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Anton Vorontsov [Mon, 16 Mar 2009 21:13:52 +0000 (00:13 +0300)]
sdhci: Add support for card-detection polling
This patch adds SDHCI_QUIRK_BROKEN_CARD_DETECTION quirk. When specified,
sdhci driver will set MMC_CAP_NEEDS_POLL MMC host capability, and won't
enable card insert/remove interrupts.
This is needed for hosts with unreliable card detection, such as FSL
eSDHC. The original eSDHC driver was tring to "debounce" card-detection
IRQs by reading present state and disabling particular interrupts. But
with this debouncing scheme I noticed that sometimes we miss card
insertion/removal events.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Anton Vorontsov [Mon, 16 Mar 2009 21:13:48 +0000 (00:13 +0300)]
sdhci: Split card-detection IRQs management from sdhci_init()
Card detection interrupts should be handled separately as they should
not be enabled before mmc_add_host() returns and should be disabled
before calling mmc_remove_host(). The same is for suspend and resume
routines.
sdhci_init() no longer enables card-detection irqs. Instead, two new
functions implemented: sdhci_enable_card_detection() and
sdhci_disable_card_detection().
New sdhci_reinit() call implemented to behave the same way as the old
sdhci_init().
Also, this patch implements and uses few new helpers to manage IRQs in
a more conveinient way, that is:
Anton Vorontsov [Mon, 16 Mar 2009 21:13:46 +0000 (00:13 +0300)]
sdhci: Add support for bus-specific IO memory accessors
Currently the SDHCI driver works with PCI accessors (write{l,b,w} and
read{l,b,w}).
With this patch drivers may change memory accessors, so that we can
support hosts with "weird" IO memory access requirments.
For example, in "FSL eSDHC" SDHCI hardware all registers are 32 bit
width, with big-endian addressing. That is, readb(0x2f) should turn
into readb(0x2c), and readw(0x2c) should be translated to
le16_to_cpu(readw(0x2e)).
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Wolfgang Muees [Mon, 16 Mar 2009 11:23:03 +0000 (12:23 +0100)]
mmc_spi: adjust for delayed data token response
Some cards are not able to send the data token in time, but
miss the time frame for some bits(!). So synchronize to the
start of the token.
Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Data transfers on third OMAP3 MMC controller don't work
because DMA line numbers are only defined for MMC1 and MMC2.
Fix that and store line numbers in mmc_omap_host structure
to reduce code size.
Tested on OMAP3 pandora board.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Jarkko Lavinen [Thu, 12 Mar 2009 13:30:58 +0000 (15:30 +0200)]
omap_hsmmc: Disable SDBP at suspend
Turn off the bus power at suspend.
Signed-off-by: Jarkko Lavinen <jarkko.lavinen@nokia.com> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Juha Yrjola [Fri, 14 Nov 2008 13:22:00 +0000 (15:22 +0200)]
omap_hsmmc: Implement scatter-gather emulation
Instead of using the bounce buffer, using scatter-gather emulation
(as in the OMAP1/2 MMC driver) removes the need of one extra memory
copy and improves performance.
Signed-off-by: Juha Yrjola <juha.yrjola@solidboot.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Adrian Hunter [Mon, 12 Jan 2009 14:13:08 +0000 (16:13 +0200)]
omap_hsmmc: Fix response type for busy after response
Some MMC commands result in the card becoming busy after
the response is received. This needs to be specified
for the omap_hsmmc host controller, which is what this
patch does. However, the effect is that some commands
with no data will cause a Transfer Complete (TC) interrupt
in addition to the Command Complete (CC) interrupt.
In order to deal with that, the irq handler has needed
a few changes also.
The benefit of this change is that the omap_hsmmc host
controller driver now waits for the TC interrupt while
the card is busy, so the mmc_block driver needs to poll
the card status just once instead of repeatedly.
i.e. the net result is more sleep and less cpu.
The command sequence for open-ended multi-block write
with DMA is now:
Issue write command CMD25
Receive CC interrupt
Data is sent
Receive TC interrupt (DMA is done)
Issue stop command CMD12
Receive CC interrupt
Card is busy
Receive TC interrupt
Card is now ready for next transfer
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
drivers/mmc/host/tmio_mmc.h: In function 'tmio_mmc_kmap_atomic':
drivers/mmc/host/tmio_mmc.h:147: error: implicit declaration of function 'kmap_atomic'
drivers/mmc/host/tmio_mmc.h:147: error: 'KM_BIO_SRC_IRQ' undeclared (first use in this function)
drivers/mmc/host/tmio_mmc.h: In function 'tmio_mmc_kunmap_atomic':
drivers/mmc/host/tmio_mmc.h:153: error: implicit declaration of function 'kunmap_atomic'
drivers/mmc/host/tmio_mmc.h:153: error: 'KM_BIO_SRC_IRQ' undeclared (first use in this function)
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Magnus Damm [Wed, 11 Mar 2009 12:59:03 +0000 (21:59 +0900)]
tmio_mmc: Fix use after free in remove()
Update the tmio_mmc code to call mmc_free_host() when
done using the private data. Without this fix the driver
frees memory and then keeps on using it as private data.
Signed-off-by: Magnus Damm <damm@opensource.se> Acked-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Wolfgang Muees [Wed, 11 Mar 2009 13:28:39 +0000 (14:28 +0100)]
mmc_spi: allow higher timeouts for SPI mode
Some SD cards have very high timeouts in SPI mode.
So adjust the timeouts from theory to practice.
Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Wolfgang Muees [Wed, 11 Mar 2009 13:17:43 +0000 (14:17 +0100)]
mmc_spi: wait more bytes for card response
Some cards are slower than the standard allows and need more
time to respond to a command. Max. observed number of bytes
was 12.
Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Wolfgang Muees [Wed, 11 Mar 2009 13:13:15 +0000 (14:13 +0100)]
mmc_spi: allow setting of spi mode 3
Allow the platform data structures to specify spi mode 3
(if there is a pullup on the clock line or the spi hardware
is not able to serve spi mode 0).
Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Kim Kyuwon [Fri, 20 Feb 2009 12:10:08 +0000 (13:10 +0100)]
omap_hsmmc: Initialize hsmmc controller registers when resuming
Most registers lose its state when the processor wakes up from sleep state.
Thus registers should be initialized, when the processor wakes up. However the
current hsmmc 'resume' function doesn't consider this issue and finally makes
deadlock. So this patch fixes this problem.
Signed-off-by: Kim Kyuwon <chammoru@gmail.com> Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Adrian Hunter [Tue, 24 Feb 2009 12:48:16 +0000 (14:48 +0200)]
omap_hsmmc: do not re-power when powering off MMC
Remove code that turns MMC1 power back on after it
has been powered off (when the voltage is 1.8V).
The offending code is not necessary because the
host controller bus voltage is initialized to
3V when probing or resuming. Note that MMC powers up
with the highest voltage available (see mmc_power_up())
which will be 3V also.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Adrian Hunter [Wed, 11 Feb 2009 12:52:20 +0000 (14:52 +0200)]
mmc: Add Extended CSD register to debugfs
Extended CSD is a MMC card register. As increasingly interesting
fields are being added to Extended CSD, it is helpful to see its
value. Note that SD cards do not have an Extended CSD
register, so it is MMC only.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
It makes much more sense for the mmc bus driver and the mmc-block module to
share MODALIAS information so that they are linked automatically.
There is no real information of use in the MMC system at the current time.
All devices inserted require us to load the mmc-block device. Until such
time as useful parameters exist simply reflect the module linkage via
the module alias below:
mmc:block
Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Jorg Schummer [Thu, 19 Feb 2009 11:17:03 +0000 (13:17 +0200)]
mmc: delayed_work was never cancelled
The delayed work item mmc_host.detect is now cancelled before flushing
the work queue. This takes care of cases when delayed_work was scheduled
for mmc_host.detect, but not yet placed in the work queue.
Signed-off-by: Jorg Schummer <ext-jorg.2.schummer@nokia.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
On m68k:
| drivers/net/dnet.c: In function 'dnet_readw_mac':
| drivers/net/dnet.c:36: error: implicit declaration of function 'writel'
| drivers/net/dnet.c:43: error: implicit declaration of function 'readl'
| drivers/net/dnet.c: In function 'dnet_probe':
| drivers/net/dnet.c:873: error: implicit declaration of function 'ioremap'
| drivers/net/dnet.c:873: warning: assignment makes pointer from integer without a cast
| drivers/net/dnet.c:939: error: implicit declaration of function 'iounmap'
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Mason [Tue, 24 Mar 2009 14:24:31 +0000 (10:24 -0400)]
Btrfs: optimize fsyncs on old files
The fsync log has code to make sure all of the parents of a file are in the
log along with the file. It uses a minimal log of the parent directory
inodes, just enough to get the parent directory on disk.
If the transaction that originally created a file is fully on disk,
and the file hasn't been renamed or linked into other directories, we
can safely skip the parent directory walk. We know the file is on disk
somewhere and we can go ahead and just log that single file.
This is more important now because unrelated unlinks in the parent directory
might make us force a commit if we try to log the parent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 24 Mar 2009 14:24:20 +0000 (10:24 -0400)]
Btrfs: tree logging unlink/rename fixes
The tree logging code allows individual files or directories to be logged
without including operations on other files and directories in the FS.
It tries to commit the minimal set of changes to disk in order to
fsync the single file or directory that was sent to fsync or O_SYNC.
The tree logging code was allowing files and directories to be unlinked
if they were part of a rename operation where only one directory
in the rename was in the fsync log. This patch adds a few new rules
to the tree logging.
1) on rename or unlink, if the inode being unlinked isn't in the fsync
log, we must force a full commit before doing an fsync of the directory
where the unlink was done. The commit isn't done during the unlink,
but it is forced the next time we try to log the parent directory.
Solution: record transid of last unlink/rename per directory when the
directory wasn't already logged. For renames this is only done when
renaming to a different directory.
The fsync above will unlink the original some_dir without recording
it in its new location (foo2). After a crash, some_dir will be gone
unless the fsync of some_file forces a full commit
2) we must log any new names for any file or dir that is in the fsync
log. This way we make sure not to lose files that are unlinked during
the same transaction.
2a) we must log any new names for any file or dir during rename
when the directory they are being removed from was logged.
2a is actually the more important variant. Without the extra logging
a crash might unlink the old name without recreating the new one
3) after a crash, we must go through any directories with a link count
of zero and redo the rm -rf
mkdir f1/foo
normal commit
rm -rf f1/foo
fsync(f1)
The directory f1 was fully removed from the FS, but fsync was never
called on f1, only its parent dir. After a crash the rm -rf must
be replayed. This must be able to recurse down the entire
directory tree. The inode link count fixup code takes care of the
ugly details.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 24 Mar 2009 14:24:13 +0000 (10:24 -0400)]
Btrfs: Make sure i_nlink doesn't hit zero too soon during log replay
During log replay, inodes are copied from the log to the main filesystem
btrees. Sometimes they have a zero link count in the log but they actually
gain links during the replay or have some in the main btree.
This patch updates the link count to be at least one after copying the
inode out of the log. This makes sure the inode is deleted during an
iput while the rest of the replay code is still working on it.
The log replay has fixup code to make sure that link counts are correct
at the end of the replay, so we could use any non-zero number here and
it would work fine.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 16 Mar 2009 14:59:57 +0000 (10:59 -0400)]
Btrfs: limit balancing work while flushing delayed refs
The delayed reference mechanism is responsible for all updates to the
extent allocation trees, including those updates created while processing
the delayed references.
This commit tries to limit the amount of work that gets created during
the final run of delayed refs before a commit. It avoids cowing new blocks
unless it is required to finish the commit, and so it avoids new allocations
that were not really required.
The goal is to avoid infinite loops where we are always making more work
on the final run of delayed refs. Over the long term we'll make a
special log for the last delayed ref updates as well.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 15:41:46 +0000 (11:41 -0400)]
Btrfs: readahead checksums during btrfs_finish_ordered_io
This reads in blocks in the checksum btree before starting the
transaction in btrfs_finish_ordered_io. It makes it much more likely
we'll be able to do operations inside the transaction without
needing any btree reads, which limits transaction latencies overall.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 15:00:37 +0000 (11:00 -0400)]
Btrfs: leave btree locks spinning more often
btrfs_mark_buffer dirty would set dirty bits in the extent_io tree
for the buffers it was dirtying. This may require a kmalloc and it
was not atomic. So, anyone who called btrfs_mark_buffer_dirty had to
set any btree locks they were holding to blocking first.
This commit changes dirty tracking for extent buffers to just use a flag
in the extent buffer. Now that we have one and only one extent buffer
per page, this can be safely done without losing dirty bits along the way.
This also introduces a path->leave_spinning flag that callers of
btrfs_search_slot can use to indicate they will properly deal with a
path returned where all the locks are spinning instead of blocking.
Many of the btree search callers now expect spinning paths,
resulting in better btree concurrency overall.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 00:12:45 +0000 (20:12 -0400)]
Btrfs: Only let very young transactions grow during commit
Commits are fairly expensive, and so btrfs has code to sit around for a while
during the commit and let new writers come in.
But, while we're sitting there, new delayed refs might be added, and those
can be expensive to process as well. Unless the transaction is very very
young, it makes sense to go ahead and let the commit finish without hanging
around.
The commit grow loop isn't as important as it used to be, the fsync logging
code handles most performance critical syncs now.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 00:12:45 +0000 (20:12 -0400)]
Btrfs: reduce stack in cow_file_range
The fs/btrfs/inode.c code to run delayed allocation during writout
needed some stack usage optimization. This is the first pass, it does
the check for compression earlier on, which allows us to do the common
(no compression) case higher up in the call chain.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 00:12:45 +0000 (20:12 -0400)]
Btrfs: reduce stalls during transaction commit
To avoid deadlocks and reduce latencies during some critical operations, some
transaction writers are allowed to jump into the running transaction and make
it run a little longer, while others sit around and wait for the commit to
finish.
This is a bit unfair, especially when the callers that jump in do a bunch
of IO that makes all the others procs on the box wait. This commit
reduces the stalls this produces by pre-reading file extent pointers
during btrfs_finish_ordered_io before the transaction is joined.
It also tunes the drop_snapshot code to politely wait for transactions
that have started writing out their delayed refs to finish. This avoids
new delayed refs being flooded into the queue while we're trying to
close off the transaction.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 14:17:05 +0000 (10:17 -0400)]
Btrfs: process the delayed reference queue in clusters
The delayed reference queue maintains pending operations that need to
be done to the extent allocation tree. These are processed by
finding records in the tree that are not currently being processed one at
a time.
This is slow because it uses lots of time searching through the rbtree
and because it creates lock contention on the extent allocation tree
when lots of different procs are running delayed refs at the same time.
This commit changes things to grab a cluster of refs for processing,
using a cursor into the rbtree as the starting point of the next search.
This way we walk smoothly through the rbtree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 14:11:24 +0000 (10:11 -0400)]
Btrfs: try to cleanup delayed refs while freeing extents
When extents are freed, it is likely that we've removed the last
delayed reference update for the extent. This checks the delayed
ref tree when things are freed, and if no ref updates area left it
immediately processes the delayed ref.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 14:10:06 +0000 (10:10 -0400)]
Btrfs: do extent allocation and reference count updates in the background
The extent allocation tree maintains a reference count and full
back reference information for every extent allocated in the
filesystem. For subvolume and snapshot trees, every time
a block goes through COW, the new copy of the block adds a reference
on every block it points to.
If a btree node points to 150 leaves, then the COW code needs to go
and add backrefs on 150 different extents, which might be spread all
over the extent allocation tree.
These updates currently happen during btrfs_cow_block, and most COWs
happen during btrfs_search_slot. btrfs_search_slot has locks held
on both the parent and the node we are COWing, and so we really want
to avoid IO during the COW if we can.
This commit adds an rbtree of pending reference count updates and extent
allocations. The tree is ordered by byte number of the extent and byte number
of the parent for the back reference. The tree allows us to:
1) Modify back references in something close to disk order, reducing seeks
2) Significantly reduce the number of modifications made as block pointers
are balanced around
3) Do all of the extent insertion and back reference modifications outside
of the performance critical btrfs_search_slot code.
#3 has the added benefit of greatly reducing the btrfs stack footprint.
The extent allocation tree modifications are done without the deep
(and somewhat recursive) call chains used in the past.
These delayed back reference updates must be done before the transaction
commits, and so the rbtree is tied to the transaction. Throttling is
implemented to help keep the queue of backrefs at a reasonable size.
Since there was a similar mechanism in place for the extent tree
extents, that is removed and replaced by the delayed reference tree.
Yan Zheng <yan.zheng@oracle.com> helped review and fixup this code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 13 Mar 2009 14:24:59 +0000 (10:24 -0400)]
Btrfs: don't preallocate metadata blocks during btrfs_search_slot
In order to avoid doing expensive extent management with tree locks held,
btrfs_search_slot will preallocate tree blocks for use by COW without
any tree locks held.
A later commit moves all of the extent allocation work for COW into
a delayed update mechanism, and this preallocation will no longer be
required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Stefan Richter [Thu, 5 Mar 2009 18:13:43 +0000 (19:13 +0100)]
DVB: firedtv: fix printk format mismatch
Eliminate
drivers/media/dvb/firewire/firedtv-avc.c: In function 'debug_fcp':
drivers/media/dvb/firewire/firedtv-avc.c:156: warning: format '%d' expects type 'int', but argument 5 has type 'size_t'
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 10 Mar 2009 20:09:28 +0000 (21:09 +0100)]
firewire: core: optimize propagation of BROADCAST_CHANNEL
Cache the test result of whether a device implements BROADCAST_CHANNEL.
This minimizes traffic on the bus after each bus reset. A majority of
devices does not implement BROADCAST_CHANNEL.
Remove busy retries; just rely on the hardware to retry requests to busy
responders. Remove unnecessary log messages.
Rename the flag is_irm to broadcast_channel_allocated to better reflect
its meaning. Reset the flag earlier in fw_core_handle_bus_reset.
Pass the generation down as a call parameter; that way generation can't
be newer than card->broadcast_channel_allocated and device->node_id.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 10 Mar 2009 20:07:46 +0000 (21:07 +0100)]
firewire: core: increase bus manager grace period
Per IEEE 1394 clause 8.4.2.5, bus manager capable nodes which are not
incumbent shall wait at least 125ms before trying to establish
themselves as bus manager.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 10 Mar 2009 20:02:21 +0000 (21:02 +0100)]
firewire: cdev: add closure to async stream ioctl
This changes the as yet unreleased FW_CDEV_IOC_SEND_STREAM_PACKET ioctl
to generate an fw_cdev_event_response event just like the other two
ioctls for asynchronous request transmission do. This way, clients get
feedback on successful or unsuccessful transmission.
This also adds input validation for length, tag, channel, sy, speed.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 10 Mar 2009 20:01:54 +0000 (21:01 +0100)]
firewire: cdev: simplify FW_CDEV_IOC_SEND_REQUEST return value
This changes the ioctl() return value of FW_CDEV_IOC_SEND_REQUEST and of
the as yet unreleased FW_CDEV_IOC_SEND_BROADCAST_REQUEST. They used to
return
sizeof(struct fw_cdev_send_request *) + data_length
which is obviously a failed attempt to emulate the return value of
raw1394's respective interface which uses write() instead of ioctl().
However, the first summand, as size of a kernel pointer, is entirely
meaningless to clients and the second summand is already known to
clients. And the result does not resemble raw1394's write() return
code anyway.
So simplify it to a constant non-negative value, i.e. 0. The only
dangers here would be that future client implementations check for error
by ret != 0 instead of ret < 0 when running on top of an old kernel; or
that current clients interpret ret = 0 or more as failure. But both are
hypothetical cases which don't justify to return irritating values.
While we touch this code, also remove "& 0x1f" from tcode in the call of
fw_send_request. The tcode cannot be bigger than 0x1f at this point.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 10 Mar 2009 20:01:08 +0000 (21:01 +0100)]
firewire: cdev: fix race of ioctl_send_request with bus reset
The bus reset handler concurrently frees client->device->node. Use
device->node_id instead. This is equivalent to device->node->node_id
while device->generation is current.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 10 Mar 2009 20:00:23 +0000 (21:00 +0100)]
firewire: cdev: secure add_descriptor ioctl
The access permissions and ownership or ACL of /dev/fw* character device
files will typically be set based on the device type of the respective
nodes, as obtained by firewire-core from descriptors in the device's
configuration ROM. An example policy is to deny write permission by
default but grant write permission to files of AV/C video and audio
devices and IIDC video devices.
The FW_CDEV_IOC_ADD_DESCRIPTOR ioctl could be used to partly subvert
such a policy: Find a device file with relaxed permissions, use the
ioctl to add a descriptor with AV/C marker to the local node's ROM, thus
gain access to the local node's character device file. (This is only
possible if there are udev scripts installed which actively relax
permissions for known device types and if there is a device of such a
type connected.)
Accessibility of the local node's device file is relevant to host
security if the host contains two or more IEEE 1394 link layer
controllers which are plugged into a single bus.
Therefore change the ABI to deny FW_CDEV_IOC_ADD_DESCRIPTOR if the file
belongs to a remote node. (This change has no impact on known
implementers of the ABI: None of them uses the ioctl yet.)
Also clarify the documentation: The ioctl affects all local nodes, not
just one local node.
Cc: stable@kernel.org Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Jay Fenlason [Mon, 23 Feb 2009 20:59:34 +0000 (15:59 -0500)]
firewire: broadcast channel support
This patch adds the ISO broadcast channel support that is required of a
1394a IRM. In specific, if the local device the IRM, it allocates ISO
channel 31 and sets the broadcast channel register of all devices on the
local bus to BROADCAST_CHANNEL_INITIAL | BROADCAST_CHANNEL_VALID to indicate
that channel 31 can be use for broadcast messages.
One minor complication is that on startup the local device may become IRM
before all the devices on the bus have been enumerated by the stack. Therefore
we have to keep a "the local device is IRM" flag and possibly set the
broadcast channel register of new devices at enumeration time.
Signed-off-by: Jay Fenlason <fenlason@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Allow userspace and other firewire drivers (fw-ipv4 I'm looking at
you!) to send Asynchronous Transmit Streams as described in 7.8.3 of
release 1.1 of the 1394 Open Host Controller Interface Specification.
Signed-off-by: Jay Fenlason <fenlason@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (tweaks)
Stefan Richter [Tue, 3 Feb 2009 16:55:19 +0000 (17:55 +0100)]
firewire: normalize a variable name
Standardize on if (err)
handle_error;
and if (ret < 0)
handle_error;
Don't call a variable err if we store values in it which mean success.
Also, offset some return statements by a blank line since this how we do
it in drivers/firewire.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sun, 11 Jan 2009 12:44:46 +0000 (13:44 +0100)]
firewire: cdev: simplify a schedule_delayed_work wrapper
The kernel API documentation says that queue_delayed_work() returns 0
(only) if the work was already queued. The return codes of
schedule_delayed_work() are not documented but the same.
In init_iso_resource(), the work has never been queued yet, hence we
can assume schedule_delayed_work() to be a guaranteed success there.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Thu, 8 Jan 2009 22:07:40 +0000 (23:07 +0100)]
firewire: cdev: add ioctls for iso resource management, amendment
Some fixes:
- Remove stale documentation.
- Fix a != vs. == thinko that got in the way of channel management.
- Try bandwidth deallocation even if channel deallocation failed.
A simplification:
- fw_cdev_allocate_iso_resource.channels is now ordered like
libdc1394's dc1394_iso_allocate_channel() channels_allowed
argument.
By the way, I looked closer at cards from NEC, TI, and VIA, and noticed
that they all don't implement IEEE 1394a behaviour which is meant to
deviate from IEEE 1212's notion of lock compare-swap. This means that
we have to do two lock transactions instead of one in many cases where
one transaction would already succeed on a fully 1394a compliant IRM.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Necessary due to
Date: Tue, 22 Jul 2008 23:23:40 -0700
From: David Moore <dcm@acm.org>
Subject: firewire: Include iso timestamp in headers when header_size > 4
Side note: The lack of upwards compatibility sounds worse than it is.
All existing client implementations, libraw1394 and libdc1394, set
header_size = 4. And since the ABI v1 behaviour does not offer any
advantages over the new behaviour, we deliberately do not provide the
old behaviour anymore.
Also add documentation about the format of fw_cdev_get_cycle_timer which
may be used in conjunction with the timestamp of iso packets but has a
different format.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Mon, 5 Jan 2009 19:28:10 +0000 (20:28 +0100)]
firewire: cdev: shut down iso context before freeing the buffer
DMA must be halted before we DMA-unmap and free the DMA buffer. Since
we cannot rely on the client to stop the context before it closes the
fd, we have to reorder fw_iso_buffer_destroy vs. fw_iso_context_destroy.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Make the size check of ioctl_send_request and
ioctl_send_broadcast_request speed dependent. Also change the error
return code from -EINVAL to -EIO to distinguish this from other errors
concerning the ioctl parameters.
Another payload size limit for which we don't check here though is the
remote node's Bus_Info_Block.max_rec.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sun, 4 Jan 2009 15:23:29 +0000 (16:23 +0100)]
firewire: cdev: restrict broadcast write requests to Units Space
We don't want random users write to Memory Space (e.g. PCs with physical
DMA filters down) or to core CSRs like Reset_Start.
This does not protect SBP-2 target CSRs. But properly behaving SBP-2
targets ignore broadcast write requests to these registers, and the
maximum damage which can happen with laxer targets is DOS. But there
are ways to create DOS situations anyway if there are devices with weak
device file permissions (like audio/video devices) present at the same
bus as an SBP-2 target.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
firewire: cdev: add ioctl for broadcast write requests
Write transactions to the broadcast node ID are a convenient way to
trigger functions of multiple nodes at once. IIDC is a protocol which
can make use of this if multiple cameras with same command_regs_base are
connected at the same bus.
Based on
Date: Wed, 10 Sep 2008 11:32:16 -0400
From: Jay Fenlason <fenlason@redhat.com>
Subject: [patch] SEND_BROADCAST_REQUEST
Changes: ioctl_send_request() and ioctl_send_broadcast_request() now
share code. Broadcast speed corrected to S100. Check for proper tcode.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sun, 4 Jan 2009 15:23:29 +0000 (16:23 +0100)]
firewire: cdev: add ioctl to query maximum transmission speed
While the speed of asynchronous transactions is automatically chosen by
the kernel, the speed of isochronous streams has to be chosen by the
initiating client.
In case of 1394a bus topologies, the maximum possible speed could be
figured out with some effort by evaluation of the remote node's link
speed field in the config ROM, the local node's link speed field, and
the PHY speeds and topologic information in the local node's or IRM's
topology map CSR. However, this does not work in case of 1394b buses.
Hence add an ioctl to export the maximum speed which the kernel already
determined.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sun, 4 Jan 2009 15:23:29 +0000 (16:23 +0100)]
firewire: cdev: add ioctls for manual iso resource management
This adds ioctls for allocation and deallocation of a channel or/and
bandwidth without auto-reallocation and without auto-deallocation.
The benefit of these ioctls is that libraw1394-style isochronous
resource management can be implemented without write access to the IRM's
character device file.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
firewire: cdev: add ioctls for isochronous resource management
Based on
Date: Tue, 18 Nov 2008 11:41:27 -0500
From: Jay Fenlason <fenlason@redhat.com>
Subject: [Patch V4] Add ISO resource management support
with several changes to the ABI and implementation. Only the part of
the ABI which enables auto-reallocation and auto-deallocation is
included here.
This implements ioctls for kernel-assisted allocation of isochronous
channels and isochronous bandwidth. The benefits are:
- The client does not have to have write access to the /dev/fw* device
corresponding to the IRM.
- The client does not have to perform reallocation after bus resets.
- Channel and bandwidth are deallocated by the kernel if the file is
closed before the client deallocated the resources. Thus resources
are released even if the client crashes.
It is anticipated that future in-kernel code (firewire-core IRM code;
the firewire port of firedtv), will use the fw-iso.c portions of this
code too.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Tested-by: David Moore <dcm@acm.org>