www.pilppa.org Git - linux-2.6-omap-h63xx.git/log

sched: remove unused fields from struct rq

Impact: cleanup, new schedstat ABI

Since they are used on in statistics and are always set to zero, the
following fields from struct rq have been removed: yld_exp_empty,
yld_act_empty and yld_both_empty.

Both Sched Debug and SCHEDSTAT_VERSION versions has also been
incremented since ABIs have been changed.

The schedtop tool has been updated to properly handle new version of
schedstat:

http://rt.wiki.kernel.org/index.php/Schedtop_utility

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090324221002.GA10061@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 into devel

tracing: Documentation / sample code fixes for tracepoints

Fix the tracepoint documentation to refer to "tracepoint-sample"
instead of "tracepoint-example" to match what actually exists;
fix the directory, and clarify how to compile.

Change every instance of "example" in the sample tracepoint code
to "sample" for consistency.

Signed-off-by: Jody McIntyre <scjody@sun.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: torvalds@linux-foundation.org
LKML-Reference: <20090324200027.GH8294@clouds>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: use default_cpu_mask_to_apicid for 64bit

Impact: cleanup

Use online_mask directly on 64bit too.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49C94DAE.9070300@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

net: Add dependent headers to trace/skb.h

The tracing header needs to include definitions for the macros used and
the types referenced. This lets automated tracing tools like SystemTap
make use of the tracepoint without any specific knowledge of its
meaning (leaving that to the user).

Signed-off-by: Josh Stone <jistone@redhat.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arp_tables: ifname_compare() can assume 16bit alignment

Arches without efficient unaligned access can still perform a loop
assuming 16bit alignment in ifname_compare()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: fix set_extra_move_desc calling

Impact: fix bug with irq-descriptor moving when logical flat

Rusty observed:

> The effect of setting desc->affinity (ie. from userspace via sysfs) has varied
> over time.  In 2.6.27, the 32-bit code anded the value with cpu_online_map,
> and both 32 and 64-bit did that anding whenever a cpu was unplugged.
>
> 2.6.29 consolidated this into one routine (and fixed hotplug) but introduced
> another variation: anding the affinity with cfg->domain.  Is this right, or
> should we just set it to what the user said?  Or as now, indicate that we're
> restricting it.

Eric pointed out that desc->affinity should be what the user requested,
if it is at all possible to honor the user space request.

This bug got introduced by commit 22f65d31b "x86: Update io_apic.c to use
new cpumask API".

Fix it by moving the masking to before the descriptor moving ...

Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <49C94134.4000408@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

netfilter: trivial Kconfig spelling fixes

Supplements commit 67c0d57930ff9a24c6c34abee1b01f7716a9b0e2.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

atmel-mci: fix sdc_reg typo

This fixes a bug when setting the sdc_reg for 4-bit bus width
transactions.

Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

tmio_mmc: add maintainer

This is Ian's baby, so make a note of it in MAINTAINERS.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: Add OpenFirmware bindings for SDHCI driver

This patch adds a new driver: sdhci-of. The driver is similar to
the sdhci-pci, it contains common probe code, and controller-specific
ops and quirks.

So far there are only Freescale eSDHC ops and quirks.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add quirk for forcing maximum block size to 2048 bytes

FSL eSDHC controllers can support maximum block size up to 4096 bytes,
the MBL (Maximum Block Length) field in the capabilities register
extended by one bit, and is set to 0x3.

But the SDHCI core doesn't support blocks of 4096 bytes, and thus
forces blksz to the lowest value -- 512 bytes. With this patch we can
pin up the blksz to the maximum supported block size, i.e. 2048 bytes.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add quirk for controllers that need IRQ re-init after reset

FSL eSDHC controllers losing signal/interrupt enable states after
reset, so we should re-enable them.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add quirk for controllers that need small delays for PIO

Small udelay is needed to make eSDHC work in PIO mode. Without
the delay reading causes endless interrupt storm, and writing
corrupts data. The first guess would be that we must wait for
some bit in some register, but I didn't find any reliable bits
that change before and after the delay.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add set_clock callback and a quirk for nonstandard clocks

FSL eSDHC hosts have incompatible register map to manage the SDCLK.
This patch adds set_clock callback so that drivers could overwrite
set_clock behaviour.

Similar patch[1] was posted by Ben Dooks, though in Ben's version the
callback is named change_clock, plus the patch has some unrelated bits
that makes the patch difficult to reuse.

[1] http://lkml.org/lkml/2008/12/2/160

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add get_{max,timeout}_clock callbacks

Some controllers do not provide clock information in their capabilities
(in the Samsung case, it is because there are multiple clock sources
available to the controller). Add hooks to allow the system to supply
clock information.

p.s.
In the original Ben's patch there was a bug that makes sdhci_add_host()
return -ENODEV even if callbacks were specified. This is fixed now.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add support for hosts reporting inverted write-protect state

This patch adds SDHCI_QUIRK_INVERTED_WRITE_PROTECT quirk. When
specified, the sdhci driver will invert WP state.

p.s. Actually, the quirk is more board-specific than
controller-specific.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add support for card-detection polling

This patch adds SDHCI_QUIRK_BROKEN_CARD_DETECTION quirk. When specified,
sdhci driver will set MMC_CAP_NEEDS_POLL MMC host capability, and won't
enable card insert/remove interrupts.

This is needed for hosts with unreliable card detection, such as FSL
eSDHC. The original eSDHC driver was tring to "debounce" card-detection
IRQs by reading present state and disabling particular interrupts. But
with this debouncing scheme I noticed that sometimes we miss card
insertion/removal events.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Enable only relevant (DMA/PIO) interrupts during transfers

Some hosts (that is, FSL eSDHC) throw PIO interrupts during DMA
transfers, this causes tons of unneeded interrupts, and thus highly
degraded speed.

This patch modifies the driver so that now we only enable relevant
(DMA or PIO) interrupts during transfers.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Split card-detection IRQs management from sdhci_init()

Card detection interrupts should be handled separately as they should
not be enabled before mmc_add_host() returns and should be disabled
before calling mmc_remove_host(). The same is for suspend and resume
routines.

sdhci_init() no longer enables card-detection irqs. Instead, two new
functions implemented: sdhci_enable_card_detection() and
sdhci_disable_card_detection().

New sdhci_reinit() call implemented to behave the same way as the old
sdhci_init().

Also, this patch implements and uses few new helpers to manage IRQs in
a more conveinient way, that is:

- sdhci_clear_set_irqs()
- sdhci_unmask_irqs()
- sdhci_mask_irqs()
- SDHCI_INT_ALL_MASK constant

sdhci_enable_sdio_irq() converted to these new helpers, plus the
helpers will be used by the subsequent patches.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: Add support for bus-specific IO memory accessors

Currently the SDHCI driver works with PCI accessors (write{l,b,w} and
read{l,b,w}).

With this patch drivers may change memory accessors, so that we can
support hosts with "weird" IO memory access requirments.

For example, in "FSL eSDHC" SDHCI hardware all registers are 32 bit
width, with big-endian addressing. That is, readb(0x2f) should turn
into readb(0x2c), and readw(0x2c) should be translated to
le16_to_cpu(readw(0x2e)).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc_spi: adjust for delayed data token response

Some cards are not able to send the data token in time, but
miss the time frame for some bits(!). So synchronize to the
start of the token.

Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Wait for SDBP

It is necessary to wait for bus power before sending
any commands.

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Fix MMC3 dma

Data transfers on third OMAP3 MMC controller don't work
because DMA line numbers are only defined for MMC1 and MMC2.
Fix that and store line numbers in mmc_omap_host structure
to reduce code size.
Tested on OMAP3 pandora board.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Disable SDBP at suspend

Turn off the bus power at suspend.

Signed-off-by: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Do not prefix slot name

Allow slot_name to be the same as the other OMAP
driver, by removing the redundant "slot:" prefix.

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Allow cover switch to cause rescan

Allow a cover switch to be used to cause a rescan of the
MMC slot.

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Add 8-bit bus width mode support

Signed-off-by: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Implement scatter-gather emulation

Instead of using the bounce buffer, using scatter-gather emulation
(as in the OMAP1/2 MMC driver) removes the need of one extra memory
copy and improves performance.

Signed-off-by: Juha Yrjola <juha.yrjola@solidboot.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Fix response type for busy after response

Some MMC commands result in the card becoming busy after
the response is received. This needs to be specified
for the omap_hsmmc host controller, which is what this
patch does. However, the effect is that some commands
with no data will cause a Transfer Complete (TC) interrupt
in addition to the Command Complete (CC) interrupt.
In order to deal with that, the irq handler has needed
a few changes also.

The benefit of this change is that the omap_hsmmc host
controller driver now waits for the TC interrupt while
the card is busy, so the mmc_block driver needs to poll
the card status just once instead of repeatedly.
i.e. the net result is more sleep and less cpu.

The command sequence for open-ended multi-block write
with DMA is now:

Issue write command CMD25
Receive CC interrupt
Data is sent
Receive TC interrupt (DMA is done)
Issue stop command CMD12
Receive CC interrupt
Card is busy
Receive TC interrupt
Card is now ready for next transfer

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Do dma cleanup also with data CRC errors

Signed-off-by: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: add maintainer for mvsdio driver

Nicolas Pitre accepted to look after the mvsdio driver.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: SDIO driver for Marvell SoCs

This supports MMC/SD/SDIO currently found on the Kirkwood 88F6281 and
88F6192 SoC controllers.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

MMC: tmio_mmc.h: fix build problem

drivers/mmc/host/tmio_mmc.h: In function 'tmio_mmc_kmap_atomic':
drivers/mmc/host/tmio_mmc.h:147: error: implicit declaration of function 'kmap_atomic'
drivers/mmc/host/tmio_mmc.h:147: error: 'KM_BIO_SRC_IRQ' undeclared (first use in this function)
drivers/mmc/host/tmio_mmc.h: In function 'tmio_mmc_kunmap_atomic':
drivers/mmc/host/tmio_mmc.h:153: error: implicit declaration of function 'kunmap_atomic'
drivers/mmc/host/tmio_mmc.h:153: error: 'KM_BIO_SRC_IRQ' undeclared (first use in this function)

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

tmio_mmc: Fix use after free in remove()

Update the tmio_mmc code to call mmc_free_host() when
done using the private data. Without this fix the driver
frees memory and then keeps on using it as private data.

Signed-off-by: Magnus Damm <damm@opensource.se>
Acked-by: Ian Molton <ian@mnementh.co.uk>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

tmio_mmc: Fix one off, use resource_size() in probe()

Update the tmio_mmc code to use resource_size(). With this
patch applied the correct resource size is passed to ioremap().

Signed-off-by: Magnus Damm <damm@opensource.se>
Acked-by: Ian Molton <ian@mnementh.co.uk>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc_spi: allow higher timeouts for SPI mode

Some SD cards have very high timeouts in SPI mode.
So adjust the timeouts from theory to practice.

Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc_spi: wait more bytes for card response

Some cards are slower than the standard allows and need more
time to respond to a command. Max. observed number of bytes
was 12.

Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc_spi: allow setting of spi mode 3

Allow the platform data structures to specify spi mode 3
(if there is a pullup on the clock line or the spi hardware
is not able to serve spi mode 0).

Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdhci: change list address

Domain change of the sdhci development list.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: During unsafe resume, select the right volatge for the card

During mmc unsafe resume, choose the right voltage for the card after
powerup.

Although this has not seen to cause trouble, it's the wrong behaviour.

Signed-off-by: Balaji Rao <balajirrao@openmoko.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdio: check that addresses are within the address space

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdio: handle null tuples

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

sdio: handle cis end marker in link field

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: Initialize hsmmc controller registers when resuming

Most registers lose its state when the processor wakes up from sleep state.
Thus registers should be initialized, when the processor wakes up. However the
current hsmmc 'resume' function doesn't consider this issue and finally makes
deadlock. So this patch fixes this problem.

Signed-off-by: Kim Kyuwon <chammoru@gmail.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

omap_hsmmc: do not re-power when powering off MMC

Remove code that turns MMC1 power back on after it
has been powered off (when the voltage is 1.8V).

The offending code is not necessary because the
host controller bus voltage is initialized to
3V when probing or resuming. Note that MMC powers up
with the highest voltage available (see mmc_power_up())
which will be 3V also.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: Add Extended CSD register to debugfs

Extended CSD is a MMC card register. As increasingly interesting
fields are being added to Extended CSD, it is helpful to see its
value. Note that SD cards do not have an Extended CSD
register, so it is MMC only.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: add MODALIAS linkage for MMC/SD devices

Currently we are using an explicit udev rule to trigger loading of the
mmc-block module when an MMC or SD card is detected:

SUBSYSTEM=="mmc", RUN+="/sbin/modprobe -Qba mmc-block"

It makes much more sense for the mmc bus driver and the mmc-block module to
share MODALIAS information so that they are linked automatically.

There is no real information of use in the MMC system at the current time.
All devices inserted require us to load the mmc-block device. Until such
time as useful parameters exist simply reflect the module linkage via
the module alias below:

mmc:block

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: delayed_work was never cancelled

The delayed work item mmc_host.detect is now cancelled before flushing
the work queue. This takes care of cases when delayed_work was scheduled
for mmc_host.detect, but not yet placed in the work queue.

Signed-off-by: Jorg Schummer <ext-jorg.2.schummer@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

mmc: struct device - replace bus_id with dev_name(), dev_set_name()

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6

dnet: drivers/net/dnet.c needs <linux/io.h>

On m68k:
| drivers/net/dnet.c: In function 'dnet_readw_mac':
| drivers/net/dnet.c:36: error: implicit declaration of function 'writel'
| drivers/net/dnet.c:43: error: implicit declaration of function 'readl'
| drivers/net/dnet.c: In function 'dnet_probe':
| drivers/net/dnet.c:873: error: implicit declaration of function 'ioremap'
| drivers/net/dnet.c:873: warning: assignment makes pointer from integer without a cast
| drivers/net/dnet.c:939: error: implicit declaration of function 'iounmap'

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Btrfs: optimize fsyncs on old files

The fsync log has code to make sure all of the parents of a file are in the
log along with the file. It uses a minimal log of the parent directory
inodes, just enough to get the parent directory on disk.

If the transaction that originally created a file is fully on disk,
and the file hasn't been renamed or linked into other directories, we
can safely skip the parent directory walk. We know the file is on disk
somewhere and we can go ahead and just log that single file.

This is more important now because unrelated unlinks in the parent directory
might make us force a commit if we try to log the parent.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: tree logging unlink/rename fixes

The tree logging code allows individual files or directories to be logged
without including operations on other files and directories in the FS.
It tries to commit the minimal set of changes to disk in order to
fsync the single file or directory that was sent to fsync or O_SYNC.

The tree logging code was allowing files and directories to be unlinked
if they were part of a rename operation where only one directory
in the rename was in the fsync log.  This patch adds a few new rules
to the tree logging.

1) on rename or unlink, if the inode being unlinked isn't in the fsync
log, we must force a full commit before doing an fsync of the directory
where the unlink was done.  The commit isn't done during the unlink,
but it is forced the next time we try to log the parent directory.

Solution: record transid of last unlink/rename per directory when the
directory wasn't already logged.  For renames this is only done when
renaming to a different directory.

mkdir foo/some_dir
normal commit
rename foo/some_dir foo2/some_dir
mkdir foo/some_dir
fsync foo/some_dir/some_file

The fsync above will unlink the original some_dir without recording
it in its new location (foo2).  After a crash, some_dir will be gone
unless the fsync of some_file forces a full commit

2) we must log any new names for any file or dir that is in the fsync
log.  This way we make sure not to lose files that are unlinked during
the same transaction.

2a) we must log any new names for any file or dir during rename
when the directory they are being removed from was logged.

2a is actually the more important variant.  Without the extra logging
a crash might unlink the old name without recreating the new one

3) after a crash, we must go through any directories with a link count
of zero and redo the rm -rf

mkdir f1/foo
normal commit
rm -rf f1/foo
fsync(f1)

The directory f1 was fully removed from the FS, but fsync was never
called on f1, only its parent dir.  After a crash the rm -rf must
be replayed.  This must be able to recurse down the entire
directory tree.  The inode link count fixup code takes care of the
ugly details.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Make sure i_nlink doesn't hit zero too soon during log replay

During log replay, inodes are copied from the log to the main filesystem
btrees. Sometimes they have a zero link count in the log but they actually
gain links during the replay or have some in the main btree.

This patch updates the link count to be at least one after copying the
inode out of the log. This makes sure the inode is deleted during an
iput while the rest of the replay code is still working on it.

The log replay has fixup code to make sure that link counts are correct
at the end of the replay, so we could use any non-zero number here and
it would work fine.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: limit balancing work while flushing delayed refs

The delayed reference mechanism is responsible for all updates to the
extent allocation trees, including those updates created while processing
the delayed references.

This commit tries to limit the amount of work that gets created during
the final run of delayed refs before a commit. It avoids cowing new blocks
unless it is required to finish the commit, and so it avoids new allocations
that were not really required.

The goal is to avoid infinite loops where we are always making more work
on the final run of delayed refs. Over the long term we'll make a
special log for the last delayed ref updates as well.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: readahead checksums during btrfs_finish_ordered_io

This reads in blocks in the checksum btree before starting the
transaction in btrfs_finish_ordered_io. It makes it much more likely
we'll be able to do operations inside the transaction without
needing any btree reads, which limits transaction latencies overall.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: leave btree locks spinning more often

btrfs_mark_buffer dirty would set dirty bits in the extent_io tree
for the buffers it was dirtying.  This may require a kmalloc and it
was not atomic.  So, anyone who called btrfs_mark_buffer_dirty had to
set any btree locks they were holding to blocking first.

This commit changes dirty tracking for extent buffers to just use a flag
in the extent buffer.  Now that we have one and only one extent buffer
per page, this can be safely done without losing dirty bits along the way.

This also introduces a path->leave_spinning flag that callers of
btrfs_search_slot can use to indicate they will properly deal with a
path returned where all the locks are spinning instead of blocking.

Many of the btree search callers now expect spinning paths,
resulting in better btree concurrency overall.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Only let very young transactions grow during commit

Commits are fairly expensive, and so btrfs has code to sit around for a while
during the commit and let new writers come in.

But, while we're sitting there, new delayed refs might be added, and those
can be expensive to process as well. Unless the transaction is very very
young, it makes sense to go ahead and let the commit finish without hanging
around.

The commit grow loop isn't as important as it used to be, the fsync logging
code handles most performance critical syncs now.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Check for a blocking lock before taking the spin

This reduces contention on the extent buffer spin locks by testing for a
blocking lock before trying to take the spinlock.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: reduce stack in cow_file_range

The fs/btrfs/inode.c code to run delayed allocation during writout
needed some stack usage optimization. This is the first pass, it does
the check for compression earlier on, which allows us to do the common
(no compression) case higher up in the call chain.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: reduce stalls during transaction commit

To avoid deadlocks and reduce latencies during some critical operations, some
transaction writers are allowed to jump into the running transaction and make
it run a little longer, while others sit around and wait for the commit to
finish.

This is a bit unfair, especially when the callers that jump in do a bunch
of IO that makes all the others procs on the box wait. This commit
reduces the stalls this produces by pre-reading file extent pointers
during btrfs_finish_ordered_io before the transaction is joined.

It also tunes the drop_snapshot code to politely wait for transactions
that have started writing out their delayed refs to finish. This avoids
new delayed refs being flooded into the queue while we're trying to
close off the transaction.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: process the delayed reference queue in clusters

The delayed reference queue maintains pending operations that need to
be done to the extent allocation tree. These are processed by
finding records in the tree that are not currently being processed one at
a time.

This is slow because it uses lots of time searching through the rbtree
and because it creates lock contention on the extent allocation tree
when lots of different procs are running delayed refs at the same time.

This commit changes things to grab a cluster of refs for processing,
using a cursor into the rbtree as the starting point of the next search.
This way we walk smoothly through the rbtree.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: try to cleanup delayed refs while freeing extents

When extents are freed, it is likely that we've removed the last
delayed reference update for the extent. This checks the delayed
ref tree when things are freed, and if no ref updates area left it
immediately processes the delayed ref.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: reduce stack usage in some crucial tree balancing functions

Many of the tree balancing functions follow the same pattern.

1) cow a block
2) do something to the result

This commit breaks them up into two functions so the variables and
code required for part two don't suck down stack during part one.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: do extent allocation and reference count updates in the background

The extent allocation tree maintains a reference count and full
back reference information for every extent allocated in the
filesystem.  For subvolume and snapshot trees, every time
a block goes through COW, the new copy of the block adds a reference
on every block it points to.

If a btree node points to 150 leaves, then the COW code needs to go
and add backrefs on 150 different extents, which might be spread all
over the extent allocation tree.

These updates currently happen during btrfs_cow_block, and most COWs
happen during btrfs_search_slot.  btrfs_search_slot has locks held
on both the parent and the node we are COWing, and so we really want
to avoid IO during the COW if we can.

This commit adds an rbtree of pending reference count updates and extent
allocations.  The tree is ordered by byte number of the extent and byte number
of the parent for the back reference.  The tree allows us to:

1) Modify back references in something close to disk order, reducing seeks
2) Significantly reduce the number of modifications made as block pointers
are balanced around
3) Do all of the extent insertion and back reference modifications outside
of the performance critical btrfs_search_slot code.

#3 has the added benefit of greatly reducing the btrfs stack footprint.
The extent allocation tree modifications are done without the deep
(and somewhat recursive) call chains used in the past.

These delayed back reference updates must be done before the transaction
commits, and so the rbtree is tied to the transaction.  Throttling is
implemented to help keep the queue of backrefs at a reasonable size.

Since there was a similar mechanism in place for the extent tree
extents, that is removed and replaced by the delayed reference tree.

Yan Zheng <yan.zheng@oracle.com> helped review and fixup this code.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: don't preallocate metadata blocks during btrfs_search_slot

In order to avoid doing expensive extent management with tree locks held,
btrfs_search_slot will preallocate tree blocks for use by COW without
any tree locks held.

A later commit moves all of the extent allocation work for COW into
a delayed update mechanism, and this preallocation will no longer be
required.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

DVB: firedtv: fix printk format mismatch

Eliminate
drivers/media/dvb/firewire/firedtv-avc.c: In function 'debug_fcp':
drivers/media/dvb/firewire/firedtv-avc.c:156: warning: format '%d' expects type 'int', but argument 5 has type 'size_t'

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: constify device ID tables

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: raw1394: add sparse annotations to raw1394_compat_write

Eliminate the following warnings in raw1394_compat_write()'s error
return path, seen on x86-64 with CONFIG_COMPAT=y:

drivers/ieee1394/raw1394.c:381:17: warning: incorrect type in return expression (different address spaces)
drivers/ieee1394/raw1394.c:381:17:    expected char const [noderef] <asn:1>*
drivers/ieee1394/raw1394.c:381:17:    got void *
drivers/ieee1394/raw1394.c:2252:14: warning: incorrect type in argument 1 (different address spaces)
drivers/ieee1394/raw1394.c:2252:14:    expected void const *ptr
drivers/ieee1394/raw1394.c:2252:14:    got char const [noderef] <asn:1>*[assigned] buffer
drivers/ieee1394/raw1394.c:2253:19: warning: incorrect type in argument 1 (different address spaces)
drivers/ieee1394/raw1394.c:2253:19:    expected void const *ptr
drivers/ieee1394/raw1394.c:2253:19:    got char const [noderef] <asn:1>*[assigned] buffer

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: Storage class should be before const qualifier

The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: sbp2: follow up on "ieee1394: inherit ud vendor_id from node vendor_id"

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: core: optimize propagation of BROADCAST_CHANNEL

Cache the test result of whether a device implements BROADCAST_CHANNEL.
This minimizes traffic on the bus after each bus reset.  A majority of
devices does not implement BROADCAST_CHANNEL.

Remove busy retries; just rely on the hardware to retry requests to busy
responders.  Remove unnecessary log messages.

Rename the flag is_irm to broadcast_channel_allocated to better reflect
its meaning.  Reset the flag earlier in fw_core_handle_bus_reset.

Pass the generation down as a call parameter; that way generation can't
be newer than card->broadcast_channel_allocated and device->node_id.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: core: simplify broadcast channel allocation

fw-iso.c has channel allocation code now, use it.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: core: increase bus manager grace period

Per IEEE 1394 clause 8.4.2.5, bus manager capable nodes which are not
incumbent shall wait at least 125ms before trying to establish
themselves as bus manager.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: core: drop unused call parameters of close_transaction

All callers inserted NULL and 0 here.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: add closure to async stream ioctl

This changes the as yet unreleased FW_CDEV_IOC_SEND_STREAM_PACKET ioctl
to generate an fw_cdev_event_response event just like the other two
ioctls for asynchronous request transmission do. This way, clients get
feedback on successful or unsuccessful transmission.

This also adds input validation for length, tag, channel, sy, speed.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: simplify FW_CDEV_IOC_SEND_REQUEST return value

This changes the ioctl() return value of FW_CDEV_IOC_SEND_REQUEST and of
the as yet unreleased FW_CDEV_IOC_SEND_BROADCAST_REQUEST.  They used to
return
sizeof(struct fw_cdev_send_request *) + data_length

which is obviously a failed attempt to emulate the return value of
raw1394's respective interface which uses write() instead of ioctl().

However, the first summand, as size of a kernel pointer, is entirely
meaningless to clients and the second summand is already known to
clients.  And the result does not resemble raw1394's write() return
code anyway.

So simplify it to a constant non-negative value, i.e. 0.  The only
dangers here would be that future client implementations check for error
by ret != 0 instead of ret < 0 when running on top of an old kernel; or
that current clients interpret ret = 0 or more as failure.  But both are
hypothetical cases which don't justify to return irritating values.

While we touch this code, also remove "& 0x1f" from tcode in the call of
fw_send_request.  The tcode cannot be bigger than 0x1f at this point.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: fix race of ioctl_send_request with bus reset

The bus reset handler concurrently frees client->device->node. Use
device->node_id instead. This is equivalent to device->node->node_id
while device->generation is current.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: secure add_descriptor ioctl

The access permissions and ownership or ACL of /dev/fw* character device
files will typically be set based on the device type of the respective
nodes, as obtained by firewire-core from descriptors in the device's
configuration ROM.  An example policy is to deny write permission by
default but grant write permission to files of AV/C video and audio
devices and IIDC video devices.

The FW_CDEV_IOC_ADD_DESCRIPTOR ioctl could be used to partly subvert
such a policy:  Find a device file with relaxed permissions, use the
ioctl to add a descriptor with AV/C marker to the local node's ROM, thus
gain access to the local node's character device file.  (This is only
possible if there are udev scripts installed which actively relax
permissions for known device types and if there is a device of such a
type connected.)

Accessibility of the local node's device file is relevant to host
security if the host contains two or more IEEE 1394 link layer
controllers which are plugged into a single bus.

Therefore change the ABI to deny FW_CDEV_IOC_ADD_DESCRIPTOR if the file
belongs to a remote node.  (This change has no impact on known
implementers of the ABI:  None of them uses the ioctl yet.)

Also clarify the documentation:  The ioctl affects all local nodes, not
just one local node.

Cc: stable@kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: amendment to "add ioctl to query maximum transmission speed"

The as yet unreleased FW_CDEV_IOC_GET_SPEED ioctl puts only a single
integer into the parameter buffer. We can use ioctl()'s return value
instead.

(Also: Some whitespace change in firewire-cdev.h.)

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: broadcast channel support

This patch adds the ISO broadcast channel support that is required of a
1394a IRM. In specific, if the local device the IRM, it allocates ISO
channel 31 and sets the broadcast channel register of all devices on the
local bus to BROADCAST_CHANNEL_INITIAL | BROADCAST_CHANNEL_VALID to indicate
that channel 31 can be use for broadcast messages.

One minor complication is that on startup the local device may become IRM
before all the devices on the bus have been enumerated by the stack. Therefore
we have to keep a "the local device is IRM" flag and possibly set the
broadcast channel register of new devices at enumeration time.

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: implement asynchronous stream transmission

Allow userspace and other firewire drivers (fw-ipv4 I'm looking at
you!) to send Asynchronous Transmit Streams as described in 7.8.3 of
release 1.1 of the 1394 Open Host Controller Interface Specification.

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (tweaks)

firewire: core: normalize a function argument name

It's called "payload" rather than "data" almost everywhere in
fw-transaction.c.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: normalize a variable name

Standardize on  if (err)
                        handle_error;
           and  if (ret < 0)
                        handle_error;

Don't call a variable err if we store values in it which mean success.
Also, offset some return statements by a blank line since this how we do
it in drivers/firewire.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: core: remove condition which is always false

reread_bus_info_block() only gets to see devices whose config_rom_length
is at least 6 (ROM header, bus info block, root directory header).

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: core: move some functions

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: core: clean up includes

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: simplify a schedule_delayed_work wrapper

The kernel API documentation says that queue_delayed_work() returns 0
(only) if the work was already queued. The return codes of
schedule_delayed_work() are not documented but the same.

In init_iso_resource(), the work has never been queued yet, hence we
can assume schedule_delayed_work() to be a guaranteed success there.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: add ioctls for iso resource management, amendment

Some fixes:
  - Remove stale documentation.
  - Fix a != vs. == thinko that got in the way of channel management.
  - Try bandwidth deallocation even if channel deallocation failed.

A simplification:
  - fw_cdev_allocate_iso_resource.channels is now ordered like
    libdc1394's dc1394_iso_allocate_channel() channels_allowed
    argument.

By the way, I looked closer at cards from NEC, TI, and VIA, and noticed
that they all don't implement IEEE 1394a behaviour which is meant to
deviate from IEEE 1212's notion of lock compare-swap.  This means that
we have to do two lock transactions instead of one in many cases where
one transaction would already succeed on a fully 1394a compliant IRM.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: increment fw_cdev_version, update documentation

Necessary due to
    Date: Tue, 22 Jul 2008 23:23:40 -0700
    From: David Moore <dcm@acm.org>
    Subject: firewire: Include iso timestamp in headers when header_size > 4

Side note:  The lack of upwards compatibility sounds worse than it is.
All existing client implementations, libraw1394 and libdc1394, set
header_size = 4.  And since the ABI v1 behaviour does not offer any
advantages over the new behaviour, we deliberately do not provide the
old behaviour anymore.

Also add documentation about the format of fw_cdev_get_cycle_timer which
may be used in conjunction with the timestamp of iso packets but has a
different format.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: shut down iso context before freeing the buffer

DMA must be halted before we DMA-unmap and free the DMA buffer. Since
we cannot rely on the client to stop the context before it closes the
fd, we have to reorder fw_iso_buffer_destroy vs. fw_iso_context_destroy.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: replace some spin_lock_irqsave by spin_lock_irq

All of these functions are entered with IRQs enabled.
Hence the unconditional spin_unlock_irq can be used.

Function:                  Caller context:
    dequeue_event()            client process, via read(2)
    fill_bus_reset_event()     fw-device.c update worqueue job
    release_client_resource()  client process, via ioctl(2)
    fw_device_op_release()     client process, via close(2)

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: extend transaction payload size check

Make the size check of ioctl_send_request and
ioctl_send_broadcast_request speed dependent. Also change the error
return code from -EINVAL to -EIO to distinguish this from other errors
concerning the ioctl parameters.

Another payload size limit for which we don't check here though is the
remote node's Bus_Info_Block.max_rec.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: restrict broadcast write requests to Units Space

We don't want random users write to Memory Space (e.g. PCs with physical
DMA filters down) or to core CSRs like Reset_Start.

This does not protect SBP-2 target CSRs. But properly behaving SBP-2
targets ignore broadcast write requests to these registers, and the
maximum damage which can happen with laxer targets is DOS. But there
are ways to create DOS situations anyway if there are devices with weak
device file permissions (like audio/video devices) present at the same
bus as an SBP-2 target.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: add ioctl for broadcast write requests

Write transactions to the broadcast node ID are a convenient way to
trigger functions of multiple nodes at once.  IIDC is a protocol which
can make use of this if multiple cameras with same command_regs_base are
connected at the same bus.

Based on
    Date: Wed, 10 Sep 2008 11:32:16 -0400
    From: Jay Fenlason <fenlason@redhat.com>
    Subject: [patch] SEND_BROADCAST_REQUEST
Changes:  ioctl_send_request() and ioctl_send_broadcast_request() now
share code.  Broadcast speed corrected to S100.  Check for proper tcode.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: add ioctl to query maximum transmission speed

While the speed of asynchronous transactions is automatically chosen by
the kernel, the speed of isochronous streams has to be chosen by the
initiating client.

In case of 1394a bus topologies, the maximum possible speed could be
figured out with some effort by evaluation of the remote node's link
speed field in the config ROM, the local node's link speed field, and
the PHY speeds and topologic information in the local node's or IRM's
topology map CSR. However, this does not work in case of 1394b buses.

Hence add an ioctl to export the maximum speed which the kernel already
determined.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: add ioctls for manual iso resource management

This adds ioctls for allocation and deallocation of a channel or/and
bandwidth without auto-reallocation and without auto-deallocation.

The benefit of these ioctls is that libraw1394-style isochronous
resource management can be implemented without write access to the IRM's
character device file.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: cdev: add ioctls for isochronous resource management

Based on
    Date: Tue, 18 Nov 2008 11:41:27 -0500
    From: Jay Fenlason <fenlason@redhat.com>
    Subject: [Patch V4] Add ISO resource management support
with several changes to the ABI and implementation.  Only the part of
the ABI which enables auto-reallocation and auto-deallocation is
included here.

This implements ioctls for kernel-assisted allocation of isochronous
channels and isochronous bandwidth.  The benefits are:
  - The client does not have to have write access to the /dev/fw* device
    corresponding to the IRM.
  - The client does not have to perform reallocation after bus resets.
  - Channel and bandwidth are deallocated by the kernel if the file is
    closed before the client deallocated the resources.  Thus resources
    are released even if the client crashes.

It is anticipated that future in-kernel code (firewire-core IRM code;
the firewire port of firedtv), will use the fw-iso.c portions of this
code too.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Tested-by: David Moore <dcm@acm.org>

firewire: core: topology header fix

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>