www.pilppa.org Git - linux-2.6-omap-h63xx.git/log
linux-2.6-omap-h63xx.git
16 years agofirewire: cdev: sort includes
Stefan Richter [Sun, 4 Jan 2009 15:23:29 +0000 (16:23 +0100)]
firewire: cdev: sort includes

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: cdev: unify names of struct types and of their instances
Stefan Richter [Sun, 4 Jan 2009 15:23:29 +0000 (16:23 +0100)]
firewire: cdev: unify names of struct types and of their instances

to indicate that they are specializations of struct event or of struct
client_resource, respectively.

struct response was both an event and a client_resource; it is now split
into struct outbound_transaction_resource and ~_event in order to
document more explicitly which types of client resources exist.

struct request and struct request_event are renamed to struct
inbound_transaction_resource and ~_event because requests and responses
occur in outbound and in inbound transactions.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: cdev: reference-count client instances
Stefan Richter [Sun, 4 Jan 2009 15:23:29 +0000 (16:23 +0100)]
firewire: cdev: reference-count client instances

The lifetime of struct client instances must be longer than the lifetime
of any client resource.

This fixes a possible race between fw_device_op_release and transaction
completions.  It also prepares for new ioctls for isochronous resource
management which will involve delayed processing of client resources.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reviewed-by: David Moore <dcm@acm.org>
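
A minimal sketch of the lifetime rule described in the commit above, not
the driver's code.  The struct layout and the helper names client_get(),
client_put() and client_release() are assumptions made for illustration,
and a plain int stands in for whatever counter type the kernel uses, so
the sketch is single-threaded only.

```c
/* Hypothetical stand-in for the per-open-file client state. */
struct client {
	int refcount;	/* one reference per holder (file, pending resource) */
	int released;	/* set once the last reference is gone (demo only) */
};

/* Called when the last reference is dropped; real code would free here. */
static void client_release(struct client *c)
{
	c->released = 1;
}

/* Take an additional reference, e.g. for a pending transaction. */
struct client *client_get(struct client *c)
{
	c->refcount++;
	return c;
}

/* Drop a reference; tear the client down only when nobody holds it. */
void client_put(struct client *c)
{
	if (--c->refcount == 0)
		client_release(c);
}
```

With this shape, closing the file merely drops the file's reference.  A
transaction that completes later still finds a valid client and drops
the final reference itself.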
16 years agofirewire: cdev: fix documentation of FW_CDEV_IOC_GET_INFO
Stefan Richter [Fri, 2 Jan 2009 11:47:13 +0000 (12:47 +0100)]
firewire: cdev: fix documentation of FW_CDEV_IOC_GET_INFO

The FW_CDEV_IOC_GET_INFO ioctl looks at client->device->config_rom, not
at the local node's config ROM.

We could fix the implementation or the documentation.  I believe the
current implementation is more useful than the currently documented
behaviour.  In fact, libdc1394 already uses the ABI as implemented, not
as documented.  Hence let's change the documentation.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: prevent creation of multiple IR DMA contexts for the same channel
Stefan Richter [Sun, 21 Dec 2008 15:39:46 +0000 (16:39 +0100)]
firewire: prevent creation of multiple IR DMA contexts for the same channel

OHCI-1394 1.1 clause 10.4.3 says:  "If more than one IR DMA context
specifies receives for packets from the same isochronous channel, the
context destination for that channel's packets is undefined."

Any userspace client and in the future also kernelspace clients can
allocate IR DMA contexts for any channel.  We don't want them to
interfere with each other, hence it is preferable to return -EBUSY if
allocation of a second context for a channel is attempted.

Notes:
  - This limitation is OHCI-1394 specific, therefore its proper place of
    implementation is down in the low-level driver.

  - Since the <linux/firewire-cdev.h> ABI simply maps one userspace iso
    client context to one hardware iso context, this OHCI-1394
    limitation alas requires userspace to implement its own multiplexing
    of iso reception from the same channel and card to multiple clients
    when needed.

  - The limitation is independent of channel allocation at the IRM; the
    latter is really only important for the initiation of iso
    transmission but not of iso reception.

  - We don't need to do the same for IT DMA because OHCI-1394 does not
    have any ties between IT contexts and channels.  Only the voluntary
    channel allocation protocol via the IRM, globally to the FireWire
    bus, can ensure proper isochronous transmit behaviour anyway.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
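
The -EBUSY rule from the commit above can be sketched as a per-card
bitmap with one bit per isochronous channel.  This is an illustration
under assumptions: struct card_state, claim_ir_channel() and
release_ir_channel() are hypothetical names, and the -EINVAL return for
an out-of-range channel is an addition of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-card state: bit n is set while channel n is claimed. */
struct card_state {
	uint64_t ir_channels_in_use;
};

/* Returns 0 on success, -EINVAL for a bad channel, -EBUSY if claimed. */
int claim_ir_channel(struct card_state *card, int channel)
{
	uint64_t bit;

	if (channel < 0 || channel > 63)
		return -EINVAL;
	bit = UINT64_C(1) << channel;
	if (card->ir_channels_in_use & bit)
		return -EBUSY;	/* a second context for this channel */
	card->ir_channels_in_use |= bit;
	return 0;
}

/* Makes the channel available to the next receive context. */
void release_ir_channel(struct card_state *card, int channel)
{
	if (channel >= 0 && channel <= 63)
		card->ir_channels_in_use &= ~(UINT64_C(1) << channel);
}
```

A second claim of the same channel fails until the first holder releases
it; claims of other channels are unaffected.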
16 years agofirewire: cdev: use list_first_entry
Stefan Richter [Sun, 21 Dec 2008 15:49:57 +0000 (16:49 +0100)]
firewire: cdev: use list_first_entry

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: core: remove unused definitions
Stefan Richter [Sun, 14 Dec 2008 20:47:36 +0000 (21:47 +0100)]
firewire: core: remove unused definitions

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: remove line breaks before function names
Stefan Richter [Sun, 14 Dec 2008 20:47:04 +0000 (21:47 +0100)]
firewire: remove line breaks before function names

type
    function_name(parameters);

is nice to look at but was not used consistently.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: standardize a variable name
Stefan Richter [Sun, 14 Dec 2008 20:45:45 +0000 (21:45 +0100)]
firewire: standardize a variable name

"ret" is the new "retval".

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: core: remove obsolete assertions
Stefan Richter [Sun, 14 Dec 2008 20:45:14 +0000 (21:45 +0100)]
firewire: core: remove obsolete assertions

This code never changes.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: core: remove outdated comment
Stefan Richter [Sun, 14 Dec 2008 18:21:31 +0000 (19:21 +0100)]
firewire: core: remove outdated comment

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: cdev: address handler input validation
Stefan Richter [Sun, 14 Dec 2008 18:21:01 +0000 (19:21 +0100)]
firewire: cdev: address handler input validation

Like before my commit 1415d9189e8c59aa9c77a3bba419dcea062c145f,
fw_core_add_address_handler() does not align the address region now.
Instead the caller is required to pass valid parameters.

Since one of the callers of fw_core_add_address_handler() is the cdev
userspace interface, we now check for valid input.  If the client is
buggy, we give it a hint with -EINVAL.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
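
A sketch of the kind of check a caller might perform before registering
an address handler.  The commit message does not list the exact rules,
so the ones below (quadlet-aligned start, non-zero length that is a
multiple of 4, region inside a 48-bit address space) are assumptions,
and check_address_region() is a hypothetical helper.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed for this sketch: addresses are 48 bits wide. */
#define ADDR_SPACE_END	(UINT64_C(1) << 48)

/* Returns 0 if the region looks valid, -EINVAL otherwise. */
int check_address_region(uint64_t start, uint64_t length)
{
	/* Both start and length must be multiples of one quadlet. */
	if (length == 0 || (length & 3) || (start & 3))
		return -EINVAL;
	/* The whole region must fit below the end of the address space. */
	if (start >= ADDR_SPACE_END || length > ADDR_SPACE_END - start)
		return -EINVAL;
	return 0;
}
```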
16 years agofirewire: cdev: use an idr rather than a linked list for resources
Jay Fenlason [Sun, 21 Dec 2008 15:47:17 +0000 (16:47 +0100)]
firewire: cdev: use an idr rather than a linked list for resources

The current code uses a linked list and a counter for storing
resources and the corresponding handle numbers.  By changing to an idr
we can be safe from counter wrap-around giving two resources the same
handle.

Furthermore, the deallocation ioctls now check whether the resource to
be freed is of the intended type.

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Some rework by Stefan R:
  - The idr API documentation says we get an ID within 0...0x7fffffff.
    Hence we can rest assured that idr handles fit into cdev handles.
  - Fix some races.  Add a client->in_shutdown flag for this purpose.
  - Add allocation retry to add_client_resource().
  - It is possible to use idr_for_each() in fw_device_op_release().
  - Fix ioctl_send_response() regression.
  - Small style changes.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
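
To illustrate the two properties named in the commit above (no handle is
ever handed out twice while live, and deallocation checks the resource
type), here is a small table-based sketch.  It is not the idr API: the
fixed-size table, the enum values and the function names are all
hypothetical.

```c
#include <errno.h>

#define MAX_HANDLES 8	/* tiny on purpose; a real idr grows on demand */

enum resource_type { RES_NONE = 0, RES_ADDRESS_HANDLER, RES_DESCRIPTOR };

struct handle_table {
	enum resource_type type[MAX_HANDLES];	/* RES_NONE marks a free slot */
};

/* Returns the lowest free handle, or -ENOSPC when the table is full. */
int handle_alloc(struct handle_table *t, enum resource_type type)
{
	for (int h = 0; h < MAX_HANDLES; h++) {
		if (t->type[h] == RES_NONE) {
			t->type[h] = type;
			return h;
		}
	}
	return -ENOSPC;
}

/* Frees a handle only if it is live and of the type the caller expects. */
int handle_release(struct handle_table *t, int h, enum resource_type expected)
{
	if (h < 0 || h >= MAX_HANDLES)
		return -EINVAL;
	if (t->type[h] == RES_NONE || t->type[h] != expected)
		return -EINVAL;
	t->type[h] = RES_NONE;
	return 0;
}
```

Because allocation searches for a free slot instead of incrementing a
counter, a wrap-around cannot produce two live resources with the same
handle.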
16 years agofirewire: cdev: fix race of fw_device_op_release with bus reset
Stefan Richter [Sun, 14 Dec 2008 18:19:23 +0000 (19:19 +0100)]
firewire: cdev: fix race of fw_device_op_release with bus reset

Unlink the client from the fw_device earlier in order to prevent bus
reset events being added to client->event_list during shutdown.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: cdev: tcodes input validation
Stefan Richter [Fri, 5 Dec 2008 21:44:42 +0000 (22:44 +0100)]
firewire: cdev: tcodes input validation

The behaviour of fw-transaction.c::fw_send_request is ill-defined for
any tcodes other than read/write/lock request tcodes.  Therefore
prevent requests with wrong tcodes from entering the transaction layer.

Maybe fw_send_request should check them itself, but I am not inclined to
change it and fw_fill_request from void-valued functions to ones which
return error codes and pass those up.  Besides, maybe fw_send_request is
going to support one more tcode than ioctl_send_request in the future
(TCODE_STREAM_DATA).

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
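
A sketch of such a tcode filter.  The numeric values are the wire-level
IEEE 1394 transaction codes; the TC_* names, the function name and the
-EINVAL return value are assumptions of this sketch.  How the driver
itself encodes lock requests is not stated in the commit message, so the
plain wire-level lock request code is used here.

```c
#include <errno.h>

/* IEEE 1394 transaction codes; the names are local to this sketch. */
enum {
	TC_WRITE_QUADLET_REQUEST = 0x0,
	TC_WRITE_BLOCK_REQUEST   = 0x1,
	TC_READ_QUADLET_REQUEST  = 0x4,
	TC_READ_BLOCK_REQUEST    = 0x5,
	TC_LOCK_REQUEST          = 0x9,
	TC_STREAM_DATA           = 0xa,
};

/* Accepts read/write/lock requests only; everything else is rejected. */
int check_request_tcode(int tcode)
{
	switch (tcode) {
	case TC_WRITE_QUADLET_REQUEST:
	case TC_WRITE_BLOCK_REQUEST:
	case TC_READ_QUADLET_REQUEST:
	case TC_READ_BLOCK_REQUEST:
	case TC_LOCK_REQUEST:
		return 0;
	default:
		return -EINVAL;	/* responses, stream data, reserved codes */
	}
}
```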
16 years agofirewire: cdev: documentation fixlet
Stefan Richter [Fri, 5 Dec 2008 21:43:41 +0000 (22:43 +0100)]
firewire: cdev: documentation fixlet

Reported-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: convert client_list_lock to mutex
Stefan Richter [Sun, 5 Oct 2008 08:37:11 +0000 (10:37 +0200)]
firewire: convert client_list_lock to mutex

So far it is only taken in non-atomic contexts.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: add a client_list_lock
Jay Fenlason [Fri, 3 Oct 2008 15:19:09 +0000 (11:19 -0400)]
firewire: add a client_list_lock

This adds a client_list_lock, which only protects the device's
client_list, so that future versions of the driver can call code that
takes the card->lock while holding the client_list_lock.  Adding this
lock is much simpler than adding __ versions of all the functions that
the future version may need.  The one ordering issue is to make sure
code never takes the client_list_lock with card->lock held.  Since
client_list_lock is only used in three places, that isn't hard.

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Update fill_bus_reset_event() accordingly.  Include linux/spinlock.h.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agofirewire: Include iso timestamp in headers when header_size > 4
David Moore [Wed, 23 Jul 2008 06:23:40 +0000 (23:23 -0700)]
firewire: Include iso timestamp in headers when header_size > 4

Previously, when an iso context had header_size > 4, the iso header
(len/tag/channel/tcode/sy) was passed to userspace followed by quadlets
stripped from the payload.  This patch changes the behavior:
header_size = 8 now passes the header quadlet followed by the timestamp
quadlet.  When header_size > 8, quadlets are stripped from the payload.
The header_size = 4 case remains identical.

Since this alters the semantics of the API, the firewire API version
needs to be bumped concurrently with this change.

This change also refactors the header copying code slightly to be much
easier to read.

Signed-off-by: David Moore <dcm@acm.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
16 years agogenirq: provide old request_irq() for CONFIG_GENERIC_HARDIRQ=n
Thomas Gleixner [Tue, 24 Mar 2009 19:27:39 +0000 (20:27 +0100)]
genirq: provide old request_irq() for CONFIG_GENERIC_HARDIRQ=n

Impact: Undo compile breakage for archs with CONFIG_GENERIC_HARDIRQ=n

The threaded interrupt handler patches changed request_irq from extern
to inline. Architectures which do not use the generic irq code still
have request_irq() as a global function and therefore fail to compile.

Keep the extern declaration for CONFIG_GENERIC_HARDIRQ=n

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Felix Blyakher [Tue, 24 Mar 2009 19:25:34 +0000 (14:25 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

16 years agoucc_geth: Fix build breakage caused by a merge
Anton Vorontsov [Tue, 24 Mar 2009 19:06:46 +0000 (12:06 -0700)]
ucc_geth: Fix build breakage caused by a merge

This patch fixes the following build error:

  CC      ucc_geth.o
ucc_geth.c: In function 'ucc_geth_probe':
ucc_geth.c:3644: error: implicit declaration of function 'uec_mdio_bus_name'
make[2]: *** [ucc_geth.o] Error 1

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoomap mmc: Remove power_pin
Ladislav Michl [Mon, 23 Mar 2009 17:47:39 +0000 (17:47 +0000)]
omap mmc: Remove power_pin

On Tue, Jan 13, 2009 at 03:43:44PM +0200, Tony Lindgren wrote:
> > diff --git a/arch/arm/plat-omap/include/mach/mmc.h b/arch/arm/plat-omap/include/mach/mmc.h
> > index 031250f..1129e97 100644
> > --- a/arch/arm/plat-omap/include/mach/mmc.h
> > +++ b/arch/arm/plat-omap/include/mach/mmc.h
> > @@ -51,7 +51,6 @@ struct omap_mmc_platform_data {
> >    * not supported */
> >   int (* init)(struct device *dev);
> >   void (* cleanup)(struct device *dev);
> > - void (* shutdown)(struct device *dev);
> >
> >   /* To handle board related suspend/resume functionality for MMC */
> >   int (*suspend)(struct device *dev, int slot);
> > @@ -77,10 +76,6 @@ struct omap_mmc_platform_data {
> >
> >   /* use the internal clock */
> >   unsigned internal_clock:1;
> > - s16 power_pin;
> > -
> > - int switch_pin; /* gpio (card detect) */
> > - int gpio_wp; /* gpio (write protect) */
> >
> >   int (* set_bus_mode)(struct device *dev, int slot, int bus_mode);
> >   int (* set_power)(struct device *dev, int slot, int power_on, int vdd);
>
> Hmm, aren't switch_pin and gpio_wp  used at least in the
> mmc-twl4030.c?

Yes, they are. I missed them completely. Sorry.

> I guess they could be internal to mmc-twl4030.c if not used
> in the drivers directly.

They could, but that's a bit more complicated. Will look at it later.

> > diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c
> > index 67d7b7f..84de289 100644
> > --- a/drivers/mmc/host/omap.c
> > +++ b/drivers/mmc/host/omap.c
> > @@ -157,8 +157,6 @@ struct mmc_omap_host {
> >   struct timer_list dma_timer;
> >   unsigned dma_len;
> >
> > - short power_pin;
> > -
> >   struct mmc_omap_slot    *slots[OMAP_MMC_MAX_SLOTS];
> >   struct mmc_omap_slot    *current_slot;
> >   spinlock_t              slot_lock;
> >
>
> Looks like power_pin could go though.

Updated patch follows

Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
16 years agotracing: use union for multi-usages field
Lai Jiangshan [Tue, 24 Mar 2009 05:38:06 +0000 (13:38 +0800)]
tracing: use union for multi-usages field

Impact: cleanup

struct dyn_ftrace::ip has different usages during its lifecycle, so
we use a union for it. The same goes for struct dyn_ftrace::flags.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C871BE.3080405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
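
The idea of the commit above, sketched on a hypothetical record type:
a field that means different things at different stages of an object's
life is declared as a union, so each use gets its own name.  The member
names freelist and newlist and the helper functions are assumptions of
this sketch, and anonymous unions need C11 or a compiler extension.

```c
#include <stddef.h>

/* Hypothetical record, loosely modelled on the description above. */
struct rec {
	union {
		unsigned long ip;	/* while live: address of the call site */
		struct rec *freelist;	/* while free: next record on the free list */
	};
	union {
		unsigned long flags;	/* while live: state bits */
		struct rec *newlist;	/* while queued: next newly found record */
	};
};

/* Puts a record back on the free list; its ip value is dead from here on. */
void rec_free(struct rec **free_head, struct rec *r)
{
	r->freelist = *free_head;
	*free_head = r;
}

/* Takes a record off the free list and gives the storage its live meaning. */
struct rec *rec_alloc(struct rec **free_head, unsigned long ip)
{
	struct rec *r = *free_head;

	if (r == NULL)
		return NULL;
	*free_head = r->freelist;	/* read the link before overwriting it */
	r->ip = ip;
	r->flags = 0;
	return r;
}
```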
16 years agoftrace: show virtual PID
Lai Jiangshan [Tue, 24 Mar 2009 03:03:01 +0000 (11:03 +0800)]
ftrace: show virtual PID

Impact: fix PID output under namespaces

When the current namespace is not the global namespace, the
PID read from set_ftrace_pid is not correct.

 # ~/newpid_namespace_run bash
 # echo $$
 1
 # echo 1 > set_ftrace_pid
 # cat set_ftrace_pid
 3756

Since we write a virtual PID to set_ftrace_pid, we need to get
the virtual PID when we read it.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C84D65.9050606@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agofunction-graph: add option for include sleep times
Steven Rostedt [Tue, 24 Mar 2009 15:06:24 +0000 (11:06 -0400)]
function-graph: add option for include sleep times

Impact: give user a choice to show times spent while sleeping

The user may want to see the time a function spent sleeping.
This patch adds the trace option "sleep-time" to allow that.
The "sleep-time" option is default on.

 echo sleep-time > /debug/tracing/trace_options

produces:

 ------------------------------------------
 2)  avahi-d-3428  =>    <idle>-0
 ------------------------------------------

 2)               |      finish_task_switch() {
 2)   0.621 us    |        _spin_unlock_irq();
 2)   2.202 us    |      }
 2) ! 1002.197 us |    }
 2) ! 1003.521 us |  }

whereas,

 echo nosleep-time > /debug/tracing/trace_options

produces:

 0)    <idle>-0    =>  yum-upd-3416
 ------------------------------------------

 0)               |              finish_task_switch() {
 0)   0.643 us    |                _spin_unlock_irq();
 0)   2.342 us    |              }
 0) + 41.302 us   |            }
 0) + 42.453 us   |          }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
16 years agopowerpc/83xx: Update ranges in gianfar node to match other dts
Kumar Gala [Tue, 24 Mar 2009 13:23:13 +0000 (08:23 -0500)]
powerpc/83xx: Update ranges in gianfar node to match other dts

The gianfar@25000 node was missing its ranges property for the mdio bus.
Add it, and provide an explicit ranges property on gianfar@24000, to match
the change from commit:

commit 70b3adbba056f5d9081f1ec9b4a629e3c7502072
Author: Anton Vorontsov <avorontsov@ru.mvista.com>
Date:   Thu Mar 19 21:01:45 2009 +0300

    powerpc/83xx: Move gianfar mdio nodes under the ethernet nodes

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
16 years agoMerge branch 'x86/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder...
Ingo Molnar [Tue, 24 Mar 2009 14:20:51 +0000 (15:20 +0100)]
Merge branch 'x86/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6-tiptop into x86/cleanups

16 years agoMerge branches 'x86/apic', 'x86/cleanups', 'x86/mm', 'x86/pat', 'x86/setup' and ...
Ingo Molnar [Tue, 24 Mar 2009 14:19:45 +0000 (15:19 +0100)]
Merge branches 'x86/apic', 'x86/cleanups', 'x86/mm', 'x86/pat', 'x86/setup' and 'x86/signal'; commit 'v2.6.29' into x86/core

16 years ago[MTD] ofpart: Check name property to determine partition nodes.
Benjamin Krill [Fri, 23 Jan 2009 16:18:05 +0000 (17:18 +0100)]
[MTD] ofpart: Check name property to determine partition nodes.

SLOF has a further node which could not be evaluated by the current
routine. The current routine returns because the node lacks the
required reg property. As a fix, this patch adds a check to identify
the partition child nodes. If a node is not a partition, the total
number of partitions is decreased and the loop continues with the
next node.

Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
16 years agopowerpc/86xx: Move gianfar mdio nodes under the ethernet nodes
Anton Vorontsov [Thu, 19 Mar 2009 18:01:51 +0000 (21:01 +0300)]
powerpc/86xx: Move gianfar mdio nodes under the ethernet nodes

Currently it doesn't matter where the mdio nodes are placed, but with
power management support (i.e. when sleep = <> properties will take
effect), mdio nodes placement will become important: mdio controller
is a part of the ethernet block, so the mdio nodes should be placed
correctly. Otherwise we may wrongly assume that MDIO controllers are
available during sleep.

Suggested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
16 years agopowerpc/85xx: Move gianfar mdio nodes under the ethernet nodes
Anton Vorontsov [Thu, 19 Mar 2009 18:01:48 +0000 (21:01 +0300)]
powerpc/85xx: Move gianfar mdio nodes under the ethernet nodes

Currently it doesn't matter where the mdio nodes are placed, but with
power management support (i.e. when sleep = <> properties will take
effect), mdio nodes placement will become important: mdio controller
is a part of the ethernet block, so the mdio nodes should be placed
correctly. Otherwise we may wrongly assume that MDIO controllers are
available during sleep.

Suggested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
16 years agopowerpc/83xx: Move gianfar mdio nodes under the ethernet nodes
Anton Vorontsov [Thu, 19 Mar 2009 18:01:45 +0000 (21:01 +0300)]
powerpc/83xx: Move gianfar mdio nodes under the ethernet nodes

Currently it doesn't matter where the mdio nodes are placed, but with
power management support (i.e. when sleep = <> properties will take
effect), mdio nodes placement will become important: mdio controller
is a part of the ethernet block, so the mdio nodes should be placed
correctly. Otherwise we may wrongly assume that MDIO controllers are
available during sleep.

Suggested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
16 years agopowerpc/83xx: Add power management support for MPC837x boards
Anton Vorontsov [Thu, 19 Mar 2009 18:01:42 +0000 (21:01 +0300)]
powerpc/83xx: Add power management support for MPC837x boards

This patch adds pmc nodes to the device tree files so that the boards
will be able to use the standby capability of MPC837x processors. The
MPC837x PMC controllers are compatible with MPC8349 ones (i.e. no deep
sleep).

sleep = <> properties are used to specify SCCR masks as described
in "Specifying Device Power Management Information (sleep property)"
chapter in Documentation/powerpc/booting-without-of.txt.

Since I2C1 and eSDHC controllers share the same clock source, they
are now placed under sleep-nexus nodes.

A processor is able to wake up the boards on LAN events (Wake-on-LAN),
console events (with the no_console_suspend kernel command line option),
GPIO events and external IRQs (IRQ1 and IRQ2).

The processor can also wake up the boards via the fourth general purpose
timer in the GTM1 block, but GTM wakeup support isn't implemented yet
(it's tested to work, but it's unclear how we can use the quite short
GTM timers, and how we want to expose the GTM to userspace).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
16 years agofunction-graph: ignore times across schedule
Steven Rostedt [Tue, 24 Mar 2009 05:10:15 +0000 (01:10 -0400)]
function-graph: ignore times across schedule

Impact: more accurate timings

The current method of function graph tracing does not take into
account the time spent when a task is not running. This shows functions
that call schedule have increased costs:

 3) + 18.664 us   |      }
 ------------------------------------------
 3)    <idle>-0    =>  kblockd-123
 ------------------------------------------

 3)               |      finish_task_switch() {
 3)   1.441 us    |        _spin_unlock_irq();
 3)   3.966 us    |      }
 3) ! 2959.433 us |    }
 3) ! 2961.465 us |  }

This patch uses the tracepoint in the scheduling context switch to
account for time that has elapsed while a task is scheduled out.
Now we see:

 ------------------------------------------
 3)    <idle>-0    =>  edac-po-1067
 ------------------------------------------

 3)               |      finish_task_switch() {
 3)   0.685 us    |        _spin_unlock_irq();
 3)   2.331 us    |      }
 3) + 41.439 us   |    }
 3) + 42.663 us   |  }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
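
One way to get the effect shown above, sketched outside the kernel: when
a task is switched back in, shift the entry timestamp of every function
still on its call stack forward by the time the task spent off the CPU.
The struct and the on_*() hooks are hypothetical names for illustration;
the commit message only says that the context switch tracepoint is used
to account for the elapsed time.

```c
#include <stdint.h>

#define DEPTH_MAX 16

/* Hypothetical per-task tracing state. */
struct task_trace {
	uint64_t calltime[DEPTH_MAX];	/* entry times of functions not yet returned */
	int depth;
	uint64_t sched_out;		/* when the task was last switched out */
};

/* Function entry: remember when it started. Returns -1 if the stack is full. */
int on_entry(struct task_trace *t, uint64_t now)
{
	if (t->depth >= DEPTH_MAX)
		return -1;
	t->calltime[t->depth++] = now;
	return 0;
}

void on_switch_out(struct task_trace *t, uint64_t now)
{
	t->sched_out = now;
}

/* Push all pending entry times forward by the time spent scheduled out. */
void on_switch_in(struct task_trace *t, uint64_t now)
{
	uint64_t slept = now - t->sched_out;

	for (int i = 0; i < t->depth; i++)
		t->calltime[i] += slept;
}

/* Function return: duration excludes any time spent scheduled out. */
uint64_t on_return(struct task_trace *t, uint64_t now)
{
	if (t->depth == 0)
		return 0;
	return now - t->calltime[--t->depth];
}
```

For example, a function entered at time 100, switched out at 110,
switched back in at 3000 and returning at 3040 is charged 50 time units
instead of 2940.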
16 years agofunction-graph: prevent more than one tracer registering
Steven Rostedt [Tue, 24 Mar 2009 04:18:31 +0000 (00:18 -0400)]
function-graph: prevent more than one tracer registering

Impact: prevent crash due to multiple function graph tracers

The function graph tracer can currently only handle a single tracer
being registered. If another tracer registers with the function
graph tracer it can crash the system.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
16 years agofunction-graph: moved the timestamp from arch to generic code
Steven Rostedt [Tue, 24 Mar 2009 03:38:49 +0000 (23:38 -0400)]
function-graph: moved the timestamp from arch to generic code

This patch moves the timestamp from the arch-specific code into the
generic code. This gives the tracer better control over time
manipulation.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
16 years agonetfilter: nf_conntrack: Reduce conntrack count in nf_conntrack_free()
Eric Dumazet [Tue, 24 Mar 2009 13:26:50 +0000 (14:26 +0100)]
netfilter: nf_conntrack: Reduce conntrack count in nf_conntrack_free()

We use RCU to defer freeing of conntrack structures. In a DoS situation, RCU
might accumulate about 10,000 elements per CPU in its internal queues. To get
accurate conntrack counts (at the expense of slightly more RAM used), we can
have the conntrack counter ignore elements that are about to be freed and are
still waiting in RCU queues. We thus decrement it in nf_conntrack_free(), not
in the RCU callback.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Tested-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
16 years ago[ARM] Kirkwood: fail the probe if internal RTC does not work
Nicolas Pitre [Tue, 24 Mar 2009 00:42:29 +0000 (20:42 -0400)]
[ARM] Kirkwood: fail the probe if internal RTC does not work

Having an RTC that doesn't maintain proper time across a reboot is one
thing.  But an RTC that doesn't work at all and only causes timeouts is
another.

Tested-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
16 years agotracing: fix memory leak in trace_stat
Steven Rostedt [Sat, 21 Mar 2009 06:44:50 +0000 (02:44 -0400)]
tracing: fix memory leak in trace_stat

If the function profiler does not have any items recorded and one were
to cat the function stat file, the kernel would take a BUG with a NULL
pointer dereference.

Looking further into this, I found that returning NULL from stat_start
did not stop the stat logic, and would later call stat_next. This breaks
from the way seq_file works, so I looked into fixing the stat code.

This is where I noticed that the last next_entry is never freed.
It is allocated, and if the stat_next returns NULL, the code breaks out
of the loop, unlocks the mutex and exits. We never link the next_entry
nor do we free it. Thus it is a real memory leak.

This patch rearranges the code a bit to not only fix the memory leak,
but also to act more like seq_file where nothing is printed if there
is nothing to print. That is, stat_start returns NULL.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
16 years agoblktrace: print human-readable act_mask
Li Zefan [Tue, 24 Mar 2009 09:43:30 +0000 (17:43 +0800)]
blktrace: print human-readable act_mask

Impact: new feature, allow symbolic values in /debug/tracing/act_mask

Print stringified act_mask instead of hex value:

 # cat act_mask
 read,write,barrier,sync,queue,requeue,issue,complete,fs,pc,ahead,meta,
 discard,drv_data
 # echo "meta,write" > act_mask
 # cat act_mask
 write,meta

Also:
 - make act_mask accept "ahead", "meta", "discard" and "drv_data"
 - use strsep() instead of strchr() to parse user input
 - return -EINVAL if a token is not found in the mask map
 - fix a bug where 'value' is unsigned, so it can never be < 0
 - propagate the error value of blk_trace_mask2str() to userspace instead
   of always returning -ENXIO.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C8AB42.1000802@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
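
A userspace sketch of parsing a comma-separated mask string and
rejecting unknown tokens with -EINVAL, as the commit above describes.
It uses standard strcspn() rather than the kernel's strsep(), handles
only a small subset of names, and the bit positions in mask_map are
assumptions of this sketch rather than values taken from the commit.

```c
#include <errno.h>
#include <string.h>

struct mask_name {
	const char *name;
	unsigned int bit;
};

/* Illustrative subset; the bit positions are assumed. */
static const struct mask_name mask_map[] = {
	{ "read",    1u << 0 },
	{ "write",   1u << 1 },
	{ "meta",    1u << 12 },
	{ "discard", 1u << 13 },
};

/* Parses "a,b,c" into *mask. Returns 0, or -EINVAL on an unknown token. */
int parse_act_mask(const char *s, unsigned int *mask)
{
	const size_t n = sizeof(mask_map) / sizeof(mask_map[0]);
	unsigned int result = 0;

	while (*s) {
		size_t len = strcspn(s, ",");	/* length of the next token */
		size_t i;

		for (i = 0; i < n; i++) {
			if (strlen(mask_map[i].name) == len &&
			    strncmp(mask_map[i].name, s, len) == 0)
				break;
		}
		if (i == n)
			return -EINVAL;		/* token not in the map */
		result |= mask_map[i].bit;
		s += len;
		if (*s == ',')
			s++;			/* skip the separator */
	}
	*mask = result;		/* only written when every token was valid */
	return 0;
}
```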
16 years agoblktrace: fix t_error()
Li Zefan [Tue, 24 Mar 2009 08:05:51 +0000 (16:05 +0800)]
blktrace: fix t_error()

Impact: fix error flag output

t_error() should return t->error, not t->sector.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C8945F.5020802@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoblktrace: fix wrong calculation of RWBS
Li Zefan [Tue, 24 Mar 2009 08:05:06 +0000 (16:05 +0800)]
blktrace: fix wrong calculation of RWBS

Impact: fix the output of IO type category characters

Trace categories are the upper 16 bits, not the lower 16 bits.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C89432.8010805@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
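
The bit layout stated in the commit above, as a small sketch: the
category mask sits in the upper 16 bits of a 32-bit word and has to be
shifted down before it is tested.  The macro and function names are
hypothetical.

```c
#include <stdint.h>

#define TC_SHIFT 16	/* categories occupy the upper 16 bits */

/* Upper half: the trace category mask. */
uint16_t trace_categories(uint32_t what)
{
	return (uint16_t)(what >> TC_SHIFT);
}

/* Lower half: everything that is not a category bit. */
uint16_t trace_low_bits(uint32_t what)
{
	return (uint16_t)(what & 0xffffu);
}
```

Testing category bits against the unshifted lower half is the kind of
mistake the commit describes.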
16 years agoblktrace: mark ddir_act[] const
Li Zefan [Tue, 24 Mar 2009 08:04:37 +0000 (16:04 +0800)]
blktrace: mark ddir_act[] const

Impact: cleanup

ddir_act and what2act always stay immutable.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C89415.5080503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoloop: support barrier writes
Nikanth Karthikesan [Tue, 24 Mar 2009 11:29:54 +0000 (12:29 +0100)]
loop: support barrier writes

Honour barrier requests in the loop back block device driver.
In case of barrier bios, flush the backing file once before processing the
barrier and once after to guarantee ordering. For filesystems that do not
support fsync, barrier bios are failed with -EOPNOTSUPP.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agobsg: add support for tail queuing
Boaz Harrosh [Tue, 24 Mar 2009 11:23:40 +0000 (12:23 +0100)]
bsg: add support for tail queuing

Currently, in behaviour inherited from sg.c, bsg submits asynchronous
requests at the head of the queue (using "at_head" set in the call to
blk_execute_rq_nowait()). This is bad in situations where the queues
are full: requests execute out of order, which can starve the requests
that were submitted first.

The sg_io_v4->flags member is used, and a bit is allocated to denote
Q_AT_TAIL. Zero means queue at_head as before, for compatibility with old
code on the write/read path. The SG_IO code path was changed to behave the
same as the write/read path. SG_IO was very rarely used, and breaking
compatibility with it is OK at this stage.

sg_io_hdr in sg.h also has a flags member and uses 3 bits from the first
nibble and one bit from the last nibble. Even though none of these bits
are supported by bsg, the second nibble is allocated for use by bsg, just
in case.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
CC: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
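
A sketch of the flag semantics described above: a clear bit keeps the
old head-of-queue behaviour, a set bit requests tail queuing.  The flag
value 0x10 (a bit in the second nibble) and both names are assumptions
of this sketch; the commit message does not give the value.

```c
#include <stdint.h>

/* Assumed value: one bit in the second nibble of the flags word. */
#define Q_AT_TAIL_FLAG 0x10u

/* Returns 1 to queue at the head (the default), 0 to queue at the tail. */
int queue_at_head(uint32_t flags)
{
	return !(flags & Q_AT_TAIL_FLAG);
}
```

Old callers that pass zero flags get the same queuing as before, and
bits outside the assumed flag do not change the result.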
16 years agocpqarray: enable bus mastering
Dave Jones [Mon, 16 Mar 2009 09:06:05 +0000 (10:06 +0100)]
cpqarray: enable bus mastering

We've been carrying this patch for the last 3 years in Fedora,
long past time we got it upstream...

Call pci_set_master to enable bus-mastering if the BIOS hasn't
done it already.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: genhd.h cleanup patch
Petros Koutoupis [Wed, 11 Mar 2009 09:49:35 +0000 (10:49 +0100)]
block: genhd.h cleanup patch

In include/linux/genhd.h, line 335 has a comment that needs to be updated
from /* drivers/block/ll_rw_blk.c */ to /* block/blk-core.c */. Also, as of
kernel 2.6.16, the function definition for get_blkdev_list was removed from
block/genhd.c, but the function declaration is still present on line 339.
This patch makes both fixes by updating the comment and removing the
declaration.

Signed-off-by: Petros Koutoupis <pkoutoupis@hydrasystemsllc.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: add private bio_set for bio integrity allocations
Martin K. Petersen [Tue, 10 Mar 2009 07:27:39 +0000 (08:27 +0100)]
block: add private bio_set for bio integrity allocations

The integrity bio allocation needs its own bio_set to avoid violating
the mempool allocation rules and risking deadlocks.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: genhd.h comment needs updating
Petros Koutoupis [Tue, 10 Mar 2009 07:25:54 +0000 (08:25 +0100)]
block: genhd.h comment needs updating

The include/linux/genhd.h file declares some function prototypes on
lines 338-352. The comment on line 338 states that the definitions of
these functions are to be found in drivers/block/genhd.c. The problem is
that genhd.c has been relocated to block/genhd.c. See the attached patch,
which corrects this minor cosmetic error.

Signed-off-by: Petros Koutoupis <pkoutoupis@hydrasystemsllc.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: get rid of unused blkdev_free_rq() define
Jens Axboe [Fri, 6 Mar 2009 10:12:17 +0000 (11:12 +0100)]
block: get rid of unused blkdev_free_rq() define

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: remove various blk_queue_*() setting functions in blk_init_queue_node()
Jens Axboe [Fri, 6 Mar 2009 07:48:33 +0000 (08:48 +0100)]
block: remove various blk_queue_*() setting functions in blk_init_queue_node()

It calls blk_queue_make_request(), which sets the identical set of limits.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocciss: add BUILD_BUG_ON() for catching bad CommandList_struct alignment
Jens Axboe [Fri, 27 Feb 2009 19:14:20 +0000 (20:14 +0100)]
cciss: add BUILD_BUG_ON() for catching bad CommandList_struct alignment

The hardware requires 64-bit alignment of commands, so add a build bug
check for that. The recent commit 8a3173de4ab4cdacc43675dc5c077f9a5bf17f5f
didn't change the size of the command, but other additions/changes may and
thus break badly at runtime.
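
The check described above can be sketched in userspace C. This is only an illustration: `struct command_list` and its fields are hypothetical stand-ins for the driver's CommandList_struct, C11 `static_assert` stands in for the kernel's BUILD_BUG_ON(), and "64-bit alignment" is assumed to mean a size that is a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's CommandList_struct. */
struct command_list {
    uint64_t bus_addr;
    uint32_t tag;
    uint32_t flags;
    uint8_t  cdb[16];
};

#define COMMAND_ALIGN 8  /* 64 bits, expressed in bytes (assumption) */

/* Fails the build, not the boot, if a later change breaks the layout. */
static_assert(sizeof(struct command_list) % COMMAND_ALIGN == 0,
              "command_list size must be a multiple of 8 bytes");

/* Runtime helper: does this address meet the same alignment? */
static int command_is_aligned(const void *p)
{
    return ((uintptr_t)p % COMMAND_ALIGN) == 0;
}
```

Adding a 4-byte field to the struct above without padding would make the build stop at the `static_assert`, which is the failure mode the commit wants to move from runtime to compile time.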

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: don't create bio_vec slabs of less than the inline number
Jens Axboe [Fri, 5 Dec 2008 15:10:29 +0000 (16:10 +0100)]
block: don't create bio_vec slabs of less than the inline number

If we don't have CONFIG_BLK_DEV_INTEGRITY set, then we don't have
any external dependencies on the bio_vec slabs. So don't create
the ones that we will inline anyway.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: cleanup bio_alloc_bioset()
Ingo Molnar [Sat, 21 Feb 2009 10:16:36 +0000 (11:16 +0100)]
block: cleanup bio_alloc_bioset()

this warning (which got fixed by commit b2bf968):

  fs/bio.c: In function ‘bio_alloc_bioset’:
  fs/bio.c:305: warning: ‘p’ may be used uninitialized in this function

Triggered because the code flow in bio_alloc_bioset() is correct
but a bit complex for the compiler to see through.

Streamline it a bit - this also makes the code a tiny bit more compact:

   text    data     bss     dec     hex filename
   7540     256      40    7836    1e9c bio.o.before
   7539     256      40    7835    1e9b bio.o.after

Also remove an older compiler-warnings annotation from this function,
it's not needed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoGFS2: Fix freeze issue
Steven Whitehouse [Mon, 23 Mar 2009 11:38:55 +0000 (11:38 +0000)]
GFS2: Fix freeze issue

This removes some old code that was causing issues during
filesystem freeze.

Reported-by: Andrew Price <andy@andrewprice.me.uk>
Tested-by: Andrew Price <andy@andrewprice.me.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoFix a minor bug in the previous patch
Steven Whitehouse [Thu, 19 Mar 2009 13:15:44 +0000 (13:15 +0000)]
Fix a minor bug in the previous patch

The logic requires that we mark the glock dirty in page_mkwrite
otherwise we might not flush correctly in the case that no
allocation was required in the process of dirtying the page.
Also we need to set the shared write flag early for the same
reason.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Clean up of glops.c
Steven Whitehouse [Mon, 9 Mar 2009 09:03:51 +0000 (09:03 +0000)]
GFS2: Clean up of glops.c

This cleans up a number of bits of code mostly based in glops.c.
A couple of simple functions have been merged into the callers
to make it more obvious what is going on, the mysterious raising
of i_writecount around the truncate_inode_pages() call has been
removed. The meta_go_* operations have been renamed rgrp_go_*
since that is the only lock type that they are used with.

The unused argument of gfs2_read_sb has been removed. Also
a bug has been fixed where a check for the rindex inode was
in the wrong callback. More comments are added, and the
debugging code is improved too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Fix locking bug in failed shared to exclusive conversion
Benjamin Marzinski [Fri, 6 Mar 2009 16:03:20 +0000 (10:03 -0600)]
GFS2: Fix locking bug in failed shared to exclusive conversion

After calling out to the dlm, GFS2 sets the new state of a glock to
gl_target in gdlm_ast().  However, gl_target is not always the lock
state that was requested. If a conversion from shared to exclusive
fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead
of gl->gl_target, so that it can reacquire the lock in exclusive mode the next
time around.  In this case, setting the lock to gl_target in gdlm_ast()
will make GFS2 think that it has the glock in exclusive mode, when
really, it doesn't have the glock locked at all.  This patch adds a new
field to the gfs2_glock structure, gl_req, to track the mode that was
requested.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Pagecache usage optimization on GFS2
Hisashi Hifumi [Tue, 3 Mar 2009 02:45:20 +0000 (11:45 +0900)]
GFS2: Pagecache usage optimization on GFS2

I introduced "is_partially_uptodate" aops for GFS2.

A page can have multiple buffers, and even if a page is not uptodate, some
buffers can be uptodate in a pagesize != blocksize environment.
This aop checks that all buffers which correspond to the part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to the HDD even if the page is not uptodate, because the portion we
want to read is uptodate.
The "block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch, random read/write mixed workloads and random read
after random write workloads can be optimized, and we get a performance
improvement.
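
As a rough userspace sketch of the check described above (not the kernel's block_is_partially_uptodate() itself): the `page_model` struct and `range_is_uptodate()` are hypothetical names, and the 16k/4k sizes are taken from the test setup reported later in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES  16384u  /* 16k page */
#define BLOCK_SIZE_BYTES 4096u   /* 4k block */
#define BLOCKS_PER_PAGE  (PAGE_SIZE_BYTES / BLOCK_SIZE_BYTES)

/* Hypothetical page model: one uptodate flag per block-sized buffer. */
struct page_model {
    bool page_uptodate;
    bool buffer_uptodate[BLOCKS_PER_PAGE];
};

/*
 * Return true if the byte range [from, from + count) within the page
 * can be served from cache without issuing read I/O.
 */
static bool range_is_uptodate(const struct page_model *pg,
                              size_t from, size_t count)
{
    assert(pg != NULL);
    if (count == 0 || from >= PAGE_SIZE_BYTES)
        return false;
    if (count > PAGE_SIZE_BYTES - from)
        count = PAGE_SIZE_BYTES - from;   /* clamp to the page */
    if (pg->page_uptodate)
        return true;

    size_t first = from / BLOCK_SIZE_BYTES;
    size_t last  = (from + count - 1) / BLOCK_SIZE_BYTES;
    for (size_t i = first; i <= last; i++)
        if (!pg->buffer_uptodate[i])
            return false;                 /* one stale buffer forces a read */
    return true;
}
```

With only the first two buffers uptodate, an 8K read at offset 0 is served from cache, while an 8K read at offset 4096 touches a stale buffer and still needs I/O.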

I did a performance test using the sysbench.

#sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1
--file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0
--file-rw-ratio=1 run

-2.6.29-rc6
Test execution summary:
    total time:                          202.6389s
    total number of events:              200000
    total time taken by event execution: 2580.0480
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0129s
         max:                            49.5852s
         approx.  95 percentile:         0.0462s

-2.6.29-rc6-patched
Test execution summary:
    total time:                          177.8639s
    total number of events:              200000
    total time taken by event execution: 2419.0199
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0121s
         max:                            52.4306s
         approx.  95 percentile:         0.0444s

arch: ia64
pagesize: 16k
blocksize: 4k

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: fix sparse warning: Should it be static?
Hannes Eder [Sat, 21 Feb 2009 01:12:05 +0000 (02:12 +0100)]
GFS2: fix sparse warning: Should it be static?

Impact: Make symbol static.

Fix this sparse warning:
  fs/gfs2/rgrp.c:188:5: warning: symbol 'gfs2_bitfit' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: fix sparse warnings: constant is so big it is ...
Hannes Eder [Sat, 21 Feb 2009 01:11:42 +0000 (02:11 +0100)]
GFS2: fix sparse warnings: constant is so big it is ...

Fix these sparse warnings:
  fs/gfs2/rgrp.c:156:23: warning: constant 0xffffffffffffffff is so big it is unsigned long long
  fs/gfs2/rgrp.c:157:23: warning: constant 0xaaaaaaaaaaaaaaaa is so big it is unsigned long long
  fs/gfs2/rgrp.c:158:23: warning: constant 0x5555555555555555 is so big it is long long
  fs/gfs2/rgrp.c:194:20: warning: constant 0x5555555555555555 is so big it is long long
  fs/gfs2/rgrp.c:204:44: warning: constant 0x5555555555555555 is so big it is long long

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Support quota/noquota mount arguments
Steven Whitehouse [Thu, 19 Feb 2009 10:32:35 +0000 (10:32 +0000)]
GFS2: Support quota/noquota mount arguments

This adds support for "quota" and "noquota" mount options in addition to the
existing "quota=on/off/account" so that we are compatible with the names by
which these options are more generally known.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Fix alignment issue and tidy gfs2_bitfit
Steven Whitehouse [Tue, 17 Feb 2009 14:13:35 +0000 (14:13 +0000)]
GFS2: Fix alignment issue and tidy gfs2_bitfit

An alignment issue with the existing bitfit algorithm was reported
on IA64. This patch attempts to fix that, and also to tidy up the
code a bit. There is now more documentation about how this works
and it has survived a number of different tests.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Add a "demote a glock" interface to sysfs
Steven Whitehouse [Thu, 12 Feb 2009 13:31:58 +0000 (13:31 +0000)]
GFS2: Add a "demote a glock" interface to sysfs

This adds a sysfs file called demote_rq to GFS2's
per-filesystem directory. It's possible to use this
file to demote arbitrary glocks in exactly the same
way as if a request had come in from a remote node.

This is intended for testing issues relating to caching
of data under glocks. Despite that, the interface is
generic enough to send requests to any type of glock,
but be careful as it's not always safe to send an
arbitrary message to an arbitrary glock. For that reason
and to prevent DoS, this interface is restricted to root
only.

The messages look like this:

<type>:<glocknumber> <mode>

Example:

echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq

Which means "please demote inode glock (type 2) number 13324 so that
I can get an EX (exclusive) lock". The lock modes are those which
would normally be sent by a remote node in its callback so if you
want to unlock a glock, you use EX, to demote to shared, use SH or PR
(depending on whether you like GFS2 or DLM lock modes better!).

If the glock doesn't exist, you'll get -ENOENT returned. If the
arguments don't make sense, you'll get -EINVAL returned.
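
The message format above could be parsed along these lines. This is a userspace sketch under stated assumptions, not the sysfs handler from the patch: `struct demote_request` and `parse_demote_request()` are hypothetical names, only the three modes named in the text (EX, SH, PR) are accepted, and the -ENOENT case is not modelled because it depends on a glock lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of a "<type>:<glocknumber> <mode>" message. */
struct demote_request {
    unsigned int type;
    unsigned long long number;
    char mode[3];                 /* two characters plus the terminator */
};

/* Returns 0 on success, -EINVAL if the message does not make sense. */
static int parse_demote_request(const char *msg, struct demote_request *req)
{
    int consumed = 0;

    assert(msg != NULL && req != NULL);
    if (sscanf(msg, "%u:%llu %2s%n", &req->type, &req->number,
               req->mode, &consumed) != 3)
        return -EINVAL;
    if (msg[consumed] != '\0')
        return -EINVAL;           /* trailing junk after the mode */
    /* Only the modes named in the text above are accepted here. */
    if (strcmp(req->mode, "EX") != 0 && strcmp(req->mode, "SH") != 0 &&
        strcmp(req->mode, "PR") != 0)
        return -EINVAL;
    return 0;
}
```

Passing the example string "2:13324 EX" yields type 2, number 13324 and mode "EX"; a string with a missing or unknown mode is rejected with -EINVAL.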

The plan is that this interface will be used in combination with
the blktrace patch which I recently posted for comments although
it is, of course, still useful in its own right.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Expose UUID via sysfs/uevent
Steven Whitehouse [Tue, 10 Feb 2009 13:48:30 +0000 (13:48 +0000)]
GFS2: Expose UUID via sysfs/uevent

Since we have a UUID, we ought to expose it to the user via sysfs
and uevents. We already have the fs name in both of these places
(a combination of the lock proto and lock table name) so if we add
the UUID as well, we have a full set.

For older filesystems (i.e. those created before mkfs.gfs2 was writing
UUIDs by default) the sysfs file will appear zero length, and no UUID
env var will be added to the uevents.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Support generation of discard requests
Steven Whitehouse [Mon, 9 Feb 2009 09:25:01 +0000 (09:25 +0000)]
GFS2: Support generation of discard requests

This patch allows GFS2 to generate discard requests for blocks which are
no longer useful to the filesystem (i.e. those which have been freed as
the result of an unlink operation). The requests are generated at the
time at which those blocks become available for reuse in the filesystem.

In order to use this new feature, you have to specify the "discard"
mount option. The code coalesces adjacent blocks into a single extent
when generating the discard requests, thus generating the minimum
number.
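
The coalescing step can be illustrated with a small userspace sketch. It assumes the freed block numbers arrive sorted and without duplicates; `struct extent` and `coalesce_blocks()` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a run of consecutive block numbers. */
struct extent {
    uint64_t start;
    uint64_t len;
};

/*
 * Coalesce a sorted, duplicate-free array of freed block numbers into
 * extents. Writes at most max_out extents and returns how many were
 * written; each extent would become one discard request.
 */
static size_t coalesce_blocks(const uint64_t *blocks, size_t n,
                              struct extent *out, size_t max_out)
{
    size_t used = 0;

    assert(out != NULL && (blocks != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (used > 0 &&
            out[used - 1].start + out[used - 1].len == blocks[i]) {
            out[used - 1].len++;          /* extends the current run */
            continue;
        }
        if (used == max_out)
            break;                        /* caller's buffer is full */
        out[used].start = blocks[i];
        out[used].len = 1;
        used++;
    }
    return used;
}
```

For the blocks 10, 11, 12, 20, 21 and 30 this produces three extents instead of six single-block requests.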

If an error occurs when the request has been sent to the block device,
then it will print a message and turn off the requests for that
filesystem. If the problem is temporary, then you can use remount to
turn the option back on again. There is also a nodiscard mount option
so that you can use remount to turn discard requests off, if required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Fix deadlock on journal flush
Steven Whitehouse [Thu, 5 Feb 2009 10:12:38 +0000 (10:12 +0000)]
GFS2: Fix deadlock on journal flush

This patch fixes a deadlock when the journal is flushed and there
are dirty inodes other than the one which caused the journal flush.
Originally the journal flushing code was trying to obtain the
transaction glock while running the flush code for an inode glock.
We no longer require the transaction glock at this point in time
since we know that any attempt to get the transaction glock from
another node will result in a journal flush. So if we are flushing
the journal, we can be sure that the transaction lock is still
cached from when the transaction was started.

By inlining a version of gfs2_trans_begin() (minus the bit which
gets the transaction glock) we can avoid the deadlock problems
caused if there is a demote request queued up on the transaction
glock.

In addition I've also moved the umount rwsem so that it covers
the glock workqueue, since all demotions are done by this
workqueue now. That fixes a bug on umount which I came across
while fixing the original problem.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Fix error path ref counting for root inode
Steven Whitehouse [Tue, 20 Jan 2009 16:39:23 +0000 (16:39 +0000)]
GFS2: Fix error path ref counting for root inode

We were keeping hold of an extra ref to the root inode in one
of the error paths, which resulted in a hang.

Reported-by: Nate Straz <nstraz@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Robert Peterson <rpeterso@redhat.com>
16 years agoGFS2: Remove unused field from glock
Steven Whitehouse [Tue, 13 Jan 2009 09:53:43 +0000 (09:53 +0000)]
GFS2: Remove unused field from glock

The time stamp field is unused in the glock now that we are
using a shrinker, so that we can remove it and save sizeof(unsigned long)
bytes in each glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Merge lock_dlm module into GFS2
Steven Whitehouse [Mon, 12 Jan 2009 10:43:39 +0000 (10:43 +0000)]
GFS2: Merge lock_dlm module into GFS2

This is the big patch that I've been working on for some time
now. There are many reasons for wanting to make this change
such as:
 o Reducing overhead by eliminating duplicated fields between structures
 o Simplification of the code (reduces the code size by a fair bit)
 o The locking interface is now the DLM interface itself as proposed
   some time ago.
 o Fewer lookups of glocks when processing replies from the DLM
 o Fewer memory allocations/deallocations for each glock
 o Scope to do further optimisations in the future (but this patch is
   more than big enough for now!)

Please note that (a) this patch relates to the lock_dlm module and
not the DLM itself, that is still a separate module; and (b) that
we retain the ability to build GFS2 as a standalone single node
filesystem without requiring the DLM.

This patch needs a lot of testing, hence my keeping it until I restarted
my -git tree after the last merge window. That way, this has the maximum
exposure before it's merged. This is (modulo a few minor bug fixes) the
same patch that I've been posting on and off for the last three months
and it's passed a number of different tests so far.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Remove "double" locking in quota
Steven Whitehouse [Thu, 8 Jan 2009 14:28:42 +0000 (14:28 +0000)]
GFS2: Remove "double" locking in quota

We only really need a single spin lock for the quota data, so
let's just use the lru lock for now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
16 years agoGFS2: change gfs2_quota_scan into a shrinker
Abhijith Das [Wed, 7 Jan 2009 22:03:37 +0000 (16:03 -0600)]
GFS2: change gfs2_quota_scan into a shrinker

Deallocation of gfs2_quota_data objects now happens on-demand through a
shrinker instead of routinely deallocating through the quotad daemon.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Bring back lvb-related stuff to lock_nolock to support quotas
Abhijith Das [Wed, 7 Jan 2009 16:21:34 +0000 (10:21 -0600)]
GFS2: Bring back lvb-related stuff to lock_nolock to support quotas

The quota code uses lvbs and this is currently not implemented in
lock_nolock, thereby causing panics when quota is enabled with
lock_nolock. This patch adds the relevant bits.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agoGFS2: Fix remount argument parsing
Steven Whitehouse [Tue, 6 Jan 2009 11:52:25 +0000 (11:52 +0000)]
GFS2: Fix remount argument parsing

The following patch fixes an issue relating to remount and argument
parsing. After this fix is applied, remount becomes atomic in that
it either succeeds changing the mount to the new state, or it fails
and leaves it in the old state. Previously it was possible for the
parsing of options to fail part way through and for the fs to be left
in a state where some of the new arguments had been applied, but some
had not.
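
The all-or-nothing behaviour described above is commonly achieved by parsing into a scratch copy and committing it only on success. The sketch below shows that pattern in userspace C; it is an assumption about the general technique, not the code from this patch, and `struct mount_args`, `apply_option()` and `remount()` are hypothetical names using option names that appear elsewhere in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of mount arguments. */
struct mount_args {
    bool quota;
    bool discard;
};

/* Apply one option to *args; returns -EINVAL for unknown options. */
static int apply_option(struct mount_args *args, const char *opt)
{
    if (strcmp(opt, "quota") == 0)
        args->quota = true;
    else if (strcmp(opt, "noquota") == 0)
        args->quota = false;
    else if (strcmp(opt, "discard") == 0)
        args->discard = true;
    else if (strcmp(opt, "nodiscard") == 0)
        args->discard = false;
    else
        return -EINVAL;
    return 0;
}

/*
 * Parse every option into a scratch copy and commit it only if all of
 * them parsed, so a failure leaves *live in its old state.
 */
static int remount(struct mount_args *live, const char *const *opts, size_t n)
{
    struct mount_args scratch;

    assert(live != NULL && (opts != NULL || n == 0));
    scratch = *live;
    for (size_t i = 0; i < n; i++) {
        int err = apply_option(&scratch, opts[i]);
        if (err)
            return err;           /* *live has not been touched */
    }
    *live = scratch;
    return 0;
}
```

A remount with "quota" followed by an unknown option fails and leaves quota off, whereas applying options directly to the live state would have left quota switched on.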

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
16 years agogenirq: threaded irq handlers review fixups
Thomas Gleixner [Tue, 24 Mar 2009 10:46:22 +0000 (11:46 +0100)]
genirq: threaded irq handlers review fixups

Delta patch to address the review comments.

      - Implement warning when IRQ_WAKE_THREAD is requested and no
        thread handler installed
      - coding style fixes

Pointed-out-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agogenirq: add support for threaded interrupts to devres
Arjan van de Ven [Mon, 23 Mar 2009 17:28:16 +0000 (18:28 +0100)]
genirq: add support for threaded interrupts to devres

Some devices use devres_request_irq() to install their interrupt
handler. Add support for threaded interrupts to devres as well.

[tglx - simplified and adapted to latest threadirq version]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agogenirq: add threaded interrupt handler support
Thomas Gleixner [Mon, 23 Mar 2009 17:28:15 +0000 (18:28 +0100)]
genirq: add threaded interrupt handler support

Add support for threaded interrupt handlers:

A device driver can request that its main interrupt handler runs in a
thread. To achieve this, the device driver requests the interrupt with
request_threaded_irq() and provides a thread function in addition to the
handler. The handler function is called in hard interrupt
context and needs to check whether the interrupt originated from the
device. If the interrupt originated from the device then the handler
can either return IRQ_HANDLED or IRQ_WAKE_THREAD. IRQ_HANDLED is
returned when no further action is required. IRQ_WAKE_THREAD causes
the genirq code to invoke the threaded (main) handler. When
IRQ_WAKE_THREAD is returned, the handler must have disabled the interrupt
on the device level. This is mandatory for shared interrupt handlers,
but we need to do it as well for obscure x86 hardware where disabling
an interrupt on the IO_APIC level redirects the interrupt to the
legacy PIC interrupt lines.
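
The control flow described above can be modelled in plain userspace C. This is a toy model, not kernel code: `struct fake_dev`, `primary_handler()`, `thread_fn()` and `dispatch_irq()` are hypothetical, the enum values are local to the model, and re-enabling the device interrupt at the end of the thread function is an assumption about typical driver behaviour. In the kernel, the thread function runs in a separate kernel thread rather than being called directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes named in the text; numeric values are local to this model. */
enum irq_result { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD };

/* Hypothetical device state. */
struct fake_dev {
    bool irq_pending;   /* device raised the interrupt */
    bool irq_enabled;   /* device-level interrupt enable */
    bool needs_thread;  /* work too heavy for hard interrupt context */
    int  thread_runs;
};

/* Primary handler: models the part that runs in hard interrupt context. */
static enum irq_result primary_handler(struct fake_dev *dev)
{
    if (!dev->irq_pending)
        return IRQ_NONE;            /* interrupt is not from this device */
    if (!dev->needs_thread) {
        dev->irq_pending = false;
        return IRQ_HANDLED;         /* no further action required */
    }
    dev->irq_enabled = false;       /* disable on the device level first */
    return IRQ_WAKE_THREAD;
}

/* Thread function: models the threaded (main) handler. */
static enum irq_result thread_fn(struct fake_dev *dev)
{
    dev->thread_runs++;
    dev->irq_pending = false;
    dev->irq_enabled = true;        /* assumption: re-enable when done */
    return IRQ_HANDLED;
}

/* Stand-in for the genirq core handling one interrupt. */
static enum irq_result dispatch_irq(struct fake_dev *dev)
{
    assert(dev != NULL);
    enum irq_result ret = primary_handler(dev);
    if (ret == IRQ_WAKE_THREAD)
        ret = thread_fn(dev);       /* the real core wakes a thread here */
    return ret;
}
```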

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
16 years agointel-iommu: VT-d page table to support snooping control bit
Sheng Yang [Wed, 18 Mar 2009 07:33:07 +0000 (15:33 +0800)]
intel-iommu: VT-d page table to support snooping control bit

The user can request to enable snooping control through VT-d page table.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
16 years agoiommu: Add domain_has_cap iommu_ops
Sheng Yang [Wed, 18 Mar 2009 07:33:06 +0000 (15:33 +0800)]
iommu: Add domain_has_cap iommu_ops

This iommu_op can tell whether a domain has a specific capability, such as
snooping control for the Intel IOMMU, which other components of the kernel
can use to adjust their behaviour.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
16 years agointel-iommu: Snooping control support
Sheng Yang [Wed, 18 Mar 2009 07:33:05 +0000 (15:33 +0800)]
intel-iommu: Snooping control support

Snooping control enables the IOMMU to guarantee DMA cache coherency and thus
reduces the software (VMM) effort of maintaining the effective memory type.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
16 years agox86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot
Pallipadi, Venkatesh [Mon, 23 Mar 2009 19:07:20 +0000 (12:07 -0700)]
x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot

While looking at the issue in the thread:

  http://marc.info/?l=dri-devel&m=123606627824556&w=2

noticed a bug in pci PAT code and memory type setting.

PCI mmap code did not set the proper protection in vma, when it
inherited protection in reserve_memtype. This bug only affects
the case where there exists a WC mapping before X does an mmap
with /proc or /sys pci interface. This will cause X userlevel
mmap from /proc or /sysfs to fail on fork.

Reported-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090323190720.GA16831@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years ago[MTD] [NAND] sh_flctl: fix hardware ecc handling for 2048 byte page
Yoshihiro Shimoda [Tue, 24 Mar 2009 09:27:24 +0000 (18:27 +0900)]
[MTD] [NAND] sh_flctl: fix hardware ecc handling for 2048 byte page

Signed-off-by: Jeremy Baker <Jeremy.Baker@renesas.com>
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
16 years agoKVM: VMX: Don't allow uninhibited access to EFER on i386
Avi Kivity [Mon, 23 Mar 2009 20:13:44 +0000 (22:13 +0200)]
KVM: VMX: Don't allow uninhibited access to EFER on i386

vmx_set_msr() does not allow i386 guests to touch EFER, but they can still
do so through the default: label in the switch.  If they set EFER_LME, they
can oops the host.

Fix by having EFER access through the normal channel (which will check for
EFER_LME) even on i386.

Reported-and-tested-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: Correct deassign device ioctl to IOW
Sheng Yang [Tue, 17 Mar 2009 11:27:19 +0000 (19:27 +0800)]
KVM: Correct deassign device ioctl to IOW

It's IOR by mistake, so fix it before release.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: ppc: e500: Fix the bug that KVM is unstable in SMP
Liu Yu [Tue, 17 Mar 2009 08:57:46 +0000 (16:57 +0800)]
KVM: ppc: e500: Fix the bug that KVM is unstable in SMP

TLB entry should enable memory coherence in SMP.

And like commit 631fba9dd3aca519355322cef035730609e91593,
remove guard attribute to enable the prefetch of guest memory.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: ppc: e500: Fix the bug that mas0 update to wrong value when read TLB entry
Liu Yu [Tue, 17 Mar 2009 08:57:45 +0000 (16:57 +0800)]
KVM: ppc: e500: Fix the bug that mas0 update to wrong value when read TLB entry

Should clear and then update the next victim area here.

The guest kernel only reads TLB1 during startup, so this bug results in
an extra 4K TLB1 mapping in the guest from 0x0 to 0x0.

As the problem has no impact on booting a guest,
we didn't notice it before.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: Fix missing smp tlb flush in invlpg
Andrea Arcangeli [Thu, 12 Mar 2009 17:18:43 +0000 (18:18 +0100)]
KVM: Fix missing smp tlb flush in invlpg

When kvm emulates an invlpg instruction, it can drop a shadow pte, but
leaves the guest tlbs intact.  This can cause memory corruption when
swapping out.

Without this the other cpu can still write to a freed host physical page.
If rmap_remove is called, the smp tlb flush must always happen before
mmu_lock is released, because the VM will take the mmu_lock before it can
finally add the page to the freelist after swapout. The mmu notifier makes
it safe to flush the tlb after freeing the page (otherwise it would never be
safe), so we can do a single flush for multiple invalidated sptes.

Cc: stable@kernel.org
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: Get support IRQ routing entry counts
Sheng Yang [Mon, 16 Mar 2009 08:33:43 +0000 (16:33 +0800)]
KVM: Get support IRQ routing entry counts

In capability probing ioctl.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: fix sparse warnings: Should it be static?
Hannes Eder [Sat, 21 Feb 2009 01:19:13 +0000 (02:19 +0100)]
KVM: fix sparse warnings: Should it be static?

Impact: Make symbols static.

Fix these sparse warnings:
  arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static?
  arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static?
  arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static?
  arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static?
  arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static?
  virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: fix sparse warnings: context imbalance
Hannes Eder [Sat, 21 Feb 2009 01:18:13 +0000 (02:18 +0100)]
KVM: fix sparse warnings: context imbalance

Impact: Attribute function with __acquires(...) resp. __releases(...).

Fix these sparse warnings:
  arch/x86/kvm/i8259.c:34:13: warning: context imbalance in 'pic_lock' - wrong count at exit
  arch/x86/kvm/i8259.c:39:13: warning: context imbalance in 'pic_unlock' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: is_long_mode() should check for EFER.LMA
Amit Shah [Thu, 28 Feb 2008 10:36:15 +0000 (16:06 +0530)]
KVM: is_long_mode() should check for EFER.LMA

is_long_mode currently checks the LongModeEnable bit in
EFER instead of the LongModeActive bit. This is wrong, but
we survived this till now since it wasn't triggered. This
breaks guests that go from long mode to compatibility mode.

This is noticed on a solaris guest and fixes bug #1842160
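
The difference between the two checks can be shown with a small userspace sketch. The bit positions are the architectural ones (LME is bit 8 and LMA is bit 10 of EFER); the function names are illustrative and the real KVM helper takes a vcpu rather than a raw EFER value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (UINT64_C(1) << 8)   /* Long Mode Enable */
#define EFER_LMA (UINT64_C(1) << 10)  /* Long Mode Active */

static_assert((EFER_LME & EFER_LMA) == 0, "LME and LMA are distinct bits");

/* Old behaviour: true as soon as long mode has been enabled. */
static bool is_long_mode_old(uint64_t efer)
{
    return (efer & EFER_LME) != 0;
}

/* Fixed behaviour: true only while long mode is actually active. */
static bool is_long_mode(uint64_t efer)
{
    return (efer & EFER_LMA) != 0;
}
```

A guest that has set EFER.LME but in which long mode is not active has LME set and LMA clear; the old check reports long mode for that state and the fixed one does not.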

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: VMX: Update necessary state when guest enters long mode
Amit Shah [Fri, 20 Feb 2009 17:23:37 +0000 (22:53 +0530)]
KVM: VMX: Update necessary state when guest enters long mode

setup_msrs() should be called when entering long mode to save the
shadow state for the 64-bit guest state.

Using vmx_set_efer() in enter_lmode() removes some duplicated code
and also ensures we call setup_msrs(). We can safely pass the value
of shadow_efer to vmx_set_efer() as no other bits in the efer change
while enabling long mode (guest first sets EFER.LME, then sets CR0.PG
which causes a vmexit where we activate long mode).

With this fix, is_long_mode() can check for EFER.LMA set instead of
EFER.LME and 5e23049e86dd298b72e206b420513dbc3a240cd9 can be reverted.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: ia64: Fix the build errors due to lack of macros related to MSI.
Xiantao Zhang [Mon, 16 Feb 2009 07:24:05 +0000 (15:24 +0800)]
KVM: ia64: Fix the build errors due to lack of macros related to MSI.

Include the newly introduced msidef.h to solve the build issues.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoia64: Move the macro definitions related to MSI to one header file.
Xiantao Zhang [Mon, 16 Feb 2009 07:14:48 +0000 (15:14 +0800)]
ia64: Move the macro definitions related to MSI to one header file.

kvm's MSI support needs the macros defined in ia64_msi.c. To avoid
duplicating them, move them to one header file and share it with
kvm.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: fix kvm_vm_ioctl_deassign_device
Weidong Han [Fri, 13 Feb 2009 09:27:51 +0000 (17:27 +0800)]
KVM: fix kvm_vm_ioctl_deassign_device

only need to set assigned_dev_id for deassignment, use
match->flags to judge and deassign it.

Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: define KVM_CAP_DEVICE_DEASSIGNMENT
Weidong Han [Fri, 13 Feb 2009 02:50:56 +0000 (10:50 +0800)]
KVM: define KVM_CAP_DEVICE_DEASSIGNMENT

define KVM_CAP_DEVICE_DEASSIGNMENT and KVM_DEASSIGN_PCI_DEVICE
for device deassignment.

the ioctl has been already implemented in the
commit: 0a920356748df4fb06e86c21c23d2ed6d31d37ad

Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: ppc: Add emulation of E500 register mmucsr0
Liu Yu [Tue, 17 Feb 2009 08:52:08 +0000 (16:52 +0800)]
KVM: ppc: Add emulation of E500 register mmucsr0

Latest kernel flushes TLB via mmucsr0.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: Report IRQ injection status for MSI delivered interrupts
Gleb Natapov [Mon, 23 Feb 2009 10:57:11 +0000 (12:57 +0200)]
KVM: Report IRQ injection status for MSI delivered interrupts

Return number of CPUs interrupt was successfully injected into or -1 if
none.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
16 years agoKVM: MMU: Fix another largepage memory leak
Joerg Roedel [Thu, 19 Feb 2009 11:18:56 +0000 (12:18 +0100)]
KVM: MMU: Fix another largepage memory leak

In the paging_fetch function rmap_remove is called after setting a large
pte to non-present. This causes rmap_remove to not drop the reference to
the large page. The result is a memory leak of that page.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>