Use a separate buffer for writing zeroes to the on-disk snapshot
exception store, make the updating of ps->current_area explicit and
refactor the code in preparation for the fix in the next patch.
Mikulas Patocka [Tue, 21 Oct 2008 16:44:51 +0000 (17:44 +0100)]
dm snapshot: fix primary_pe race
Fix a race condition with primary_pe ref_count handling.
put_pending_exception runs under dm_snapshot->lock, it does atomic_dec_and_test
on primary_pe->ref_count, and later does atomic_read primary_pe->ref_count.
__origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
dm_snapshot->lock.
This opens the following race condition:
Assume two CPUs, CPU1 is executing put_pending_exception (and holding
dm_snapshot->lock). CPU2 is executing __origin_write in parallel.
primary_pe->ref_count == 2.
CPU1:
if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
origin_bios = bio_list_get(&primary_pe->origin_bios);
... decrements primary_pe->ref_count to 1. Doesn't load origin_bios
CPU2:
if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
flush_bios(bio_list_get(&primary_pe->origin_bios));
free_pending_exception(primary_pe);
/* If we got here, pe_queue is necessarily empty. */
return r;
}
... decrements primary_pe->ref_count to 0, submits pending bios, frees
primary_pe.
CPU1:
if (!primary_pe || primary_pe != pe)
free_pending_exception(pe);
... this has no effect.
if (primary_pe && !atomic_read(&primary_pe->ref_count))
free_pending_exception(primary_pe);
... sees ref_count == 0 (written by CPU 2), does double free !!
This bug can happen only if someone is simultaneously writing to both the
origin and the snapshot.
If someone is writing only to the origin, __origin_write will submit kcopyd
request after it decrements primary_pe->ref_count (so it can't happen that the
finished copy races with primary_pe->ref_count decrementation).
If someone is writing only to the snapshot, __origin_write isn't invoked at all
and the race can't happen.
The race happens when someone writes to the snapshot --- this creates
pending_exception with primary_pe == NULL and starts copying. Then, someone
writes to the same chunk in the snapshot, and __origin_write races with
termination of already submitted request in pending_complete (that calls
put_pending_exception).
This race may be reason for bugs:
http://bugzilla.kernel.org/show_bug.cgi?id=11636
https://bugzilla.redhat.com/show_bug.cgi?id=465825
The patch fixes the code to make sure that:
1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
must no longer dereference primary_pe (because someone else may free it under
us).
2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
is responsible for freeing primary_pe.
Kazuo Ito [Tue, 21 Oct 2008 16:44:50 +0000 (17:44 +0100)]
dm kcopyd: avoid queue shuffle
Write throughput to LVM snapshot origin volume is an order
of magnitude slower than those to LV without snapshots or
snapshot target volumes, especially in the case of sequential
writes with O_SYNC on.
The following patch originally written by Kevin Jamieson and
Jan Blunck and slightly modified for the current RCs by myself
tries to improve the performance by modifying the behaviour
of kcopyd, so that it pushes back an I/O job to the head of
the job queue instead of the tail as process_jobs() currently
does when it has to wait for free pages. This way, write
requests aren't shuffled to cause extra seeks.
I tested the patch against 2.6.27-rc5 and got the following results.
The test is a dd command writing to snapshot origin followed by fsync
to the file just created/updated. A couple of filesystem benchmarks
gave me similar results in case of sequential writes, while random
writes didn't suffer much.
dd if=/dev/zero of=<somewhere on snapshot origin> bs=4096 count=...
[conv=notrunc when updating]
1) linux 2.6.27-rc5 without the patch, write to snapshot origin,
average throughput (MB/s)
10M 100M 1000M
create,dd 511.46 610.72 11.81
create,dd+fsync 7.10 6.77 8.13
update,dd 431.63 917.41 12.75
update,dd+fsync 7.79 7.43 8.12
compared with write throughput to LV without any snapshots,
all dd+fsync and 1000 MiB writes perform very poorly.
Although still not on par with plain LV performance -
cannot be avoided because it's copy on write anyway -
this simple patch successfully improves throughtput
of dd+fsync while not affecting the rest.
Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Kazuo Ito <ito.kazuo@oss.ntt.co.jp> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
Darron Broad [Tue, 21 Oct 2008 14:28:46 +0000 (11:28 -0300)]
V4L/DVB (9335): videobuf: split unregister bus creating self-contained frontend de-allocator
This creates a self contained frontend de-allocator
for the instances where an adapter has not been
registered yet frontend de-allocation may
be required.
V4L/DVB (9333): cx88: Not all boards that requires cx88-mpeg has frontends
The multifrontend changes on cx88 assumed that all boards that use cx88-mpeg
supports DVB. This is not true. There also a few analog-only boards based on
Blackboard design that also uses cx88-mpeg. For those boards, there's no need
to allocate dvb frontends.
This patch fixes videobuf allocation for those devices.
V4L/DVB (9331): Remove unused inode parameter from video_ioctl2
inode is never used on video_ioctl2. Remove it and rename the function to
__video_ioctl2. This allows its usage directly as a callback at
fops.unlocked_ioctl.
Since we still need a callback with inode to be used with fops.ioctl,
this patch adds video_ioctl2() that is just a call to __video_ioctl2().
Also, this patch adds some comments about video_ioctl2 and __video_ioctl2
usage at v4l2-ioctl.h.
V4L/DVB (9330): Get rid of inode parameter at v4l_compat_translate_ioctl()
The inode parameter at v4l_compat_translate_ioctl() were just passed over several
places just to keep compatible with fops.ioctl. However, it weren't used anywere.
Ian Armstrong [Sun, 19 Oct 2008 21:58:26 +0000 (18:58 -0300)]
V4L/DVB (9328): ivtvfb: FB_BLANK_POWERDOWN turns off video output
When using FBIOBLANK, FB_BLANK_POWERDOWN will now switch off the video output.
Since some televisions turn themselves off after a while with no signal, this
is the closest we can get to power-saving.
Signed-off-by: Ian Armstrong <ian@iarmst.demon.co.uk> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Hans Verkuil [Sun, 19 Oct 2008 21:54:26 +0000 (18:54 -0300)]
V4L/DVB (9327): v4l: use video_device.num instead of minor in video%d
The kernel number of a v4l2 node (e.g. videoX, radioX or vbiX) is now
independent of the minor number. So instead of using the minor field
of the video_device struct one has to use the num field: this always
contains the kernel number of the device node.
I forgot about this when I did the v4l2 core change, so this patch
converts all drivers that use it in one go. Luckily the change is
trivial.
Sakari Ailus [Sat, 18 Oct 2008 15:23:45 +0000 (12:23 -0300)]
V4L/DVB (9318): v4l2-int-if: Add command to get slave private data.
vidioc_int_g_priv is used to get master's slave-related private data
structure. The structure can contain for example master's configuration
specific to slave.
Signed-off-by: Sakari Ailus <sakari.ailus@nokia.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
V4L/DVB (9315): s5h1411: Skip reconfiguring demod modulation if already at the desired modulation
If we are already at the desired modulation, there is no need to reconfigure
the demod (at a tuning time cost)
Note that this change revealed that although the datasheet says the demod
starts out in VSB-8 mode, the first tuning was failing consistently unless
we went through the work of setting the registers. So add a field to denote
this case so we always do the enable_frontend call, even if the first tuning
request is for VSB-8.
Signed-off-by: Devin Heitmueller <devin.heitmueller@gmail.com> Reviewed-by: Michael Krufky <mkrufky@linuxtv.org> Acked-by: Steven Toth <stoth@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
V4L/DVB (9314): s5h1411: Perform s5h1411 soft reset after tuning
If you instruct the tuner to change frequencies, it can take up to 2500ms to
get a demod lock. By performing a soft reset after the tuning call (which
is consistent with how the Pinnacle 801e Windows driver behaves), you get
a demod lock inside of 300ms
Signed-off-by: Devin Heitmueller <devin.heitmueller@gmail.com> Reviewed-by: Michael Krufky <mkrufky@linuxtv.org> Acked-by: Steven Toth <stoth@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mike Isely [Sun, 19 Oct 2008 19:26:05 +0000 (16:26 -0300)]
V4L/DVB (9300): pvrusb2: Fix deadlock problem
Fix deadlock problem in 2.6.27 caused by new USB core behavior in
response to a USB device reset request. With older kernels, the USB
device reset was "in line"; the reset simply took place and the driver
retained its association with the hardware. However now this reset
triggers a disconnect, and worse still the disconnect callback happens
in the context of the caller who asked for the device reset. This
results in an attempt by the pvrusb2 driver to recursively take a
mutex it already has, which deadlocks the driver's worker thread.
(Even if the disconnect callback were to happen on a different thread
we'd still have problems however - because while the driver should
survive and correctly disconnect / reconnect, it will then trigger
another device reset during the repeated initialization, which will
then cause another disconect, etc, forever.) The fix here is simply
to not attempt the device reset (it was of marginal value anyway).
Signed-off-by: Mike Isely <isely@pobox.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Andy Walls [Sat, 18 Oct 2008 13:20:25 +0000 (10:20 -0300)]
V4L/DVB (9299): cx18: Don't mask many real init error codes by mapping them to ENOMEM
Changes to let error return codes bubble up to the user visible
error message on card initialization. A number of them were being remapped to
ENOMEM when no memory or array resource shortage existed. That hampered
diagnosis of user trouble reports.
Signed-off-by: Andy Walls <awalls@radix.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Vedran Miletic [Tue, 21 Oct 2008 15:42:54 +0000 (17:42 +0200)]
ALSA: emu10k1: fix device names for Live!/Audigy1/2/4/E-mu
* added missing SBxxxx, CTxxxx, PCxxx and MAEMxxxx where they were missing,
and fixed some of them which were wrong (according to kx.inf, which is pretty
accurate compared to anything out there)
* fixed device names to make them more consistent across various cards
* fixed order of devices where appropriate
Takashi Iwai [Tue, 21 Oct 2008 15:01:47 +0000 (17:01 +0200)]
ALSA: hda - Fix conflicting volume controls on ALC260
ALC260 auto-parsing mode may create multiple controls for the same volume
widget (0x08 and 0x09) depending on the pin. For example, Front and
Headphone volumes may control the same volume, just the latter one wins.
This patch adds a proper check of the existing of the volume control
and avoid the doulbed creation of the same volume controls.
Eric Miao [Mon, 20 Oct 2008 02:35:19 +0000 (10:35 +0800)]
[ARM] pxa: update {corgi,spitz}_defconfig to favor SPI-based drivers
Recent changes have been made to use the generic SPI-based ads7846
touch screen driver and a generic SPI-based corgi-type LCD/backlight
driver. Update the {corgi,spitz}_defconfig to favor the use of these
drivers instead of the legacy ones.
Jeff Layton [Tue, 21 Oct 2008 14:42:13 +0000 (14:42 +0000)]
[CIFS] fix saving of resume key before CIFSFindNext
We recently fixed the cifs readdir code so that it saves the resume key
before calling CIFSFindNext. Unfortunately, this assumes that we have
just done a CIFSFindFirst (or FindNext) and have resume info to save.
This isn't necessarily the case. Fix the code to save resume info if we
had to reinitiate the search, and after a FindNext.
This fixes connectathon basic test6 against NetApp filers.
Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
Lai Jiangshan [Fri, 17 Oct 2008 06:40:30 +0000 (14:40 +0800)]
rcupdate: fix bug of rcu_barrier*()
current rcu_barrier_bh() is like this:
void rcu_barrier_bh(void)
{
BUG_ON(in_interrupt());
/* Take cpucontrol mutex to protect against CPU hotplug */
mutex_lock(&rcu_barrier_mutex);
init_completion(&rcu_barrier_completion);
atomic_set(&rcu_barrier_cpu_count, 0);
/*
* The queueing of callbacks in all CPUs must be atomic with
* respect to RCU, otherwise one CPU may queue a callback,
* wait for a grace period, decrement barrier count and call
* complete(), while other CPUs have not yet queued anything.
* So, we need to make sure that grace periods cannot complete
* until all the callbacks are queued.
*/
rcu_read_lock();
on_each_cpu(rcu_barrier_func, (void *)RCU_BARRIER_BH, 1);
rcu_read_unlock();
wait_for_completion(&rcu_barrier_completion);
mutex_unlock(&rcu_barrier_mutex);
}
The inconsistency of the code and the comments show a bug here.
rcu_read_lock() cannot make sure that "grace periods for RCU_BH
cannot complete until all the callbacks are queued".
it only make sure that race periods for RCU cannot complete
until all the callbacks are queued.
so we must use rcu_read_lock_bh() for rcu_barrier_bh().
like this:
and also rcu_barrier() rcu_barrier_sched() are implemented like this.
it will bring a lot of duplicate code. My patch uses another way to
fix this bug, please see the comment of my patch.
Thank Paul E. McKenney for he rewrote the comment.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dean Nelson [Sat, 18 Oct 2008 23:06:56 +0000 (16:06 -0700)]
genirq: NULL struct irq_desc's member 'name' in dynamic_irq_cleanup()
If the member 'name' of the irq_desc structure happens to point to a
character string that is resident within a kernel module, problems ensue
if that module is rmmod'd (at which time dynamic_irq_cleanup() is called)
and then later show_interrupts() is called by someone.
It is also not a good thing if the character string resided in kmalloc'd
space that has been kfree'd (after having called dynamic_irq_cleanup()).
dynamic_irq_cleanup() fails to NULL the 'name' member and
show_interrupts() references it on a few architectures (like h8300, sh and
x86).
Signed-off-by: Dean Nelson <dcn@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Magnus Damm [Tue, 21 Oct 2008 12:31:07 +0000 (21:31 +0900)]
sh: Update gpio_set_value() pin value handling
This patch updates the pinmux code to use the boolean value for
the function gpio_set_value(). Without this patch values other
than 0 and 1 will result in incorrect GPIO settings.
Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Al Viro [Thu, 18 Sep 2008 19:53:24 +0000 (15:53 -0400)]
[PATCH] get rid of blkdev_locked_ioctl()
Most of that stuff doesn't need BKL at all; expand in the (only) caller,
merge the switch into one there and leave BKL only around the stuff that
might actually need it.