Steven Rostedt [Thu, 12 Mar 2009 18:23:17 +0000 (14:23 -0400)]
tracing: export trace formats to user space
The binary printk saves a pointer to the format string in the ring buffer.
On output, the format is processed. But if the user is reading the
ring buffer through a binary interface, the pointer is meaningless.
This patch creates a file called printk_formats that maps the pointers
to the formats.
Steven Rostedt [Thu, 12 Mar 2009 18:19:25 +0000 (14:19 -0400)]
tracing: have event_trace_printk use static tracer
Impact: speed up on event tracing
The event_trace_printk is currently a wrapper function that calls
trace_vprintk. Because it uses a variable for the fmt it misses out
on the optimization of using the binary printk.
This patch makes event_trace_printk into a macro wrapper to use the
fmt as the same as the trace_printks.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
tracing/core: bring back raw trace_printk for dynamic formats strings
Impact: fix callsites with dynamic format strings
Since its new binary implementation, trace_printk() internally uses static
containers for the format strings on each callsites. But the value is
assigned once at build time, which means that it can't take dynamic
formats.
So this patch unearthes the raw trace_printk implementation for the callers
that will need trace_printk to be able to carry these dynamic format
strings. The trace_printk() macro will use the appropriate implementation
for each callsite. Most of the time however, the binary implementation will
still be used.
The other impact of this patch is that mmiotrace_printk() will use the old
implementation because it calls the low level trace_vprintk and we can't
guess here whether the format passed in it is dynamic or not.
Some parts of this patch have been written by Steven Rostedt (most notably
the part that chooses the appropriate implementation for each callsites).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 12 Mar 2009 17:53:25 +0000 (13:53 -0400)]
tracing: show that buffer size is not expanded
Impact: do not confuse user on small trace buffer sizes
When the system boots up, the trace buffer is small to conserve memory.
It is only two pages per online CPU. When the tracer is used, it expands
to the default value.
This can confuse the user if they look at the buffer size and see only
7, but then later they see 1408.
Steven Rostedt [Thu, 12 Mar 2009 17:13:49 +0000 (13:13 -0400)]
ring-buffer: remove unneeded get_online_cpus
Impact: speed up and remove possible races
The get_online_cpus was added to the ring buffer because the original
design would free the ring buffer on a CPU that was being taken
off line. The final design kept the ring buffer around even when the
CPU was taken off line. This is to allow a user to still read the
information on that ring buffer.
Most of the get_online_cpus are no longer needed since the ring buffer will
not disappear from the use cases.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 12 Mar 2009 15:33:20 +0000 (11:33 -0400)]
tracing: protect ring_buffer_expanded with trace_types_lock
Impact: prevent races with ring_buffer_expanded
This patch places the expanding of the tracing buffer under the
protection of the trace_types_lock mutex. It is highly unlikely
that there would be any contention, but better safe than sorry.
Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Darren Hart [Thu, 12 Mar 2009 22:11:18 +0000 (15:11 -0700)]
futex: remove the pointer math from double_unlock_hb
Impact: simplify code
I mistakenly included the pointer value ordering in the
double_unlock_hb() in my previous patch. It's only necessary
in the double_lock_hb() function. This patch removes it.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090312221118.11146.68610.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tao Ma [Thu, 12 Mar 2009 00:37:34 +0000 (08:37 +0800)]
ocfs2: Use xs->bucket to set xattr value outside
A long time ago, xs->base is allocated a 4K size and all the contents
in the bucket are copied to the it. Now we use ocfs2_xattr_bucket to
abstract xattr bucket and xs->base is initialized to the start of the
bu_bhs[0]. So xs->base + offset will overflow when the value root is
stored outside the first block.
Then why we can survive the xattr test by now? It is because we always
read the bucket contiguously now and kernel mm allocate continguous
memory for us. We are lucky, but we should fix it. So just get the
right value root as other callers do.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
kbuild: remove unused -r option for module-init-tool depmod
kbuild: fix 'make rpm' when CONFIG_LOCALVERSION_AUTO=y and using SCM tree
kbuild: fix mkspec to cleanup RPM_BUILD_ROOT
kbuild: fix C libary confusion in unifdef.c due to getline()
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
cpumask: mm_cpumask for accessing the struct mm_struct's cpu_vm_mask.
cpumask: tsk_cpumask for accessing the struct task_struct's cpus_allowed.
Linus Torvalds [Thu, 12 Mar 2009 23:25:04 +0000 (16:25 -0700)]
Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
hwmon: (f75375s) Remove unnecessary and confusing initialization
hwmon: (it87) Properly decode -128 degrees C temperature
hwmon: (lm90) Document support for the MAX6648/6692 chips
hwmon: (abituguru3) Fix I/O error handling
Linus Torvalds [Thu, 12 Mar 2009 23:22:51 +0000 (16:22 -0700)]
Merge branch 'fixes-20090312' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/pci
* 'fixes-20090312' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/pci:
PCIe: portdrv: call pci_disable_device during remove
pci: Fix typo in message while disabling HT MSI mapping
pci: don't disable too many HT MSI mapping
powerpc/pseries: The RPA PCI hotplug driver depends on EEH
PCIe: AER: during disable, check subordinate before walking
PCI: Add PCI quirk to disable L0s ASPM state for 82575 and 82598
Faisal Latif [Thu, 12 Mar 2009 21:34:59 +0000 (14:34 -0700)]
RDMA/nes: Don't allow userspace QPs to use STag zero
STag zero is a special STag that allows consumers to access any bus
address without registering memory. The nes driver unfortunately
allows STag zero to be used even with QPs created by unprivileged
userspace consumers, which means that any process with direct verbs
access to the nes device can read and write any memory accessible to
the underlying PCI device (usually any memory in the system). Such
access is usually given for cluster software such as MPI to use, so
this is a local privilege escalation bug on most systems running this
driver.
The driver was using STag zero to receive the last streaming mode
data; to allow STag zero to be disabled for unprivileged QPs, the
driver now registers a special MR for this data.
Nick Piggin [Thu, 12 Mar 2009 21:31:38 +0000 (14:31 -0700)]
fs: new inode i_state corruption fix
There was a report of a data corruption
http://lkml.org/lkml/2008/11/14/121. There is a script included to
reproduce the problem.
During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem. I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode. This points to memory scribble, or
synchronisation problme.
i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW. Subsequently modifying i_state causes corruption. In
my case it would look like this:
Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.
Fix for this is rather than wait for I_NEW inodes, just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.
After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1hour of running. Previously, the new warnings would start
immediately and hang would happen in under 5 minutes.
I'm also testing on ext3 now, and so far no problems there either. I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.
Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net> Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Cc: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Thu, 12 Mar 2009 21:31:36 +0000 (14:31 -0700)]
mfd: add support for WM8351 revision B
No software visible difference from revision A.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Samuel Ortiz <sameo@openedhand.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Spang [Thu, 12 Mar 2009 21:31:34 +0000 (14:31 -0700)]
acer-wmi: fix regression in backlight detection
Currently we disable the Acer WMI backlight device if there is no ACPI
backlight device. As a result, we end up with no backlight device at all.
We should instead disable it if there is an ACPI device, as the other
laptop drivers do. This regression was introduced in febf2d9 ("Acer-WMI:
fingers off backlight if video.ko is serving this functionality").
Each laptop driver with backlight support got a similar change around febf2d9. The changes to the other drivers look correct; see e.g. a598c82f for a similar but correct change. The regression is also in
2.6.28.
Signed-off-by: Michael Spang <mspang@csclub.uwaterloo.ca> Acked-by: Thomas Renninger <trenn@suse.de> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Carlos Corbacho <carlos@strangeworlds.co.uk> Cc: Len Brown <len.brown@intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Thu, 12 Mar 2009 21:31:33 +0000 (14:31 -0700)]
mmc: s3cmci: fix s3c2410_dma_config() arguments.
The s3cmci driver is calling s3c2410_dma_config with incorrect data for
the DCON register. The S3C2410_DCON_HWTRIG is implicit in the channel
configuration and the device selection of S3C2410_DCON_CH0_SDI is
incorrect as the DMA system may not select channel 0.
Signed-off-by: Ben Dooks <ben@simtec.co.uk> Acked-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Kerrisk [Thu, 12 Mar 2009 21:31:32 +0000 (14:31 -0700)]
MAINTAINERS: downgrade support for man-pages
Unfortunately, Linux Foundation funding for my work on
man-pages/testing/doc under the auspices of the LF documentation
fellowship unfortunately ran out a short while ago (after earlier attempts
to seek funding, only Google stepped forward with a bit of further funding
for the position), so the patch below acknowledges something closer to
reality.
Unfortunately, there will (probably very) soon be a further downgrade from
"Maintained" to "Odd Fixes" or "Orphan", unless some funding miracle
occurs. So, if anyone is looking to become man-pages maintainer, there
may soon be an opening (okay, don't trample me in the rush ;-).)
Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Mack [Thu, 12 Mar 2009 21:31:30 +0000 (14:31 -0700)]
ds2760_battery.c: fix division by zero
The 'battery remaining capacity' calculation in
drivers/power/ds2760_battery.c lacks a parameter check to a division
operation which causes the kernel to oops on my board.
[ 21.233750] Division by zero in kernel.
[ 21.237646] [<c002955c>] (__div0+0x0/0x20) from [<c012561c>] (Ldiv0+0x8/0x10)
[ 21.244816] [<c01bef34>] (ds2760_battery_read_status+0x0/0x2a4) from [<c01bf3a4>] (ds2760_battery_get_property+0x30/0xdc)
[ 21.255803] r8:c03a22c0 r7:c7886100 r6:00000009 r5:c782fe7c r4:c7886084
[ 21.262518] [<c01bf374>] (ds2760_battery_get_property+0x0/0xdc) from [<c01bde98>] (power_supply_show_property+0x48/0x114)
[ 21.273480] r6:c7996000 r5:00000009 r4:00000000
[ 21.278111] [<c01bde50>] (power_supply_show_property+0x0/0x114) from [<c01be158>] (power_supply_uevent+0x188/0x280)
[ 21.288537] r8:00000001 r7:c7886100 r6:c7996000 r5:000000b4 r4:00000000
[ 21.295222] [<c01bdfd0>] (power_supply_uevent+0x0/0x280) from [<c015c664>] (dev_uevent+0xd4/0x10c)
[ 21.304199] [<c015c590>] (dev_uevent+0x0/0x10c) from [<c0128440>] (kobject_uevent_env+0x180/0x390)
[ 21.313170] r5:00000000 r4:c78860ac
[ 21.316725] [<c01282c0>] (kobject_uevent_env+0x0/0x390) from [<c0128664>] (kobject_uevent+0x14/0x18)
[ 21.325850] [<c0128650>] (kobject_uevent+0x0/0x18) from [<c01bdc34>] (power_supply_changed_work+0x5c/0x70)
[ 21.335506] [<c01bdbd8>] (power_supply_changed_work+0x0/0x70) from [<c004d290>] (run_workqueue+0xbc/0x144)
[ 21.345167] r4:c7812040
[ 21.347716] [<c004d1d4>] (run_workqueue+0x0/0x144) from [<c004d94c>] (worker_thread+0xa8/0xbc)
[ 21.356296] r7:c7812040 r6:c7820b00 r5:c782ffa4 r4:c7812048
[ 21.361957] [<c004d8a4>] (worker_thread+0x0/0xbc) from [<c0051008>] (kthread+0x5c/0x94)
[ 21.369971] r7:00000000 r6:c004d8a4 r5:c7812040 r4:c782e000
[ 21.375612] [<c0050fac>] (kthread+0x0/0x94) from [<c00403d0>] (do_exit+0x0/0x688)
Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu> Acked-by: Matt Reimer <mreimer@vpop.net> Acked-by: Anton Vorontsov <cbou@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 12 Mar 2009 21:31:29 +0000 (14:31 -0700)]
vfs: add missing unlock in sget()
In sget(), destroy_super(s) is called with s->s_umount held, which makes
lockdep unhappy.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Thu, 12 Mar 2009 21:31:28 +0000 (14:31 -0700)]
pipe_rdwr_fasync: fix the error handling to prevent the leak/crash
If the second fasync_helper() fails, pipe_rdwr_fasync() returns the error
but leaves the file on ->fasync_readers.
This was always wrong, but since 233e70f4228e78eb2f80dc6650f65d3ae3dbf17c
"saner FASYNC handling on file close" we have the new problem. Because in
this case setfl() doesn't set FASYNC bit, __fput() will not do
->fasync(0), and we leak fasync_struct with ->fa_file pointing to the
freed file.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Mack [Thu, 12 Mar 2009 21:31:25 +0000 (14:31 -0700)]
drivers/w1/masters/w1-gpio.c: fix read_bit()
W1 master implementations are expected to return 0 or 1 from their
read_bit() function. However, not all platforms do return these values
from gpio_get_value() - namely PXAs won't. Hence the w1 gpio-master needs
to break the result down to 0 or 1 itself.
Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Ville Syrjala <syrjala@sci.fi> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Renzo Davoli [Thu, 12 Mar 2009 21:31:23 +0000 (14:31 -0700)]
UML on UML fixed: it did not start
It is currently impossible to run a user-mode linux machine inside another
user-mode linux (UML on UML). It breaks after a few instructions. When
it tries to check whether SYSEMU is installed (the inner) UML receives an
inconsistent result (from the outer UML).
This is the output of a broken attempt:
$ ./linux mem=256m ubd0=cow
Locating the bottom of the address space ... 0x0
Locating the top of the address space ... 0xc0000000
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...OK
Checking ptrace new tags for syscall emulation...unsupported
Checking syscall emulation patch for ptrace...check_sysemu : expected SIGTRAP, got status = 256
$
The problem is the following:
PTRACE_SYSCALL/SINGLESTEP is currently managed inside arch_ptrace for ARCH=um.
PTRACE_SYSEMU/SUSEMU_SINGLESTEP is not captured in arch_ptrace's switch,
therefore it is erroneously passed back to ptrace_request (in
kernel/ptrace).
This simple patch simply forces ptrace to return an error on
PTRACE_SYSEMU/SUSEMU_SINGLESTEP as it is unsupported on ARCH=um, and fixes
the problem.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Renzo Davoli <renzo@cs.unibo.it> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Thu, 12 Mar 2009 17:03:48 +0000 (17:03 +0000)]
[ARM] Fix virtual to physical translation macro corner cases
The current use of these macros works well when the conversion is
entirely linear. In this case, we can be assured that the following
holds true:
__va(p + s) - s = __va(p)
However, this is not always the case, especially when there is a
non-linear conversion (eg, when there is a 3.5GB hole in memory.)
In this case, if 's' is the size of the region (eg, PAGE_SIZE) and
'p' is the final page, the above is most definitely not true.
So, we must ensure that __va() and __pa() are only used with valid
kernel direct mapped RAM addresses. This patch tweaks the code
to achieve this.
Tested-by: Charles Moschel <fred99@carolina.rr.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 5421/1: ftrace: fix crash due to tracing of __naked functions
This is a fix for the following crash observed in 2.6.29-rc3:
http://lkml.org/lkml/2009/1/29/150
On ARM it doesn't make sense to trace a naked function because then
mcount is called without stack and frame pointer being set up and there
is no chance to restore the lr register to the value before mcount was
called.
Reported-by: Matthias Kaehlcke <matthias@kaehlcke.net> Tested-by: Matthias Kaehlcke <matthias@kaehlcke.net> Cc: Abhishek Sagar <sagar.abhishek@gmail.com> Cc: Steven Rostedt <rostedt@home.goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
H. Peter Anvin [Thu, 12 Mar 2009 20:43:14 +0000 (13:43 -0700)]
x86: use targets in the boot Makefile instead of CLEAN_FILES
Impact: cleanup
Instead of using CLEAN_FILES in arch/x86/Makefile, add generated files
to targets in arch/x86/boot/Makefile, so they will get naturally
cleaned up by "make clean".
Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
H. Peter Anvin [Thu, 12 Mar 2009 19:50:33 +0000 (12:50 -0700)]
x86: remove additional vestiges of the zImage/bzImage split
Impact: cleanup
Remove targets that were used for zImage only, and Makefile
infrastructure that was there to support the zImage/bzImage split.
Reported-by: Paul Bolle <pebolle@tiscali.nl>
LKML-Reference: <1236879901.24144.26.camel@test.thuisdomein> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Paul Walmsley [Thu, 12 Mar 2009 19:11:43 +0000 (20:11 +0100)]
[ARM] 5422/1: ARM: MMU: add a Non-cacheable Normal executable memory type
This patch adds a Non-cacheable Normal ARM executable memory type,
MT_MEMORY_NONCACHED.
On OMAP3, this is used for rapid dynamic voltage/frequency scaling in
the VDD2 voltage domain. OMAP3's SDRAM controller (SDRC) is in the
VDD2 voltage domain, and its clock frequency must change along with
voltage. The SDRC clock change code cannot run from SDRAM itself,
since SDRAM accesses are paused during the clock change. So the
current implementation of the DVFS code executes from OMAP on-chip
SRAM, aka "OCM RAM."
If the OCM RAM pages are marked as Cacheable, the ARM cache controller
will attempt to flush dirty cache lines to the SDRC, so it can fill
those lines with OCM RAM instruction code. The problem is that the
SDRC is paused during DVFS, and so any SDRAM access causes the ARM MPU
subsystem to hang.
TI's original solution to this problem was to mark the OCM RAM
sections as Strongly Ordered memory, thus preventing caching. This is
overkill: since the memory is marked as non-bufferable, OCM RAM writes
become needlessly slow. The idea of "Strongly Ordered SRAM" is also
conceptually disturbing. Previous LAKML list discussion is here:
Alex Chiang [Fri, 6 Mar 2009 02:28:40 +0000 (19:28 -0700)]
PCIe: AER: during disable, check subordinate before walking
Commit 47a8b0cc (Enable PCIe AER only after checking firmware
support) wants to walk the PCI bus in the remove path to disable
AER, and calls pci_walk_bus for downstream bridges.
Unfortunately, in the remove path, we remove devices and bridges
in a depth-first manner, starting with the furthest downstream
bridge and working our way backwards.
The furthest downstream bridges will not have a dev->subordinate,
and we hit a NULL deref in pci_walk_bus.
Check for dev->subordinate first before attempting to walk the
PCI hierarchy below us.
Acked-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Alexander Duyck [Thu, 5 Mar 2009 18:57:28 +0000 (13:57 -0500)]
PCI: Add PCI quirk to disable L0s ASPM state for 82575 and 82598
This patch is intended to disable L0s ASPM link state for 82598 (ixgbe)
parts due to the fact that it is possible to corrupt TX data when coming
back out of L0s on some systems. The workaround had been added for 82575
(igb) previously, but did not use the ASPM api. This quirk uses the ASPM
api to prevent the ASPM subsystem from re-enabling the L0s state.
Instead of adding the fix in igb to the ixgbe driver as well it was
decided to move it into a pci quirk. It is necessary to move the fix out
of the driver and into a pci quirk in order to prevent the issue from
occuring prior to driver load to handle the possibility of the device being
passed to a VM via direct assignment.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Chauhan, Vijay [Wed, 4 Mar 2009 06:47:54 +0000 (12:17 +0530)]
[SCSI] scsi_dh_rdac: Retry mode select for NO_SENSE, ABORTED_COMMAND, UNIT_ATTENTION, NOT_READY(02/04/01)
This patch is to add retry for mode select if mode select command is
returned with sense NO_SENSE, UNIT_ATTENTION, ABORTED_COMMAND,
NOT_READY(02/04/01). This patch reorganise the sense keys from if-else
to switch-case format for better maintainability.
Report the fc_host_port_type as FC_PORTTYPE_NPIV when the subchannel
is running in NPIV mode. This allows to see the correct type with
lsscsi -H -t --list
Swen Schillig [Mon, 2 Mar 2009 12:09:11 +0000 (13:09 +0100)]
[SCSI] zfcp: Ensure all work is cancelled on adapter dequeue
A scheduled work might still be pending, running while the adapter is
in progress to get dequeued from the system. This can lead to an
invalid pointer dereference (Oops). Once the adpater is set online
again, ensure the nameserver environment is initialized to the
appropriate values again.
Swen Schillig [Mon, 2 Mar 2009 12:09:10 +0000 (13:09 +0100)]
[SCSI] zfcp: fix queue, scheduled work processing.
Ensure the refcounting is correct even if we were not able to
schedule a work. In addition we have to make sure no scheduled
work is pending while we're dequeing the adapter from the
systems environment.
[SCSI] zfcp: erp failed status bit will not be set
It will not be necessary to set the erp failed status bit
in case a SCSI device is removed by the SCSI mid layer.
In the case a SCSI device is unavailable for a short time
(15 to 20 seconds) a FCP unit will not get on-line again.
Signed-off-by: Martin Petermann <martin@linux.vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[SCSI] zfcp: Block FC transport rports early on errors
Use the I/O blocking mechanism in the FC transport class to allow
faster failovers for multipathing:
- Call fc_remote_port_delete early to set the rport to BLOCKED.
- Check the rport status in queuecommand with fc_remote_portchkready
to no longer accept new I/O for this port and fail the I/O with the
appropriate scsi_cmnd result.
- Implement the terminate_rport_io handler to abort all pending I/O
requests
- Return SCSI commands with DID_TRANSPORT_DISRUPTED while erp is
running.
- When updating the remote port status, check for late changes and
update the remote ports status accordingly.
Swen Schillig [Mon, 2 Mar 2009 12:09:07 +0000 (13:09 +0100)]
[SCSI] zfcp: incorrect reaction on incoming RSCN
After an error condition resolved a remote storage port was never
re-opened. The incoming RSCN was not processed accordingly due
to a misinterpreted status flag / return value combination.
The usage of the PCI flag to trigger interrupts is optional. Even
without setting the flag, qdio still receives interrupts to continue
working on the queue. Remove the PCI flag from zfcp, it is not
necessary.
Swen Schillig [Mon, 2 Mar 2009 12:09:04 +0000 (13:09 +0100)]
[SCSI] zfcp: replace current ERP logging with a more convenient version
The current number based id ERP logging is replaced by a string
based tag version. The benefit is an easier location of the code in
question and the removal of the lengthy array referencing the
individual messages.
The string (7 bytes) based version does not use more space since those
bytes were "used" anyway due to the alignment of the structure.
The encoding of the 7 byte string is as follows
[0-1] = filename
[2-5] = task/function
[6] = section
Due to the character of this string (fixed length) a string
termination is not required here.
Swen Schillig [Mon, 2 Mar 2009 12:09:03 +0000 (13:09 +0100)]
[SCSI] zfcp: prevent adapter close on initial adapter open
An adapter close was always performed whether it was required,
(e.g. in an error scenario) or not (e.g. initial open).
This patch is changing the process in only doing an
adapter close when it is required.
Swen Schillig [Mon, 2 Mar 2009 12:09:02 +0000 (13:09 +0100)]
[SCSI] zfcp: remove undefined subtype for status read response
The status read response FSF_STATUS_READ_SUB_ERROR_PORT is not
defined in the specs and therefore not valid.
All occurrences are removed from the code.
Issue ELS ADISC requests from workqueue. This allows the link test
request to be sent when the request queue is full due to I/O load for
other remote ports. It also simplifies request queue locking,
zfcp_fsf_send_fcp_command_task is now the only function that has
interrupts disabled from the caller. This is also a prereq for the FC
passthrough support that issues ELS requests from userspace.
[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp
When the SCSI midlayer is running error recovery, the low-level error
recovery in zfcp could be running and preventing the SCSI midlayer to
issue error recovery requests. To avoid unnecessary error recovery
escalation, wait for the zfcp erp to finish and retry if necessary.
While reworking the SCSI eh handlers, alsa cleanup the code and
simplify the interface from zfcp_scsi to the fsf layer.
The lock only needs to protect the softirq context called from qdio
against the userspace context called from sysfs. spin_lock and
spin_lock_bh is enough.
Martin Peschke [Mon, 2 Mar 2009 12:08:56 +0000 (13:08 +0100)]
[SCSI] zfcp: add measurement data for average qdio queue utilisation
Provide measurement data for the utilisation of the QDIO outbound queue.
The additional value allows to calculate an average queue utilisation
by looking at the deltas per time unit. Needed for capacity planning.
It is up to user space to handle wrap-arounds of the 64 bit value.
The new counter neatly complements the existing counter for queue full
conditions. That is why, both statistics counter have been integrated.
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Use the device pointer in zfcp_unit for tracking if we have a
registered SCSI device. With this approach, the flag
ZFCP_STATUS_UNIT_REGISTERED is only redundant and can be removed.
PORT_PHYS_CLOSING is only set and cleared, but not actually used
for status checking.
PORT_INVALID_WWPN is set when the GID_PN request does not return
a d_id for a remote port, e.g. when a remote port has been
unplugged. For this case, the d_id is zero. In the erp we can
check the d_id and use the normal escalation procedure that gives
up after three retries and remove the special case.
PORT_NO_WWPN is unused: Each port in the remote port list has a
valid wwpn. The WKA ports are now tracked outside the port
list. Remove the PORT_NO_WWPN flag, since this is no longer set
for any port.
Alan Stern [Fri, 27 Feb 2009 21:51:42 +0000 (16:51 -0500)]
[SCSI] fix /proc memory leak in the SCSI core
The SCSI core calls scsi_proc_hostdir_add() from within
scsi_host_alloc(), but the corresponding scsi_proc_hostdir_rm()
routine is called from within scsi_remove_host(). As a result, if a
host is allocated and then deallocated without ever being registered,
the host's directory in /proc is leaked.
This patch (as1181b) fixes this bug in the SCSI core by moving
scsi_proc_hostdir_rm() into scsi_host_dev_release().
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tony Battersby [Thu, 8 Jan 2009 17:59:08 +0000 (12:59 -0500)]
[SCSI] sym53c8xx: don't flood syslog with negotiation messages
sym53c8xx prints a negotiation message after every check condition.
This can add up to a lot of messages for removable-medium devices
(CD-ROM, tape drives, etc.) that are being polled, since they return
check condition when no medium is present. This patch suppresses the
negotiation message if it would be the same as the last one printed.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tony Battersby [Thu, 8 Jan 2009 17:58:04 +0000 (12:58 -0500)]
[SCSI] sym53c8xx: use a queue depth of 1 for untagged devices
sym53c8xx uses a command queue depth of 2 for untagged devices,
without good reason. This _mostly_ seems to work ok, but it has
caused me some subtle problems. For example, I have an application
where one thread sends write commands to a tape drive, and another
thread sends log sense polling commands. With a queue depth of
2, the polling commands end up being starved for long periods of
time while multiple write commands are serviced (this may also be
related to the fact the the sg driver queues commands in LIFO order).
This problem is fixed by changing the queue depth to 1 for untagged
devices. I have tested this change extensively with many different
tape drives, medium changers, and disk drives (disk drives of course
use tagged commands and are therefore unaffected by this patch).
Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tony Battersby [Thu, 8 Jan 2009 17:56:58 +0000 (12:56 -0500)]
[SCSI] sym53c8xx: handle pci_iomap() failures
sym_init_device() doesn't check if pci_iomap() fails. It also tries
to map device RAM without first checking FE_RAM.
1) Move some initialization from sym_init_device() to the top of
sym2_probe().
2) Rename sym_init_device() to sym_iomap_device().
3) Call sym_iomap_device() after sym_check_supported() instead of
before so that device->chip.features will be set.
4) Check FE_RAM in sym_iomap_device() before mapping RAM.
5) If sym_iomap_device() cannot map registers, then abort.
6) If sym_iomap_device() cannot map RAM, then fall back to not using
RAM and continue.
7) Remove the check for FE_RAM in sym_attach() since dev->ram_base
is now always set correctly.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tony Battersby [Thu, 8 Jan 2009 17:55:52 +0000 (12:55 -0500)]
[SCSI] sym53c8xx: unmap pci memory after probe errors
During sym2_probe(), sym_init_device() does pci_iomap(), but there is
no corresponding pci_iounmap() if an error occurs before sym_attach()
copies sym_device::s.{ioaddr,ramaddr} to np.
1) Add the sym_iounmap_device() function.
2) Call sym_iounmap_device() if an error occurs between
sym_init_device() and the time sym_attach() allocates np.
3) Make sym_attach() copy sym_device::s.{ioaddr,ramaddr} to np before
calling any function that can fail so that sym_free_resources()
will do the unmap instead of sym_iounmap_device().
Also fixed by this patch:
During sym2_probe(), if sym_check_raid() returns nonzero, then
pci_release_regions() is never called.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Wayne Boyer [Tue, 24 Feb 2009 19:36:00 +0000 (11:36 -0800)]
[SCSI] ipr: Expose debug and fastfail parameters
Expose the debug and fastfail parameters to /sys/module/ipr/parameters such
that they can be enabled/disabled at run time.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Alan Stern [Wed, 18 Feb 2009 15:54:44 +0000 (10:54 -0500)]
[SCSI] sd: tell the user when a disk's capacity is adjusted
This patch (as1188) combines the tests for decrementing a drive's
reported capacity and expands the comment. It also adds an
informational message to the system log, informing the user when the
reported value has been changed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Julia Lawall [Wed, 4 Feb 2009 21:17:29 +0000 (22:17 +0100)]
[SCSI] libfc: Correct use of ! and &
!ep->esb_stat is either 1 or 0, and the rightmost bit of ESB_ST_COMPLETE is
always 0, making the result of !ep->esb_stat & ESB_ST_COMPLETE always 0.
Thus parentheses around the argument to ! seem needed.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
Boaz Harrosh [Sun, 8 Feb 2009 16:02:22 +0000 (18:02 +0200)]
[SCSI] libosd: Fix NULL dereference BUG when target is not OSD conformant
Very old OSC's Target had a BUG in the Get/Set attributes where
it was looking in the wrong places for attribute lists length.
If used with the open-osd initiator, the initiator would dereference
a NULL pointer when retrieving system_information attributes.
Checks are added that retrieval of each attribute is successful
before accessing its value.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
You can reproduce this bug by interrupting a program before a sg
response completes. This leads to the special sg state (the orphan
state), then sg calls blk_put_request in interrupt (rq->end_io).
The above bug report shows the recursive lock problem because sg calls
blk_put_request in interrupt. We could call __blk_put_request here
instead however we also need to handle blk_rq_unmap_user here, which
can't be called in interrupt too.
In the orphan state, we don't need to care about the data transfer
(the program revoked the command) so adding 'just free the resource'
mode to blk_rq_unmap_user is a possible option.
I prefer to avoid complicating the blk mapping API when possible. I
change the orphan state to call sg_finish_rem_req via
execute_in_process_context. We hold sg_fd->kref so sg_fd doesn't go
away until keventd_wq finishes our work. copy_from_user/to_user fails
so blk_rq_unmap_user just frees the resource without the data
transfer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Brian King [Tue, 3 Feb 2009 00:39:40 +0000 (18:39 -0600)]
[SCSI] ibmvfc: Better handle other FC initiators
The ibmvfc driver currently always sets the role of all rports
to FC_PORT_ROLE_FCP_TARGET, which is not correct for other initiators.
This can cause problems if other initiators are on the fabric
when we then try to scan the rport for LUNs. Fix this by looking
at the service parameters returned in the PRLI to set the roles
appropriately. Also look at the returned service parameters to
decide whether or not we were actually able to successfully log into
the target.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Chauhan, Vijay [Mon, 26 Jan 2009 15:59:37 +0000 (21:29 +0530)]
[SCSI] scsi_dh_rdac: Retry for Quiescence in Progress in rdac device handler
During device discovery read capacity fails with 0x068b02 and sets the
device size to 0. As a reason any I/O submitted to this path gets
killed at sd_prep_fn with BLKPREP_KILL. This patch is to retry for
0x068b02
Wayne Boyer [Wed, 28 Jan 2009 16:24:50 +0000 (08:24 -0800)]
[SCSI] ipr: add message to error table
Adds a message to the error table for an error that wasn't previously handled.
In some cases the I/O Adapter will detect an error condition and mark a block
as "logically bad".
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Ilgu Hong [Fri, 30 Jan 2009 23:00:11 +0000 (17:00 -0600)]
[SCSI] scsi dh alua: handle report luns data changed in check sense callout
When we switch controllers the Intel Multi-Flex reports
REPORTED_LUNS_DATA_HAS_CHANGED. This patch just has us
retry the command.
Signed-off-by: Ilgu Hong <ilgu.hong@promise.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Ilgu Hong [Fri, 30 Jan 2009 23:00:10 +0000 (17:00 -0600)]
[SCSI] scsi dh alua: add intel Multi-Flex device
This adds the Intel Multi-Flex device to scsi_dh_alua's
scsi_dh_devlist, so the module attaches to these devs.
Signed-off-by: Ilgu Hong <ilgu.hong@promise.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Ilgu Hong [Fri, 30 Jan 2009 23:00:09 +0000 (17:00 -0600)]
[SCSI] scsi dh alua: fix group id masking
The buf[i] is a byte but we are only asking 4 bits off the
group_id. This patch has us take off a byte.
Signed-off-by: Ilgu Hong <ilgu.hong@promise.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sat, 13 Dec 2008 15:55:18 +0000 (00:55 +0900)]
[SCSI] osst: replace scsi_execute_async with the block layer API
This replaces scsi_execute_async with the block layer API. st does the
same thing so it might make sense to have something like libst (there
are other things that os and osst can share).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Willem Riede <osst@riede.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sat, 13 Dec 2008 15:55:17 +0000 (00:55 +0900)]
[SCSI] osst: make all the buffer the same size
This simiplifies the buffer management; all the buffers in osst_buffer
become the same size. This is necessary to use the block layer API (sg
driver was modified in the same way) since the block layer API takes
the same size page frames instead of scatter gatter.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Willem Riede <osst@riede.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>