[SCSI] libiscsi: Fix scsi command timeout oops in iscsi_eh_timed_out
Yanling Qi from LSI found the root cause of the panic, below is his
analysis:
Problem description: the open iscsi driver installs eh_timed_out handler
to the
blank_transport_template of the scsi middle level that causes panic of
timed
out command of other host
Here are the details
Iscsi Session creation
During iscsi session creation time, the iscsi_tcp_session_create() of
iscsi_tpc.c will create a scsi-host for the session. See the statement
marked
with the label A. The statement B replaces the shost->transportt point
with a
local struct variable.
A shost = iscsi_host_alloc(&iscsi_sht, 0, qdepth);
if (!shost)
return NULL;
B shost->transportt = iscsi_tcp_scsi_transport;
shost->max_lun = iscsi_max_lun;
Please note the scsi host is allocated by invoking isccsi_host_alloc()
in
libiscsi.c
Polluting the middle level blank_transport_template in
iscsi_host_alloc() of
libiscsi.c
The iscsi_host_alloc() invokes the middle level function
scsi_host_alloc() in
hosts.c for allocating a scsi_host. Then the statement marked with C
assigns
the iscsi_eh_cmd_timed_out handler to the eh_timed_out callback
function.
C shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out;
Please note the shost->transport is the middle level
blank_transport_template
as shown in the code segment below. We see two problems here. 1.
iscsi_eh_cmd_timed_out is installed to the blank_transport_template that
will
cause some body else problem. 2. iscsi_eh_cmd_timed_out will never be
invoked
when iscsi command gets timeout because the statement B resets the
pointer.
Middle level blank_transport_template
In the middle level function scsi_host_alloc() of hosts.c, the middle
level
assigns a blank_transport_template for those hosts not implementing its
transport layer. All HBAs without supporting a specific scsi_transport
will
share the middle level blank_transport_template. Please see the
statement D
struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int
privsize)
{
struct Scsi_Host *shost;
gfp_t gfp_mask = GFP_KERNEL;
int rval;
if (sht->unchecked_isa_dma && privsize)
gfp_mask |= __GFP_DMA;
shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy */
shost->dma_channel = 0xff;
/* These three are default values which can be overridden */
shost->max_channel = 0;
shost->max_id = 8;
shost->max_lun = 8;
/* Give each shost a default transportt */
D shost->transportt = &blank_transport_template;
Why we see panic at iscsi_eh_cmd_timed_out()
The mpp virtual HBA doesn’t have a specific scsi_transport. Therefore,
the
blank_transport_template will be assigned to the virtual host of the MPP
virtual HBA by SCSI middle level. Please note that the statement C has
assigned
iscsi-transport eh_timedout handler to the blank_transport_template.
When a mpp
virtual command gets timedout, the iscsi_eh_cmd_timed_out() will be
invoked to
handle mpp virtual command timeout from the middle level
scsi_times_out()
function of the scsi_error.c.
It is very easy to understand why we get panic in the
iscsi_eh_cmd_timed_out().
A scsi_cmnd from a no-iscsi device definitely can not resolve out a
session and
session->lock. The panic can be happed anywhere during the differencing.
This patch fixes the problem by moving the setting of the
iscsi_eh_cmd_timed_out to iscsi_add_host, which is after the LLDs
have set their transport template to shost->transportt.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>