path: root/drivers/scsi
Age | Commit message | Author
2010-03-15 | SCSI: qla1280: Drop host_lock while requesting firmware | Ben Hutchings
commit 2cec802980727f1daa46d8c31b411e083d49d7a2 upstream. request_firmware() may sleep and it appears to be safe to release the spinlock here. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 | SCSI: qla2xxx: Obtain proper host structure during response-queue processing. | Anirban Chakraborty
commit a67093d46e3caed1a42d694a7de452b61db30562 upstream. Original code incorrectly assumed only status-type-0 IOCBs would be queued to the response-queue, and thus all entries would safely reference a VHA from the IOCB 'handle.' Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 | mpt2sas: Delete volume before HBA detach. | Kashyap, Desai
commit d7384b28afb2bf2b7be835ddc8c852bdc5e0ce1c upstream. The driver hangs when doing `rmmod mpt2sas` if there are any IR volumes present. The hang is due to the scsi midlayer trying to access the IR volumes after the driver releases controller resources. Perhaps when scsi_remove_host is called, the scsi mid layer is sending some request. This doesn't occur for bare drives because the driver is already reporting those drives deleted prior to calling mpt2sas_base_detach. To solve this issue, we need to delete the volumes as well. Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Reviewed-by: Eric Moore <eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 | ARM: 5944/1: scsi: fix timer setup in fas216.c | Guennadi Liakhovetski
commit b857df1acc634b18db1db2a40864af985100266e upstream. mod_timer() takes an absolute time and not a delay as its argument. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
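The absolute-versus-relative distinction can be sketched in user-space C. This is only an illustration: `fake_jiffies`, `struct fake_timer`, `fake_mod_timer()` and the two `arm_*` helpers are hypothetical stand-ins, not kernel API and not the fas216.c code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel tick counter. */
static unsigned long fake_jiffies = 1000;

struct fake_timer {
    unsigned long expires;   /* absolute tick at which the timer fires */
};

/* Mirrors the convention described above: 'expires' is an absolute time. */
static void fake_mod_timer(struct fake_timer *t, unsigned long expires)
{
    t->expires = expires;
}

/* True once the current tick has reached the timer's expiry. */
static bool fake_timer_expired(const struct fake_timer *t)
{
    return (long)(fake_jiffies - t->expires) >= 0;
}

/* Buggy pattern: passes a delay where an absolute time is expected. */
static void arm_wrong(struct fake_timer *t, unsigned long delay)
{
    fake_mod_timer(t, delay);
}

/* Fixed pattern: converts the delay to an absolute expiry first. */
static void arm_right(struct fake_timer *t, unsigned long delay)
{
    fake_mod_timer(t, fake_jiffies + delay);
}
```

With the buggy pattern a small delay lands in the past relative to the current tick, so the timer is treated as already expired.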
2010-02-09 | scsi_lib: Fix bug in completion of bidi commands | Boaz Harrosh
commit 63c43b0ec1765b74c734d465ba6345ef4f434df8 upstream. Because of the terrible structuring of scsi-bidi-commands it breaks some of the life time rules of a scsi-command. It is now not allowed to free up the block-request before cleanup and partial deallocation of the scsi-command. (Which is not so for non-bidi commands) The right fix to this problem would be to make bidi command a first-class citizen by allocating a scsi_sdb pointer at scsi command just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated as part of the get/put_command (Again like the prot_sdb) and the current decoupling of scsi_cmnd and blk-request should be kept. For now make sure scsi_release_buffers() is called before the call to blk_end_request_all() which might cause the suicide of the block requests. Otherwise the result is at best a leak of bidi buffers and at worst a crash, as there is a race between the existence of the bidi_request and the free of the associated bidi_sdb. The reason this was never hit before is because only OSD has the potential of doing asynchronous bidi commands. (So does bsg but it is never used) And OSD clients just happen to do all their bidi commands synchronously, up until recently. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | fcoe: Fix getting san mac for VLAN interface | Yi Zou
commit 5bab87e6d465d54a2b5899e0f583d42f00dbee2e upstream. Make sure we get the SAN MAC address from the real netdev if the input netdev is a VLAN device. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | fcoe: Fix checking san mac address | Yi Zou
commit bf361707c81f8e8e43e332bfc8838bae76ae021a upstream. This was fixed before in 7a7f0c7 but it's introduced again recently. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | fcoe, libfc: fix an libfc issue with queue ramp down in libfc | Vasu Dev
commit 14caf44c69184ed72d46a2f883311daf27a4192f upstream. The cmd_per_lun value is used by scsi-ml as fall back lowest queue_depth value but in case of libfc cmd_per_lun is set to same value as max queue_depth = 32. So this patch reduces cmd_per_lun value to 3 and configures each lun with default max queue_depth 32 in fc_slave_alloc. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Acked-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | libfc: remote port gets stuck in restart state without really restarting | Abhijeet Joglekar
commit 5543c72e2bbb30e5ba5938b18ec26617b8b3fb04 upstream. We ran into a scenario where a remote port goes into RESTART state, but never gets added to scsi transport. The running vmcore showed the following:
a) Port was in RESTART state
b) rdata->event was STOP
c) no work gets scheduled for the remote work to fc_rport_work
After this point, shut/no-shut of the remote port did not cause the port to get re-discovered. The port would move between DELETE and RESTART states, but the event would always be STOP, no work would get scheduled to fc_rport_work and the port would not get added to scsi_transport. The problem is that rdata->event is not set to NONE after a port is restarted. After this point, no more work gets scheduled for the remote port since new work is scheduled only if rdata->event is NONE. So, the event and state keep changing, but fc_rport_work does not get scheduled to actually handle the event. Here's a transition of states that explains the above observation:
1) Port is first in READY State, event is NONE
2) RSCN on shut, port goes to DELETED, event is stop
3) Before fc_rport_work runs, RSCN on no-shut, port goes to RESTART, event is still STOP
4) fc_rport_work gets scheduled, removes the port from transport, sees state as RESTART, begins the PLOGI state machine, event remains as STOP (event NOT changed to NONE, this is the bug)
5) Plogi state machine completes, port state goes to READY, event goes to READY, but no work is scheduled since event was STOP (non-NONE) before. Fc_rport_work is not scheduled, port remains in READY state, but is not added to transport. Things are broken at this point. Libfc rport is ready, but no transport rport created.
6) now a shut causes port state to change to DELETE, event to change to STOP, no work gets scheduled
7) no-shut causes port state to change to RESTART, event remains at STOP, no work gets scheduled
(6) and (7) now get repeated every time we do shut/no-shut. No way to get out of this state. Fcc reset does not help either. Only way to get out is to load/unload module. Fix is to set rdata->event to NONE while processing the STOP/LOGO/FAILED events, inside the discovery and rport locks. Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
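The stuck-event behavior described in this commit message can be modelled with a small user-space sketch. It assumes, as the numbered steps imply, that work is queued only when the previous event was NONE; `struct rport_sim`, `enter_event()` and `handle_stop()` are hypothetical names and do not reproduce the libfc code.

```c
#include <stdbool.h>

enum ev { EV_NONE, EV_READY, EV_STOP };

struct rport_sim {
    enum ev event;       /* pending event for the worker */
    int work_queued;     /* how many times the worker was scheduled */
};

/* Queue work only when no event is already pending, then record the event. */
static void enter_event(struct rport_sim *r, enum ev e)
{
    if (r->event == EV_NONE)
        r->work_queued++;
    r->event = e;
}

/* Worker handling a STOP event; 'clear_event' models the fix. */
static void handle_stop(struct rport_sim *r, bool clear_event)
{
    if (clear_event)
        r->event = EV_NONE;
}
```

Without the reset, the READY transition that follows a handled STOP never queues work, which matches step 5 of the message; with the reset, it does.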
2010-01-28 | libfc: fix free of fc_rport_priv with timer pending | Joe Eykholt
commit b4a9c7ede96e90f7b1ec009ce7256059295e76df upstream. Timer crashes were caused by freeing a struct fc_rport_priv with a timer pending, causing the timer facility list to be corrupted. This was during FC uplink flap tests with a lot of targets. After discovery, we were doing a PLOGI on an rdata that was in DELETE state but not yet removed from the lookup list. This moved the rdata from DELETE state to PLOGI state. If the PLOGI exchange allocation failed and needed to be retried, the timer scheduling could race with the free being done by fc_rport_work(). When fc_rport_login() is called on a rport in DELETE state, move it to a new state RESTART. In fc_rport_work, when handling a LOGO, STOPPED or FAILED event, look for restart state. In the RESTART case, don't take the rdata off the list and after the transport remote port is deleted and exchanges are reset, re-login to the remote port. Note that the new RESTART state also corrects a problem we had when re-discovering a port that had moved to DELETE state. In that case, a new rdata was created, but the old rdata would do an exchange manager reset affecting the FC_ID for both the new rdata and old rdata. With the new state, the new port isn't logged into until after any old exchanges are reset. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | libfc: fix memory corruption caused by double frees and bad error handling | Chris Leech
commit 8f550f937e9fdafa5c37e348e214aecec851ef3f upstream. I was running into several different panics under stress, which I traced down to a few different possible slab corruption issues in error handling paths. I have not yet looked into why these exchange sends fail, but with these fixes my test system is much more stable under stress than before. fc_elsct_send() could fail and either leave the passed in frame intact (failure in fc_ct/els_fill) or the frame could have been freed if the failure was in fc_exch_seq_send(). The caller had no way of knowing, and there was a potential double free in the error handling in fc_fcp_rec(). Make fc_elsct_send() always free the frame before returning, and remove the fc_frame_free() call in fc_fcp_rec(). While fc_exch_seq_send() did always consume the frame, there were double free bugs in the error handling of fc_fcp_cmd_send() and fc_fcp_srr() as well. Numerous calls to error handling routines (fc_disc_error(), fc_lport_error(), fc_rport_error_retry() ) were passing in a frame pointer that had already been freed in the case of an error. I have changed the call sites to pass in a NULL pointer, but there may be more appropriate error codes to use. Question: Why do these error routines take a frame pointer anyway? I understand passing in a pointer encoded error to the response handlers, but the error routines take no action on a valid pointer and should never be called that way. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
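The ownership rule this commit settles on (the send routine always consumes the frame, so callers never free it) can be sketched as follows. This is a user-space illustration under stated assumptions: `struct frame`, `frame_alloc()`, `frame_free()` and `send_frame()` are hypothetical and are not the libfc functions named above.

```c
#include <stdlib.h>

static int frames_freed;   /* counts releases so the behavior is observable */

struct frame { int payload; };

static struct frame *frame_alloc(void)
{
    return calloc(1, sizeof(struct frame));
}

static void frame_free(struct frame *fp)
{
    free(fp);
    frames_freed++;
}

/*
 * Hypothetical send routine: it owns 'fp' from the moment it is called
 * and releases it on every path, so callers must not free it again.
 */
static int send_frame(struct frame *fp, int fill_ok, int xmit_ok)
{
    if (!fill_ok) {          /* early failure: still consume the frame */
        frame_free(fp);
        return -1;
    }
    if (!xmit_ok) {          /* late failure: the frame is consumed here too */
        frame_free(fp);
        return -1;
    }
    frame_free(fp);          /* stand-in for the frame being sent and released */
    return 0;
}
```

Because every return path releases the frame exactly once, a caller's error handler has nothing left to free, which is what removes the double-free hazard.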
2010-01-28 | libfc: Fix frags in frame exceeding SKB_MAX_FRAGS in fc_fcp_send_data | Yi Zou
commit d37322a43ebac79eef417149f5696390cf8872db upstream. In case of sequence offload, in fc_fcp_send_data(), the skb_fill_page_info() called may end up adding more frags to the skb_shinfo(fp_skb(fp))->frags[], exceeding SKB_MAX_FRAGS, this eventually corrupts the memory. I am adding the FR_FRAME_SG_LEN back, but as SKB_MAX_FRAGS -1, leaving 1 for our fcoe_eof_crc page. And send will be broken into multiple large sends if the frame already contains more frags than skb handle. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | fcoe: initialize return value in fcoe_destroy | Mike Christie
commit 8eca355fa8af660557fbdd5506bde1392eee9bfe upstream. When doing echo ethX > /sys..../destroy I am getting errors when the tear down succeeds. It looks like the reason for this is because the rc var is not getting set when the destruction works. This just sets it to zero. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | libfc: don't WARN_ON in lport_timeout for RESET state | Joe Eykholt
commit 22655ac22289d7b7def8ef2d72eafe5024bd57fe upstream. It's possible and harmless to get FLOGI timeouts while in RESET state. Don't do a WARN_ON in that case. Also, split out the other WARN_ONs in fc_lport_timeout, so we can tell which one is hit by its line number. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | libfc: lport: fix minor documentation errors | Joe Eykholt
commit 1b69bc062c2a4c8f3e15ac69f487afec3aa8d774 upstream. Fix minor errors. A debug message said an RLIR was received instead of ECHO. "Expected" was misspelled in several places. Fix a type cast from u32 to __be32. Rob, Some of these may have been also taken care of in your other doc cleanup patch. Feel free to fold them in. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | libfc: Fix wrong scsi return status under FC_DATA_UNDRUN | Yi Zou
commit 4347fa66878e079766258bc0d077c350cb31a799 upstream. This bug is exposed when there is a link flap in LLD. Particularly, when it happens right after a SCSI write command is sent out, no FCP_DATA is sent, causing fsp->status_code to be set as FC_DATA_UNDRUN in fc_fcp_complete_locked even though no SCSI status is received. Consequently, fc_io_compl treats this as DID_OK. This results in SCSI returning success to the initial I/O request even though no DATA was actually sent. Particularly, if you run an I/O tool with data verification on, the read back for verification will fail. This is fixed here by checking, when FC_DATA_UNDRUN happens, that SCSI status was received, i.e. that FC_SRB_RCV_STATUS is set in fsp->state. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | fcoe: remove redundant checking of netdev->netdev_ops | Yi Zou
commit b04d023cf5b7f4113cc4a09405c2fe8003bfe37d upstream. Remove the redundant checking of netdev->netdev_ops as it will never be NULL. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | libfc: fix ddp in fc_fcp for 0 xid | Yi Zou
commit 5e472d077f45de4f37365171bd742f18b3ef20de upstream. xid 0 was used as an indication of invalid xid before but now xid 0 can be used as a valid exchange id. This patch fixes the ddp completion in fcp layer, i.e., in fc_fcp.c:fc_fcp_ddp_done() function, to make sure it does not use xid 0 for indication of an invalid xid, instead, it now uses FC_XID_UNKNOWN for such indication. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
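A minimal sketch of why a dedicated sentinel is needed once 0 becomes a legal exchange id. The macro `XID_UNKNOWN` and its value 0xffff are assumptions made for illustration (the message names FC_XID_UNKNOWN but does not give its value), and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel value for illustration; check the libfc headers. */
#define XID_UNKNOWN 0xffff

/* Old convention: 0 means "no exchange", so a real xid 0 is rejected. */
static bool xid_valid_old(uint16_t xid)
{
    return xid != 0;
}

/* New convention: only the dedicated sentinel means "no exchange". */
static bool xid_valid_new(uint16_t xid)
{
    return xid != XID_UNKNOWN;
}
```

Under the old test an exchange that legitimately received id 0 would be skipped at completion time; the sentinel test accepts it.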
2010-01-28 | libfc: fix typo in retry check on received PRLI | Joe Eykholt
commit 85b5893ca97c69e409ecbb5ee90a5d99882369c4 upstream. A received Fibre Channel ELS PRLI request contains a bit that indicates whether the remote port supports certain retry processing sequences. The test for this bit was somehow coded to use multiply instead of AND! This case would apply only for target mode operation, and it is unlikely to be noticed as an initiator. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
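The effect of writing `*` where `&` was meant can be seen in a small sketch. `RETRY_BIT` and its bit position are hypothetical; the actual PRLI flag name and value are not given in the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define RETRY_BIT 0x0100u   /* hypothetical bit position */

/* Buggy test: multiplication is non-zero for almost any non-zero flags. */
static bool retry_buggy(uint32_t flags)
{
    return (flags * RETRY_BIT) != 0;
}

/* Intended test: true only when the specific bit is set. */
static bool retry_fixed(uint32_t flags)
{
    return (flags & RETRY_BIT) != 0;
}
```

For `flags = 0x0001` the bit is clear, yet the multiply form still reports it as set, which is why the error could go unnoticed whenever other flag bits happened to be present.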
2010-01-28 | lpfc: fix hang on SGI ia64 platform | Michael Reed
commit 8e68597d087977d3e4fd3e735d290ab45fd0b5ea upstream. In testing 2.6.31 on one of our ia64 platforms I've encountered a hang due to the driver using hardware ATEs which are a limited resource. This is because the driver does not set the dma consistent mask to 64 bits. Signed-off-by: Michael Reed <mdr@sgi.com> Acked-by: James Smart <James.Smart@Emulex.Com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | scsi_transport_fc: remove invalid BUG_ON | Michael Reed
commit 8798a694da59486e4a3ff0abeec183202fb34c20 upstream. I was doing some large lun count testing with 2.6.31 and hit a BUG_ON() in fc_timeout_deleted_rport(), and it seems like it should have been just a matter of time before someone did. It seems invalid to set port_state under lock, then expect it to remain set after releasing the lock. Another thread called fc_remote_port_add() when the lock was released, changing the port_state. This patch removes the BUG_ON and moves the test of the port_state to inside the host_lock. It's been running for several weeks now with no ill effect. Signed-off-by: Michael Reed <mdr@sgi.com> Acked-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | scsi_dh: create sysfs file, dh_state for all SCSI disk devices | Chandra Seetharaman
commit 5917290ce9b376866b165d02a5ed88d5ecdb32d0 upstream. Create the sysfs file, dh_state even if the new SCSI device is not in any of the device handler's internal lists. Signed-Off-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | scsi_devinfo: update Hitachi entries (v2) | Takahiro Yasui
commit 627511e3e67553b04f6917c03e39b797df210e04 upstream. Four models, OPEN-/DF400/DF500/DISK-SUBSYSTEM, can handle REPORT_LUN, and the BLIST_REPORTLUN2 flag needs to be set. And DF600 doesn't require any flags because it returns ANSI 03h (SPC). Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 | iscsi class: modify handling of replacement timeout | Mike Christie
commit fdd46dcbe4468a1f47a2cc9be442d11c3d21dd68 upstream. This patch modifies the replacement/recovery_timeout so it works more like the fc fast io fail tmo. If userspace tries to set the replacement/recovery_timeout to less than zero, we will turn off the forced recovery cleanup. If userspace sets the value to 0 then we will force the recovery cleanup immediately. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
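The three cases for the timeout value can be written down as a small classifier. This is a sketch only: the enum and `classify_timeout()` are hypothetical names, and the positive case ("clean up after the timeout elapses") is an assumption, since the message spells out only the negative and zero cases.

```c
enum recovery_action {
    RECOVERY_DISABLED,    /* negative: forced recovery cleanup is turned off */
    RECOVERY_IMMEDIATE,   /* zero: force the recovery cleanup right away */
    RECOVERY_DELAYED      /* positive: assumed to wait for the timeout */
};

/* Maps a user-supplied timeout in seconds to the behavior described above. */
static enum recovery_action classify_timeout(int timeout_secs)
{
    if (timeout_secs < 0)
        return RECOVERY_DISABLED;
    if (timeout_secs == 0)
        return RECOVERY_IMMEDIATE;
    return RECOVERY_DELAYED;
}
```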
2010-01-28 | mpt2sas: New device SAS2208 support is added | Kashyap, Desai
commit db27136a89d061bf9dceb28953a61a8ef862ca7f upstream. Added device ids range for { 0x80 - 87 } , modified mpi/mpi2_cnfg.h containing MPI2_MFGPAGE_DEVID_SAS2208_X. Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: Eric Moore <Eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 | megaraid_sas: remove sysfs poll_mode_io world writeable permissions | Bryn M. Reeves
commit bb7d3f24c71e528989501617651b669fbed798cb upstream. /sys/bus/pci/drivers/megaraid_sas/poll_mode_io defaults to being world-writable, which seems bad (letting any user affect kernel driver behavior). This turns off group and user write permissions, so that on typical production systems only root can write to it. Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 | SCSI: fc class: fix fc_transport_init error handling | Mike Christie
commit 48de68a40aef032a2e198437f4781a83bfb938db upstream. If transport_class_register fails we should unregister any registered classes, or we will leak memory or other resources. I did a quick modprobe of scsi_transport_fc to test the patch. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
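The unwind-on-failure pattern this fix relies on can be sketched in user-space C. All names here (`class_register()`, `class_unregister()`, `transport_init()`, the `registered` array) are hypothetical, and the number of classes is arbitrary; the sketch only shows registrations being undone in reverse order when a later one fails.

```c
static int registered[3];   /* 1 while the corresponding class is registered */

/* Hypothetical registration that fails for one chosen id. */
static int class_register(int id, int fail_id)
{
    if (id == fail_id)
        return -1;
    registered[id] = 1;
    return 0;
}

static void class_unregister(int id)
{
    registered[id] = 0;
}

/* Registers three classes; on failure, undoes the ones already registered. */
static int transport_init(int fail_id)
{
    int err;

    err = class_register(0, fail_id);
    if (err)
        return err;
    err = class_register(1, fail_id);
    if (err)
        goto unreg_0;
    err = class_register(2, fail_id);
    if (err)
        goto unreg_1;
    return 0;

unreg_1:
    class_unregister(1);
unreg_0:
    class_unregister(0);
    return err;
}
```

The labels fall through from the most recent registration to the oldest, so each failure point releases exactly what was set up before it.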
2010-01-06 | SCSI: st: fix mdata->page_order handling | FUJITA Tomonori
commit c982c368bb90adbd312faa05d0cfd842e9ab45a7 upstream. dio transfer always resets mdata->page_order to zero. It breaks high-order pages previously allocated for non-dio transfer. This patch adds reserved_page_order to st_buffer structure to save page order for non-dio transfer. http://bugzilla.kernel.org/show_bug.cgi?id=14563 When enlarge_buffer() allocates 524288 from 0, st uses six-order page allocation. So mdata->page_order is 6 and frp_seg is 2. After that, if st uses dio, sgl_map_user_pages() sets mdata->page_order to 0 for st_do_scsi(). After that, when we call normalize_buffer(), it frees only frp_seg * PAGE_SIZE (2 * 4096) though we should free frp_seg * PAGE_SIZE << 6 (2 * 4096 << 6). So we see buffer_size is set to 516096 (524288 - 8192). Reported-by: Joachim Breuer <linux-kernel@jmbreuer.net> Tested-by: Joachim Breuer <linux-kernel@jmbreuer.net> Acked-by: Kai Makisara <kai.makisara@kolumbus.fi> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
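The arithmetic in this message can be checked with a short sketch. It assumes 4 KiB pages, as the commit's numbers do; `PAGE_SZ`, `bytes_freed()` and `remaining()` are hypothetical helpers, not functions from the st driver.

```c
#define PAGE_SZ 4096UL   /* assumes 4 KiB pages, as in the commit's numbers */

/* Bytes released when freeing 'segs' segments of the given page order. */
static unsigned long bytes_freed(unsigned long segs, unsigned int order)
{
    return segs * (PAGE_SZ << order);
}

/* Buffer size still accounted for after the free. */
static unsigned long remaining(unsigned long buffer_size,
                               unsigned long segs, unsigned int order)
{
    return buffer_size - bytes_freed(segs, order);
}
```

With two segments, order 0 releases 2 * 4096 = 8192 bytes and leaves 524288 - 8192 = 516096, the stale value reported in the message; order 6 releases 2 * 4096 << 6 = 524288 bytes and leaves 0.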
2010-01-06 | SCSI: qla2xxx: dpc thread can execute before scsi host has been added | Michael Reed
commit 1486400f7edd009d49550da968d5744e246dc7f8 upstream. Fix crash in qla2x00_fdmi_register() due to the dpc thread executing before the scsi host has been fully added.

Unable to handle kernel NULL pointer dereference (address 00000000000001d0)
qla2xxx_7_dpc[4140]: Oops 8813272891392 [1]
Call Trace:
[<a000000100016910>] show_stack+0x50/0xa0 sp=e00000b07c59f930 bsp=e00000b07c591400
[<a000000100017180>] show_regs+0x820/0x860 sp=e00000b07c59fb00 bsp=e00000b07c5913a0
[<a00000010003bd60>] die+0x1a0/0x2e0 sp=e00000b07c59fb00 bsp=e00000b07c591360
[<a0000001000681a0>] ia64_do_page_fault+0x8c0/0x9e0 sp=e00000b07c59fb00 bsp=e00000b07c591310
[<a00000010000c8e0>] ia64_native_leave_kernel+0x0/0x270 sp=e00000b07c59fb90 bsp=e00000b07c591310
[<a000000207197350>] qla2x00_fdmi_register+0x850/0xbe0 [qla2xxx] sp=e00000b07c59fd60 bsp=e00000b07c591290
[<a000000207171570>] qla2x00_configure_loop+0x1930/0x34c0 [qla2xxx] sp=e00000b07c59fd60 bsp=e00000b07c591128
[<a0000002071732b0>] qla2x00_loop_resync+0x1b0/0x2e0 [qla2xxx] sp=e00000b07c59fdf0 bsp=e00000b07c5910c0
[<a000000207166d40>] qla2x00_do_dpc+0x9a0/0xce0 [qla2xxx] sp=e00000b07c59fdf0 bsp=e00000b07c590fa0
[<a0000001000d5bb0>] kthread+0x110/0x140 sp=e00000b07c59fe00 bsp=e00000b07c590f68
[<a000000100014a30>] kernel_thread_helper+0xd0/0x100 sp=e00000b07c59fe30 bsp=e00000b07c590f40
[<a00000010000a4c0>] start_kernel_thread+0x20/0x40 sp=e00000b07c59fe30 bsp=e00000b07c590f40

crash> dis a000000207197350
0xa000000207197350 <qla2x00_fdmi_register+2128>: [MMI] ld1 r45=[r14];;
crash> scsi_qla_host.host 0xe00000b058c73ff8
host = 0xe00000b058c73be0,
crash> Scsi_Host.shost_data 0xe00000b058c73be0
shost_data = 0x0, <<<<<<<<<<<

The fc_transport fc_* workqueue threads have yet to be created.
crash> ps | grep _7
3891 2 2 e00000b075c80000 IN 0.0 0 0 [scsi_eh_7]
4140 2 3 e00000b07c590000 RU 0.0 0 0 [qla2xxx_7_dpc]

The thread adding the Scsi_Host is blocked due to other activity in sysfs.
crash> bt 3762
PID: 3762 TASK: e00000b071e70000 CPU: 3 COMMAND: "modprobe"
#0 [BSP:e00000b071e71548] schedule at a000000100727e00
#1 [BSP:e00000b071e714c8] __mutex_lock_slowpath at a0000001007295a0
#2 [BSP:e00000b071e714a8] mutex_lock at a000000100729830
#3 [BSP:e00000b071e71478] sysfs_addrm_start at a0000001002584f0
#4 [BSP:e00000b071e71440] create_dir at a000000100259350
#5 [BSP:e00000b071e71410] sysfs_create_subdir at a000000100259510
#6 [BSP:e00000b071e713b0] internal_create_group at a00000010025c880
#7 [BSP:e00000b071e71388] sysfs_create_group at a00000010025cc50
#8 [BSP:e00000b071e71368] dpm_sysfs_add at a000000100425050
#9 [BSP:e00000b071e71310] device_add at a000000100417d90
#10 [BSP:e00000b071e712d8] scsi_add_host at a00000010045a380
#11 [BSP:e00000b071e71268] qla2x00_probe_one at a0000002071be950
#12 [BSP:e00000b071e71248] local_pci_probe at a00000010032e490
#13 [BSP:e00000b071e71218] pci_device_probe at a00000010032ecd0
#14 [BSP:e00000b071e711d8] driver_probe_device at a00000010041d480
#15 [BSP:e00000b071e711a8] __driver_attach at a00000010041d6e0
#16 [BSP:e00000b071e71170] bus_for_each_dev at a00000010041c240
#17 [BSP:e00000b071e71150] driver_attach at a00000010041d0a0
#18 [BSP:e00000b071e71108] bus_add_driver at a00000010041b080
#19 [BSP:e00000b071e710c0] driver_register at a00000010041dea0
#20 [BSP:e00000b071e71088] __pci_register_driver at a00000010032f610
#21 [BSP:e00000b071e71058] (unknown) at a000000207200270
#22 [BSP:e00000b071e71018] do_one_initcall at a00000010000a9c0
#23 [BSP:e00000b071e70f98] sys_init_module at a0000001000fef00
#24 [BSP:e00000b071e70f98] ia64_ret_from_syscall at a00000010000c740

So, it appears that qla2xxx dpc thread is moving forward before the scsi host has been completely added. This patch moves the setting of the init_done (and online) flag to after the call to scsi_add_host() to hold off the dpc thread. Found via large lun count testing using 2.6.31.
Signed-off-by: Michael Reed <mdr@sgi.com> Acked-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 | SCSI: ipr: fix EEH recovery | Kleber Sacilotto de Souza
commit 99c965dd9ee1a004efc083c3d760ba982bb76adf upstream. After commits c82f63e411f1b58427c103bd95af2863b1c96dd1 (PCI: check saved state before restore) and 4b77b0a2ba27d64f58f16d8d4d48d8319dda36ff (PCI: Clear saved_state after the state has been restored) PCI drivers are prevented from restoring the device standard configuration registers twice in a row. These changes introduced a regression on ipr EEH recovery. The ipr device driver saves the PCI state only during the device probe and restores it on ipr_reset_restore_cfg_space() during IOA resets. This behavior is causing the EEH recovery to fail after the second error detected, since the registers are not being restored. One possible solution would be saving the registers after restoring them. The problem with this approach is that while recovering from an EEH error if pci_save_state() results in an EEH error, the adapter/slot will be reset, and end up back in ipr_reset_restore_cfg_space(), but it won't have a valid saved state to restore, so pci_restore_state() will fail. The following patch introduces a workaround for this problem, hacking around the PCI API by setting pdev->state_saved = true before we do the restore. It fixes the EEH regression and prevents that we hit another EEH error during EEH recovery. [jejb: fix is a hack ... Jesse and Rafael will fix properly] Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 | SCSI: megaraid_sas: fix 64 bit sense pointer truncation | Yang, Bo
commit 7b2519afa1abd1b9f63aa1e90879307842422dae upstream. The current sense pointer is cast to a u32 pointer, which can truncate on 64 bits. Fix by using unsigned long instead. Signed-off-by: Bo Yang <bo.yang@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
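The truncation can be illustrated with fixed-width integers. This is a sketch, not the megaraid_sas code: `uint64_t` stands in for `unsigned long` on a 64-bit (LP64) kernel, and both helper functions are hypothetical.

```c
#include <stdint.h>

/* Stores an address in a 32-bit slot: the upper 32 bits are lost. */
static uint64_t store_in_u32(uint64_t addr)
{
    uint32_t slot = (uint32_t)addr;
    return slot;
}

/* Stores it in a 64-bit slot, the width of unsigned long on LP64 Linux. */
static uint64_t store_in_u64(uint64_t addr)
{
    uint64_t slot = addr;
    return slot;
}
```

An address such as `0xffff880012345678` comes back as `0x12345678` from the 32-bit slot, so any later use of it as a pointer would refer to the wrong memory.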
2009-12-14 | SCSI: scsi_lib_dma: fix bug with dma maps on nested scsi objects | James Bottomley
commit d139b9bd0e52dda14fd13412e7096e68b56d0076 upstream. Some of our virtual SCSI hosts don't have a proper bus parent at the top, which can be a problem for doing DMA on them. This patch makes the host device cache a pointer to the physical bus device and provides an extra API for setting it (the normal API picks it up from the parent). This patch also modifies the qla2xxx and lpfc vport logic to use the new DMA host setting API. Acked-By: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-26 | [SCSI] fix crash when disconnecting usb storage | Alexey Kuznetsov
__scsi_remove_device() in scsi_forget_host() is executed out of scan_mutex and races with scsi_destroy_sdev() <- scsi_sysfs_add_devices() <- scsi_finish_async_scan(). The result is use after free and/or double free, oops. The fix is simple, move scsi_forget_host() under scan_mutex. scsi_forget_host() is just a sequence of __scsi_remove_device() calls. All other calls of __scsi_remove_device() are made under scan_mutex, so it is safe. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-26 | [SCSI] fix async scan add/remove race resulting in an oops | James Bottomley
Async scanning introduced a very wide window where the SCSI device is up and running but has not yet been added to sysfs. We delay the adding until all scans have completed to retain the same ordering as sync scanning. This delay in visibility causes an oops if a device is removed before we make it visible because the SCSI removal routines have an inbuilt assumption that if a device is in SDEV_RUNNING state, it must be visible (which is not necessarily true in the async scanning case). Fix this by introducing an additional is_visible flag which we can use to condition the tear down so we do the right thing for running but not yet made visible. Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-26 | [SCSI] sd: Return correct error code for DIF | Martin K. Petersen
sd_dif.c was not updated to return -EILSEQ, leading to error handling failures in applications which provide their own integrity metadata (as opposed to being protected by the block layer functions). Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-11 | [SCSI] bfa: declare MODULE_FIRMWARE | Ben Hutchings
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Jing Huang <huangj@brocade.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-11 | [SCSI] gdth: Prevent negative offsets in ioctl CVE-2009-3080 | Dave Jones
A negative offset could be used to index before the event buffer and lead to a security breach. Signed-off-by: Dave Jones <davej@redhat.com> Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
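The class of check involved can be sketched as follows. This is a generic illustration, not the gdth ioctl code: `MAX_EVENTS`, the `events` array and `event_slot()` are hypothetical, and the sketch only shows that a signed index needs a lower bound as well as an upper one.

```c
#include <stddef.h>

#define MAX_EVENTS 100   /* hypothetical buffer size */

static int events[MAX_EVENTS];

/* Returns a pointer to the slot, or NULL when the index is out of range. */
static int *event_slot(int index)
{
    /* Checking only 'index >= MAX_EVENTS' would let negative values through. */
    if (index < 0 || index >= MAX_EVENTS)
        return NULL;
    return &events[index];
}
```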
2009-11-06 | [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev() | jack wang
We should not set res to 0 in sas_ex_discover_dev(), so that discovery can continue when a wide port is hot-plugged in. Signed-off-by: Tom Peng <tom_peng@usish.com> Signed-off-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-06 | [SCSI] pmcraid: Fix ppc64 driver build for using cpu_to_le32 on U8 data type | Anil Ravindranath
Fix a reported ppc64 driver build issue. Removed cpu_to_le32 conversion usage for flags in struct pmcraid_ioadl_desc. This was breaking the driver build in ppc64. drivers/scsi/pmcraid.c: In function 'pmcraid_request_sense': drivers/scsi/pmcraid.c:2254: warning: large integer implicitly truncated to unsigned type Signed-off-by: Anil Ravindranath <anil_ravindranath@pmc-sierra.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-06 | [SCSI] ipr: add workaround for MSI interrupts on P7 | Wayne Boyer
This patch adds some additional logic to the interrupt service routine to fix a potential problem where an MSI interrupt does not get cleared the first time. Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-06[SCSI] scsi_transport_fc: Fix WARN message for FC passthru failure pathsBrian King
There are three error paths in the FC passthru code where job->reply->reply_payload_rcv_len does not get initialized, resulting in the WARN_ON in fc_bsg_jobdone going off. This patch fixes this.

An example of one of the WARN_ON messages seen:

Badness at drivers/scsi/scsi_transport_fc.c:3424
NIP: d000000000bf21ac LR: d000000000bf2684 CTR: c0000000003f753c
REGS: c00000004eb03430 TRAP: 0700 Not tainted (2.6.32-rc4-git)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 24008444 XER: 00000012
TASK = c00000004c3fc9c0[3243] 'fcping' THREAD: c00000004eb00000 CPU: 0
GPR00: 0000000000000001 c00000004eb036b0 d000000000c01da0 000000004bf17fc0
GPR04: c00000004cd256a0 c00000007e011ce0 c00000007e011d00 c00000004e718000
GPR08: c00000004cd256a0 c00000004eb03ad0 c00000004cd25a90 0000000000000020
GPR12: d000000000bf7848 c000000000b62600 0000000000000060 fffffffffffffff4
GPR16: ffffffffffffffd6 c00000004c7a3060 ffffffff80000003 c00000004b0f0310
GPR20: c00000004e71b180 c00000004c7a3060 0000000000000004 0000000000000000
GPR24: c00000004e71b000 c00000004c7a3000 c00000004b0f0000 c00000004e718000
GPR28: c00000004cd256a0 c00000004cd25a90 d000000000c01db0 c00000004e01d680
NIP [d000000000bf21ac] .fc_bsg_jobdone+0x64/0x9c [scsi_transport_fc]
LR [d000000000bf2684] .fc_bsg_request_handler+0x4a0/0x564 [scsi_transport_fc]
Call Trace:
[c00000004eb036b0] [c0000000003f755c] .get_device+0x20/0x38 (unreliable)
[c00000004eb03720] [d000000000bf2684] .fc_bsg_request_handler+0x4a0/0x564 [scsi_transport_fc]
[c00000004eb03820] [c0000000002c9b5c] .__generic_unplug_device+0x58/0x70
[c00000004eb038a0] [c0000000002ce9fc] .blk_execute_rq_nowait+0x70/0xf4
[c00000004eb03930] [c0000000002ceb2c] .blk_execute_rq+0xac/0x100
[c00000004eb03a60] [c0000000002d51b4] .bsg_ioctl+0x1fc/0x264
[c00000004eb03c10] [c00000000018a89c] .vfs_ioctl+0x54/0xec
[c00000004eb03ca0] [c00000000018b01c] .do_vfs_ioctl+0x640/0x6a8
[c00000004eb03d80] [c00000000018b0fc] .SyS_ioctl+0x78/0xbc
[c00000004eb03e30] [c0000000000085b4] syscall_exit+0x0/0x40
Instruction dump:
8003004c 2fa80000 90090104 38000000 900a0108 419e0038 e9230040 81680108
80690004 7f835840 7c101026 5400f7fe <0b000000> 7d605b78 7f8b1840 409d0008

Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-By: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-03[SCSI] bfa: fix test in bfad_os_fc_host_init()Roel Kluin
BFA_PORT_ROLE_FCP_IPFC is 0x04, so the test always evaluates to true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Jing Huang <huangj@Brocade.COM> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
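The message does not quote the offending expression. One common way for a flag test against a nonzero constant to be always true is a bitwise OR where a bitwise AND was intended; the sketch below illustrates that pattern under this assumption. The macro and function names are hypothetical; only the value 0x04 comes from the message.

```c
#include <stdbool.h>

#define SKETCH_ROLE_IPFC 0x04u  /* same value as quoted in the message */

/* Buggy shape: OR with a nonzero constant is nonzero for every input. */
static bool sketch_has_ipfc_buggy(unsigned int roles)
{
    return (roles | SKETCH_ROLE_IPFC) != 0;
}

/* Intended shape: AND isolates the bit being tested. */
static bool sketch_has_ipfc(unsigned int roles)
{
    return (roles & SKETCH_ROLE_IPFC) != 0;
}
```

The buggy variant reports the role as present even when roles is 0, whereas the AND variant is true only when bit 0x04 is actually set.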
2009-10-31dpt_i2o: Fix typo of EINVALOGAWA Hirofumi
Commit ef7562b7f28319e6dd1f85dc1af87df2a7a84832 ("dpt_i2o: Fix up copy*user") had a silly typo: EINVAL should be -EINVAL. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: stable@kernel.org Cc: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
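A short sketch of why the sign matters, assuming the usual kernel convention of returning 0 on success and a negative errno on failure. The function names are hypothetical and do not come from dpt_i2o.

```c
#include <errno.h>

/* Hypothetical validator: 0 on success, negative errno on failure. */
static int sketch_check_len(long len)
{
    if (len <= 0)
        return -EINVAL;   /* a plain EINVAL would be a positive value */
    return 0;
}

/* Typical caller: treats only negative values as errors. */
static int sketch_caller(long len)
{
    int rc = sketch_check_len(len);

    if (rc < 0)
        return rc;        /* propagate the error */
    return 1;             /* carried on as if validation succeeded */
}
```

A caller written like sketch_caller() would not notice a positive EINVAL, since it checks only for rc < 0, so the failure would be silently treated as success.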
2009-10-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] zfcp: Flush SCSI registration work when adding unit
[SCSI] zfcp: Fix timer initialization for ct and els requests
[SCSI] zfcp: Warn about storage devices with broken PLOGI data
[SCSI] zfcp: Handle WWPN mismatch in PLOGI payload
[SCSI] zfcp: fix kfree handling in zfcp_init_device_setup
[SCSI] fix memory leak in initialization
2009-10-29dpt_i2o: Fix up copy*userAlan Cox
Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-13[SCSI] fix memory leak in initializationJames Bottomley
The root cause of the problem is the fact that dev_set_name() now allocates storage instead of using the original array within the kobj. That means that the SCSI assumption that if you haven't made the containing object or any sub objects visible, you can just destroy it (and its component devices) lock, stock and barrel becomes false. Fix this by doing the get of sdev_dev at parent time and thus do an extra put of it in scsi_destroy_sdev() (and all other destruction without add paths). Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (34 commits)
[SCSI] qla2xxx: Fix NULL ptr deref bug in fail path during queue create
[SCSI] st: fix possible memory use after free after MTSETBLK ioctl
[SCSI] be2iscsi: Moving to pci_pools v3
[SCSI] libiscsi: iscsi_session_setup to allow for private space
[SCSI] be2iscsi: add 10Gbps iSCSI - BladeEngine 2 driver
[SCSI] zfcp: Fix hang when offlining device with offline chpid
[SCSI] zfcp: Fix lockdep warning when offlining device with offline chpid
[SCSI] zfcp: Fix oops during shutdown of offline device
[SCSI] zfcp: Fix initial device and cfdc for delayed adapter allocation
[SCSI] zfcp: correctly initialize unchained requests
[SCSI] mpt2sas: Bump version 02.100.03.00
[SCSI] mpt2sas: Support dev remove when phy status is MPI2_EVENT_SAS_TOPO_PHYSTATUS_VACANT
[SCSI] mpt2sas: Timeout occurred within the HANDSHAKE logic while waiting on firmware to ACK.
[SCSI] mpt2sas: Call init_completion on a per request basis.
[SCSI] mpt2sas: Target Reset will be issued from Interrupt context.
[SCSI] mpt2sas: Added SCSIIO, Internal and high priority memory pools to support multiple TM
[SCSI] mpt2sas: Copyright change to 2009.
[SCSI] mpt2sas: Added mpi2_history.txt for MPI2 headers.
[SCSI] mpt2sas: Update driver to MPI2 REV K headers.
[SCSI] bfa: Brocade BFA FC SCSI driver
...
2009-10-02[SCSI] qla2xxx: Fix NULL ptr deref bug in fail path during queue createAnirban Chakraborty
The current code attempts to clean up resources when queue creation fails, and in doing so it invokes the queue free call with a (NULL) pointer to the queue that could not be allocated in the first place. Fix it by returning directly without invoking the queue free call, as no resources have been allocated at that point. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
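A user-space sketch of the failure-path pattern described above, using calloc()/free() in place of the driver's allocators. The struct and function names are hypothetical; the point is only that the free routine dereferences its argument, so the first allocation failure must return directly.

```c
#include <stdlib.h>

struct sketch_queue {
    int *ring;
    int length;
};

/* Dereferences q, so it must never be called with a NULL queue. */
static void sketch_free_queue(struct sketch_queue *q)
{
    free(q->ring);
    free(q);
}

static struct sketch_queue *sketch_create_queue(int length)
{
    struct sketch_queue *q;

    if (length <= 0)
        return NULL;

    q = calloc(1, sizeof(*q));
    if (q == NULL)
        return NULL;            /* nothing allocated yet: return directly */

    q->ring = calloc((size_t)length, sizeof(*q->ring));
    if (q->ring == NULL) {
        sketch_free_queue(q);   /* q is valid here, so cleanup is safe */
        return NULL;
    }

    q->length = length;
    return q;
}
```

Jumping to a shared cleanup label after the first failed allocation would pass NULL to sketch_free_queue() and crash on q->ring, which is the shape of the bug the message describes.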
2009-10-02[SCSI] st: fix possible memory use after free after MTSETBLK ioctlDavid Jeffery
A memory use after free bug can manifest if the MTSETBLK or SET_DENS_AND_BLK ioctl features are used to set the tape's blocksize from 0 to non-zero. After the driver sets the new block size, in this one case it calls normalize_buffer() to free the device's internal data buffers. However, the ioctl code assumes there is always a buffer and does not check or allocate a buffer if there isn't one. So any following ioctl calls can corrupt a part of memory by writing data to memory that the st driver had freed. This patch removes the normalize_buffer() call and the special handling of a change from a 0 to a non-zero blocksize, fixing the possible use of memory after it has been freed by the st driver. Signed-off-by: David Jeffery <djeffery@redhat.com> Acked-by: Kai Makisara <kai.makisara@kolumbus.fi> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-02[SCSI] be2iscsi: Moving to pci_pools v3Jayamohan Kallickal
This patch contains changes to use pci_pools for the iscsi hdr instead of pci_alloc_consistent. The hdr is now allocated from, and freed back to, the pool for every IO. v3: - Remove cleanup loop in beiscsi_session_destroy - Fixup for allocation failure handling in beiscsi_alloc_pdu - Removed unused variable in beiscsi_session_destroy. [jejb: fix up pci_pool_alloc address sizing problem] Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>