tape drive flakiness
Michael George
yellowdog-general@lists.terrasoftsolutions.com
Tue Oct 15 05:29:00 2002
I have an XServe running YDL2.3 and I'm having some trouble with an
Ecrix tape drive attached to it.
This drive worked fine when it was connected to an Intel RedHat box,
with only occasional weirdness (once every 2-3 months, and a retry
would get things right). However, on the XServe the drive has failed
during backups three times in the past six days.
This time is the worst, as I cannot even access the device anymore.
When trying to do "mt -f /dev/st0 status" I get:
/dev/st0: No such device or address
But I know the device is there, because when the machine booted, I got:
Oct 8 09:01:01 stout kernel: SCSI subsystem driver Revision: 1.00
Oct 8 09:01:01 stout kernel: sym.18.2.0: setting PCI_COMMAND_PARITY...
Oct 8 09:01:01 stout kernel: sym.18.2.1: setting PCI_COMMAND_PARITY...
Oct 8 09:01:01 stout kernel: sym0: <896> rev 0x5 on pci bus 18 device
2 function 0 irq 52
Oct 8 09:01:01 stout kernel: sym0: No NVRAM, ID 7, Fast-40, LVD,
parity checking
Oct 8 09:01:01 stout kernel: sym0: SCSI BUS has been reset.
Oct 8 09:01:01 stout kernel: sym1: <896> rev 0x5 on pci bus 18 device
2 function 1 irq 52
Oct 8 09:01:01 stout kernel: sym1: No NVRAM, ID 7, Fast-40, LVD,
parity checking
Oct 8 09:01:01 stout kernel: sym1: SCSI BUS has been reset.
Oct 8 09:01:01 stout kernel: scsi0 : sym-2.1.17a
Oct 8 09:01:01 stout kernel: scsi1 : sym-2.1.17a
Oct 8 09:01:01 stout kernel: blk: queue dff40c28, I/O limit 4095Mb
(mask 0xffffffff)
Oct 8 09:01:01 stout kernel: Vendor: ECRIX Model: VXA-1
Rev: 2A6A
Oct 8 09:01:01 stout kernel: Type: Sequential-Access
ANSI SCSI revision: 02
Oct 8 09:01:01 stout kernel: blk: queue dff40228, I/O limit 4095Mb
(mask 0xffffffff)
Oct 8 09:01:01 stout kernel: mesh: configured for synchronous 5 MB/s
Oct 8 09:01:01 stout kernel: st: Version 20020805, bufsize 32768, wrt
30720, max init. bufs 4, s/g segs 16
Oct 8 09:01:01 stout kernel: Attached scsi tape st0 at scsi0, channel
0, id 4, lun 0
When the drive failed on 10/12/2002, I had this in /var/log/messages:
Oct 12 02:30:53 stout kernel: invalidate: busy buffer
Oct 12 02:31:08 stout last message repeated 87 times
Oct 12 02:31:44 stout kernel: st0: Error with sense data: Info
fld=0x187, Current st09:00: sense key Hardware Error
Oct 12 02:31:44 stout kernel: Additional sense indicates Mechanical
positioning error
Oct 12 02:36:20 stout kernel: st0: Error with sense data: Info
fld=0x168, Current st09:00: sense key Illegal Request
Oct 12 02:36:20 stout kernel: Additional sense indicates Write append
error
Oct 12 02:36:20 stout kernel: st0: Error on write filemark.
Yesterday I was able to eject the tape, and the backup seemed to run
okay, but this was in the log file:
Oct 14 09:25:28 stout kernel: st0: Error with sense data: Current
st09:00: sense key Not Ready
Oct 14 09:25:28 stout kernel: Additional sense indicates Medium not
present
Oct 14 09:25:55 stout kernel: invalidate: busy buffer
Oct 14 09:26:09 stout last message repeated 167 times
Oct 14 09:28:11 stout sshd(pam_unix)[13573]: session closed for user
george
Oct 14 09:28:22 stout kernel: sym0:4:0: ABORT operation started.
Oct 14 09:28:22 stout kernel: sym0:4:control msgout: 80 6.
Oct 14 09:28:22 stout kernel: sym0:4:0: ABORT operation complete.
Oct 14 09:28:22 stout kernel: sym0: unexpected disconnect
Oct 14 09:28:22 stout kernel: sym0:4:0: DEVICE RESET operation started.
Oct 14 09:28:22 stout kernel: sym0:4:0: DEVICE RESET operation failed.
Oct 14 09:28:22 stout kernel: sym0:4:0: BUS RESET operation started.
Oct 14 09:28:22 stout kernel: sym0:4:0: BUS RESET operation failed.
Oct 14 09:28:27 stout kernel: sym0:4:0: HOST RESET operation started.
Oct 14 09:28:27 stout kernel: sym0:4:0: HOST RESET operation failed.
Oct 14 09:28:37 stout kernel: scsi: device set offline - command error
recover failed: host 0 channel 0 id 4 lun 0
Today, /var/log/messages has only this at the time when the backup
should have run:
Oct 15 02:30:14 stout kernel: invalidate: busy buffer
Oct 15 02:30:15 stout last message repeated 37 times
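For anyone who wants to compare against their own logs, the relevant
lines can be pulled out of /var/log/messages and grouped by day. This is
only a minimal sketch in Python: the function name summarize and the
event labels are made up for illustration, and the patterns cover just
the kernel messages quoted above, assuming one log entry per line (not
wrapped as in this mail).

```python
import re
from collections import defaultdict

# Patterns for the kernel messages quoted in this mail; other lines are ignored.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = [
    ("sense_key", re.compile(r"sense key (.+)$")),
    ("additional_sense", re.compile(r"Additional sense indicates (.+)$")),
    ("reset_failed", re.compile(r"((?:DEVICE|BUS|HOST) RESET) operation failed")),
    ("offline", re.compile(r"device set offline")),
]


def summarize(lines):
    """Group matching syslog lines by date (e.g. "Oct 12"), in log order."""
    events = defaultdict(list)
    for line in lines:
        parts = line.split(None, 2)  # month, day, remainder of the entry
        if len(parts) < 3:
            continue
        day = f"{parts[0]} {parts[1]}"
        text = line.rstrip()
        for name, pattern in PATTERNS:
            match = pattern.search(text)
            if match:
                # Patterns without a capture group carry no detail text.
                detail = match.group(1) if match.groups() else ""
                events[day].append((name, detail))
                break
    return dict(events)

# Possible usage (path is an assumption about where syslog writes):
#   with open("/var/log/messages") as f:
#       for day, found in summarize(f).items():
#           print(day, found)
```

Days on which only "invalidate: busy buffer" was logged do not show up
in the result at all, which makes them easy to tell apart from the days
with sense errors or failed resets.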
It kind of looks like a hardware failure, but after rebooting, the
backup worked just fine...
We're running kernel 2.4.20-pre5 benh. I'm going to check for a newer
kernel right now...
Thanks!
-Michael