tape drive flakiness

Iain Stevenson yellowdog-general@lists.terrasoftsolutions.com
Tue Oct 15 06:42:01 2002


Someone on the list has reported success with an Ecrix, but my experience 
with a tape drive was woeful.  You haven't mentioned the machine you're 
using, but the driver for the built-in SCSI on older Macs is a bit basic and 
does not do a soft reset.  Hence the only way to be sure of a clean start 
is to power cycle the tape drive and then reboot the machine.  This is 
clearly not great for a production machine.
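
For what it's worth, here is a rough sketch of a check that could run before 
a backup.  It asks the drive for its status and, if that fails, tries the 2.4 
kernel's /proc/scsi/scsi interface to drop and re-add the device before 
giving up.  The variable and function names are my own invention, and the 
host/channel/id/lun values are only an assumption taken from the boot log 
quoted below.  I have not tried this with the sym or mesh drivers, and if the 
driver cannot reset the bus the re-attach step may well do nothing.

```shell
#!/bin/sh
# Sketch only: TAPE, MT, SCSI_PROC and the function names are illustrative.
TAPE="${TAPE:-/dev/st0}"                    # tape device from the report
MT="${MT:-mt}"                              # overridable for a dry run
SCSI_PROC="${SCSI_PROC:-/proc/scsi/scsi}"   # 2.4 kernel SCSI control file

# Succeeds only if the drive answers a status request.
tape_ready() {
    "$MT" -f "$TAPE" status >/dev/null 2>&1
}

# Ask the kernel to drop and re-add one device: host channel id lun.
# Needs root; assumes a 2.4 kernel that provides /proc/scsi/scsi.
reattach_tape() {
    echo "scsi remove-single-device $1 $2 $3 $4" > "$SCSI_PROC"
    echo "scsi add-single-device $1 $2 $3 $4" > "$SCSI_PROC"
}

check_tape() {
    if tape_ready; then
        echo "tape ready"
        return 0
    fi
    # Assumed address: host 0, channel 0, id 4, lun 0.
    reattach_tape 0 0 4 0
    if tape_ready; then
        echo "tape ready after re-attach"
        return 0
    fi
    echo "tape not responding; power cycle the drive and reboot" >&2
    return 1
}
```

A backup script could then start with "check_tape || exit 1", so that a dead 
drive is reported at once instead of failing partway through a write.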

The tape drive I used (Travan) was certified by the vendor as "linux 
compatible" - regrettably this often seems to mean "i386 linux compatible".

  Iain



--On Tuesday, October 15, 2002 7:28 am -0400 Michael George 
<george@auroravideosys.com> wrote:

>
> I have an XServe running YDL2.3 and I'm having some trouble with an Ecrix
> tape drive attached to it.
>
> This drive worked fine when it was connected to an Intel RedHat box with
> only the occasional weirdness (once every 2-3 months and then a retry
> would get things right).  However, we've had this deck fail doing backups
> 3x in the past 6 days.
>
> This time is the worst, as I cannot even access the device anymore.  When
> trying to do "mt -f /dev/st0 status" I get:
>
> /dev/st0: No such device or address
>
> But I know the device is there, because when the machine booted, I got:
>
> Oct  8 09:01:01 stout kernel: SCSI subsystem driver Revision: 1.00
> Oct  8 09:01:01 stout kernel: sym.18.2.0: setting PCI_COMMAND_PARITY...
> Oct  8 09:01:01 stout kernel: sym.18.2.1: setting PCI_COMMAND_PARITY...
> Oct  8 09:01:01 stout kernel: sym0: <896> rev 0x5 on pci bus 18 device 2 function 0 irq 52
> Oct  8 09:01:01 stout kernel: sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> Oct  8 09:01:01 stout kernel: sym0: SCSI BUS has been reset.
> Oct  8 09:01:01 stout kernel: sym1: <896> rev 0x5 on pci bus 18 device 2 function 1 irq 52
> Oct  8 09:01:01 stout kernel: sym1: No NVRAM, ID 7, Fast-40, LVD, parity checking
> Oct  8 09:01:01 stout kernel: sym1: SCSI BUS has been reset.
> Oct  8 09:01:01 stout kernel: scsi0 : sym-2.1.17a
> Oct  8 09:01:01 stout kernel: scsi1 : sym-2.1.17a
> Oct  8 09:01:01 stout kernel: blk: queue dff40c28, I/O limit 4095Mb (mask 0xffffffff)
> Oct  8 09:01:01 stout kernel:   Vendor: ECRIX     Model: VXA-1              Rev: 2A6A
> Oct  8 09:01:01 stout kernel:   Type:   Sequential-Access                   ANSI SCSI revision: 02
> Oct  8 09:01:01 stout kernel: blk: queue dff40228, I/O limit 4095Mb (mask 0xffffffff)
> Oct  8 09:01:01 stout kernel: mesh: configured for synchronous 5 MB/s
> Oct  8 09:01:01 stout kernel: st: Version 20020805, bufsize 32768, wrt 30720, max init. bufs 4, s/g segs 16
> Oct  8 09:01:01 stout kernel: Attached scsi tape st0 at scsi0, channel 0, id 4, lun 0
>
> When the drive failed on 10/12/2002, I had this in /var/log/messages:
>
> Oct 12 02:30:53 stout kernel: invalidate: busy buffer
> Oct 12 02:31:08 stout last message repeated 87 times
> Oct 12 02:31:44 stout kernel: st0: Error with sense data: Info fld=0x187, Current st09:00: sense key Hardware Error
> Oct 12 02:31:44 stout kernel: Additional sense indicates Mechanical positioning error
> Oct 12 02:36:20 stout kernel: st0: Error with sense data: Info fld=0x168, Current st09:00: sense key Illegal Request
> Oct 12 02:36:20 stout kernel: Additional sense indicates Write append error
> Oct 12 02:36:20 stout kernel: st0: Error on write filemark.
>
> Yesterday, I was able to eject the tape and do the backup seemingly okay,
> but this was in the logfile:
>
> Oct 14 09:25:28 stout kernel: st0: Error with sense data: Current st09:00: sense key Not Ready
> Oct 14 09:25:28 stout kernel: Additional sense indicates Medium not present
> Oct 14 09:25:55 stout kernel: invalidate: busy buffer
> Oct 14 09:26:09 stout last message repeated 167 times
> Oct 14 09:28:11 stout sshd(pam_unix)[13573]: session closed for user george
> Oct 14 09:28:22 stout kernel: sym0:4:0: ABORT operation started.
> Oct 14 09:28:22 stout kernel: sym0:4:control msgout: 80 6.
> Oct 14 09:28:22 stout kernel: sym0:4:0: ABORT operation complete.
> Oct 14 09:28:22 stout kernel: sym0: unexpected disconnect
> Oct 14 09:28:22 stout kernel: sym0:4:0: DEVICE RESET operation started.
> Oct 14 09:28:22 stout kernel: sym0:4:0: DEVICE RESET operation failed.
> Oct 14 09:28:22 stout kernel: sym0:4:0: BUS RESET operation started.
> Oct 14 09:28:22 stout kernel: sym0:4:0: BUS RESET operation failed.
> Oct 14 09:28:27 stout kernel: sym0:4:0: HOST RESET operation started.
> Oct 14 09:28:27 stout kernel: sym0:4:0: HOST RESET operation failed.
> Oct 14 09:28:37 stout kernel: scsi: device set offline - command error recover failed: host 0 channel 0 id 4 lun 0
>
> and today /var/log/messages has only this at the time when the backup
> should have been run:
>
> Oct 15 02:30:14 stout kernel: invalidate: busy buffer
> Oct 15 02:30:15 stout last message repeated 37 times
>
> It kinda looks like hardware failure, but after rebooting, the backup
> worked just fine...
>
> We're running kernel 2.4.20-pre5 benh.  I'm going to check for a newer
> kernel right now...
>
> Thanks!
>
> -Michael