Re: MoL current rsync tree on BenH 2.4.6 dies after a while of running


Subject: Re: MoL current rsync tree on BenH 2.4.6 dies after a while of running
From: Benjamin Herrenschmidt (benh@kernel.crashing.org)
Date: Fri Jul 06 2001 - 10:04:20 MDT


>Don't know how long, since it apparently croaked in the night, but I just
>ssh'd into the G3 at work, and checking the MoL logfile revealed this:
>
>RVEC Internal Error 1700
>Assertion failed in mainloop_start(), (mainloop.c, line 133)
>
>I checked emulation/mainloop.c, and that asserts on the __stop variable.
>Apparently the emulation layer died some horrible death, but I can't
>figure out from grep'ing the source exactly where RVEC_INTERNAL_ERROR is
>being returned to cause this to happen. I grep'd for '1700', and it looks
>like it _might_ have something to do with the "Thermal Management
>Interrupt". I'm not sure what.
>
>Samuel, if you know of any good places to stick debugging printf's or
>anything else of the sort, to help track down what's happening, please let
>me know.

I have this one too, and it disappears if I disable the thermal
management of the kernel.

We are not yet 100% sure what the cause of the problem is; it may well
be a CPU bug. In that case, we must not use the TAU interrupt at all in
the kernel, as it would lock us up regularly (MOL being able to recover
from the error is already quite wonderful ;)

So for now, leave thermal management off (and tell us if it ever
happens again; it doesn't for me).

Ben.


