Wednesday 23 February 2011

UKIRT remote operations - an update

It was around June last year that I first wrote about our switch to remote operations, or, more formally, "Minimalist Mode". I've written a few posts about the change since, but do you think I'll really list them all? Oh well, here are a few:

Remote possibility

Good and bad timing

Last of the breed

Something sort of official

Things have gone remarkably well. They've gone so well that I'm confident enough to give a talk next week to a hundred-plus astronomers and engineers, although I haven't quite figured out what they want to hear or what I want to say to them. It should be an interesting meeting nevertheless, but that isn't what I wanted to talk about. What I do want to point out is a remarkable statistic that has caught me by surprise. In fact, I'm astonished, and it is all down to our amazing engineering crew.

Like most professional observatories, UKIRT keeps track of its fault rate. It's defined as the fraction of clear time available for science observing that is lost to technical faults. Many observatories run at a fault rate of 5%, which is considered very good. UKIRT, for many years now, has run at around the 2% level. This is exceptional, especially given its remote location on top of a 14,000-foot mountain that suffers arctic-like conditions for much of the year.

With the switch to remote operations, I fully expected this fault rate to increase, and quite significantly. We did everything we could to make sure all our known failure modes could be fixed remotely, but there are some that simply can't be dealt with that way. To make things worse, in a La Niña year like this one, weather conditions at the summit can be so bad that 1) they cause serious faults through icing, high winds, water inundation, etc., all things sensitive electronic devices hate; and 2) the weather causes so much lost time that any small fault invariably loses more time as a percentage of clear time than it would during the summer. On top of that, 3) our reduced staff numbers would mean faults would take longer to fix.
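To make point 2 concrete, here is a minimal Python sketch, assuming the fault rate is simply hours lost to technical faults divided by clear hours available for science. The function name and the hour figures are hypothetical, chosen only to show that the same fault counts for more when bad weather eats into the clear time.

```python
def fault_rate(fault_hours: float, clear_hours: float) -> float:
    """Hypothetical helper: fraction of clear (observable) time lost to technical faults."""
    if clear_hours <= 0:
        raise ValueError("clear_hours must be positive")
    return fault_hours / clear_hours


# Hypothetical numbers: the same 2-hour fault in a month with plenty of
# clear time versus a stormy month with much less of it.
summer = fault_rate(2.0, 200.0)
winter = fault_rate(2.0, 80.0)
print(f"summer: {summer:.1%}, winter: {winter:.1%}")  # summer: 1.0%, winter: 2.5%
```

The fault itself is no worse in winter; there is just less clear time to divide it by.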

Despite all that, the fault rate continues at around the 2% level. If I leave out one serious fault that was a one-time failure and had nothing to do with remote operations, our already exceptionally small fault rate has been down at the 1% level since remote operations began. This is not what I expected, and I'm still somewhat befuddled by it, but I'm proud to announce it.

This is all due to an exceptional group of people who work at UKIRT and the JAC, especially our engineers and technicians, but it's also due to the community that uses UKIRT. Tomorrow I hope to post a comment or two from our observers that show what a special place this is and how many great scientific careers UKIRT has launched. It's knowing that we still contribute in that way, while also making new scientific discoveries each night, that keeps us motivated.

Thank you.

2 comments:

Anonymous said...

Thanks, Tom. Interesting stuff. I don't suppose it is quite as romantic going to work at a console in Hilo. I was wondering if you keep track of loss of efficiency due to uniquely human error. Is there any difference in the "screw-up" factor now that operators are breathing oxygen-rich air? Stay well. Chuck

Tom said...

Chuck - that's a very interesting question, and I'm certainly monitoring efficiency, both technical and human (our observing software makes it quite easy to track all sorts of things). The human side of things, as you might imagine, can be very subjective, though, so I have to be careful. It is also complicated by the fact that our operators now have to make all the decisions on their own, whereas before there was an astronomer sat next to them making the calls.

Someone told me about a theory the other day. I don't know how true it is, but it went along the lines of "if you learn something while you're drunk, you can only perform those tasks well when you're drunk and can't perform them when sober", and vice versa, of course. Our operators learned everything with little oxygen, and I'm seeing a little evidence of the same thing happening, i.e., some things aren't being done so well in an oxygen-rich environment. It's just too early to tell if that's really the case, though, and they are still learning to run things in the new mode.

It's something we'll be reviewing after a few more months operating this way and the stats should be a little more solid by then. So far, however, I'm very proud of the way our operators have handled the transition!

Tom