15 Sep 2016
08:30 UTC
Tom Hill
Upon investigation, this outage affected only 'stretched' services between Manchester & York. There are far fewer of these services than initially intimated, so the fault will not have been as far-reaching as was first feared.
The failure was caused by the sudden appearance of CRC errors received on a router port, at a rate of about 300 per second: enough to cause significant packet loss, and even to interrupt TCP sessions.
"Normally" such errors tend to start slowly and ramp up, or stay as background noise - usually the sign of a poorly or dying optical transceiver - so this was quite surprising to see.
We've disabled the offending port, and will test and/or replace the optics involved. Further work will go toward trying to catch such severe instances faster in the future.
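As a rough illustration of what catching this faster might look like, here is a minimal sketch that turns two readings of a port's cumulative CRC error counter into a per-second rate and compares it with a threshold. The names (CounterSample, crc_error_rate, should_alert) and the 50/second threshold are hypothetical, chosen for the example; this is not a description of our actual monitoring.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a port's cumulative CRC error counter (hypothetical type)."""
    timestamp: float  # seconds since some fixed reference
    crc_errors: int   # cumulative count reported by the device


def crc_error_rate(prev: CounterSample, curr: CounterSample) -> float:
    """Return CRC errors per second between two counter readings."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.crc_errors - prev.crc_errors
    if delta < 0:
        # Counter went backwards: assume it was reset and count from zero.
        delta = curr.crc_errors
    return delta / elapsed


def should_alert(prev: CounterSample, curr: CounterSample,
                 threshold_per_sec: float = 50.0) -> bool:
    """Flag the port when the error rate meets an assumed threshold."""
    return crc_error_rate(prev, curr) >= threshold_per_sec


# A sudden jump like the one described above: 3000 errors in 10 seconds.
before = CounterSample(timestamp=0.0, crc_errors=1000)
after = CounterSample(timestamp=10.0, crc_errors=4000)
print(crc_error_rate(before, after), should_alert(before, after))
```

Comparing against an absolute rate rather than waiting for a gradual ramp is the point of the sketch: a port that goes from clean to hundreds of errors per second would trip the check on the first polling interval.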
Thanks to all for your patience.
15 Sep 2016
08:03 UTC
Bytemark Engineer
Sorry for the brevity at this stage.
We are aware of some networking issues affecting many of our services. We are currently investigating and will update this post when we know more.