Resolved – Global Issue: At Risk: Core Router Relocation, Manchester

This was a scheduled issue; it began at 06:30 UTC on 21 Sep 2017.
Resolved at 19:58 UTC on 22 Sep 2017 (29 days ago)

I've attenuated some of the lasers involved in the problematic link, and I've been monitoring it for errors for the best part of an hour - hopefully that was the fix. (Previously the errors were incrementing every few seconds, so this is a big change in behaviour.)

I'll resolve this issue now, as we've faced no other issues over the last 24hrs, following the work in question.

Tom

Posted at 19:58 UTC on 22 Sep 2017 by Tom Hill

10 Previous Issue Posts:

Issue status was Confirmed – Estimated resolution was 29 days ago

The new link installed between Reynolds House and TeleData (for our Manchester DC) is currently showing some CRC errors with regular traffic. We've taken the link out of service, and will be checking it throughout to see where the fault is.

Tom

Posted at 18:45 UTC on 22 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

The last portion of the Manchester DC network has now been re-plumbed (some fibres were questionably polarized) and we're now back to full resiliency in Manchester.

I can only apologise for the short outage of IPv6 between Manchester & London. A full outage report will be filed internally, and can be summarised as 'bad configuration'.

Thank you to all of our customers in Manchester, and particularly so if you've been waiting for this work to finish.

We'll keep the issue open for a while yet as we monitor the network, but as of this moment the 'at risk' period is over.

Tom

Posted at 12:32 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

Most of the work to re-plumb the Manchester DC network has been completed, but we're frustratingly stuck having to make a final trip back to Reynolds House to swap a fibre pair.

We hope to have this completed before 13:30. Thank you for your patience at this time.

Tom

Posted at 11:26 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

The previous Manchester to London IPv6 outage was simply due to some missing configuration on one of the new links out of Williams House.

This has been rectified now, so we will lower the preference on the links between Manchester and London again shortly.

Tom

Posted at 10:24 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

A very, very odd IPv6 outage occurred just now, for IPv6 routes learnt via London. I've depreferenced the link between Manchester & London and am investigating.

Tom

Posted at 09:52 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

The core network is now fully resilient once again.

We are having a little trouble getting the data centre's network back up and running, but this should not take too much more time. It may require a trip back to Reynolds House in the meantime.

All-clear moved forward to 12:00.

Tom

Posted at 09:40 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

Core router now racked & booted in TeleData. It's testing all of its ports as we speak, and then we'll begin bringing back connectivity over the course of the next hour.

Tom

Posted at 08:37 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

We've now de-racked the core router and will be driving it down to TeleData very shortly.

Tom

Posted at 07:44 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

Good morning,

We've finished the necessary prerequisites, and we're now at the point where one of the two Core routers in Manchester (cr4.man) will be getting turned off. All traffic should be drained nicely within a few minutes.

We'll post again when leaving site.

Tom

Posted at 06:53 UTC on 21 Sep 2017 by Tom Hill
Issue status was Confirmed – Estimated resolution was about 1 month ago

Hello,

Between 07:30 and 10:30 (UTC+1) tomorrow morning (Thursday 21st September) we will power off & migrate one of our core network routers away from our Kilburn House PoP, in Manchester. This router will be taken down the road to a new network PoP, based at TeleData.

We have been carefully moving services away from this router over the last few months, and as a result, absolutely no customer services will be impacted as part of this move. It will mean that hosting services in Manchester are single-homed for a period of time, so please do treat this as an 'at risk' period for those services.

No work on the remaining Manchester core router (located in Williams House) - nor the connected fibre or transit services - is expected to take place during the period described. The nature of this work does mean that latency from Manchester to London will be increased by around 2-3ms, as traffic will be re-routed via York.

The work taking place tomorrow forms the first stage of a much larger project to migrate a number of our services to TeleData, and more information will be forthcoming about this in the coming weeks.

Updates will be posted to this page as we progress. If for any reason we cannot complete the work in a fashion that permits us to return the network to full resiliency, we will be able to revert and re-attempt at a later date.

Thank you for your patience.

Tom

Posted at 21:33 UTC on 20 Sep 2017 by Tom Hill

This issue was first reported at 21:33 UTC on 20 Sep 2017 (about 1 month ago) by Tom Hill.