10 May 2017
08:56 UTC
Jamie Nguyen
Closing this post now, with further information and updates to follow soon on our forum at https://forum.bytemark.co.uk !
9 May 2017
12:05 UTC
Jamie Nguyen
As per the previous update, machines came back up yesterday. Again, sincerest apologies for the outage.
We had a strange bug with one of our internal firewalls. We're still unsure of the exact reason, but are going to schedule some upgrades that we hope will resolve the bug.
This problem interrupted traffic between York and Manchester, which triggered instability in our York Cloud Server infrastructure and resulted in the outages seen last night. The firewall issue is new, but its effects on our infrastructure were similar to those of issue 166 ( https://status.bytemark.org/issues/166 ), even though the problem that triggered it was different.
Your frustrations are fully justified, and we have been working hard on ironing out the problems that have led to these outages. While some mitigations have already been put in place, thorough and careful engineering will continue with reliability as a top priority. Further information, updates and our plan will be posted to our forum at https://forum.bytemark.co.uk/
You can read our report on a similar issue from last month here: https://forum.bytemark.co.uk/t/incident-report-11th-april-2017/2614
9 May 2017
11:40 UTC
Jamie Nguyen
As per the previous update, machines came back up yesterday. Again, sincerest apologies for the outage.
We had a strange bug with one of our internal firewalls. We're still unsure of the exact reason, but are going to schedule some upgrades that we hope will resolve the bug.
This problem interrupted traffic between York and Manchester, which triggered instability in our York Cloud Server infrastructure and resulted in the outages seen last night. The firewall issue is new, but its effects on our infrastructure were similar to those of issue 166 ( https://status.bytemark.org/issues/166 ), even though the problem that triggered it was different.
Your frustrations are fully justified, and we have been working hard on ironing out the problems that have led to these outages. While some mitigations have already been put in place, thorough and careful engineering will continue with reliability as a top priority. Further information, updates and our plan will be posted to our forum at https://forum.bytemark.co.uk/
8 May 2017
19:04 UTC
Patrick Cherry
All machines are now back up.
If your machine is still experiencing issues, please raise an urgent ticket and our team will investigate.
8 May 2017
17:30 UTC
Matthew Bloch
This may be an overly pessimistic estimate, so I've adjusted it.
All customer VMs will start automatically, but you will be able to get yours online faster if you log on and Shutdown & Start your machine manually.
8 May 2017
17:12 UTC
Matthew Bloch
The control panel is now restored, and we are waiting on VMs to come back up again.
This is a similar issue to 166, with a similar cause, and we'd expect the resolution time to be the same (around 2-3 hours).
8 May 2017
16:37 UTC
James Hannah
We're investigating a problem with BigV machines in York now which is manifesting as unreachability/timeouts. We'll update this status post when we know more.