BigV VM unreachability

Expected resolution: 8 May 2017, 21:00 UTC
Issue status: Resolved
Date: 10 May 2017, 08:56 UTC
Posted by: Jamie Nguyen

Closing this post now. Further information and updates will follow soon on our forum at https://forum.bytemark.co.uk

Issue status: Investigating
Date: 9 May 2017, 12:05 UTC
Posted by: Jamie Nguyen

As per the previous update, machines came back up yesterday. Again, sincerest apologies for the outage.

We hit a strange bug in one of our internal firewalls. We're still unsure of the exact cause, but we are going to schedule some upgrades that we hope will resolve it.

This problem interrupted traffic between York and Manchester, which triggered instability in our York Cloud Server infrastructure and resulted in the outages seen last night. The firewall issue is new, but its symptoms in our infrastructure were similar to those of issue 166 ( https://status.bytemark.org/issues/166 ), even though the problem that triggered it was different.

Your frustrations are fully justified, and we have been working hard on ironing out the problems that have led to these outages. While some mitigations have already been put in place, thorough and careful engineering will continue with reliability as a top priority. Further information, updates and our plan will be posted to our forum at https://forum.bytemark.co.uk/

You can read our report on the similar issue from last month here: https://forum.bytemark.co.uk/t/incident-report-11th-april-2017/2614

Issue status: Investigating
Date: 9 May 2017, 11:40 UTC
Posted by: Jamie Nguyen

As per the previous update, machines came back up yesterday. Again, sincerest apologies for the outage.

We hit a strange bug in one of our internal firewalls. We're still unsure of the exact cause, but we are going to schedule some upgrades that we hope will resolve it.

This problem interrupted traffic between York and Manchester, which triggered instability in our York Cloud Server infrastructure and resulted in the outages seen last night. The firewall issue is new, but its symptoms in our infrastructure were similar to those of issue 166 ( https://status.bytemark.org/issues/166 ), even though the problem that triggered it was different.

Your frustrations are fully justified, and we have been working hard on ironing out the problems that have led to these outages. While some mitigations have already been put in place, thorough and careful engineering will continue with reliability as a top priority. Further information, updates and our plan will be posted to our forum at https://forum.bytemark.co.uk/

Issue status: Resolved
Date: 8 May 2017, 19:04 UTC
Posted by: Patrick Cherry

All machines are now back up.

If your machine is still experiencing issues, please raise an urgent ticket and our team will investigate.

Issue status: Investigating
Date: 8 May 2017, 17:30 UTC
Posted by: Matthew Bloch

The earlier estimate of around 2-3 hours may be overly pessimistic, so I've adjusted it.

All customer VMs will start automatically, but you can get yours online faster if you log on and manually Shutdown & Start your machine.

Issue status: Investigating
Date: 8 May 2017, 17:12 UTC
Posted by: Matthew Bloch

The control panel is now restored, and we are waiting for VMs to come back up.

This issue and its cause are similar to issue 166, and we expect the resolution time to be the same (around 2-3 hours).

Issue status: Investigating
Date: 8 May 2017, 16:37 UTC
Posted by: James Hannah

We're now investigating a problem with BigV machines in York, which is manifesting as unreachability and timeouts. We'll update this status post when we know more.


Issue still not addressed? Please contact support.