They did send an email to the email address registered on the account.
Is anyone hosting DNS records with 1-Grid?
They have had an outage since yesterday, with no notification from the provider and no known ETR.
We have websites that are not resolving and e-mail that is not working.
Yes, we had the same problem. The world lost our email and site location data as their nameservers were dead to the world. Our service is up and running now, but it appears quite a few are not.
Dear Customers
As many of you are aware, 1-grid experienced a large-scale outage on the evening of Monday, 31 January 2022, affecting the vast majority of our customers. We wanted to provide a summary of what happened, why resolving the issue was delayed, and what we are doing to avoid further risks in the future.
Firstly, I would like to personally apologise to all affected customers. This is unacceptable and we let you down. This shouldn’t have happened and whilst I would like to say it was outside our control, our choice of suppliers and the extent to which we audit their setup is something we should have done better on. Saying that it affected hundreds of other companies doesn’t make it better. All we can do at this point is be brutally honest about the causes with our customers.
Current State
We were in the process of migrating from our current datacentre (owned by Old Mutual and run by Africa Datacentres) to a new site at the Africa Datacentres Diep River facility. The incident last night was not related to the migration; however, its impact affected both sites, and this requires a bit of explanation.
We have now moved all physical servers to our new site in Diep River, and they are in the process of being brought back onto our network.
What happened last night?
Last night, we saw every single link from Pinelands to our other datacentres go down at the same time, both transit (Internet connectivity) as well as two links to Teraco and the link to Diep River. This is a bit like a plane with four engines having all of them stop mid-flight at the same time. This caused an outage for most customers who were routed in Pinelands, even if their server had been moved to Diep River. The root cause was a Liquid Networks issue with a major failure in Pinelands.
We proceeded to carry out an emergency migration of routing from Pinelands to Diep River (something we were not scheduled to do quite yet). Because we are moving vendors for the equipment doing this routing, the process is a bit more complicated than it would otherwise be. Nonetheless we are doing this as we believe it will help get some customers back before Liquid address their own equipment failure. We have also made some temporary changes to bring customers up quicker in some cases. We’re working hard to get everyone up by the morning.
We have tonight also been physically moving all the remaining servers in Pinelands to Diep River, as Liquid have indicated to us that they cannot fix the issue overnight. This is clearly not acceptable; however, we would much rather take control of the situation and look after our customers. This is a significant undertaking, as this was supposed to happen over the next couple of weeks, spread over several nights. We have staff on hand to help with the move, from our directors to our technical team.
The technically minded among you will wonder ‘why didn’t you just move routing as you went along?’ – that’s a very good question. We have a lot of legacy setup from years of acquisitions and we’ve made iterative improvements to increase capacity and resilience and to remove some legacy issues like large broadcast domains. Nevertheless, servers on the same VLAN may be in different sites during the migration, so this process would have been impossible to do perfectly, and we believed our three separate links between the sites would have been sufficient protection against any incidents.
We also found that our out-of-band access wasn’t working as expected; we have a setup that allows us to get into our routers even if our network is down. We have used this over the past few weeks; however, the setup had a glitch at the same time. This wasn’t the cause of the problem or a result of it, but it delayed us starting the diagnostic process. We will learn from that too and improve our monitoring.
The Future
We don’t propose to explain all the changes we will make in this post. We need to do a full incident analysis for that. However, in the meantime we would like to add a few comments.
Once again, I would like to personally apologise to all affected customers.
Webmail seems to be working for some domains. Luckily I only had two clients with issues today. Up to now at least...
Still cannot get any of my emails. This is really poor service; so much for companies’ guarantees of uptime.
09:23: seems to be up and running again, and emails are working. Clientzone opens, but I cannot log on.
The problem is you are gaining a reputation for whenever you're performing these "migrations". When you moved international domains you failed to correctly update all domain zones. I am still having hangovers from that move. Also, many of your personnel at your support center are technically inept and fail to properly read tickets. It makes me concerned that the same technical deficiencies exist in your tech department.
Where did you find this message?
Please suggest an alternative to 1-Grid to host a business email account for a small business.
Those poor techies and sysadmins must be shitting bricks by now.
My server is still offline. More than 24h without service.