Answer for: What else could Modwest do to help its clients succeed?
#3 Keep systems running with no chance of failure
Comments
To be perfectly honest, "no chance of failure" might be a bit extreme. But I do agree that a good level of failure prevention is always worthwhile. As we all know, there are no absolutes.
I sure wish we could, Alison!
Perfect uptime (aka 100% availability) is elusive.
Some multi-billion-dollar military systems aim for 99.9999% availability, which equates to roughly 30 seconds (about 31.5 seconds) of unavailability per year. This requires incredible fault tolerance and automated self-healing capabilities, and it makes perfect sense for systems designed to protect or save human lives.
Gmail, run by a team of thousands of experts using millions of dollars of geographically redundant equipment, has had multiple lengthy outages:
http://www.techcrunc...gmail-outage/
This doesn't let us off the hook, of course. We know your site and email are important to you, and we do strive for excellent uptime. Historically, we've done pretty well, though we can always do better.
For anything over 99.90%, I imagine the cost would go up significantly.
Many hosts claim 99.99...% uptime, but they aren't as transparent as Modwest, and when I do the math, I begin to suspect that the claims are false.
Here's a summary of "nines" as they relate to annual/monthly/weekly service availability:
http://en.wikipedia....e_calculation
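The arithmetic behind the "nines" is simple: allowed downtime = (1 - availability/100) × length of the period. The minimal Python sketch below tabulates it, assuming a 365-day year, a "month" of one twelfth of that year, and a 7-day week; the linked article may use slightly different conventions. At 99.9% this gives about 8.76 hours of downtime per year, and at 99.9999% about 31.5 seconds.

```python
# Allowed downtime for a given availability percentage ("nines").
# Assumptions: 365-day year, month = 1/12 of that year, week = 7 days.

SECONDS_PER_YEAR = 365 * 24 * 3600
PERIODS = {
    "year": SECONDS_PER_YEAR,
    "month": SECONDS_PER_YEAR / 12,
    "week": 7 * 24 * 3600,
}


def allowed_downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Seconds of downtime permitted in a period at the given availability."""
    return (1 - availability_pct / 100) * period_seconds


def format_duration(seconds: float) -> str:
    """Render a number of seconds as a short human-readable string."""
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} h"
    if seconds >= 60:
        return f"{seconds / 60:.2f} min"
    return f"{seconds:.1f} s"


if __name__ == "__main__":
    # One line per availability level, showing yearly/monthly/weekly budgets.
    for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
        row = ", ".join(
            f"{name}: {format_duration(allowed_downtime_seconds(pct, secs))}"
            for name, secs in PERIODS.items()
        )
        print(f"{pct}% -> {row}")
```

Comparing a host's published outage history against these budgets is one way to "do the math" on an advertised uptime figure.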
It would be interesting to see what percentage of downtime fell during peak hours (say, 7:00 a.m. EST to 9:00 PST) versus off-peak hours.
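Given an outage log, that percentage can be computed by splitting each outage into peak and off-peak seconds. The minimal Python sketch below does this under several assumptions: "9:00 PST" is read as 9:00 p.m., EST and PST are treated as fixed offsets (UTC-5 and UTC-8, ignoring daylight saving time), and outages are hypothetical (start, end) pairs of timezone-aware UTC datetimes. Under those assumptions peak runs from 12:00 to 05:00 UTC the next day, so the daily off-peak gap is 05:00 to 12:00 UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumed off-peak window in UTC (the complement of 7 a.m. EST to 9 p.m. PST).
OFF_PEAK_START_HOUR_UTC = 5
OFF_PEAK_END_HOUR_UTC = 12


def overlap_seconds(a_start, a_end, b_start, b_end) -> float:
    """Length in seconds of the intersection of [a_start, a_end) and [b_start, b_end)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0.0, (end - start).total_seconds())


def off_peak_seconds(start: datetime, end: datetime) -> float:
    """Seconds of the outage [start, end) that fall inside the daily off-peak window."""
    total = 0.0
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day < end:  # visit every UTC day the outage touches
        window_start = day + timedelta(hours=OFF_PEAK_START_HOUR_UTC)
        window_end = day + timedelta(hours=OFF_PEAK_END_HOUR_UTC)
        total += overlap_seconds(start, end, window_start, window_end)
        day += timedelta(days=1)
    return total


def peak_share(outages) -> float:
    """Fraction of total outage time that falls in peak hours."""
    total = sum((end - start).total_seconds() for start, end in outages)
    if total == 0:
        return 0.0
    off_peak = sum(off_peak_seconds(start, end) for start, end in outages)
    return (total - off_peak) / total


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical outage log, for illustration only.
    outages = [
        (datetime(2024, 1, 10, 3, tzinfo=utc), datetime(2024, 1, 10, 4, tzinfo=utc)),
        (datetime(2024, 1, 11, 6, tzinfo=utc), datetime(2024, 1, 11, 8, tzinfo=utc)),
        (datetime(2024, 1, 12, 11, tzinfo=utc), datetime(2024, 1, 12, 13, tzinfo=utc)),
    ]
    print(f"peak share of downtime: {peak_share(outages):.0%}")
```

Computing the off-peak overlap and subtracting it keeps the logic to a single window per day, since the assumed peak period wraps past midnight UTC.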
Entropy rules!!
Fault prevention requires planned maintenance, and planned maintenance requires downtime. Making that downtime go unnoticed requires redundancy, which multiplies cost and complexity.
Leaving well alone means waiting for failure to take you by surprise and then repairing whatever broke, which may involve replacing something that Murphy left out of the spares inventory.
Hardware and software upgrades are opportunities for failure, but the risk can be mitigated with the necessary research and proper planning (including a fallback plan).
Poorly written software, especially software that hasn't been extensively tested, can cause all manner of issues, from kernel panics to performance degradation, and it may be introduced by a user of the system.
So best practice involves good *NIX system administration that pays attention to security, connectivity, and efficiency in a way that ensures integrity without loss.
In the end, it all comes down to whether the people running the systems can resolve anything that comes their way, and I'd say you're in good hands.
Ironically, I'd spotted a couple of typos and clean forgot about the low-battery warning until my machine blacked out without sleeping. By the time I'd plugged in, the one-minute window to edit had passed.
Why only one minute of edit time? Any chance of more?