ITEC-4203 Fault Tolerant Systems
System Failures: Human Intervention
In December of 2008, a multinational bank was presented with a simple fault on its production system that supposedly had an easy workaround. Support staff, with vendor approval, applied the workaround to the faulty system, which seemed to fix the issue. The fix was then applied to the disaster recovery system and the development system without first doing the due diligence of testing whether it was actually fully functional. The end result was that insufficient testing led to all three nodes crashing and the loss of millions of customer transactions over four months.

DBS, a bank located in Singapore, contracted with IBM to support its data centers. In July of 2010 a simple communications instability fault began on a storage system maintained by IBM. A technician was sent to review the fault after hours so as not to interrupt business. However, the technician did not follow correct procedures in resolving the fault, and actually made it worse by repeating the same incorrect action while expecting a different result (the proverbial definition of insanity). This led to the entire DBS system being brought offline; all ATMs and POS machines were unable to operate. The faulty procedure was eventually identified and resolved, but DBS was fined by its regulators and forced to implement even further redundancy safeguards.

In late 2008, JournalSpace (the original iteration, not the current version) had an internal dispute with its IT manager which resulted in summary dismissal. The disgruntled manager decided to retaliate against the organization and overwrote its entire database with random information. What JournalSpace did not realize until afterwards was that this manager had never backed up the data, ever. The overwrite therefore meant that JournalSpace lost its entire user data store, which eventually led to the organization having to shut down.

All three of these failures had a single thing in common: human intervention. The first example was a lack of foresight in haphazardly rolling out a supposed "fix." DBS's failure came from a single technician's lack of training and inability to see the insanity of their actions. Finally, JournalSpace's collapse came down to a single point of failure: an IT manager who was neither trustworthy nor competent. If all of these organizations had exercised due diligence in training, supervising, and encouraging employee engagement, it is likely these failures would not have occurred, or would have occurred in far less spectacular fashion. Human error can never be eliminated one hundred percent. However, there is no excuse for allowing human error to be the single point of failure in a given system.

Reference

Inc., S. A., & Highlyman, W. (2008, December). Innocuous Fault Leads to Weeks of Recovery. Retrieved April 16, 2014, from Availability Digest: http://www.availabilitydigest.com

Inc., S. A., & Highlyman, W. (2009, April). Why Back Up? Retrieved April 16, 2014, from Availability Digest: http://www.availabilitydigest.com

Inc., S. A., & Highlyman, W. (2010, August). Singapore Bank Downed by IBM Error. Retrieved April 16, 2014, from Availability Digest: http://www.availabilitydigest.com
High Availability: Conjectural Architecture Proposal for Wikipedia
Wikipedia (www.wikipedia.org) is an open, community-editable encyclopedia with millions of user-submitted facts that are constantly being updated and improved upon. Its service is used around the world and is available in 287 different languages. Moreover, in some countries it is likely the only source of (ostensibly) unbiased information available. As the data it holds is truly the knowledge of humanity, Wikipedia benefits greatly from having a highly available service.

As a benchmark, the recovery time Wikipedia requires is on the order of seconds, while a recovery point within an hour of the failure would be acceptable. That being said, because the recovery time is in the seconds range, the recovery point will likely be in this range regardless. That is, if a Wikipedia server fails, the recovery time objective (RTO) shall be no more than thirty seconds, which means the recovery point objective (RPO) will also be no more than thirty seconds, and possibly less.

This type of redundancy requires a clustered architecture in the N-plus-X category, where N is the number of active nodes in the cluster and X is the number of failover nodes reserved for redundancy. As the data Wikipedia stores is sensitive information for the entire human population, this clustered redundancy should have a ratio of no less than ten percent (e.g. one failover node for every ten active nodes in a cluster, for a total of eleven nodes per failover cluster). Moreover, the data itself should be multiplexed across several geographically dispersed clusters, so as to ensure data availability even in the event of a natural disaster or political interference.

This type of setup will recover within the required thirty-second RTO due mainly to the way the Failover Management Software (FMS) operates and the physical architecture used to build the clusters. The FMS will need to be configured to immediately fail an offline system over to a standby node within thirty seconds of node failure. Moreover, the infrastructure on each node should be identical, with each node capable of operating as a singular system if any of the others go offline. That being said, the FMS can itself act as a single point of failure, which means failover testing will need to occur on an annual basis to ensure the software operates as intended.

Since Wikipedia is a web-based application, its data entry is sent via web protocols. This means that as soon as a service is offline, any data being sent to the application will halt, with an error (such as a 503 Service Unavailable) being returned to the client. Since the data itself is multiplexed (asynchronously), if a node goes down the failover node will already contain all the data required to continue operating the site. The only data lost is whatever would have been submitted to the site during the brief thirty-second failover window.

The pros and cons of such a system are as follows. Pro: the system maintains all data in a high state of availability, with redundancy built in against system, environmental, or political disaster. Con: the system will be quite expensive, depending on the number of clusters required to hold the entire Wikipedia database. In this instance the costs may be high, but the reward of ensuring that this vast store of human knowledge remains intact is definitely worth the expenditure.

Reference

Stern, E. M. (2003). Blueprints for High Availability (2nd ed.). Indianapolis: Wiley Publishing, Inc.

Wikipedia. (2014, April 14). Wikipedia:About. Retrieved April 22, 2014, from Wikipedia: http://en.wikipedia.org/wiki/Wikipedia:About
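As an addendum, here is a minimal sketch of the failover behavior described above, assuming a simple heartbeat check and hypothetical node names. It is an illustration of the concept only, not Wikipedia's infrastructure or any vendor's actual FMS.

# Heartbeat-driven failover manager: promote a standby node well within the 30-second RTO.
# Node names, check intervals, and the health-check hook are all hypothetical.

import time

CHECK_INTERVAL = 5        # seconds between heartbeat checks
FAILURE_THRESHOLD = 3     # three missed heartbeats (15 s) triggers failover, under the 30 s RTO

class FailoverManager:
    def __init__(self, active_nodes, standby_nodes):
        self.active = set(active_nodes)
        self.standby = list(standby_nodes)
        self.missed = {node: 0 for node in active_nodes}

    def heartbeat_ok(self, node):
        # Placeholder health check; a real FMS would probe the node over the network.
        return True

    def check_once(self):
        for node in list(self.active):
            if self.heartbeat_ok(node):
                self.missed[node] = 0
            elif self.missed.get(node, 0) + 1 >= FAILURE_THRESHOLD and self.standby:
                replacement = self.standby.pop(0)      # promote a reserved failover node
                self.active.discard(node)
                self.active.add(replacement)
                self.missed[replacement] = 0
                print(f"{node} failed; promoted standby {replacement}")
            else:
                self.missed[node] = self.missed.get(node, 0) + 1

    def run(self):
        while True:
            self.check_once()
            time.sleep(CHECK_INTERVAL)

# Ten active nodes plus one standby, matching the ten-percent ratio suggested above.
fms = FailoverManager([f"node-{i}" for i in range(10)], ["standby-0"])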
Failure Mitigation: Return on Investment
Extract: Case study A has two types of downtime to consider: a workday outage and an after-hours outage. A workday outage will likely cost the organization far more, because staff are on hand being paid to wait for the system to come back online and are then paid overtime to catch up on orders. An after-hours outage, on the other hand, has fewer staff on hand to deal with it (mostly IT), so overtime is the only additional wage cost. However, transactions occur around the clock, so profit is lost no matter what time of day the outage occurs.
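A quick sketch of how the two scenarios above might be compared, using entirely hypothetical staffing and revenue figures (they are not taken from the case study):

# Hypothetical cost comparison of a workday outage vs. an after-hours outage.
# All figures below are illustrative assumptions.

def outage_cost(outage_hours, idle_staff, overtime_staff, hourly_wage,
                overtime_rate, lost_revenue_per_hour):
    idle_wages = idle_staff * hourly_wage * outage_hours             # staff paid to wait (workday only)
    overtime = overtime_staff * hourly_wage * overtime_rate * outage_hours
    lost_revenue = lost_revenue_per_hour * outage_hours              # transactions lost around the clock
    return idle_wages + overtime + lost_revenue

workday = outage_cost(outage_hours=4, idle_staff=50, overtime_staff=50,
                      hourly_wage=30, overtime_rate=1.5, lost_revenue_per_hour=5000)
after_hours = outage_cost(outage_hours=4, idle_staff=0, overtime_staff=5,
                          hourly_wage=30, overtime_rate=1.5, lost_revenue_per_hour=5000)

print(f"Workday outage:     ${workday:,.0f}")
print(f"After-hours outage: ${after_hours:,.0f}")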
Calculating the Availability of a Real Estate System
This paper details the steps taken to calculate the availability of an imaginary real estate system from the availability of its individual nodes. Note that this is only the first step in calculating the system's overall availability; later papers dig deeper into what it actually takes to increase that availability.
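As a minimal sketch of the calculation applied (with made-up node availabilities rather than the figures from the paper), the standard series/parallel formulas look like this:

# Availability of nodes in series is the product of their availabilities;
# for redundant (parallel) nodes it is 1 minus the product of their unavailabilities.
# The node availabilities below are placeholders, not values from the paper.

from functools import reduce

def series(availabilities):
    return reduce(lambda acc, a: acc * a, availabilities, 1.0)

def parallel(availabilities):
    combined_unavailability = reduce(lambda acc, a: acc * (1.0 - a), availabilities, 1.0)
    return 1.0 - combined_unavailability

web_server   = 0.999                      # hypothetical single web server
db_cluster   = parallel([0.995, 0.995])   # two redundant database nodes
network_link = 0.9995

system_availability = series([web_server, db_cluster, network_link])
print(f"Overall system availability: {system_availability:.5f}")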
Invest in Your Employees to Ensure High Availability
The single largest point of failure in any information technology system, regardless of environmental, social, economic, political, or physical factors, is human beings. As such, the one component in which I would invest the most time and capital to ensure high availability is the staff maintaining that system. That is, ensure they are trained, prepared, and fairly paid according to their job roles and responsibilities. Investing in these people will do more for ensuring high availability than all the redundancy and backups the same money could buy. While I would still invest in both redundancy and backups, their operation and upkeep are only as good as the people who ensure they function. Yes, they are definitely key components of high availability and will be implemented, but they will be implemented by highly trained, well-paid, happy technicians who are competent and able to ensure their upkeep. It makes no sense to build a system and then have underpaid, low-skilled staff maintaining it. They will not be engaged in what they are doing, and they will not have the skills required to ensure continued availability.

Reference

Klebanoff, J. (2009, September). End-to-End High Availability and Disaster Recovery. System iNews, 35-40.

________________________________

System Replication Protection: Vision Solutions Double-Take 7.0

The protection, integrity, and availability of core applications and data are of the utmost importance to any organization. Vision Solutions' Double-Take 7.0 continuous replication system embodies these highly coveted ideals. Double-Take 7.0 is hardware independent, layer neutral, and spatially available. More importantly, it supports an enterprise of any size, with the ability to grow replication requirements as network and database administrators require.

Double-Take 7.0 is hardware independent and layer independent. This means it has no specialized hardware requirements, will operate in virtual or cloud environments, and will replicate to and from any server architecture. If a physical server needs to be backed up to multiple virtual servers, or a cloud system needs to be stored on a combination of virtual and physical machines, or multiple servers need to replicate to a single machine (or vice versa), Double-Take 7.0 is fully capable of meeting any of these requirements. Moreover, Double-Take is compatible with Windows Server 2003 through 2012 (x86 and x64), as well as many Linux distributions, Oracle distributions, VMware, vSphere, and Hyper-V.
This solution is more than just platform independent, however. Double-Take 7.0 offers continuous replication, using asynchronous replication with a minimum sync interval in the seconds range. Moreover, snapshots can be scheduled so that the RPO can be set to whatever is required. That is, if the system needs to be restored from a point further back than the live data, a snapshot can be used to restore the system rather than restoring from the most recent recovery point. This offers additional protection against point-in-time data corruption or virus infestation.

While the continuous replication occurs at a high rate, it is still asynchronous, as it runs on a sync schedule rather than on a per-transaction basis. This is actually preferable in a transactional system which may have many users connecting and operating on the application at any given time. Were synchronous replication to be implemented, the network load would likely overwhelm the system and bring it to a halt anyway. Using continuous replication means systems are backed up at each sync time, based on whatever changes have occurred since the previous sync time. That means data integrity is maintained from sync point to sync point, and replication sizes are kept to a minimum.

In terms of weaknesses, Double-Take 7.0 is highly configurable and robust, meaning there is a learning curve before getting the most out of its functionality. Moreover, it requires an administrator who has access to all the systems which need to be replicated, which may raise security concerns if servers are segregated on the basis of per-administrator rights. Finally, because the continuous replication can occur at a constant rate, it will generate a considerable amount of network traffic, asynchronous or not, if not handled correctly. However, all of these issues would be encountered with any replication system as capable as Vision Solutions' offering.

Reference

Vision Solutions. (2013). Double-Take 7.0: Overview. Retrieved May 6, 2014, from Vision Solutions: http://www.visionsolutions.com/downloads/

Vision Solutions. (2013). Double-Take 7.0: Solution Suite. Retrieved May 6, 2014, from Vision Solutions: http://www.visionsolutions.com/downloads

Vision Solutions; Eric Payne. (2013, October 31). Double-Take Availability 7.0 Demo - vSphere. Retrieved May 6, 2014, from YouTube: https://www.youtube.com/watch?v=v8Abwv8IwZ4
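Addendum: the paragraphs above describe change-based replication running on a sync schedule. The sketch below is a generic illustration of that idea only; it is not Vision Solutions' code or API, and every class name and interval is an assumption.

# Generic illustration of scheduled, change-based (asynchronous) replication.
# Only blocks changed since the previous sync point are shipped to the target,
# so the target is consistent at each sync point and replication sizes stay small.

import time

class AsyncReplicator:
    def __init__(self, source, target, sync_interval_seconds=30):
        self.source = source              # must expose changed_blocks_since(timestamp)
        self.target = target              # must expose apply_blocks(blocks)
        self.sync_interval = sync_interval_seconds
        self.last_sync = 0.0

    def run_once(self):
        now = time.time()
        changed = self.source.changed_blocks_since(self.last_sync)
        if changed:
            self.target.apply_blocks(changed)
        self.last_sync = now

    def run_forever(self):
        while True:
            self.run_once()
            time.sleep(self.sync_interval)   # asynchronous: bounded, scheduled load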
Increasing System Availability: Understanding Failover and Recovery Times
This is a continuation of the real estate system explored in an earlier paper.
Extract: Taking into account the failover time of any system greatly increases the accuracy of that system's availability index. In other words, without knowing how long it truly takes to fail over to a given redundant node, any stated availability figure for the overall system is questionable. Given the failover speed of each node to its redundant counterpart, the real estate system's (from assignment two) availability can be defined more accurately. This in turn allows the organization to better target its resources at increasing overall system availability.
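As an illustrative sketch (the failure rates and failover times below are assumptions, not the figures from the attached paper), failover time can be folded into an availability estimate like this:

# Fold failover (cutover) time into an availability estimate.
# Every unplanned failure still costs the system its failover time,
# even when a redundant counterpart takes over.  All numbers are hypothetical.

SECONDS_PER_YEAR = 365 * 24 * 3600

def availability_with_failover(failures_per_year, failover_seconds):
    downtime_per_year = failures_per_year * failover_seconds
    return 1.0 - downtime_per_year / SECONDS_PER_YEAR

app_server = availability_with_failover(failures_per_year=4, failover_seconds=30)
database   = availability_with_failover(failures_per_year=2, failover_seconds=120)

system = app_server * database            # nodes in series
print(f"App server: {app_server:.6f}")
print(f"Database:   {database:.6f}")
print(f"System:     {system:.6f}")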
Increasing Availability: Evaluating a Cost Cutting Proposal
This paper reviews an imaginary proposal to increase availability on a simple three node system.
Extract: A review of our current system setup is required in order to fully understand what the proposed solution offers our organization. Our current system is composed of three distinct nodes: the users who operate on the system, the server which provides our application services, and the backup system which ensures our data integrity for posterity. This solution was built at a time when our organization was much smaller and had fewer needs than our current environment demands. A downtime of just four hours results in a capital loss to our company of $40,000. Moreover, we have a history of at least two such occurrences a year, which is plainly unacceptable. To really drive this home, a total system failure would mean a 24-hour downtime and a loss of $240,000 or more to the company.
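The arithmetic behind those figures works out as follows (a quick sketch using only the numbers stated in the extract):

# Downtime cost arithmetic from the extract above.
FOUR_HOUR_OUTAGE_COST = 40_000            # dollars, per the extract
HOURLY_COST = FOUR_HOUR_OUTAGE_COST / 4   # $10,000 of lost capital per hour of downtime

expected_annual_loss = 2 * FOUR_HOUR_OUTAGE_COST   # at least two such outages per year
total_failure_loss   = 24 * HOURLY_COST            # a full 24-hour outage

print(f"Hourly downtime cost:       ${HOURLY_COST:,.0f}")
print(f"Expected annual loss:       ${expected_annual_loss:,.0f}")
print(f"24-hour total failure loss: ${total_failure_loss:,.0f}")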
Risk Assessment: Migration from Active/Passive to Active/Active
In terms of risk, the biggest issues the organization faces are the availability effects the active/active setup will cause, the capital requirements of the project, and the location requirements of the hardware. Each of these must be addressed before the project can be fully realized. That being said, there are definitely ways to resolve each of these issues, and their resolution will leave the organization in a much better position to pursue its strategic goals.
Availability falls into three distinct parts: user availability, database availability, and network availability. While a cutover time of a few seconds to a minute during failover may be acceptable from a user standpoint, anything longer will begin to affect user productivity and thus organizational revenue. Concurrent transactions on the database will run into problems whereby data may become locked or corrupted if not handled correctly by the cluster management system. Because transactions happen concurrently, network traffic will increase with the number of nodes active on the system. Higher traffic can affect both user access and data availability, and high lag caused by increased network traffic could lead to false positives in the FMS, incurring even further delays to user and data processing.
Capital requirements come from licensing of the active/active software, licensing of additional core application software to run on each node, and the hardware required to run the system itself. While the active/active software is a large consideration for the risk of this project, the organization cannot forget that core application software cannot run on more than one node at a time without additional licensing. Moreover, some software may need to be specially licensed to run in this configuration, thus costing extra for the company. The hardware, on the other hand, will need to be beefed up at the network infrastructure level and the nodes themselves in order to handle the increased traffic the active/active configuration will cause. This means that a further capital outlay will need to be considered before approval can be granted for project spending.
Finally, the location of our data centers must be considered. While they are geographically dispersed, their bandwidth and physical capabilities may not be able to cope with the increased traffic of the system. For instance, the electrical grid, cooling/heating, and bandwidth will all need to be adjusted to account for increased traffic on the servers. That is, the data centers will be using more electricity, producing more heat, and using more of the network than our current active/passive setup. All of this feeds back into the capital requirements, and into the availability of the system itself.
The solutions to consider for each of these are as follows:
1. Plan for redundant availability of all systems. An active/active setup needs to be over-planned, as it will dramatically increase the throughput of the database and the network. System models and class diagrams explaining how deadlocks, unique identifiers, and increased traffic will all be handled must be made available (a minimal illustration of the unique-identifier point follows this list).
2. Ensure that all system plans are passed through an ROI check to confirm they are actually viable and relevant to organizational strategic goals. Moreover, leave room in the budget for expansion, as there will likely be requirements for system upgrades which are not foreseen at the very start of the project.
3. Review the data centers and ensure they meet all of our system requirements. If they do not, either work to upgrade these locations or begin a secondary project to move to a more robust data center prior to beginning this active/active project.
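As a minimal illustration of the unique-identifier point in item 1 above (the scheme and names are assumptions, not a mandated design), node-interleaved keys prevent two active nodes from ever generating colliding identifiers:

# Node-interleaved identifier generation for an active/active cluster.
# Each active node owns a distinct slice of the key space, so concurrently
# generated keys cannot collide even before replication catches up.

import itertools

class NodeIdGenerator:
    def __init__(self, node_id, node_count):
        self.node_id = node_id
        self.node_count = node_count
        self._counter = itertools.count()

    def next_key(self):
        # Node 0 issues 0, 2, 4, ... and node 1 issues 1, 3, 5, ... (for node_count=2).
        return next(self._counter) * self.node_count + self.node_id

node_a = NodeIdGenerator(node_id=0, node_count=2)
node_b = NodeIdGenerator(node_id=1, node_count=2)

print([node_a.next_key() for _ in range(3)])   # [0, 2, 4]
print([node_b.next_key() for _ in range(3)])   # [1, 3, 5]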
Reference
Dhillon, G. (2007). Principles of Information Systems Security: Text and Cases. John Wiley & Sons, Inc.
Availability Digest. (2007, March). Migrating Your Application to Active/Active. Retrieved May 13, 2014, from Availability Digest: http://www.availabilitydigest.com/private/0203/aa_ready.pdf
Availability Digest. (2009, November). Is the Cost of Converting to Active/Active Worth It? Retrieved May 13, 2014, from Availability Digest: http://www.availabilitydigest.com/public_articles/0411/cost-benefit.pdf
Stern, E. M. (2003). Blueprints for High Availability (2nd ed.). Indianapolis: Wiley Publishing, Inc.
___________________________________________________________________
Cloud Computing: Risks vs. Benefits
Cloud computing in its current form is the successor to the client/terminal applications of the past. It has taken what was thought to be a bygone technology and made it relevant once again. Foremost, it eases administrative pressure on technology business units, thus decreasing costs and increasing business revenue. Moreover, it adds a level of always-on, always-connected applications to organizations. This means employees are unshackled from their desks, free to roam within the business and interact in ways standard desktop applications cannot allow.
However, cloud computing does come with some major risks which organizations must accept if they wish to leverage all the benefits it has to offer. Foremost is the constant risk of data loss. For instance, when Digital Railroad shut down in 2008, many customers were abruptly left without their data, with little warning (Murabayashi, 2008). This ties in with the possibility of loss of ownership. In 2011, Sony Online Entertainment lost the personal information of 24.6 million customers to an online security breach (Schreier, 2011). While not directly related to its cloud computing, this type of hack remains feasible for any system in the cloud. Finally, while cloud computing can increase availability, it can also decrease it. A great example of this was when Amazon's S3 servers went offline and many organizations, including Twitter, were left without service (Availability Digest, 2009).
Given these benefits and risks, the types of applications which cloud computing can support are extensive. These include productivity software, web hosting, and IP telephony, to name a few. Each has availability requirements that are well suited to the kind of environment cloud computing provides. Moreover, the data loss associated with any failure would be minimal for each of these applications.
There are two major cloud-based productivity suites on the market that any organization should consider: Google Apps and Microsoft Office 365 (Google, 2014) (Microsoft, 2014). Each offers the full gamut of office applications corporate employees have come to value and use on a daily basis. Both keep overhead low by licensing on a per-user basis, and there is no need to install anything on any workstation, as everything is done through the cloud. However, loss of access does not mean an organization is brought to a halt, as each suite offers an offline mode if a user so wishes.
For most businesses hosting a website, the days of running your own web server are long gone. The sheer number of hosting providers is staggering. Moreover, having your site hosted in the cloud means fewer administrative costs and fewer pieces of infrastructure. Where high availability is an absolute must, a website can be hosted with several providers, with traffic redirected to an alternate host in the event a failure occurs (a sketch of this idea follows below).
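A minimal sketch of that redirection idea, assuming hypothetical provider URLs and a simple health check (a production setup would more likely use DNS failover or a load balancer):

# Pick the first healthy hosting provider and serve traffic from it.
# The provider URLs are placeholders, not real endpoints.

import urllib.request

PROVIDERS = [
    "https://primary-host.example.com/health",
    "https://secondary-host.example.com/health",
]

def first_healthy(providers, timeout=3):
    for url in providers:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:
            continue                     # provider unreachable; try the next one
    return None

active = first_healthy(PROVIDERS)
print("Serving from:", active or "no provider available")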
There is a level of expectation among most corporate staff that they will have access to a desk phone. However, the PSTN is hardly cheap to maintain, and it usually incurs large long-distance fees when communicating internationally. IP telephony removes both of these barriers by placing your telephone infrastructure in the cloud. This means that, just like cloud-based applications, phone numbers can follow users around an organization, linking directly to their currently logged-in session and ensuring all calls are routed to the correct individual.
However, to paraphrase Shakespeare, "all that glitters is not gold." That is, there are definitely some applications which should not be placed in the cloud. Two main categories are applications built in-house by an organization and time-sensitive applications such as those used by brokerage firms. For these two types of applications, the risks far outweigh the benefits.
Proprietary applications are usually built by an organization to solve a specific issue and to place the organization at a competitive advantage over its rivals. By placing these types of applications in a cloud environment, organizations risk losing ownership of their software and/or data. Moreover, having the application hosted publicly means a government can at any time subpoena information from the cloud host. If the proprietary information is sensitive to government interference, this is definitely not a good solution.
Brokerage firms operate on a time scale that can be as tiny as a microsecond (0.001 milliseconds). This means that their applications must be as close to the stock exchange as possible. Cloud computing is not conducive to this type of software as it, by definition, hosts the information on whatever infrastructure is available in the cloud. There are some instances where companies like Amazon have tried to allow organizations to specify where their application is hosted. However, these offerings are still not responsive enough to meet the demands of microsecond transactions.
Cloud computing definitely brings advantages to any organization, so long as the application in question is not time sensitive, data sensitive, or requires an absolute high availability guarantee. The cloud is not new, but it does bring new methods of use which prior technologies could not afford. It unshackles users and allows access to highly used applications anywhere at any time. Nevertheless, use of the cloud must still be treated with caution as there is too great a chance to be burnt when dealing with organizational core applications.
Reference
Availability Digest. (2009, June). The Fragile Cloud. Retrieved May 23, 2014, from The Availability Digest: http://www.availabilitydigest.com/public_articles/0406/fragile_cloud.pdf
Google. (2014). Google Apps. Retrieved May 23, 2014, from Google: http://www.google.com/intx/en_au/enterprise/apps/business/
Husby, M. (2012, May 3). How Cloud Computing Reduces Costs and Increases Value. Retrieved May 23, 2014, from golime: http://www.golime.co/blog/bid/136271/How-Cloud-Computing-Reduces-Costs-and-Increases-Value
Karena, C. (2012, December 6). Cloud computing costs: do they stack up? Retrieved May 23, 2014, from The Sydney Morning Herald: itpro: http://www.smh.com.au/it-pro/cloud/cloud-computing-costs-do-they-stack-up-20121206-2awwy.html
Microsoft. (2014). Office 365. Retrieved May 23, 2014, from Microsoft: http://office.microsoft.com/en-au/business/what-is-office-365-for-business-FX102997580.aspx
Moore Stephens. (n.d.). The benefits and challenges of cloud computing. Retrieved May 23, 2014, from Moore Stephens: http://www.moorestephens.com/cloud_computing_benefits_challenges.aspx
Murabayashi, A. (2008, November 24). What Happened to Digital Railroad? Retrieved May 23, 2014, from Photoshelter: http://blog.photoshelter.com/2008/11/what-happened-to-digital-railr/
Schreier, J. (2011, February 5). Sony Hacked Again; 25 Million Entertainment Users' Info at Risk. Retrieved May 23, 2014, from WIRED: http://www.wired.com/2011/05/sony-online-entertainment-hack/
Steier, S. (2013, May 29). To Cloud or Not to Cloud: Where Does Your Data Warehouse Belong? Retrieved May 23, 2014, from WIRED: http://www.wired.com/2013/05/to-cloud-or-not-to-cloud-where-does-your-data-warehouse-belong/