- November 18, 2022
- Posted by: Indium
- Category: AWS
Uninterrupted, continuous service is essential for customer satisfaction today, even during calamities and disasters. Building and managing resilient applications is therefore a business need, although building and maintaining distributed systems is challenging. Being prepared for failure at a critical hour is just as essential. Downtime must be avoided not only in the application itself, meaning the software or code, but also across the entire infrastructure stack needed to host it, including networking, databases, and virtual machines.
Keeping track of a system's resilience helps ensure that it stays robust during disasters and other disruptions. Two measures are used to assess the resiliency of applications:
- Recovery Time Objective (RTO): the maximum acceptable time to recover from a failure
- Recovery Point Objective (RPO): the maximum window of time during which data might be lost in an incident
Depending on the needs of the business and the nature of the application, the two metrics can be expressed in seconds, minutes, hours, or days.
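As a minimal sketch of how the two targets might be checked against a single incident, the Python below compares a hypothetical outage timeline with RTO and RPO values. The names `ResilienceTargets` and `meets_targets` and the sample timestamps are illustrative assumptions, not part of any AWS API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ResilienceTargets:
    """Hypothetical container for the two targets described above."""
    rto: timedelta  # longest acceptable time to restore service
    rpo: timedelta  # longest acceptable window of lost data


def meets_targets(targets, last_good_backup, failure_at, restored_at):
    """Return (rto_met, rpo_met) for a single incident."""
    downtime = restored_at - failure_at        # compared with the RTO
    data_loss = failure_at - last_good_backup  # compared with the RPO
    return downtime <= targets.rto, data_loss <= targets.rpo


targets = ResilienceTargets(rto=timedelta(hours=1), rpo=timedelta(minutes=15))
failure = datetime(2022, 11, 18, 10, 0)
result = meets_targets(
    targets,
    last_good_backup=failure - timedelta(minutes=10),
    failure_at=failure,
    restored_at=failure + timedelta(minutes=90),
)
print(result)  # (False, True): recovery took 90 min, over the 1 h RTO
```

In this example the data-loss window of 10 minutes is within the RPO, but the 90-minute recovery misses the RTO.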
AWS Resilience Hub
With AWS Resilience Hub, RTO and RPO objectives can be defined for each application an organization runs. It assesses each application's configuration to check that these requirements are met. Actionable recommendations and a resilience score help fine-tune the application and track its resiliency over time. Through the AWS Management Console, it provides a single customizable dashboard that allows:
- Running assessments
- Executing prebuilt tests
- Configuring alarms to identify issues
- Alerting the operators
AWS Resilience Hub can discover applications deployed with AWS CloudFormation, including those built with AWS SAM and AWS CDK, even across Regions and in cross-account stacks. Applications can also be discovered from Resource Groups and tags, or selected from those already defined in the AWS Service Catalog AppRegistry.
Some of the benefits of AWS Resilience Hub include:
Assessment and Recommendations: AWS Resilience Hub uses AWS Well-Architected Framework best practices for resilience assessment. This helps analyze the application components and discover possible resilience weaknesses caused by:
– Incomplete infrastructure setup
It also helps to identify additional configuration improvement opportunities. To improve the application’s resilience, Resilience Hub provides actionable recommendations.
Resilience Hub validates that the application's backup schedules for Amazon Relational Database Service (Amazon RDS), Amazon Elastic File System (Amazon EFS), and Amazon Elastic Block Store (Amazon EBS) are sufficient to meet the RPO and RTO defined in the resilience policy. If they are not, it recommends appropriate improvements.
The resilience assessment also supports recovery procedures by generating code snippets. These help create the application's standard operating procedures (SOPs) as AWS Systems Manager documents. In addition, a list of recommended Amazon CloudWatch monitors and alarms is generated so that any change to the application's resilience posture can be identified quickly once it is deployed.
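The idea behind the backup check can be illustrated with a small sketch: if backups run every N hours, a failure just before the next backup loses up to N hours of data, so the interval must not exceed the RPO. The function name, resource names, and intervals below are assumptions for illustration; this is not how Resilience Hub itself is implemented.

```python
from datetime import timedelta


def backup_schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss equals the gap between two consecutive backups."""
    return backup_interval <= rpo


# Hypothetical backup intervals per resource, checked against a 4-hour RPO.
rpo = timedelta(hours=4)
schedules = {
    "rds-orders-db": timedelta(hours=1),
    "efs-shared-files": timedelta(hours=24),
    "ebs-app-volume": timedelta(hours=4),
}
failing = [name for name, interval in schedules.items()
           if not backup_schedule_meets_rpo(interval, rpo)]
print(failing)  # ['efs-shared-files']
```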
Continuous Resilience Validation
Once the application has been updated with the recommendations and SOPs from the resilience assessment, the next step is to test and verify that it meets its resilience targets before it is released into production. Resilience Hub works with AWS Fault Injection Simulator (FIS), a fully managed service for running fault injection experiments on AWS that simulate real-world failures, such as network errors or too many open connections to a database. Development teams can also integrate resilience assessment and testing into their CI/CD pipelines using the Resilience Hub APIs to validate resilience on an ongoing basis. This helps prevent changes to the underlying infrastructure from compromising resilience.
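A pipeline stage that blocks a release when resilience regresses could look roughly like the sketch below. It assumes the input dictionary resembles the "assessment" object returned by the `describe_app_assessment` call of boto3's `resiliencehub` client; the field names, the `gate_release` helper, the score scale, and the threshold are assumptions to verify against the current API reference.

```python
def gate_release(assessment, min_score=70.0):
    """Decide whether a pipeline stage should pass, given an assessment summary.

    Assumed fields: assessmentStatus, complianceStatus, resiliencyScore.score.
    The 70.0 threshold and the score scale are hypothetical.
    """
    if assessment.get("assessmentStatus") != "Success":
        return False, "assessment did not complete successfully"
    if assessment.get("complianceStatus") != "PolicyMet":
        return False, "resiliency policy breached"
    score = assessment.get("resiliencyScore", {}).get("score", 0.0)
    if score < min_score:
        return False, f"resiliency score {score} is below {min_score}"
    return True, "ok"


# Sample data standing in for a real API response.
sample = {
    "assessmentStatus": "Success",
    "complianceStatus": "PolicyBreached",
    "resiliencyScore": {"score": 82.0},
}
passed, reason = gate_release(sample)
print(passed, reason)  # False resiliency policy breached
```

In a real pipeline, the stage would typically exit with a non-zero status when `passed` is false so that the deployment stops.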
The AWS Resilience Hub dashboard provides a holistic view of the application portfolio resilience status, enabling tracking of the resilience of applications. It also aggregates and organizes resilience events, alerts, and insights from services such as AWS Fault Injection Simulator (FIS) and Amazon CloudWatch. A resilience score generated by the Resilience Hub provides insights into the level of implementation for recommended resilience tests, recovery SOPs, and alarms. This can help measure improvements to resilience over time.
Resilience Hub Best Practices
Once an application is deployed into production on AWS, Resilience Hub helps track its resiliency posture, sends a notification in case of an outage, and helps launch the associated recovery process. For effective implementation, the best practices include:
Step 1-Define: The first step is to identify and describe the existing AWS application that needs to be protected from disruptions and then define the resiliency goals. To form the structural basis of the application in Resilience Hub, resources need to be imported from:
– AWS CloudFormation stacks
– Terraform state files
– Resource groups
An existing application can also be used as a template for building the application structure. The resiliency policy is then attached to it. The policy should include the information and objectives required to assess the application's ability to recover from a disruption type, such as software or hardware. It should define the RTO and RPO for each disruption type, which are then used to evaluate whether the application meets the policy.
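A resiliency policy of this kind can be pictured as a mapping from disruption type to RTO and RPO targets. The sketch below builds such a request in plain Python. The `build_policy_request` helper, the policy name, and the target values are hypothetical; the key names follow what boto3's `resiliencehub` `create_resiliency_policy` call is understood to accept, which should be confirmed in the API reference before use.

```python
def build_policy_request(name, tier, targets_in_secs):
    """Build keyword arguments for a resiliency policy request.

    targets_in_secs maps a disruption type to (rto_seconds, rpo_seconds).
    """
    policy = {}
    for disruption, (rto, rpo) in targets_in_secs.items():
        if rto < 0 or rpo < 0:
            raise ValueError(f"invalid targets for {disruption}")
        policy[disruption] = {"rtoInSecs": rto, "rpoInSecs": rpo}
    return {"policyName": name, "tier": tier, "policy": policy}


request = build_policy_request(
    name="orders-app-policy",     # hypothetical policy name
    tier="Critical",              # assumed tier value
    targets_in_secs={
        "Software": (3600, 900),  # 1 h RTO, 15 min RPO
        "Hardware": (7200, 3600),
        "AZ": (7200, 3600),
    },
)
print(request["policy"]["Software"])  # {'rtoInSecs': 3600, 'rpoInSecs': 900}
# Assumption: with credentials configured, this could be sent with
# boto3.client("resiliencehub").create_resiliency_policy(**request)
```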
Step 2-Assessing: After describing the application and attaching the resiliency policy to it, run a resiliency assessment. The assessment evaluates the application configuration and generates a report that shows how well the application meets the goals of the resiliency policy.
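Because an assessment takes time to complete, automation around it usually has to wait for a final status. The following sketch shows one way to poll for completion. The status strings are assumed to match those reported by the Resilience Hub API, and `wait_for_assessment` and the fake status source are illustrative stand-ins so the example runs without AWS access.

```python
import time


def wait_for_assessment(fetch_status, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the assessment leaves the pending/in-progress states.

    fetch_status is any zero-argument callable returning a status string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status not in ("Pending", "InProgress"):
            return status
        sleep(poll_seconds)
    raise TimeoutError("assessment did not finish in time")


# Stand-in for a real status lookup, so the sketch runs without AWS access.
fake_statuses = iter(["Pending", "InProgress", "Success"])
final = wait_for_assessment(lambda: next(fake_statuses), sleep=lambda s: None)
print(final)  # Success
```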
Step 3-Recommendations: Resilience Hub generates recommendations based on the assessment report that can be used to update the application and the resiliency policy. These can cover configurations for components, tests, alarms, and recovery SOPs. The improvement can be measured by running another assessment and comparing the results with the earlier report. Repeating this process moves the application toward its RTO and RPO goals.
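Comparing two assessment runs can be as simple as computing the change in estimated RTO and RPO per disruption type. The sketch below uses a simplified, hypothetical report shape (seconds per disruption type) rather than the actual Resilience Hub report format.

```python
def compare_assessments(before, after):
    """Return per-disruption changes in estimated RTO/RPO between two reports.

    Each report maps a disruption type to {"rto": seconds, "rpo": seconds}.
    Negative deltas mean the estimate improved (got shorter).
    """
    deltas = {}
    for disruption in sorted(before.keys() & after.keys()):
        deltas[disruption] = {
            metric: after[disruption][metric] - before[disruption][metric]
            for metric in ("rto", "rpo")
        }
    return deltas


first_run = {"Software": {"rto": 5400, "rpo": 1800},
             "Hardware": {"rto": 7200, "rpo": 3600}}
second_run = {"Software": {"rto": 3000, "rpo": 600},
              "Hardware": {"rto": 7200, "rpo": 3600}}
print(compare_assessments(first_run, second_run))
# {'Hardware': {'rto': 0, 'rpo': 0}, 'Software': {'rto': -2400, 'rpo': -1200}}
```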
Step 4-Validation: Run simulation tests to measure the resiliency of the AWS resources and the time needed to recover from application, infrastructure, Availability Zone, and AWS Region outages. These tests include failovers, network unavailable errors, stopped processes, Availability Zone problems, and Amazon RDS boot recovery. The results help assess the application's ability to recover from the different outage types.
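The core of such a test is measuring how long recovery takes after a fault is injected and comparing it with the RTO. The following sketch shows that measurement loop with a simulated service. `measure_recovery` and `FakeService` are hypothetical helpers, and a real test would inject faults through a tool such as AWS FIS rather than through a Python callback.

```python
import time


def measure_recovery(inject_fault, is_healthy, timeout_s=300.0, interval_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Inject a fault, then return seconds until is_healthy() is True again."""
    inject_fault()
    start = clock()
    while clock() - start <= timeout_s:
        if is_healthy():
            return clock() - start
        sleep(interval_s)
    raise TimeoutError("service did not recover within the timeout")


class FakeService:
    """Simulated service with its own clock, so the sketch runs instantly."""

    def __init__(self, recover_after_s):
        self.now = 0.0
        self.down_until = None
        self.recover_after_s = recover_after_s

    def inject_fault(self):
        self.down_until = self.now + self.recover_after_s

    def is_healthy(self):
        return self.down_until is None or self.now >= self.down_until

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds  # advance simulated time instead of waiting


svc = FakeService(recover_after_s=42)
elapsed = measure_recovery(svc.inject_fault, svc.is_healthy,
                           clock=svc.clock, sleep=svc.sleep)
print(elapsed, elapsed <= 60)  # 42.0 True (within a hypothetical 60 s RTO)
```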
Step 5-Tracking: Resilience Hub can continue to track the posture of the AWS application after it is deployed into production. If an outage occurs, it can be viewed in Resilience Hub and the associated recovery process can be launched.
Step 6-Recovery After Disruption: During an application disruption, Resilience Hub can help detect the type of disruption and alert the operator, who can then launch the SOP associated with that disruption type to recover.
Indium Software, an AWS partner, can help you ensure undisrupted application performance by implementing an effective AWS Resilience Hub for your applications based on your business objectives.