I am setting up an ECS service on EC2. My goal is a blue/green-style deployment: when I deploy a new version of my application, ECS temporarily doubles the number of tasks so the new version runs alongside the old one, then stops the old tasks once the new ones are healthy.
This method is widely used for deploying applications to ensure minimal downtime for users.
Following recommendations from various AWS blogs, I am using a capacity provider in my CDK project. The relevant constructs are:
// ASG that launches the EC2 instances the tasks run on
const autoscalingGroup = new AutoScalingGroup(this, `${this.id}-autoscaling-group`, {
    maxCapacity: 2,
    minCapacity: 1,
    minHealthyPercentage: 100,
    maxHealthyPercentage: 200,
    desiredCapacity: 1,
    ...
})
// Capacity provider that lets ECS manage scaling of the ASG
const capacityProvider = new AsgCapacityProvider(this, `${this.id}-capacity-provider`, {
    autoScalingGroup: autoscalingGroup,
    enableManagedScaling: true,
    enableManagedTerminationProtection: true,
})
const cluster = new Cluster(this, `${this.id}-ecs-cluster`, {
    ...
})
cluster.addAsgCapacityProvider(capacityProvider)
// Service that runs the application's tasks on the cluster
new Ec2Service(this, `${this.id}-ec2-service`, {
    cluster: cluster,
    ...
})
The CDK code above sets up two infrastructure components and connects them.
The first is an ECS cluster and service, which start new tasks for my application. The second is an ASG, which launches the servers those tasks run on. (Fargate could remove the need for this second layer, but that is not the focus here.)
A capacity provider is also created and attached to the cluster so that the cluster knows where to get EC2 instances for its ECS services.
During the initial deployment, everything works: the ASG launches a new EC2 instance and ECS places a task on it.
Problems start with subsequent deployments. At that point I would expect ECS to tell the ASG to scale out by one instance to host the new task (each instance in my setup fits exactly one task). Instead, ECS does not trigger any scaling and stays stuck, unable to find an instance on which to place the new task.
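To make the capacity I expect during a deployment concrete, here is a rough sketch of the arithmetic. It assumes the service's deployment settings use the same 100% / 200% bounds as above and that the service has a desired count of 1 (the service configuration is elided in my snippet, so both are assumptions). The helper `deploymentCapacity` and its field names are hypothetical and not part of CDK or the AWS SDK.

```typescript
// Hypothetical helper for illustration only; not an AWS or CDK API.
interface DeploymentInputs {
  desiredTasks: number;          // desired task count of the service (assumed 1)
  minimumHealthyPercent: number; // lower bound during a deployment, e.g. 100
  maximumPercent: number;        // upper bound during a deployment, e.g. 200
  tasksPerInstance: number;      // 1 in my setup
  asgMaxCapacity: number;        // maxCapacity of the ASG
}

interface DeploymentCapacity {
  minRunningTasks: number;
  maxRunningTasks: number;
  instancesNeededAtPeak: number;
  fitsInAsg: boolean;
}

function deploymentCapacity(input: DeploymentInputs): DeploymentCapacity {
  // ECS rounds the lower bound up and the upper bound down.
  const minRunningTasks = Math.ceil(
    (input.desiredTasks * input.minimumHealthyPercent) / 100
  );
  const maxRunningTasks = Math.floor(
    (input.desiredTasks * input.maximumPercent) / 100
  );
  // Old and new tasks overlap at the peak, so size instances for the upper bound.
  const instancesNeededAtPeak = Math.ceil(
    maxRunningTasks / input.tasksPerInstance
  );
  return {
    minRunningTasks,
    maxRunningTasks,
    instancesNeededAtPeak,
    fitsInAsg: instancesNeededAtPeak <= input.asgMaxCapacity,
  };
}

const capacity = deploymentCapacity({
  desiredTasks: 1,
  minimumHealthyPercent: 100,
  maximumPercent: 200,
  tasksPerInstance: 1,
  asgMaxCapacity: 2,
});
console.log(capacity);
```

With these numbers the deployment needs two instances at its peak, which is within the ASG's `maxCapacity` of 2, so the ASG limits alone should not be what blocks the scale-out.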