Triggering ECS RunTask from AWS CloudFormation


Inevitably there will come a time when you’re deploying an application on Amazon ECS and you’ll need to fire a single-run command on deployment. This could be a database migration, or anything else that sets up your environment post-deployment that can’t, or shouldn’t, be run within your container’s startup sequence (by using entrypoints/bootstrap scripts). You’ll be using CloudFormation (as you should) and you’ll stumble across a bit of an issue...

You can’t trigger RunTask natively with a CloudFormation script.

Well that’s a bummer, eh? Well, hold your horses. This is AWS we’re talking about. There’s a way to achieve pretty much anything on there! I tweeted about it when I hit this brick wall, and the wonderful Clare Liguori pointed me in the direction of custom resources backed by Lambda.

“Ooooo”, I thought. I’d actually played around with custom resources a while back, and had kinda forgotten they even existed. So I got to work implementing this in such a way that the Lambda function can be reused by other CloudFormation stacks (or by the same stack in future), should I deem it necessary. This is what I came up with:

Job 1: write the Lambda function

I’m a Python buff, so I naturally steered towards writing this in Python. If that’s not your language of choice and you want to implement it in another language, have at it. The core concept is here:

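import json
import logging
import urllib.request

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    lambda_response = {
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
    }

    status = True
    reason = ''
    # CloudFormation requires a PhysicalResourceId in every response: reuse the
    # existing one on Update/Delete, fall back to the log stream name on Create
    resource_id = event.get('PhysicalResourceId', context.log_stream_name)

    try:
        status, new_resource_id = event_handler(event, context)
        if new_resource_id:
            resource_id = new_resource_id
    except Exception:
        # Log the full traceback so "Check CloudWatch Logs" actually helps
        logger.exception('ECS RunTask custom resource failed')
        status = False
        reason = 'Check CloudWatch Logs for errors'
    finally:
        # CloudFormation expects a Reason whenever the status is FAILED
        if not status and not reason:
            reason = 'Task did not exit cleanly; check the task logs'
        lambda_response['Status'] = 'SUCCESS' if status else 'FAILED'
        lambda_response['Reason'] = reason
        lambda_response['PhysicalResourceId'] = resource_id

        # Test events carry a dummy ResponseURL, so skip the callback
        if 'Test' not in event:
            body = json.dumps(lambda_response).encode('utf-8')
            # botocore.vendored.requests has been removed, so use the stdlib;
            # the pre-signed S3 URL expects an empty Content-Type
            request = urllib.request.Request(
                event['ResponseURL'],
                data=body,
                method='PUT',
                headers={'Content-Type': '', 'Content-Length': str(len(body))},
            )
            with urllib.request.urlopen(request) as reply:
                logger.info('CloudFormation callback returned HTTP %s', reply.status)


def event_handler(event, context):
    # Nothing to undo on Delete: a one-off task can't be "un-run"
    if event['RequestType'] not in ('Create', 'Update'):
        return True, None

    properties = event['ResourceProperties']
    awsvpc_configuration = properties['NetworkConfiguration']['AwsvpcConfiguration']

    # Translate CloudFormation-style {Name, Value} pairs into ECS-style {name, value}
    environment = [
        {'name': env['Name'], 'value': env['Value']}
        for env in properties.get('Environment', [])
    ]

    ecs = boto3.client('ecs')
    response = ecs.run_task(
        cluster=properties['ClusterId'],
        taskDefinition=properties['TaskDefinition'],
        launchType=properties['LaunchType'],
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': awsvpc_configuration['Subnets'],
                'securityGroups': awsvpc_configuration['SecurityGroups'],
                'assignPublicIp': awsvpc_configuration['AssignPublicIp'],
            }
        },
        overrides={
            'containerOverrides': [
                {
                    'name': properties['ContainerName'],
                    'command': properties['Command'],
                    'environment': environment,
                },
            ],
        },
        count=1,
    )
    # RunTask reports placement problems in 'failures' rather than raising
    if response.get('failures') or not response.get('tasks'):
        raise RuntimeError('RunTask failed: %s' % response.get('failures'))
    task_arn = response['tasks'][0]['taskArn']

    ### if you don't want to wait for the task to complete, remove the below ###
    waiter = ecs.get_waiter('tasks_stopped')
    waiter.wait(cluster=properties['ClusterId'], tasks=[task_arn])

    describe_result = ecs.describe_tasks(cluster=properties['ClusterId'], tasks=[task_arn])
    task = describe_result['tasks'][0]

    # Check the container we overrode, not just the first one in the task
    container = next(c for c in task['containers'] if c['name'] == properties['ContainerName'])

    # exitCode is missing if the container never started, which counts as a failure
    return container.get('exitCode') == 0, task_arn
    ### end section to remove for wait condition ###

    return True, task_arn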
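The property-to-API mapping is the part most likely to break when you tweak the template. If you’d like to unit test it without touching AWS, one option is to pull it out into a pure function that event_handler calls. This is a sketch, not how the function above is laid out; build_run_task_kwargs is a hypothetical name, and the sample properties mirror the test payload further down.

```python
def build_run_task_kwargs(properties):
    """Translate the custom resource's properties into ecs.run_task() keyword arguments.

    Hypothetical helper: it makes no AWS calls, so it can be unit tested locally.
    """
    awsvpc = properties['NetworkConfiguration']['AwsvpcConfiguration']
    # CloudFormation-style {Name, Value} pairs -> ECS-style {name, value}
    environment = [
        {'name': env['Name'], 'value': env['Value']}
        for env in properties.get('Environment', [])
    ]
    return {
        'cluster': properties['ClusterId'],
        'taskDefinition': properties['TaskDefinition'],
        'launchType': properties['LaunchType'],
        'networkConfiguration': {
            'awsvpcConfiguration': {
                'subnets': awsvpc['Subnets'],
                'securityGroups': awsvpc['SecurityGroups'],
                'assignPublicIp': awsvpc['AssignPublicIp'],
            }
        },
        'overrides': {
            'containerOverrides': [
                {
                    'name': properties['ContainerName'],
                    'command': properties['Command'],
                    'environment': environment,
                }
            ]
        },
        'count': 1,
    }


if __name__ == '__main__':
    sample_properties = {
        'ClusterId': 'MyECSClusterId',
        'TaskDefinition': 'arn:aws:ecs:eu-west-1:111122223333:task-definition/my-task-definition:1',
        'LaunchType': 'FARGATE',
        'ContainerName': 'webapp',
        'Command': ['python', 'manage.py', 'migrate', '--noinput'],
        'Environment': [{'Name': 'DJANGO_SETTINGS_MODULE', 'Value': 'app.settings'}],
        'NetworkConfiguration': {
            'AwsvpcConfiguration': {
                'Subnets': ['subnet-00000000000000000'],
                'SecurityGroups': ['sg-00000000000000000'],
                'AssignPublicIp': 'DISABLED',
            }
        },
    }
    print(build_run_task_kwargs(sample_properties)['overrides'])
```

Inside event_handler that would become ecs.run_task(**build_run_task_kwargs(properties)).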

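This function calls ECS on your behalf, so it also needs some permissions. Create an IAM role for it with the following minimum permissions, and attach the AWS managed AWSLambdaBasicExecutionRole policy as well so it can write to CloudWatch Logs. The iam:PassRole ARNs are placeholders for the task execution role and task role your task definition uses: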

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ecs:DescribeTasks",
                "ecs:RunTask"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": [
                "arn:aws:iam::111122223333:role/ecsTaskExecutionRole",
                "arn:aws:iam::111122223333:role/MyECSClusterIAMRole"
            ]
        }
    ]
}
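The two iam:PassRole ARNs are the roles RunTask hands over to ECS: the task execution role and, if your task definition has one, the task role. If you’d rather not copy them across by hand, a small sketch can read them from an ecs.describe_task_definition() response and build the statement. pass_role_statement is a hypothetical helper, and the sample response is trimmed to the two fields it reads.

```python
import json


def pass_role_statement(describe_response):
    """Build an iam:PassRole statement for the roles a task definition uses.

    `describe_response` is the dict returned by ecs.describe_task_definition();
    only executionRoleArn and taskRoleArn are read, and either may be absent.
    """
    task_def = describe_response['taskDefinition']
    roles = [
        task_def[key]
        for key in ('executionRoleArn', 'taskRoleArn')
        if task_def.get(key)
    ]
    return {
        'Effect': 'Allow',
        'Action': 'iam:PassRole',
        'Resource': roles,
    }


# Trimmed example response, using the same placeholder ARNs as the policy above
sample_response = {
    'taskDefinition': {
        'executionRoleArn': 'arn:aws:iam::111122223333:role/ecsTaskExecutionRole',
        'taskRoleArn': 'arn:aws:iam::111122223333:role/MyECSClusterIAMRole',
    }
}
print(json.dumps(pass_role_statement(sample_response), indent=2))
```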

You can test it by creating a custom test event in the Lambda console with the following payload (swap in your own ARNs, cluster, subnets and security group):

{
  "Test": true,
  "RequestType": "Update",
  "ServiceToken": "arn:aws:lambda:eu-west-1:111122223333:function:cfn-ecs-runtask",
  "ResponseURL": "https://doesnt.matter.com",
  "StackId": "arn:aws:cloudformation:eu-west-1:111122223333:stack/MyCloudFormationStack/00000000-0000-0000-0000-000000000000",
  "RequestId": "00000000-0000-0000-0000-000000000000",
  "LogicalResourceId": "AppMigrateDefault",
  "PhysicalResourceId": "N/A",
  "ResourceType": "Custom::ECSRunTask",
  "ResourceProperties": {
    "ServiceToken": "arn:aws:lambda:eu-west-1:111122223333:function:cfn-ecs-runtask",
    "TaskDefinition": "arn:aws:ecs:eu-west-1:111122223333:task-definition/my-task-definition:1",
    "ClusterId": "MyECSClusterId",
    "ContainerName": "webapp",
    "Command": [
      "python",
      "manage.py",
      "migrate",
      "--noinput",
      "--database=default"
    ],
    "Environment": [],
    "Iteration": "1",
    "NetworkConfiguration": {
      "AwsvpcConfiguration": {
        "SecurityGroups": [
          "sg-00000000000000000"
        ],
        "Subnets": [
          "subnet-00000000000000000",
          "subnet-00000000000000001",
          "subnet-00000000000000002"
        ],
        "AssignPublicIp": "DISABLED"
      }
    },
    "LaunchType": "FARGATE"
  }
}

Note: we include a “Test” flag at the top of the payload, and the Lambda function checks whether this flag is present. If it is, the function goes through the motions, actually running the requested task, but doesn’t call back to CloudFormation with a result. I’m not sure whether needlessly calling back to the endpoint has any side effects, but it’s always best not to bother systems that aren’t yours while you’re testing! 🙂
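The Test check is just a guard in front of the callback. Pulled out into its own function with the HTTP call injected, it can be checked locally without sending anything. This is a sketch: send_response and its opener parameter are hypothetical names, with urllib.request.urlopen assumed as the real opener. Like the function above, the guard checks for the key itself, so even "Test": false skips the callback.

```python
import json
import urllib.request


def send_response(event, response_body, opener=urllib.request.urlopen):
    """PUT the custom resource result to CloudFormation unless this is a test event.

    Hypothetical helper. Returns True if a request was handed to `opener`,
    False if the callback was skipped.
    """
    if 'Test' in event:
        # Test payloads carry a dummy ResponseURL; never call it
        return False
    body = json.dumps(response_body).encode('utf-8')
    request = urllib.request.Request(
        event['ResponseURL'],
        data=body,
        method='PUT',
        # The pre-signed S3 URL expects an empty Content-Type
        headers={'Content-Type': '', 'Content-Length': str(len(body))},
    )
    opener(request)
    return True


# A fake opener that records requests instead of sending them
recorded = []
print(send_response({'Test': True, 'ResponseURL': 'https://doesnt.matter.com'},
                    {'Status': 'SUCCESS'}, opener=recorded.append))  # False
print(send_response({'ResponseURL': 'https://example.com/presigned'},
                    {'Status': 'SUCCESS'}, opener=recorded.append))  # True
print(recorded[0].get_method())  # PUT
```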

Job 2: add a custom resource to the CloudFormation template

Now we need to integrate this into our template, and that’s easily done.

Note: I use iidy to manage my templates so I can use includes and build DRY templates, so some of the syntax might seem a little…off (the !$ common.Environment line, for example, is an iidy lookup). If you’re using raw CloudFormation, you’ll need to update it accordingly, e.g. by listing the environment variables inline as Name/Value pairs. I highly recommend checking out iidy, I swear it’s made my life so much easier!

Resources:
  AppMigrate:
    Type: Custom::ECSRunTask
    Properties:
      ServiceToken: arn:aws:lambda:eu-west-1:111122223333:function:cfn-ecs-runtask
      Iteration: 2
      ClusterId: !Ref ECSCluster
      TaskDefinition: !Ref AppTask
      ContainerName: webapp
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          SecurityGroups:
            - !GetAtt [AppInternalSecurityGroup, GroupId]
          Subnets:
            - !Ref PrivateSubnet1a
            - !Ref PrivateSubnet1b
            - !Ref PrivateSubnet1c
      Command:
        - python
        - manage.py
        - migrate
        - --noinput
        - --database=default
      Environment: !$ common.Environment

The key points here are:

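  • The properties have to line up with your container. ContainerName is the name of an existing container in your task definition (RunTask only overrides its command and environment, so you don’t need to define any new container definitions), and you need to feed in any environment variables and anything else your application needs to run. Adjust as necessary.
  • The resource gets marked as changed whenever any of its properties change, most notably whenever your task definition changes (which would trigger a deployment). Bumping Iteration forces a re-run even when nothing else has changed.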

So when your task definition gets updated (presumably because the container’s image URL changed), the new revision’s ARN is passed in through the TaskDefinition property, and CloudFormation re-triggers your Lambda function with the new details. The Lambda function then calls RunTask with the command you pass in.
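On an Update, CloudFormation sends the previous values in OldResourceProperties alongside the new ResourceProperties, so it’s easy to log why the task is being re-run. A minimal sketch, with changed_properties as a hypothetical helper. Bear in mind that CloudFormation hands custom resource property values to the function as strings, so Iteration: 2 arrives as "2".

```python
def changed_properties(event):
    """Return the sorted names of properties whose values differ between old and new.

    Hypothetical helper. Only meaningful for Update events: Create events have no
    OldResourceProperties, so every property would be reported as changed.
    """
    old = event.get('OldResourceProperties', {})
    new = event.get('ResourceProperties', {})
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))


update_event = {
    'RequestType': 'Update',
    'OldResourceProperties': {
        'TaskDefinition': 'arn:aws:ecs:eu-west-1:111122223333:task-definition/my-task-definition:1',
        'Iteration': '1',
        'ContainerName': 'webapp',
    },
    'ResourceProperties': {
        'TaskDefinition': 'arn:aws:ecs:eu-west-1:111122223333:task-definition/my-task-definition:2',
        'Iteration': '1',
        'ContainerName': 'webapp',
    },
}
print(changed_properties(update_event))  # ['TaskDefinition']
```

In event_handler you could pass the result to logger.info() before calling RunTask.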

More importantly, the Lambda function waits for the task to complete and uses its exit code to decide whether to send a SUCCESS or FAILED response back to CloudFormation. A FAILED response fails the stack update, which rolls back your deployment if your RunTask fails. If this is not your desired behaviour, remove the section commented in the function above that does the waiting/exit code check. If you keep it, set the function’s timeout high enough to cover the task: a Lambda function can run for at most 15 minutes, and boto3’s tasks_stopped waiter gives up after about 10 minutes by default (100 polls, 6 seconds apart).
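If you keep the wait, the exit-code check can also give CloudFormation a more useful Reason than a bare FAILED, which then shows up in the stack’s event log. A sketch, assuming the task shape returned by ecs.describe_tasks() (each container’s name, exitCode and optional reason, plus the task’s stoppedReason); task_outcome is a hypothetical name.

```python
def task_outcome(task, container_name):
    """Map a stopped ECS task (as returned by describe_tasks) to a CloudFormation (Status, Reason).

    Hypothetical helper: looks up the overridden container by name rather than
    assuming it is the first container in the task.
    """
    container = next(
        (c for c in task.get('containers', []) if c.get('name') == container_name),
        None,
    )
    if container is None:
        return 'FAILED', 'Container %s not found in task' % container_name
    exit_code = container.get('exitCode')
    if exit_code == 0:
        return 'SUCCESS', ''
    if exit_code is None:
        # No exit code usually means the container never started
        detail = container.get('reason') or task.get('stoppedReason', 'unknown')
        return 'FAILED', 'Container did not exit cleanly: %s' % detail
    return 'FAILED', 'Container exited with code %d' % exit_code


stopped = {
    'stoppedReason': 'Essential container in task exited',
    'containers': [
        {'name': 'nginx', 'exitCode': 0},
        {'name': 'webapp', 'exitCode': 1},
    ],
}
print(task_outcome(stopped, 'webapp'))  # ('FAILED', 'Container exited with code 1')
```

In lambda_handler the returned pair would feed lambda_response['Status'] and lambda_response['Reason'] directly.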
