Lambda Canary Deployments with CodeDeploy using the CDK

In this blog post I talk about how to perform canary deployments of Lambda functions through AWS CodeDeploy and the AWS CDK.

Feb 19, 2023 (16 min read)

Introduction

Most of us, if not all of us, have been there :) We need to deploy a feature change to a critical service with an all at once deployment strategy. We’ve tried to make the ticket size and change as small as possible but there’s no way around it: this change has to go out, and there’s a risk that it might impact the service or its users in a way that we haven’t envisaged. That might cause the service or its functionality to become unavailable. Even the world’s most diligent and mature teams are not immune, including those who have implemented a CI/CD pipeline with an automated post deployment quality gate that tests the service functionality before classifying the deployment as successful and automatically executes the rollback stage on failure. A failure is still going to cause downtime: the service or its functionality will not be available for a period of time.

The problem with all at once deployments, where the risk lies, is that the change is deployed to all users, all at once. Once the changed service has been deployed, all new traffic is sent to the newly deployed service. A way to mitigate this risk is to perform a canary deployment. Canary deployments are a form of Blue/Green deployment. With canary deployments we roll out the changed service to a percentage of users for a period of time. If the service doesn’t encounter any errors during that period of time, all traffic is sent to the newly deployed service. If an error occurs during the canary deployment time period, the deployment is rolled back and all traffic is sent to the existing service.

The AWS Lambda Service performs an all at once deployment by default. This means that when a change to a Lambda function has been deployed successfully, 100% of new traffic is sent to the new deployment. Any request currently being processed by the Lambda function will be completed by the previous deployment of the Lambda function but any new traffic will be sent to the newly deployed function. Before we look at how to implement canary deployments with Lambda in the CDK, when to use canary deployments and when we can’t or shouldn’t, it’s best we first understand how versioning and aliases work with AWS Lambda.

Lambda Versioning

A new lambda version is created when a lambda is published, not when it is deployed. The new version is a cloned snapshot of the code and configuration currently deployed as the $LATEST version. A function version contains the following.

The function code and all associated dependencies.
The Lambda runtime identifier and runtime version used by the function.
All the function settings, including the environment variables.
A unique Amazon Resource Name (ARN) to identify the specific version of the function.

The above has been taken from the AWS Lambda documentation. By default, you don’t have to publish a version when deploying a lambda function. If you don’t publish a version, the deployed code and configuration become the $LATEST version, overriding the code and configuration previously stored as $LATEST.

There are two main rules with Lambda versioning:

Lambda version numbers cannot be custom version numbers. They are integer values that increment automatically on publish. Even if a lambda is deleted, re-deployed and published, the last version number of that lambda is incremented.
Lambda doesn’t allow you to publish a new version if the code and configuration currently deployed as $LATEST is the same as the code and configuration of the previously published version.
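The two rules can be illustrated with a toy in-memory model. This is a minimal sketch and not the Lambda API: the `FunctionSnapshot` and `VersionedFunctionModel` names are hypothetical, and throwing an error when nothing has changed is a simplification chosen for the illustration.

```typescript
// Toy in-memory model of the two versioning rules. This is NOT the Lambda
// API; every name here is hypothetical.
interface FunctionSnapshot {
  code: string; // stand-in for the deployment package
  environment: Record<string, string>; // stand-in for the configuration
}

// Serialise a snapshot with sorted keys so equal snapshots compare equal.
function snapshotFingerprint(snapshot: FunctionSnapshot): string {
  const env = Object.keys(snapshot.environment)
    .sort()
    .map((key) => `${key}=${snapshot.environment[key]}`);
  return JSON.stringify([snapshot.code, env]);
}

class VersionedFunctionModel {
  private latest: FunctionSnapshot;
  private lastPublishedFingerprint: string | undefined = undefined;
  private lastVersionNumber = 0;

  constructor(initial: FunctionSnapshot) {
    this.latest = initial;
  }

  // Deploying without publishing only overwrites $LATEST.
  deploy(next: FunctionSnapshot): void {
    this.latest = next;
  }

  // Publishing snapshots $LATEST under the next integer version number and
  // refuses to publish when nothing changed since the last publish.
  publish(): number {
    const fingerprint = snapshotFingerprint(this.latest);
    if (fingerprint === this.lastPublishedFingerprint) {
      throw new Error("$LATEST matches the previously published version");
    }
    this.lastPublishedFingerprint = fingerprint;
    this.lastVersionNumber += 1;
    return this.lastVersionNumber;
  }
}

const helloWorld = new VersionedFunctionModel({
  code: "build-1",
  environment: { STAGE: "dev" },
});
console.log(helloWorld.publish()); // 1
helloWorld.deploy({ code: "build-2", environment: { STAGE: "dev" } });
console.log(helloWorld.publish()); // 2
```

Calling `publish()` a second time without an intervening `deploy()` that changes something would throw in this model.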

Users and other services can access specific versions of a lambda function using a qualified ARN. All functions have an unqualified ARN as demonstrated below.

arn:aws:lambda:aws-region:acct-id:function:helloworld

Using the above unqualified function ARN will invoke the $LATEST version of the function. Specific versions of a lambda function can be invoked by services and users using a qualified ARN specifying the version number to invoke as the suffix, as shown below.

arn:aws:lambda:aws-region:acct-id:function:helloworld:9

Lambda Aliases

Lambda function aliases act as a pointer to a Lambda function version and can be used by users and other services to access specific versions of a function through the alias name. A service such as AWS API Gateway can be configured to point to an alias, and the alias can then be changed to point to a different version of the lambda function. Nothing changes in the API Gateway integration configuration; the version behind the alias is swapped without any modification to API Gateway.

As with versions, specific aliases can be accessed using the function’s qualified ARN, specifying the alias to use as the ARN suffix. For instance, a function might have an unqualified ARN like below.

arn:aws:lambda:aws-region:acct-id:function:helloworld

If we were to create an alias named “test” on the hello world function, we could access the “test” alias of the hello world function using the following qualified ARN, specifying the “test” alias as the suffix.

arn:aws:lambda:aws-region:acct-id:function:helloworld:test
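To make the difference between the three ARN shapes concrete, here is a small sketch that classifies the suffix of a function ARN. The `classifyFunctionArn` helper and the `FunctionQualifier` type are hypothetical names introduced for illustration. The sketch relies on version qualifiers being purely numeric and treats any other suffix as an alias name.

```typescript
// Hypothetical helper: classifies the optional suffix of a Lambda function ARN.
type FunctionQualifier =
  | { kind: "latest" }
  | { kind: "version"; version: number }
  | { kind: "alias"; alias: string };

function classifyFunctionArn(arn: string): FunctionQualifier {
  // Expected shape: arn:aws:lambda:region:account:function:name[:qualifier]
  const parts = arn.split(":");
  if (
    parts.length < 7 ||
    parts.length > 8 ||
    parts[2] !== "lambda" ||
    parts[5] !== "function"
  ) {
    throw new Error(`Not a Lambda function ARN: ${arn}`);
  }
  const qualifier = parts.length === 8 ? parts[7] : undefined;
  if (qualifier === undefined || qualifier === "$LATEST") {
    return { kind: "latest" }; // unqualified ARNs resolve to $LATEST
  }
  if (/^\d+$/.test(qualifier)) {
    return { kind: "version", version: Number(qualifier) };
  }
  return { kind: "alias", alias: qualifier };
}

const base = "arn:aws:lambda:aws-region:acct-id:function:helloworld";
console.log(classifyFunctionArn(base)); // { kind: "latest" }
console.log(classifyFunctionArn(`${base}:9`)); // { kind: "version", version: 9 }
console.log(classifyFunctionArn(`${base}:test`)); // { kind: "alias", alias: "test" }
```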

In 2017, AWS added functionality to Lambda Aliases allowing traffic shifting between lambda function versions based on pre-assigned weights. An alias can point at two versions at once, with a weight that decides what percentage of invocations is sent to the second version. Using the AWS CLI, AWS Console, AWS CloudFormation, AWS SDK, SAM or the CDK, developers can configure an alias to send a percentage of all traffic to a newly published version of a lambda function. The weights themselves are static, so shifting traffic gradually over a period of time means updating them on a schedule until 100% of traffic is sent to the newly published function version.
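A weighted alias can be pictured as a biased coin flip per invocation. The sketch below is a simplified model, not the Lambda routing implementation: `WeightedAliasModel` and `routeInvocation` are hypothetical names, and the `roll` parameter exists only so the behaviour can be checked deterministically.

```typescript
// Simplified model of an alias that splits traffic between two versions.
interface WeightedAliasModel {
  aliasName: string;
  primaryVersion: string;
  secondaryVersion?: string;
  secondaryWeight?: number; // fraction of invocations, between 0 and 1
}

// Pick the version that serves one invocation. `roll` is a number in [0, 1).
function routeInvocation(
  alias: WeightedAliasModel,
  roll: number = Math.random()
): string {
  const weight = alias.secondaryWeight ?? 0;
  if (alias.secondaryVersion !== undefined && roll < weight) {
    return alias.secondaryVersion;
  }
  return alias.primaryVersion;
}

// 10% of invocations go to version 10, the rest stay on version 9.
const liveAlias: WeightedAliasModel = {
  aliasName: "live",
  primaryVersion: "9",
  secondaryVersion: "10",
  secondaryWeight: 0.1,
};

let sentToNewVersion = 0;
for (let i = 0; i < 10000; i++) {
  if (routeInvocation(liveAlias) === "10") {
    sentToNewVersion++;
  }
}
console.log(`${sentToNewVersion} of 10000 invocations hit version 10`);
```

Over many invocations the count should land close to 1,000, although the exact figure varies from run to run.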

Before we look at how we can bypass the “all at once” default deployment functionality used by AWS Lambda, we need to look at the other service required to allow us to perform advanced deployment techniques alongside Lambda Aliases, AWS CodeDeploy.

AWS CodeDeploy

AWS CodeDeploy is a managed service that allows teams to automatically deploy changes to on-premises and AWS compute services utilising advanced deployment techniques.

CodeDeploy is a deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, serverless Lambda functions, or Amazon ECS services.
CodeDeploy makes it easier for you to:
- Rapidly release new features.
- Update AWS Lambda function versions.
- Avoid downtime during application deployment.
- Handle the complexity of updating your applications, without many of the risks associated with error-prone manual deployments.

Advanced linear and canary deployment techniques are provided by AWS CodeDeploy through the following predefined Deployment Configurations.

AWS CodeDeploy predefined deployment configurations sourced from AWS documentation
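The predefined Lambda configuration names encode their own parameters, so they can be read mechanically. The sketch below lists the names as I understand them from the CodeDeploy documentation and parses them; `describePredefinedConfig` and `PredefinedConfigSketch` are hypothetical names, and the list should be checked against the current AWS documentation before relying on it.

```typescript
// Predefined Lambda deployment configuration names (verify against AWS docs).
const PREDEFINED_LAMBDA_CONFIGS: string[] = [
  "CodeDeployDefault.LambdaCanary10Percent5Minutes",
  "CodeDeployDefault.LambdaCanary10Percent10Minutes",
  "CodeDeployDefault.LambdaCanary10Percent15Minutes",
  "CodeDeployDefault.LambdaCanary10Percent30Minutes",
  "CodeDeployDefault.LambdaLinear10PercentEvery1Minute",
  "CodeDeployDefault.LambdaLinear10PercentEvery2Minutes",
  "CodeDeployDefault.LambdaLinear10PercentEvery3Minutes",
  "CodeDeployDefault.LambdaLinear10PercentEvery10Minutes",
  "CodeDeployDefault.LambdaAllAtOnce",
];

type PredefinedConfigSketch =
  | { strategy: "all-at-once" }
  | { strategy: "canary"; percentage: number; minutes: number }
  | { strategy: "linear"; percentage: number; minutes: number };

// Hypothetical helper: derive the strategy and its parameters from the name.
function describePredefinedConfig(configName: string): PredefinedConfigSketch {
  if (configName === "CodeDeployDefault.LambdaAllAtOnce") {
    return { strategy: "all-at-once" };
  }
  const canary = /^CodeDeployDefault\.LambdaCanary(\d+)Percent(\d+)Minutes$/.exec(configName);
  if (canary !== null) {
    // Canary: one initial slice of traffic, then everything after the wait.
    return { strategy: "canary", percentage: Number(canary[1]), minutes: Number(canary[2]) };
  }
  const linear = /^CodeDeployDefault\.LambdaLinear(\d+)PercentEvery(\d+)Minutes?$/.exec(configName);
  if (linear !== null) {
    // Linear: the same slice of traffic is added at every interval.
    return { strategy: "linear", percentage: Number(linear[1]), minutes: Number(linear[2]) };
  }
  throw new Error(`Unrecognised Lambda deployment configuration: ${configName}`);
}

for (const configName of PREDEFINED_LAMBDA_CONFIGS) {
  console.log(configName, describePredefinedConfig(configName));
}
```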

It is also possible to create custom deployment configurations via the AWS Console, AWS CLI or CloudFormation if the provided predefined configurations don’t meet your needs. Below is an example taken from the AWS documentation of how to create a custom lambda deployment configuration named “Canary25Percent45Minutes” using the AWS CLI where 25% of the traffic is sent to the new version of the Lambda when first deployed. The remaining 75% is shifted to the Lambda function 45 minutes later.

aws deploy create-deployment-config --deployment-config-name Canary25Percent45Minutes --traffic-routing-config "type="TimeBasedCanary",timeBasedCanary={canaryPercentage=25,canaryInterval=45}" --compute-platform Lambda

An example of creating the same custom deployment configuration using the CDK is provided below.

const config = new codedeploy.LambdaDeploymentConfig(this, 'CustomConfig', {
  trafficRoutingConfig: new codedeploy.TimeBasedCanaryTrafficRoutingConfig({
    interval: cdk.Duration.minutes(45), 
    percentage: 25, 
  }), 
  deploymentConfigName: 'Canary25Percent45Minutes', 
});
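To see what a configuration like this means for traffic over time, the following sketch computes the share of traffic on the new version at a given minute. It is a simplified model under my own assumptions, not CodeDeploy code: `TrafficRoutingSketch` and `newVersionTrafficPercent` are hypothetical names, and the model assumes the first slice is shifted at minute zero and that no alarm fires.

```typescript
// Simplified model of the three Lambda traffic routing strategies.
type TrafficRoutingSketch =
  | { type: "AllAtOnce" }
  | { type: "TimeBasedCanary"; percentage: number; intervalMinutes: number }
  | { type: "TimeBasedLinear"; percentage: number; intervalMinutes: number };

// Percentage of traffic on the new version after `minutesElapsed` minutes,
// assuming the first shift happens at minute 0 and nothing is rolled back.
function newVersionTrafficPercent(
  config: TrafficRoutingSketch,
  minutesElapsed: number
): number {
  if (config.type === "TimeBasedCanary") {
    // One small slice first, then everything once the interval has passed.
    return minutesElapsed < config.intervalMinutes ? config.percentage : 100;
  }
  if (config.type === "TimeBasedLinear") {
    // The same slice is added at the start of every interval.
    const completedSteps = Math.floor(minutesElapsed / config.intervalMinutes) + 1;
    return Math.min(100, completedSteps * config.percentage);
  }
  return 100; // AllAtOnce
}

const canary25Percent45Minutes: TrafficRoutingSketch = {
  type: "TimeBasedCanary",
  percentage: 25,
  intervalMinutes: 45,
};

console.log(newVersionTrafficPercent(canary25Percent45Minutes, 0)); // 25
console.log(newVersionTrafficPercent(canary25Percent45Minutes, 44)); // 25
console.log(newVersionTrafficPercent(canary25Percent45Minutes, 45)); // 100
```

A linear configuration differs only in that the new version gains the same percentage at every interval instead of jumping straight from the canary slice to 100%.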

With either a predefined or custom deployment configuration selected, we can use AWS CodeDeploy to perform the deployment safely with minimal risk. A CodeDeploy deployment executes the deployment configuration and completes it only if no errors are thrown by the new lambda function version during the initial traffic shifting period. This is achieved by specifying one or more CloudWatch Alarms that, when triggered, cause the deployment to halt and a rollback to start, shifting 100% of traffic back to the original function version. Typically, a single alarm that is triggered when the lambda function throws an error is associated with the deployment, but more than one alarm can be used. It is also possible to configure the alarm to send an SNS notification when it is triggered, notifying teams of a failed deployment.
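The rollback rule can be reduced to a single decision: if any associated alarm fires during the traffic shifting period, the previous version keeps all the traffic. The sketch below models only that decision. `resolveCanary` and the `...Sketch` types are hypothetical names, and the polling of alarm states is an assumption made for the illustration; CodeDeploy performs its own monitoring.

```typescript
// CloudWatch alarm states, used here only as plain strings.
type AlarmStateSketch = "OK" | "ALARM" | "INSUFFICIENT_DATA";

interface DeploymentOutcomeSketch {
  result: "Succeeded" | "RolledBack";
  versionReceivingAllTraffic: string;
}

// Each inner array holds the state of every associated alarm at one point in
// time during the traffic shifting period (a hypothetical polling model).
function resolveCanary(
  previousVersion: string,
  newVersion: string,
  alarmSamples: AlarmStateSketch[][]
): DeploymentOutcomeSketch {
  const anyAlarmFired = alarmSamples.some((sample) =>
    sample.some((state) => state === "ALARM")
  );
  if (anyAlarmFired) {
    // Halt and shift 100% of traffic back to the original version.
    return { result: "RolledBack", versionReceivingAllTraffic: previousVersion };
  }
  // No alarm fired: the remaining traffic moves to the new version.
  return { result: "Succeeded", versionReceivingAllTraffic: newVersion };
}

console.log(resolveCanary("9", "10", [["OK"], ["OK"], ["OK"]]));
// { result: "Succeeded", versionReceivingAllTraffic: "10" }
console.log(resolveCanary("9", "10", [["OK"], ["ALARM"], ["OK"]]));
// { result: "RolledBack", versionReceivingAllTraffic: "9" }
```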

Now that we have covered how versioning and aliases work with AWS Lambda and how CodeDeploy can be used to perform blue green deployments, let’s look at putting it all together with the CDK.

Putting it all together with the CDK

In a previous post I discussed the benefits of using custom L3 CDK Constructs to ensure organisational standards, default configuration is provided and mandatory configuration is enforced. The post also demonstrates how doing so would enable teams to develop and deliver value faster with a guaranteed quality threshold since most of the usual boilerplate configuration used when creating resources in the CDK would be provided through the custom L3 Construct. I am going to demonstrate how to implement canary deployments in this blog post in the same way. The resources required to perform the canary deployment will be provided in the same L3 Lambda CDK Construct introduced in the previous post as a method that can be optionally called. The code shown in this blog post can be found in the accompanying GitHub repository linked below.

GitHub - JCDubs/cdk-canary-deployments: A demo project showing how to implement code deploy canary… (github.com)

The demo CDK project contains a single, simple create product POST API created using AWS API Gateway and AWS Lambda. The API is backed by a DynamoDB database which is managed in the CDK stateful stack. The API Gateway and AWS Lambda resources are managed in the CDK stateless stack. The diagram below provides an architectural and deployment view of the stacks.

The L3 Lambda Construct is located in the L3/lambda.ts file and is no different to the construct shown in the previous post except for the additional “asBlueGreenDeployment” method that contains the logic to create the canary deployment. The method is shown in the below code snippet.

/**
* Set the function as a blue green deployment.
* @param {LambdaDeploymentConfig} lambdaDeploymentConfig - Optional LambdaDeploymentConfig value.
* @default {LambdaDeploymentConfig.ALL_AT_ONCE}
* @returns {QualifiedFunctionBase} - The alias created for the blue green deployment.
*/
asBlueGreenDeployment(
  lambdaDeploymentConfig?: ILambdaDeploymentConfig
): QualifiedFunctionBase {
  const newVersion = this.currentVersion;
  newVersion.applyRemovalPolicy(RemovalPolicy.RETAIN);

  const alias = new Alias(this, "BlueGreenAlias", {
    aliasName: "live",
    version: newVersion,
  });

  const failureAlarm = new Alarm(this, "DeploymentAlarm", {
    metric: alias.metricErrors(),
    threshold: 1,
    alarmDescription: `${this.functionName} ${newVersion.version} blue green deployment failure alarm`,
    evaluationPeriods: 1,
  });

  new LambdaDeploymentGroup(this, "LambdaDeploymentGroup", {
    alias,
    deploymentConfig:
      lambdaDeploymentConfig ?? LambdaDeploymentConfig.ALL_AT_ONCE,
    alarms: [failureAlarm],
  });
  return alias;
}

Let’s talk through the contents of the method. The “asBlueGreenDeployment” method receives an optional argument of type “ILambdaDeploymentConfig” that can be used to specify a deployment configuration to use instead of the default “LambdaDeploymentConfig.ALL_AT_ONCE” configuration. The default deployment configuration has been set as “ALL_AT_ONCE” in the above code but the configuration used should be based on the company/team deployment policy. It might make more sense for a company/team to enforce a default canary deployment policy based on the criticality and risk of the majority of their services.

The method first calls the “currentVersion” getter method of the lambda function to retrieve the current lambda version. If a current version doesn’t exist, the “currentVersion” method creates a new version. The version is given a removal policy of RETAIN so that it is kept when it is no longer part of the stack. The “asBlueGreenDeployment” method then creates a Lambda alias providing an aliasName of “live” and the retrieved version as the alias version. As mentioned previously, one or more alarms can be provided in the deployment to specify when the deployment should be cancelled and rolled back. In this case, a new alarm is created on the alias error metric with a threshold of one error over a single evaluation period. A lambda deployment group is then created using the CDK CodeDeploy “LambdaDeploymentGroup” Construct, providing the alias, the default or overridden deployment configuration and the alarm. Finally, the created alias is returned to the caller.

The “asBlueGreenDeployment” method can then be called whenever the custom L3 Lambda construct is used to create a Lambda function. The code snippet below demonstrates this.

export class ProductStatelessStack extends cdk.Stack {
  /**
   * Create the ProductStatelessStack stack.
   * @constructor
   * @param {Construct} scope
   * @param {string} id
   * @param {ProductStatelessStackProps} props
   */
  constructor(scope: Construct, id: string, props: ProductStatelessStackProps) {
    super(scope, id, props);

    // Create the API.
    const productApi = API.create(this, "ProductApi", {
      ...props,
      apiName: "ProductApi",
      description: "Product API",
      deploy: true,
    });

    // Create the create product lambda
    const createProductLambda = Lambda.create(this, "CreateProduct", {
      entry: path.join(
        __dirname,
        "../src/handler/create-product-function/index.ts"
      ),
      description: "Create a product",
      serviceName: "createProduct",
      environment: {
        TABLE_NAME: props.productTable.tableName,
      },
    });

    // Create the blue green deployment as a 10 percent canary over 15 minutes.
    const createProductAlias = createProductLambda.asBlueGreenDeployment(
      LambdaDeploymentConfig.CANARY_10PERCENT_15MINUTES
    );

    props.productTable.grantReadWriteData(createProductLambda);

    // Create the POST endpoint with an integration pointing to the canary alias.
    productApi.addEndpoint({
      resourcePath: "/product",
      method: HttpMethod.POST,
      function: createProductAlias,
    });
  }
}

In the above code snippet, the product API is created along with the create product lambda function. The custom L3 API Gateway and Lambda construct is used to create both. The “asBlueGreenDeployment” method of the lambda instance is then called providing the “LambdaDeploymentConfig.CANARY_10PERCENT_15MINUTES” deployment configuration before granting the function read write permission on the database. The above call to the “asBlueGreenDeployment” method overrides the default “ALL_AT_ONCE“ deployment configuration by providing the “CANARY_10PERCENT_15MINUTES” deployment configuration. The custom “addEndpoint” function of the custom L3 API Construct is finally called providing the endpoint path, HTTP Method verb and the created Lambda function canary alias.

We can deploy the stacks in the repository by executing the “cdk deploy --all” command. A canary deployment wouldn’t be performed on initial creation of the Lambda function since a previous version of the function doesn’t exist but canary deployments would be performed on subsequent deployments. Below is a screenshot of the canary deployment of the create product function in the AWS CodeDeploy console.

Canary deployment of the create product lambda in AWS CodeDeploy

After fifteen minutes the canary deployment would complete, shifting 100% of traffic to the newly published lambda version. The screenshot below shows the completed canary deployment in the AWS CodeDeploy console.

Completed canary deployment of the create product lambda in AWS CodeDeploy

If we were to modify the create product function to throw an error on invocation and call the endpoint during the initial fifteen minute traffic shifting period, the canary deployment would be cancelled and the alias would be rolled back to send 100% of the traffic to the previously deployed version. The screenshot below shows the failed deployment in the AWS CodeDeploy console.

Failed Canary deployment in the AWS CodeDeploy console

The above code and configuration seems pretty simple and as demonstrated, can be made simpler by standardising team deployment process by utilising sharable custom L3 CDK Constructs. There are some caveats and considerations that need to be addressed before fully using blue green canary deployments in this way. I’ll list and discuss each of them below.

Unchanged lambda deployments

The resources created from the call to the “asBlueGreenDeployment” method required to perform canary deployments become part of the stack and are subject to the usual functionality of CDK stacks. They are diff’d against the currently deployed resources and created, updated or deleted as required. This doesn’t cause any problems on its own but the call to the “currentVersion” getter method of the Lambda function does. As mentioned above, the “currentVersion” getter method creates a new version of the lambda function and is called every time we perform a deployment of the stateless stack. The call to the “currentVersion” getter method is unavoidable since it is required when creating the Alias resource which is subsequently used to create the LambdaDeploymentGroup resource. A new Lambda function version resource is created every time the stack is converted to a CloudFormation template and deployed through CloudFormation, publishing a new Lambda version.

Why does this cause a problem? As mentioned above there are a number of rules that are enforced when publishing a new version of a Lambda function. The main rule is that you can’t publish a new version of the function if the currently deployed code and configuration being published as the new version is the same as the previously published version. If we deploy the stack without any changes to a Lambda function’s code or configuration where the “asBlueGreenDeployment” method is called, a new version of the function is published with the code and configuration matching the previously published version. This doesn’t cause an error during synthesis of the CloudFormation template but an error is thrown when CloudFormation attempts to publish the new Lambda function version when creating the Version resource. The below screenshot shows the thrown version error during CDK deploy.

Screenshot of error thrown in the terminal from unchanged lambda deployment

The best and most efficient way of addressing this problem would be to create the new version in the CDK code only if the Lambda’s code or configuration has changed. There doesn’t seem to be a way of determining this through the CDK. It could possibly be achieved through the Lambda SDK but I haven’t come across or been able to create this sort of solution. If we were able to conditionally create the new version, creation of the alias and deployment group would also be subject to this condition. This would mean that when the stack is deployed without a new function version, the CodeDeploy resources created alongside the deployment group would be removed from the stack. This would include the history of all previous canary deployments, which teams might want to keep for audit purposes.

There are a number of ways we can work around the unchanged Lambda versioning problem.

Optionally use the asBlueGreenDeployment method

As mentioned above, the “asBlueGreenDeployment” method has been provided as an optional method that developers can choose to implement and use for each deployment. This is instead of adding the resources required for canary deployments to the Lambda Construct constructor. For deployments where lambda functions within a stack don’t contain any changes, the call to the “asBlueGreenDeployment” method could be removed and used as and when an advanced deployment technique is required. Removing the call to the “asBlueGreenDeployment” method would cause all CodeDeploy resources to be removed from the stack and re-created when the call to the method is added back in. As mentioned above, this would mean that the history of all previous deployments would be removed, which teams might want to keep for audit purposes.

Issues with this workaround might come from teams forgetting to add the call to the “asBlueGreenDeployment” method when they should perform a canary deployment, which could cause downtime. Mitigation could be provided through stringent code review processes but this couldn’t be classed as a foolproof solution.

Lambda Configuration Changes

As mentioned a number of times in this post, a lambda version contains the code and configuration of a published lambda. This configuration includes the function environment variables. With this in mind, to get around this error we could ensure that the lambda configuration changes on every deployment by setting a version environment variable. This would involve modifying the deployment pipeline to perform the “npm version” task either pre or post deployment. The project version located in the “package.json” file can then be used as the value of a new “VERSION” lambda environment variable. This ensures that the new version of the lambda created on every deployment is different to the previously published version.

Screenshot of the project version and pre:deploy task

The above screenshot shows how the npm version command can be implemented as a pre deploy task as well as the project version.

import packageJson from "../../package.json";

...

let defaultEnvironment = {
  LOG_LEVEL: LogLevel.DEBUG,
  POWERTOOLS_LOGGER_LOG_EVENT: "true",
  POWERTOOLS_LOGGER_SAMPLE_RATE: "1",
  POWERTOOLS_TRACE_ENABLED: "enabled",
  POWERTOOLS_TRACER_CAPTURE_HTTPS_REQUESTS: "captureHTTPsRequests",
  POWERTOOLS_TRACER_CAPTURE_RESPONSE: "captureResult",
  VERSION: `${packageJson.version}`,
};

...

/**
   * Create an instance of Lambda setting fixed and default
   * configuration.
   * @param {Construct} scope -  CDK Construct.
   * @param {string} id - Id of the resource.
   * @param {ILambdaProps} props - Properties to be applied when creating the Lambda.
   * @returns {Lambda} A new instance of Lambda.
   */
  static create(scope: Construct, id: string, props: ILambdaProps): Lambda {
    const lambda = new Lambda(scope, id, {
      ...props,
      ...defaultProps,
      ...fixedProps,
      environment: {
        ...(props.environment ?? {}),
        ...fixedProps.environment,
        POWERTOOLS_METRICS_NAMESPACE: props.serviceName,
      },
      functionName: namingUtils.createResourceName(
        props.serviceName,
        this.resourceType
      ),
      currentVersionOptions: {
        removalPolicy: RemovalPolicy.RETAIN,
        description: `Version deployed on ${new Date().toISOString()}`,
      },
    });
    return lambda;
  }

The above code snippet is of the custom L3 Lambda construct with a change to set the “VERSION” lambda environment variable as the semantic project version number located in the package.json file. The contents of the package.json file are imported first before setting the VERSION environment variable as the “packageJson.version” value. This will ensure that the configuration in all lambdas created using the custom L3 Lambda Construct would change on every deployment.
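The reasoning behind this workaround can be checked with a small comparison of two environments. This is a minimal sketch under the assumption that the function code is unchanged and only the environment variables are compared; `environmentFingerprint`, `configurationChanged` and `withProjectVersion` are hypothetical helpers, not part of the CDK or the demo repository.

```typescript
type EnvironmentSketch = Record<string, string>;

// Order-independent text form of the environment variables.
function environmentFingerprint(environment: EnvironmentSketch): string {
  return Object.keys(environment)
    .sort()
    .map((key) => `${key}=${environment[key]}`)
    .join("\n");
}

// True when the environment differs from the previously published one.
function configurationChanged(
  previous: EnvironmentSketch,
  next: EnvironmentSketch
): boolean {
  return environmentFingerprint(previous) !== environmentFingerprint(next);
}

// Mirrors the idea of the VERSION variable: stamp the project version in.
function withProjectVersion(
  environment: EnvironmentSketch,
  projectVersion: string
): EnvironmentSketch {
  return { ...environment, VERSION: projectVersion };
}

const baseEnvironment: EnvironmentSketch = { TABLE_NAME: "products" };

// Redeploying with nothing changed: no difference, so publishing would fail.
console.log(configurationChanged(baseEnvironment, baseEnvironment)); // false

// Bumping the project version between deployments changes the configuration.
const published = withProjectVersion(baseEnvironment, "1.0.0");
const redeployed = withProjectVersion(baseEnvironment, "1.0.1");
console.log(configurationChanged(published, redeployed)); // true
```

The workaround only holds if the project version really is bumped before every deployment; two deployments with the same version number would compare as unchanged again.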

Another topic to cover when discussing blue green or canary deployments is when they should be used. I’ve previously stated in this post that advanced deployment techniques such as canary or linear deployments should be used when deploying changes to critical systems where there is a risk of the change causing downtime. There is also a scenario when advanced deployments shouldn’t be used because of technical constraints such as lambda permission changes.

Lambda Permission Changes

CodeDeploy advanced deployment techniques such as canary and linear deployments rely on an alarm to determine whether the function has been deployed successfully, rolling back the deployment if the provided alarm has been triggered. A highly likely reason for a published version to throw an error is a mismatch in lambda permissions between a newly and previously published lambda version.

It is advised that advanced deployment techniques such as canary and linear deployments are not used when deploying a lambda permission change and the “ALL_AT_ONCE” deployment configuration is used instead. Regardless of whether you choose to optionally call the “asBlueGreenDeployment” method in the stack or add a “VERSION” lambda environment variable to get around the version error shown above, the “asBlueGreenDeployment” method has been created with this requirement in mind. The “asBlueGreenDeployment” method implements the “ALL_AT_ONCE” deployment configuration by default for times where advanced deployment configurations shouldn’t be used. In this case, the method call should be used in the following way.

// Create the blue green deployment using the default all at once deployment.
const createProductAlias = createProductLambda.asBlueGreenDeployment();

This functionality allows teams to keep the call to the “asBlueGreenDeployment” method in the code even when they need to perform an all at once deployment.

Wrapping Up

Thank you for taking the time to read my post. I hope you find it useful, or at least a starting point for your own investigations into Serverless blue green deployments. As shown, the logic and resources required to configure advanced deployment techniques are simple to implement but the complications come with how we handle unchanged lambda function version management in the CDK code and CloudFormation deployments. I would be extremely interested in seeing other solutions to this problem. Please feel free to reach out to me or post a comment on the subject if you have come across a solution.

If you are interested in being notified of my future posts, feel free to follow me on Medium, Linkedin and Twitter.