A Versioning Strategy for Serverless Applications

Jason Williams · Published in The Startup · Jul 20, 2020

When building serverless applications, it’s difficult to maintain comprehensive version control. Even the simplest application will bring together several of the services your cloud provider offers, and it’s often quicker to write the code and set up the configuration directly in the console when you’re prototyping, rather than trying to script everything out. When it comes time to update the app, you can’t risk affecting the live instance, so you must try to replicate all the code and configuration to a secondary instance, or even a whole separate AWS account to ensure you can work on it and test new changes.

I want to define and script a strategy to accomplish two goals:

  1. A nicely structured project under sound version control that contains all the configuration and code as comprehensively as possible for consistent deployment to the cloud.
  2. Multiple ‘stages’ in the cloud, so that I can deploy my application to stage the next release without touching anything that the live application depends on.

A Framework and a CLI

The best solution for goal #1 is a good framework with a command-line interface (CLI) that builds and deploys everything. You could build this yourself with shell scripts or a build tool like Gulp or Grunt, but for Amazon Web Services there is the AWS Serverless Application Model with the AWS SAM CLI.

The SAM CLI can scaffold a sample project, in which you can define your own AWS resources and write the code you need, and then SAM will build it into the template files needed for AWS CloudFormation and package and deploy them to your AWS account.

Isolated Clouds

For goal #2, you need guaranteed isolation for your cloud resources. The fail-safe but cumbersome method is to set up separate accounts for production and staging. There will be no risk of overlap, but you’ll have to remember to specify the AWS profile with every command you run. If you forget to provide the --profile option with your deployment command, and you didn’t realize that your AWS_PROFILE environment variable (or your default profile under ~/.aws) was pointing at production, you won’t even realize right away that you just overwrote production. 😨

The more convenient approach is to rely on AWS CloudFormation. The AWS SAM framework ultimately builds and deploys one stack in CloudFormation. If you can ensure that all the resources your application needs are defined in that one stack, then you can deploy any version of that stack under a different name to the same AWS account.

In this preferred approach, your deployment stages will each be their own CloudFormation stack.

Refining the Strategy

We could be done here. Install the SAM CLI, scaffold a project from one of the samples, drop in your code, fill in the config and deploy it to its own isolated stack. Except it’s not going to work with such minimal effort. Right away you’ll notice that all your resources got auto-named.

This is good, because CloudFormation stacks are not containers; they’re indexes — all the resources just get created in your AWS account the normal way. The isolation happens because each resource gets a unique name and a linkage to the stack.

This is bad, because your code at development and build time has no idea what the names of the resources in its own template are going to be at deployment time.

We have to get around the bad part by making our template intelligent. Let’s walk through a sample application lifecycle to figure this out.

Building a New Application

Start with the simple Hello World app that’s scaffolded for you by the AWS SAM CLI.

If you’re new to AWS SAM and you’d like to code along with me, a prerequisite is to follow the Getting Started section in the AWS SAM Developer Guide. Otherwise, just read along to grasp the gist of the strategy and then refer back once you’re in your own project trying to put this into action.

SAM the Squirrel

Install the AWS SAM CLI.

I’m using Homebrew on macOS; the developer guide has instructions for installing on any platform.

$ brew tap aws/tap
$ brew install aws-sam-cli

$ sam --version
SAM CLI, version 0.53.0

Scaffold a project.

sam init

This is a guided initialization, so choose your preferred runtime, pick the ‘hello-world’ template and name it whatever you want.

Add a Resource

Now you have a project you can build and test locally or deploy to AWS. This sample project has a single Lambda function with a REST API event to trigger it, but to illustrate the kinds of clever things we’re going to need to do in the template, I’m going to add a DynamoDB Table as a resource that the function will depend on.

This is the Resources section from the template.yaml file. Everything defined under Resources gets provisioned by CloudFormation as part of the stack. I just added MyTable of type AWS::DynamoDB::Table.
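
Roughly, that Resources section looks something like this (a sketch: the table’s key schema below is just an illustrative placeholder, since any valid schema will do for this walkthrough):

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello_world/
      Handler: app.lambda_handler
      Runtime: python3.8
      Events:
        HelloWorld:
          Type: Api
          Properties:
            Path: /hello
            Method: get

  MyTable:
    Type: AWS::DynamoDB::Table
    Properties:
      # No TableName property, so CloudFormation generates a unique name
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH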

When this is deployed, CloudFormation isn’t going to just name it ‘MyTable’. It will be named {stack name}-MyTable-{random uniquifier}. Our code is going to need to know that name in order to connect to it with the AWS SDK. We can do this by introducing an environment variable.

Don’t be tempted by the TableName property on AWS::DynamoDB::Table that overrides the automatic naming. We want the unique name to be generated, or else when we get to the end of this, all the different stacks would be updating the same DynamoDB table. In most cases this would be a problem.

Since the table name doesn’t exist until deploy time, we’re going to use a feature that CloudFormation provides directly in the template specs called Intrinsic Functions.

CloudFormation Template Intrinsic Functions Reference

These functions get evaluated to strings in your template by CloudFormation at deploy time. !Ref MyTable will tell CloudFormation to reference the other resource in my template that I named ‘MyTable’ and put its return value down as the value of the TABLE_NAME environment variable for the function.
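
In the function’s properties, that could look like this (only the environment block is shown; the rest stays as before):

  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      # ...CodeUri, Handler, Runtime and Events as before...
      Environment:
        Variables:
          TABLE_NAME: !Ref MyTable  # resolves to the generated table name at deploy time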

The CloudFormation documentation lists every AWS Resource type and what the return values are from provisioning it. So to know for sure what we have to work with, we’ll go find the docs for the DynamoDB resource.

AWS::DynamoDB::Table

Scroll down to Return Values and the documentation will tell you what value is returned for Ref as well as what attributes you can retrieve using another intrinsic function called Fn::GetAtt.

Now we can put some code in our serverless function to scan that table by using the environment variable.

If you’re new to the AWS SAM project structure, here’s enough of a primer to follow the code sample. The function definition in template.yaml has three important properties for the code:

  • CodeUri: hello_world/ — where to find the code
  • Handler: app.lambda_handler — what function should handle the Lambda invocation
  • Runtime: python3.8 — which runtime to use

Since I’m using Python, hello_world/app.py defines a function called lambda_handler, which will be called when the Lambda function is invoked.
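
Here’s a minimal sketch of what that handler could look like, reading the TABLE_NAME environment variable and scanning the table (the exact response format is illustrative, not my original code verbatim):

import json
import os

import boto3

# Read the table name that CloudFormation injected at deploy time
TABLE_NAME = os.environ["TABLE_NAME"]
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def lambda_handler(event, context):
    # Scan this stage's own table and return the raw result as the response body
    result = table.scan()
    return {
        "statusCode": 200,
        "body": json.dumps(result, default=str),
    }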

I need to use the AWS SDK to scan the DynamoDB table, so for Python I can just add a requirements.txt to the CodeUri path with just a list of dependencies (boto3 plus whatever else becomes needed):

boto3

Use the SAM CLI to build and deploy these updates and the requirements.txt file will be passed to Python’s pip package installer and all the dependencies will be bundled in with the package for Lambda.

sam build
sam deploy --guided

You’ll need to run the deploy command with --guided the first time so that it will ask you questions to set the defaults for all the deployment parameters. Let it save them at the end and you won’t have to use that switch anymore.

Now you can invoke this in your browser to test it, but you need the URL of the API Gateway that was set up. You could go to the console, navigate to API Gateway and find the URL somewhere in there, or you could notice those ‘Outputs’ in your terminal output. There is an Outputs section in the template, and it uses those intrinsic functions to reference the resources that were created and assemble the API URL to display for you at the end of the script run. You can configure whatever Outputs you need in the template. They’re handy in cases like this where you just need the value in your terminal, and they’re also easy to retrieve with the AWS SDK and CLI, or even from other applications if you’re architecting something big.

Open the API URL in your browser. It’s just a simple GET path, so it should display the body we set to be a dump of the DynamoDB scan response, but instead there’s an error. We’re going to need to review the logs. We can go into the console, or just use the SAM CLI.

sam logs --name HelloWorldFunction --stack-name my-sam-app

The problem is that we added the DynamoDB Table resource to our app and just expected the Lambda function to be able to access it because they’re in the same stack. We need a policy attached to the role that runs the Lambda function so that it can read from the table.

The AWS::Serverless::Function resource takes a Policies property that will accept a list of IAM policies. There are policy templates we can use and pass parameters to, so the following YAML snippet will again make use of the Ref intrinsic function to pass the table name to a policy template that will give our function read-only access to the table.
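
One such template is SAM’s DynamoDBReadPolicy; the snippet could look like this:

  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      # ...other properties as before...
      Policies:
        - DynamoDBReadPolicy:
            TableName: !Ref MyTable  # grants read-only access to this stage's table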

Now save, rebuild and redeploy, and when you refresh the URL in your browser you’ll see the JSON dump of a scan result that yields 0 items from our empty table.

Staging a New Deployment

Let me fast forward a bit now to get back to the point of this article. Let’s say you’ve implemented some functionality and this project is running live and stable in your AWS account. You’re ready to start work on another feature, but you need to protect what’s currently live. This is where Git becomes indispensable.

I highly recommend using Git Flow. It’s a branching model for Git that establishes that your main branch should only contain code that is stable and live. All work is done on feature branches, merged and tested on a ‘next release’ branch, and finally merged into the main branch when it becomes stable and goes live. It’s a semantic strategy, but it can also be installed as a set of macros in your Git installation.

Git-flow cheat-sheet

Let’s say your code up to this point has just been incrementally committed to the main branch. There was no live and stable version before what’s currently implemented. Now you need to establish the Git Flow model so that what you work on next stays on a separate branch and doesn’t get merged to main until it’s been tested and deployed live.

git flow init

This command will guide you through some questions for which you can just accept the defaults, and then the ‘next release’ branch (commonly called develop) will be checked out in your local workspace, so that any commits you make now will be on this new branch, and nothing will touch the main branch that corresponds to what’s live.

In most cases, you’ll make many commits, testing stuff locally as you bring forth the next release. Once you think you’re finished and ready for your QA team and stakeholders to evaluate it, you can create a release branch with git flow:

git flow release start 1.1

This will put your workspace on a branch named release/1.1 by default. The version number can be anything you want, I’m pretending that the version that’s live is ‘1.0’ and this is just a minor update for a new feature.

You can now start working on deploying it to AWS and making tweaks to the configuration or fixing bugs on this branch. Whatever you fix will be merged back in, and other work can still go forward on the regular branches. With git flow, you could even code and push a fix to the current version on production while the next version is being tested in staging, all from within one well-branched repository.

Up to this point you’ve just been running sam deploy to push everything to the cloud. It just uses your project name as the stack name and keeps updating the same stack. But now others are depending on that stack to be available and stable, so you need a second stack to keep your new untested changes separate.

You can specify a new stack-name with the --stack-name switch:

sam deploy --stack-name my-sam-app-staging

But that looks like the kind of thing that most of us would forget to put at least once.

It would be ideal to have the default stack name be the one that isn’t live and stable.

In your project root, there’s a file called samconfig.toml where the defaults used for deployment are stored. In there, we can go ahead and change two of the parameters, so that the default deployment will go to our staging stack and we’ll have to specify overrides to do a production deployment. This will be much safer.

stack_name = "my-sam-app-staging"
s3_prefix = "my-sam-app-staging"

The S3 prefix is changed along with the stack name just to be on the safe side. The deployment process needs to upload all your code to S3 so that it can be deployed into the Lambda function, and this keeps the code from different stages in separate folders in the bucket, even though it’s only used at deployment time.

Now you can just run your normal sam deploy command you’ve been running, and it will deploy to the staging stack rather than overwriting production.

Identifying the Stage

At this point, you’ve got two separate stages in AWS: staging and production, represented as isolated stacks, each with its own DynamoDB Table resource. The code doesn’t know any different, because it uses the table it’s told about through its environment variable.

But what if you’ve got external dependencies, like a third-party API with different endpoints for production and staging? Or maybe there are certain things in the code that need to behave differently depending on the stage. We still need to parameterize the deployment so that each stack knows which stage it is and has whatever extra environment variables it needs for that stage.

To solve this, we’ll use the Parameters section of the SAM template.

Parameters are directly from CloudFormation, and they’re meant to be applied to generalized templates to create or update stacks with specific configuration for individual resources.

Start with one parameter to define the deployment stage.

The default will be staging for safety; we only want to have to remember to type extra options when pulling off a production deployment. The AllowedValues list constrains the input for this parameter, and this is where we define that staging and production are our only deployment stages.
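
Put together, the Parameters section might look like this:

Parameters:
  DeploymentStage:
    Type: String
    Default: staging
    AllowedValues:
      - staging
      - production
    Description: The deployment stage this stack represents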

Next, let’s put that DeploymentStage value in the environment variables so the code can use it if needed.
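
For example, inside the function’s Properties (the variable name DEPLOYMENT_STAGE is my own choice here):

      Environment:
        Variables:
          TABLE_NAME: !Ref MyTable
          DEPLOYMENT_STAGE: !Ref DeploymentStage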

But there’s a potential flaw here: this counts on your code knowing the exact string name that you’ve given each deployment stage. What would be more ideal is simply a boolean that the code can check in order to know if it’s production or not. That’s usually all that matters.

To solve this, we’ll use the Mappings feature of CloudFormation templates, and set up an environment variable mapping to define a separate set of environment variable values for each deployment stage name.

In the environment variables for the Lambda function, I use the FindInMap intrinsic function to set the value of each applicable environment variable based on what I set up in the Mappings section. This way, my function gets an environment variable with the name of the deployment stage, but also a few other variables with their own meanings that switch depending on the deployment stage.
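
A sketch of how those two pieces could fit together (the variable names and endpoint values here are illustrative assumptions):

Mappings:
  StageVariables:
    staging:
      IsProduction: "false"
      ThirdPartyApiUrl: https://sandbox.example.com
    production:
      IsProduction: "true"
      ThirdPartyApiUrl: https://api.example.com

# ...and inside the function's Properties:
      Environment:
        Variables:
          DEPLOYMENT_STAGE: !Ref DeploymentStage
          IS_PRODUCTION: !FindInMap [StageVariables, !Ref DeploymentStage, IsProduction]
          THIRD_PARTY_API_URL: !FindInMap [StageVariables, !Ref DeploymentStage, ThirdPartyApiUrl]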

Disambiguating from API Gateway Stages

There is one very confusing aspect to this strategy. The API Gateway is a resource in your stack, so each provisioned stack creates its own. If you go to the API Gateway area in the AWS console, you’ll see your production API Gateway and your staging API Gateway. This is fine, but when you navigate to either one of them, you’ll notice there’s Stages defined there too. Each API Gateway gets both a ‘Prod’ and a ‘Stage’ that are basically identical because they were both deployed at the same time as part of your last SAM deployment.

It’s confusing, because if you push another update to your production stack, the API gateway gets fully updated and both Prod and Stage still point to the same newest stuff. The ‘Stage’ stage in your staging stack is the same as the ‘Prod’ stage in your staging stack, and the Prod stage in your staging stack has nothing to do with anything in your production stack. 😕

The API Gateway uses stages as its own version control system. When you make a bunch of changes to an API Gateway (directly via the console or via the AWS CLI) you must publish your changes to a new or existing stage in order for them to be accessible. If we weren’t using SAM, this API Gateway stage concept could maybe accomplish this versioning strategy, but there would be a lot more work involved.

To get rid of this confusion, we need to control the API Gateway stages and set it up so that our staging API Gateway only has the ‘Stage’ stage, and so that the production API Gateway only has the ‘Prod’ stage. This may still seem redundant, but when you’re glancing at a URL and you see the /Stage path, you know it’s the staging url. If you try to hit /Stage on the production stack, it would make a lot more sense if it didn’t work.

Where is the API Gateway defined, though? API Gateway is just another resource in the stack we’re creating, but it’s not in our template as a resource. Each function we define in our template has an Events property for specifying what triggers it, and Api is an event type. The path and method are defined there. There can be multiple functions in a serverless application, and each could have its own Api trigger with a different method and path, so SAM just implicitly creates the REST API because the methods and paths are all that’s really needed.

Except they’re not all that’s needed. There are usually a few more details that need to be set up for the API Gateway, and the template specification has a whole section of settings under Globals for Api where you can set those. Globals is for applying certain settings to all resources of a type when you have more than one, but the API Gateway is the one resource type for which I can think of almost no reason to have more than one in a single application.

No need to rant, though. We can create an explicitly-defined API Gateway and reference it from each of those event triggers where the paths and methods are defined.

First, introduce another parameter:
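
Something like this (ApiStageName is the parameter name we’ll use in the deploy command later):

  ApiStageName:          # added alongside DeploymentStage in the Parameters section
    Type: String
    Default: Stage
    AllowedValues:
      - Stage
      - Prod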

Create the API as an additional resource in the stack:
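
Here the logical name MyApi is my own choice:

  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref ApiStageName   # 'Stage' for staging, 'Prod' for production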

Finally, set up the function event to reference this API rather than implicitly creating one:
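
For the hello-world sample’s event, that might look like this:

      Events:
        HelloWorld:
          Type: Api
          Properties:
            Path: /hello
            Method: get
            RestApiId: !Ref MyApi    # point at the explicit API instead of the implicit one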

We only need to add the RestApiId, and it signals to SAM that we don’t need the implicit API; we’d like a little more control over it, thank you.

One more thing, though. If you try to deploy this now, you’ll get an error from a part of the template you’ve barely thought about — the Outputs section. Down there, where it was spitting out the URL of the API endpoint, it was using ${ServerlessRestApi}, which is a special variable for the implicitly created REST API that we stopped relying on. It uses the Sub intrinsic function to construct a string with interpolations, so reconstruct it with the resources and parameters we’ve since added.
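
A sketch of the reworked output, assuming the MyApi logical name from above:

Outputs:
  HelloWorldApi:
    Description: API Gateway endpoint URL for the Hello World function
    Value: !Sub "https://${MyApi}.execute-api.${AWS::Region}.amazonaws.com/${ApiStageName}/hello/"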

There is one wrinkle: the SAM deployment never removes an API Gateway Stage. It only updates existing ones or publishes new ones. If you’ve been following along with the progression of the code in the article, you’ll still have to go in once and delete the ‘Prod’ stage from staging and the ‘Stage’ stage from production.

If you’re just reading through and haven’t done any interim deployments before setting up this API Gateway Stage, however, you’ll run into another weird little problem. If you deploy an app with SAM for the first time with ‘Stage’ as your only StageName, you’ll get an error that ‘Stage’ already exists. Implicit or not, this default API Gateway Stage behavior sticks around. Fortunately, many have grappled with this and there’s a lengthy explanatory thread on one of the GitHub issues. Long story short, it’s one of those super complicated things that can’t be fixed without adverse effects on existing apps.

The workaround is to use the OpenApiVersion property in the globals, set to a recent value, to signal that your app is new and you would like the fixed behavior.
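
In template.yaml that’s a single property under the Api globals (3.0.1 is the value commonly used for this):

Globals:
  Api:
    OpenApiVersion: 3.0.1   # opt in to the newer behavior so the redundant 'Stage' stage isn't created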

Now, if you deploy that (as the first deployment to a new stack), you’ll get just the one API Gateway Stage that you named in StageName. If you decide to change the name to something else, a new API Gateway Stage will be created and the old one will be left alone just how it was.

This could allow for an interesting advanced use case. We’ve lifted our staging/production concept up to the stack level, but we could use the API Gateway stage for something else.

If your application is going to be used by many clients and projects, and you need to publish an update that would require everyone else to update their code, it would be considerate to push it out in such a way that your clients have some time to upgrade. Try using the version number as the API Gateway stage name. Granted, the code itself will be updated each time, but the Lambda handler function can act as a router. Each API Gateway stage is a different path on the URL, and when a request comes into your function, you can check which one the client used. If it’s not the newest one, pass the request through to some backwards-compatibility adapters that you’ll have to write and test. Clients who are lagging behind on an older version will hopefully not see anything break, but will still get important bug fixes. This is a pretty advanced use case, but we’ve laid the foundation that makes it possible.

Deploying to Production 🚀

Now that we’ve established all this extra config for an isolated staging deployment, we need to push our updates to production, and it’s going to be a much longer command to override all the defaults.

sam deploy --stack-name my-sam-app --s3-prefix my-sam-app --parameter-overrides ParameterKey=DeploymentStage,ParameterValue=production ParameterKey=ApiStageName,ParameterValue=Prod
  • --stack-name my-sam-app — Our stack name for production is just going to be the clean app name with no suffix. That’s just my preference.
  • --s3-prefix my-sam-app — Remember we’re just going to keep this the same as the stack name for neatness in the S3 deployment bucket.
  • --parameter-overrides ... — Override all the parameters that default to staging values.

This is what we wanted. A simple sam deploy will take all the staging defaults, and won’t touch production, but all of these parameters will need to be set correctly to pull off a deployment to production.

Of course, the next thing any developer would do is put all of that in a shell script, but so long as it’s separate from all the sam commands that are frequently run during the development and staging cycles, it will be safe.
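
For example, a tiny deploy-prod.sh wrapper (the file name is arbitrary) that you only run deliberately:

#!/usr/bin/env bash
set -euo pipefail

# Deliberate production deployment: overrides every staging default.
sam deploy \
  --stack-name my-sam-app \
  --s3-prefix my-sam-app \
  --parameter-overrides \
    ParameterKey=DeploymentStage,ParameterValue=production \
    ParameterKey=ApiStageName,ParameterValue=Prod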

After you deploy everything to production and all is stable and your heart rate is back to normal, it’s time for one last command with git flow.

git flow release finish 1.1

This will merge all the changes into a new merge commit on your main branch. It will be the first commit on that branch since the last time you pushed to production, so it keeps with the rule that the main branch is only for the live and stable code that’s been released. Any changes that were done on the release branch are also merged back to the development branch.

Wrap-Up

There’s a lot of complicated configuration, but if you take a few steps back it’s all still a pretty simple strategy.

The SAM build and deployment produces an AWS CloudFormation stack, which can be considered an application root. We’re establishing one for each major stage: production and staging.

The Lambda function’s execution environment can be configured with variables to provide information per deployment stage to help the code.

The API Gateway in our stack has its own deployment stages, which is kind of confusing, so we cleaned that up.

Our code and configuration is all in one source code repo with the Git Flow branching model. We work locally on the development branch, then start a release branch and work on deploying it to AWS in the staging stack, and when it’s tested and ready to go live, we deploy to AWS in the production stack and finish off the release branch so that Git Flow merges our latest stable code back to the main branch.

I didn’t go into all the useful functionality the SAM CLI offers for local development, using Docker to recreate pieces of the AWS environment on your local development system. There are plenty of tutorials out there to get you started on building your project now that you’ve considered the AWS Serverless Application Model and this strategy for keeping your versioning sanity.
