Simple, cheap blog hosting on AWS with Terraform
There are a million ways to host a blog these days, and for a personal blog you should choose something that’s easy to maintain and cost effective. This post covers how to get cheap hosting for a statically generated blog (or really any simple website) using Jekyll and Terraform on AWS.
This is a very manual route to take, and my motivation for doing this was really just to play around with AWS and Terraform and focus on coming up with a clean solution and workflow for my own blog. I hope that there’s something in this for everyone though, whether that be a mini-tutorial on Terraform, a gentle introduction to AWS, or a solution for hosting your own blog.
Requirements
You will need some things to complete this tutorial.
- Sign up for an AWS account.
- Install and configure the AWS Command Line Interface.
- Install Terraform.
- Install Jekyll.
As for the AWS account, you may need a credit card to sign up, but this tutorial won’t cost you much, if anything, because everything should be covered by the Free Tier for the first year.
Get a locally running blog
Before we worry about hosting something publicly with a bunch of infrastructure, let’s get a Jekyll blog running locally:
jekyll new my-blog
cd my-blog
bundle exec jekyll serve
You should now be able to go to http://localhost:4000 and see the default Jekyll blog.
You can explore as much as you want at this point, but make sure you have a working blog before you move on.
Infrastructure Architecture
Let’s talk for a minute about what kind of infrastructure concerns we have to address in order to host a website that only contains static content like HTML, JavaScript, and images.
At a minimum we need:
- Storage (i.e. what holds our content)
- Serving Content (i.e. giving it to people that ask for it)
In addition, there are the following nice to haves:
- Custom Domain Name (e.g. dasnewman.com instead of xya.us-east-1.aws.com)
- SSL Termination (i.e. the ability to use SSL certificates to prove we are who we say we are)
- Scalability (i.e. allowing just about the entire world to access our content without the system exploding)
We can accomplish all of the above with traditional infrastructure. It’s just a matter of getting the right number of servers, configuring them by installing software, and ensuring that we can monitor them and keep them up to date. Anyone that’s done this for a living knows how much work that can be, especially for a large site. So why not get someone else to do all of that heavy lifting for us?
The solution, in a word, is “serverless”. Now, let’s not get confused here. Serverless doesn’t imply some kind of magic. It just means that someone/something else is managing those servers and you don’t have to care a whole lot about them; e.g. what OS they are running, how many there are, etc.
OK, so let’s map our needs above to the resources we can leverage from Amazon Web Services. Chances are, you can find an exact analog in any major cloud provider like Azure, Google Cloud Platform, etc.
- Storage: Simple Storage System (S3)
- Serving Content: S3 (to store) and CloudFront (to serve)
- Custom Domain Name: Route53 (to register and host our DNS entries)
- SSL Termination: Certificate Manager (to generate a signed SSL Certificate) and CloudFront (to serve the certificate to our clients)
- Scalability: CloudFront and S3
Now that we know what we need to use we can talk about how we are going to set it all up. There are a few viable options.
- We can just go into the AWS web console or use the CLI to enter all the commands and options. Nope. Not going to do that.
- Use CloudFormation templates. This works and is a good option, but none of the knowledge transfers to other cloud providers. It’s AWS-specific.
- Use Terraform. This gives us some nice abstractions over the low-level APIs, is a lot easier to read/understand, and at least some of the knowledge we pick up by using it applies to other cloud providers.
There are of course, more options than the above. What it comes down to though is you should select something that:
- Minimizes manual effort.
- Is easy to read/understand.
- Is something we can use in a lot of future projects.
Alrighty, so the next few sections will demonstrate how we can set up and use Terraform for our blog hosting.
Setting up Terraform
Assuming you have Terraform installed already (if you don’t, then please go do that now), you should create a directory like the following:
mkdir my-blog-terraform
Next, we want to create what Terraform calls our “main module”. It’s just a text file named main.tf, and it’s primarily responsible for setting up our AWS provider. Open up a file named my-blog-terraform/main.tf and add the following contents:
provider "aws" {
  profile = "personal"
  region  = "us-east-1"
}
The above tells Terraform “when you need to manage resources for this module, you need to do it by interacting with AWS, and all the credentials you need to do that are in an AWS profile named ‘personal’, and oh yeah, unless I say otherwise all the resources are in the us-east-1 region.”
That’s a lot for such a little bit of text, but that’s also why technologies like terraform are so powerful. You can do a lot of work with a little bit of typing.
Now, you may not have a profile configured yet for AWS. To do that you need to make sure that the AWS CLI is installed and create a named profile. I chose to use the name personal, but you can use whatever you want. If you DON’T use personal though, make sure you change the code above and throughout the guide to reflect that.
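For reference, a named profile is just an entry in your local AWS credentials file. A minimal ~/.aws/credentials entry would look something like this (the key values below are placeholders, not real credentials):

```ini
[personal]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

Rather than editing the file by hand, you can also run aws configure --profile personal and let the CLI prompt you for these values.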
To validate the above is working so far you can do:
cd my-blog-terraform
terraform init
terraform plan
What does this do? Well, first we make sure we are in the my-blog-terraform directory. Then we ask Terraform to initialize any other modules or dependencies we may have introduced; in this case, it’s just going to download the AWS provider itself. Finally, we run terraform plan, which reaches out to various AWS APIs to see if the resources our module is declaring exist or not. At this point, we haven’t declared any resources, and so the plan should be empty and report that there is no work to be done.
Using a third-party Module
As with programming in general, you typically have the choice to either build something yourself or leverage an existing component/framework/library. In most cases, you are going to always be better off with the existing component; standing on the shoulders of giants as it were.
Terraform has a concept called modules that allows for this type of reuse. Open your main.tf file again and add the following:
module "terrablog" {
  source  = "davidnewman/terrablog/aws"
  version = "0.1.0"

  site_bucket_name = "my-blog"
  domain_name      = "my-blog.com"
}
This is telling terraform “fetch a third-party module from the community repository named ‘davidnewman/terrablog/aws’, but make sure it’s version 0.1.0. Once you have it go ahead and create any resources that it defines, but use these input variables to customize it a bit.”
Again, quite the mouthful. We’ll unpack what the module is doing, but it’s important that you change the site_bucket_name and domain_name variables to something you have chosen for your blog. For domain_name, you will likely have to register one with Route53 in order to use the module.
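For context, a module can only be customized through variables it explicitly declares. Inside terrablog, the two inputs above are presumably declared along these lines (a sketch for illustration, not the module’s actual source):

```hcl
# Hypothetical input variable declarations for the terrablog module.
variable "site_bucket_name" {
  description = "Name of the S3 bucket that will store the site content"
}

variable "domain_name" {
  description = "Custom domain for the site, hosted in Route53"
}
```

Anything you pass to the module that it doesn’t declare as a variable would be an error, which is part of what makes modules safe to reuse.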
Similar to before, run the following commands once you are ready:
cd my-blog-terraform
terraform init
terraform plan
This time, you should see a whole bunch of output when you run the terraform plan command. This is because Terraform has looked at what you’ve got in AWS and decided that there are a whole bunch of resources the module wants that don’t yet exist. What it’s showing you is the difference between the set of resources you want (i.e. the declarations in the module) and the current state (i.e. what already exists in AWS).
State is a pretty big topic and we’ll come back to that in a bit. For now, just know that terraform is tracking what is in your AWS environment and is comparing that to the resources your modules declare in order to figure out what it has to do, if anything.
Assuming everything looks good in the plan, you should then be able to do the following:
terraform apply
This is going to tell Terraform that it’s time to make our AWS account match the plan. I.e. missing resources will be created, resources with differences will be updated to match the plan, etc. This is effectively going to modify the state Terraform is aware of, and so running terraform plan again should show that there is no longer any work to do.
At this point, assuming it’s all gone well, you should be able to load your site. You probably don’t have any content there, but that’s just a matter of uploading your content to the S3 bucket that was created.
Uploading some content to your site
If your Jekyll site is ready to be published, you should have a directory like my-blog/_site. This is the static site that was generated based on your Jekyll template, posts, etc. To upload this to your S3 bucket you can use the aws command as follows:
aws s3 cp ./my-blog/_site/ s3://bucket-name/ --recursive --profile personal
This command will copy all of the content for your generated site into the S3 bucket. Of course, if you aren’t ready to publish your site just go ahead and put up an image or a simple HTML page. We just want to make sure everything is working at this point.
What is the terrablog module doing?
I can’t possibly fit an entire tutorial on Terraform into this single post, but let’s take a look at some of the resources being created for us by the terrablog module. If you’re interested in seeing the code for the full module, you can see it on GitHub. For a deeper tutorial on Terraform, I recommend at least reading through some of the Terraform guides.
For now, let’s take a look at one of the key resources, the S3 bucket that stores our content:
# This creates the s3 bucket and configures it in a way to host
# a static website.
resource "aws_s3_bucket" "website" {
  bucket = "${var.site_bucket_name}"
  acl    = "public-read"
  policy = "${data.aws_iam_policy_document.bucket_policy.json}"

  website {
    index_document = "index.html"
    error_document = "404/index.html"
  }
}
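The policy attribute above points at a data source defined elsewhere in the module. A public-read bucket policy for a static site is typically built with an aws_iam_policy_document along these lines (a sketch of the usual pattern; the module’s actual policy may differ):

```hcl
# Sketch of a bucket policy allowing anyone to read (GET) objects,
# which is what a publicly served static website needs.
data "aws_iam_policy_document" "bucket_policy" {
  statement {
    sid       = "PublicReadGetObject"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::${var.site_bucket_name}/*"]

    principals {
      type        = "*"
      identifiers = ["*"]
    }
  }
}
```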
As you can see, Terraform’s syntax is highly declarative, i.e. you are describing how you want something to look. Terraform reads your declaration and then figures out how to make your AWS resources look that way.
In this case, we see that we are creating a resource of type aws_s3_bucket and we are going to refer to it later on in our module as website.
One thing you will notice about resources in Terraform is that their types are named in such a way that they tell Terraform what provider they are implemented by and what type of thing they are. For example, aws_s3_bucket is telling Terraform that the aws provider is responsible for implementing an s3_bucket. In theory, some other provider could have something that’s also called s3_bucket, and we don’t want it to get confused.
Inside the declaration of the aws_s3_bucket we are setting some attributes that further define how we want the bucket to behave. For example, we give the bucket a name, an Access Control List (acl), an IAM policy, etc.
Another question you may be asking is: “What’s the deal with that crazy syntax on the right-hand side of the attributes?”. Fair question. This is Terraform’s interpolation syntax, and it’s how we can make our resource declarations dynamic using variables and the output of other resources we are creating.
What Terraform will do is take a look at all of our variables and references between resources and determine a sequence in which the resources must be created, i.e. you can form dependencies between resources. For example:
resource "aws_cloudfront_distribution" "cdn" {
  origin {
    domain_name = "${aws_s3_bucket.website.website_endpoint}"
    origin_id   = "${var.site_bucket_name}"
  }

  # ... the rest of the distribution's configuration ...
}
The above is a snippet of the resource declaration for our CloudFront distribution. You can see that the domain_name attribute relies on some output from the aws_s3_bucket.website resource we created earlier. Terraform is smart enough to see this relationship and ensure that the S3 bucket is created before it attempts to create the CloudFront distribution.
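The same reference mechanism is how a module hands values back to whoever calls it. A module like terrablog could, for example, expose the CDN’s address with an output block like this (hypothetical; not necessarily in the module’s actual source):

```hcl
# Hypothetical output exposing the CloudFront domain name,
# e.g. so the calling module can wire up DNS or print it after apply.
output "cloudfront_domain_name" {
  value = "${aws_cloudfront_distribution.cdn.domain_name}"
}
```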
So if I can’t give a good tutorial on Terraform in this post, what’s the point of me showing you any of this at all? The point is just to show how much work you can get “for free” by leveraging third-party modules like terrablog.
Summary
Using simple, off-the-shelf tools like Jekyll and Terraform, you can host static sites easily and cheaply at scale. Does your blog get the kind of traffic that warrants this? I know mine certainly does not. And even if yours does, there are certainly other hosting options out there for you.
For me, the act of hosting my blog this way came down to a few things:
- I use terraform all the time at work for “real stuff” at scale and I wanted to leverage that skill in my personal life.
- I had not yet created a module for people outside of my company, and so I wanted to experience that process.
- I think it’s kinda cool.
Maybe these reasons jibe with you, and maybe they don’t. It was fun for me though :)