Let’s Encrypt is a service that offers free TLS (aka SSL) certificates. The certificates are recognized by all modern browsers. The only “disadvantage” of using Let’s Encrypt is that the certificates are only valid for 90 days and have to be renewed regularly, but the renewal process can be automated.

Depending on your environment, there are various ways to get initially set up with their certificates. You can get certificates for specific domains (e.g. www.example.com or staging.example.com) or wildcard certificates (*.example.com). Visit the Let’s Encrypt website to understand all of your options.

Assuming that you already have a Google Cloud project, have set up Google Cloud provider credentials for Terraform, and have bought a domain name, you can use the Let’s Encrypt certbot Docker image to get a wildcard certificate using the following process.

Note that the assumption is that this is a new domain name without an existing DNS setup. If you are migrating an existing domain name, you should read the Google Cloud documentation instead.

Terraform

Resource Code

First, create a new directory and then create a Terraform file like:

provider "google" {
  version = "~> 1.0"
  project = "${var.google_project}"
  region  = "${var.google_region}"
  zone    = "${var.google_zone}"
}

variable "google_project" {
  description = "The Google Cloud project to use"
}

variable "google_region" {
  description = "The Google Cloud region to use"
}

variable "google_zone" {
  description = "The Google Cloud zone to use"
}

variable "domain_name" {
  description = "The domain name to use"
}

resource "google_dns_managed_zone" "example_com" {
  name        = "example-com"
  dns_name    = "${var.domain_name}."
  description = "${var.domain_name} domain"
}

resource "google_project_iam_custom_role" "dns_owner" {
  role_id     = "dns_owner"
  title       = "DNS Owner"
  description = "Allows service account to manage DNS."

  permissions = [
    "dns.changes.create",
    "dns.changes.get",
    "dns.managedZones.list",
    "dns.resourceRecordSets.create",
    "dns.resourceRecordSets.delete",
    "dns.resourceRecordSets.list",
    "dns.resourceRecordSets.update",
  ]
}

resource "google_service_account" "letsencrypt_dns" {
  account_id   = "dns-letsencrypt"
  display_name = "Lets Encrypt DNS Service Account"
}

resource "google_project_iam_member" "project" {
  role   = "projects/${var.google_project}/roles/${google_project_iam_custom_role.dns_owner.role_id}"
  member = "serviceAccount:${google_service_account.letsencrypt_dns.email}"
}

resource "google_service_account_key" "letsencrypt_dns" {
  service_account_id = "${google_service_account.letsencrypt_dns.name}"
  public_key_type    = "TYPE_X509_PEM_FILE"
}

resource "local_file" "letsencrypt_credentials_json" {
  content  = "${google_service_account_key.letsencrypt_dns.private_key}"
  filename = "letsencrypt-credentials.json.base64"
}

The above config sets up the Google Cloud provider with a project, region, and zone via variables to be set later, and it creates a DNS managed zone on Google Cloud for the given domain name. You may want to rename some of the resource names (like example_com) to match your specific setup.

Note that for the dns_name, the value will need a trailing . (so the final value will be like example.com.).

The above config also creates a service account with a custom role that allows the service account to modify DNS records. Once the account is created, Terraform will store the (base64-encoded) service account key in a local letsencrypt-credentials.json.base64 file.

Variable Config

Create a terraform.tfvars file to fill in the variables with your specific config.

google_project = "project-id"
google_zone    = "us-central1-a"
google_region  = "us-central1"
domain_name    = "example.com"

Plan and Apply

To do the one-time initial Terraform provider setup, run:

terraform init

Then to create a plan for creating the resources:

terraform plan -out=terraform.plan

You may want to inspect the output of terraform plan to understand what resources are being created.
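
If you want to look at the saved plan again later, terraform show can read the plan file:

terraform show terraform.plan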

Run the following when ready to create the resources:

terraform apply terraform.plan
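
Once the apply finishes, you can optionally double-check the created custom role and service account with the gcloud CLI (assuming gcloud is installed and project-id is the same project ID used in terraform.tfvars):

# Show the custom role and its permissions
gcloud iam roles describe dns_owner --project project-id
# List service accounts to confirm dns-letsencrypt was created
gcloud iam service-accounts list --project project-id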

Set Up DNS Nameservers

You will need to have your domain registrar use the Google Cloud DNS nameservers. After applying the Terraform config, you can go to the Google Cloud Console under Network services > Cloud DNS. Find your domain name and get its DNS nameservers. Then go to your domain registrar and use all of the listed DNS nameservers (from the NS record, e.g. ns-cloud-b1.googledomains.com.).
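
If you prefer to stay in Terraform, a small output block can also surface the nameservers (a sketch that assumes the managed zone resource is still named example_com; re-run terraform apply to see the values):

# Expose the Cloud DNS nameservers assigned to the managed zone
output "name_servers" {
  value = "${google_dns_managed_zone.example_com.name_servers}"
}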

You may have to wait a few minutes to a day for the nameserver change to propagate.
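
To check whether the change has reached public DNS, you can query the NS records directly (replace example.com with your domain):

dig NS example.com +short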

Let’s Encrypt Certbot in Docker

By running the Let’s Encrypt certbot in Docker, you can finally get (and later renew) a wildcard certificate.

Make two directories for the Let’s Encrypt config and logs.

mkdir -p certs/config
mkdir -p certs/logs

Base64 decode the service account credentials into a file, and move the file into the config directory. (Note that -D is the macOS base64 decode flag; GNU base64 uses -d or --decode.)

cat letsencrypt-credentials.json.base64 | base64 -D > letsencrypt-credentials.json
mv letsencrypt-credentials.json certs/config/google-cloud-service-account-credentials.json

Then run the following, replacing the <absolute path to> placeholders with the actual absolute paths:

docker run -it --rm --name certbot -v "<absolute path to>/certs/config:/etc/letsencrypt" -v "<absolute path to>/certs/logs:/var/lib/letsencrypt" certbot/dns-google certonly --dns-google-credentials /etc/letsencrypt/google-cloud-service-account-credentials.json --server https://acme-v02.api.letsencrypt.org/directory

After running the command and answering a few questions, certbot will use the service account to create a DNS TXT record to verify domain ownership. Then it will issue a wildcard certificate for your domain. The certificate files and credentials will be stored in your certs/config directory.
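
If you want to confirm what was issued, certbot stores the certificate files under a live/<your domain> directory inside the mounted config volume, and openssl can print the details (a sketch; adjust the path for your domain):

openssl x509 -in "certs/config/live/<your domain>/fullchain.pem" -noout -subject -dates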

You can then re-run certbot when it is time to renew the certificates. Be sure to keep (and back up) a copy of the certs/* directories so you can re-use them later.
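
For example, a renewal run can reuse the same mounts with certbot’s renew subcommand (same placeholder paths as before):

docker run -it --rm --name certbot -v "<absolute path to>/certs/config:/etc/letsencrypt" -v "<absolute path to>/certs/logs:/var/lib/letsencrypt" certbot/dns-google renew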

Kubernetes Jobs are useful for one-off tasks. However, there are some problems when you have to define sidecar containers in your job spec. Primarily, the job’s pod will not terminate while the sidecar containers are still running, and sidecar containers such as logging agents or proxies for other services usually do not terminate on their own. Furthermore, the sidecar container must terminate with an exit code of 0 or else the job may be restarted.

One suggested solution is to have a script watch for the creation of a file on a shared volume. When the script detects the file, it exits, terminating the container. For instance, here is a sample job spec in which a sidecar container waits for a file to be created on a shared volume:

apiVersion: v1
kind: ConfigMap
metadata:
  name: watchfile-config-map
  labels:
    name: watchfile-config-map
data:
  watchfile.sh: |-
    apk update && apk add inotify-tools
    echo "waiting for file..."
    file=/var/lib/sharedwatchfile/file.unlock
    while [ ! -f "$file" ]
    do
      inotifywait -qqt 10 -e create -e moved_to "$(dirname $file)"
    done
    echo "found file"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  template:
    spec:
      containers:
      - name: db-migration
        image: <your job image>
        command: ["/bin/sh",
                  "-c",
                  "<run db migration script> && touch /var/lib/sharedwatchfile/file.unlock"]
        volumeMounts:
        - name: varlibsharedwatchfile
          mountPath: /var/lib/sharedwatchfile
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/bin/sh",
                  "-c",
                  "/cloud_sql_proxy -instances=<your db instance>=tcp:5432 -credential_file=/secrets/cloudsql/credentials.json & /bin/sh /var/lib/watchfile/watchfile.sh"]
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
        - name: varlibwatchfile
          mountPath: /var/lib/watchfile
          readOnly: true
        - name: varlibsharedwatchfile
          mountPath: /var/lib/sharedwatchfile
          readOnly: true
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: sql-kubernetes-proxy-credentials
      - name: varlibwatchfile
        configMap:
          name: watchfile-config-map
          items:
            - key: watchfile.sh
              path: watchfile.sh
      - name: varlibsharedwatchfile
        emptyDir: {}
      restartPolicy: Never
  backoffLimit: 4

inotifywait is used to be a bit more efficient than simply polling with sleep. In the above example, instead of modifying an existing image, the command that runs the sidecar container is slightly modified and the watch script is mounted into the sidecar container via a volume.
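
To try the job spec above, you can apply it and wait for the job to finish with kubectl (a sketch; job.yaml is simply whatever file name you saved the spec under):

# Create the job (job.yaml contains the spec above)
kubectl apply -f job.yaml
# Block until the job reports the Complete condition (or the timeout expires)
kubectl wait --for=condition=complete --timeout=600s job/db-migration
# Inspect the migration container's output
kubectl logs job/db-migration -c db-migration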

While watching for a file to be created is not exactly ideal, it is a quick workable hack until a general solution is available.

When using Terraform, I find that storing state remotely has great benefits. If you work with others or on multiple machines, remote state allows re-using Terraform-defined infrastructure without manually copying the state to every user or machine. More importantly, it allows a “core” set of resources to be defined and owned by one project while its root-level outputs remain re-usable in other related Terraform projects.

To store state remotely, add a backend configuration such as:

terraform {
  backend "s3" {
    bucket = "<your bucket name>"
    key = "default"
    region = "us-east-1"
  }
}

Then you need to run terraform init after adding the backend to your Terraform config.

To import remote state from another project (say you have a core infrastructure Terraform project), add a terraform_remote_state data source:

data "terraform_remote_state" "core_infrastructure" {
  backend = "s3"
  workspace = "${terraform.workspace}"
  config {
    bucket = "<bucket with state to import>"
    key = "default"
    region = "us-east-1"
  }
}

The core infrastructure I generally define includes DNS zones (so related projects can import the DNS managed zone identifier and create subdomains), wildcard SSL certificates for test domains, and repository definitions for where the code is stored.
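
Referencing the imported state then looks something like the following sketch, which keeps the pre-0.12 syntax used in the rest of this post and assumes the core project exposes a (hypothetical) output named dns_managed_zone_name:

# In the core infrastructure project, expose the managed zone name as an output:
output "dns_managed_zone_name" {
  value = "${google_dns_managed_zone.example_com.name}"
}

# In the consuming project, read the output from the imported remote state:
resource "google_dns_record_set" "staging" {
  managed_zone = "${data.terraform_remote_state.core_infrastructure.dns_managed_zone_name}"
  name         = "staging.example.com."
  type         = "A"
  ttl          = 300
  rrdatas      = ["<your ip address>"]
}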

If you have multiple users, you will also need to look into your backend’s support for remote state locking.
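
With the S3 backend, for example, locking can be enabled by pointing the backend at a DynamoDB table (a sketch; the table is assumed to already exist with a LockID string primary key, and the table name is a placeholder):

terraform {
  backend "s3" {
    bucket         = "<your bucket name>"
    key            = "default"
    region         = "us-east-1"
    # Table used for state locking; it needs a "LockID" string primary key
    dynamodb_table = "<your lock table name>"
  }
}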