Open Firehawk

Open Firehawk is an environment for creating an on-demand render farm for VFX with infrastructure as code. It uses Terraform to orchestrate resources, Ansible to configure them, and Vagrant (with VirtualBox) to provide the VMs these tools run within. A Linux or macOS host for the VMs is recommended at this time. Terraform can interface with many cloud providers; the current base implementation targets AWS. Firehawk uses resources that cost money, and the resource types were chosen because they were the most cost-effective for my use case. PRs for other resource options are welcome!

Intro

This document describes the steps you can follow to replicate Firehawk in another environment.

Some of this documentation covers what you will need to learn if you are a TD / Pipeline TD new to running cloud resources. I’d recommend learning Terraform and Ansible. It can help to put these tutorials on passively, without necessarily following the steps, just to expose yourself to the concepts and get an overview, but working through the steps yourself is better.

Here are some good paid video courses that I have taken on my own learning path:

Pluralsight:

Udemy:

Books:

Disclaimer: Running your own AWS account.

You are going to be managing these resources from an AWS account, and you are solely responsible for the costs incurred and for your own education in managing these resources responsibly. If you are new to AWS, proceed slowly until you understand how AWS charges work. The information I provide here is not perfect, but it is shared in a best effort to help others get started.

Getting Started

You will need two AWS accounts: one for the dev environment and one for the production environment. When operating, we make changes to the dev branch/environment and test them before we update the production environment. Occasionally a deployment may require changes unique to the production environment to be made on the fly; when that happens, we merge those changes back to dev.

With each of the accounts:

Following best practice for security and secrets management is important. Feel free to notify us if you see a security implementation that could be improved.

Permissions for the new user

Firehawk automates the creation of some user accounts, instances, images, VPN, NAS storage and other resources. For this to be possible, an AWS user with permission to create all of these resources must be created manually.
We will define the permissions for this new user (in each of the accounts). Later we will generate secret keys, stored in an encrypted file, that Terraform and Ansible use to create resources relying on these permissions. Attach the AWS managed policies below to the user, plus an inline policy that allows all Cost Explorer (ce:*) actions:

AmazonEC2FullAccess
IAMFullAccess
AmazonS3FullAccess
AmazonECS_FullAccess
AmazonRoute53FullAccess
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ce:*"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
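If you prefer the AWS CLI to the console, the permissions above can be granted with `aws iam attach-user-policy` (for the managed policies) and `aws iam put-user-policy` (for the inline policy). This Python sketch only prints those commands and the inline policy JSON rather than running them. The user name `firehawk-admin` and the inline policy name `CostExplorerFullAccess` are hypothetical placeholders, not names Firehawk expects.

```python
import json
from typing import List

# AWS managed policies to attach, as listed above.
MANAGED_POLICIES = [
    "AmazonEC2FullAccess",
    "IAMFullAccess",
    "AmazonS3FullAccess",
    "AmazonECS_FullAccess",
    "AmazonRoute53FullAccess",
]

# The inline policy above: allow every Cost Explorer ("ce") action.
COST_EXPLORER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ce:*"], "Resource": ["*"]}
    ],
}


def managed_policy_arn(name: str) -> str:
    """AWS managed policies live under the 'aws' account in their ARN."""
    return f"arn:aws:iam::aws:policy/{name}"


def setup_commands(user_name: str, policy_file: str = "ce-policy.json") -> List[str]:
    """Return (not run) the AWS CLI commands that grant the permissions."""
    commands = [
        f"aws iam attach-user-policy --user-name {user_name} "
        f"--policy-arn {managed_policy_arn(name)}"
        for name in MANAGED_POLICIES
    ]
    # Hypothetical inline policy name; any name unique to the user works.
    commands.append(
        f"aws iam put-user-policy --user-name {user_name} "
        f"--policy-name CostExplorerFullAccess "
        f"--policy-document file://{policy_file}"
    )
    return commands


if __name__ == "__main__":
    # Save this JSON as ce-policy.json before running the last command.
    print(json.dumps(COST_EXPLORER_POLICY, indent=4))
    # "firehawk-admin" is a hypothetical user name; substitute your own.
    for command in setup_commands("firehawk-admin"):
        print(command)
```

Printing rather than executing keeps the step reviewable: you can read each command before pasting it into a shell configured with an account that is allowed to manage IAM.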

AWS Images

Some of the images used have an associated cost. Firehawk is not paid to recommend these; they were chosen because, at the time of writing, they are believed to be the most economical, fairly scalable and automated choices available with good support. Not all of these images are open source, but they can be replaced.

Subscribe to these images (AMIs); once you have agreed to their terms, they can be used with automation.

If you are new to AWS, experiment with launching instances from these images, then destroying the instances and their security groups.

Keybase / PGP Keys

Install Keybase on your phone or PC, and head to keybase.io to create an account. Keybase is the easiest way to create a secure PGP key, allowing secrets to be encrypted using your email as a reference to a public key. Only devices that have been authorised with your private key can decrypt secrets, and you can easily initialise new devices from your phone. You can use your own key if you don’t wish to use Keybase.
Terraform uses PGP encryption when it creates new AWS users with AWS secret keys. PGP encryption ensures that the shell output is not readable by anyone except someone authorised with the PGP key. Terraform needs this ability so that it can create users whose keys allow a remote system to access S3 cloud storage. Those systems can write, read and list the contents of bucket storage, unlike the admin account, which can do far more. For that reason, the admin account credentials should only reside on the ansible_control VM.
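When Terraform’s `aws_iam_access_key` resource is given a `pgp_key` (for example `keybase:your_username`), it exposes the secret only as `encrypted_secret`, a base64-encoded PGP message. A minimal Python sketch of decrypting it, assuming the Keybase CLI is installed and logged in on this device and that `keybase pgp decrypt` reads the message from stdin. How you obtain the base64 string (for example via `terraform output`) depends on your own configuration.

```python
import base64
import subprocess


def decode_encrypted_secret(encrypted_secret_b64: str) -> bytes:
    """Turn Terraform's base64 'encrypted_secret' into a binary PGP message."""
    # Drop whitespace/newlines that copying from shell output may introduce.
    return base64.b64decode("".join(encrypted_secret_b64.split()))


def decrypt_with_keybase(encrypted_secret_b64: str) -> str:
    """Decrypt the secret with the Keybase CLI (needs an authorised device)."""
    message = decode_encrypted_secret(encrypted_secret_b64)
    # Assumption: 'keybase pgp decrypt' reads the message from stdin.
    result = subprocess.run(
        ["keybase", "pgp", "decrypt"],
        input=message,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode().strip()
```

Only a device holding the private key can complete `decrypt_with_keybase`; on any other machine the Terraform output stays unreadable, which is the point of the PGP step.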

Vagrant

Vagrant is a tool that manages your initial VM configuration onsite. The VMs are defined in Ruby files (Vagrantfiles), giving us a consistent environment to launch our infrastructure from. We create two VMs, ansiblecontrol and firehawkgateway. The ansiblecontrol VM is where Terraform and Ansible provision outwards from, and it is where the secrets and keys need to reside. The firehawkgateway VM will be configured as a VPN gateway, and it will host the Deadline DB and the Deadline Remote Connection Server (RCS).

Vagrant workstation for the dev environment

When doing test deployments, we use separate VMs from production. This Vagrant VM creates a CentOS 7 VM with a GNOME GUI. To isolate your workstation from testing, it is recommended that you use this VM to simulate an isolated workstation in the dev environment. This protects your actual workstation from failed test deployments that would affect productivity. In the production environment, you would replace any IP addresses and SSH keys / passwords with those used for your actual workstation.

Vagrant up

This login information will be entered into your encrypted secrets file in later steps, and is only used temporarily until the login is replaced with an SSH key for the deployuser (which will also be created automatically). Once Firehawk has configured the SSH key, the password won’t be usable for SSH access anymore. Continued password-based SSH access is not recommended in a Firehawk deployment.

Thinkbox Usage Based Licensing

Deadline is free to use on instances that reside in AWS, but any onsite systems that render will require a licence. If you wish to use any other UBL licenses (e.g. Houdini Engine), they will also require your Thinkbox UBL URL and UBL activation code. These are entered in your encrypted secrets file and are used to configure the Deadline DB automatically upon install.

License servers

License servers should be configured on your network to issue floating licenses for any software you require. The VPN gateway and the routes configured should allow a cloud-based system to reach the license server at the address set by the environment variable TF_VAR_houdini_license_server_address in secrets/config. Render nodes can also use Deadline Usage Based Licensing to consume licenses on a per-hour basis (e.g. Houdini Engine, Mantra).
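Before relying on the VPN routes, it can help to confirm that the license server is reachable from the machine in question. This sketch reads `TF_VAR_houdini_license_server_address` from the environment (assuming it has been exported into your shell) and attempts a TCP connection. Port 1715, the usual default for SideFX’s `sesinetd` license server, is an assumption to adjust for your setup.

```python
import os
import socket

# Assumption: sesinetd (the Houdini license server) listens on TCP 1715.
DEFAULT_HOUDINI_LICENSE_PORT = 1715


def license_server_reachable(
    host: str, port: int = DEFAULT_HOUDINI_LICENSE_PORT, timeout: float = 3.0
) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


if __name__ == "__main__":
    host = os.environ.get("TF_VAR_houdini_license_server_address")
    if host:
        state = "reachable" if license_server_reachable(host) else "unreachable"
        print(f"{host}:{DEFAULT_HOUDINI_LICENSE_PORT} is {state}")
    else:
        print("TF_VAR_houdini_license_server_address is not set")
```

A successful connection only shows that the port is open through the VPN; it does not confirm that licenses are available.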

Side Effects API OAuth2 keys

If you intend to use Houdini, Firehawk uses keys provided by SideFX to query and download the latest daily and production builds from sidefx.com. It will query the current version, download it, install it, and also preserve that installer in S3 cloud storage, enabling you to lock infrastructure to a particular installation version if needed.
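The SideFX keys are an OAuth2 client ID and secret. Assuming the API uses the standard OAuth2 client-credentials grant (RFC 6749), a token request can be built with the Python standard library as below. `TOKEN_URL` is a hypothetical placeholder: take the real token endpoint from SideFX’s API documentation.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical placeholder; use the token URL from SideFX's API docs.
TOKEN_URL = "https://example.com/oauth2/token"


def basic_auth_header(client_id: str, client_secret: str) -> str:
    """HTTP Basic credentials for an OAuth2 confidential client."""
    raw = f"{client_id}:{client_secret}".encode()
    return "Basic " + base64.b64encode(raw).decode()


def token_request(
    client_id: str, client_secret: str, url: str = TOKEN_URL
) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": basic_auth_header(client_id, client_secret),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

The request is only built here, so no credentials leave your machine until you pass it to `urllib.request.urlopen`; the JSON response of a client-credentials exchange normally carries an `access_token` to send with later API calls.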

Replicate a Firehawk clone and manage your secrets repository

WARNING: NOT MAKING THE REPOSITORY PRIVATE IS A SECURITY RISK.

This provides a structure for your encrypted secrets and configuration, which exist outside of the firehawk submodule. The firehawk submodule is public, and it can exist as a fork or a clone. This allows the code to be shared while keeping configuration and secrets separate.

Configuration

These steps configure a setup in the ‘dev’ environment, in a separate folder, so you can test before you deploy to the ‘prod’ environment.
You will have two versions of your infrastructure: we make changes in dev branches and test them before merging and deploying to production.

WARNING: Never commit unencrypted secrets into a repository. You can also read here to remove data from a repository.

Saving costs with sleep

When we deploy to the cloud above, we specify whether or not to keep the storage EBS volumes. Set this explicitly so it is clear what will happen to those volumes when you put the deployment to sleep.

Destroying the deployment

Destroying resources manually

In your AWS console, check regularly for any resources that are running that shouldn’t be. If you need to destroy resources manually, here are particular areas to pay attention to:

Also observe your daily cost graphs to identify any resources you may not have caught.

Production

Security

Security isn’t a state that you should believe you have reached, but a process that requires continuous evaluation. The effort you put in should be proportional to the value you represent to an attacker, weighing their risk and effort against the reward. An AWS account is quite a prize, because it can be used to mine cryptocurrency or run other compute workloads. An attacker could also do harm by accessing client intellectual property or racking up a large bill for you. So the steps you take should be proportional to the value of the work you are performing, and as much should be done as is reasonably possible. If you observe a security concern, contact andrew@firehawkvfx.com.

The hosts your VMs reside on shouldn’t be exposed to other users’ web browsing, or sit exposed on the public internet (they should be behind a NAT gateway, which is normal for any system at home connected to the internet). If possible, put those systems on a different subnet from other devices on your network that you don’t control (guest Wi-Fi, non-work systems).

Ideally, to step up security further, there could be entirely separate systems (bare metal) dedicated solely to Firehawk provisioning and the VPN.
If those systems aren’t shared, this is a good step to take. You can also disable SSH access to the bare-metal host running Ansible for further protection. We have taken steps to make sure that Ansible and Terraform provisioning occurs on a separate VM from the one where the VPN and Deadline DB reside. We could go further and put each of those (Deadline and VPN) on its own bare metal. Single-purpose bare metal is more secure than a VM, because if a hypervisor is compromised, everything else on that system can be compromised.

We should be as difficult a target as reasonably possible, and we should have means to deactivate a vulnerability that might be actively used by an attacker.

For example:

It’s also important to keep your router firmware up to date (consider a regular reminder). Your router is a significant potential vulnerability between you and AWS. OpenVPN encrypts traffic before it passes through the router, but if the router is compromised, an attacker may gain enough information to establish those credentials for a man-in-the-middle attack.

We configure AWS to ignore all inbound communication to instances, for SSH or other ports, from anywhere but your own IP at the time of provisioning. You may encounter difficulty without a static IP, although it is possible to update the security groups with your changed IP on each Terraform apply. If an attacker acquired your secret keys, they could use them to alter resources. If you have a static IP, you can alter the policies for your AWS remote access account to deny access from anywhere but that IP.
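Restricting inbound rules to your own IP means turning your current public address into a single-host CIDR block. A minimal sketch, assuming you then pass the result to Terraform through an environment variable such as `TF_VAR_remote_ip_cidr` (a hypothetical name; use whatever variable your configuration declares). `checkip.amazonaws.com` is an AWS endpoint that returns the caller’s public IP as plain text.

```python
import ipaddress
import urllib.request

# AWS service that replies with the caller's public IP as plain text.
CHECKIP_URL = "https://checkip.amazonaws.com"


def public_ip(url: str = CHECKIP_URL, timeout: float = 5.0) -> str:
    """Fetch this network's current public IP address."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def single_host_cidr(ip: str) -> str:
    """Return a CIDR block matching exactly one address (/32 or /128)."""
    addr = ipaddress.ip_address(ip.strip())  # raises ValueError if not an IP
    prefix = 32 if addr.version == 4 else 128
    return f"{addr}/{prefix}"
```

For example, `single_host_cidr(public_ip())` returns something like `203.0.113.7/32`. Re-running it before each Terraform apply is one way to keep the security groups in step with a dynamic IP.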

Pointers on cost awareness:

Initially, run small tests that never use more than, say, 100GB of storage and that can run on light 2-core instances, so you get an understanding of costs. Cost management in AWS takes effort, and you should usually allow a day before you can see a breakdown of what happened (though it’s possible to implement more aggressive cost analysis with development).
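A rough estimate before a small test is simple arithmetic: compute cost is instances × hours × hourly rate, and storage cost is GB × monthly rate, pro-rated over the days the data exists. This sketch takes every price as an input, since rates vary by region and change over time; look them up on the AWS pricing pages. Treating a month as 30 days is a simplifying assumption, and data transfer, snapshots and other charges are ignored.

```python
def estimate_cost_usd(
    instances: int,
    hours: float,
    hourly_rate_usd: float,
    storage_gb: float,
    storage_rate_usd_per_gb_month: float,
    storage_days: float,
) -> float:
    """Rough cost of a test run: compute plus pro-rated storage.

    All rates are inputs; a month is approximated as 30 days, and data
    transfer, snapshots and other charges are not included.
    """
    compute = instances * hours * hourly_rate_usd
    storage = storage_gb * storage_rate_usd_per_gb_month * storage_days / 30
    return compute + storage
```

For example, `estimate_cost_usd(2, 10, rate, 100, gb_rate, 30)` estimates two instances running for ten hours plus 100GB kept for a month, once `rate` and `gb_rate` are filled in for your region. Comparing the estimate against the daily cost graphs a day later is a good way to build intuition.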

How To Create a Hosted Zone

If you want to access your VPN and other resources through a domain name, e.g. vpn.example.com, you can create a public hosted zone in Route53. Since this will be a permanent part of your infrastructure, you will need to do this manually. You can either transfer an existing domain to AWS (not recommended for dev if you use this domain in production; it is best placed in the production account) or purchase a new domain with a random name and a cheap extension (it doesn’t need to be .com; there are plenty of cheap alternatives).

Replacing resources with Terraform - taint

If you make changes to your infrastructure that you want to recover from, a simple way to replace resources is to destroy them with something like this…

S3 Bucket Cloud Storage size

You can keep tabs on an S3 bucket’s size with this command:
aws s3 ls s3://bucket_name --recursive | grep -v -E "(Bucket: |Prefix: |LastWriteTime|^$|--)" | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'
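The same sum can be done in Python, which avoids the quoting pitfalls of a shell one-liner. This sketch assumes each object line of `aws s3 ls --recursive` has the form date, time, size in bytes, key, and skips any other line. Alternatively, the AWS CLI can report a total itself with `aws s3 ls s3://bucket_name --recursive --summarize --human-readable`.

```python
import subprocess
from typing import Iterable


def total_bytes(ls_lines: Iterable[str]) -> int:
    """Sum the size column of `aws s3 ls --recursive` output.

    Object lines look like: 2021-03-04 10:22:33     123456 path/to/key
    """
    total = 0
    for line in ls_lines:
        # maxsplit=3 keeps keys containing spaces in one field.
        parts = line.split(maxsplit=3)
        # Only 4-field lines with a numeric size are objects; this skips
        # blank lines and "Total Objects:" / "Total Size:" summary lines.
        if len(parts) == 4 and parts[2].isdigit():
            total += int(parts[2])
    return total


def bucket_size_mb(bucket: str) -> float:
    """List the bucket with the AWS CLI and return its size in MB."""
    out = subprocess.run(
        ["aws", "s3", "ls", f"s3://{bucket}", "--recursive"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return total_bytes(out.splitlines()) / 1024 / 1024
```

Listing is paginated by the CLI behind the scenes, so on very large buckets this can take a while; S3 Storage Lens or CloudWatch bucket metrics are lighter options for regular monitoring.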