Triforge: I Deployed an AI Model Without Clicking a Single Button

[!WARNING] The AWS SageMaker instance backing this demo has been intentionally shut down to control infrastructure costs.
As a result, some endpoints may return 500 errors or fail to respond.
The case study, architecture, infrastructure code, and deployment workflow remain fully valid.

“I was done with ClickOps. I wanted infrastructure I could rebuild, version, and trust.”

Live Demo: chat.elijahu.me | Code: git.new/triforge | Model: Helsinki-NLP/opus-mt-en-es

The Problem

I have used the AWS console. I have clicked through SageMaker’s UI. I have filled in notebooks, configured endpoints through dropdowns, forgotten what I did two weeks later, and had to start all over again.

Every time I wanted to reproduce something I had already built, I was clicking through menus trying to remember which instance type, which container image, and which IAM role I used. There was no source of truth. No way to version it. No clean way to hand it to someone else and say, “Run this.” Just vibes, muscle memory, and the sinking feeling that I was about to waste another afternoon doing the same thing again.

I was sick of it.

Triforge is my answer to that.

Three IaC tools. One AI model. Zero GUI. Everything reproducible from a single git clone.

What Is Triforge?

A multi-tool IaC pipeline that deploys an English-to-Spanish translation AI inference endpoint on AWS using Pulumi, Terraform, and OpenTofu — each owning a specific layer of the infrastructure. No tool overlap. No spaghetti. Surgical separation of concerns.

Architecture:

User → chat.elijahu.me (HTTPS) → nginx → Flask proxy
Flask → AWS Secrets Manager (gets endpoint name)
Flask → SageMaker endpoint (sends text, gets translation)
State → S3 bucket (remote-state-h1)

Tool split:

Pulumi (Python) → SageMaker endpoint + IAM + Secrets Manager
Terraform (HCL) → EC2 instance + Security Groups + Elastic IP
OpenTofu (HCL) → S3 remote state backend
Flask (Python) → Proxy server bridging the UI and SageMaker

A Note on the Model (and Why the Endpoint is Still Called “Whisper”)

I started this project planning to use OpenAI Whisper — a speech-to-text model. It was a bad idea. Whisper on a CPU instance has the latency of a man thinking very hard about a question he doesn’t know the answer to. Every request felt like it aged me slightly. I swapped it out for Helsinki-NLP/opus-mt-en-es, a translation model from the University of Helsinki’s language technology research group. Same HuggingFace inference container, different task, dramatically faster on ml.m5.large.

The endpoint is still named triforge-whisper-endpoint in the code because renaming it mid-project would have broken the Secrets Manager reference and I had more important things to fix. It haunts me. I have made peace with it.

Genuine credit to the Helsinki-NLP team — their opus-mt models are compact, fast, and do exactly what they say on the tin. The HuggingFace model card is actually readable, which is more than I can say for most ML documentation.

Why Three Tools?

Because everyone picks one and calls it a day. I wanted to understand the philosophy behind each one, not just the syntax.

Terraform is the mother of IaC. Battle-tested, massive community, HCL is readable by people who have never touched it before. It owns the EC2 because spinning up a server is exactly the kind of standard, declarative task HCL was designed for. Boring in the best way possible. The Terraform AWS provider docs are actually decent — I used them constantly for the EC2 and networking sections.

OpenTofu is the interesting backstory. When HashiCorp moved Terraform from an open-source license to the Business Source License in 2023, the community forked it and OpenTofu was born. It started life as a drop-in replacement, free as in freedom. I gave it the S3 backend because someone has to manage the state bucket and it felt right to let the FOSS fork hold everything together. There is something poetic about that. opentofu.org if you want the full story.

Pulumi is the one I actually enjoyed. It throws HCL out entirely and lets you write real Python — real loops, real conditionals, real functions. No more count hacks or for_each workarounds that make you question why you chose this career. SageMaker has a lot of moving parts and I did not want to fight HCL to express conditional logic. The Pulumi AWS SageMaker docs are good — clear examples, well structured, nothing like the AWS native docs. If you know Python you can pick Pulumi up in a weekend. The AWS resource knowledge is the same across all three tools anyway.

Installing the Tools

I am on Arch. My installs look different from yours. The concepts are identical.

Pulumi — the AUR package was outdated so I went straight to the official install script:

curl -fsSL https://get.pulumi.com | sh
export PATH="$HOME/.pulumi/bin:$PATH"

My internet was being absolute dogshit during this. I spent time in the AUR hole before I gave up and just used the script. Learn from my suffering.

OpenTofu:

sudo pacman -Syu opentofu

Ubuntu users: use your package manager or pull the official binary. If your network is also terrible:

tar -xzf tofu_*_linux_*.tar.gz
sudo mv tofu /usr/local/bin/
sudo chmod +x /usr/local/bin/tofu

Extract and enjoy.

Terraform:

sudo pacman -Syu terraform
terraform version

I aliased tf to terraform in my zsh config because typing the full name every time is a crime against productivity. OpenTofu needs no alias: its binary is already called tofu. Dotfiles are on my GitHub if you want to steal them.

Remote State First: The Chicken and Egg Problem

Before writing a single resource I bootstrapped the S3 bucket with the AWS CLI. An IaC tool cannot write its state to a bucket that does not exist yet, so the thing that stores the state has to come first. I worked this out myself through pure logic and it clicked much harder than if I had just read it somewhere, so I am telling you directly: bootstrap the bucket first, manually, before you touch any of the tools. Do not skip this step.

aws s3api create-bucket --bucket remote-state-h1 --acl private --region us-east-1

Then point Pulumi at it:

pulumi login 's3://remote-state-h1?region=us-east-1'

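The single quotes are there because zsh treats ? as a glob character and fails with "no matches found" without them. Bash passes an unmatched pattern through unchanged by default, so the command works unquoted there, but quoting is the safer habit in both shells.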
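If you would rather script the bootstrap than type it, here is a minimal Python sketch of the same step. It is not part of the Triforge repo. The function names are made up for illustration, the client argument is expected to be a boto3 S3 client, and turning on versioning is my own assumption (a common safety net for state buckets), not something the CLI command above does.

```python
def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build the arguments for the S3 create_bucket call.

    us-east-1 is the one region where S3 rejects an explicit
    LocationConstraint, so it is left out there.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def bootstrap_state_bucket(s3_client, bucket: str, region: str) -> None:
    """Create the state bucket, then enable versioning.

    Versioning is an added assumption: it lets you roll back a
    state file that a bad apply has overwritten.
    """
    s3_client.create_bucket(**create_bucket_kwargs(bucket, region))
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
```

With boto3 installed and credentials configured, the real call would be bootstrap_state_bucket(boto3.client("s3", region_name="us-east-1"), "remote-state-h1", "us-east-1"). Passing the client in keeps the function easy to exercise with a stub before pointing it at a real account.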

Building the SageMaker Endpoint with Pulumi

Three resources chain together in order:

aws.sagemaker.Model → aws.sagemaker.EndpointConfiguration → aws.sagemaker.Endpoint

The reason I used native Pulumi resources instead of the SageMaker Python SDK directly: state. If you create infrastructure with the SDK, Pulumi never tracks it. Run pulumi destroy later and that endpoint just sits there on AWS billing you while you wonder what went wrong. Pulumi can only destroy what it created through Pulumi. Do not mix the two.

Finding the container URI was genuinely the most annoying part of this entire project. The AWS SageMaker docs for available container images are one of the worst reading experiences I have had in a while — tables within tables, outdated URIs, no clear way to filter by framework or task. I eventually stopped fighting the documentation and just queried ECR directly:

aws ecr list-images \
  --registry-id 763104351884 \
  --repository-name huggingface-pytorch-inference \
  --region us-east-1 \
  --query 'imageIds[*].imageTag' \
  --output text | tr '\t' '\n' | grep cpu | tail -20

Got the tag I needed, built the URI:

763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04-v1.4

Learn the CLI. Use it before you reach for the AWS docs.
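The URI is just four parts glued together: registry account, region, repository, and tag. A small Python sketch of that assembly, including the same "keep only CPU tags" filter the grep does, might look like this. The helper names are hypothetical and the GPU tag in the sample list is a placeholder, not a real image tag.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Assemble a full ECR image URI from its four parts."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def cpu_tags(tags: list) -> list:
    """Keep only CPU image tags, mirroring the `grep cpu` step."""
    return [tag for tag in tags if "cpu" in tag]


# Sample input: one placeholder GPU tag and the CPU tag used in this project.
tags = [
    "placeholder-gpu-tag",
    "2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04-v1.4",
]

uri = ecr_image_uri(
    "763104351884",
    "us-east-1",
    "huggingface-pytorch-inference",
    cpu_tags(tags)[-1],  # last CPU tag in the list, like `tail`
)
print(uri)
```

Keeping the parts separate makes it obvious what has to change if you move regions or bump the container version.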

A pattern I noticed while writing Pulumi code that nobody told me:

Every Pulumi AWS resource I touched follows the same sandwich structure. The bread is always aws.sagemaker.WhateverResource(...). The filling, the nested config object inside it, borrows the parent’s name, appends the name of the property it fills, and adds Args at the end. Model plus primary_container → ModelPrimaryContainerArgs. EndpointConfiguration plus production_variants → EndpointConfigurationProductionVariantArgs. You rarely have to guess. Just look at what you’re inside and start from the parent name.

Once I figured this out I stopped tabbing through autocomplete looking confused and actually started writing code at a normal speed.
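To make the naming pattern concrete, here is a tiny Python sketch that derives the Args class name from a resource name and a property name. Treat it as a reading aid for the docs, not a rule Pulumi guarantees: the function is hypothetical, and it assumes you pass the property in its singular form (production_variant, not production_variants).

```python
def nested_args_name(resource: str, prop: str) -> str:
    """Guess the Args class name for a nested property.

    resource: resource class name, e.g. "Model"
    prop: singular snake_case property name, e.g. "primary_container"
    """
    # snake_case -> CamelCase: "primary_container" -> "PrimaryContainer"
    camel = "".join(part.capitalize() for part in prop.split("_"))
    return f"{resource}{camel}Args"


print(nested_args_name("Model", "primary_container"))
print(nested_args_name("EndpointConfiguration", "production_variant"))
```

Both printed names match the classes used in the Pulumi code in this post.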

The complete Pulumi code:

import json
import pulumi
import pulumi_aws as aws

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "sagemaker.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }
    ]
}

sagemaker_role = aws.iam.Role(
    "sagemakerRole",
    assume_role_policy=json.dumps(trust_policy),
    description="Role assumed by SageMaker for inference"
)

aws.iam.RolePolicyAttachment(
    "sagemakerManagedAttach",
    role=sagemaker_role.name,
    policy_arn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
)

sagemaker_model = aws.sagemaker.Model(
    "sagemakermodel",
    execution_role_arn=sagemaker_role.arn,
    primary_container=aws.sagemaker.ModelPrimaryContainerArgs(
        image="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04-v1.4",
        environment={
            "HF_MODEL_ID": "Helsinki-NLP/opus-mt-en-es",
            "HF_TASK": "translation",
        }
    )
)

sagemaker_endpoint_configuration = aws.sagemaker.EndpointConfiguration(
    "sagemakerendpointconfiguration",
    production_variants=[aws.sagemaker.EndpointConfigurationProductionVariantArgs(
        instance_type="ml.m5.large",
        model_name=sagemaker_model.name,
        variant_name="default",
        initial_instance_count=1,
    )]
)

sagemaker_endpoint = aws.sagemaker.Endpoint(
    "sagemakerendpoint",
    name="triforge-whisper-endpoint",
    endpoint_config_name=sagemaker_endpoint_configuration.name
)

secret_resource = aws.secretsmanager.Secret("secretResource",
    description="SageMaker Endpoint name",
    name="triforge-endpoint-secret",
    recovery_window_in_days=0,
)

aws.secretsmanager.SecretVersion("secretVersionResource",
    secret_id=secret_resource.id,
    secret_string=sagemaker_endpoint.name,
)

pulumi.export("endpoint", sagemaker_endpoint.name)
pulumi.export("role_arn", sagemaker_role.arn)

I attached AmazonSageMakerFullAccess because this is a PoC and I know exactly what that means. In production you scope this down to the minimum permissions the role actually needs. I am not going to pretend I did not take the lazy route here.

pulumi up result:

pulumi up — 8 resources created in 4 minutes

pulumi destroy — 8 resources deleted in 30 seconds

8 resources up in 4 minutes. 8 resources destroyed in 30 seconds. That is the feeling I was chasing the entire time I was clicking through the AWS console.

Building the EC2 with Terraform

Standard stuff, but with one addition that matters: an Elastic IP. Without it, the EC2 gets a fresh public IP every time the instance is replaced or stopped and started. That means updating GitHub secrets, updating DNS, updating your own mental model every time you redeploy. An Elastic IP keeps the address stable until you release it. Set this up from the start, not after you have already updated your DNS records three times.

Quick note on ingress and egress since people mix these up:

Ingress → traffic coming into your server. HTTP, HTTPS, SSH. Someone else is knocking on your door.
Egress → traffic going out of your server. The server is the one knocking, making calls to external things.

On the 0.0.0.0/0 on SSH — yes I know. PoC. In production, lock SSH to your actual IP. Port 22 open to the entire internet is not a flex.
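To show what "lock SSH to your actual IP" means in terms of rules, here is a small Python sketch that models ingress rules as plain data and flags any rule that opens port 22 to the whole internet. This is illustration only: the class and function names are made up, the real rules live in the Terraform HCL, and 203.0.113.7 is an address from the reserved documentation range standing in for your own IP.

```python
from dataclasses import dataclass

ANYWHERE = "0.0.0.0/0"  # every IPv4 address


@dataclass(frozen=True)
class IngressRule:
    port: int
    cidr: str
    description: str = ""


def world_open_ssh(rules):
    """Return the rules that expose port 22 to every IPv4 address."""
    return [rule for rule in rules if rule.port == 22 and rule.cidr == ANYWHERE]


# The PoC setup described above: web ports and SSH all open to the world.
poc_rules = [
    IngressRule(80, ANYWHERE, "HTTP"),
    IngressRule(443, ANYWHERE, "HTTPS"),
    IngressRule(22, ANYWHERE, "SSH, PoC only"),
]

# A production-leaning variant: same web ports, SSH from one address only.
locked_rules = [rule for rule in poc_rules if rule.port != 22] + [
    IngressRule(22, "203.0.113.7/32", "SSH from a single admin IP"),
]

print(len(world_open_ssh(poc_rules)), len(world_open_ssh(locked_rules)))
```

A check like this could sit in CI as a cheap guard, though a dedicated policy tool would be the sturdier choice.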

The EC2 gets an IAM instance profile with Secrets Manager and SageMaker permissions. No credentials hardcoded anywhere. The role handles it completely.
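The policy document itself is not shown in this post, so here is a hedged Python sketch of what a tightly scoped version could look like: one statement to invoke the endpoint, one to read the secret. The function name and the account ID 123456789012 are placeholders, and the actual Terraform policy in the repo may differ.

```python
import json


def proxy_policy(region: str, account_id: str, endpoint_name: str, secret_name: str) -> dict:
    """Build an IAM policy that lets the proxy do exactly two things."""
    endpoint_arn = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    # Secrets Manager appends a hyphen and six random characters to the
    # secret name in its ARN, hence the six-character wildcard.
    secret_arn = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{secret_name}-??????"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": endpoint_arn,
            },
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            },
        ],
    }


policy = proxy_policy(
    "us-east-1",
    "123456789012",  # placeholder account ID
    "triforge-whisper-endpoint",
    "triforge-endpoint-secret",
)
print(json.dumps(policy, indent=2))
```

Scoping Resource to the specific endpoint and secret means a compromised proxy can call one model and read one secret, nothing else.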

Handing the S3 Bucket to OpenTofu

The bucket existed already from the CLI bootstrap. OpenTofu needed to take ownership without recreating it, which is what an import block is for:

resource "aws_s3_bucket" "name" {}

import {
  to = aws_s3_bucket.name
  id = "remote-state-h1"
}

tofu init && tofu apply

Output: 1 imported, 0 added. OpenTofu owns the bucket. Terraform and OpenTofu each write their state to different keys inside it — compute/terraform.tfstate and storage/terraform.tfstate. One bucket, clean separation.

The Flask Proxy

Static HTML cannot call AWS Secrets Manager. Flask sits in between — reads the SageMaker endpoint name from Secrets Manager, forwards translation requests, returns results. nginx proxies /invoke to Flask and serves everything else as static files.

The thing I will not forget: do not cache the secret. Every pulumi destroy and pulumi up can rotate the underlying endpoint even if the name stays the same. If you cache the old value you get 500 errors until you SSH in and restart Flask manually. Reading fresh from Secrets Manager on every request adds about 50ms. That is a completely acceptable trade to avoid debugging an error that has nothing to do with your actual application.
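Here is a minimal Python sketch of that "read fresh, then invoke" flow, with the Flask routing left out so the core logic stands alone. It is not the code from the repo. The function names are hypothetical, the two client arguments are expected to be boto3 clients for secretsmanager and sagemaker-runtime, and the response shape (a list holding one dict with a translation_text key) is an assumption based on how the HuggingFace translation pipeline usually answers.

```python
import json

SECRET_ID = "triforge-endpoint-secret"  # secret name from the Pulumi code


def current_endpoint_name(secrets_client) -> str:
    """Read the endpoint name on every call. Deliberately no caching."""
    return secrets_client.get_secret_value(SecretId=SECRET_ID)["SecretString"]


def translate(text: str, secrets_client, runtime_client) -> str:
    """Send text to the SageMaker endpoint and return the translation."""
    response = runtime_client.invoke_endpoint(
        EndpointName=current_endpoint_name(secrets_client),
        ContentType="application/json",
        Body=json.dumps({"inputs": text}),
    )
    # The response body is a stream of bytes; json.loads accepts bytes.
    result = json.loads(response["Body"].read())
    # Assumed shape: [{"translation_text": "..."}]
    return result[0]["translation_text"]
```

A Flask route for /invoke would then be a thin wrapper: pull the text out of the request JSON, call translate with the two clients, and return the result. Keeping the clients as parameters also makes the function testable without touching AWS.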

The CI/CD Pipeline

triforge.yml runs on every push to main. Four jobs:

pre-environment-setup — installs all three IaC tools on the runner
deploy-iac — pulumi up, terraform apply, tofu apply in sequence
configure-ec2 — SSHs in, pulls latest code, restarts Flask, reloads nginx
cleanup-crew — wipes the workspace

There is also triforge-destroy.yml, manually triggered via workflow_dispatch. When I am done demoing, I trigger it from my phone and the SageMaker endpoint disappears. No idle instance burning money while I sleep.

The hardest part was keeping Flask alive after the SSH session closed. nohup and disown both got killed by appleboy’s ssh-action doing its cleanup. I was annoyed about this for longer than I should have been. The fix was converting Flask to a proper systemd service — the pipeline now just runs sudo systemctl restart triforge and moves on. One line. No background process drama.

What I Actually Learned

Remote state is non-negotiable. If state lives locally the pipeline cannot see it. If it lives on the GitHub Actions runner it dies when the runner does. S3 is always on. This is not a debate.

IAM before everything else. Every service that talks to another needs a role. Write the trust policy, create the role, attach the policy, then build the resource that uses it. Do it out of order and you will spend an embarrassing amount of time debugging “unauthorized” errors wondering if you have lost your mind.

Name your resources explicitly or you will regret it. Every pulumi up was creating a new endpoint with a random suffix and the Secrets Manager entry was still pointing to the old name. 500 errors on every deploy. It took me too long to figure out why. Adding name="triforge-whisper-endpoint" to the endpoint resource fixed it permanently. Explicit names. Every time.

The AWS docs are bad and querying the CLI is faster. I said it. I have no notes.

CPU inference has a ceiling. ml.m5.large on a translation model is fine for a demo. For anything with real users, budget for a GPU instance or accept that your response times will make people think something is broken.

Credits

This project used real work from real people and I want to be specific about that:

Helsinki-NLP and the University of Helsinki language technology group — opus-mt-en-es is compact, fast, and the model card is actually readable. The model is doing all the real work in this demo.
HuggingFace — the inference containers and model hub made SageMaker deployment significantly less painful than it would have been otherwise.
Pulumi AWS docs — genuinely good documentation. Clear examples, well organised. Rare.
Terraform Registry — solid for EC2 and networking resources.
OpenTofu — for existing and being the fork the community deserved.
HuggingFace Deep Learning Containers — more useful than the AWS docs for finding container URIs.
appleboy/ssh-action — the tool that killed my nohup processes and indirectly forced me to learn systemd properly. Thanks I guess.

What’s Next

This is the first piece of a much longer plan. Next up is zero trust pipelines and ephemeral CI environments: the same infrastructure wrapped in a proper staging → test → destroy → deploy-to-prod workflow.

That writeup is coming. Find me on Twitter or LinkedIn when it drops.

About the Author

Elijah Udom (elijahu) is an Infrastructure & Cloud Engineer based in Lagos, Nigeria. AWS, Kubernetes, eBPF security, AI/ML infrastructure. Building in the open.
