Infrastructure to Build Infrastructure

In the previous post I explained why the new cluster runs Proxmox instead of XCP-ng. This one is the migration itself — specifically, how it started: the Terraform and Ansible that stood up the first guest, what running them actually looked like, and the moment the code finally had a home somewhere other than my laptop.

Rebuild, don’t convert

The obvious-sounding plan is to export each VM from XCP-ng and import it into Proxmox. I didn’t do that. The two platforms use different disk formats and guest tooling, and — as the last post covered — Proxmox’s LXC model means many of these guests stop being full VMs at all. A cross-hypervisor disk conversion would have dragged years of per-VM cruft along with it.

Instead, every service was re-provisioned from scratch on Proxmox and then cut over. The rule from the first post applies here: if it isn’t captured as code, it didn’t happen. Hosts defined in Terraform, configured with Ansible. That rule has a consequence the first post didn’t spell out: the code needs somewhere to live. GitLab was first.

GitLab went first

Two reasons. The old GitLab instance on XCP-ng was already shut down — nothing to preserve, no cutover to coordinate, a clean greenfield. And I wanted the infrastructure code to live somewhere other than my laptop before I started depending on it. That’s a circular problem with an obvious solution: stand up GitLab first, then push the code there. (The title is intentional. My name is recursive too.)

Two commits, Sunday evening

The initial state was two commits made back-to-back on May 31. One for Terraform, one for Ansible.

The Terraform commit was a skeleton. Seven files:

terraform/
├── .gitignore
└── proxmox/
    ├── .terraform.lock.hcl
    ├── gitlab.tf
    ├── main.tf
    ├── providers.tf
    ├── terraform.tfvars.example
    └── variables.tf

providers.tf wired up two providers — the bpg/proxmox provider and the hashicorp/random provider for the generated root password:

terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = ">= 0.60.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

provider "proxmox" {
  endpoint  = var.proxmox_api_url
  api_token = var.proxmox_api_token
  insecure  = var.proxmox_insecure
}

variables.tf had four variables:

variable "proxmox_api_url" {
  description = "The URL for the Proxmox API (e.g., https://192.168.1.100:8006/)"
  type        = string
}

variable "proxmox_api_token" {
  description = "The API token for Proxmox in the format user@realm!tokenid=uuid"
  type        = string
  sensitive   = true
}

variable "proxmox_insecure" {
  description = "Whether to ignore SSL certificate validation errors"
  type        = bool
  default     = true
}

variable "proxmox_target_node" {
  description = "The Proxmox node to deploy resources to"
  type        = string
  default     = "proxmox-mini01"
}

terraform.tfvars.example was three lines — placeholder values to copy into terraform.tfvars and fill in before running anything:

proxmox_api_url   = "https://YOUR_PROXMOX_IP:8006/"
proxmox_api_token = "root@pam!terraform=your-token-uuid"
proxmox_insecure  = true

proxmox_insecure = true is a temporary workaround. Proxmox ships with a self-signed certificate, so TLS verification against its API fails until a trusted cert is in place on the Proxmox host itself. Once one is, this can be flipped to false.

The actual terraform.tfvars with real credentials stays out of git via .gitignore. The .example file is what gets committed — it documents the shape without committing the secrets.

And gitlab.tf, which is the whole point of the commit:

resource "proxmox_download_file" "debian_12_lxc_template" {
  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.proxmox_target_node
  url          = "http://download.proxmox.com/images/system/debian-12-standard_12.2-1_amd64.tar.zst"
}

resource "random_password" "gitlab_container_password" {
  length  = 16
  special = true
}

resource "proxmox_virtual_environment_container" "gitlab_container" {
  description = "GitLab Server (Managed by Terraform)"
  node_name   = var.proxmox_target_node

  unprivileged = true
  features {
    nesting = true   # GitLab's internal systemd services need it
  }

  initialization {
    hostname = "gitlab-server"
    user_account {
      password = random_password.gitlab_container_password.result
    }
    ip_config {
      ipv4 { address = "dhcp" }
    }
  }

  cpu    { cores = 4 }
  memory { dedicated = 8192 }
  disk {
    datastore_id = "local-lvm"
    size         = 32
  }

  network_interface {
    name    = "eth0"
    vlan_id = 22
  }

  operating_system {
    template_file_id = proxmox_download_file.debian_12_lxc_template.id
    type             = "debian"
  }
}

output "gitlab_root_password" {
  value     = random_password.gitlab_container_password.result
  sensitive = true
}

A few things in there are worth calling out. nesting = true is required for GitLab: LXC containers share the host kernel, and GitLab runs its own internal service supervisor. Without nesting, those sub-processes fail silently inside the container. The network address comes from DHCP, which was simpler while the pattern was still being worked out. And the root password is generated and emitted as a sensitive output, so it can be retrieved with terraform output -raw gitlab_root_password and used for the first SSH login.

The Ansible commit was even smaller. Three files: a .gitignore, an inventory, and a 33-line install playbook.

ansible/
├── .gitignore
├── install_gitlab.yml
└── inventory.yml

inventory.yml:

all:
  children:
    gitlab:
      hosts:
        gitlab.tod.net:
          ansible_user: tod
          ansible_become: yes

install_gitlab.yml:

- name: Install GitLab CE
  hosts: gitlab
  become: yes
  tasks:
    - name: Install prerequisites
      apt:
        name: [curl, openssh-server, ca-certificates, tzdata, perl]
        state: present
        update_cache: yes

    - name: Download GitLab repository installation script
      get_url:
        url: https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh
        dest: /tmp/gitlab_script.deb.sh
        mode: '0755'

    - name: Run GitLab repository installation script
      command: /tmp/gitlab_script.deb.sh
      args:
        creates: /etc/apt/sources.list.d/gitlab_gitlab-ce.list

    - name: Install GitLab CE
      apt:
        name: gitlab-ce
        state: present
      environment:
        EXTERNAL_URL: "http://gitlab.tod.net"

Nothing clever. Prerequisites, add the package repo, install gitlab-ce with the external URL set so the GitLab installer configures nginx for the right hostname on first run.¹

`terraform apply`

Copy terraform.tfvars.example to terraform.tfvars, fill in the Proxmox API URL and token, run terraform apply. The provider downloads the Debian 12 LXC template to local storage on mini01, creates the container, starts it. The container comes up on DHCP on VLAN 22 and pulls an address. GitLab stayed on DHCP with a manual fixed-client reservation in UniFi — intentionally, and predating the Terraform-managed reservation pattern that later services would use. The reasoning: if the network changes, DHCP propagates it automatically rather than requiring manual updates everywhere.² That’s still how GitLab is configured today.
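
The copy, fill-in, apply sequence at the start of this section can be sketched as a small shell function. This is an illustration, not what was actually run: the function name provision_gitlab and the plan file name gitlab.tfplan are made up for the example, and terraform init is included because a fresh checkout needs it before the first apply. The placeholder check keys off the strings in terraform.tfvars.example.

```shell
#!/bin/sh
# Sketch of a first provisioning run. Assumes the terraform CLI is on PATH
# and the terraform/proxmox/ layout from the first commit.
# provision_gitlab and gitlab.tfplan are hypothetical names.
provision_gitlab() (
    cd "${1:-terraform/proxmox}" || exit 1

    # Seed the real variables file from the committed template, once.
    [ -f terraform.tfvars ] || cp terraform.tfvars.example terraform.tfvars

    # Refuse to continue while the template placeholders are still in place.
    if grep -q -e 'YOUR_PROXMOX_IP' -e 'your-token-uuid' terraform.tfvars; then
        echo "edit terraform.tfvars before applying" >&2
        exit 1
    fi

    # Download the providers declared in providers.tf.
    terraform init || exit 1
    # Write the plan to a file so it can be reviewed.
    terraform plan -out=gitlab.tfplan || exit 1
    # Apply exactly the plan that was reviewed.
    terraform apply gitlab.tfplan
)
```

Run from the repo root, provision_gitlab with no argument works in terraform/proxmox. Saving the plan and applying that file means what gets applied is what was reviewed. The plan file can contain sensitive values, so it belongs in .gitignore alongside terraform.tfvars.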

The generated root password is now in Terraform state. To get it:

$ terraform output -raw gitlab_root_password
<redacted>

Find the DHCP address in the Proxmox UI (the container’s Network tab shows it once the OS has booted), SSH in as root with the generated password, and you’re on a fresh Debian 12 container. The Proxmox side is done. Now Ansible.

Installing GitLab

The inventory.yml already had gitlab.tod.net pointing at the tod user, but at this point the only user that existed on the container was root, and its address was simply wherever the DHCP lease had landed. So the first Ansible run went against the IP directly, as root:

ansible-playbook install_gitlab.yml -i inventory.yml \
  -e "ansible_user=root ansible_host=<dhcp-ip>"

(The -e ansible_host= override applies to all hosts in the targeted group — which is fine here because the gitlab group in inventory.yml has exactly one host.)

GitLab CE’s install is one of those installs that takes long enough to be uncomfortable — package download, database setup, nginx and Puma configuration, the whole initial reconfigure. When it finishes, http://gitlab.tod.net loads the new-instance setup screen. HTTP — no cert yet.

Getting a cert (manually)

GitLab on HTTP is fine for a few minutes while you verify it loaded. It’s not fine for anything you’re going to push code to. Getting a cert meant doing it by hand first, then figuring out how to automate it.

The instance isn’t publicly reachable, so HTTP-01 validation is out. DNS-01 via Cloudflare is the answer — certbot proves domain ownership by creating a TXT record through the Cloudflare API rather than by serving a file over HTTP.³ As root on the container:

apt install python3-certbot-dns-cloudflare
mkdir -p ~/.secrets/certbot/
vi ~/.secrets/certbot/cloudflare.ini     # dns_cloudflare_api_token = <token>
chmod 600 ~/.secrets/certbot/cloudflare.ini
chmod go-rx ~/.secrets/certbot/
certbot certonly --dns-cloudflare \
  --dns-cloudflare-credentials ~/.secrets/certbot/cloudflare.ini \
  -d gitlab.tod.net
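
DNS-01 validation works through a TXT record at _acme-challenge.<domain>. Below is a small sketch for looking at that record by hand while certbot is waiting on propagation. The function names are hypothetical, and it assumes dig is installed, which is not part of the commands above.

```shell
# Name of the TXT record an ACME DNS-01 challenge uses for a domain.
# A leading "*." (wildcard certificate) is dropped, as the ACME spec requires.
acme_challenge_name() {
    domain=${1#\*.}
    printf '_acme-challenge.%s\n' "$domain"
}

# Query the record. Assumes dig is available. Prints nothing if the record
# does not exist (certbot removes it again once validation finishes).
show_acme_challenge() {
    dig +short TXT "$(acme_challenge_name "$1")"
}

# Example:
#   show_acme_challenge gitlab.tod.net
```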

GitLab CE manages its own nginx and expects certs in a specific location under /etc/gitlab/ssl/. One change to gitlab.rb is required: flip external_url from http:// to https://. That single edit is what tells GitLab Omnibus to enable its SSL listener — it then reads the cert files, configures nginx, and updates every internal link and OAuth redirect URI to use HTTPS automatically.⁴ That last part — no hunting through config files to find every http:// reference — is what made it feel almost magical. A deploy hook in certbot’s renewal pipeline keeps the cert files current on every renewal:

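#!/bin/bash
# /etc/letsencrypt/renewal-hooks/deploy/gitlab-deploy.sh
# Runs after each successful renewal: copy the new cert and key to where
# GitLab's bundled nginx expects them, then reconfigure.
set -euo pipefail   # stop if a copy fails rather than reconfiguring anyway
cp /etc/letsencrypt/live/gitlab.tod.net/fullchain.pem /etc/gitlab/ssl/gitlab.tod.net.crt
cp /etc/letsencrypt/live/gitlab.tod.net/privkey.pem   /etc/gitlab/ssl/gitlab.tod.net.key
gitlab-ctl reconfigure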

With the hook in place: certbot renew --force-renewal to test it, check that https://gitlab.tod.net loads with a valid cert, done. (It took a few gitlab-ctl reconfigure runs and a wrong cert path, cert.pem instead of fullchain.pem, before it landed cleanly. The bash history on that box is archaeological evidence of the fumbling.)
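
The cert.pem versus fullchain.pem mix-up is easy to check for mechanically: certbot's cert.pem holds only the leaf certificate, while fullchain.pem holds the leaf plus the intermediate chain. Here is a rough sketch, with hypothetical function names, that counts the certificates in whatever file ended up in /etc/gitlab/ssl/:

```shell
# Count the certificates in a PEM file.
pem_cert_count() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Succeeds only if the file holds the leaf plus at least one intermediate.
looks_like_fullchain() {
    [ "$(pem_cert_count "$1")" -ge 2 ]
}

# Example (on the GitLab container):
#   looks_like_fullchain /etc/gitlab/ssl/gitlab.tod.net.crt \
#     || echo "only one certificate: copied cert.pem by mistake?"
```

This only counts PEM blocks. It does not validate the chain, so it complements loading the site in a browser rather than replacing it.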

None of this was in code yet. It lived in root’s bash history and a shell script that hadn’t been committed anywhere. That’s exactly the problem the rule is supposed to prevent — a week later, a session with Claude walked through what had been built manually and turned it into an Ansible role: install certbot, drop the Cloudflare credentials from vault, issue the cert, wire up the deploy hook. The next service that needed a cert got it from Ansible instead of from memory.

Configuring it

With GitLab running, the next session was about making it actually useful — OAuth sign-in so it wasn’t just password-gated. That meant a second Ansible playbook, gitlab.yml.

The playbook is two plays. The first runs the certbot-cloudflare role to issue the TLS cert — pulling that manual cert work from the previous section into code. The second installs and configures GitLab CE itself, including OmniAuth. Running them together means a fresh GitLab instance goes from bare container to HTTPS + SSO in one shot.

The directory structure at this point:

ansible/
├── inventory.yml
├── install_gitlab.yml       # initial install (from first commit)
├── gitlab.yml               # cert + OAuth configuration
├── group_vars/
│   └── gitlab/
│       ├── vars.yml         # non-secret variables; vault references live here
│       └── vault.yml        # ansible-vault encrypted secrets
└── roles/
    └── certbot-cloudflare/  # TLS cert issuance + deploy hook

Ansible loads group_vars/gitlab/ automatically for any host in the gitlab inventory group. vault.yml is encrypted at rest with ansible-vault: it’s ciphertext in the repo, decrypted at runtime with the password supplied through --vault-id. vars.yml acts as an indirection layer between the playbook and the vault: tasks reference plain names like gitlab_google_client_id, which resolve to vault_gitlab_google_client_id. The vault variable names never appear in task code, so the playbook works the same whether the values come from vault, from --extra-vars, or from a CI secret. The Cloudflare API token for certbot follows the same pattern, as the last line of vars.yml shows:

gitlab_external_url: "https://gitlab.tod.net"

gitlab_entra_tenant_id: "{{ vault_gitlab_entra_tenant_id }}"
gitlab_entra_client_id:  "{{ vault_gitlab_entra_client_id }}"
gitlab_entra_client_secret: "{{ vault_gitlab_entra_client_secret }}"

gitlab_google_client_id:     "{{ vault_gitlab_google_client_id }}"
gitlab_google_client_secret: "{{ vault_gitlab_google_client_secret }}"

certbot_domain:    "gitlab.tod.net"
certbot_ssl_dir:   "/etc/gitlab/ssl"
certbot_reload_command: "gitlab-ctl hup nginx"
certbot_cloudflare_api_token: "{{ vault_certbot_cloudflare_api_token }}"
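
Since vault.yml is meant to be ciphertext in the repo, it is worth guarding against committing it in plaintext by accident. Files encrypted with ansible-vault start with a $ANSIBLE_VAULT; header line, so the check can be as small as the sketch below. The function name is hypothetical and nothing like it exists in the repo described here. It could be called from a pre-commit hook.

```shell
# Succeeds if the file's first line is the ansible-vault header,
# e.g. "$ANSIBLE_VAULT;1.1;AES256".
is_vault_encrypted() {
    head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}

# Example use in a pre-commit hook:
#   is_vault_encrypted group_vars/gitlab/vault.yml \
#     || { echo "vault.yml is not encrypted" >&2; exit 1; }
```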

The OmniAuth configuration is the interesting part.⁵ GitLab’s gitlab.rb uses Gitlab::ConfigMash, which doesn’t support Ruby’s << operator — you can’t append to omniauth_providers incrementally. The entire array has to be assigned in one shot. The playbook handles this with an accumulator pattern: one set_fact task per provider builds up _omniauth_providers_ruby, a list of Ruby hash literals; a single blockinfile task at the end writes the complete array. Each provider task is also guarded by when conditions that check whether the credentials are defined and non-placeholder, so you can deploy with only Google, only Entra ID, or both, without touching the task structure.⁶⁷

In practice, only Google OAuth2 ran. The Entra ID vault entries are still placeholders — I’m still deciding how much I want to lean on Entra for homelab SSO, and the when guards mean the task silently skips until real credentials land in the vault. The playbook is ready; the app registration in Azure is not.

- name: Register Google OAuth2 provider
  set_fact:
    _omniauth_allow: "{{ _omniauth_allow | default([]) + [\"'google_oauth2'\"] }}"
    _omniauth_providers_ruby: "{{ _omniauth_providers_ruby | default([]) + [_ruby] }}"
  vars:
    _ruby: >-
      { name: "google_oauth2", label: "Sign in with Google",
        app_id: "{{ gitlab_google_client_id }}",
        app_secret: "{{ gitlab_google_client_secret }}",
        args: { access_type: "offline", approval_prompt: "" } }
  when:
    - gitlab_google_client_id is defined
    - gitlab_google_client_id != "REPLACE_WITH_YOUR_GOOGLE_CLIENT_ID"
    - gitlab_google_client_secret is defined
    - gitlab_google_client_secret != "REPLACE_WITH_YOUR_GOOGLE_CLIENT_SECRET"

- name: Register Entra ID OIDC provider
  set_fact:
    _omniauth_allow: "{{ _omniauth_allow | default([]) + [\"'openid_connect'\"] }}"
    _omniauth_providers_ruby: "{{ _omniauth_providers_ruby | default([]) + [_ruby] }}"
  vars:
    _ruby: >-
      { name: "openid_connect", label: "Sign in with Microsoft",
        args: { name: "openid_connect", scope: ["openid", "profile", "email"],
          response_type: "code",
          issuer: "https://login.microsoftonline.com/{{ gitlab_entra_tenant_id }}/v2.0",
          client_auth_method: "query", discovery: true, uid_field: "sub",
          client_options: { identifier: "{{ gitlab_entra_client_id }}",
            secret: "{{ gitlab_entra_client_secret }}",
            redirect_uri: "{{ gitlab_external_url }}/users/auth/openid_connect/callback" } } }
  when:
    - gitlab_entra_tenant_id is defined
    - gitlab_entra_tenant_id != "REPLACE_WITH_YOUR_TENANT_ID"

- name: Configure OmniAuth providers in gitlab.rb
  blockinfile:
    path: /etc/gitlab/gitlab.rb
    marker: "# {mark} ANSIBLE MANAGED BLOCK - OmniAuth Providers"
    block: |
      gitlab_rails['omniauth_allow_single_sign_on'] = [{{ _omniauth_allow | default([]) | join(', ') }}]
      gitlab_rails['omniauth_providers'] = [
        {{ _omniauth_providers_ruby | default([]) | join(',\n    ') }}
      ]
  when: (_omniauth_providers_ruby | default([])) | length > 0
  notify: Reconfigure GitLab

The handler at the top of the play — gitlab-ctl reconfigure — only fires if a notify: Reconfigure GitLab task actually changed something. Two blockinfile tasks, two potential notifications, one reconfigure at the end if either changed. Running the playbook:

ansible-playbook -i inventory.yml gitlab.yml --vault-id gitlab@prompt

GitLab CE picked up the new config on its next reconfigure. OAuth sign-in worked. GitLab was now a usable instance, not just an installed one.

The first push

With GitLab live and configured, both repos needed to get off the laptop and onto it. glab auth login to authenticate against the local instance, create the remote repositories, then:

# from the Terraform repo's working tree
git remote add gitlab https://gitlab.tod.net/tod/homelab.git
git push gitlab main

# from the Ansible repo's working tree
git remote add gitlab https://gitlab.tod.net/tod/ansible.git
git push gitlab main

That’s the point the whole post was building toward. The Terraform that provisioned the GitLab container now lives on the GitLab instance that Terraform provisioned. The Ansible that installed GitLab is there too. From here, every subsequent service should get committed before it gets built, and the code has somewhere to go.

One thing I didn’t appreciate at the time — and didn’t appreciate until writing this post and putting it in front of an AI reviewer — is that the code is in GitLab, but the Terraform state file is not. The state file records every resource Terraform manages and is the source of truth for what actually exists; without it, Terraform loses track of the entire cluster. It lives on my laptop, gitignored. The recursion in the title isn’t quite complete. Backing it up properly is on the list.
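
Until the state moves somewhere sturdier, even a timestamped copy is better than nothing. This is a minimal sketch, assuming the default local backend (state in terraform.tfstate in the working directory). The function name and the default backup directory are hypothetical. Because the state contains the generated root password in plain text, the copy is made readable by its owner only.

```shell
# Sketch: timestamped, owner-only copy of a local Terraform state file.
# The default backup directory is a placeholder; point it at real backup storage.
backup_tfstate() {
    state_file="${1:-terraform.tfstate}"
    backup_dir="${2:-$HOME/backups/tfstate}"

    [ -f "$state_file" ] || { echo "no state file at $state_file" >&2; return 1; }

    # Keep the backup directory private: state stores secrets unencrypted.
    mkdir -p "$backup_dir" && chmod 700 "$backup_dir" || return 1

    dest="$backup_dir/terraform.tfstate.$(date +%Y%m%dT%H%M%S)"
    # umask 077 strips group/other permissions from the new copy.
    ( umask 077 && cp "$state_file" "$dest" ) || return 1

    # Print where the copy went so callers can log or sync it.
    echo "$dest"
}
```

Run as (cd terraform/proxmox && backup_tfstate) after each apply. A copy on the same laptop only protects against a bad edit, not a dead disk, so the backup directory should sit on storage that gets synced elsewhere.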

The next post covers what came next: a week building a cert automation approach that turned out to have a fatal flaw, then six days and three attempts trying to get FreeIPA running as a subordinate CA. Neither worked. Both were instructive.

AI-assistant disclaimer: This post was drafted by Claude Code (Claude Sonnet 4.6, claude-sonnet-4-6) from session transcripts, commit history, and the actual Terraform and Ansible in my repo, then reviewed and edited by me. It may contain inaccuracies; verify specifics before relying on them.

¹ GitLab CE installation guide for Debian/Ubuntu: https://about.gitlab.com/install/#debian. The EXTERNAL_URL environment variable is the documented way to set the hostname at install time.
² The “propagates automatically” argument has a hole: if the subnet changes, gitlab.tod.net in DNS still needs a manual update regardless of DHCP. The DNS side of the network is the piece I haven’t fully codified yet; a broader cleanup is on the list.
³ certbot-dns-cloudflare plugin documentation: https://certbot-dns-cloudflare.readthedocs.io/en/stable/. This is distinct from the HTTP-01 challenge: no public web server required.
⁴ GitLab Omnibus SSL configuration: https://docs.gitlab.com/omnibus/settings/ssl/. The external_url change from http:// to https:// is the documented trigger for enabling Omnibus’s built-in SSL handling.
⁵ GitLab OmniAuth configuration: https://docs.gitlab.com/ee/integration/omniauth.html. Covers global settings, provider registration, and the omniauth_providers array format.
⁶ GitLab Google OAuth2 setup: https://docs.gitlab.com/ee/integration/google.html. The access_type: "offline" and empty approval_prompt args are the recommended defaults for server-side OAuth flows.
⁷ GitLab Microsoft Azure Active Directory / Entra ID OIDC setup: https://docs.gitlab.com/ee/integration/azure.html. The uid_field: "sub" choice uses the stable OIDC subject identifier rather than email, which can change.