Kill Your Service Account Keys: Secure GitLab CI/CD on Google Cloud

If your CI/CD pipeline authenticates to Google Cloud with a service account key stored in a CI variable, you have a problem. You might not know it yet, but you have a problem.

That JSON key file is a static credential. It doesn’t expire (unless you rotate it, which you don’t). It has no context about who or what is using it. If it leaks (and CI variables leak more often than anyone admits), an attacker gets the same access your pipeline has. Forever, or until someone notices.

So I built a POC to try the alternative: a keyless, signed, vulnerability-gated pipeline from GitLab to Google Cloud. No service account keys. No secrets stored in CI variables.

The Architecture

The pipeline does six things, in order:

graph LR
    A["OIDC\nFederation"] --> B["Build"]
    B --> C["Push to\nArtifact Registry"]
    C --> D["Vulnerability\nScan"]
    D --> E["Binary\nAuthorization"]
    E --> F["Deploy\nto QA"]
    F -.->|manual| G["Promote\nto PROD"]
    style A fill:#44475a,stroke:#50fa7b,color:#f8f8f2
    style B fill:#44475a,stroke:#8be9fd,color:#f8f8f2
    style C fill:#44475a,stroke:#8be9fd,color:#f8f8f2
    style D fill:#44475a,stroke:#ffb86c,color:#f8f8f2
    style E fill:#44475a,stroke:#ff79c6,color:#f8f8f2
    style F fill:#44475a,stroke:#50fa7b,color:#f8f8f2
    style G fill:#44475a,stroke:#bd93f9,color:#f8f8f2

Each stage has a security purpose:

OIDC Federation: GitLab authenticates to Google Cloud using a JWT token. No keys.
Build: Container image built with Docker-in-Docker, pushed to GitLab’s own registry first.
Push to Artifact Registry: Image copied to Google Artifact Registry, authenticated via Workload Identity Federation.
Vulnerability Scan: Google Container Analysis scans the image. If it finds critical vulnerabilities, the pipeline stops.
Binary Authorization: If the scan passes, the image gets cryptographically signed. Unsigned images cannot be deployed.
Deploy: Cloud Deploy creates a release to QA. Production requires manual promotion, and the Binary Authorization policy will reject unsigned images.

The entire infrastructure is defined in Terraform. No ClickOps.

Workload Identity Federation: the Key That Isn’t a Key

sequenceDiagram
    participant GL as GitLab CI
    participant OIDC as GitLab OIDC
    participant WIF as Workload Identity<br/>Federation
    participant GCP as Google Cloud APIs
    GL->>OIDC: Request JWT token
    OIDC-->>GL: JWT with claims<br/>(project, role, ref)
    GL->>WIF: Present JWT token
    WIF->>WIF: Validate issuer,<br/>map claims to attributes
    WIF-->>GL: Short-lived<br/>access token
    GL->>GCP: Authenticated API call
    Note over GL,GCP: No service account key involved

This is the core idea. Instead of giving GitLab a service account key, you tell Google Cloud: “trust JWT tokens issued by GitLab, but only for this specific project, and only with these specific claims.”

The Terraform configuration creates an identity pool and maps GitLab’s JWT claims to Google Cloud attributes:

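resource "google_iam_workload_identity_pool_provider" "gitlab_provider" {
  # The required pool and provider IDs are left out of this excerpt.

  # Map claims from GitLab's token to attributes that IAM bindings can match on.
  attribute_mapping = {
    "google.subject"             = "assertion.sub"
    "attribute.project_path"     = "assertion.project_path"
    "attribute.developer_access" = "assertion.developer_access"
    "attribute.ref"              = "assertion.ref"
  }

  # Only tokens signed by this issuer are accepted.
  oidc {
    issuer_uri = "https://auth.gcp.gitlab.com/oidc/${var.gitlab_namespace}"
  }
}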

What this gives you is contextual, ephemeral authentication. Every pipeline run gets a short-lived token scoped to exactly what it needs. The token expires on its own (typically within an hour), so a leaked one is only useful for a short window. There’s no long-lived key to steal, rotate, or leak.
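The "only for this specific project" part can be enforced on the provider itself with an attribute_condition. Here is a minimal sketch of that idea, not necessarily how the POC repository does it. The gitlab_project_path variable and the "gitlab-pool" and "gitlab-provider" IDs are names I made up for illustration.

```hcl
# Hypothetical input: full path of the only GitLab project allowed to
# authenticate, for example "my-group/my-project".
variable "gitlab_project_path" {
  type = string
}

variable "gitlab_namespace" {
  type = string
}

resource "google_iam_workload_identity_pool" "gitlab_pool" {
  workload_identity_pool_id = "gitlab-pool" # assumed ID
}

resource "google_iam_workload_identity_pool_provider" "gitlab_provider" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.gitlab_pool.workload_identity_pool_id
  workload_identity_pool_provider_id = "gitlab-provider" # assumed ID

  # CEL expression checked against each incoming token. Tokens issued for
  # any other project fail this check and get no Google Cloud credentials.
  attribute_condition = "assertion.project_path == \"${var.gitlab_project_path}\""

  attribute_mapping = {
    "google.subject"         = "assertion.sub"
    "attribute.project_path" = "assertion.project_path"
  }

  oidc {
    issuer_uri = "https://auth.gcp.gitlab.com/oidc/${var.gitlab_namespace}"
  }
}
```

Putting the check on the provider means a token from the wrong project is rejected at the door, regardless of which IAM bindings exist further down.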

The IAM bindings are granular too. A developer pushing code gets artifactregistry.writer and logging.logWriter. A guest gets artifactregistry.reader. The permissions follow the GitLab role, not a static key that has everything.
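A binding like that can be written against a principalSet that matches a mapped attribute rather than a service account key. The sketch below assumes direct resource access (the repository may use service account impersonation instead), and it assumes the developer_access claim arrives as the string "true". The variable names are hypothetical.

```hcl
variable "project_id" {
  type = string
}

# Assumption: full resource name of the pool, for example
# "projects/123456789/locations/global/workloadIdentityPools/gitlab-pool".
variable "workload_identity_pool_name" {
  type = string
}

locals {
  # Every federated identity whose mapped developer_access attribute is "true".
  gitlab_developers = "principalSet://iam.googleapis.com/${var.workload_identity_pool_name}/attribute.developer_access/true"
}

# Grant the two developer roles named in the text to that principal set.
resource "google_project_iam_member" "developer_roles" {
  for_each = toset([
    "roles/artifactregistry.writer",
    "roles/logging.logWriter",
  ])

  project = var.project_id
  role    = each.value
  member  = local.gitlab_developers
}
```

A guest binding would follow the same pattern with roles/artifactregistry.reader, provided the corresponding guest claim is also mapped to an attribute on the provider.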

The Deliberate Vulnerability

graph TD
    A["git push<br/>alpine:3.14.2"] --> B["Build"]
    B --> C["Vulnerability Scan"]
    C -->|"CVEs found"| D["BLOCKED"]
    D --> E["Binary Auth<br/>refuses to sign"]
    E --> F["Deploy FAILS"]
    A2["git push<br/>alpine:3.18"] --> B2["Build"]
    B2 --> C2["Vulnerability Scan"]
    C2 -->|"Clean"| D2["PASSED"]
    D2 --> E2["Image signed"]
    E2 --> F2["Deploy succeeds"]
    style D fill:#ff5555,stroke:#ff5555,color:#f8f8f2
    style F fill:#ff5555,stroke:#ff5555,color:#f8f8f2
    style D2 fill:#50fa7b,stroke:#50fa7b,color:#282a36
    style F2 fill:#50fa7b,stroke:#50fa7b,color:#282a36

This is the part I like most about this demo. The Dockerfile intentionally uses a vulnerable base image:

FROM alpine:3.14.2

Alpine 3.14.2 has known CVEs. When the pipeline runs for the first time, the vulnerability scan catches them, so the image never gets signed and Binary Authorization rejects it as unsigned. The deployment fails, by design.

The fix is simple:

FROM alpine:3.18

Push this change, and the pipeline turns green. The scan passes, the image gets signed, the deployment succeeds. I’ve found this works way better than any slide deck for getting people to care about supply chain security. When they see the pipeline actually block a deploy, it clicks.

Binary Authorization: Trust, but Verify (and Sign)

Binary Authorization is the mechanism that ties everything together. It implements a simple policy: no container runs in production unless it has been scanned and signed.

The Terraform setup creates a GPG key pair, a Container Analysis note, and an attestor. The policy then requires an attestation from that attestor before anything is admitted:

resource "google_binary_authorization_policy" "policy" {
  default_admission_rule {
    evaluation_mode  = "REQUIRE_ATTESTATION"
    enforcement_mode = "ENFORCED_BLOCK_AND_AUDIT_LOG"
    require_attestations_by = [
      google_binary_authorization_attestor.vulnz_attestor.name
    ]
  }
}
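The policy refers to vulnz_attestor, which in turn needs a Container Analysis note and a public key to verify signatures against. A rough sketch of those two resources might look like this. The resource names for the note, the display strings, and the attestor_public_key variable are assumptions of mine; how the POC actually generates and wires in the GPG key is not shown here.

```hcl
# Hypothetical input: the ASCII-armored PGP public key of the signing key pair.
variable "attestor_public_key" {
  type = string
}

# The note is the anchor that attestations (signatures) are attached to.
resource "google_container_analysis_note" "vulnz_note" {
  name = "vulnz-note" # assumed name

  attestation_authority {
    hint {
      human_readable_name = "Vulnerability scan attestor"
    }
  }
}

# The attestor ties the note to the public key used to verify attestations.
resource "google_binary_authorization_attestor" "vulnz_attestor" {
  name = "vulnz-attestor" # assumed name

  attestation_authority_note {
    note_reference = google_container_analysis_note.vulnz_note.name

    public_keys {
      ascii_armored_pgp_public_key = var.attestor_public_key
    }
  }
}
```

Only the public half of the key lives in the attestor. The private half stays wherever the signing step can reach it, which is what keeps the signature meaningful.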

The pipeline’s sign stage uses the GPG private key (stored in Secret Manager, not in CI variables) to create an attestation. Cloud Run checks this attestation before allowing the container to run.

This closes the loop: even if someone pushes an image directly to Artifact Registry, bypassing the pipeline, Binary Authorization will block it from running. The policy is enforced at the platform level, not at the pipeline level.

Progressive Delivery

The deploy stage targets QA automatically. Production promotion is manual: you go to the Cloud Deploy console, review the release, and click “Promote.” This is intentional. Fully automated production deployments are a luxury you earn after your team trusts the process.

The Cloud Deploy pipeline has two targets:

QA: automatic deployment after successful scan and signing
PROD: manual promotion with Binary Authorization enforcement

Both targets run on Cloud Run, which keeps the infrastructure cost near zero for a demo.
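In Terraform, a two-target pipeline like this comes down to two targets and one delivery pipeline that lists them in order. The following is a minimal sketch under my own assumptions: the names "qa", "prod" and "secure-cicd" and the project_id and region variables are illustrative, not taken from the repository.

```hcl
variable "project_id" {
  type = string
}

variable "region" {
  type = string
}

# Both targets deploy to Cloud Run in the same project and region.
resource "google_clouddeploy_target" "qa" {
  name     = "qa"
  location = var.region

  run {
    location = "projects/${var.project_id}/locations/${var.region}"
  }
}

resource "google_clouddeploy_target" "prod" {
  name     = "prod"
  location = var.region

  run {
    location = "projects/${var.project_id}/locations/${var.region}"
  }
}

# Stage order defines the promotion path: QA first, then PROD.
resource "google_clouddeploy_delivery_pipeline" "pipeline" {
  name     = "secure-cicd" # assumed name
  location = var.region

  serial_pipeline {
    stages {
      target_id = google_clouddeploy_target.qa.name
    }
    stages {
      target_id = google_clouddeploy_target.prod.name
    }
  }
}
```

A new release lands on the first stage only. Moving it to the second stage is a separate promote action, which is what keeps production a deliberate step.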

What I Learned Building This

Stop using service account keys in CI/CD. Seriously. Workload Identity Federation exists, it works, and once you’ve Terraformed it, it’s done. The setup is more complex upfront, but that’s a one-time cost.

Break things on purpose. The deliberately vulnerable Dockerfile is the most useful part of this demo. People nod politely when you explain supply chain security. They actually pay attention when they see a deploy get blocked.

Terraform all of it. The entire setup is about 400 lines of HCL. Reproducing it in a different project takes minutes. Doing it by hand in the console would take hours and you’d forget a step.

Think about security before production, not after. Most organizations bolt on security after deployment: WAFs, monitoring, incident response. But if your pipeline doesn’t verify what it deploys, your monitoring is just a faster way to find out you’ve been compromised.

The full source code is available at gitlab.com/thekoma/secure-gitlab-gcp-cicd. It’s MIT-licensed and designed to be forked and adapted.