Four People, Four Datacenters, Three Thousand Servers

In 2015 I joined Irideos (then KPNQwest Italia) as a Cloud Architect. The job sounded fancy. The reality was four datacenters, roughly three thousand servers, a team of four engineers, and a budget that could generously be described as “creative.”

This is the story of how we made it work, and what I still carry from that experience today.

The constraint that changed everything

When you have 3,000 servers and 4 people, you don’t have the luxury of doing things manually. You don’t SSH into boxes. You don’t hand-craft configurations. You automate everything or you drown.

This wasn’t a philosophical choice about “Infrastructure as Code.” It was survival.

We rebuilt the entire operational stack from scratch:

Monitoring: Icinga2, because Nagios was showing its age and we needed something that could scale without a dedicated team to babysit it
Logging: ELK stack with centralized authentication, because when something breaks at 3 AM, you need to search logs, not read them
Identity: FreeIPA, because managing credentials across 3,000 servers with spreadsheets is a form of professional self-harm
Automation: Puppet initially, then Ansible, orchestrated through Satellite and GitLab CI/CD
DNS: PowerDNS for recursion, BIND for authoritative, including DDoS mitigation that we learned about the hard way

The lesson I keep relearning

Looking back, the best infrastructure decisions we made came from having no money, not from being smart. When you can’t buy your way out of a problem, you have to actually think about it.

Every tool we chose had to earn its place. Every automation had to save more time than it cost to build. Every process had to be simple enough that any of the four of us could handle it at 3 AM after being woken up by a page.
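The "save more time than it cost to build" test is simple arithmetic, and it can be sketched in a few lines. This is an illustration, not a tool we actually ran: the function names and every number below are hypothetical, picked only to show the shape of the calculation.

```python
import math
from typing import Optional


def break_even_runs(build_hours: float,
                    manual_hours_per_run: float,
                    automated_hours_per_run: float = 0.0) -> Optional[int]:
    """Number of runs after which an automation has paid back its build cost.

    Returns None when the automated run is no faster than the manual one,
    because in that case the automation never pays for itself.
    """
    saved_per_run = manual_hours_per_run - automated_hours_per_run
    if saved_per_run <= 0:
        return None
    return math.ceil(build_hours / saved_per_run)


def net_hours_saved(build_hours: float,
                    manual_hours_per_run: float,
                    runs: int,
                    automated_hours_per_run: float = 0.0) -> float:
    """Hours saved (negative means lost) after a given number of runs."""
    return runs * (manual_hours_per_run - automated_hours_per_run) - build_hours


if __name__ == "__main__":
    # Hypothetical figures: two working days to build, 30 minutes per manual run.
    print(break_even_runs(build_hours=16, manual_hours_per_run=0.5))
    print(net_hours_saved(build_hours=16, manual_hours_per_run=0.5, runs=100))
```

With a small team the number that matters is usually how often the task recurs. A job that runs weekly across thousands of servers clears the bar quickly, while a once-a-year job may never do so.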

I still catch myself applying this filter: what is the simplest thing that actually works? Even at Google Cloud, where the temptation to throw services at a problem is constant and the catalog is infinite.

Then vs Now

graph LR
    subgraph THEN["2015 - Irideos"]
        A1["Puppet"] --> B1["Bare Metal"]
        C1["Icinga2"] --> B1
        D1["GitLab CI"] --> A1
    end
    subgraph NOW["2026 - Google Cloud"]
        A2["Terraform"] --> B2["GKE / Cloud Run"]
        C2["Cloud Monitoring"] --> B2
        D2["CI/CD Pipeline"] --> A2
    end
    THEN -.->|"Same principles<br/>different tools"| NOW
    style THEN fill:#44475a,stroke:#ffb86c,color:#f8f8f2
    style NOW fill:#44475a,stroke:#8be9fd,color:#f8f8f2

What this taught me about cloud

When I moved to cloud consulting (first at Red Hat, then at Google), I kept running into the same problem we had at Irideos: how do you keep things running at scale without hiring an army?

The tools are different (Terraform instead of Puppet, GKE instead of bare metal, Cloud Monitoring instead of Icinga2) but the underlying question is the same. And honestly, the companies that struggle most with cloud are the ones that never had to work under real constraints. They have budget, so they buy complexity. Then they need 20 people to operate what 4 people could handle with simpler choices.

This is the first post on this blog. More war stories, technical deep-dives, and unsolicited opinions to follow.