Prometheus HA Limitations and How Thanos Handles Them
Core Problem
Prometheus is stateful and does not replicate its TSDB. Running multiple Prometheus replicas does not give you replicated data — each replica scrapes and stores independently.
Why Replicas Diverge
Two Prometheus replicas pointing at the same targets will never produce identical data:
- Scrape timing: Each replica's scrape loop fires on its own schedule. Sample timestamps differ by milliseconds to seconds.
- Crash / restart windows: When replica A is down for a rolling restart, replica B keeps scraping. A's TSDB then has a gap that B's doesn't, and the reverse happens when B restarts.
- Network blips: A scrape failure on one replica produces a gap that the other may not have.
The result: replica A and replica B each hold a slightly different copy of "the truth" about your targets.
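The divergence can be illustrated with a small simulation. This is only a sketch under assumed numbers: the 15 s scrape interval, the 4.2 s offset between the two scrape loops, and the restart window are hypothetical values chosen for illustration, and `scrape_timestamps` is not a Prometheus API.

```python
def scrape_timestamps(offset_ms, interval_ms, start_ms, end_ms, down=()):
    """Return the scrape timestamps of one replica in [start_ms, end_ms).

    `down` lists (from_ms, to_ms) windows during which the replica
    was not scraping, for example during a rolling restart.
    """
    timestamps = []
    t = start_ms + offset_ms
    while t < end_ms:
        if not any(lo <= t < hi for lo, hi in down):
            timestamps.append(t)
        t += interval_ms
    return timestamps


# Hypothetical setup: replica A restarts between 30 s and 75 s,
# replica B scrapes on a schedule shifted by 4.2 s and never goes down.
replica_a = scrape_timestamps(0, 15_000, 0, 120_000, down=[(30_000, 75_000)])
replica_b = scrape_timestamps(4_200, 15_000, 0, 120_000)

print("A:", replica_a)
print("B:", replica_b)
print("shared timestamps:", sorted(set(replica_a) & set(replica_b)))
```

In this setup the two replicas share no timestamps at all, and replica A is missing the samples that fall inside its restart window.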
Why a Load Balancer Breaks HA
The naive HA pattern — put a LB (NLB / ALB / Service ClusterIP with random selection) in front of N Prometheus replicas — does not work.
       LB picks one randomly
                 │
        ┌────────┴────────┐
        ▼                 ▼
    Replica A         Replica B
    (had a gap        (was healthy
    during rolling    during that
    restart 09:00)    window)
If your query hits Replica A for the 09:00 window, you see a gap. If it hits Replica B, you see complete data. Same query, different answers, depending on which backend the LB picked. This is worse than no HA — it makes data quality non-deterministic.
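The non-determinism can be reproduced with a toy model of the load balancer. The replica names, sample timestamps (in seconds), and the `count_samples_via_lb` helper below are all hypothetical; the sketch only assumes that the LB picks a backend at random for each query.

```python
import random

# Hypothetical sample timestamps (seconds) held by each replica.
# Replica A has a gap between 15 s and 75 s; replica B is complete.
REPLICAS = {
    "replica-a": [0, 15, 75, 90, 105],
    "replica-b": [4, 19, 34, 49, 64, 79, 94, 109],
}


def count_samples_via_lb(rng, start_s, end_s):
    """Send one range query through a random-selection LB.

    Returns the backend that served the query and how many samples
    it returned for the window [start_s, end_s).
    """
    backend = rng.choice(sorted(REPLICAS))
    samples = [t for t in REPLICAS[backend] if start_s <= t < end_s]
    return backend, len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
observed = {count_samples_via_lb(rng, 30, 75) for _ in range(50)}
print(sorted(observed))
```

The same query over the same window returns either zero or three samples, depending only on which backend the LB happened to select.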
How Thanos Solves It
Thanos Query connects to all replicas simultaneously (via DNS service discovery — typically dnssrv+_grpc._tcp.<headless-svc>). It then:
- Pulls the same time range from every replica
- Deduplicates overlapping samples (using the replica label, as configured with --query.replica-label)
- Fills gaps from one replica with data from another
- Returns a single, gap-free merged view to the caller
This requires direct endpoint access to every replica, not a single LB endpoint that hides them. The "many endpoints" model is essential to the algorithm.
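The merge steps above can be sketched as follows. This is a deliberately simplified illustration and not Thanos's actual deduplication code, which is more involved. The sketch assumes the rule "stay on the current replica, and switch to another one only when more than one scrape interval has passed since the last emitted sample"; the `dedup` function and the sample data are hypothetical.

```python
def dedup(series_by_replica, scrape_interval_s):
    """Merge one series as seen by several replicas into a single series.

    series_by_replica maps a replica name to a list of (timestamp_s, value)
    samples. Samples from the replica currently being followed are always
    kept; a sample from another replica is kept only if it arrives more
    than one scrape interval after the last kept sample, which indicates
    that the current replica has a gap.
    """
    tagged = sorted(
        (ts, replica, value)
        for replica, samples in series_by_replica.items()
        for ts, value in samples
    )
    merged = []
    current = None   # replica we are currently following
    last_ts = None   # timestamp of the last kept sample
    for ts, replica, value in tagged:
        if last_ts is None or replica == current:
            take = True
        else:
            take = ts - last_ts > scrape_interval_s
        if take:
            merged.append((ts, value))
            current, last_ts = replica, ts
    return merged


# Hypothetical data: replica A has a gap between 15 s and 75 s.
series = {
    "replica-a": [(t, 1.0) for t in (0, 15, 75, 90, 105)],
    "replica-b": [(t, 1.0) for t in (4, 19, 34, 49, 64, 79, 94, 109)],
}
merged = dedup(series, scrape_interval_s=15)
print([ts for ts, _ in merged])
```

The merged series starts on replica A, switches to replica B when A's gap begins, and ends up with no interval between samples longer than 19 s. Note that the function needs every replica's series as input, which is why a single LB endpoint that returns only one replica's data cannot support it.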
Design Implication
| Scenario | Right pattern |
|---|---|
| 1 replica Prometheus | LB OK (no dedup needed) |
| N replica HA Prometheus + Thanos | DNS discovery / Headless Service — never a single LB |
| N replica HA + only a single LB | Anti-pattern — breaks Thanos dedup |
| Thanos Receive (not Sidecar) | Receive handles its own replication via its replication factor flag (--receive.replication-factor) |
Key Takeaway
LB and HA-Prometheus-with-Thanos are mutually exclusive design choices. Choosing HA replicas forces you to expose them via DNS-based service discovery so Thanos Query can reach every replica directly.
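As a rough illustration of what this looks like in practice, the helper below assembles Thanos Query arguments for a Kubernetes headless Service. It is a sketch under several assumptions: the service name, namespace, `cluster.local` cluster domain, and `replica` label name are hypothetical; the Service is assumed to expose a port named `grpc` so that the `_grpc._tcp` SRV record exists; and `--endpoint` is the flag in recent Thanos releases (older releases used `--store`), so check the flags of the version you run.

```python
def thanos_query_args(headless_service, namespace, replica_label="replica"):
    """Build a Thanos Query argument list that discovers every replica.

    The dnssrv+ prefix asks Thanos to resolve the SRV record and connect
    to each address it returns, instead of a single load-balanced address.
    """
    # Assumption: default Kubernetes cluster domain "cluster.local".
    srv_name = f"_grpc._tcp.{headless_service}.{namespace}.svc.cluster.local"
    return [
        "query",
        f"--endpoint=dnssrv+{srv_name}",
        f"--query.replica-label={replica_label}",
    ]


# Hypothetical service and namespace names.
args = thanos_query_args("prometheus-sidecar", "monitoring")
print(" ".join(args))
```

The replica label passed here has to match the external label that distinguishes the Prometheus replicas from each other.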
Source
Thanos official documentation: