Why Half of MCP Servers Fail
We ran 1,677,644 health checks on 615 MCP servers. The average reliability score across the ecosystem is 46.8%. Put differently, the average MCP server you might integrate with fails a significant portion of its checks.
Here's what the data actually shows, broken down by failure type and time of day, along with what separates the servers that stay up from the ones that don't.
The Failure Modes
Not all failures are equal. We categorize errors from the mcp_health_checks table into four main buckets plus a catch-all, and the distribution tells a story about why servers fail, not just that they fail.
- Timeout: Server accepts the connection but never responds within the SLA window.
- Connection refused: TCP handshake is rejected immediately. The process is down or the port is not listening.
- Server error: Server responds, but with an error caused by unhandled exceptions, OOM crashes, or bad deploys.
- DNS failure: Hostname doesn't resolve. The domain expired, was misconfigured, or the infrastructure was removed.
- Other: SSL errors, rate limiting, auth failures, malformed responses.
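As a rough illustration of how a health checker might sort a failed check into the buckets above, here is a minimal Python sketch. The `FailureBucket` enum and `classify_failure` function are hypothetical names introduced for this example; the actual logic behind the mcp_health_checks table is not shown in this post, and the mapping from exception types and status codes to buckets is an assumption.

```python
import socket
from enum import Enum


class FailureBucket(Enum):
    TIMEOUT = "timeout"
    CONNECTION_REFUSED = "connection_refused"
    SERVER_ERROR = "server_error"
    DNS_FAILURE = "dns_failure"
    OTHER = "other"


def classify_failure(exc=None, status_code=None):
    """Map a failed check to a bucket (assumed mapping, for illustration)."""
    # DNS resolution errors surface as socket.gaierror in the standard library.
    if isinstance(exc, socket.gaierror):
        return FailureBucket.DNS_FAILURE
    # The TCP handshake was rejected: nothing is listening on the port.
    if isinstance(exc, ConnectionRefusedError):
        return FailureBucket.CONNECTION_REFUSED
    # socket.timeout is an alias of TimeoutError on newer Python versions;
    # checking both keeps the sketch working on older ones.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return FailureBucket.TIMEOUT
    # The server answered, but with a 5xx status.
    if status_code is not None and 500 <= status_code <= 599:
        return FailureBucket.SERVER_ERROR
    # SSL errors, rate limiting (429), auth failures, malformed responses.
    return FailureBucket.OTHER
```

The DNS and connection-refused checks come before the generic fallthrough because all of these exceptions derive from `OSError`, so a broader check placed first would swallow them.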
The dominance of timeouts matters for AI agent design. When your Claude agent calls a tool backed by an unreliable MCP server, a timeout doesn't just fail: it burns context window and latency budget before it fails. A connection refused at least fails in milliseconds.
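One way to keep a slow server from eating the whole latency budget is to cap each tool call on the client side. The sketch below uses `asyncio.wait_for` from the Python standard library; `call_with_budget`, its `budget_seconds` parameter, and the `(ok, result)` return shape are hypothetical choices for this example, not part of any MCP SDK.

```python
import asyncio


async def call_with_budget(tool_call, budget_seconds=5.0):
    """Await tool_call(), giving up once budget_seconds have elapsed.

    Returns (True, result) on success and (False, None) on timeout, so the
    caller can fall back quickly instead of waiting on a stalled server.
    """
    try:
        result = await asyncio.wait_for(tool_call(), timeout=budget_seconds)
        return True, result
    except asyncio.TimeoutError:
        # wait_for cancels the pending call before raising.
        return False, None
```

With a cap like this, a stalled server costs the agent at most `budget_seconds`, which brings the worst case closer to the fail-fast behavior of a refused connection.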
The DNS failure category is interesting: these are essentially dead servers. Once a domain goes unresolvable, it never comes back. These servers inflate the "unreliable" numbers but represent a distinct failure class: abandoned infrastructure, not operational instability.
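To see how much dead servers can pull an average down, here is a small sketch that computes mean reliability with and without them. It assumes a server's reliability score is simply the fraction of checks that passed, and that a "dead" server is one whose every recorded check failed with a DNS error. Both definitions, and all function names, are assumptions made for illustration.

```python
def is_dead(outcomes):
    """A server counts as dead if every recorded check was a DNS failure."""
    return bool(outcomes) and all(o == "dns_failure" for o in outcomes)


def reliability(outcomes):
    """Fraction of checks that passed (assumed definition of the score)."""
    return sum(o == "ok" for o in outcomes) / len(outcomes)


def mean_reliability(checks_by_server, exclude_dead=False):
    """Average per-server reliability, optionally skipping dead servers."""
    scores = [
        reliability(outcomes)
        for outcomes in checks_by_server.values()
        if outcomes and not (exclude_dead and is_dead(outcomes))
    ]
    return sum(scores) / len(scores) if scores else None


# Toy data, not taken from the real dataset.
sample = {
    "alive-server": ["ok", "ok", "timeout", "ok"],
    "dead-server": ["dns_failure"] * 4,
}
print(mean_reliability(sample))                     # 0.375
print(mean_reliability(sample, exclude_dead=True))  # 0.75
```

Reporting both numbers side by side separates abandoned infrastructure from servers that are running but unstable.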
When Servers Fail (Hour by Hour)
We grouped all failed checks by the hour they occurred (UTC). The pattern is consistent across 30 days of data: failure rates spike during specific windows that map almost exactly to US business hours.
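The grouping described here can be sketched in a few lines of Python. The input shape, an iterable of `(checked_at, ok)` pairs with timezone-aware timestamps, is an assumption for this example and not the actual schema of the health-check data. The sketch also reports a failure rate per hour rather than a raw count, so that hours with more checks do not look worse just because they are busier.

```python
from collections import Counter
from datetime import datetime, timezone


def failure_rate_by_hour(checks):
    """Return {utc_hour: failed / total} for (checked_at, ok) pairs.

    checked_at must be a timezone-aware datetime; it is converted to UTC
    before the hour is extracted.
    """
    totals, failures = Counter(), Counter()
    for checked_at, ok in checks:
        hour = checked_at.astimezone(timezone.utc).hour
        totals[hour] += 1
        if not ok:
            failures[hour] += 1
    return {hour: failures[hour] / totals[hour] for hour in sorted(totals)}


# Toy data, not taken from the real dataset.
sample_checks = [
    (datetime(2025, 1, 1, 0, 5, tzinfo=timezone.utc), False),
    (datetime(2025, 1, 1, 0, 35, tzinfo=timezone.utc), True),
    (datetime(2025, 1, 1, 4, 0, tzinfo=timezone.utc), True),
]
print(failure_rate_by_hour(sample_checks))  # {0: 0.5, 4: 0.0}
```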
The peak failure window, typically 00:00, 01:00, and 02:00 UTC, corresponds to the end of the US business day on the West Coast (roughly 4–7pm PT), which is evening on the East Coast (roughly 7–10pm ET). This is when:
- Developer traffic spikes as teams start work
- CI/CD deploys happen (introducing bad code or missed env vars)
- Free-tier cloud services hit daily compute quotas
- Shared infrastructure becomes contended
The overnight dip (02:00–08:00 UTC) is genuine stability: fewer deployments, lower load. If you're scheduling agent tasks, run them at 03:00–07:00 UTC for the lowest failure probability.
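If you want to gate scheduled agent work on that window, a helper along these lines is one option. The function name and constants are hypothetical, and treating 07:00 UTC as an exclusive upper bound is an assumption; the window itself comes from the recommendation above.

```python
from datetime import datetime, timedelta, timezone

QUIET_START_HOUR = 3  # 03:00 UTC, per the recommendation above
QUIET_END_HOUR = 7    # 07:00 UTC, treated as exclusive (assumption)


def next_quiet_run(now):
    """Return `now` if it falls in the quiet window, else the next 03:00 UTC.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    if QUIET_START_HOUR <= now.hour < QUIET_END_HOUR:
        return now
    start = now.replace(hour=QUIET_START_HOUR, minute=0, second=0, microsecond=0)
    if now.hour >= QUIET_END_HOUR:
        # Today's window has already passed; use tomorrow's.
        start += timedelta(days=1)
    return start
```

A scheduler could call `next_quiet_run(datetime.now(timezone.utc))` and sleep until the returned time before starting a batch of tool calls.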
What the Reliable 10% Do Differently
We compared the top 10 servers by reliability score against the bottom 10. The differences aren't surprising, but they're stark.