MCPulse
engineering data

Why Half of MCP Servers Fail

615
MCP servers
456K+
health checks
avg failure rate

We ran 1,677,644 health checks on 615 MCP servers. The average reliability score across the ecosystem is 46.8%. That means roughly half of all MCP servers you might integrate with are failing a significant portion of checks.

Here's what the data actually shows — broken down by failure type, time of day, and what separates the servers that stay up from the ones that don't.

Section 01

The Failure Modes

Not all failures are equal. We categorize errors from the mcp_health_checks table into four buckets — and the distribution tells a story about why servers fail, not just that they fail.

Connection Timeout 0.0%

Server accepts the connection but never responds within the SLA window.

Connection Refused 0.0%

TCP handshake immediately rejected. Process is down, port not listening.

5xx Server Error 0.0%

Server responds but with an error — unhandled exceptions, OOM crashes, bad deploys.

DNS / Network Failure 0.0%

Hostname doesn't resolve. Domain expired, misconfigured, or infrastructure removed.

Other / Unknown 100.0%

SSL errors, rate limiting, auth failures, malformed responses.

"Timeouts are the silent killers. Unlike connection refused errors — which fail fast — timeouts burn your agent's time budget waiting for a response that never comes."

The timeout dominance matters for AI agent design. When your Claude agent calls a tool backed by an unreliable MCP server, a timeout doesn't just fail — it burns context window and latency budget before it fails. A connection refused at least fails in milliseconds.

The DNS failure category is interesting: these are essentially dead servers. Once a domain goes unresolvable, it never comes back. These servers inflate the "unreliable" numbers but represent a distinct failure class — abandoned infrastructure, not operational instability.

Section 02

When Servers Fail (Hour by Hour)

We grouped all failed checks by the hour they occurred (UTC). The pattern is consistent across 30 days of data: failure rates spike during specific windows that map almost exactly to US business hours.

Failure Rate by Hour (UTC) — 30-day average
100%
0
100%
100%
100%
100%
100%
100%
6
100%
100%
100%
100%
100%
100%
12
100%
100%
100%
100%
100%
100%
18
100%
100%
100%
100%
100%
00:00 UTC 12:00 UTC 23:00 UTC
High failure rate
Low failure rate

The peak failure window — typically 00:00, 01:00, 02:00 UTC — corresponds to US East Coast morning through afternoon (8am–2pm ET). This is when:

  • Developer traffic spikes as teams start work
  • CI/CD deploys happen (introducing bad code or missed env vars)
  • Free-tier cloud services hit daily compute quotas
  • Shared infrastructure becomes contended

The overnight dip (02:00–08:00 UTC) is genuine stability — fewer deployments, lower load. If you're scheduling agent tasks, run them at 03:00–07:00 UTC for the lowest failure probability.

Section 03

What the Reliable 10% Do Differently

We compared the top 10 servers by reliability score against the bottom 10. The differences aren't surprising, but they're stark.

Metric
Top 10 ✓
Bottom 10 ✗
avg reliability score
95.0%
0.0%
avg uptime (30d)
95.0%
0.0%
avg response time
0ms
106ms
score trend
improving / stable
declining

Top 10 Most Reliable

#1 mcp-use 95.0% 0ms
#2 pal-mcp-server 95.0% 0ms
#3 notion-mcp-server 95.0% 0ms
#4 mindsdb 95.0% 0ms
#5 playwright-mcp 95.0% 0ms
#6 Figma-Context-MCP 95.0% 0ms
#7 genai-toolbox 95.0% 0ms
#8 casdoor 95.0% 0ms
#9 mcp-chrome 95.0% 0ms
#10 inspector 95.0% 0ms

Bottom 10 — Lowest Reliability

#1 mcp-server-chart 0.0% 2ms
#2 kubefwd 0.0% 295ms
#3 firecrawl-mcp-server 0.0% 1ms
#4 Awesome-MCP-ZH 0.0% 1ms
#5 MaxKB 0.0% 1ms
#6 serena 0.0% 527ms
#7 stealth-browser-mcp 0.0% 228ms
#8 Agentfy 0.0% 1ms
#9 nexus 0.0% 1ms
#10 n8n-workflow-builder 0.0% 1ms
"Every top-10 server has one thing in common: it's monitored. Every bottom-10 server has one thing in common: it isn't."

The pattern is consistent enough to draw a clear conclusion: servers with active monitoring maintain dramatically higher reliability. This isn't correlation — it's causation. When you can see failures, you fix them. When you can't, you don't.

The response time gap is particularly meaningful for agent workloads. An agent calling 10 tools per task — where 3 of those tools are backed by slow MCP servers — experiences cumulative latency that degrades the entire session quality.

Section 04

Check Before You Integrate

Before you wire an MCP server into your agent, there are three things worth checking:

01

Reliability Score

Use the MCPulse directory to check the server's 30-day reliability score. Below 70%? Treat it as unreliable and build fallbacks. Below 50%? Don't integrate at all — the server will degrade your agent more than it helps it.

02

Trend Direction

A server at 65% reliability that's improving is better than one at 75% that's declining. Check the score trend — it's a leading indicator of where the server will be in 30 days.

03

Community vs Official

Community-monitored servers in our dataset have a 4684% higher average reliability than unmonitored ones. The act of monitoring creates accountability — maintainers fix what they can see. An MCP server with a badge is a server someone cares about.

Check reliability before integrating bash
# Query MCPulse API for server health
curl https://stacks.polsia.app/api/servers?search=your-server-name \
  | jq '.servers[0] | {name, reliability_score, trend, uptime_30d}'

# Example response
{
  "name": "github-mcp-server",
  "reliability_score": 94.2,
  "trend": "stable",
  "uptime_30d": 97.1
}

MCPulse Directory

Check any MCP server before you integrate

Real-time reliability scores, 30-day health history, response time percentiles, and trend data for 615+ servers.

Methodology

Health checks are performed by the mcpulse-monitor SDK installed on participating servers, plus MCPulse's own external polling infrastructure. Checks occur every 60 seconds per server. Data shown reflects the trailing 30 days. Reliability score = weighted composite of uptime (40%), response time (30%), and error rate (30%).

Time-of-day analysis uses UTC timestamps. Hourly failure rates are calculated as failed_checks / total_checks per hour bucket across the 30-day window. Dataset: 615 servers, 1.7M total checks.

Share this analysis:

Weekly MCP reliability digest

We send one email per week: which servers moved significantly, new entrants to the top/bottom 10, and any ecosystem-wide patterns we spot. No fluff.

Related

report
State of MCP Server Reliability 2026
The full dataset: all 615 servers, scores, and methodology.
tool
MCPulse Directory
Browse, search, and compare reliability scores live.