Every team running a server eventually learns the same lesson the hard way: their monitoring tool told them everything was fine right up until it wasn't. A user noticed the outage before any alert fired. The on-call engineer was staring at a green dashboard while customers got 500 errors.
This happens because most monitoring tools only cover one of the three distinct layers of server health — and they never tell you which layer they're missing.
Layer 1: External Availability (Are You Reachable?)
This is the layer UptimeRobot, Better Uptime, Freshping, and hundreds of other services cover. Every few minutes, they make an HTTP request to your URL and check whether you respond with a 200.
It's genuinely useful. If your web server crashes, your load balancer falls over, or your DNS stops resolving, Layer 1 monitoring catches it within minutes and fires an alert.
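To make the mechanics concrete, here's a minimal sketch of what a Layer 1 probe boils down to, using only Python's standard library. The URL and interval are placeholders, not any particular service's defaults:

```python
# Layer 1 in miniature: request a URL from outside the network and treat
# anything other than a 2xx response as "down". CHECK_URL and the interval
# are hypothetical placeholders.
import time
import urllib.error
import urllib.request

CHECK_URL = "https://example.com/health"  # hypothetical endpoint
INTERVAL_SECONDS = 300                    # "every few minutes"

def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True only if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, timeout, or 4xx/5xx

while True:
    if not probe(CHECK_URL):
        print(f"ALERT: {CHECK_URL} is unreachable or unhealthy")
    time.sleep(INTERVAL_SECONDS)
```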
But Layer 1 has a fundamental blind spot: it only knows what the outside world can see. It has no idea what's happening inside your server. You could be running at 99% memory utilization, accumulating 40GB of log files headed toward a disk-full crash, or serving an SSL certificate 6 days from expiry — and every external uptime check would still return green.
What Layer 1 catches: Full outages visible from outside the network.
What Layer 1 misses: Everything internal. Slow disk fills. Memory pressure. Service-level issues that haven't yet caused a visible outage. Gradual degradation.
Layer 2: Internal Health (What Are Your Metrics Saying?)
This is the layer Netdata, Prometheus, Grafana, and traditional APM tools occupy. An agent runs inside your server, collecting CPU load, memory utilization, disk I/O, and network traffic, and shipping that data to a central store for dashboards and threshold-based alerts.
Layer 2 is richer and more actionable than Layer 1. It can tell you that the disk is at 87% and trending toward full. It can show you that memory started climbing after the 3pm deploy. It can alert on a specific service going down, not just the entire site.
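As a rough illustration of what happens at this layer, the sketch below samples two local metrics with Python's standard library and compares them to static thresholds. A real agent collects far more, ships the data to a central store, and alerts on trends rather than single readings; the thresholds here are invented for the example:

```python
# A stripped-down Layer 2 check: sample two local metrics and compare
# them to static thresholds. Both thresholds are invented for this example.
import os
import shutil

DISK_ALERT_PCT = 85.0     # hypothetical threshold
LOAD_ALERT_PER_CPU = 1.5  # hypothetical threshold

def check_disk(path: str = "/") -> None:
    usage = shutil.disk_usage(path)
    pct = usage.used / usage.total * 100
    if pct >= DISK_ALERT_PCT:
        print(f"ALERT: disk at {pct:.0f}% on {path}")

def check_load() -> None:
    one_minute, _, _ = os.getloadavg()  # Unix-only
    per_cpu = one_minute / (os.cpu_count() or 1)
    if per_cpu >= LOAD_ALERT_PER_CPU:
        print(f"ALERT: load average {one_minute:.2f} ({per_cpu:.2f} per CPU)")

check_disk()
check_load()
```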
The limitation: Layer 2 requires the agent to be running and reporting. If the agent process crashes, gets killed by an OOM event, or the machine loses network connectivity, your Layer 2 monitoring goes silent — and typically gives you no indication that it's silent. From the dashboard's perspective, silence looks exactly like "no problems."
What Layer 2 catches: Internal resource pressure, service-level failures, gradual degradation over time.
What Layer 2 misses: Its own failure. When the monitoring agent dies, you don't know.
Layer 3: Agent Presence (Is Your Monitoring Agent Still Watching?)
This is the layer almost every tool below the enterprise tier leaves completely unimplemented. Layer 3 asks a deceptively simple question: is the agent that's supposed to be monitoring your server still alive?
This sounds obvious, but implementing it requires a fundamentally different architecture than Layers 1 and 2. You need:
- A push-based heartbeat model — agents regularly check in with the control plane, rather than the control plane polling them.
- A watchdog on the control plane side — if an agent stops checking in, that silence is treated as a failure, not as "nothing to report."
- Alert deduplication and suppression — to avoid flooding users with repeated "still offline" messages during extended outages.
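Put together, those three requirements fit in surprisingly little control-plane logic. The sketch below is illustrative only; the function names and the offline threshold are assumptions, not any specific tool's implementation:

```python
# Control-plane sketch of the three Layer 3 requirements: agents push
# heartbeats, a watchdog treats silence as failure, and an "already
# alerted" set deduplicates repeat offline notifications. The names and
# the 25-minute threshold are illustrative assumptions.
import time

OFFLINE_AFTER_SECONDS = 25 * 60

last_seen: dict[str, float] = {}  # machine id -> time of last heartbeat
alerted: set[str] = set()         # machines already flagged as offline

def record_heartbeat(machine: str) -> None:
    """The push side: called whenever an agent checks in."""
    last_seen[machine] = time.time()
    if machine in alerted:        # it was offline and just came back
        alerted.discard(machine)
        print(f"RECOVERY: {machine} is back online")

def watchdog_pass() -> None:
    """Run periodically: silence past the threshold is itself a failure."""
    now = time.time()
    for machine, seen in last_seen.items():
        if now - seen > OFFLINE_AFTER_SECONDS and machine not in alerted:
            alerted.add(machine)  # suppress duplicate "still offline" alerts
            print(f"ALERT: {machine} silent for {(now - seen) / 60:.0f} minutes")
```

In a real system, record_heartbeat would sit behind an HTTP endpoint the agents call and watchdog_pass would run on a schedule, but the core idea really is this small: track last check-in, treat silence as a signal, and remember what you've already alerted on.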
Without Layer 3, a server that powers off completely, gets rebooted into a kernel panic, or has its monitoring daemon killed by an OOM event will silently disappear from your fleet. Your Layer 2 dashboard will just stop updating. You won't know until a user reports a problem or you happen to check manually.
What Layer 3 catches: Agent crashes, machine power-off events, network partitions, OOM-killed monitoring processes, anything that kills the observer itself.
What Layer 3 misses: Nothing within its scope — by design, Layer 3 exists to catch what the other layers can't.
Why the Gap Persists
The reason most tools don't implement all three layers comes down to product positioning and incentive structures.
External uptime tools (UptimeRobot, Freshping) are built around simplicity. They're easy to set up, require no agent installation, and sell on one metric: "Is your site up?" They have no incentive to tell you about your disk fill rate because that would require a much more complex product.
Internal metrics tools (Netdata, Prometheus, Grafana) are built around data density. They're dashboards first, alerting second. They track thousands of metrics but typically don't implement agent heartbeat monitoring because that's "ops infrastructure" rather than "observability data."
The result is that most small teams running their own servers end up with a patchwork:
- UptimeRobot checking their domain
- Maybe Netdata installed but rarely checked
- No Layer 3 coverage at all
And then they get surprised when the monitoring agent gets killed at 2am and their dashboard stays green while the server is on fire.
What Full Coverage Looks Like
A complete monitoring approach covers all three layers simultaneously, with clear visibility into which layer a problem originates from.
- Layer 1 alert: "Your health check endpoint returned 503" — something external is broken.
- Layer 2 alert: "Disk on prod-3 is at 91% and trending toward full in 6 days" — internal pressure building before it causes an outage.
- Layer 3 alert: "prod-3 has gone offline — last check-in was 32 minutes ago" — the monitoring agent stopped reporting. This is the one that prevents silent failures.
The "back online" notification matters just as much as the offline one. When the machine recovers and the agent resumes reporting, you should get an automatic all-clear so you know the gap in monitoring has closed — and when it closed.
The Operational Impact
Teams with all three layers covered experience a qualitatively different operational posture than teams with only one or two.
With Layer 1 only: you know about outages after users do.
With Layer 2 added: you catch most problems before they become outages, but you still have blind spots when the agent dies.
With Layer 3 added: silence from your monitoring system is itself a signal, not a default state. You get alerted when your observer goes blind, not just when it sees something bad.
That third layer is what transforms monitoring from "dashboard to check occasionally" into "system I can actually trust."
Tink's Three-Layer Approach
Tink was designed from the start to cover all three layers with a single agent and one monthly cost.
Layer 1 is covered by the external uptime probe cron — every 5 minutes, Tink probes the HTTP health check URLs of registered services from outside your network, using the same infrastructure that powers the control plane. If a service stops responding or starts returning 5xx errors, you get a Telegram or email alert within minutes.
Layer 2 is covered by the in-server agent — CPU, memory, disk, running services, TLS certificate expiry, SSH auth log analysis, and predictive trend detection across the last 4 scans. Issues are diagnosed with AI-generated root cause analysis and suggested fix commands, not just raw metric values.
Layer 3 is covered by the machine health cron — every 15 minutes, the control plane checks which agents have stopped reporting. Any machine that hasn't checked in for more than 25 minutes triggers an alert across all configured notification channels: Telegram, email, Discord, Slack, ntfy, and webhook. When the agent comes back online, an automatic "back online" notification fires so you know the monitoring gap has closed.
The three layers reinforce each other. A server that fails Layer 1 (the external probe gets no response) while still passing Layer 3 (the agent is still reporting) tells you the web server process died but the machine is still up — a very different situation from a machine that fails Layer 3 entirely (the agent stopped reporting; the machine may be down).
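That cross-referencing can be expressed as a tiny decision function. The sketch below is hypothetical and only illustrates the reasoning, not Tink's actual implementation:

```python
# Hypothetical cross-referencing logic: two boolean signals, four distinct
# diagnoses. Purely illustrative.
def diagnose(external_probe_ok: bool, agent_reporting: bool) -> str:
    if external_probe_ok and agent_reporting:
        return "healthy"
    if not external_probe_ok and agent_reporting:
        return "service down, machine up: web server process likely died"
    if external_probe_ok and not agent_reporting:
        return "site still serving, but the monitoring agent has gone silent"
    return "machine likely down: unreachable from outside and agent silent"
```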
If your monitoring stack only covers one of these three layers, you have silent failure modes waiting to happen. The good news is that all three can be covered with a single tool if that tool was designed with the full picture in mind.
Most tools weren't. That's the gap Tink was built to close.
Try Tink on your server
One command to install. Watches your server, explains problems, guides fixes.