PagerDuty starts at $21/user/month. Opsgenie recently folded into Atlassian's broader platform. Squadcast, Better Stack, ilert — there's a whole ecosystem of on-call scheduling tools built for teams of 10, 50, or 500 engineers.
If you're a 2-person founding team or a freelancer managing 10 client servers, most of this is overkill. But the core problem is real: when something breaks at 2am, someone needs to know about it, acknowledge it, and deal with it before it turns into a customer complaint.
Here's what on-call management actually looks like at small scale — and what tools you actually need.
What On-Call Scheduling Solves
On-call scheduling tools exist to answer three questions:
- Who gets the alert? When a server goes down, which engineer's phone rings?
- Did they see it? Has someone acknowledged the incident, or is it still ringing?
- What happens if they don't respond? Escalation: after N minutes, alert the backup person.
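Stripped of the dashboards, the core loop behind all of these tools is a timer. A minimal sketch, where `notify` and `is_acknowledged` are hypothetical hooks into whatever channel you actually use:

```python
import time

ACK_TIMEOUT = 10 * 60   # assumption: escalate after 10 unacknowledged minutes
POLL_INTERVAL = 30      # check for an acknowledgment every 30 seconds

def page(incident, primary, backup, notify, is_acknowledged):
    """Alert the primary; if nobody acknowledges in time, escalate to the backup."""
    notify(primary, incident)
    deadline = time.time() + ACK_TIMEOUT
    while time.time() < deadline:
        if is_acknowledged(incident):
            return primary              # someone has it, stop here
        time.sleep(POLL_INTERVAL)
    notify(backup, incident)            # nobody answered: wake the backup
    return backup
```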
For a 50-person team, this requires sophisticated tooling. You need rotations, override calendars, escalation policies, and audit trails showing who was on-call when and whether they responded within SLA.
For a 2-person team? You need exactly one thing: the right person's phone to buzz when something breaks.
The Accidental On-Call Engineer
Most small-team servers aren't managed by "on-call engineers" — they're managed by whoever set up the server, who is also the founder, the lead developer, and probably the one doing customer support.
This creates a specific pattern:
- Alerts fire at inconvenient times
- The "on-call person" is always the same person
- They check their phone, see something is broken, and don't know where to start
- They spend 30 minutes SSHing around before diagnosing the problem
The bottleneck isn't the alerting — it's the gap between alert and diagnosis. Getting woken up is one thing. Knowing what to do about it is another.
What Small Teams Actually Need From On-Call
After talking to dozens of freelancers and small dev teams, the requirements collapse to:
1. Reliable, low-noise alerts
The #1 reason people ignore monitoring alerts is alert fatigue. If everything pages at warning level, nothing feels urgent. Small teams need:
- Critical-only alerts by default
- Alert deduplication (don't re-alert for the same issue on every scan)
- Snooze capability for known maintenance windows
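None of this needs a platform. A minimal sketch of the filtering logic, with placeholder thresholds rather than anything a particular tool exposes:

```python
import time

SNOOZED_UNTIL = {}          # server -> unix timestamp, set during maintenance windows
LAST_ALERTED = {}           # (server, check) -> unix timestamp of the last page
REALERT_AFTER = 6 * 3600    # assumption: don't re-page the same issue within 6 hours

def should_alert(server, check, severity):
    """Page only for critical issues that aren't snoozed or already alerted."""
    now = time.time()
    if severity != "critical":
        return False                                    # critical-only by default
    if SNOOZED_UNTIL.get(server, 0) > now:
        return False                                    # known maintenance window
    if now - LAST_ALERTED.get((server, check), 0) < REALERT_AFTER:
        return False                                    # duplicate of a recent page
    LAST_ALERTED[(server, check)] = now
    return True
```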
2. Context with the alert
Getting a message that says "server CPU is at 92%" is not helpful at 2am. You need to know: which server, is it getting worse, what process is causing it, and what should I do first?
This is where AI-native monitoring tools differ from traditional ones. Instead of raw metric alerts, you want: "prod-api CPU hit 94%. Top process is node (pid 18432). This started 20 minutes ago and is trending up. This is 3x above your normal baseline."
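Most of that context can be pulled straight off the box. A minimal sketch using the psutil library (the trend and baseline comparison need stored history, which is left out here):

```python
import socket
import time

import psutil  # assumption: pip install psutil

def cpu_spike_alert(threshold=90.0):
    """Return an alert with host, CPU level, and the top process, or None if healthy."""
    psutil.cpu_percent(interval=None)                   # prime the system-wide counter
    procs = list(psutil.process_iter(["pid", "name"]))
    for p in procs:
        try:
            p.cpu_percent(interval=None)                # prime per-process counters
        except psutil.Error:
            pass
    time.sleep(1.0)                                     # measure over one second
    total = psutil.cpu_percent(interval=None)
    if total < threshold:
        return None
    usage = []
    for p in procs:
        try:
            usage.append((p.cpu_percent(interval=None), p.info["name"], p.info["pid"]))
        except psutil.Error:
            pass
    _, name, pid = max(usage)
    return f"{socket.gethostname()} CPU hit {total:.0f}%. Top process is {name} (pid {pid})."
```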
3. Quick acknowledgment
When you see an alert, you want to be able to mark it as "I've got this" — both for your own peace of mind and so a second person on the team doesn't start investigating the same thing independently.
4. Rotation tracking (for 2+ people)
If you have at least two people, you want to be able to say "this week it's @alice's turn, next week it's @bob's." Nothing complex — just a name attached to who's responsible right now.
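If even that feels like tooling, a week-number lookup does the job. A minimal sketch (ISO week numbers reset at year boundaries, so the order can occasionally repeat):

```python
from datetime import date

ROTATION = ["alice", "bob"]   # everyone in the rotation, in order

def on_call(today=None):
    """Pick the on-call person by alternating on the ISO week number."""
    today = today or date.today()
    week = today.isocalendar()[1]
    return ROTATION[week % len(ROTATION)]

# e.g. post this to your team chat every Monday morning from cron
print(f"On call this week: @{on_call()}")
```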
Where PagerDuty Goes Wrong for Small Teams
PagerDuty is exceptional software — for the right use case. If you have:
- Multiple teams
- Complex escalation policies (level 1 → level 2 → management)
- SLA-tracked incident response
- Dedicated on-call rotations by service domain
...then PagerDuty is the right tool. It's built for that use case.
But for small teams, you're paying for:
- Per-user seats (you need at least 5 to do rotations meaningfully)
- Configuration overhead for escalation policies you'll never use
- A dashboard that's empty most of the time
- Integration setup that takes days
At $21 per user per month, a 3-person team on PagerDuty's entry paid tier comes to roughly $756 a year. That's before monitoring, status pages, or any of the other tools you need.
The Lightweight Alternative
Here's what small-team on-call actually looks like in practice:
Use Telegram (or Slack) as your paging channel. It's where you're already watching messages. Point your server monitoring tool at it so critical alerts land there.
Set a current on-call person. Just a name. "This week it's Alice." When an alert fires, Alice sees it and responds. No complex rotation software needed.
Acknowledge in-channel. A button press that says "I've seen this" prevents the 2am "did you see the alert?" text thread.
AI diagnosis on tap. When you get paged, you want to be able to ask "What's going on with prod-api?" and get a real answer instead of SSHing in blindly.
This is what Tink implements natively. The /oncall @alice command sets the current on-call person. Their name appears on every critical alert. The "Acknowledge" button lets them mark it as seen. And the "What's going on?" button triggers an immediate AI diagnosis.
No $21/seat SaaS required.
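If you'd rather wire up the Telegram half yourself, it's one Bot API call. A minimal sketch with a placeholder bot token and chat ID; the acknowledge button needs an inline keyboard plus a callback handler, which is left out here:

```python
import json
import urllib.request

BOT_TOKEN = "123456:ABC-replace-me"   # from @BotFather
CHAT_ID = "-1001234567890"            # your team group chat
ON_CALL = "@alice"                    # whoever is on call this week

def page_telegram(text):
    """Send a critical alert to the team chat via the Telegram Bot API."""
    payload = json.dumps({
        "chat_id": CHAT_ID,
        "text": f"CRITICAL: {text}\nOn call: {ON_CALL}",
    }).encode()
    req = urllib.request.Request(
        f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["ok"]

# page_telegram("prod-api CPU hit 94%. Top process is node (pid 18432).")
```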
Port Exposure: The Silent Risk Most Tools Miss
One under-discussed on-call scenario: getting paged because a database is publicly accessible.
Redis, MongoDB, Elasticsearch, and MySQL have all shipped, in older versions or in certain distribution packages, with a default bind address of 0.0.0.0. If you don't have a firewall configured, these databases are reachable from the public internet, often without authentication.
This isn't a hypothetical: automated scanners sweep the internet around the clock for open Redis instances to hijack, and exposed MongoDB deployments have leaked millions of records.
Traditional monitoring tools don't detect this because they're watching metrics, not network exposure. Tink's port exposure detection checks for known dangerous ports on every scan (Redis 6379, MongoDB 27017, Elasticsearch 9200, MySQL 3306, and others) and fires a warning if they're listening, before an attacker finds them.
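You can spot-check your own servers for this in a few lines. A minimal sketch that flags risky ports bound to all interfaces, assuming the `ss` utility from iproute2 (present on most modern distros); note that a cloud firewall or security group can still block or expose these regardless of the bind address:

```python
import subprocess

RISKY_PORTS = {
    6379: "Redis",
    27017: "MongoDB",
    9200: "Elasticsearch",
    3306: "MySQL",
    5432: "PostgreSQL",
}

def publicly_bound():
    """Return (service, port) pairs listening on all interfaces."""
    out = subprocess.run(["ss", "-Hltn"], capture_output=True, text=True, check=True).stdout
    exposed = []
    for line in out.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        addr, _, port = fields[3].rpartition(":")     # e.g. "0.0.0.0:6379" or "[::]:27017"
        if addr in ("0.0.0.0", "[::]", "*") and int(port) in RISKY_PORTS:
            exposed.append((RISKY_PORTS[int(port)], int(port)))
    return exposed

print(publicly_bound())   # e.g. [("Redis", 6379)] means it's firewall time
```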
The Three-Tier On-Call Stack for Small Teams
For most small teams and freelancers, the right on-call stack is:
- Detection — server monitoring that catches issues before they become customer problems (Tink, or similar)
- Notification — delivery to the right channel at the right severity (Telegram, Slack, email, Discord)
- Response context — AI-powered diagnosis so the person who gets paged knows what to do immediately
That's it. You don't need incident management platforms, on-call scheduling software, or complex escalation policies until you have at least five engineers and multiple service domains.
Start with the basics, skip the heavyweight tooling, and focus on response time and resolution quality. That's what customers actually care about.
The Real On-Call Metric: MTTR
Mean Time to Resolution (MTTR) is the metric that matters — not whether you have a PagerDuty account.
MTTR breaks down into:
- MTTD (detection) — how long until someone knows there's a problem
- MTTA (acknowledgment) — how long until someone is actively working on it
- MTTF (fix) — how long until the problem is resolved
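You can track all of this yourself with four timestamps per incident. A minimal sketch:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Incident:
    started: datetime        # when the problem actually began
    detected: datetime       # when the alert fired
    acknowledged: datetime   # when someone said "I've got this"
    resolved: datetime       # when it was fixed

def mttr_breakdown(incidents):
    """Average each phase across incidents, in minutes."""
    def minutes(a, b):
        return (b - a).total_seconds() / 60
    return {
        "MTTD": mean(minutes(i.started, i.detected) for i in incidents),
        "MTTA": mean(minutes(i.detected, i.acknowledged) for i in incidents),
        "MTTF": mean(minutes(i.acknowledged, i.resolved) for i in incidents),
        "MTTR": mean(minutes(i.started, i.resolved) for i in incidents),
    }
```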
Most small-team monitoring improves MTTD (better alerts) but ignores MTTF (what do I actually do?). The biggest leverage point for solo operators and small teams is closing the gap between "I see the alert" and "I know what to do."
That's why AI-native monitoring — where the diagnosis comes with the alert — cuts MTTR more than any scheduling tool at this scale.
Tink is an AI-powered server mechanic for accidental sysadmins and small teams. It installs on any Linux server in one command and sends plain-English alerts with built-in diagnosis. Get started free — no credit card required.
Try Tink on your server
One command to install. Watches your server, explains problems, guides fixes.