How to Set Up Uptime Monitoring

Step-by-step guide to configuring uptime monitoring that tells you when something is wrong -- check intervals, alert routing, and escalation.

Category: Guide
Read time: 5 min
Updated: April 2026
Steps: 5

Who This Guide Is For

This guide is for business owners, operations leads, and technical managers who run web applications, APIs, or background services and want to know when something goes down before their users tell them. You are either setting up monitoring for the first time or replacing an ad-hoc setup that is not working.

Before You Start

You should have a list of the systems, endpoints, or services you need to monitor. If you are not sure what needs monitoring, start by listing everything your business depends on that runs on a server: your website, any web applications, APIs, background jobs, and third-party integrations. Each one is a potential monitoring target.

You should also know who needs to be notified when something goes wrong and how — email, SMS, push notification, or a chat channel. Alert routing is as important as the monitoring itself.

Step 1: Define What “Up” Means for Each Service

“Is it running?” is not specific enough. For each service you monitor, define what a successful check looks like.

  • Website or web application: an HTTP request to the homepage or health endpoint returns a 200 status code within an acceptable time (typically under three seconds).
  • API: a request to a known endpoint returns the expected status code and response format.
  • Background service: the service sends a heartbeat within its expected interval. If it should run every thirty seconds, a heartbeat still missing after sixty seconds is a failure.

Define these thresholds before you configure anything. A check that passes when the server returns a 500 error page (because it technically responded) is worse than no monitoring at all — it creates a false sense of security.
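As a rough illustration of the thresholds above, the pass/fail rules could be written down as small functions before any tool is configured. This is a minimal sketch, not a particular monitoring product's API: the names HttpResult, http_check_passes, heartbeat_is_fresh, and grace_factor are hypothetical, and the three-second budget and the two-interval heartbeat grace come from the examples in this step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HttpResult:
    """Outcome of one HTTP probe (hypothetical structure for illustration)."""
    status_code: Optional[int]  # None when the request timed out or never connected
    elapsed_seconds: float


def http_check_passes(result: HttpResult, max_seconds: float = 3.0) -> bool:
    """Pass only on a 200 that arrived inside the time budget.

    A 500 error page counts as a failure even though the server responded.
    """
    if result.status_code is None:
        return False
    return result.status_code == 200 and result.elapsed_seconds < max_seconds


def heartbeat_is_fresh(last_beat: float, now: float,
                       expected_interval: float,
                       grace_factor: float = 2.0) -> bool:
    """True while the last heartbeat is younger than interval * grace_factor.

    With a 30-second interval and the assumed grace factor of 2, a heartbeat
    still missing at 60 seconds is treated as a failure.
    """
    return (now - last_beat) < expected_interval * grace_factor
```

Writing the rules this way makes it obvious which responses count as "up", and the same definitions can be reused when the alert path is tested later.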

Step 2: Set Check Intervals Based on Impact

Not every service needs checking every thirty seconds. Match the check interval to the business impact of downtime.

Critical services (client-facing application, payment processing, primary API): check every one to two minutes. Downtime here directly affects revenue or client experience.

Important services (internal tools, reporting dashboards, secondary APIs): check every five minutes. Downtime is inconvenient but not immediately damaging.

Background services (scheduled jobs, data sync, maintenance tasks): check based on the service’s own schedule. A job that runs every hour should be checked every hour, not every minute.

Over-monitoring wastes resources and can trigger rate limits on the services being checked. Under-monitoring means you find out late. Match the interval to the cost of delayed detection.
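The tiers above can be captured as a small lookup, sketched here under assumptions: the tier names, TIER_INTERVAL_SECONDS, check_interval_seconds, and is_check_due are hypothetical, and 60 seconds is simply the low end of the "one to two minutes" range for critical services.

```python
from typing import Optional

# Interval per tier, in seconds. Values follow the guidance in this step.
TIER_INTERVAL_SECONDS = {
    "critical": 60,    # client-facing app, payments, primary API
    "important": 300,  # internal tools, dashboards, secondary APIs
}


def check_interval_seconds(tier: str,
                           job_period_seconds: Optional[int] = None) -> int:
    """Return how often a service in this tier should be checked."""
    if tier == "background":
        # Background services are checked on their own schedule.
        if job_period_seconds is None:
            raise ValueError("background services need their own schedule period")
        return job_period_seconds
    return TIER_INTERVAL_SECONDS[tier]


def is_check_due(last_check: float, now: float, interval_seconds: int) -> bool:
    """True once at least one full interval has passed since the last check."""
    return now - last_check >= interval_seconds
```

For example, an hourly data sync would be registered with check_interval_seconds("background", 3600), so it is checked once per run rather than every minute.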

Step 3: Configure Alert Routing and Escalation

Alerts need to reach the right person through the right channel at the right time. A monitoring system that sends every alert to a shared email inbox will be ignored within a week.

Set up tiered alerting:

  1. First alert: the on-call person or primary contact receives a notification via their fastest channel (push notification or SMS). This fires as soon as a failed check is confirmed by a second consecutive failure.
  2. Escalation: if the issue is not acknowledged within fifteen to thirty minutes, the alert escalates to a second person or a broader channel. This catches situations where the primary contact is unavailable.
  3. Resolved notification: when the service recovers, everyone who was alerted receives a resolution notification. This prevents unnecessary investigation of issues that have already been resolved.

The confirmation step (requiring two consecutive failures before alerting) prevents false positives from transient network issues. A single failed check is usually noise. Two consecutive failures within a few minutes are far more likely to be a genuine signal.
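The tiered flow and the confirmation step can be sketched as a small state machine. This is an illustrative sketch under assumptions, not how any specific tool implements it: AlertState, record_check, and the action names are hypothetical, and the fifteen-minute escalation delay is the low end of the range given above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertState:
    """Per-service alert state (hypothetical structure for illustration)."""
    consecutive_failures: int = 0
    alerted_at: Optional[float] = None  # when the first alert fired
    acknowledged: bool = False          # set to True when someone acknowledges
    escalated: bool = False


def record_check(state: AlertState, passed: bool, now: float,
                 failures_to_alert: int = 2,
                 escalate_after_seconds: float = 15 * 60) -> List[str]:
    """Update state with one check result and return the actions to take."""
    actions: List[str] = []
    if passed:
        # Only announce recovery if an alert was actually sent.
        if state.alerted_at is not None:
            actions.append("notify_resolved")
        state.consecutive_failures = 0
        state.alerted_at = None
        state.acknowledged = False
        state.escalated = False
        return actions

    state.consecutive_failures += 1
    if state.alerted_at is None:
        # Confirmation step: alert only after enough consecutive failures.
        if state.consecutive_failures >= failures_to_alert:
            state.alerted_at = now
            actions.append("notify_primary")
    elif (not state.acknowledged and not state.escalated
          and now - state.alerted_at >= escalate_after_seconds):
        state.escalated = True
        actions.append("escalate")
    return actions
```

In this sketch escalation is only evaluated when a check result arrives. A real system would more likely run escalation on its own timer, so that an unacknowledged alert escalates even if checks stop arriving.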

Step 4: Add Context to Your Alerts

An alert that says “Service X is down” tells you something is wrong. An alert that says “Service X returned a 503 at 14:32 after two consecutive failures, last successful check at 14:28, response time was 12 seconds before failure” tells you what to investigate.

Configure your alerts to include: the service name, what failed (status code, timeout, missing heartbeat), when the failure started, how long it has been down, and a link to the monitoring dashboard for more detail. This context saves minutes of investigation on every incident.

If your monitoring tool supports it, add runbook links to each alert — a URL pointing to documentation about what to do when that specific service fails. At 3am, the person responding will thank you for not making them figure out the recovery procedure from memory.
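One way to make sure every alert carries this context is to build the message from a fixed set of fields. The sketch below is only an illustration: format_alert and its parameters are hypothetical names, and the URL in the usage note is a placeholder.

```python
from datetime import datetime
from typing import Optional


def format_alert(service: str, failure: str,
                 first_failure_at: datetime, last_success_at: datetime,
                 now: datetime, dashboard_url: str,
                 runbook_url: Optional[str] = None) -> str:
    """Build an alert body containing the context listed in this step."""
    down_minutes = int((now - first_failure_at).total_seconds() // 60)
    lines = [
        f"{service} is failing: {failure}",
        f"Failing since: {first_failure_at:%H:%M}",
        f"Last successful check: {last_success_at:%H:%M}",
        f"Down for: {down_minutes} min",
        f"Dashboard: {dashboard_url}",
    ]
    # The runbook link is optional, since not every tool supports it.
    if runbook_url:
        lines.append(f"Runbook: {runbook_url}")
    return "\n".join(lines)
```

Called with a service name of "Service X", a failure of "HTTP 503", and the placeholder dashboard URL "https://monitoring.example.com/service-x", this produces a short multi-line message that a responder can act on without opening the dashboard first.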

Step 5: Test Your Monitoring Before You Need It

After configuration, verify that monitoring actually works by deliberately triggering a failure. Take a test service offline and confirm that the alert fires, reaches the right person, through the right channel, with the expected context. Then bring the service back up and confirm the resolution notification fires.

Test the escalation path too. Ignore the first alert deliberately and confirm that the escalation reaches the second person. This is the step most people skip, and it is the reason they discover their escalation does not work during a real incident at 2am.
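A drill is easier to judge against a written checklist than from memory. The following sketch assumes a two-person rota; DrillLog, EXPECTED_DRILL_STEPS, missing_steps, and the recipient names are all hypothetical and would need to match your own setup.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Step = Tuple[str, str]  # (recipient, event)

# What a complete drill should produce, in order (assumed two-person rota).
EXPECTED_DRILL_STEPS: List[Step] = [
    ("primary", "alert"),
    ("secondary", "escalation"),
    ("primary", "resolved"),
    ("secondary", "resolved"),
]


@dataclass
class DrillLog:
    """Notifications actually observed during the drill."""
    observed: List[Step] = field(default_factory=list)

    def record(self, recipient: str, event: str) -> None:
        self.observed.append((recipient, event))


def missing_steps(log: DrillLog,
                  expected: List[Step] = EXPECTED_DRILL_STEPS) -> List[Step]:
    """Return the expected notifications that never arrived."""
    return [step for step in expected if step not in log.observed]
```

After taking the test service offline, ignoring the first alert, and bringing the service back, record what each person received. An empty result from missing_steps means the full path worked, and anything left in the list is the part of the alert path to fix.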

Review your monitoring setup monthly. Services change, endpoints move, and the person who was on-call three months ago may have left the team. A monitoring configuration that is not maintained becomes unreliable over time.

Common Mistakes

  • Monitoring only the homepage. The homepage being up does not mean the application works. Monitor the health endpoint, the login flow, and the API — the parts that users actually depend on.
  • Alerting on every single failure. A single failed check is usually a network blip. Require two consecutive failures before alerting. Otherwise your team will disable notifications within the first week.
  • Sending all alerts to email. Email is not a real-time channel. Critical alerts need push notifications or SMS. Email is fine for daily summaries and resolved notifications.
  • Not testing the alert path. If you have never seen your monitoring fire a real alert, you do not know whether it works. Test it. Deliberately.
  • Setting the same interval for everything. A background job that runs hourly does not need per-minute checks. Match intervals to business impact and the service’s own cadence.

What Good Looks Like

A well-configured monitoring setup looks like this: every critical service is checked at an appropriate interval. Alerts reach the right person within minutes of a genuine failure. False positives are rare enough that every alert is taken seriously. The team has practised responding to alerts and knows what to do for each service. When something goes down, the monitoring system tells them before any user does.

Next Steps

If you are also monitoring background processes and scheduled jobs, How to Set Up Automated Lead Qualification covers monitoring in the context of automated workflows. For the broader system type behind uptime monitoring, see Uptime Monitoring System. If you want monitoring set up and managed for you, get in touch.
