The Scenario
A digital services company runs its own corporate website, a client portal, an internal knowledge base, and three client-facing web applications that it hosts and maintains under support retainers. That is seven web properties in total, spread across three hosting providers.
The managing director discovers outages the same way his customers do: something stops working and someone complains. Last month, the client portal went down for four hours on a Tuesday morning. The first indication was an email from a client asking why they could not log in. The technical lead investigated, found a database connection issue, and resolved it. But by that point, three clients had noticed and two had called in. The managing director spent the rest of the day apologising.
There is no monitoring in place. Nobody checks whether the sites are up. Nobody is alerted when response times degrade. Nobody knows whether last night’s automated backup actually completed. The assumption is that everything is working until evidence arrives that it is not.
The Problem
Reactive outage discovery is a reputational risk that compounds over time. Each incident erodes client confidence, and the clients who complain are only the ones who chose to speak up. Others may have hit the same issue and silently formed an opinion. For a company that sells digital services, having its own infrastructure go down without anyone inside the company noticing is particularly damaging.
The seven web properties are maintained by a small technical team that also handles development work. They do not have time to manually check every site every morning, and even if they did, a manual check at nine o’clock would not catch an outage at eleven. The gap between something breaking and someone noticing can be minutes or hours, depending entirely on whether a user happens to be active at the time.
Performance degradation is an even bigger blind spot than outages. A site that loads in eight seconds instead of two is technically up but practically broken. Nobody complains about slowness the way they complain about downtime, so the issue persists invisibly until it affects conversion rates, user satisfaction, or search rankings.
SSL certificate expiry, lapsed domain renewals, and hosting resource limits are additional risks that surface as crises rather than as scheduled maintenance tasks. The company has already had one such incident: an SSL certificate on a client site expired over a weekend, and the browser security warning was live for fourteen hours before anyone noticed.
The Approach
Digital Royalty builds a unified monitoring dashboard that covers all seven web properties and provides real-time visibility into uptime, performance, and infrastructure health.
Each site is monitored at one-minute intervals from multiple geographic locations. If a site fails to respond or returns an error, an alert fires immediately. Alerts are tiered: the technical lead receives the first notification, and if the issue is not acknowledged within ten minutes, it escalates to the managing director.
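The escalation rule can be expressed as a small pure function. This is a minimal Python sketch. It assumes each alert records when it fired and, optionally, when it was acknowledged. The function names and the role labels `technical_lead` and `managing_director` are hypothetical. A real alerting service would evaluate something like this on every scheduling tick.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Ten-minute acknowledgement window, as described in the text.
ESCALATION_DELAY = timedelta(minutes=10)


def should_escalate(fired_at: datetime, now: datetime,
                    acknowledged_at: Optional[datetime]) -> bool:
    """Return True once an alert has gone unacknowledged past the window."""
    deadline = fired_at + ESCALATION_DELAY
    # An acknowledgement inside the window stops escalation for good.
    if acknowledged_at is not None and acknowledged_at <= deadline:
        return False
    return now >= deadline


def recipients(fired_at: datetime, now: datetime,
               acknowledged_at: Optional[datetime]) -> List[str]:
    """Hypothetical role labels; the first tier is always notified."""
    notify = ["technical_lead"]
    if should_escalate(fired_at, now, acknowledged_at):
        notify.append("managing_director")
    return notify
```

An acknowledgement that arrives after the deadline does not undo the escalation. The record then reflects what actually happened: the managing director was paged.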
Response time monitoring tracks page load speed continuously. The dashboard displays current response time alongside historical trends, making it easy to spot gradual degradation before it becomes a user-visible problem. Thresholds are configurable per site: the client portal has tighter tolerances than the internal knowledge base because the audience and the stakes differ.
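One way to spot gradual degradation is to compare recent response times against a longer baseline, with the tolerance set per site. The Python sketch below assumes response times are stored in milliseconds in chronological order. The `SiteThresholds` type, the site keys and every numeric value are illustrative assumptions, not the dashboard’s actual configuration. It uses medians rather than means so that a single slow outlier does not register as a trend.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class SiteThresholds:
    """Hypothetical per-site tolerances; all values here are illustrative."""
    name: str
    max_response_ms: float  # a single sample slower than this counts as slow
    trend_factor: float     # allowed ratio of recent median to baseline median


# Tighter tolerances for the client portal than for the internal knowledge base.
THRESHOLDS: Dict[str, SiteThresholds] = {
    "client-portal": SiteThresholds("client-portal", 1500.0, 1.25),
    "knowledge-base": SiteThresholds("knowledge-base", 4000.0, 1.5),
}


def is_degrading(samples_ms: List[float], cfg: SiteThresholds,
                 window: int = 60) -> bool:
    """Compare the median of the last `window` samples with the median of
    all earlier samples; flag the site if the ratio exceeds its trend factor."""
    if len(samples_ms) < 2 * window:
        return False  # not enough history to form a baseline
    baseline = median(samples_ms[:-window])
    recent = median(samples_ms[-window:])
    return recent > baseline * cfg.trend_factor
```

With one-minute checks, the default `window` of 60 compares the last hour with everything before it. Catching a slow drift over months would need a much longer window or daily aggregates.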
SSL certificate and domain expiry monitoring provides thirty-, fourteen-, and seven-day warnings. These are not emergencies; they are scheduled maintenance tasks that should never become emergencies if they are tracked properly.
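The certificate check splits into two parts: fetching the certificate’s end date and deciding which warning is due. This is a minimal Python sketch using only the standard library’s `ssl` and `socket` modules. The function names are hypothetical. Domain-registration expiry, which would need a WHOIS or registrar lookup, is not covered. A verified handshake fails outright once a certificate has expired, so in practice that failure should raise an alert of its own.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional

# Warning thresholds in days, from the text.
WARNING_DAYS = (30, 14, 7)


def cert_expiry(host: str, port: int = 443, timeout: float = 10.0) -> datetime:
    """Hypothetical helper: return the certificate's notAfter time
    using a verified TLS connection."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def warning_tier(expires: datetime, now: datetime) -> Optional[int]:
    """Return the tightest threshold already crossed, 0 if already expired,
    or None if no warning is due yet."""
    days_left = (expires - now).days
    if days_left < 0:
        return 0
    crossed = [d for d in WARNING_DAYS if days_left <= d]
    return min(crossed) if crossed else None
```

A typical call would be `warning_tier(cert_expiry(host), datetime.now(timezone.utc))`. A scheduler would also need to remember which tiers it has already sent, so that each warning goes out only once.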
A status overview screen shows all seven sites on a single page. Green means healthy. Amber means degraded. Red means down. The managing director can glance at this screen in five seconds and know the state of every property the company is responsible for.
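The traffic-light rule can be written as a classification of each check result, with the overview showing the worst status across sites. This Python sketch rests on two assumptions about what "down" and "degraded" mean. Any HTTP status of 400 or above, or no response at all, counts as red. A response slower than the site’s threshold counts as amber. The type and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    GREEN = "healthy"
    AMBER = "degraded"
    RED = "down"


# Severity order used to pick the worst status across sites.
_SEVERITY = {Status.GREEN: 0, Status.AMBER: 1, Status.RED: 2}


@dataclass
class CheckResult:
    http_status: Optional[int]    # None when the site did not respond at all
    response_ms: Optional[float]  # None when no response time was recorded


def classify(result: CheckResult, max_response_ms: float) -> Status:
    """Map one check result to green, amber or red (assumed rules)."""
    if result.http_status is None or result.http_status >= 400:
        return Status.RED
    if result.response_ms is not None and result.response_ms > max_response_ms:
        return Status.AMBER
    return Status.GREEN


def worst(statuses: Iterable[Status]) -> Status:
    """Overall state for the summary: the most severe individual status."""
    return max(statuses, key=_SEVERITY.__getitem__)
```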
Historical uptime data is logged and available as reports. This serves two purposes: internally, it tracks the team’s infrastructure reliability over time; externally, it provides evidence for client SLA reporting.
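Monthly uptime figures can be derived directly from the check log. This Python sketch assumes every one-minute check is stored as a `(timestamp, succeeded)` pair and weights each check equally. The function name is hypothetical. Periods when the monitor itself recorded nothing are simply missing from the count, which a production report would need to handle explicitly.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def monthly_uptime(checks: Iterable[Tuple[datetime, bool]]) -> Dict[str, float]:
    """Return uptime percentage per calendar month, keyed 'YYYY-MM'.
    Assumes one equally weighted check per log entry."""
    total: Dict[str, int] = defaultdict(int)
    up: Dict[str, int] = defaultdict(int)
    for timestamp, succeeded in checks:
        month = timestamp.strftime("%Y-%m")
        total[month] += 1
        if succeeded:
            up[month] += 1
    return {month: 100.0 * up[month] / total[month] for month in sorted(total)}
```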
The Outcome
The first tangible result arrives within a week. The monitoring system detects that one of the client web applications is returning intermittent 500 errors between two and four in the morning. The technical lead investigates and finds a cron job running a database cleanup process that briefly locks a table. It has probably been happening for months, and users in other time zones may have encountered it. The fix takes thirty minutes.
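A pattern like this stands out once failures are grouped by hour of day. This minimal Python sketch assumes the monitor logs each check’s timestamp and HTTP status, with all timestamps in a single time zone. The function name is hypothetical. It counts only 5xx responses, mirroring the 500 errors described here.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def server_errors_by_hour(checks: Iterable[Tuple[datetime, int]]) -> Counter:
    """Count 5xx responses per hour of day (0-23) across the whole log.
    Assumes every timestamp uses the same time zone."""
    return Counter(ts.hour for ts, status in checks if 500 <= status <= 599)
```

Calling `.most_common(3)` on the result lists the hours with the most server errors. A cluster at 02:00 and 03:00 would point straight at scheduled overnight jobs.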
Over the first quarter, the system catches three incidents before any user reports them. In each case, the technical team resolves the issue during the alert window. The managing director does not receive a single client complaint about downtime in that period, in direct contrast to the previous quarter, when there were four.
The SSL certificate warning prevents a repeat of the weekend expiry incident. The team receives a thirty-day notice, schedules the renewal, and completes it during a planned maintenance window. The renewal takes ten minutes; the crisis it prevents would have taken hours to resolve.
The response time trends reveal that the corporate website has been getting progressively slower over six months. An image optimisation pass and a caching configuration change bring the load time down from 4.2 seconds to 1.8 seconds. The improvement would not have been prioritised without data showing the decline.
Client-facing SLA reports, generated directly from the monitoring data, become a value-add in retainer renewal conversations. Clients can see documented uptime of 99.95 percent. The reports turn infrastructure reliability from an assumed commodity into a demonstrated strength.
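The downtime a given uptime percentage allows can be worked out directly, which puts the figure in concrete terms. Assuming a 30-day reporting month:

```latex
\begin{aligned}
\text{allowed downtime} &= (1 - 0.9995) \times 30 \times 24 \times 60\ \text{min} \\
&= 0.0005 \times 43\,200\ \text{min} \\
&= 21.6\ \text{min}
\end{aligned}
```

In other words, 99.95 percent uptime leaves roughly 22 minutes of downtime per month. By the same arithmetic, a single four-hour outage like the client portal incident would on its own pull a month’s figure down to about 99.4 percent.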
Who This Applies To
This scenario is relevant to any business responsible for more than one web property, whether those are your own sites, client sites under management, or a mix of both. It applies to agencies, managed service providers, SaaS companies, and any organisation where website downtime has a direct impact on revenue, reputation, or client relationships.
If you currently find out about outages from your users rather than your systems, or if you have no visibility into performance trends across your web estate, this is the gap to close.