The True Cost of Downtime: Calculating Operational Resilience

JetSetGo Operations Analyst, May 26, 2026

It is 8:47am on a Saturday in peak season. The wharf is filling. Three sailings are due to load before lunch. The dispatcher refreshes the booking dashboard and the page is grey. The point-of-sale app on the counter staff's tablet has stopped accepting cards. Two minutes later the phone starts ringing — the website is throwing a 500 error and a customer halfway through checkout wants to know whether their booking went through.

Most operators have lived a version of this morning. The question that rarely gets asked in advance is what a morning like it actually costs.

This article walks through the categories that compose the real cost of an outage for a transport or tourism operator, the formula for each, and a worksheet you can apply to your own operation. It is honest about the trade-off: higher uptime tiers cost real money, and accepting some downtime exposure is a defensible commercial choice once you can see the number you are accepting.

Why this matters now

Three things have shifted under operators in the last decade.

The first is that almost every layer of the operation now depends on cloud-hosted software. The booking engine, the point-of-sale, the manifest, the manifest's connection to the captain's tablet, the payment processor, the SMS gateway that confirms tickets, the channel-manager that distributes inventory to OTAs — all of it lives on someone else's servers. An outage in any one of those layers stops part of the operation.

The second is channel-mix complexity. A decade ago a small ferry might have taken bookings by phone, walk-up, and one OTA. The same operator today is likely on a direct website, mobile POS, two or three OTAs, and an agent portal — each of which expects live availability. When the inventory pool goes dark, every channel's customer experience degrades at once.

The third is the concentration of revenue into a small number of peak windows. Long-weekend mornings, school-holiday Saturdays, the summer peak, the festival weekend — these are the hours that pay for the quiet weeks. An outage at 8:47am on a Saturday in peak season is not a proportional slice of an annual revenue line. It is a disproportionate slice. Calculating downtime cost on an annual-average basis under-states it badly.

The cost of downtime, broken into components

Total downtime cost is the sum of five components. Some are direct and easy to model; some are indirect and harder. Both classes matter.

Component 1: Forgone bookings

Direct, measurable, and usually the easiest place to start.

Forgone bookings = (typical bookings per hour during the affected window) × (hours of outage) × (average booking value) × (recapture loss rate)

Two things are worth getting right here.

Use the bookings-per-hour rate for the actual window, not an annual average. A Saturday-morning peak rate is often three to five times the weekday-afternoon rate.

The recapture loss rate is the share of customers who do not come back later. Some customers will retry an hour later; some will book a competitor; some will give up on the day. Operators with strong direct-channel loyalty (resident card holders, season-pass renewers, repeat tourists) recapture more. Operators selling to first-time visitors on a tight itinerary recapture less. A reasonable working range is 30%–70% lost — the rest reschedule. Pick a figure for your operation rather than copying one in.
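As a minimal sketch of the formula above, the Python below computes forgone bookings for one window and compares a quiet booking rate with a peak one. The function name and every input value are illustrative assumptions; the 4× gap between the two rates sits inside the three-to-five-times range mentioned above.

```python
def forgone_bookings(bookings_per_hour: float, outage_hours: float,
                     avg_booking_value: float, recapture_loss_rate: float) -> float:
    """Revenue from bookings that are lost for good, not merely delayed."""
    if not 0.0 <= recapture_loss_rate <= 1.0:
        raise ValueError("recapture_loss_rate must be a fraction between 0 and 1")
    return bookings_per_hour * outage_hours * avg_booking_value * recapture_loss_rate


# Illustrative inputs only: the same 2-hour outage at two different booking rates.
quiet_loss = forgone_bookings(bookings_per_hour=3, outage_hours=2,
                              avg_booking_value=95.0, recapture_loss_rate=0.5)
peak_loss = forgone_bookings(bookings_per_hour=12, outage_hours=2,
                             avg_booking_value=95.0, recapture_loss_rate=0.5)

print(f"Quiet window: AUD ${quiet_loss:,.0f}")
print(f"Peak window:  AUD ${peak_loss:,.0f}")
```

Running it with your own window-specific rates, rather than an annual average, is the point of the exercise.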

Component 2: Refunds and remediation issued

When the system comes back, customers who were affected often expect something for their trouble. Some operators issue automatic small refunds (5%–10%) on the affected sailings; some issue rebooking credits; some absorb the support load instead of paying out.

Refunds and remediation = (affected customers) × (typical goodwill payout per customer) + (customers escalated to chargeback) × (typical chargeback cost)

Chargebacks are the under-appreciated tail. A handful of customers who could not reach a human, did not get a refund quickly enough, and lodged a card-issuer dispute can move this line item by an order of magnitude — chargeback fees and the operational time spent contesting them add up faster than the original refund would have.
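A small sketch of this line item, with hypothetical function names and inputs. It prices each chargeback as a dispute fee plus the staff time spent contesting it (that two-part split is an assumption made for illustration), then shows how the total moves as the number of disputes grows while goodwill payouts stay fixed.

```python
def chargeback_unit_cost(dispute_fee: float, hours_contesting: float,
                         staff_hourly_rate: float) -> float:
    """Cost of one dispute: the fee charged plus the staff time spent contesting it."""
    return dispute_fee + hours_contesting * staff_hourly_rate


def refunds_and_remediation(affected_customers: int, goodwill_payout: float,
                            chargebacks: int, chargeback_cost: float) -> float:
    """Goodwill payouts to affected customers plus the cost of escalated disputes."""
    return affected_customers * goodwill_payout + chargebacks * chargeback_cost


# Illustrative inputs: AUD $25 fee plus 1.5 hours of staff time at AUD $40/hour.
unit_cost = chargeback_unit_cost(dispute_fee=25.0, hours_contesting=1.5,
                                 staff_hourly_rate=40.0)

# Hold goodwill payouts fixed and vary only the number of disputes.
for disputes in (0, 2, 10, 30):
    total = refunds_and_remediation(30, 20.0, disputes, unit_cost)
    print(f"{disputes:>2} chargebacks -> AUD ${total:,.0f}")
```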

Component 3: Support hours absorbed

When the system is down, the office phone is up. Staff who would be running the operation are instead apologising, taking manual bookings on paper, manually authorising card payments by phone to a backup processor, and rebooking next week's overbooked sailing one customer at a time.

Support cost = (extra staff-hours absorbed) × (fully-loaded staff hourly rate) + (overtime premium for staff staying late to catch up)

The second term is the one operators tend to forget. The outage ends at 11am; the catch-up runs to 7pm. Two of your staff are on overtime rate for the back half of that. The labour cost of catching up is often as big as the labour cost of the outage itself.
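A minimal sketch of the two-term support cost. It prices catch-up overtime hours at the full overtime rate (base rate times a multiplier) and counts them separately from the extra hours absorbed during the outage. That convention, the function name, and the numbers are assumptions for illustration.

```python
def support_cost(extra_staff_hours: float, hourly_rate: float,
                 overtime_staff_hours: float = 0.0,
                 overtime_multiplier: float = 1.5) -> float:
    """Labour absorbed during the outage plus the overtime spent catching up."""
    during_outage = extra_staff_hours * hourly_rate
    catch_up = overtime_staff_hours * hourly_rate * overtime_multiplier
    return during_outage + catch_up


# Illustrative inputs: 2 staff x 4 extra hours, then 2 staff x 2 hours of overtime.
outage_only = support_cost(extra_staff_hours=2 * 4, hourly_rate=35.0)
with_catch_up = support_cost(extra_staff_hours=2 * 4, hourly_rate=35.0,
                             overtime_staff_hours=2 * 2)

print(f"Outage hours only: AUD ${outage_only:,.0f}")
print(f"With catch-up:     AUD ${with_catch_up:,.0f}")
```

Leaving the overtime argument at zero reproduces the figure an operator gets when the catch-up is forgotten.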

Component 4: Channel-relationship damage

This is indirect but real, and it is the one operators most often miss.

When your booking system cannot serve live availability, several things can happen on the OTA side, depending on your channel-manager configuration: the OTA may cache stale availability and oversell, the OTA may close out your inventory entirely until availability returns, or your listing may drop in the OTA's search ranking for "responsiveness" reasons.

Channel damage cost = (lost OTA placement value over the recovery window) + (cost of resolving any oversells the OTA created from stale cache)

Recovery from a search-rank drop can take weeks. An oversell that the OTA created because it could not see your real inventory is your problem to clean up, not theirs. Both costs are easy to dismiss in the moment and easy to under-state.

Component 5: Trust erosion

The hardest to measure and usually the largest in the medium term.

A first-time customer whose booking attempt failed at 8:47am will rarely tell you why they chose someone else next time. The cost shows up as a softer renewal rate, a lower repeat-booking percentage, or a quieter shoulder season twelve months later. You will see it in the trend line and you will not be able to attribute it.

The practical way to model it is to assume a fraction of affected first-time customers do not return:

Trust erosion = (affected first-time customers) × (non-return rate) × (estimated lifetime value of a repeat customer)

Pick conservative inputs. A non-return rate of 10%–20% of affected first-time customers is a reasonable working assumption; a lifetime value figure should come from your own repeat-booking history rather than an industry average.
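Because the non-return rate is a judgment call, it can help to compute the figure at both ends of the working range rather than at a single point. A minimal sketch, with a hypothetical function name and illustrative inputs:

```python
def trust_erosion(affected_first_timers: int, non_return_rate: float,
                  repeat_customer_value: float) -> float:
    """Expected future revenue lost from first-time customers who never come back."""
    if not 0.0 <= non_return_rate <= 1.0:
        raise ValueError("non_return_rate must be a fraction between 0 and 1")
    return affected_first_timers * non_return_rate * repeat_customer_value


# Illustrative inputs: 18 affected first-timers, AUD $180 repeat-customer value.
low = trust_erosion(18, 0.10, 180.0)
high = trust_erosion(18, 0.20, 180.0)
print(f"Trust erosion range: AUD ${low:,.0f} to AUD ${high:,.0f}")
```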

Two worked examples

The formulas above are not useful in the abstract. Here are two hypothetical operator profiles, each at the start of a peak Saturday outage.

Profile A — small operator, single vessel, half-day outage

A small island ferry operator. One vessel, four sailings on a Saturday in peak season, average twelve bookings per sailing window, average booking value AUD $95. The booking system is down from 8:30am to noon — three and a half hours, covering the first two sailings and the start of the third.

Forgone bookings: 12 bookings/hour (assumed rate for the morning window) × 3.5 hours × AUD $95 × 50% recapture loss = ~AUD $1,995
Refunds and remediation: 30 affected customers × AUD $20 goodwill credit + 2 chargebacks × AUD $35 = ~AUD $670
Support hours absorbed: 2 staff × 4 extra hours × AUD $35 fully-loaded rate + 2 staff × 2 hours overtime at 1.5× = ~AUD $490
Channel damage: assume modest — one OTA closed out for the morning, ranking impact minor over a 2-week recovery = ~AUD $300
Trust erosion: 18 affected first-time visitors × 15% non-return rate × AUD $180 estimated repeat-customer value = ~AUD $485

Total: ~AUD $3,940 for a single 3.5-hour outage on a peak Saturday.

The same outage on a quiet Tuesday in winter would land at perhaps AUD $400. The peak multiplier is the part operators most often forget when budgeting against an annual figure.

Profile B — mid-market operator, multi-product, full-day outage

A regional operator running a passenger ferry, a half-day tour, and a packaged combination of both. Three boats, eleven sailings on a Saturday, average eighteen bookings per sailing window, average booking value AUD $145. The booking system is down for the full operating day — eight hours.

Forgone bookings: 22 bookings/hour (assumed rate across all three boats) × 8 hours × AUD $145 × 55% recapture loss = ~AUD $14,036
Refunds and remediation: ~180 affected customers × AUD $25 average payout + 8 chargebacks × AUD $45 = ~AUD $4,860
Support hours absorbed: 5 staff × 9 extra hours × AUD $40 fully-loaded (AUD $1,800) + overtime at 1.5× for 4 hours (~AUD $600) = ~AUD $2,400
Channel damage: two OTAs closed out for the full day, modest ranking drop, one oversell to resolve = ~AUD $1,500
Trust erosion: ~50 first-time visitors × 18% non-return × AUD $240 repeat value = ~AUD $2,160

Total: ~AUD $24,956 for one full-day outage on one peak Saturday.

Multiply by the realistic number of peak Saturdays per year, weight by the realistic incidence of outages, and the operator can see the annual exposure they are actually carrying.
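That last step can be sketched as an expected-value calculation. The function name, the count of 20 peak Saturdays, and the 5% chance of an outage on any one of them are assumptions chosen for illustration; only the per-event figure comes from Profile B.

```python
def annual_exposure(per_event_cost: float, exposed_days_per_year: int,
                    outage_probability_per_day: float) -> float:
    """Expected yearly cost: per-event cost x exposed days x chance of an outage per day."""
    if not 0.0 <= outage_probability_per_day <= 1.0:
        raise ValueError("outage_probability_per_day must be between 0 and 1")
    return per_event_cost * exposed_days_per_year * outage_probability_per_day


# Illustrative inputs: Profile B's per-event total, 20 peak Saturdays, 5% incidence.
exposure = annual_exposure(per_event_cost=24_956, exposed_days_per_year=20,
                           outage_probability_per_day=0.05)
print(f"Expected annual exposure: AUD ${exposure:,.0f}")
```

An operator with a real incident history can replace the assumed probability with outages observed per exposed day.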

The calculator framework — copy and apply

A worksheet you can paste into a spreadsheet or scribble on the back of a manifest. Replace the placeholder values with your own.
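One way to lay the worksheet out, as a minimal Python sketch rather than a definitive implementation. The class and field names are hypothetical; the five components follow the formulas earlier in the article, with overtime hours priced at the full overtime rate as in Profile A. The placeholder values are Profile A's inputs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OutageWorksheet:
    # Component 1: forgone bookings
    bookings_per_hour: float
    outage_hours: float
    avg_booking_value: float
    recapture_loss_rate: float
    # Component 2: refunds and remediation
    affected_customers: int
    goodwill_payout: float
    chargebacks: int
    chargeback_cost: float
    # Component 3: support hours absorbed
    extra_staff_hours: float
    staff_hourly_rate: float
    overtime_staff_hours: float
    overtime_multiplier: float
    # Component 4: channel damage (entered directly as estimates)
    lost_ota_placement_value: float
    oversell_resolution_cost: float
    # Component 5: trust erosion
    affected_first_timers: int
    non_return_rate: float
    repeat_customer_value: float

    def components(self) -> Dict[str, float]:
        """Return the five cost components for one outage event."""
        return {
            "forgone_bookings": (self.bookings_per_hour * self.outage_hours
                                 * self.avg_booking_value * self.recapture_loss_rate),
            "refunds_and_remediation": (self.affected_customers * self.goodwill_payout
                                        + self.chargebacks * self.chargeback_cost),
            "support_hours": (self.extra_staff_hours * self.staff_hourly_rate
                              + self.overtime_staff_hours * self.staff_hourly_rate
                              * self.overtime_multiplier),
            "channel_damage": (self.lost_ota_placement_value
                               + self.oversell_resolution_cost),
            "trust_erosion": (self.affected_first_timers * self.non_return_rate
                              * self.repeat_customer_value),
        }

    def total(self) -> float:
        """Per-event cost: the sum of the five components."""
        return sum(self.components().values())


# Placeholder values (Profile A's inputs); replace each with your own figure.
worksheet = OutageWorksheet(
    bookings_per_hour=12, outage_hours=3.5, avg_booking_value=95.0,
    recapture_loss_rate=0.50,
    affected_customers=30, goodwill_payout=20.0, chargebacks=2, chargeback_cost=35.0,
    extra_staff_hours=8, staff_hourly_rate=35.0,
    overtime_staff_hours=4, overtime_multiplier=1.5,
    lost_ota_placement_value=300.0, oversell_resolution_cost=0.0,
    affected_first_timers=18, non_return_rate=0.15, repeat_customer_value=180.0,
)

for name, value in worksheet.components().items():
    print(f"{name:<26} AUD ${value:>8,.0f}")
print(f"{'total':<26} AUD ${worksheet.total():>8,.0f}")
```

With Profile A's inputs the sketch lands within a couple of dollars of the ~AUD $3,940 total above; the gap is rounding in the trust-erosion line.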

Two practical notes on using it.

Run it twice — once for a peak-Saturday-morning outage and once for a quiet-Tuesday-afternoon outage. The gap between the two numbers is what tells you whether the right investment is "more uptime everywhere" or "more uptime during the windows that pay for the year".

Build the per-event cost from your own numbers. A worksheet that uses your real per-sailing bookings, your real chargeback experience, and your real repeat-customer value will be much more useful than one that imports anyone else's averages.

The trade-off, honestly stated

Higher uptime is not free. Multi-region database failover, hot standby payment processors, redundant SMS gateways, real-time channel-manager fallback, an offline-capable POS for the counter staff — every layer of resilience has a cost, paid either as a feature in the software tier you choose, or as engineering time, or as a recurring fee to a backup provider.

The question is not "how do I get to 100% uptime", because the answer is "you do not". The question is: what is the per-event cost of an outage during my most exposed window, multiplied by the realistic frequency, and is that bigger than the annualised cost of buying the resilience that would prevent it?

For a small operator running a few hundred bookings a month, the answer might genuinely be that absorbing one or two outages a year is cheaper than paying for a higher tier or a redundant payment processor. For a mid-market operator concentrating revenue into twenty peak Saturdays a year, the calculation usually points the other way. Both answers are defensible. The indefensible position is having never run the numbers.
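The comparison can be sketched in a few lines. Everything here is an assumption for illustration: the function name, the expected event counts, the resilience prices, and the `share_of_events_prevented` parameter, which allows for resilience that stops only some outages. The per-event costs are the two profile totals from earlier.

```python
def resilience_case(per_event_cost: float, expected_events_per_year: float,
                    annual_resilience_cost: float,
                    share_of_events_prevented: float = 1.0) -> float:
    """Net annual benefit of buying resilience; positive means it pays for itself."""
    avoided = per_event_cost * expected_events_per_year * share_of_events_prevented
    return avoided - annual_resilience_cost


# Illustrative inputs only; neither resilience price comes from a real quote.
small_operator = resilience_case(per_event_cost=3_940, expected_events_per_year=1.5,
                                 annual_resilience_cost=9_000)
mid_market = resilience_case(per_event_cost=24_956, expected_events_per_year=1.0,
                             annual_resilience_cost=15_000)

for label, net in (("Small operator", small_operator), ("Mid-market", mid_market)):
    verdict = "buy the resilience" if net > 0 else "accept the exposure"
    print(f"{label}: net AUD ${net:,.0f} -> {verdict}")
```

With these made-up inputs the two operators reach opposite conclusions. Either answer can be sound, provided the inputs are real.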

What reduces exposure — architecture-level levers

A short, capability-led note rather than a sales section. The choices that reduce downtime exposure are the same regardless of vendor.

Offline-capable point-of-sale. A POS that can keep selling tickets at the counter without a live connection to the booking server — sync when reconnected — protects the on-the-day revenue stream from upstream outages. Not all booking platforms include this; it is worth asking about specifically.

Read-replica architecture for availability. Showing live availability does not require the same database that takes bookings. Platforms that separate read traffic from write traffic stay partially available even when the write layer is degraded — customers can see availability and queue intent even if final confirmation is delayed.

Channel-manager fallback behaviour. Different OTAs handle stale availability differently. Knowing in advance whether your channel-manager will close out, cache, or escalate during an outage matters; configuring it deliberately is better than discovering its default at 8:47am.

Status pages and proactive comms. A platform that posts incident status publicly, and that triggers your own customer comms automatically, reduces the support-hours-absorbed line item materially. The operator who can point a customer at a public status page in 10 seconds saves the 10-minute phone call.

Audit-grade transaction logs that survive the outage. When the system comes back, the question is "what was the state when it went down". A platform whose transaction log survives independently of the database that uses it makes the catch-up half-day quick rather than painful.

None of these are exotic. They are the difference between a half-day of confusion and a half-day of orderly degraded operation.
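Of the levers above, the offline-capable POS is the easiest to picture in code. The sketch below shows one common approach, a store-and-forward queue: each sale is recorded locally first and uploaded in order once the connection returns. The class, the `send` callback, and the sale fields are all hypothetical. A real POS would also persist the queue to disk and de-duplicate on the server, both of which are omitted here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass(frozen=True)
class Sale:
    sale_id: str   # generated at the counter so a retried upload can be recognised
    sailing: str
    amount: float


class OfflineCapablePOS:
    def __init__(self, send: Callable[[Sale], None]) -> None:
        self._send = send                      # uploads one sale; raises ConnectionError if offline
        self._pending: Deque[Sale] = deque()   # sales waiting to sync, oldest first

    def sell(self, sale: Sale) -> None:
        """Record the sale locally first, then attempt to sync."""
        self._pending.append(sale)
        self.sync()

    def sync(self) -> int:
        """Upload queued sales in order, stopping at the first failure."""
        uploaded = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break                          # still offline; keep the rest queued
            self._pending.popleft()            # remove only after a successful upload
            uploaded += 1
        return uploaded

    @property
    def pending(self) -> int:
        return len(self._pending)


# Usage with a stand-in for the booking server (hypothetical, for illustration).
server = {"online": False}
received: List[str] = []


def fake_send(sale: Sale) -> None:
    if not server["online"]:
        raise ConnectionError("booking server unreachable")
    received.append(sale.sale_id)


pos = OfflineCapablePOS(fake_send)
pos.sell(Sale("s-001", "09:00 sailing", 95.0))   # queued: server is down
pos.sell(Sale("s-002", "09:00 sailing", 95.0))   # queued behind the first
queued_during_outage = pos.pending

server["online"] = True                          # connection restored
uploaded_after_recovery = pos.sync()
```

Removing a sale from the queue only after a successful upload means a dropped connection mid-sync can cause a repeat upload but never a lost sale, which is why the sale carries its own identifier.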

A note on hosted vs self-hosted

Some operators run their own infrastructure for reasons of cost, control, or contractual obligation. The downtime calculation does not change shape — but the location of risk does. With hosted software, you are paying a vendor to manage the resilience layers; with self-hosted, you own those layers directly. Neither is intrinsically more or less reliable; the question is whether the operator's team has the bandwidth to run the resilience layers a hosted platform would otherwise manage.

For most small and mid-market transport and tourism operators, hosted is the right call simply because the team's attention is finite and is better spent on the operation than on database failover drills. For enterprise operators with dedicated IT functions, the trade-off shifts.

Where this leaves operators

Two takeaways are worth keeping.

First, downtime cost is not one number. It is a per-event cost that varies massively with the window in which it happens, multiplied by an event-frequency number that depends on platform choice and operator preparation. Calculating an annual-average figure is misleading; calculating a peak-window figure and a quiet-window figure separately is honest.

Second, the right resilience investment is the one whose annualised cost is smaller than the annualised cost of the outages it would prevent. That is a defensible commercial decision in either direction once the numbers are written down. The failure mode is skipping the calculation and then being surprised by the bill on a Saturday morning.

Open question for operators

How do you currently measure the cost of an outage in your operation, and have you ever modelled the difference between an outage at peak and an outage in shoulder season? It would be useful to hear from operators who have run those numbers and reached different conclusions about the right resilience tier. The answer is not the same for every operator, and the calculation framework deserves more real-world inputs than any one operator's experience can supply.