Booking-System Uptime: What "99.9%" Actually Means for Your Operation

JetSetGo Operations Analyst, May 26, 2026

The marketing page says "99.9% uptime". The operator reads it the way it's intended: the system is almost always up, and the 0.1% it isn't will be small, occasional, and probably at 3am on a Tuesday when nobody's buying anyway.

Two of those assumptions are wrong. The first — that 0.1% is small — depends entirely on what you're measuring it against. The second — that downtime falls evenly across the year — almost never holds for transport or tourism. Outages don't politely time themselves around your traffic. The hour you're most likely to be unable to take a booking is the hour you most need to.

This article translates SLA percentages into the numbers that actually matter — minutes per year, exposure during weekend-peak windows, what an SLA does and doesn't guarantee — and walks through how to evaluate any booking platform's uptime claim before signing. It's a companion piece to The True Cost of Downtime: that one calculates what an outage costs, this one calculates how often the platform is contractually allowed to give you one.

Why uptime is unevenly distributed for transport and tourism

Most uptime conversations frame downtime as a flat annual budget. "99.9%" sounds like roughly nine hours a year, about one working day, and operators read it against an average-day revenue line and decide it's tolerable. Two features of transport and tourism break that framing.

The first is revenue concentration into peak hours. A small or mid-market operator does a disproportionate share of annual booking volume in a handful of windows: Saturday and Sunday mornings in the high season, school holidays, long weekends, festival days, the summer peak across roughly twelve weeks. Eight to twelve hours of any week often carry more booking value than the rest of the week combined. If the 0.1% of downtime your platform is allowed falls inside one of those windows, you didn't lose 0.1% of a year's revenue — you lost a number several times larger.

The second is channel-mix sensitivity. A booking outage isn't only a direct-website outage. The point-of-sale at the wharf, the manifest the captain checks before departure, the channel-manager feed to OTAs, the API call the agent portal makes to confirm availability — they often share infrastructure with the public booking page. When the booking layer goes dark, every channel that depends on live availability degrades together.

So when you read "99.9%" on a vendor page, the right next question is not "is that a lot of downtime?" but "does that 0.1% have to be distributed evenly across the year, and what does the SLA say about when it can happen?" The honest answer is almost always: no, it doesn't, and the contract says very little.

The arithmetic: what each percentage means

The math is deterministic. A 365-day year has 525,600 minutes, a 30-day month has 43,200, and a week has 10,080. Multiply by the allowed downtime fraction and you get the contractual minutes per period.

| SLA tier | Allowed downtime per year | Allowed per 30-day month | Allowed per week |
| --- | --- | --- | --- |
| 99% | 87.6 hours (3.65 days) | 7.2 hours | 1.68 hours |
| 99.5% | 43.8 hours (1.83 days) | 3.6 hours | 50 minutes |
| 99.9% | 8.76 hours | 43.2 minutes | 10.1 minutes |
| 99.95% | 4.38 hours | 21.6 minutes | 5 minutes |
| 99.99% | 52.6 minutes | 4.3 minutes | 1 minute |
| 99.999% | 5.26 minutes | ~26 seconds | ~6 seconds |
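The figures above can be reproduced for any tier with a few lines of Python. This is a minimal sketch; the function name `allowed_downtime_minutes` and the period keys are illustrative, and the periods assume a 365-day year, a 30-day month, and a 7-day week.

```python
# Period lengths in minutes (365-day year, 30-day month, 7-day week).
MINUTES = {
    "year": 365 * 24 * 60,      # 525,600
    "month_30d": 30 * 24 * 60,  # 43,200
    "week": 7 * 24 * 60,        # 10,080
}


def allowed_downtime_minutes(sla_percent: float, period: str) -> float:
    """Minutes of downtime an SLA tier permits over one period."""
    return MINUTES[period] * (1 - sla_percent / 100)


if __name__ == "__main__":
    for tier in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999):
        row = {p: round(allowed_downtime_minutes(tier, p), 2) for p in MINUTES}
        print(f"{tier}%: {row}")
```

Swapping in a vendor's contractual tier and measurement window gives the downtime budget the contract actually allows, in minutes rather than percentages.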

A few things worth noticing.

The jumps are not linear. Each added "nine" is a factor-of-ten reduction in allowed downtime, and the engineering cost of each step rises sharply. That's why most vendors converge on 99.9% as the headline tier; it sits at the cost-benefit knee. The jump to 99.99% requires hot-redundant database failover, redundant payment processors, multi-region routing, and 24/7 on-call engineering depth; not every platform offers it at any price.

99.9% is not "almost always up" — it is "down for the equivalent of a working day, every year, with no constraint on when". "0.1% downtime" sounds small; "8.76 hours per year" sounds like a real number; "could be a full peak-Saturday morning" sounds like a budget item.

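A useful reframe: at 99.9% uptime, a 12-hour operating day carries an expected 43 seconds of downtime. Downtime rarely arrives in seconds, though. If it came in hour-long blocks, any given operating hour would have roughly a 0.1% chance of being down, and a 12-hour operating day roughly a 1.2% chance of containing one. Most days are fine and a handful have an incident; the meaningful question is which days those land on.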
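The per-day reframe can be checked numerically. The sketch below assumes, purely for illustration, that downtime arrives in independent blocks of a fixed length (one hour by default); real outages are neither that regular nor that independent, so treat the result as a rough guide. The function names are hypothetical.

```python
def expected_downtime_minutes(sla_percent: float, operating_hours: float) -> float:
    """Average downtime, in minutes, inside one operating day."""
    return operating_hours * 60 * (1 - sla_percent / 100)


def chance_of_any_outage(sla_percent: float, operating_hours: float,
                         block_hours: float = 1.0) -> float:
    """Probability that at least one block of the day is down.

    Assumption: downtime comes in independent blocks of `block_hours`,
    each down with probability equal to the allowed downtime fraction.
    """
    p_block = 1 - sla_percent / 100
    blocks = operating_hours / block_hours
    return 1 - (1 - p_block) ** blocks


if __name__ == "__main__":
    print(round(expected_downtime_minutes(99.9, 12), 2), "minutes expected per day")
    print(f"{chance_of_any_outage(99.9, 12):.2%} chance of an outage hour that day")
```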

What an SLA actually guarantees — and what it doesn't

The percentage on the marketing page is rarely what the contract commits to, and even the contractual number usually carves out more than the operator expects. Things to read carefully in any uptime SLA:

The exclusions. Standard SLA exclusions include planned maintenance windows (often pre-announced, but counted as zero downtime), force-majeure events (cloud-provider regional outages, undersea cable cuts), the customer's own equipment and internet, customer-induced incidents (a bad API integration spamming the system into rate-limiting), and third-party services the platform integrates with (payment gateways, SMS providers, channel-manager partners). The contractual 99.9% is for the platform's directly-managed components, excluding the listed categories. The number a customer actually experiences — sometimes called user-perceived availability — is usually lower.
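To see why user-perceived availability ends up below the contractual figure, multiply the availability of every link a booking depends on. This is a minimal sketch that assumes the links fail independently and that a booking needs all of them; the component names and percentages are hypothetical, not measurements from any vendor.

```python
from math import prod


def serial_availability(availabilities) -> float:
    """Availability of a chain in which every component must be up.

    Assumption: components fail independently of one another.
    """
    return prod(availabilities)


# Hypothetical figures for illustration only.
chain = {
    "booking platform (contractual)": 0.999,
    "payment gateway (excluded third party)": 0.9995,
    "operator's own internet (excluded)": 0.998,
}

perceived = serial_availability(chain.values())

if __name__ == "__main__":
    hours_down = (1 - perceived) * 365 * 24
    print(f"user-perceived availability: {perceived:.4%}")
    print(f"equivalent downtime: {hours_down:.1f} hours per year")
```

With these illustrative inputs the platform meets its 99.9% commitment while the operator experiences something closer to 99.65%.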

What counts as "downtime". Definitions vary between vendors. Some define it as "the booking endpoint returns a 5xx error or fails to respond". Others define it as "the platform is unable to accept any new bookings". A platform whose public booking page works but whose point-of-sale module is throttled may not count as "down" under either definition, yet it is absolutely down from the operator's perspective. Read for the words partial outage, degraded service, and feature-level availability.

Throttling, rate-limiting, and queueing. "The platform is up but every booking attempt takes 90 seconds and one in three times out" is technically not downtime under most SLAs. From a customer's perspective on a peak Saturday it is indistinguishable. Ask whether the SLA covers response time or only availability. The major cloud providers' definitions — AWS's (aws.amazon.com/legal/service-level-agreements) and Google Cloud's (cloud.google.com/terms/sla) — distinguish availability from performance and credit them separately; many booking-platform SLAs flatten the two.
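The difference between an availability-only SLA and one that also covers response time shows up clearly when both are computed from the same monitoring samples. The sketch below is illustrative: the `Probe` record, the 5-second threshold, and the sample data are assumptions, not any vendor's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    status: Optional[int]  # HTTP status code; None means no response at all
    latency_s: float       # seconds until the response completed


def availability(probes, max_latency_s: Optional[float] = None) -> float:
    """Share of probes that count as 'up'.

    With max_latency_s=None only 5xx errors and non-responses count as down.
    With a threshold, responses slower than it also count as down.
    """
    def is_up(p: Probe) -> bool:
        if p.status is None or p.status >= 500:
            return False
        return max_latency_s is None or p.latency_s <= max_latency_s

    return sum(is_up(p) for p in probes) / len(probes)


# Hypothetical peak-hour samples: six fast, three very slow, one server error.
samples = [Probe(200, 0.4)] * 6 + [Probe(200, 90.0)] * 3 + [Probe(500, 0.2)]

if __name__ == "__main__":
    print("availability only:", availability(samples))
    print("with 5 s response-time limit:", availability(samples, max_latency_s=5))
```

The same hour reads as 90% available under the first definition and 60% under the second, which is the gap worth asking the vendor about.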

The credit structure. What the operator gets when the platform fails its SLA is almost never the actual revenue loss. The standard structure is service credits: a percentage of that month's subscription fee, scaling with the size of the breach. A 99.9% SLA breached down to 99.0% might trigger a 10–25% credit, capped at 50% of the monthly bill. For an operator paying a few hundred dollars a month, the credit is on the order of tens of dollars. The actual cost of the outage on a peak Saturday can be two to three orders of magnitude higher. The SLA credit is a penalty against the vendor, not a remedy for the operator.
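A tiered credit schedule is simple enough to model before signing. The schedule below is hypothetical: the tier boundaries are invented for illustration, loosely following the 10–25% credit and 50% cap mentioned above, and a real contract will state its own.

```python
# Hypothetical schedule: (minimum measured availability %, credit % of monthly fee).
# Tiers are checked from the top down; the first floor that is met applies.
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25), (0.0, 50)]


def service_credit(monthly_fee: float, measured_percent: float,
                   tiers=CREDIT_TIERS, cap_percent: float = 50) -> float:
    """Credit owed for one month, as an amount in the fee's currency."""
    for floor, credit_percent in tiers:
        if measured_percent >= floor:
            return monthly_fee * min(credit_percent, cap_percent) / 100
    return monthly_fee * cap_percent / 100


if __name__ == "__main__":
    for measured in (99.95, 99.0, 98.0, 90.0):
        print(measured, "->", service_credit(400, measured))
```

Running a vendor's actual schedule against a bad month makes the size of the credit concrete before it is ever needed.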

Reporting and proof. In many SLAs the burden is on the customer to file an incident report within a short window (often 48–72 hours), with logs and timestamps, to claim a credit. An operator running a peak weekend who only notices a multi-hour partial outage on Monday morning may already be outside the claim window. Ask whether outages are auto-credited based on the vendor's monitoring, or whether the customer has to file.

The look-back window. SLAs measure availability over a defined period — usually monthly, sometimes quarterly, occasionally annual. A one-hour outage in a 30-day month is 0.14% downtime that month (failing 99.9%); the same outage averaged over a year is 0.011% (well inside 99.9%). Customers want monthly look-backs; vendors prefer annual. The longer the window, the easier the SLA is to meet.
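The look-back effect takes one division to verify. A minimal sketch, with an illustrative function name, that measures the same one-hour outage against a 30-day and a 365-day window:

```python
def measured_availability(outage_minutes: float, window_days: int) -> float:
    """Availability percentage for a given outage total over a look-back window."""
    window_minutes = window_days * 24 * 60
    return 100 * (1 - outage_minutes / window_minutes)


monthly = measured_availability(60, 30)   # one-hour outage, monthly look-back
annual = measured_availability(60, 365)   # same outage, annual look-back

if __name__ == "__main__":
    print(f"monthly window: {monthly:.3f}%  breaches 99.9%: {monthly < 99.9}")
    print(f"annual window:  {annual:.3f}%  breaches 99.9%: {annual < 99.9}")
```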

Three different uptime numbers a vendor publishes

There are three uptime numbers in play and they mean very different things.

The marketing claim ("99.9% uptime!" on the homepage) is aspirational, often unqualified, and rarely matches the SLA wording. Treat it as a positioning statement, not a contractual commitment.

The forward SLA is the number the vendor commits to in the contract, with the exclusions and credit structure attached. This is what matters for compensation when something goes wrong. It is often lower than the marketing claim — the page says "99.9%" and the contract says "99.5% excluding maintenance windows, customer-end issues, and third-party service interruptions". Ask for the contract language before signing.

Historical uptime reporting is the record of actual measured availability over the last 30/60/90/365 days, ideally on a public status page. The most useful of the three, because it shows what really happened. The cloud-native pattern is status.<vendor>.com with daily uptime, incident history, and post-mortems. A vendor that publishes this and keeps it current is treating you as an adult.
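If a vendor's status page lists incidents with start and end times, the historical figure can be recomputed independently. The sketch below assumes incidents do not overlap and that each one counts as a full outage; the dates and durations are invented for illustration.

```python
from datetime import datetime


def uptime_percent(incidents, window_start: datetime, window_end: datetime) -> float:
    """Measured uptime over a window, from a list of (start, end) incidents.

    Assumption: incidents do not overlap and each is a full outage.
    Incidents are clipped to the window before being counted.
    """
    window_s = (window_end - window_start).total_seconds()
    down_s = 0.0
    for start, end in incidents:
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            down_s += (clipped_end - clipped_start).total_seconds()
    return 100 * (1 - down_s / window_s)


# Hypothetical incident history copied from a status page.
incidents = [
    (datetime(2026, 4, 11, 9, 0), datetime(2026, 4, 11, 9, 40)),
    (datetime(2026, 4, 25, 10, 15), datetime(2026, 4, 25, 10, 35)),
]

result = uptime_percent(incidents, datetime(2026, 4, 1), datetime(2026, 5, 1))

if __name__ == "__main__":
    print(f"measured uptime for the window: {result:.3f}%")
```

Comparing this number with the marketing claim and the contractual SLA for the same period is a quick way to see how far the three diverge.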

Read all three. They will not agree, and the gap between them is informative.

Architecture: hot, warm, and best-effort

Behind every uptime number sits an architectural choice that determines what the platform actually does when something breaks.

Hot-redundant (active-active). Two or more copies of the platform run in parallel, each serving live traffic, in separate cloud regions. When one fails, the other has already been serving alongside it; traffic shifts in seconds. This makes 99.99%+ SLAs achievable, and it is expensive — roughly double the infrastructure cost.

Warm-standby (active-passive). A primary platform serves live traffic; a secondary in another region is fully provisioned but receiving only replicated data. When the primary fails, the secondary is promoted to active. Failover takes minutes — sometimes tens of minutes for database promotion — during which the platform is partially or fully unavailable. This sits behind most 99.9% SLAs.

Best-effort (single-region). One environment, in one region, with backups but no live secondary. Recovery from a regional outage means restoring from backup — measured in hours, sometimes the better part of a day. The marketing page may still say "99.9%", but the realised number in a year that includes a regional incident will be materially lower.

The question to ask is "what happens if your primary cloud region goes down for four hours?" The answer reveals the architecture. A best-effort vendor answers "the cloud provider's SLA is 99.99%, we've never had this happen". A warm-standby vendor describes failover steps and a recovery window. A hot-redundant vendor says "traffic keeps serving from the other region; the failure is mostly invisible". Three architectures, three different operator outcomes on the day it matters.
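The three answers translate into very different realised availability in a year that contains a single regional incident. The recovery times below are hypothetical placeholders chosen to match the ranges described above (seconds, tens of minutes, the better part of a day); a vendor's real figures should replace them.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# Hypothetical downtime per regional incident, in minutes.
RECOVERY_MINUTES = {
    "hot-redundant": 0.5,   # traffic shifts in seconds
    "warm-standby": 30,     # failover plus database promotion
    "best-effort": 720,     # restore from backup, roughly half a day
}


def realised_availability(recovery_minutes: float, incidents_per_year: int = 1) -> float:
    """Availability percentage for a year with the given number of incidents."""
    downtime = incidents_per_year * recovery_minutes
    return 100 * (1 - downtime / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for architecture, minutes in RECOVERY_MINUTES.items():
        print(f"{architecture}: {realised_availability(minutes):.4f}%")
```

Under these assumptions only the best-effort architecture falls below 99.9% from one incident alone, and it does so before any other outage that year is counted.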

A vendor-evaluation checklist

The questions worth asking before signing, or before renewing. Ask each in writing and capture the answer.

What is your contractual uptime SLA, in writing, for the month or quarter? Not the marketing claim — the contract number.
How is "downtime" defined? Booking endpoint only? All customer-facing surfaces? Partial outages? Degraded performance?
What exclusions apply? Planned maintenance, third-party services, force majeure, customer-end issues. Get the full list.
How are response-time and throughput failures treated? Is a slow platform that's technically responding "up" under your SLA?
What is the credit structure when the SLA is breached? Percentage of monthly fees, with what cap?
Are credits auto-applied from your monitoring, or must the customer file a claim? Within what window?
What is the SLA measurement window — monthly, quarterly, annual? Shorter is more honest.
Do you publish a real-time status page? What URL, and how far back does the history go?
Can I see your last 12 months of actual measured uptime? Including incident post-mortems for material outages.
What is your architectural posture — single-region, warm-standby, or hot-redundant? Realistic recovery time from a regional cloud outage?
Do scheduled maintenance windows count as downtime? When are they scheduled, and can they be moved out of my peak windows on request?
Does the SLA apply equally to all customers, or does a higher-tier plan get a stronger one? Some vendors layer SLA tiers; understand which one you're buying.

A vendor that answers these openly is selling a product they're confident in. A vendor that deflects is also telling you something.

Matching the tier to your operation

The right SLA tier follows from your own numbers, not a generic recommendation. The arithmetic to run, paste-able onto the back of a manifest:
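One way to lay that arithmetic out, as a minimal sketch. The function name, the input names, and the example figures are all hypothetical. It makes two simplifying assumptions: the full downtime allowance lands inside operating hours, and every hour of downtime loses that hour's bookings outright.

```python
def exposure(annual_revenue: float, operating_hours: float, peak_hours: float,
             peak_revenue_share: float, sla_percent: float):
    """Return (random_case, clustered_case, concentration_premium) in revenue terms.

    random_case:    allowed downtime spread evenly across operating hours.
    clustered_case: allowed downtime landing entirely inside peak hours.
    """
    downtime_hours = 365 * 24 * (1 - sla_percent / 100)
    average_rate = annual_revenue / operating_hours               # revenue per operating hour
    peak_rate = annual_revenue * peak_revenue_share / peak_hours  # revenue per peak hour
    random_case = downtime_hours * average_rate
    clustered_case = min(downtime_hours, peak_hours) * peak_rate
    return random_case, clustered_case, clustered_case - random_case


if __name__ == "__main__":
    # Hypothetical operator: 60% of revenue in 12 peak weekends of 2 x 12 hours.
    random_case, clustered_case, premium = exposure(
        annual_revenue=1_000_000,
        operating_hours=3_000,
        peak_hours=12 * 2 * 12,
        peak_revenue_share=0.60,
        sla_percent=99.9,
    )
    print(f"random: {random_case:,.0f}  clustered: {clustered_case:,.0f}  premium: {premium:,.0f}")
```

Rerunning it with a different `sla_percent` shows how much exposure each tier removes, which is the number to weigh against the price difference between tiers.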

Two notes on running it.

Use your real peak distribution. An operator whose revenue is 60% concentrated in 12 peak weekends has very different exposure than one whose revenue is flat across the year, even at the same SLA tier and the same total revenue.

Run it for two scenarios. Once assuming outages fall randomly across operating hours (the optimistic case), once assuming they cluster on high-traffic days because that's when load is highest and edge cases get triggered (the more realistic case). The gap between the two is the concentration premium — the extra exposure you carry because of when outages happen, not just how often.

What financial credits actually compensate

The most-misread part of any SLA. The credit you receive when a vendor breaches the uptime promise is a refund of subscription fees, not a recovery of revenue.

If a peak-Saturday outage costs an operator AUD $25,000 in forgone bookings, support hours, channel damage, and trust erosion (the kind of figure the True Cost of Downtime worksheet produces for a full-day outage on a mid-market operator's peak day), the SLA credit on a $400/month subscription might be $100. The financial recovery is 0.4% of the loss.

This is not an indictment — vendors cannot reasonably underwrite their customers' revenue, and any vendor that promised to would be selling a different product at a very different price. The SLA exists to align vendor incentives with availability and to give the operator standing to terminate if the vendor breaches systematically. It does not exist to make the operator whole on the day.

Two implications. First, the right architectural choice prevents the outage from costing the revenue in the first place — offline-capable point-of-sale, redundant payment processors, read-replica availability, channel-manager fallback policies. None are SLA terms; they're platform capabilities. Second, the SLA negotiation matters less than the platform-capability evaluation. A platform with a 99.99% SLA but no offline POS leaves the wharf staff helpless when their tablet can't reach the server; a platform with a 99.9% SLA and a properly-designed offline POS keeps selling tickets through most outage types. The architecture does more work than the contract.

Where SLAs are heading

A few patterns worth watching, already visible in cloud-native software and slowly arriving in booking platforms.

Multi-region, region-specific commitments. SLAs expressed per region or per geographic group, with hot-redundant cross-region commitments at higher tiers. An operator can ask which region their data is in and what happens during a regional cloud outage.

Status-page-as-norm. Public real-time status pages with detailed incident post-mortems are becoming a baseline expectation. Booking platforms that don't publish one are increasingly visible by absence.

Component-level availability. Per-component SLAs — separate targets for the booking engine, the point-of-sale, the payment processor, the channel-manager feed, the reporting layer. More honest than a single overall percentage.

SLA-as-product-tier. Some vendors layer SLAs into pricing tiers — 99.9% standard, 99.95% business, 99.99% enterprise — rather than one number for the whole customer base. Honest pricing of the engineering cost difference, and it lets the operator buy the tier their exposure justifies.

Where this leaves operators

Three takeaways.

First, the marketing percentage is not the contract number, and neither is the number you'll actually experience. Read all three before signing. If the vendor won't share contractual SLA wording or historical uptime, that is itself an answer.

Second, the right SLA tier follows from your own peak-window exposure, not from a generic recommendation. A small operator with revenue spread evenly across the year may rationally accept a 99% or 99.5% platform; a mid-market operator with 60% of revenue concentrated into a dozen peak weekends needs something materially stronger, and the arithmetic justifies paying for it.

Third, the SLA credit is not the remedy — the architecture is. Offline-capable point-of-sale, channel-manager fallback policies, multi-region failover, and audit-grade transaction logs that survive an outage do the work the SLA cannot. The operator's real question is not "what's your uptime number" but "what does my Saturday morning look like when something fails". Ask the second question; the answers are more useful.

If you'd like to compare notes on the platform-capability questions worth asking — not just the SLA-percentage one — the JetSetGo team is happy to talk.

Open question for operators

Have you ever pulled out a vendor's actual SLA wording and walked through it line by line — exclusions, definitions, credit structure, look-back window? For operators who have done this and found a gap between marketing claim and contractual commitment, the specifics would make for a useful conversation. The homepage percentage is a positioning statement; the contract is the product.