Headline comparison
The cheapest option is highlighted. "Cheapest" depends on which constraints actually matter for you; check the badge matrix below.
Workload sizing
Start from your use case. The tool sizes the GPU fleet from your peak traffic, then prices each architecture.
Inference workload
Training workload (optional, additive)
Derived workload
Per-archetype detail
Sovereignty & risk matrix
Badges are derived deterministically from your inputs and policy checkboxes — see the methodology section for the rules.
Time-horizon sensitivity
TCO at 3, 5, 7, and 10 years. The crossover point — where local beats cloud, or hybrid beats both — often sits between years 3 and 7.
Top sensitivities
The three inputs that, when perturbed ±25%, most change the recommended archetype. These are the assumptions that actually matter for your decision.
Policy & sovereignty constraints
These don't change cost numbers. They drive the badges and the narrative.
Edit assumptions (hardware, cloud pricing, datacenter, financial)
Methodology & formulas
Inference compute
peak_tokens_per_sec = (users × queries_per_day × output_tokens × peak_factor) / (86400 × util_fraction)
flops_per_token     = 2 × model_params × (1 − quant_savings) × 1.15   // 15% attention overhead
peak_flops_required = peak_tokens_per_sec × flops_per_token
gpus_inference      = ⌈ peak_flops_required / (gpu_peak_flops × achieved_utilization) ⌉
Quantization savings on compute time: FP16=0, FP8=0.4, INT8=0.5, INT4=0.7.
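The inference sizing above can be sketched in JavaScript. This is a minimal illustration, not the page's actual script; the function and parameter names are ours, and the numbers in the usage example are purely illustrative.

```javascript
// Sizes the inference fleet from peak traffic, mirroring the formulas above.
function gpusForInference({ users, queriesPerDay, outputTokens, peakFactor,
                            utilFraction, modelParams, quantSavings,
                            gpuPeakFlops, achievedUtil }) {
  const peakTokensPerSec =
    (users * queriesPerDay * outputTokens * peakFactor) / (86400 * utilFraction);
  // 2 FLOPs per parameter per token, reduced by quantization, plus 15% attention overhead
  const flopsPerToken = 2 * modelParams * (1 - quantSavings) * 1.15;
  const peakFlopsRequired = peakTokensPerSec * flopsPerToken;
  return Math.ceil(peakFlopsRequired / (gpuPeakFlops * achievedUtil));
}
```

For example, 10,000 users at 20 queries/day, 500 output tokens, a 3× peak factor, a 70B-parameter model at FP8 (0.4 savings), and a ~1 PFLOP/s-class GPU at 35% achieved utilization sizes to a small single-digit fleet.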
Training compute
flops_per_run  = 6 × model_params × tokens_per_run
total_flops_yr = flops_per_run × runs_per_year × experimental_multiplier
gpus_training  = ⌈ total_flops_yr / (gpu_peak_flops × training_util × 3600 × 24 × 365 × target_completion_fraction) ⌉
Default target completion: 30 days wall-clock per run; training utilization: 40%.
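The same sizing in JavaScript, again as a sketch with our own identifiers rather than the tool's source:

```javascript
// Sizes the training fleet from annual FLOP demand, mirroring the formulas above.
function gpusForTraining({ modelParams, tokensPerRun, runsPerYear,
                           experimentalMultiplier, gpuPeakFlops,
                           trainingUtil, targetCompletionFraction }) {
  const flopsPerRun = 6 * modelParams * tokensPerRun; // standard 6ND training estimate
  const totalFlopsYr = flopsPerRun * runsPerYear * experimentalMultiplier;
  const secondsPerYear = 3600 * 24 * 365;
  return Math.ceil(totalFlopsYr /
    (gpuPeakFlops * trainingUtil * secondsPerYear * targetCompletionFraction));
}
```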
Aggregate
total_gpus = max(gpus_inference, gpus_training) // assumes shared hardware
Local-only TCO
hardware_capex   = total_gpus × gpu_unit_cost × (1 + ⌊horizon / refresh_years⌋)
datacenter_capex = total_gpus × gpu_tdp × PUE × $/W_buildout
power_opex/yr    = (total_gpus × gpu_tdp × PUE × 8760 × util) / 1000 × $/kWh
staff_opex/yr    = (total_gpus × gpu_tdp × PUE / 1_000_000) × $/MW/yr
TCO              = capex + opex × horizon
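As a hedged sketch of the local-only math (names and parameter shapes are ours, not the page's script):

```javascript
// Local-only TCO: hardware + datacenter buildout capex, plus power and staff opex
// accrued over the horizon. Undiscounted; NPV treatment is covered separately.
function localTco({ gpus, gpuUnitCost, gpuTdpW, pue, horizonYears, refreshYears,
                    buildoutPerW, kwhPrice, util, staffPerMwYear }) {
  const buys = 1 + Math.floor(horizonYears / refreshYears);     // initial buy + refreshes
  const hardwareCapex = gpus * gpuUnitCost * buys;
  const itLoadW = gpus * gpuTdpW * pue;                         // facility-level watts
  const datacenterCapex = itLoadW * buildoutPerW;
  const powerOpexYr = (itLoadW * 8760 * util) / 1000 * kwhPrice; // kWh/yr × $/kWh
  const staffOpexYr = (itLoadW / 1e6) * staffPerMwYear;          // MW × $/MW/yr
  return hardwareCapex + datacenterCapex + (powerOpexYr + staffOpexYr) * horizonYears;
}
```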
All-cloud TCO
blended_rate = onDemand × f_od + reserved × f_res + spot × f_spot
gpu_hrs/yr   = total_gpus × 8760 × utilization
compute_opex = gpu_hrs × blended_rate
egress       = (queries × (in+out_tokens) × 4 bytes × 365 × 0.3) / 1e9 × $/GB
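A sketch of the cloud math under the same caveats; we read "queries" in the egress formula as queries per day across all users, and the 0.3 factor as the assumed fraction of traffic that egresses — both are our reading, not confirmed from the source:

```javascript
// All-cloud TCO: blended GPU-hour rate plus token-traffic egress, per year.
function cloudTco({ gpus, onDemand, fOd, reserved, fRes, spot, fSpot, util,
                    queriesPerDay, inTokens, outTokens, egressPerGb, years }) {
  const blendedRate = onDemand * fOd + reserved * fRes + spot * fSpot; // $/GPU-hr
  const gpuHrsYr = gpus * 8760 * util;
  const computeOpexYr = gpuHrsYr * blendedRate;
  // ~4 bytes per token on the wire; 0.3 = assumed egressing fraction of traffic
  const egressYr =
    (queriesPerDay * (inTokens + outTokens) * 4 * 365 * 0.3) / 1e9 * egressPerGb;
  return (computeOpexYr + egressYr) * years;
}
```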
Hybrid TCO
local_gpus = ⌈ total_gpus × sensitive_fraction ⌉
cloud_gpus = total_gpus − local_gpus
TCO        = local_only_cost(local_gpus) + cloud_cost(cloud_gpus, ×0.5 egress)
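The hybrid split is just a fleet partition plus the two cost models. A sketch, with the local and cloud cost functions passed in as stubs (their real signatures are whatever the script uses):

```javascript
// Splits the fleet by the fraction of workload that must stay on-prem,
// then sums the two architectures' costs. localCostFn and cloudCostFn are
// placeholders for the local-only and all-cloud models.
function hybridTco({ totalGpus, sensitiveFraction, localCostFn, cloudCostFn }) {
  const localGpus = Math.ceil(totalGpus * sensitiveFraction);
  const cloudGpus = totalGpus - localGpus;
  // Cloud egress is halved: sensitive traffic never leaves the local site.
  return localCostFn(localGpus) + cloudCostFn(cloudGpus, /* egressScale */ 0.5);
}
```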
Edge + central TCO
per_edge_gpus = ⌈ inference_gpus / edge_sites ⌉   // rounding overhead
edge_capex    = per_edge_gpus × edge_sites × gpu_cost
edge_dc       = edge_power × $/W × 1.3            // edge buildout premium
central_*     = training_gpus × normal local economics
              + inter-site connectivity (network × edge_sites × 0.5)
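The edge-side capex can be sketched as below (central training GPUs are priced with the normal local economics and are omitted here). Identifiers are ours; the per-site network cost is an assumed input:

```javascript
// Edge capex: per-site rounding inflates the fleet, buildout carries a 30%
// premium over central datacenter $/W, and sites pay inter-site connectivity.
function edgeCapex({ inferenceGpus, edgeSites, gpuCost, gpuTdpW, pue,
                     buildoutPerW, networkCostPerSite }) {
  const perEdgeGpus = Math.ceil(inferenceGpus / edgeSites); // rounding overhead
  const edgeGpusTotal = perEdgeGpus * edgeSites;
  const hardware = edgeGpusTotal * gpuCost;
  const edgePowerW = edgeGpusTotal * gpuTdpW * pue;
  const buildout = edgePowerW * buildoutPerW * 1.3;         // edge buildout premium
  const interSite = networkCostPerSite * edgeSites * 0.5;   // inter-site connectivity
  return hardware + buildout + interSite;
}
```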
NPV
NPV = capex + Σ(opex_yr / (1 + r)^year) for year = 1..horizon
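Note this is an NPV of costs, so lower is better: capex is paid up front and each year's opex is discounted back to the present. A direct transcription:

```javascript
// NPV of costs: up-front capex plus each year's opex discounted at rate r.
function npvOfCosts(capex, opexPerYear, r, horizonYears) {
  let npv = capex;
  for (let year = 1; year <= horizonYears; year++) {
    npv += opexPerYear / Math.pow(1 + r, year);
  }
  return npv;
}
```

At r = 0 this collapses to the undiscounted TCO, capex + opex × horizon.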
Badge derivation
Rules live in BADGE_RULES in the script block. They are deterministic functions of inputs & policy flags — no hand-assigned values. View source to audit.
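For intuition, a rule table of this kind might look like the sketch below. The rule names and thresholds here are invented for illustration; the real rules live in BADGE_RULES in the page's script block and should be audited there:

```javascript
// Illustrative shape only: each badge is a pure predicate over inputs and
// policy flags, so the badge set is fully determined by what you entered.
const BADGE_RULES_SKETCH = {
  dataSovereign: (inputs, policy) =>
    policy.dataMustStayOnPrem && inputs.sensitiveFraction > 0,
  egressHeavy: (inputs) =>
    inputs.outputTokens * inputs.queriesPerDay > 1e9,
};

function deriveBadges(inputs, policy) {
  return Object.entries(BADGE_RULES_SKETCH)
    .filter(([, rule]) => rule(inputs, policy))
    .map(([name]) => name);
}
```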
Sensitivity analysis
Each of ~10 candidate inputs is perturbed ±25%. The three with the largest swing in the gap between the cheapest and next-cheapest TCO are reported, since archetype ranking matters more than absolute TCO.
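The perturbation loop can be sketched as follows. `tcoFn` is a placeholder for the tool's pricing pipeline, assumed to map an inputs object to a cost per archetype:

```javascript
// Ranks inputs by how much a ±25% perturbation moves the gap between the
// cheapest and next-cheapest archetype's TCO.
function topSensitivities(inputs, tcoFn, topN = 3) {
  const gap = (costs) => {
    const sorted = Object.values(costs).sort((a, b) => a - b);
    return sorted[1] - sorted[0];             // cheapest vs next-cheapest
  };
  const baseGap = gap(tcoFn(inputs));
  const swings = Object.keys(inputs).map((key) => {
    const swing = [0.75, 1.25].reduce((max, factor) => {
      const perturbed = { ...inputs, [key]: inputs[key] * factor };
      return Math.max(max, Math.abs(gap(tcoFn(perturbed)) - baseGap));
    }, 0);
    return { key, swing };
  });
  return swings.sort((a, b) => b.swing - a.swing)
               .slice(0, topN)
               .map((s) => s.key);
}
```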