Headline comparison
The cheapest option is highlighted. "Cheapest" depends on which constraints actually matter for you; check the badge matrix below.
Workload sizing
Start from your use case. The tool sizes the GPU fleet from your peak traffic, then prices each architecture.
Inference workload
Training workload (optional, additive)
Derived workload
Per-archetype detail
Sovereignty & risk matrix
Badges are derived deterministically from your inputs and policy checkboxes — see the methodology section for the rules.
Time-horizon sensitivity
TCO at 3, 5, 7, and 10 years. The crossover point — where local beats cloud, or hybrid beats both — often sits between years 3 and 7.
Top sensitivities
The three inputs that, when perturbed ±25%, most change the recommended archetype. These are the assumptions that actually matter for your decision.
Policy & sovereignty constraints
These don't change cost numbers. They drive the badges and the narrative.
Edit assumptions (hardware, cloud pricing, datacenter, financial)
Methodology & formulas
Inference compute
peak_tokens_per_sec = (users × queries_per_day × output_tokens × peak_factor) / (86400 × util_fraction)
flops_per_token     = 2 × model_params × (1 − quant_savings) × 1.15   // 15% attention overhead
peak_flops_required = peak_tokens_per_sec × flops_per_token
gpus_inference      = ⌈ peak_flops_required / (gpu_peak_flops × achieved_utilization) ⌉
Quantization savings on compute time: FP16=0, FP8=0.4, INT8=0.5, INT4=0.7.
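The inference sizing above can be sketched in JavaScript. This is a minimal illustration, not the page's actual script; the function and parameter names are ours, and the numbers in the usage example are purely illustrative.

```javascript
// Sizes the inference fleet from peak traffic, mirroring the formulas above.
function gpusForInference({ users, queriesPerDay, outputTokens, peakFactor,
                            utilFraction, modelParams, quantSavings,
                            gpuPeakFlops, achievedUtil }) {
  const peakTokensPerSec =
    (users * queriesPerDay * outputTokens * peakFactor) / (86400 * utilFraction);
  // 2 FLOPs per parameter per token, reduced by quantization, plus 15% attention overhead
  const flopsPerToken = 2 * modelParams * (1 - quantSavings) * 1.15;
  const peakFlopsRequired = peakTokensPerSec * flopsPerToken;
  return Math.ceil(peakFlopsRequired / (gpuPeakFlops * achievedUtil));
}
```

For example, 10,000 users at 20 queries/day, 500 output tokens, a 3× peak factor, a 70B-parameter model at FP8 (0.4 savings), and a ~1 PFLOP/s-class GPU at 35% achieved utilization sizes to a small single-digit fleet.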
Training compute
flops_per_run  = 6 × model_params × tokens_per_run
total_flops_yr = flops_per_run × runs_per_year × experimental_multiplier
gpus_training  = ⌈ total_flops_yr / (gpu_peak_flops × training_util × 3600 × 24 × 365 × target_completion_fraction) ⌉
Default target completion: 30 days wall-clock per run; training utilization: 40%.
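The same sizing in JavaScript, again as a sketch with our own identifiers rather than the tool's source:

```javascript
// Sizes the training fleet from annual FLOP demand, mirroring the formulas above.
function gpusForTraining({ modelParams, tokensPerRun, runsPerYear,
                           experimentalMultiplier, gpuPeakFlops,
                           trainingUtil, targetCompletionFraction }) {
  const flopsPerRun = 6 * modelParams * tokensPerRun; // standard 6ND training estimate
  const totalFlopsYr = flopsPerRun * runsPerYear * experimentalMultiplier;
  const secondsPerYear = 3600 * 24 * 365;
  return Math.ceil(totalFlopsYr /
    (gpuPeakFlops * trainingUtil * secondsPerYear * targetCompletionFraction));
}
```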
Aggregate
total_gpus = max(gpus_inference, gpus_training) // assumes shared hardware
Local-only TCO
hardware_capex   = total_gpus × gpu_unit_cost × (1 + ⌊horizon / refresh_years⌋)
datacenter_capex = total_gpus × gpu_tdp × PUE × $/W_buildout
power_opex/yr    = (total_gpus × gpu_tdp × PUE × 8760 × util) / 1000 × $/kWh
staff_opex/yr    = (total_gpus × gpu_tdp × PUE / 1_000_000) × $/MW/yr
TCO              = capex + opex × horizon
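As a hedged sketch of the local-only math (names and parameter shapes are ours, not the page's script):

```javascript
// Local-only TCO: hardware + datacenter buildout capex, plus power and staff opex
// accrued over the horizon. Undiscounted; NPV treatment is covered separately.
function localTco({ gpus, gpuUnitCost, gpuTdpW, pue, horizonYears, refreshYears,
                    buildoutPerW, kwhPrice, util, staffPerMwYear }) {
  const buys = 1 + Math.floor(horizonYears / refreshYears);     // initial buy + refreshes
  const hardwareCapex = gpus * gpuUnitCost * buys;
  const itLoadW = gpus * gpuTdpW * pue;                         // facility-level watts
  const datacenterCapex = itLoadW * buildoutPerW;
  const powerOpexYr = (itLoadW * 8760 * util) / 1000 * kwhPrice; // kWh/yr × $/kWh
  const staffOpexYr = (itLoadW / 1e6) * staffPerMwYear;          // MW × $/MW/yr
  return hardwareCapex + datacenterCapex + (powerOpexYr + staffOpexYr) * horizonYears;
}
```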
All-cloud TCO
blended_rate = onDemand × f_od + reserved × f_res + spot × f_spot
gpu_hrs/yr   = total_gpus × 8760 × utilization
compute_opex = gpu_hrs × blended_rate
egress       = (queries × (in+out_tokens) × 4 bytes × 365 × 0.3) / 1e9 × $/GB
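A sketch of the cloud math under the same caveats; we read "queries" in the egress formula as queries per day across all users, and the 0.3 factor as the assumed fraction of traffic that egresses — both are our reading, not confirmed from the source:

```javascript
// All-cloud TCO: blended GPU-hour rate plus token-traffic egress, per year.
function cloudTco({ gpus, onDemand, fOd, reserved, fRes, spot, fSpot, util,
                    queriesPerDay, inTokens, outTokens, egressPerGb, years }) {
  const blendedRate = onDemand * fOd + reserved * fRes + spot * fSpot; // $/GPU-hr
  const gpuHrsYr = gpus * 8760 * util;
  const computeOpexYr = gpuHrsYr * blendedRate;
  // ~4 bytes per token on the wire; 0.3 = assumed egressing fraction of traffic
  const egressYr =
    (queriesPerDay * (inTokens + outTokens) * 4 * 365 * 0.3) / 1e9 * egressPerGb;
  return (computeOpexYr + egressYr) * years;
}
```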
Hybrid TCO
local_gpus = ⌈ total_gpus × sensitive_fraction ⌉
cloud_gpus = total_gpus − local_gpus
TCO        = local_only_cost(local_gpus) + cloud_cost(cloud_gpus, ×0.5 egress)
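The hybrid split is just a fleet partition plus the two cost models. A sketch, with the local and cloud cost functions passed in as stubs (their real signatures are whatever the script uses):

```javascript
// Splits the fleet by the fraction of workload that must stay on-prem,
// then sums the two architectures' costs. localCostFn and cloudCostFn are
// placeholders for the local-only and all-cloud models.
function hybridTco({ totalGpus, sensitiveFraction, localCostFn, cloudCostFn }) {
  const localGpus = Math.ceil(totalGpus * sensitiveFraction);
  const cloudGpus = totalGpus - localGpus;
  // Cloud egress is halved: sensitive traffic never leaves the local site.
  return localCostFn(localGpus) + cloudCostFn(cloudGpus, /* egressScale */ 0.5);
}
```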
Edge + central TCO
per_edge_gpus = ⌈ inference_gpus / edge_sites ⌉   // rounding overhead
edge_capex    = per_edge_gpus × edge_sites × gpu_cost
edge_dc       = edge_power × $/W × 1.3            // edge buildout premium
central_*     = training_gpus × normal local economics
              + inter-site connectivity (network × edge_sites × 0.5)
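The edge-side capex can be sketched as below (central training GPUs are priced with the normal local economics and are omitted here). Identifiers are ours; the per-site network cost is an assumed input:

```javascript
// Edge capex: per-site rounding inflates the fleet, buildout carries a 30%
// premium over central datacenter $/W, and sites pay inter-site connectivity.
function edgeCapex({ inferenceGpus, edgeSites, gpuCost, gpuTdpW, pue,
                     buildoutPerW, networkCostPerSite }) {
  const perEdgeGpus = Math.ceil(inferenceGpus / edgeSites); // rounding overhead
  const edgeGpusTotal = perEdgeGpus * edgeSites;
  const hardware = edgeGpusTotal * gpuCost;
  const edgePowerW = edgeGpusTotal * gpuTdpW * pue;
  const buildout = edgePowerW * buildoutPerW * 1.3;         // edge buildout premium
  const interSite = networkCostPerSite * edgeSites * 0.5;   // inter-site connectivity
  return hardware + buildout + interSite;
}
```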
NPV
NPV = capex + Σ(opex_yr / (1 + r)^year) for year = 1..horizon
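Note this is an NPV of costs, so lower is better: capex is paid up front and each year's opex is discounted back to the present. A direct transcription:

```javascript
// NPV of costs: up-front capex plus each year's opex discounted at rate r.
function npvOfCosts(capex, opexPerYear, r, horizonYears) {
  let npv = capex;
  for (let year = 1; year <= horizonYears; year++) {
    npv += opexPerYear / Math.pow(1 + r, year);
  }
  return npv;
}
```

At r = 0 this collapses to the undiscounted TCO, capex + opex × horizon.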
Badge derivation
Rules live in BADGE_RULES in the script block. They are deterministic functions of inputs & policy flags — no hand-assigned values. View source to audit.
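For intuition, a rule table of this kind might look like the sketch below. The rule names and thresholds here are invented for illustration; the real rules live in BADGE_RULES in the page's script block and should be audited there:

```javascript
// Illustrative shape only: each badge is a pure predicate over inputs and
// policy flags, so the badge set is fully determined by what you entered.
const BADGE_RULES_SKETCH = {
  dataSovereign: (inputs, policy) =>
    policy.dataMustStayOnPrem && inputs.sensitiveFraction > 0,
  egressHeavy: (inputs) =>
    inputs.outputTokens * inputs.queriesPerDay > 1e9,
};

function deriveBadges(inputs, policy) {
  return Object.entries(BADGE_RULES_SKETCH)
    .filter(([, rule]) => rule(inputs, policy))
    .map(([name]) => name);
}
```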
Sensitivity analysis
Each of ~10 candidate inputs is perturbed ±25%. The three with the largest swing in the gap between the cheapest and next-cheapest TCO are reported, since archetype ranking matters more than absolute TCO.
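The perturbation loop can be sketched as follows. `tcoFn` is a placeholder for the tool's pricing pipeline, assumed to map an inputs object to a cost per archetype:

```javascript
// Ranks inputs by how much a ±25% perturbation moves the gap between the
// cheapest and next-cheapest archetype's TCO.
function topSensitivities(inputs, tcoFn, topN = 3) {
  const gap = (costs) => {
    const sorted = Object.values(costs).sort((a, b) => a - b);
    return sorted[1] - sorted[0];             // cheapest vs next-cheapest
  };
  const baseGap = gap(tcoFn(inputs));
  const swings = Object.keys(inputs).map((key) => {
    const swing = [0.75, 1.25].reduce((max, factor) => {
      const perturbed = { ...inputs, [key]: inputs[key] * factor };
      return Math.max(max, Math.abs(gap(tcoFn(perturbed)) - baseGap));
    }, 0);
    return { key, swing };
  });
  return swings.sort((a, b) => b.swing - a.swing)
               .slice(0, topN)
               .map((s) => s.key);
}
```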