/ methodology

The hidden P&L of multi-model consumption.

Problem · Value erosion

Unoptimized consumption is silently eroding your AI budget.

Frontier-model spend is sprawling across 3–7 providers. Without a fiscal operating policy, gateway routing rules default to the most expensive path, retries blow past budgets, and finance has zero visibility until the invoice arrives.

Token leak across redundant retrieval & retry loops
Cross-border egress + provider arbitrage left on the table
Premium-tier model usage for commodity workloads
No unit-economics tie-back to product or customer

Solution · Policy engine

A financial operating policy that runs at the gateway.

We codify your unit economics into your gateway — Kong, Portkey, Bifrost, LiteLLM, Shakudo — so every token is routed to the model, region, and tier with the best dollar-per-quality outcome. Auditable, reversible, finance-grade.

Dynamic routing tied to live unit-economics signals
Cost-per-customer attribution to your data warehouse
Cross-region failover that respects egress economics
Quarterly-reviewed governance with finance + platform
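
One way to read "best dollar-per-quality outcome" is sketched below: among routes that clear a quality floor, pick the one with the lowest price per unit of quality. The `Route` fields, quality scores, and prices are hypothetical placeholders, not values taken from any gateway or provider.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str            # hypothetical model label
    region: str           # hypothetical region label
    usd_per_mtok: float   # blended price per 1M tokens (placeholder)
    quality: float        # 0..1 score from your own evals (placeholder)

def pick_route(routes, min_quality):
    """Return the route with the lowest price per unit of quality above the floor."""
    eligible = [r for r in routes if r.quality >= min_quality]
    if not eligible:
        raise ValueError("no route meets the quality floor")
    return min(eligible, key=lambda r: r.usd_per_mtok / r.quality)

routes = [
    Route("premium", "us", 15.0, 0.95),
    Route("mid", "eu", 3.0, 0.85),
    Route("small", "us", 0.5, 0.60),
]
print(pick_route(routes, 0.8).model)  # mid
```

Raising the floor to 0.9 leaves only the premium route eligible, which is the sense in which the policy, not the default path, decides when premium spend is justified.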

/ engagement model

Two offerings. One required entry point.

Every engagement begins with the Token Economics Audit. It is the mandatory diagnostic that establishes your TEI baseline — required whether or not you continue into the quarterly retainer. The Token Fiscal Governance Suite is the optional continuation: a quarterly retained engagement for organizations that want sustained governance after the audit completes.

Offering 01 · Required entry point

Mandatory

Token Economics Audit

$14,500 fixed · two-week diagnostic

A two-week diagnostic that baselines your TEI, exposes architectural leaks, validates sovereign compliance, and delivers production-ready gateway architecture recommendations. Required for every engagement — no client advances to the quarterly retainer without it.

TEI baseline + score
Architectural leak forensics
Gateway architecture recommendations

Offering 02 · Optional continuation

Post-audit

Token Fiscal Governance Suite

Quarterly retained · by application

The optional continuation. A quarterly retained engagement with three integrated workstreams — audit refresh, production telemetry review, and dedicated financial engineering advisory — governed by your finance and platform leaders.

Token Economics & Topology Audit
Production Telemetry Review
Dedicated Financial Engineering Advisory

Begin with the Audit

Inside the Governance Suite · three workstreams

Quarterly · deliverable

Token Economics & Topology Audit

Cross-border optimization study with token-leak identification across every gateway, model, and modality in your stack.

Routing topology map
Provider arbitrage windows
Leak & retry forensics

Continuous · monitoring

Production Telemetry Review

We instrument your gateways and continuously optimize routing rules against your fiscal policy as model pricing and quality move.

Cost-per-customer dashboards
Routing-rule pull requests
Anomaly & burn-rate alerting
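
As a minimal sketch of what burn-rate alerting can mean, the function below projects period-end spend linearly from spend to date and flags a projected overrun. The function name, the linear projection, and the threshold parameter are illustrative assumptions, not a description of the monitoring actually delivered.

```python
def burn_rate_alert(spend_to_date, budget, days_elapsed, days_in_period, threshold=1.0):
    """Project period-end spend linearly; alert if it exceeds threshold * budget."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    projected = spend_to_date / days_elapsed * days_in_period
    return projected, projected > threshold * budget

# $4,000 spent after 10 days of a 30-day period against a $10,000 budget
print(burn_rate_alert(4000, 10000, 10, 30))  # (12000.0, True)
```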

Steering · committee

Dedicated Financial Engineering Advisory

A quarterly steering committee with your CFO, CTO, and our principal advisors to align AI unit economics with the P&L.

Quarterly business review
Capital-allocation framework
Direct Slack + on-demand access

Proprietary metric

The Dynamic Token Efficiency Index (TEI).

The global standard for AI unit economics — a single auditable score that benchmarks your dollar-per-quality token against the market and your own historical baseline.

Sample score

87.4

TEI · Q2 2026

Architected for the modern AI stack

Kong

Shakudo

Portkey

Bifrost

LiteLLM

Across frontier and private models

Provider-agnostic governance across closed-weight frontier APIs and self-hosted open-weight deployments — one fiscal policy, every model.

Anthropic

Hugging Face

OpenAI

Google

Mistral

/ sextant · proprietary

Sextant. Your token economics, precisely configured.

Once your Token Efficiency Index is baselined, the next problem is translation — turning audit findings into a working gateway configuration. Sextant is Compass Advisors' proprietary configuration intelligence engine: it ingests your TEI score, workload profile, and governance requirements, and produces a gateway-ready fiscal policy across any major AI gateway — Kong, Portkey, Bifrost, LiteLLM, Shakudo, and others — with routing rules, budget hierarchies, rate limits, and caching parameters codified and ready to deploy. No manual interpretation. No spreadsheet handoffs. A production-ready configuration in minutes, not weeks.
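
Sextant's output format is not described here. Purely to illustrate what a "budget hierarchy" is, the sketch below models nested monthly budgets and reports any node whose children are allotted more than the node itself. The `Budget` class and `overcommitted` helper are hypothetical names, not part of Sextant or of any gateway.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Budget:
    name: str
    monthly_usd: float
    children: list[Budget] = field(default_factory=list)

def overcommitted(node: Budget) -> list[str]:
    """Names of nodes whose children's budgets sum to more than their own."""
    bad = []
    if sum(c.monthly_usd for c in node.children) > node.monthly_usd:
        bad.append(node.name)
    for child in node.children:
        bad.extend(overcommitted(child))
    return bad

org = Budget("org", 1000, [
    Budget("team-a", 600, [Budget("app-a1", 700)]),
    Budget("team-b", 500),
])
print(overcommitted(org))  # ['org', 'team-a']
```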

Fig. 01 · Configuration intelligence · TEI → Gateway (image: a brass sextant overlaid with fiscal routing topology and ledger data, a visual metaphor for Sextant's configuration intelligence engine)

Gateway-agnostic

Works across Kong, Portkey, Bifrost, LiteLLM, Shakudo, and any OpenAI-compatible gateway — one fiscal policy, any infrastructure.

Proprietary methodology

Built on the same TEI framework powering $1.2B in AI spend under governance. Configuration recommendations are auditable, versioned, and finance-grade.

From audit to production

Sextant closes the loop between the Token Economics Audit and live gateway deployment — the missing layer between diagnosis and execution.

Request Sextant Access

/ token economics calculator

Model the real cost of a token: API providers vs. your own silicon.

Most TCO models miss the unsexy 40%: facility PUE, fabric capex, KV-cache HBM headroom, on-call SREs, depreciation, WACC, and gateway routing efficiency. Model your workload below.

01

Workload

users × per-user activity

Application preset

Total users · seats / active users

Requests / user · day

Input tokens / req

Output tokens / req

Operating days / yr

50 users × 20 req/day = 1,000 req/day · 1.13B tokens/yr
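
The readout above is straightforward arithmetic, sketched here. The per-request token counts (2,500 input, 600 output) and the 365 operating days are assumptions: the page does not show them, and they are chosen only because they reproduce the displayed 1.13B tokens/yr.

```python
def annual_tokens(users, req_per_user_day, in_tok_per_req, out_tok_per_req, days):
    """Return (requests per day, input tokens per year, output tokens per year)."""
    req_per_day = users * req_per_user_day
    return (
        req_per_day,
        req_per_day * in_tok_per_req * days,
        req_per_day * out_tok_per_req * days,
    )

# 2,500 / 600 tokens per request and 365 days are assumed, not shown on the page
req_per_day, in_tok, out_tok = annual_tokens(50, 20, 2500, 600, 365)
print(req_per_day)                  # 1000
print((in_tok + out_tok) / 1e9)     # 1.1315
```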

02

API provider

public pricing

Other modality

03

On-prem model

open-weights frontier class

Other modality

Recommended silicon

NVIDIA B300 288GB · 8× per replica

Weights ≈ 355GB · KV/activation headroom 80GB → 435GB HBM

Override

04

CapEx & silicon depreciation

time value of money

GPU count · cluster size

Fabric capex / GPU · NIC + switch + optics

$

Depreciation · 24–36 for AI silicon

months

WACC · cost of capital

%
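
A minimal sketch of how these two inputs could turn capex into annual cost: straight-line depreciation over the chosen term, plus WACC charged on the average outstanding balance (half the capex). The calculator's actual formula is not stated; the 30-month term and 9.5% WACC are assumptions chosen because they reproduce the $2,300,416 and $273,174 lines in the cost composition from the capex components the page lists.

```python
def annual_depreciation(capex_usd, months):
    """Straight-line depreciation, expressed per year."""
    return capex_usd * 12 / months

def annual_cost_of_capital(capex_usd, wacc):
    """WACC on the average balance (half the capex); the averaging is an assumption."""
    return capex_usd / 2 * wacc

# compute + fabric + cooling, as itemised under "Total capex outlay"
capex = 4_992_000 + 544_000 + 215_040
print(capex)                                        # 5751040
print(round(annual_depreciation(capex, 30)))        # 2300416 (30 months is assumed)
print(round(annual_cost_of_capital(capex, 0.095)))  # 273174 (9.5% is assumed)
```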

05

Facilities & cooling

PUE, kWh, sqft

PUE · 1.1 hyperscale → 1.6 legacy

Power cost

$/kWh

Peak tariff · time-of-use blend

×

Cooling capex · liquid loop / RDHx

$/kW

Real estate

$/sqft·yr

Sqft per GPU · rack + aisle share

06

OpEx & ecosystem

people, software, risk

SRE / platform FTE

Loaded salary

$/yr

SW + orchestration · K8s, NIM, observability

$/GPU·yr

Network + transit

$/mo

Property tax

% capex

Insurance

% capex

07

Workload dynamics

the idle penalty

Cluster utilization · off-peak still burns

%

Routing efficiency · NAI gateway / vLLM

%

Effective throughput ≈ 744 tok·s⁻¹ / GPU · 726.5B output tok/yr cluster-wide

Annual TCO comparison

On-prem wins · Δ $4,796

Anthropic API

$6,023

$5.32 / 1M tokens

On-prem (allocated)

$1,226

$5.60 / 1M tokens (fully loaded)
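
The comparison above can be reproduced with the sketch below, under stated assumptions. The input/output token split (912.5M / 219M per year) and the $3 / $15 per 1M token prices are not shown on the page; they are placeholders chosen because they reproduce the displayed $6,023 and $5.32. The on-prem figure appears to allocate the annual TCO over output tokens only, which is an inference from the numbers rather than a documented rule.

```python
def api_cost_usd(in_tok, out_tok, usd_in_per_m, usd_out_per_m):
    """API bill: input and output tokens priced separately per 1M tokens."""
    return in_tok / 1e6 * usd_in_per_m + out_tok / 1e6 * usd_out_per_m

def onprem_allocated_usd(out_tok, annual_tco_usd, capacity_out_tok):
    """Share of annual TCO in proportion to output-token capacity consumed."""
    return annual_tco_usd * out_tok / capacity_out_tok

IN_TOK, OUT_TOK = 912_500_000, 219_000_000       # assumed annual split
api = api_cost_usd(IN_TOK, OUT_TOK, 3.0, 15.0)   # assumed prices
onprem = onprem_allocated_usd(OUT_TOK, 4_067_607, 726.5e9)
delta = api - onprem

print(api)                                       # 6022.5
print(round(api / ((IN_TOK + OUT_TOK) / 1e6), 2))  # 5.32
print(round(onprem))                             # 1226
print(round(5 * delta))                          # 23982
```

Multiplying the unrounded annual delta by five matches the 5-yr cumulative figure shown further down, which suggests that figure is a simple undiscounted sum.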

On-prem cost composition

Total $4,067,607 / yr

Silicon depreciation

56.6%

$2,300,416

Cost of capital (WACC)

6.7%

$273,174

Power + cooling

3.3%

$132,875

Real estate

1.8%

$72,000

Ops engineering

21.6%

$880,000

Software + orchestration

5.5%

$224,000

Network + transit

2.4%

$96,000

Taxes + insurance

2.2%

$89,141
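
The shares above follow directly from the line items, as this short check shows. All dollar values are copied from the breakdown; only the dictionary keys are shorthand introduced here.

```python
# Annual on-prem cost lines in USD, copied from the breakdown above
lines = {
    "silicon_depreciation": 2_300_416,
    "cost_of_capital": 273_174,
    "power_cooling": 132_875,
    "real_estate": 72_000,
    "ops_engineering": 880_000,
    "software": 224_000,
    "network": 96_000,
    "taxes_insurance": 89_141,
}

total = sum(lines.values())
shares = {name: round(100 * usd / total, 1) for name, usd in lines.items()}

print(total)                           # 4067606
print(shares["silicon_depreciation"])  # 56.6
print(shares["ops_engineering"])       # 21.6
```

The rounded lines sum to $1 less than the displayed $4,067,607 total, which is consistent with each line being rounded to whole dollars.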

Total capex outlay

$5,751,040

compute $4,992,000 · fabric $544,000 · cooling $215,040

IT power draw

89.6 kW

1,060 MWh/yr @ PUE 1.35
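
The energy figure follows from the IT load and PUE shown here. The sketch assumes the cluster draws power continuously for 8,760 hours a year, which the page does not state explicitly.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: continuous draw, non-leap year

def facility_energy_mwh(it_load_kw, pue):
    """Annual facility energy in MWh: IT load scaled up by PUE, run all year."""
    return it_load_kw * pue * HOURS_PER_YEAR / 1000

print(round(facility_energy_mwh(89.6, 1.35)))  # 1060
```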

Footprint

160 sqft

$72,000 / yr

Capacity utilized

0.0%

of 726.5B tok/yr

Break-even volume

764.22B tok/yr

at chosen API price
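
Break-even volume is the annual on-prem TCO divided by the API price per token. In the sketch below the blended API price is an assumption: 6,022.5 / 1,131.5 ≈ $5.3226 per 1M tokens, taken as the unrounded value behind the displayed $5.32. Using the rounded 5.32 gives about 764.59B instead of the displayed 764.22B.

```python
def break_even_tokens(annual_tco_usd, api_usd_per_mtok):
    """Annual token volume at which API spend equals the on-prem TCO."""
    return annual_tco_usd / api_usd_per_mtok * 1e6

blended = 6022.5 / 1131.5  # assumed unrounded blended price, USD per 1M tokens
print(round(break_even_tokens(4_067_607, blended) / 1e9, 2))  # 764.22
print(round(break_even_tokens(4_067_607, 5.32) / 1e9, 2))     # 764.59
```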

5-yr cumulative Δ

$23,982

savings on-prem

Unlock Your Full Token Efficiency Index (TEI) Report

/ engage

Every engagement begins here.

The Token Economics Audit is a fixed-fee $14,500, two-week diagnostic — and the required first step for every client. We baseline your TEI, expose architectural leaks, validate sovereign compliance, and deliver production-ready gateway architecture recommendations. The quarterly retainer is available afterward by application only.

Two-week diagnostic · response in 48h

Compass Advisors: Navigating the Financial Architecture of AI.