Documentation Index
Fetch the complete documentation index at: https://hc.pillargtm.com/llms.txt
Use this file to discover all available pages before exploring further.
Every behavior PILLAR commits to is a named, enforced, append-only invariant, and we can show you the exact test that proves each one. Most AI-native tools ask you to take the math on faith. PILLAR doesn’t. We publish our correctness spec, enforce it in CI, and show you the tests. This article walks through the framework so you can verify what we claim — and what we don’t.
TL;DR for non-technical readers. PILLAR makes a structural promise: every behavior we commit to is enforced by an automated test that runs on every code change. If any test fails, the build fails — no exceptions, no warning-downgrades, no “we’ll fix it later.” As of today: 105+ named correctness invariants across 18 categories, 2,800+ tests, all green on the latest commit. What this means for you in plain English:
- Procurement sign-off in days, not quarters. Hand this page to your CTO or CISO; they read enforced behaviors, not marketing copy. Every claim on this page has a citable test.
- Integration breakage caught automatically. When PILLAR ships a new connector or refactors a scoring formula, the existing invariants either keep holding or the build turns red — no silent regressions slip into production.
- Falsifiable correctness claim. Most “trust us, our AI works” claims aren’t testable. Every “PILLAR is correct because…” statement on this page is backed by a specific named test that runs on every release. We name them, we cite them, and we publish what they enforce.
Why this exists
In early 2026, a customer’s renewal risk score rendered a number on the dashboard that was obviously wrong to any human reading it. The root cause was two bugs in how scoring functions passed values between each other — the kind of orchestration-layer failure that unit tests don’t catch because each formula looks correct in isolation. We built The Guarantee to close that class of bug. Every invariant you’ll see below was added because a past failure slipped through a test boundary it should have hit. The framework is append-only: once a behavior is promised, it stays promised.

The chain: Spec → Guarantee → Test
1. The Spec — what PILLAR commits to
Every behavior PILLAR commits to is a numbered entry in the public spec. Seven domain files cover scoring correctness, signal intelligence, multi-tenant isolation, integration fidelity, per-org configuration, UI journeys, and API contracts. Examples of concrete commitments:
- SPEC-SCORING-04 — A pipeline run that changes renewal_risk flows the fresh value into account_priority via the risk_urgency weight in the same invocation. No stale reads between scoring models.
- SPEC-TENANCY-03 — Every public API route handler that queries an org-scoped table filters by org_id, or is explicitly allowlisted as cross-org (admin, cron, webhook).
- SPEC-VALIDATION-01 — Every API route that reads a JSON body validates input via Zod with parseBody — no exceptions, enforced across all 92 body-accepting routes.
- SPEC-INGESTION-07 — When a non-terminated contracts row exists for an account, the scoring pipeline uses contracts.end_date as the canonical renewal anchor, falling back to renewals.renewal_date for tenants without contracts enumerated.
Each entry carries a status: required (must be enforced), aspirational (intent tracked for future enforcement), or retired (explicitly deprecated, with a pointer to what replaced it). IDs are append-only — spec numbers never get reused.
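A minimal sketch of how an append-only registry like this could be checked. The entry shape, field names, and isAppendOnly helper are illustrative assumptions, not PILLAR’s actual code:

```typescript
// Hypothetical shape of a spec registry entry (illustrative only).
type SpecStatus = "required" | "aspirational" | "retired";

interface SpecEntry {
  id: string;          // e.g. "SPEC-SCORING-04" — never reused once assigned
  status: SpecStatus;
  summary: string;
  replacedBy?: string; // expected when status === "retired"
}

// Append-only check: a new registry must still contain every previously
// published ID — entries may be retired, but never silently dropped.
function isAppendOnly(previous: SpecEntry[], next: SpecEntry[]): boolean {
  const nextIds = new Set(next.map((e) => e.id));
  return previous.every((e) => nextIds.has(e.id));
}
```

A check like this can run in CI against the previous release's registry, making "spec numbers never get reused" a build-time property rather than a convention.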
Current count: 108 entries spanning ten domains (scoring, vertical intelligence, configuration, ingestion, UI, tenancy, contract, signals, validation, ops). The Vertical Intelligence domain (SPEC-VERTICAL-*) is the fastest-growing surface at 37 entries, reflecting the multi-layer reconciliation work that closes the gap between “PILLAR canonicalizes all 50 states + DC + 26 federal datasets” and “PILLAR’s per-district numbers reconcile to the state DOE’s own published values” — and the new Round 8 federal-data canonical-shape layer that closes “is the federal-dataset row right?”.
2. The Guarantee — how each commitment is enforced
Every required spec entry has at least one named Guarantee that enforces it in code. IDs follow G-<category>-<NN>, where the category letter maps to a class of correctness failure:
| Code | Category | What it enforces |
|---|---|---|
| F | Freshness | Cross-model staleness — when score A feeds score B, B sees the current run’s A |
| M | Monotonicity | Directional correctness — worse input raises risk, better raises priority |
| B | Bounds | Scores ∈ [0, 100]. No NaN. No Infinity. |
| D | Decomposition | Weighted contributions sum to the final score within ±1 |
| S | Signals | Every signal traces back to a triggered rule; no ghost signals |
| R | Rules | Rule catalog well-formedness (unique IDs, valid score_models) |
| C | Calibration | Weights sum to 1.0 per formula |
| T | Tenancy | Multi-tenant isolation at every layer (routes, helpers, POST bodies) |
| I | Ingestion | CRM connector + field-mapping fidelity |
| P | Performance | Per-call latency budgets under load |
| W | Forecast weights | Per-org forecast category weight resolution |
| O | Org-config | Per-org business configuration resolution |
| A | Audit-shape | scoring_audit persisted-row contract |
| H | Hermeticity | CI independence from external state |
| V | Validation | API-boundary input validation via Zod |
| U | UI | User-facing journey correctness (Playwright) |
| E | Endpoint | API input → output contract invariants |
| X | Vertical Intelligence | State DOE canonicalization, federal Title program flows, NAEP cross-validation, accreditation cycles, state procurement calendars — the external-knowledge surfaces that horizontal Revenue AI platforms structurally cannot answer |
3. The Tests — evidence the invariants hold
Every Guarantee has at least one automated test whose description starts with the Guarantee ID, so coverage is directly attributable. Four strategies:
- Fixed-example tests — hand-picked inputs exercising known-tricky cases.
- Property-based tests — fast-check generates 100s to 1000s of random inputs per assertion; any failure shrinks to a minimal counterexample.
- Endpoint-contract tests — lock the shape of every API response plus invariants about the output data (e.g. decomposition sums to score within ±1).
- UI tests — Playwright exercises user-facing pages end-to-end.
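As a flavor of the property-based strategy, here is a dependency-free sketch of a bounds-style (B) check. The real suite uses fast-check; the clampScore function, its zero-on-non-finite policy, and the iteration count are hypothetical stand-ins:

```typescript
// Illustrative bounds invariant (category B): a score must be a finite
// number in [0, 100] — no NaN, no Infinity.
function clampScore(raw: number): number {
  if (!Number.isFinite(raw)) return 0; // hypothetical policy: non-finite → 0
  return Math.min(100, Math.max(0, raw));
}

// Hand-rolled property loop sketching what fast-check automates
// (random generation plus a fixed set of known-tricky inputs).
function checkBoundsProperty(iterations: number): boolean {
  for (let i = 0; i < iterations; i++) {
    const raw = (Math.random() - 0.5) * 1e6; // wide random range
    const score = clampScore(raw);
    if (!Number.isFinite(score) || score < 0 || score > 100) return false;
  }
  // Fixed known-tricky cases, mirroring the fixed-example strategy above.
  return [NaN, Infinity, -Infinity, -1, 100.5].every((x) => {
    const s = clampScore(x);
    return Number.isFinite(s) && s >= 0 && s <= 100;
  });
}
```

fast-check adds what this sketch lacks: reproducible seeds and automatic shrinking of any failing input to a minimal counterexample.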
What makes the claim falsifiable
Anyone can write tests. The structural claim that makes “every behavior is enforced” provable is the build-level check of the chain itself:
- Add a spec entry without a Guarantee → build fails (spec-check.test.ts cross-references spec id ↔ Guarantee spec_ref).
- Add a Guarantee without a matching test → build fails (index.test.ts cross-references registry entries ↔ it("<ID>: ...") descriptions).
- Orphan test referencing a Guarantee that doesn’t exist → build fails (same cross-check, in the other direction).
- Retire a Guarantee without naming its replacement → build fails.
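The build-level cross-check can be sketched roughly as follows. Only the file names and the it("<ID>: ...") naming convention come from the article; the Guarantee shape and findChainBreaks helper are assumptions:

```typescript
// Sketch of a spec ↔ Guarantee ↔ test cross-reference check.
interface Guarantee {
  id: string;       // e.g. "G-F-01"
  spec_ref: string; // e.g. "SPEC-SCORING-04"
}

function findChainBreaks(
  specIds: string[],
  guarantees: Guarantee[],
  testDescriptions: string[],
): string[] {
  const errors: string[] = [];

  // Direction 1: every spec entry must be covered by a Guarantee.
  const covered = new Set(guarantees.map((g) => g.spec_ref));
  for (const id of specIds) {
    if (!covered.has(id)) errors.push(`spec ${id} has no Guarantee`);
  }

  // Direction 2: every Guarantee must have a test named after it.
  const guaranteeIds = new Set(guarantees.map((g) => g.id));
  for (const g of guarantees) {
    if (!testDescriptions.some((d) => d.startsWith(`${g.id}:`))) {
      errors.push(`Guarantee ${g.id} has no test`);
    }
  }

  // Direction 3: no test may reference a Guarantee that doesn't exist.
  for (const d of testDescriptions) {
    const ref = d.split(":")[0];
    if (!guaranteeIds.has(ref)) errors.push(`orphan test for ${ref}`);
  }
  return errors;
}
```

Running a check like this as a test makes the chain itself a build-failing invariant: any non-empty error list turns the build red.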
What this gets you, the customer
- Procurement sign-off in days, not quarters. Hand your CTO or CISO the spec. They read enforced behaviors with citable tests, not pages of marketing.
- Confidence at the seams. Every time PILLAR ships an integration (Salesforce, HubSpot, Gong, Pendo, Slack), a new Ingestion Guarantee (G-I-*) locks the field-mapping + override fidelity. Breaking it in a future refactor fails the build.
- Zero “vibe-coded” surprises. Input at every API boundary goes through Zod schema validation, enforced by the G-V-01/G-V-02 invariants across 92 route files.
- A changelog that doesn’t lie. Every scoring-model update bumps MODEL_VERSION and updates the golden-fixture snapshot, tracked by G-D-02. The Changelog references affected SPEC / Guarantee IDs for each release.
- Silent breakage caught automatically. The integration-health canary (G-I-10/G-I-11) runs every 15 minutes and pages the on-call when a tenant’s OAuth token expires, a sync stalls, or a connection flips to error. No more “the dashboard is wrong and nobody noticed until the customer asked.”
Categories in depth
Freshness (F)
When the scoring pipeline runs, one model’s output often feeds another — renewal_risk flows into account_priority via the risk_urgency weight, and pipeline_hygiene flows into forecast_confidence. The freshness Guarantees (G-F-01 through G-F-05) prove the downstream model sees the current run’s upstream value, not a stale cache. This closes the class of bug where a customer saw the right renewal risk on the account page but the wrong priority ranking in triage — because the priority calculation had read last night’s risk score, not this morning’s.
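A minimal sketch of that discipline, with hypothetical function names, formulas, and weights: the downstream formula receives the upstream value threaded through the same invocation, never read from a cache or global:

```typescript
// Hypothetical per-run context: upstream outputs travel inside it.
interface RunContext {
  renewalRisk: number;
}

// Toy upstream formula (illustrative, not PILLAR's real scoring math).
function computeRenewalRisk(usageDecline: number): number {
  return Math.min(100, Math.max(0, usageDecline * 100));
}

// Downstream formula reads only from this run's context —
// there is no cache or global lookup it could go stale through.
function computeAccountPriority(ctx: RunContext, riskUrgencyWeight: number): number {
  return ctx.renewalRisk * riskUrgencyWeight;
}

function runPipeline(usageDecline: number): { risk: number; priority: number } {
  const risk = computeRenewalRisk(usageDecline);
  const ctx: RunContext = { renewalRisk: risk }; // threaded, not cached
  return { risk, priority: computeAccountPriority(ctx, 0.5) };
}
```

Structured this way, a freshness test only has to assert that the priority seen in a run's output was derived from the risk computed in that same run.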
Tenancy (T)
Six Guarantees (G-T-01 through G-T-06) enforce multi-tenant isolation at three layers: pure scoring functions (the compute is scoped to a context object, never a global pool), route handlers (every query filters by org_id), and downstream workflow helpers (every helper re-applies the org filter, because a task’s source_id could reference a resource in another org). Expanded after a 2026-04-08 pen-test-style review found 11 gaps in task-completion helpers and 4 gaps in the plays POST body handler.
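A sketch of the third layer — workflow helpers re-applying the org filter. In-memory rows stand in for database queries here; the row shapes and resolveTaskResource helper are hypothetical:

```typescript
interface TaskRow {
  id: string;
  org_id: string;
  source_id: string; // may point at a resource in ANY org — never trusted alone
}

interface ResourceRow {
  id: string;
  org_id: string;
}

// The helper re-applies the org filter instead of trusting source_id,
// because a task's source_id could reference a resource in another org.
function resolveTaskResource(
  task: TaskRow,
  resources: ResourceRow[],
): ResourceRow | null {
  return (
    resources.find((r) => r.id === task.source_id && r.org_id === task.org_id) ??
    null
  );
}
```

The tenancy tests then only need to assert that a cross-org source_id resolves to nothing, at every helper, not just at the route layer.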
Ingestion (I)
Eleven Guarantees (G-I-01 through G-I-11) lock the fidelity of data flowing from your CRM + tooling into PILLAR’s scoring pipeline. Direct field mappings copy verbatim. Picklist translations fall back gracefully when a value is missing. Contracts-object renewal anchors override legacy renewal-date fields. Usage snapshots drive renewal_risk via a NEUTRAL baseline on insufficient data (missing usage never downgrades; only observed decline does). The integration-health canary detects expired OAuth tokens and stalled scoring pipelines 15 minutes after they start, not days later when a customer notices.
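The NEUTRAL-baseline rule might look roughly like this; the baseline value of 50, the two-snapshot threshold, and the function name are all assumptions for illustration:

```typescript
// Hypothetical neutral baseline: insufficient usage data must never
// read as "worse" — only an observed decline raises the component.
const NEUTRAL_RISK = 50;

function usageRiskComponent(snapshots: number[]): number {
  if (snapshots.length < 2) return NEUTRAL_RISK; // insufficient data → neutral
  const first = snapshots[0];
  const last = snapshots[snapshots.length - 1];
  if (last >= first) return 0;                   // flat or growing usage → no usage-driven risk
  const declineRatio = (first - last) / first;   // only observed decline drives risk
  return Math.min(100, Math.round(declineRatio * 100));
}
```

The design choice this encodes: a tenant that has not connected usage data yet lands at the neutral baseline, rather than being penalized for missing telemetry.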
Validation (V)
Every API route that reads a JSON body or searchParams runs input through a Zod schema before touching it. The G-V-01 and G-V-02 Guarantees enforce this across 92 body-accepting routes and all query-parsing routes, with a shrinking exemption allowlist tracked in the test file — adding a new unvalidated route requires explicit review.
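To illustrate the parseBody pattern: the real implementation uses Zod, but this dependency-free stand-in mimics the safeParse-style success/error result for a hypothetical route body, so the handler never touches unvalidated fields:

```typescript
// Discriminated-union result, shaped like Zod's safeParse return value.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Hypothetical body for a plays POST route (field names are assumptions).
function parsePlayBody(body: unknown): ParseResult<{ name: string; orgId: string }> {
  if (typeof body !== "object" || body === null) {
    return { success: false, error: "body must be an object" };
  }
  const { name, orgId } = body as Record<string, unknown>;
  if (typeof name !== "string" || name.length === 0) {
    return { success: false, error: "name must be a non-empty string" };
  }
  if (typeof orgId !== "string" || orgId.length === 0) {
    return { success: false, error: "orgId must be a non-empty string" };
  }
  return { success: true, data: { name, orgId } };
}
```

A route handler then branches on `result.success` and returns a 400 on failure — the validated, typed `data` is the only thing business logic ever sees.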
Hermeticity (H)
G-H-01 ensures the integration-tests CI job runs against a local Supabase CLI stack, not a cloud Supabase credential. This eliminates an entire class of failures where a rotated credential would silently break every PR check. CI must be runnable from a clean clone without access to production secrets.
UI (U)
Three Playwright-backed Guarantees (G-U-01 through G-U-03) verify the public-surface integrity layer: the /login page responds non-5xx and renders an interactive email input, and /api/architects/unsubscribe?t=<invalid> returns a branded 200 HTML page rather than leaking a stack trace. Data-aware UI invariants (account detail scoring, triage ordering, signal feed tenancy) are tracked as aspirational SPEC entries and will land with the hermetic Playwright fixture seed.
Vertical Intelligence (X) — the canonicalization layer
The largest category at 40 Guarantees (G-X-01 through G-X-40), covering external-knowledge surfaces that horizontal Revenue AI platforms structurally cannot answer: state DOE assessment + accountability + graduation data across all 50 states + DC, 26 federal datasets (8 IPEDS components + 8 Higher Ed sources + 10 K-12 sources), federal Title program allocations, NAEP cross-validation, accreditation review cycles, state procurement calendars, and cooperative-contract eligibility.

Runtime-truth status (May 2026): 51 jurisdictions covered (50 states + DC + federal); 26 federal datasets ingested with 890,000+ canonical rows; 47 state-funding adapters live; 129 MCP tools across 14 categories — 63 in vertical_intelligence (all live and queryable). Per-district coverage is at 51/51 jurisdictions for assessment proficiency (5.03M cells across ~19,700 LEAs), cohort graduation (391k cells), accountability status (24k cells), and engagement/chronic absenteeism (104k cells). K-12 state funding allocations: 46 of 51 jurisdictions (90.2%), 114,699 per-LEA rows, 7.94B captured (IPEDS SFA + 11 state-specific programs). Per-state DOE deep ingest for 27+ states at recent-year grain plus federal EDFacts SY 2020-21 backfill closes the long tail.

550 Guarantee tests pass on every commit; the schema, ingest pipeline, MCP wrappers, and canonical-shape validators are all enforced — every commit blocks merge unless every row landing in the 26 federal-data + 47 state-funding tables passes its G-X-31 through G-X-40 validators.
The headline claim:
PILLAR canonicalizes 51 jurisdictions (50 state DOEs + DC + federal) with a documented policy footprint, a structural honesty layer, and sixteen independent layers of accuracy verification, split across two groups.

Why this matters for buyers: state DOEs each express proficiency on a different scale, suppression with different sentinels, and accountability in 4-tier vs 5-tier vs A-F schemes, with subgroup labels that vary across all 50 states. Each state DOE essentially publishes data that’s only legible inside its own bureaucracy. PILLAR’s canonicalization now spans 51 jurisdictions — taking Tennessee’s “Approached/Met/Exceeded”, Louisiana’s “Mastery and above”, and Wisconsin’s “Advanced+Meeting” and forcing them all into one comparable pct_proficient_or_above column with a documented policy footprint.

Round 1-5 reconciliation layer (closes “is the state-DOE proficiency number right?”):
- Macro-level reconciliation against state-published statewide aggregates with 24-state coverage (G-X-25)
- Micro-level spot-checks against 17 hand-validated district fixtures across 13 states, including the load-bearing LDOE R36→036 alias (G-X-26)
- External NAEP trend-direction cross-validation for the 11 Tier-1 states, with a live MCP route at /api/vertical/state-naep-comparison (G-X-27)
- Silent-corruption canary on every ingested cell, with queue-backed weekly review via the value_unknown_alarms table (G-X-28)
- Federal Title pass-through reconciliation between EDFacts allocations and SEA-published disbursements (G-X-29)
- Per-district Title allocation spot-checks closing the loop on “we know proficiency AND federal allocation are right for the same district” (G-X-30)

Round 8 federal-data canonical-shape layer (closes “is the federal-dataset row right?” — 10 new Guarantees):
- IPEDS-extension shape discipline for Human Resources / Admissions / Academic Year Tuition / Academic Libraries / Enrollment by CIP — including biennial-even-year discipline on EF-CIP and pre-2014 collection_status discipline on Academic Libraries (G-X-31)
- OPEID padding integrity on the institution_crosswalk join — the only authorized path between UNITID-keyed (IPEDS, Scorecard, Carnegie) and OPEID-keyed (FSA CDR/GE/NSLDS/HCM, NC-SARA) datasets (G-X-32)
- College Scorecard shape validators on institution-level + field-of-study tables (G-X-33)
- Carnegie 2025 four-dimension derivation discipline — is_r1/is_r2 MUST be derivable from research_activity_designation (G-X-34)
- FSA regulatory-status discipline preventing accidental publish-rate inference during the 2019-2023 GE rescission gap; CDR + HCM enums locked (G-X-35)
- SHEEO SHEF + NC-SARA state-level shape with USPS-keyed JSONB integrity (G-X-36)
- CRDC biennial discipline — collection year MUST be even; suspensions ≤ 2× total enrollment sanity check (G-X-37)
- CCD School Universe + EDGE locale enum — title_i_status, charter_status, magnet_status, virtual_indicator, locale_code all locked (G-X-38)
- OSEP IDEA Part B + K-12 federal program state-aggregate shape (G-X-39)
- NCES EDGE entity-type-conditional ID-length checks + NIEER 0-10 quality benchmark hard cap (G-X-40)

The sixteen verification layers above mean the resulting unified surface isn’t just “structurally faithful” — it has been independently checked against state-published aggregates, hand-validated district fixtures, federal NAEP, federal EDFacts allocation tables, and SEA pass-through reports, and every Round 8 federal-dataset row passes a canonical-shape validator before it can land in its table.
What this is NOT: numerical perfection at the per-district per-cell level for every single one of the ~13,000 US LEAs. The Round 1 backfill seed includes 24 states with reconciliation aggregates and 17 districts with locked spot-check cells (out of ~91 districts × 27 states = ~2,500 possible spot-checks); coverage expands per the published backfill runbooks (docs/RECONCILIATION_BACKFILL_RUNBOOK.md, docs/SPOT_CHECK_BACKFILL_RUNBOOK.md). Build-time Guarantees enforce the lookup-table shape + helper behavior; runtime production crons (separate from the build-time gate) compare freshly-ingested cells against the locked fixtures and alert on drift.
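The label-to-column canonicalization can be sketched as a per-state band mapping. The state labels come from this article; the band table, counts shape, and function are illustrative:

```typescript
// Which state-native performance bands count as "proficient or above".
// Band memberships here are illustrative, not PILLAR's actual policy table.
const PROFICIENT_BANDS: Record<string, Set<string>> = {
  TN: new Set(["Met", "Exceeded"]),     // Tennessee: Approached/Met/Exceeded
  LA: new Set(["Mastery", "Advanced"]), // Louisiana: "Mastery and above"
  WI: new Set(["Advanced", "Meeting"]), // Wisconsin: Advanced+Meeting
};

// Collapse state-native band counts into one comparable percentage.
function pctProficientOrAbove(
  state: string,
  counts: Record<string, number>, // students per state-native band
): number | null {
  const bands = PROFICIENT_BANDS[state];
  if (!bands) return null; // unknown state → no silent guess
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  if (total === 0) return null;
  let proficient = 0;
  for (const [band, n] of Object.entries(counts)) {
    if (bands.has(band)) proficient += n;
  }
  return (proficient / total) * 100;
}
```

Returning null for an unmapped state rather than a guessed number is the structural-honesty posture the article describes: gaps surface as gaps, not fabricated values.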
Honesty contract baked into the response shape:
- STANDARDS_CROSS_FAMILY_NOTE (G-X-15) — every cross-family comparison carries a “cut-scores differ; not directly comparable” caveat
- continuity_break: true (G-X-14) — flagged on year-over-year transitions where the state assessment family changed (PARCC→MCAP, FSA→FAST, AIR→Cambium)
- CCMR_COMPOSITE_NOTE (G-X-19) — warns against cross-state composite comparison
- GROWTH_VS_LEVEL_NOTE (G-X-20) — prevents conflating growth with absolute proficiency
- naep_disagrees: true (G-X-27) — surfaces when state cut-score recalibrations diverge from NAEP
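As one example, the continuity_break flag (G-X-14) amounts to detecting an assessment-family change between adjacent years. The row shape and helper below are assumptions sketching that idea:

```typescript
// One year of a district's results under some assessment family.
interface YearCell {
  year: number;
  family: string; // e.g. "PARCC", "MCAP", "FSA", "FAST"
  value: number;  // canonicalized pct_proficient_or_above
}

// Flag every year whose previous year used a different assessment family,
// so year-over-year deltas across a family change are never read as trends.
function flagContinuityBreaks(
  series: YearCell[],
): (YearCell & { continuity_break: boolean })[] {
  const sorted = [...series].sort((a, b) => a.year - b.year);
  return sorted.map((cell, i) => ({
    ...cell,
    continuity_break: i > 0 && sorted[i - 1].family !== cell.family,
  }));
}
```

Baking the flag into the response shape means a consumer cannot accidentally chart a PARCC→MCAP transition as a proficiency drop without the caveat attached.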
Related reading
- Scoring Overview — how the five scoring formulas connect to the Decomposition (D) + Calibration (C) Guarantees.
- Signal Overview — how the eight signal families connect to the Signals (S) + Rules (R) Guarantees.
- Data Readiness — the enforcement layer that checks CRM data quality before scoring runs, covered by the Ingestion (I) Guarantees.
- Changelog — every release references the SPEC and Guarantee IDs it affected.