Skip to content

Comments

Add results.json schema validation and source tracking#237

Draft
PavelMakarchuk wants to merge 2 commits intoPolicyEngine:mainfrom
PavelMakarchuk:content-pipeline-results
Draft

Add results.json schema validation and source tracking#237
PavelMakarchuk wants to merge 2 commits intoPolicyEngine:mainfrom
PavelMakarchuk:content-pipeline-results

Conversation

@PavelMakarchuk
Copy link
Collaborator

Summary

Adds a policyengine.results module with two small pieces that support the blog post content pipeline:

  • Schema validation (schema.py): Pydantic models that validate results.json at generation time
  • Source tracking (tracking.py): Helper that auto-captures line numbers for traceability

What it does

Schema validation

from policyengine.results import ResultsJson, ResultsMetadata, ValueEntry

results = ResultsJson(
    metadata=ResultsMetadata(title="SALT Cap Repeal", repo="PolicyEngine/analyses"),
    values={
        "budget_impact": ValueEntry(
            value=-15.2e9,
            display="$15.2 billion",
            source_line=47,
            source_url="https://github.com/.../analysis.py#L47",
        ),
    },
)
results.write("results.json")

Catches errors at generation time:

  • Missing source_line or source_url on any value
  • Table rows with wrong column count
  • Chart alt text that's too short (< 20 chars)
  • Missing required metadata fields

Source tracking

from policyengine.results import tracked_value

budget = reform_revenue - baseline_revenue
results["values"]["budget_impact"] = tracked_value(
    value=budget,
    display=f"${abs(budget)/1e9:.1f} billion",
    repo="PolicyEngine/analyses",
)
# Automatically captures source_line and builds source_url

Why

Every number in a PolicyEngine blog post should link to the exact line of code that produced it. This module ensures the results.json contract is valid before it reaches the resolve-posts build step.

Test plan

  • 11 unit tests passing (schema validation, source tracking, edge cases)
  • CI passes

🤖 Generated with Claude Code

PavelMakarchuk and others added 2 commits February 23, 2026 19:46
New `policyengine.results` module with two pieces:

- `schema.py`: Pydantic models (ResultsJson, ValueEntry, TableEntry,
  ChartEntry) that validate results.json at generation time. Catches
  missing source_line/source_url, row/column mismatches in tables, and
  vague alt text on charts before they reach the blog build step.

- `tracking.py`: `tracked_value()` helper that captures the caller's
  line number via `inspect` and builds the source_url automatically.
  Eliminates repetitive inspect.currentframe() boilerplate in analysis
  scripts.

These support the blog post content pipeline where every number in a
published post links back to the exact line of analysis code that
produced it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use model_dump(mode="json") instead of json.loads(model_dump_json())
  to avoid unnecessary serialize→parse→serialize round-trip
- Create parent directories automatically so callers don't need
  to mkdir first
- Add trailing newline to output file
- Add test for nested directory creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant