Score A Completed Run
python3 scripts/score_run.py --input runs/example-run.json --output runs/example-run.scored.json
The scorer accepts the canonical top-level results list, compatibility keys such as runs or answers, and bare lists.
Read The Output
score_answer is auto-filled from the final-answer matcher. Manual fields such as score_reasoning, penalties, and notes are preserved but not automatically assigned.
Compare Runs
Use the scored rows, normalized usage fields, and raw provider artifacts together. The historical run pages show one static presentation of those outputs.