Tag
#reproducibility
2 posts tagged reproducibility.
- methodology
Designing a Reproducible AI-Security Eval Harness
A reproducible AI-security evaluation is an engineering artifact, not a notebook. Here's the harness design — separation of corpus, target, judge, and report — that lets a stranger re-run your number.
- methodology
Reproducible LLM Scanner Benchmarks: What Everyone Forgets to Pin
An LLM security scanner benchmark that isn't pinned to a model version, a seed, and a corpus hash isn't reproducible. Here's the full list of what to pin and why.