DJ Percy AI quality

Adoption, eval quality, reliability, and operational performance for the public demo.

  • Patterns generated: 26. Completed auto_layered runs in the rolling window.
  • Successful edits: 88.5%. Runs where Percy applied at least one meaningful pattern change.
  • Eval pass rate: 73.1%. Structure eval pass rate (deterministic checks).
  • Median generation time: 6.6 s. End-to-end run duration (not TTFT).
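The eval pass rate above refers to deterministic structure checks on the generated pattern. Percy's actual schema and checks are not shown in this report; the sketch below is a hypothetical example of what a deterministic check can look like (the required keys, BPM range, and pattern shape are all assumptions).

```python
# Hypothetical deterministic structure check, in the spirit of the
# "Eval pass rate" metric. Field names and ranges are assumptions,
# not Percy's real schema.

REQUIRED_KEYS = {"tracks", "steps", "bpm"}  # assumed fields

def structure_eval(pattern: dict) -> bool:
    """Return True when the pattern passes every deterministic check."""
    if not REQUIRED_KEYS.issubset(pattern):
        return False
    # BPM must be numeric and in a plausible musical range (assumed bounds).
    if not isinstance(pattern["bpm"], (int, float)) or not 40 <= pattern["bpm"] <= 300:
        return False
    # Every track must contain exactly the declared number of steps.
    return all(len(track) == pattern["steps"]
               for track in pattern["tracks"].values())

ok = structure_eval({"bpm": 120, "steps": 16,
                     "tracks": {"kick": [0] * 16, "snare": [0] * 16}})
```

Because every check is a pure function of the pattern, a run's pass/fail outcome is reproducible, which is what makes the weekly pass-rate chart comparable across weeks.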

Quality

  • Eval & successful edits by week: weekly eval pass % vs. successful edits % (grouped bars).
  • Structure eval outcomes by week: passed vs. failed structure checks (counts per week).

Reliability & task success

  • Run outcomes by week: meaningful change vs. no change vs. failed.
  • Runs with meaningful change: 23.
  • Tool success (sum of steps): 25.
  • No-change rate (runs with eval): 11.5%. Among completed runs that recorded a reliability eval snapshot.
  • Failure rate: 0.0% (0 failed of 26 runs).
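The reliability split separates "assistant ran" from "pattern meaningfully changed." A minimal sketch of how these rates can be derived from per-run records follows; the field names (`status`, `meaningful_change`) are assumptions, and the run list just mirrors the counts reported above (23 meaningful, 3 no-change, 0 failed).

```python
# Classify each run into one of three outcomes, then compute the rates
# shown on the dashboard. Record fields are assumed, not Percy's real schema.

def outcome(run: dict) -> str:
    if run["status"] != "completed":
        return "failed"
    return "meaningful_change" if run["meaningful_change"] else "no_change"

# Synthetic run list matching the reported counts: 23 meaningful, 3 no-change.
runs = ([{"status": "completed", "meaningful_change": True}] * 23
        + [{"status": "completed", "meaningful_change": False}] * 3)

counts: dict[str, int] = {}
for r in runs:
    counts[outcome(r)] = counts.get(outcome(r), 0) + 1

success_rate = counts.get("meaningful_change", 0) / len(runs)  # 23/26 -> 88.5%
no_change_rate = counts.get("no_change", 0) / len(runs)        # 3/26  -> 11.5%
failure_rate = counts.get("failed", 0) / len(runs)             # 0/26  -> 0.0%
```

Keeping the classifier to one small pure function means the same logic can score both live runs and historical snapshots.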

Performance

  • Generation time by week: median and p95 duration (ms) for completed runs.
  • Median duration: 6.6 s.
  • P95 duration: 12.4 s.
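The median/p95 roll-up can be computed directly from per-run durations. The sketch below uses made-up millisecond values, not the demo's real data, and picks the nearest-rank convention for p95 (one of several common percentile conventions).

```python
# Aggregate per-run durations into the median and p95 figures reported
# above. Durations are illustrative, not the demo's actual measurements.
import statistics

def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (one common convention)."""
    ordered = sorted(values)
    k = max(0, int(0.95 * len(ordered) + 0.5) - 1)  # nearest-rank index
    return ordered[k]

durations_ms = [5200, 6100, 6600, 7000, 9800, 12400]
median_ms = statistics.median(durations_ms)  # 6800.0 for this sample
p95_ms = p95(durations_ms)                   # 12400 for this sample
```

Tracking p95 alongside the median matters because a handful of slow runs can dominate the perceived experience without moving the median at all.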

  • Total tokens (prompt + completion): 446,178.
  • Estimated cost (sum): $0.5068.
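The cost figure is an estimate summed over runs. A minimal sketch of that roll-up, assuming per-1K-token prices (the rates and the prompt/completion split below are placeholders; real pricing depends on the model and is not stated in this report):

```python
# Token-based cost estimate. Both prices are placeholder assumptions,
# not the rates behind the $0.5068 figure above.

PROMPT_PRICE_PER_1K = 0.0010      # assumed USD per 1K prompt tokens
COMPLETION_PRICE_PER_1K = 0.0020  # assumed USD per 1K completion tokens

def run_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one run from its token counts."""
    return (prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K)

# Illustrative split of the reported 446,178 total tokens.
total_usd = run_cost(400_000, 46_178)
```

Summing `run_cost` over completed runs gives the "Estimated cost (sum)" metric; pricing prompt and completion tokens separately matters because most providers charge them at different rates.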

What we learned

  • Reliability matters more than raw completion count: we separate “assistant ran” from “pattern meaningfully changed.”
  • Eval design must pair deterministic structure checks with apply/delta signals so product quality is measurable.
  • Speed (median/p95) is part of the experience for creative tools — we track it alongside eval pass rates.

Auto_layered runs only. As of 5/11/2026, 5:54:00 PM.