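Full board: every accepted judge group. Detect scores are comparable only within a single judge group, meaning the same judge model and judge parameters (SPEC §3); rows scored by different judges should not be ranked against each other on score alone.
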
| Rank | Δ | Operator | Official score | Model | Harness kind | Scaffold | Reasoning | Prompt | Judge | Created | Promoted | Submission |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Target | — | Claude Opus 4.6 — Detect SOTA reference | 45.6% paper | — | — | — | — | — | — | — | — | no Open EVMBench submission |
| Target | — | GPT-5.3-Codex — Detect reference | 39.2% paper | — | — | — | — | — | — | — | — | no Open EVMBench submission |
| #1 | new | @antfleet-ops | 71.8% 84/117 | gpt-5.5 | single-shot | frontier-models-fleet-single-shot | xhigh | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #2 | new | @antfleet-ops | 61.5% 72/117 | gpt-5.5 | single-shot | frontier-models-fleet-single-shot | medium | sha256:ce3e260 | gpt-5 (high) | 2026-06-19 | 2026-06-19 | record |
| #3 | new | @antfleet-ops | 59.0% 69/117 | gpt-5.5 | single-shot | frontier-models-fleet-slither-augmented-single-shot | medium | sha256:f415edd | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #4 | new | @antfleet-ops | 54.7% 64/117 | claude-opus-4-8+gpt-5.5 consensus | single-shot | antfleet-two-model-multishot-v3p1-cli | off | sha256:d151f4f | gpt-5.5 (high) | 2026-06-21 | 2026-06-21 | record |
| #5 | new | @antfleet-ops | 43.6% 51/117 | claude-opus-4-8+gpt-5.5 consensus | single-shot | antfleet-two-model-consensus | off | sha256:d151f4f | gpt-5 (high) | 2026-06-18 | 2026-06-18 | record |
| #6 | new | @antfleet-ops | 43.6% 51/117 | gpt-5.4 | single-shot | frontier-models-fleet-single-shot | medium | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #7 | new | @antfleet-ops | 35.9% 42/117 | claude-opus-4-8 | single-shot | frontier-models-fleet-single-shot-opus-tailored | max | sha256:2fbf232 | gpt-5.5 (high) | 2026-06-21 | 2026-06-21 | record |
| #8 | new | @antfleet-ops | 35.0% 41/117 | composer-2.5 | single-shot | cursor-fleet-single-shot | off | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-21 | 2026-06-21 | record |
| #9 | new | @antfleet-ops | 28.2% 33/117 | claude-opus-4-8 | single-shot | frontier-models-fleet-single-shot | off | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #10 | new | @antfleet-ops | 27.4% 32/117 | zai-org-glm-5-2 | single-shot | oss-fleet-single-shot | low | sha256:ce3e260 | gpt-5 (high) | 2026-06-19 | 2026-06-19 | record |
| #11 | new | @antfleet-ops | 26.5% 31/117 | moonshotai-kimi-k2-7-code | single-shot | oss-fleet-single-shot | low | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #12 | new | @antfleet-ops | 26.5% 31/117 | google-gemini-3-5-flash | single-shot | frontier-models-fleet-single-shot | on | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #13 | new | @antfleet-ops | 18.8% 22/117 | qwen/qwen3.7-max | single-shot | oss-fleet-single-shot | api-def | sha256:ce3e260 | gpt-5 (high) | 2026-06-19 | 2026-06-19 | record |
| #14 | new | @antfleet-ops | 17.9% 21/117 | minimax-m3 | single-shot | oss-fleet-single-shot | on | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-19 | 2026-06-19 | record |
| #15 | new | @antfleet-ops | 16.2% 19/117 | deepseek-v4-pro | single-shot | oss-fleet-single-shot | on | sha256:ce3e260 | gpt-5 (high) | 2026-06-18 | 2026-06-18 | record |
| #16 | new | @antfleet-ops | 13.7% 16/117 | google-gemini-3-1-pro-preview | single-shot | frontier-models-fleet-single-shot | off | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #17 | new | @antfleet-ops | 9.4% 11/117 | x-ai-grok-4-3 | single-shot | frontier-models-fleet-single-shot | on | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
| #18 | new | @antfleet-ops | 4.3% 5/117 | llama-3-3-70b | single-shot | oss-fleet-single-shot | off | sha256:ce3e260 | gpt-5.5 (high) | 2026-06-20 | 2026-06-20 | record |
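
The comparability rule in the note above the table can be sketched in a few lines of Python. This is only an illustration, not the board's actual ranking code: it assumes the Judge column string (for example `gpt-5.5 (high)`) is enough to identify a judge group, and the helper names `score_pct` and `rank_within_judge` are hypothetical. The rows are a small sample copied from the table.

```python
from collections import defaultdict

# Sample rows copied from the board: (board rank, model, passed, total, judge).
ROWS = [
    (1, "gpt-5.5", 84, 117, "gpt-5.5 (high)"),
    (2, "gpt-5.5", 72, 117, "gpt-5 (high)"),
    (5, "claude-opus-4-8+gpt-5.5 consensus", 51, 117, "gpt-5 (high)"),
    (6, "gpt-5.4", 51, 117, "gpt-5.5 (high)"),
    (10, "zai-org-glm-5-2", 32, 117, "gpt-5 (high)"),
]


def score_pct(passed, total):
    """Detect score as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


def rank_within_judge(rows):
    """Group rows by judge string and sort each group by score, best first.

    Assumption: the judge string alone defines the comparability group.
    """
    groups = defaultdict(list)
    for board_rank, model, passed, total, judge in rows:
        groups[judge].append((score_pct(passed, total), model, board_rank))
    return {
        judge: sorted(entries, key=lambda e: e[0], reverse=True)
        for judge, entries in groups.items()
    }


if __name__ == "__main__":
    for judge, entries in rank_within_judge(ROWS).items():
        print(judge)
        for pct, model, board_rank in entries:
            print(f"  {pct:5.1f}%  {model}  (board rank #{board_rank})")
```

Read this way, rows #5 and #6 share a 43.6% score but sit in different judge groups, so under the rule above the tie does not imply equal performance.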