Leaderboard
On-device LLM performance rankings powered by Glicko-2
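The page says its rankings are powered by Glicko-2 but does not publish its parameters: the system constant τ, the starting volatility, how a device-vs-device match is scored, or the length of a rating period. The sketch below follows Glickman's published Glicko-2 update for a single rating period. The function name `glicko2_update` and the default τ = 0.5 are assumptions, and this is not the leaderboard's own code.

```python
import math

SCALE = 173.7178  # converts between the Glicko (r, RD) scale and the Glicko-2 (mu, phi) scale


def _g(phi):
    # Down-weights results against opponents whose rating is uncertain (large phi).
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu, mu_j, phi_j):
    # Expected score against opponent j on the Glicko-2 scale.
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, sigma, results, tau=0.5, eps=1e-6):
    """One rating period. results: list of (opp_rating, opp_rd, score), score in {1, 0.5, 0}."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No matches this period: the rating stays put and only the deviation grows.
        return rating, SCALE * math.sqrt(phi * phi + sigma * sigma), sigma

    v_inv, delta_sum = 0.0, 0.0
    for r_j, rd_j, s_j in results:
        mu_j, phi_j = (r_j - 1500.0) / SCALE, rd_j / SCALE
        g_j, e_j = _g(phi_j), _expected(mu, mu_j, phi_j)
        v_inv += g_j * g_j * e_j * (1.0 - e_j)
        delta_sum += g_j * (s_j - e_j)
    v = 1.0 / v_inv          # estimated variance of the rating from game outcomes
    delta = v * delta_sum    # estimated improvement in rating

    # New volatility: solve f(x) = 0 with the Illinois variant of regula falsi.
    a = math.log(sigma * sigma)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        den = 2.0 * (phi * phi + v + ex) ** 2
        return num / den - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2.0
        B, f_b = C, f_c
    new_sigma = math.exp(A / 2.0)

    # Pre-period deviation, then the updated deviation and rating.
    phi_star = math.sqrt(phi * phi + new_sigma * new_sigma)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi * new_phi * delta_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_sigma


# Glickman's worked example: a 1500 / RD 200 player beats a 1400 and loses to a 1550 and a 1700.
new_r, new_rd, new_sigma = glicko2_update(
    1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
)
print(round(new_r, 2), round(new_rd, 2), round(new_sigma, 5))
```

With these inputs the result should land close to Glickman's published example (a rating near 1,464, an RD near 151.5, and volatility just under 0.06). A shrinking RD is what the "±21 RD" on the device card reflects: after hundreds of matches, the rating is known within a narrow band.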
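Find X8

| Metric | Value |
|---|---|
| Platform | Android |
| Rank | #59 |
| Rating | 1,775 (±21 rating deviation, RD) |
| Win rate | 76.6% |
| Conservative rating | 1,734 |
| TG rating | 1,810 |
| PP rating | 1,744 |
| Matches | 620 |
| Record | 475W – 145L |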
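The card's figures can be cross-checked with a few lines of arithmetic. The record gives 475 + 145 = 620 matches and a win rate of 475 / 620 ≈ 76.6%, both matching the card. The page does not say how the conservative rating is computed; the sketch assumes the common rating − 2·RD lower bound, which yields 1,733, one point below the displayed 1,734, a gap that rounding of the displayed rating and RD could explain.

```python
# Figures copied from the device card.
rating = 1775
rd = 21
wins, losses = 475, 145

matches = wins + losses           # total matches played
win_rate = wins / matches         # fraction of matches won
# Assumption: conservative rating = rating - 2 * RD (not confirmed by the page).
conservative = rating - 2 * rd

print(f"matches={matches} win_rate={win_rate:.1%} conservative~{conservative}")
```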
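Models Tested

TG = text generation (decode) throughput; PP = prompt processing (prefill) throughput. All throughput figures are in tokens per second.

| Model | TG median (tok/s) | PP median (tok/s) | TG best (tok/s) | PP best (tok/s) | Runs |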
|---|---|---|---|---|---|
| SmolLM2-135M-Instruct-Q8_0 | 105.58 | 673.03 | 114.09 | 774.13 | 2 |
| granite-3.1-1b-a400m-instruct-Q8_0 | 48.97 | 154.42 | 48.97 | 154.42 | 1 |
| granite-3.1-3b-a800m-instruct-IQ4_XS | 34.86 | 69.63 | 34.86 | 69.63 | 1 |
| OLMoE-1B-7B-0924-Instruct-IQ4_XS | 30.70 | 61.53 | 30.70 | 61.53 | 1 |
| granite-3.1-3b-a800m-instruct-Q8_0 | 28.06 | 77.15 | 28.06 | 77.15 | 1 |
| llama-3.2-1b-instruct-q8_0 | 24.92 | 128.89 | 27.25 | 153.13 | 4 |
| DeepSeek-R1-ReDistill-Qwen-1.5B-v1.0-Q8_0 | 19.71 | 85.16 | 19.71 | 85.16 | 1 |
| SmallThinker-3B-Preview-Q8_0 | 12.03 | 47.62 | 12.03 | 47.62 | 1 |
| Qwen2.5-3B-Instruct-Q8_0 | 11.63 | 48.20 | 11.63 | 48.20 | 1 |
| qwen2.5-3b-instruct-q5_k_m | 10.96 | 18.15 | 10.96 | 18.15 | 1 |
| Phi-3.5-mini-instruct.Q4_K_M | 10.61 | 17.22 | 10.61 | 17.22 | 1 |
| Marco-o1-Q4_K_S | 8.43 | 16.23 | 8.43 | 16.23 | 1 |
| Mistral-7B-Instruct-v0.3.IQ4_XS | 8.40 | 12.18 | 8.40 | 12.18 | 1 |
| Qwen2.5-7B-Instruct-Q4_K_S | 7.71 | 14.97 | 8.25 | 16.20 | 4 |
| Qwen3-4B-Q8_0 | 7.52 | 32.50 | 7.52 | 32.50 | 1 |
| DeepSeek-R1-Distill-Qwen-7B-Q4_K_M | 6.33 | 10.93 | 6.33 | 10.93 | 1 |
| Qwen2.5-Coder-7B-Instruct-Q8_0 | 5.36 | 20.93 | 5.36 | 20.93 | 1 |
| Qwen2-7B-Instruct.IQ2_XS | 5.25 | 6.42 | 5.25 | 6.42 | 1 |
| gemma-2-9b-it-Q4_K_M | 5.02 | 9.27 | 5.02 | 9.27 | 1 |
| Mistral-Nemo-Instruct-2407-IQ2_M | 2.46 | 3.00 | 2.80 | 3.36 | 2 |
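Each row condenses one or more benchmark runs into a median and a best value per metric; when Runs is 1, the two columns coincide, as the table shows. A minimal sketch of that summary, assuming "best" means the highest throughput and that TG and PP are summarized independently (so the two best values may come from different runs). The function name `summarize` and the per-run numbers are hypothetical, not taken from the leaderboard.

```python
from statistics import median


def summarize(tg_runs, pp_runs):
    """Return (tg_median, pp_median, tg_best, pp_best) in tokens per second.

    Assumes higher throughput is better, so "best" is the maximum.
    """
    return median(tg_runs), median(pp_runs), max(tg_runs), max(pp_runs)


# Hypothetical per-run throughputs (tok/s) for a model benchmarked four times.
tg = [10.0, 12.0, 11.0, 9.0]
pp = [40.0, 44.0, 42.0, 38.0]
print(summarize(tg, pp))

# With a single run, median and best are the same value.
print(summarize([7.5], [32.5]))
```

The median is less sensitive than the mean to a single throttled or unusually fast run, which matters on phones where thermal state varies between sessions.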
Head-to-Head Record (166 rows)
Performance by App Version (chart legend: Improved, Regressed)