Talk:Language model benchmark


comment


If some of the benchmarks look weirdly obscure, my apologies. My criterion is simple: if a frontier model is advertised by showing how well it does on *this* or *that* benchmark, then I put that benchmark in. For example, today I put in "Vibe-Eval", not because it is particularly interesting (I think it is not), but simply because the latest Google Gemini 2.5 (2025-06-05) advertised its ability on Vibe-Eval, so I had to put it in. pony in a strange land (talk) 20:58, 7 June 2025 (UTC)

MLCommons


A coherent text on the subject, published in 2024 and added by me to Sources, describes something called MLCommons as the only benchmark standardization game in town. I do not know anything about it, but here is a question for those who do: is the omission of MLCommons from this long list deliberate (e.g., due to a bias in the source I used) or accidental? Викидим (talk) 19:37, 12 September 2025 (UTC)