Benchmark Measuring Head Models

Researchers develop new LiveBench benchmark for measuring AI models’ response accuracy

A group of researchers has developed a new benchmark, dubbed LiveBench, to ease the task of evaluating large language models’ question-answering capabilities. The researchers released the benchmark on ...

ZDNet

This new AI benchmark measures how much models lie

I wore the world's first HDR10 smart glasses TCL's new E Ink tablet beats the Remarkable and Kindle Anker's new charger is one of the most unique I've ever seen Best laptop cooling pads Best flip ...

Wired

A New Benchmark for the Risks of AI

MLCommons, a nonprofit that helps companies measure the performance of their artificial intelligence systems, is launching a new benchmark to gauge AI’s bad side too. The new benchmark, called ...

Business Wire

MLCommons Launches AILuminate, First-of-Its-Kind Benchmark to Measure the Safety of Large Language Models

SAN FRANCISCO--(BUSINESS WIRE)--MLCommons today released AILuminate, a first-of-its-kind safety test for large language models (LLMs). The v1.0 benchmark – which provides a series of safety grades for ...

Business Wire

Thunk.AI Releases “Hi-Fi” Benchmark to Measure AI Automation Reliability

SEATTLE--(BUSINESS WIRE)--Thunk.AI today announced the release of a new “Hi-Fi” benchmark designed to rigorously measure the reliability of AI agentic automation. The benchmark models enterprise ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results