Bench

The open-source CLI behind InferenceBench. Every result ships as a cryptographically signed envelope — hardware fingerprint, software stack, dataset hash, RNG seed, metrics — that anyone can re-verify against Sigstore's public transparency log.

View the signed leaderboard →Read the docs GitHub

Install + run in one minute

Python 3.12+. Apache 2.0. The CLI ships with plugins for LLM inference, MT, code generation, vision, voice transcription, and embeddings retrieval.

pip install inferencebench

# Try it against any OpenAI-compatible LLM endpoint:
bench run llm.inference.chatbot-short \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --engine vllm --base-url http://localhost:8000/v1 \
  --signing-mode dev --dev-key cosign.key \
  --output ./envelopes/

# Anyone can verify the result:
bench verify ./envelopes/*.json

Why bench

Signed envelopes

Every result is hashed + signed. Verify with `bench verify` against Sigstore's public Rekor log — no shared secrets, no trusted PDFs.

Hardware-fingerprinted

GPU, driver, CUDA, NVML power draw, RAPL CPU energy — all captured in the envelope. The exact silicon and software stack that produced the number is part of the record.

Vendor-neutral

No bias toward NVIDIA, AMD, OpenAI, or Anthropic. Same suite runs against vLLM, SGLang, faster-whisper-server, OpenAI-compatible APIs, etc.

Open-core, Apache 2.0

The CLI is fully open source. Plugins for llm.inference, llm.quality, llm.mt, code.generation, vision.understanding, voice.transcription, embeddings.retrieval — all shippable.

Explore

Source code Marathon corpus (HF)Voice ASR validation (HF)

Browse the signed benchmark leaderboard — 52 hardware-fingerprinted envelopes across 11 suites, each re-verifiable with bench verify — or read the full CLI documentation.