# EmbedEval

> Binary LLM evaluation framework. Pass/Fail judgments, trace-centric records, error-analysis-first workflow.

## About

EmbedEval is a CLI tool for evaluating LLM outputs using binary pass/fail judgments, built on Hamel Husain's evaluation principles. It prioritizes manual error analysis before automation, cheap assertion checks before LLM-as-judge, and complete trace records over summary metrics.

## Install

```bash
# npm (global)
npm install -g embedeval

# npx (no install)
npx embedeval <command>

# curl installer
curl -fsSL https://raw.githubusercontent.com/Algiras/embedeval/main/install.sh | bash
```

npm: https://www.npmjs.com/package/embedeval

## Core Commands

- `embedeval run <dataset>` - Run evaluation suite
- `embedeval annotate` - Manual annotation UI for error analysis
- `embedeval report` - Generate evaluation report
- `embedeval compare` - Compare two evaluation runs

## Philosophy

- **Binary only** — PASS or FAIL, no debating 1–5 scales
- **Error analysis first** — inspect traces before automating
- **Cheap evals first** — assertions before LLM-as-judge
- **Trace-centric** — complete session records, not just scores
- **Single annotator** — "benevolent dictator" model avoids disagreement noise

## Source

- Repository: https://github.com/Algiras/embedeval
- Documentation: https://algiras.github.io/embedeval/
- npm: https://www.npmjs.com/package/embedeval
- License: MIT
- Language: TypeScript