Every prompt change gets a SHA, a parent, and a commit message. Every eval run is persistent. Compare any two versions. Gate your CI pipeline on pass rate.
Not another playground. Pressmark plugs into how you already work — CLI, CI, and a zero-config web UI for your team.
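One way to picture "a SHA, a parent, and a commit message" is a content-addressed record. The sketch below is only an illustration: the `PromptVersion` class, its field names, and the use of SHA-256 over canonical JSON are assumptions, not Pressmark's actual storage format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptVersion:
    """Hypothetical record for one committed prompt version."""
    system: str
    user: str
    model: str
    message: str
    parent: Optional[str] = None  # SHA of the previous version; None for the first commit

    @property
    def sha(self) -> str:
        # Hash a canonical JSON encoding so identical content always yields the same SHA.
        payload = json.dumps(
            {
                "system": self.system,
                "user": self.user,
                "model": self.model,
                "message": self.message,
                "parent": self.parent,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


v1 = PromptVersion(
    system="Classify as POSITIVE, NEGATIVE, or NEUTRAL.",
    user="{{text}}",
    model="openai/gpt-4o-mini",
    message="initial version",
)
v2 = PromptVersion(
    system="Classify as POSITIVE, NEGATIVE, or NEUTRAL. Reply with one word.",
    user="{{text}}",
    model="openai/gpt-4o-mini",
    message="ask for a one-word reply",
    parent=v1.sha,  # links the new version back to the one it was derived from
)
```

Because each version stores its parent's SHA, any two versions can be traced back to a common ancestor and compared.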
asyncio.Semaphore. Each row result is written immediately, so a crash mid-run doesn't lose completed work.

pip install.

Configure per-eval in JSON or TOML. Mix and match: an eval run can have multiple scorers and aggregates each independently.
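The combination of bounded concurrency and immediate per-row writes can be sketched roughly as follows. This is a minimal illustration, not Pressmark's code: `score_row`, `run_eval`, and the `results` table are hypothetical names, and the model call is replaced by a stand-in.

```python
import asyncio
import sqlite3


async def score_row(row_id: int) -> float:
    # Hypothetical stand-in for a model call plus scoring.
    await asyncio.sleep(0)
    return 1.0 if row_id % 2 == 0 else 0.0


async def run_eval(row_ids, conn: sqlite3.Connection, concurrency: int = 5) -> None:
    # The semaphore caps how many rows are in flight at once.
    sem = asyncio.Semaphore(concurrency)

    async def worker(row_id: int) -> None:
        async with sem:
            score = await score_row(row_id)
        # Persist and commit each result as soon as it exists, so rows
        # finished before a crash are already on disk.
        conn.execute(
            "INSERT INTO results (row_id, score) VALUES (?, ?)", (row_id, score)
        )
        conn.commit()

    await asyncio.gather(*(worker(r) for r in row_ids))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (row_id INTEGER PRIMARY KEY, score REAL)")
asyncio.run(run_eval(range(10), conn, concurrency=5))
saved = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

Committing per row trades a little write throughput for durability, which fits eval runs where each row may cost a paid model call.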
| Type | Config keys | How it scores | Output |
|---|---|---|---|
| exact_match | case_sensitive | Strip whitespace, compare to expected | 0 or 1 |
| contains | substring · all_of · any_of | Substring presence check, case-folded | 0 or 1 |
| regex_match | pattern · flags | re.search on output | 0 or 1 |
| llm_judge | criteria · model · threshold | LLM grades output against criteria 0–10 | 0.0 – 1.0 |
| semantic_sim | threshold · model | Cosine similarity vs expected embedding | 0.0 – 1.0 |
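The three deterministic scorers in the table could look roughly like this. It is a sketch under assumptions: the function signatures, the default for `case_sensitive`, the way `substring`, `all_of`, and `any_of` combine (all supplied checks must hold), and `flags` being a standard `re` flag value are all guesses, not documented Pressmark behavior.

```python
import re


def exact_match(output: str, expected: str, case_sensitive: bool = True) -> int:
    # Strip surrounding whitespace, then compare to the expected value.
    a, b = output.strip(), expected.strip()
    if not case_sensitive:
        a, b = a.casefold(), b.casefold()
    return int(a == b)


def contains(output: str, substring=None, all_of=(), any_of=()) -> int:
    # Case-folded substring checks; every check that is supplied must hold
    # (an assumption about how the keys combine).
    text = output.casefold()
    ok = True
    if substring is not None:
        ok = ok and substring.casefold() in text
    if all_of:
        ok = ok and all(s.casefold() in text for s in all_of)
    if any_of:
        ok = ok and any(s.casefold() in text for s in any_of)
    return int(ok)


def regex_match(output: str, pattern: str, flags: int = 0) -> int:
    # re.search finds the pattern anywhere in the output, not only at the start.
    return int(re.search(pattern, output, flags) is not None)
```

For example, `exact_match(" POSITIVE\n", "POSITIVE")` returns 1 because whitespace is stripped before comparison, while `exact_match("positive", "POSITIVE")` returns 0 under the case-sensitive default assumed here.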
~/.pressmark/pressmark.db. No setup wizard, no migrations to run.

{{variable}} slots, model, and params. Produces a SHA and version number.

expected key feeds scorers that need a ground truth.

--min-pass-rate to exit non-zero in CI when quality drops.

    $ pip install pressmark
    $ pressmark init
    ✓ Database created at ~/.pressmark/pressmark.db

    $ pressmark prompt commit sentiment \
        --system "Classify as POSITIVE, NEGATIVE, or NEUTRAL." \
        --user "{{text}}" \
        --model openai/gpt-4o-mini \
        --message "initial version"
    ✓ v1 · sha: a3f91b

    $ pressmark dataset import sentiment-test data.jsonl
    ✓ 52 rows imported

    $ pressmark eval run sentiment sentiment-test \
        --scorer '{"type": "exact_match"}' \
        --min-pass-rate 0.90
    Running eval ━━━━━━━━━━━━━━━━━━━━ 100% 52/52
    Pass rate: 92.3% ✓

    $ pressmark ui
    → http://127.0.0.1:7820
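The `--min-pass-rate` gate amounts to comparing a pass rate against a threshold and choosing the process exit code. The sketch below is illustrative only: `pass_rate` and `exit_code` are hypothetical helpers, and treating a score of 1 as a pass is an assumption that fits the 0-or-1 scorers. With 52 rows, a 92.3% pass rate corresponds to 48 passing rows.

```python
def pass_rate(scores, pass_threshold: float = 1.0) -> float:
    # Fraction of rows whose score reaches the pass threshold.
    if not scores:
        return 0.0
    passed = sum(1 for s in scores if s >= pass_threshold)
    return passed / len(scores)


def exit_code(scores, min_pass_rate: float) -> int:
    # 0 tells CI the step succeeded; any non-zero value fails the step.
    return 0 if pass_rate(scores) >= min_pass_rate else 1


scores = [1] * 48 + [0] * 4   # 48 of 52 rows pass
rate = pass_rate(scores)      # 48 / 52
code = exit_code(scores, min_pass_rate=0.90)
```

A CLI entry point would typically hand that value to `sys.exit`, so a drop below the threshold fails the CI job.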
Copy pressmark.example.toml to pressmark.toml or use environment variables. Both work.
    [pressmark]
    db_path = "~/.pressmark/pressmark.db"
    default_model = "openai/gpt-4o-mini"
    eval_concurrency = 5

    [openrouter]
    api_key = "sk-or-v1-..."

    [web]
    host = "127.0.0.1"
    port = 7820
    # Same settings, env-var style
    PRESSMARK_DB_PATH=~/.pressmark/pressmark.db
    PRESSMARK_DEFAULT_MODEL=openai/gpt-4o-mini
    PRESSMARK_EVAL_CONCURRENCY=5
    OPENROUTER_API_KEY=sk-or-v1-...
    PRESSMARK_HOST=127.0.0.1
    PRESSMARK_PORT=7820
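Reading the environment-variable form might look something like the following. This is a sketch, not Pressmark's loader: the `Settings` class and `load_settings` function are hypothetical, the defaults are simply copied from the example values above, and the API key is left out on purpose.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    # Defaults mirror the example configuration; they are assumptions, not
    # documented built-in defaults.
    db_path: str = "~/.pressmark/pressmark.db"
    default_model: str = "openai/gpt-4o-mini"
    eval_concurrency: int = 5
    host: str = "127.0.0.1"
    port: int = 7820


def load_settings(env=None) -> Settings:
    # Accept any mapping so the function can be exercised without touching
    # the real process environment.
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        db_path=os.path.expanduser(env.get("PRESSMARK_DB_PATH", d.db_path)),
        default_model=env.get("PRESSMARK_DEFAULT_MODEL", d.default_model),
        # Environment values arrive as strings, so numeric settings are cast.
        eval_concurrency=int(env.get("PRESSMARK_EVAL_CONCURRENCY", d.eval_concurrency)),
        host=env.get("PRESSMARK_HOST", d.host),
        port=int(env.get("PRESSMARK_PORT", d.port)),
    )


settings = load_settings({"PRESSMARK_PORT": "9000", "PRESSMARK_EVAL_CONCURRENCY": "8"})
```

Variables that are not set fall back to the defaults, so a CI job only needs to export the values it wants to change.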