Open Source · Apache-2.0

Prompt version control for LLM engineers

Every prompt change gets a SHA, a parent, and a commit message. Every eval run is persistent. Compare any two versions. Gate your CI pipeline on pass rate.

$ pip install pressmark
View on GitHub →
pressmark — Prompt: sentiment-classifier
Navigation
Prompts
Datasets
Eval Runs
Versions
v3 a3f91b
v2 7c2d18
v1 2f8a44
sentiment-classifier  ·  v2 — 7c2d18
stable passing
SYSTEM
You are a sentiment classifier. Classify the input text as
POSITIVE, NEGATIVE, or NEUTRAL. Reply with the label only.

USER
{{text}}

# model_params
temperature: 0.0   max_tokens: 8   model: gpt-4o-mini
EVAL PASS RATE
92% 48 / 52 rows

What it does
Built for prompt engineers who ship

Not another playground. Pressmark plugs into how you already work — CLI, CI, and a zero-config web UI for your team.

01
Content-addressed versioning
Every commit produces a 12-char SHA from its exact content. Identical prompts get the same SHA — no duplicate storage, no false diffs.
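One way such a content address could be derived (a sketch, not Pressmark's actual implementation): serialize the prompt to canonical JSON so key order can't change the hash, then truncate a SHA-256 digest.

```python
import hashlib
import json

def content_sha(system: str, user: str, model: str, params: dict) -> str:
    """Derive a 12-char content address from a prompt's exact content.
    Canonical JSON (sorted keys, fixed separators) guarantees that
    byte-identical prompts always hash to the same SHA."""
    canonical = json.dumps(
        {"system": system, "user": user, "model": model, "params": params},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Because the hash is deterministic, committing the same prompt twice yields the same SHA, which is what makes deduplicated storage and "no false diffs" fall out for free.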
02
Unified diffs
Compare any two versions with a colorized unified diff — additions, removals, and context. Available in the CLI and side-by-side in the web UI.
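The shape of such a diff can be sketched with Python's standard-library `difflib` (illustrative only; the labels here are made up):

```python
import difflib

def diff_versions(old: str, new: str, old_label: str, new_label: str) -> str:
    """Produce a unified diff between two prompt texts, with the
    familiar ---/+++ header and -/+ change lines."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)
```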
03
Dataset management
Import JSONL files or create rows manually. One dataset can run against multiple prompt versions. Results link back to the exact row and version.
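The import itself is simple in principle, a sketch (function name and return shape are assumptions, not Pressmark's API):

```python
import json
from io import StringIO

def import_jsonl(fp):
    """Parse a JSONL dataset: one JSON object per non-empty line.
    Each row's keys map to template variables; an 'expected' key
    supplies ground truth for scorers that need one."""
    rows = []
    for line in fp:
        line = line.strip()
        if line:
            rows.append(json.loads(line))
    return rows

sample = StringIO(
    '{"text": "I loved it", "expected": "POSITIVE"}\n'
    '{"text": "Meh.", "expected": "NEUTRAL"}\n'
)
rows = import_jsonl(sample)
```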
04
Async eval runner
Configurable concurrency with asyncio.Semaphore. Each row result is written immediately — a crash mid-run doesn't lose completed work.
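The pattern described can be sketched like this (`call_model` and `save_result` are hypothetical stand-ins for the model call and the database write):

```python
import asyncio

async def run_eval(rows, call_model, save_result, concurrency=5):
    """Bounded-concurrency eval loop: at most `concurrency` model calls
    in flight at once; each row's result is persisted as soon as it
    finishes, so a crash mid-run keeps all completed work."""
    sem = asyncio.Semaphore(concurrency)

    async def one(row):
        async with sem:
            output = await call_model(row)
        save_result(row, output)  # write immediately, not at the end

    await asyncio.gather(*(one(r) for r in rows))

# Minimal demo with a stubbed model call:
results = []

async def fake_model(row):
    await asyncio.sleep(0)
    return row["text"].upper()

asyncio.run(run_eval(
    [{"text": "good"}, {"text": "bad"}],
    fake_model,
    lambda row, out: results.append((row["text"], out)),
    concurrency=2,
))
```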
05
Eval comparison
Compare two eval runs: pass-rate delta, per-scorer breakdown, Chart.js bar chart. Rows are flagged as gained or regressed for instant triage.
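In essence the comparison is set arithmetic over per-row pass/fail results, a sketch (the dict-of-booleans shape is an assumption):

```python
def compare_runs(baseline: dict, candidate: dict):
    """Compare two eval runs keyed by row id -> passed (bool).
    Returns the pass-rate delta plus the rows that flipped either way."""
    delta = (sum(candidate.values()) / len(candidate)
             - sum(baseline.values()) / len(baseline))
    gained = [k for k in candidate if candidate[k] and not baseline.get(k)]
    regressed = [k for k in candidate if not candidate[k] and baseline.get(k)]
    return delta, gained, regressed
```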
06
Zero-config web UI
FastAPI + Jinja2 + HTMX. No npm, no webpack, no build step. SSE streams eval progress live. Works from a single pip install.

Evaluation
Five built-in scorers

Configure per-eval in JSON or TOML. Mix and match — an eval run can use multiple scorers, and each is aggregated independently.

Type | Config keys | How it scores | Output
--- | --- | --- | ---
exact_match | case_sensitive | Strip whitespace, compare to expected | 0 or 1
contains | substring · all_of · any_of | Substring presence check, case-folded | 0 or 1
regex_match | pattern · flags | re.search on output | 0 or 1
llm_judge | criteria · model · threshold | LLM grades output against criteria 0–10 | 0.0 – 1.0
semantic_sim | threshold · model | Cosine similarity vs expected embedding | 0.0 – 1.0
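The three binary scorers in the table reduce to one-liners; a sketch of their semantics (plain functions for illustration, not Pressmark's config-driven scorer objects):

```python
import re

def exact_match(output: str, expected: str, case_sensitive: bool = False) -> int:
    """Strip whitespace, compare to expected -> 0 or 1."""
    a, b = output.strip(), expected.strip()
    if not case_sensitive:
        a, b = a.casefold(), b.casefold()
    return 1 if a == b else 0

def contains(output: str, substring: str) -> int:
    """Case-folded substring presence check -> 0 or 1."""
    return 1 if substring.casefold() in output.casefold() else 0

def regex_match(output: str, pattern: str, flags: int = 0) -> int:
    """re.search on the output -> 0 or 1."""
    return 1 if re.search(pattern, output, flags) else 0
```

llm_judge and semantic_sim differ in that they produce fractional scores (a 0–10 grade normalized to 0.0–1.0, and a cosine similarity respectively), so they need a threshold to decide pass/fail.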

Get started
From install to first eval in minutes
1
Install and initialize
Creates a SQLite database at ~/.pressmark/pressmark.db. No setup wizard, no migrations to run.
2
Commit your first prompt
System prompt, user template with {{variable}} slots, model, and params. Produces a SHA and version number.
3
Import a dataset
JSONL with one row per line. Each row's keys map to template variables. The expected key feeds scorers that need a ground truth.
4
Run an eval
Results stream to the terminal with a Rich progress bar. Pass --min-pass-rate to exit non-zero in CI when quality drops.
5
Open the web UI
Browse prompt history, run comparisons, inspect per-row results, and diff versions in the browser.
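A dataset file like the data.jsonl imported in step 3 might contain rows of this shape (the values are illustrative; only the text and expected keys come from the steps above):

```jsonl
{"text": "Absolutely loved the battery life.", "expected": "POSITIVE"}
{"text": "The screen cracked within a week.", "expected": "NEGATIVE"}
{"text": "It arrived on Tuesday.", "expected": "NEUTRAL"}
```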
terminal
$ pip install pressmark

$ pressmark init
✓ Database created at ~/.pressmark/pressmark.db

$ pressmark prompt commit sentiment \
    --system "Classify as POSITIVE, NEGATIVE, or NEUTRAL." \
    --user "{{text}}" \
    --model openai/gpt-4o-mini \
    --message "initial version"
✓ v1 · sha: 2f8a44

$ pressmark dataset import sentiment-test data.jsonl
✓ 52 rows imported

$ pressmark eval run sentiment sentiment-test \
    --scorer '{"type": "exact_match"}' \
    --min-pass-rate 0.90
Running eval  ━━━━━━━━━━━━━━━━━━━━ 100%  52/52
Pass rate: 92.3% ✓

$ pressmark ui
→ http://127.0.0.1:7820
CI pipeline integration
Drop one command into your GitHub Actions workflow. Exits with code 1 when pass rate drops below your threshold — no post-processing needed.
pressmark eval run … --min-pass-rate 0.90
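In a GitHub Actions workflow that could look like the following (a sketch; the step name and secret name are placeholders, not required by Pressmark):

```yaml
- name: Prompt quality gate
  env:
    OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
  run: |
    pip install pressmark
    pressmark eval run sentiment sentiment-test \
      --scorer '{"type": "exact_match"}' \
      --min-pass-rate 0.90
```

Because the command exits non-zero below the threshold, the job fails and blocks the merge with no extra scripting.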

Configuration
One file, no surprises

Copy pressmark.example.toml to pressmark.toml or use environment variables. Both work.

pressmark.toml
[pressmark]
db_path          = "~/.pressmark/pressmark.db"
default_model    = "openai/gpt-4o-mini"
eval_concurrency = 5

[openrouter]
api_key          = "sk-or-v1-..."

[web]
host             = "127.0.0.1"
port             = 7820
environment variables
# Same settings, env-var style

PRESSMARK_DB_PATH=~/.pressmark/pressmark.db
PRESSMARK_DEFAULT_MODEL=openai/gpt-4o-mini
PRESSMARK_EVAL_CONCURRENCY=5

OPENROUTER_API_KEY=sk-or-v1-...

PRESSMARK_HOST=127.0.0.1
PRESSMARK_PORT=7820