Chart digitization is a learned skill — clicking gets faster, precision improves, you develop a feel for which workflow fits which chart. Deliberate practice on charts with known answers is the fastest way to build it.
Five practice charts, easiest first. Each has an answer key and a walkthrough. Work in order or jump to the type you’re extracting this week.
The five workshops
| # | Workshop | Chart data | Difficulty | Time | What you learn |
|---|---|---|---|---|---|
| 1 | Extract a simple bar chart | 5 bars | ★☆☆☆☆ | 5 min | The four-step loop end-to-end |
| 2 | Multi-series line chart | 3 series × 12 months | ★★☆☆☆ | 12 min | Per-series extraction discipline |
| 3 | Dense scatter with auto-extract | 250 points, 3 clusters | ★★★☆☆ | 8 min | Color-based auto-extraction |
| 4 | Log-scale chart | Semi-log, 12 points | ★★★★☆ | 10 min | Log calibration trick |
| 5 | Kaplan-Meier reconstruction | 2-arm step function | ★★★★★ | 15 min | Step-function precision; survival data |
Each workshop is self-contained. The recommended sequence builds progressively, but skip ahead if you came here with a specific problem.
What you’ll have after all five
- A baseline accuracy number per chart type — the thing you want when deciding how much to trust your extractions.
- Familiarity with manual-click and color-based-auto workflows.
- Intuition for which chart types reward extra precision.
- Self-knowledge about your failure modes — most operators have a recurring error the workshops surface.
How to grade yourself
Every workshop publishes ground-truth values. After your extraction:
- Export your data as CSV.
- Open your CSV and the answer key in a spreadsheet or Python.
- Compute the mean absolute error (MAE) per series.
- Compare to the workshop’s target.
Most workshops target an MAE under 1.5% of the y-range, which is careful-operator level. If you’re above that, the workshop’s “common mistakes” section usually identifies what went wrong.
The Python recipe for MAE:
```python
import csv

# Load your extracted CSV (expects 'x' and 'y' columns)
extracted = {}
with open('my_extraction.csv') as f:
    for row in csv.DictReader(f):
        extracted[row['x']] = float(row['y'])

# Compare to ground truth (paste from workshop answer key)
truth = {'Acme': 36.6, 'Bolt': 23.5, 'Crux': 61.5, 'Delta': 17.5, 'Echo': 52.7}

# Mean absolute error over the labels in the answer key
mae = sum(abs(extracted[k] - v) for k, v in truth.items()) / len(truth)
y_range = max(truth.values()) - min(truth.values())
print(f"MAE: {mae:.2f} ({100*mae/y_range:.1f}% of y-range)")
```

For log charts, see the log workshop’s grading section: you want log-space MAE, not linear.
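As a rough idea of what log-space grading can look like, here is a minimal sketch. It assumes "log-space MAE" means averaging the absolute difference of base-10 logarithms, so the result is in decades; the log workshop's grading section is the authority on the exact definition. The function name `log_space_mae` and all values are hypothetical.

```python
import math

def log_space_mae(extracted, truth):
    """Mean absolute error measured in decades (log10 units).

    Assumption: every value is strictly positive, as on a log axis.
    """
    errors = [abs(math.log10(extracted[k]) - math.log10(v)) for k, v in truth.items()]
    return sum(errors) / len(errors)

# Hypothetical values for illustration only, not a workshop answer key
truth = {0: 1000.0, 1: 100.0, 2: 10.0}
extracted = {0: 1100.0, 1: 95.0, 2: 10.5}

mae_decades = log_space_mae(extracted, truth)
print(f"Log-space MAE: {mae_decades:.4f} decades")
```

Measured this way, a 10% miss costs about the same at the top of the axis as at the bottom, whereas a linear MAE would be dominated by the largest values.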
The charts in detail
Workshop 1: bar chart
Five bars, vendor satisfaction on 0-100. No tricks. Cleanest introduction to the four-step workflow.
Workshop 2: multi-series line
Three product lines across twelve months, with crossings. Per-series discipline prevents losing track of which point belongs where. AI tools tend to fail here by swapping series at crossings; you’ll handle it cleanly with named groups.
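Grading a multi-series chart means one MAE per series rather than one number for the whole chart. The sketch below shows one way to do that, assuming each point is a `(series, x, y)` tuple; that layout, the function name `mae_per_series`, and the data are illustrative assumptions, not the workshop's export format or answer key.

```python
from collections import defaultdict

def mae_per_series(extracted_rows, truth_rows):
    """Return {series: MAE}, matching points by (series, x)."""
    truth = {(s, x): y for s, x, y in truth_rows}
    abs_errors = defaultdict(list)
    for s, x, y in extracted_rows:
        abs_errors[s].append(abs(y - truth[(s, x)]))
    return {s: sum(errs) / len(errs) for s, errs in abs_errors.items()}

# Hypothetical (series, month, value) rows; the two series cross between months 1 and 2
truth_rows = [("A", 1, 10.0), ("A", 2, 20.0), ("B", 1, 20.0), ("B", 2, 10.0)]
# Series B was extracted exactly; series A is off by 1 and by 2
extracted_rows = [("A", 1, 11.0), ("A", 2, 18.0), ("B", 1, 20.0), ("B", 2, 10.0)]

for series, mae in sorted(mae_per_series(extracted_rows, truth_rows).items()):
    print(f"{series}: MAE {mae:.2f}")
```

A series swap at a crossing would show up here as a large error in both affected series at once, which is easier to spot than in a single pooled MAE.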
Workshop 3: dense scatter
250 points in three color-coded clusters. Where manual clicking stops scaling. Color-based auto-extraction with per-cluster tolerance — 90 seconds instead of 15 minutes.
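To make "per-cluster tolerance" concrete, here is a minimal sketch of the color-matching step. It assumes Euclidean distance in RGB and a nested list of pixel tuples; the extractor's actual matching method is not described here, and a real tool would also group matched pixels into marker centers. The names `match_color` and `clusters` and the tiny image are hypothetical.

```python
def match_color(pixels, target, tolerance):
    """Return (row, col) of pixels whose RGB distance to target is within tolerance."""
    hits = []
    for r, row in enumerate(pixels):
        for c, (red, green, blue) in enumerate(row):
            dist = ((red - target[0]) ** 2
                    + (green - target[1]) ** 2
                    + (blue - target[2]) ** 2) ** 0.5
            if dist <= tolerance:
                hits.append((r, c))
    return hits

WHITE = (255, 255, 255)
# Tiny hypothetical 3x4 "image": two reddish marker pixels and one bluish one
pixels = [
    [WHITE, (250, 10, 10), WHITE, WHITE],
    [WHITE, WHITE, WHITE, (20, 20, 240)],
    [(235, 30, 25), WHITE, WHITE, WHITE],
]

# Hypothetical per-cluster settings: target color plus its own tolerance
clusters = {"red": ((255, 0, 0), 60), "blue": ((0, 0, 255), 40)}
found = {name: match_color(pixels, color, tol) for name, (color, tol) in clusters.items()}
print(found)
```

Giving each cluster its own tolerance lets a washed-out or anti-aliased color match loosely while a cluster that sits close to another color stays strict.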
Workshop 4: log-scale chart
Twelve points of exponential decay on a semi-log y-axis. The chart type AI gets most catastrophically wrong (40%+ MAE). Teaches the log-calibration trick — calibrate at visible powers of ten, toggle to log.
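The idea behind calibrating at powers of ten can be sketched as linear interpolation in log10 space between two known tick positions. This is an illustration under assumed pixel coordinates, not the extractor's implementation; `make_log_axis` and the calibration numbers are hypothetical.

```python
import math

def make_log_axis(pixel_a, value_a, pixel_b, value_b):
    """Build a pixel -> value function for a log10 axis from two calibration points."""
    log_a, log_b = math.log10(value_a), math.log10(value_b)
    slope = (log_b - log_a) / (pixel_b - pixel_a)  # decades per pixel

    def pixel_to_value(pixel):
        # Interpolate in log space, then convert back to a data value
        return 10 ** (log_a + slope * (pixel - pixel_a))

    return pixel_to_value

# Hypothetical calibration: y=1000 sits at pixel row 50, y=1 at pixel row 350
to_value = make_log_axis(50, 1000.0, 350, 1.0)
print(to_value(150))  # 100 pixels (one decade) below the top tick
print(to_value(200))  # halfway between the two ticks in pixels
```

With this calibration the halfway pixel reads about 31.6, while a linear mapping between the same two ticks would read 500.5. That gap is the kind of error a log axis produces when it is treated as linear.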
Workshop 5: Kaplan-Meier survival curve
Two-arm step function at 6-month intervals. Standard shape for clinical trial reporting and a frequent meta-analysis reconstruction target. Teaches step-corner placement and per-arm extraction. Pairs with our meta-analysis guide.
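Step corners matter because a survival curve holds its value until the next drop, so reading between corners means looking up the last corner at or before the time of interest rather than interpolating. A minimal sketch, assuming each arm is stored as sorted `(time, survival)` corners starting at time 0; the function name `survival_at` and the data are hypothetical, not the workshop's answer key.

```python
from bisect import bisect_right

def survival_at(corners, t):
    """Evaluate a right-continuous step function at time t.

    corners: list of (time, survival) sorted by time, one entry per drop,
    starting with (0, 1.0).
    """
    times = [time for time, _ in corners]
    idx = bisect_right(times, t) - 1  # last corner at or before t
    if idx < 0:
        raise ValueError("t is before the first corner")
    return corners[idx][1]

# Hypothetical single arm with drops every 6 months
arm = [(0, 1.00), (6, 0.90), (12, 0.75), (18, 0.60)]
print(survival_at(arm, 9))   # still on the 6-month plateau
print(survival_at(arm, 12))  # the drop applies at the corner itself
```

Connecting the corners with straight lines would give 0.825 at month 9 instead of the plateau value of 0.90, which is why corner placement, not line tracing, drives accuracy on this chart type.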
Where these charts come from
All five come from our open-source benchmark harness. The same charts score AI tools in our comparison posts, so when a post says “ChatGPT had 41% MAE on the log chart,” that is the exact same chart you can practice on.
Chart code, ground-truth JSON, and AI runners are in benchmarks/.
What to do after the workshops
- Researchers / meta-analysts: Data extraction for meta-analysis — full systematic-review workflow.
- Financial analysts: Chart screenshot to Excel — XLSX export with chart embedded.
- Asking if AI works: The limits of AI chart extraction — pillar post.
Want more workshops?
Workshops 6 (dual-axis), 7 (stacked area), and 8 (forest plots) are planned. To request a chart type, see the GitHub issues.
Try it on your own chart
Upload an image, click your data points, calibrate the axes, and export CSV. Under three minutes, no login required for a single export.
Open the extractor