How to Extract Data from a Graph Image
A practical four-step method for digitizing data points from chart images, plus fixes for log axes, multi-series plots, and low-resolution scans.
To extract data from a graph image, upload the picture into a digitizer, click each data point, calibrate two known values on each axis, and export the resulting numbers as CSV or XLSX. The whole loop takes under five minutes once the chart is clean.
This guide is the long version. Use it when the chart is awkward — log scales, dense scatter, multi-series legends, a JPEG someone took with a phone — and you need the result to survive peer review or an audit.
The four-step method
The method has four steps because every digitizer worth using has four screens: image, points, axes, data. Skip none of them.
- Upload the chart image.
- Place points on every value you want to extract.
- Calibrate the axes with two known coordinates per axis.
- Export the data as CSV or XLSX.
DataFromChart implements exactly these four steps as the workflow layers. The order matters: placing points before calibrating axes means a misplaced axis only requires you to fix the axis, not re-click 40 points.
Step 1: upload a clean chart image
The clearer the image, the less time you spend nudging points. Aim for a PNG of the original figure at native resolution — anything where individual gridlines are crisp at 100% zoom.
If the source is a PDF, render the page to PNG at 300 DPI before uploading. Browser screenshots of a journal viewer are usually enough, but a re-rendered page is always sharper. We cover this in detail in our guide on extracting data from a chart in a PDF.
Crop tightly around the plot area before uploading. The fewer pixels of caption, body text, and legend, the easier the axes are to calibrate. Don’t crop the axis tick labels themselves, because you’ll need them visible to calibrate against known values.
JPEG compression hurts on charts with thin lines. If you only have a JPEG, run it through a sharpening pass or, when possible, request the original PNG/SVG from the author. For Kaplan-Meier curves and similar dense plots, this difference alone can shift extracted values by 1–2%.
Step 2: place points
Click each data point you care about. Don’t click everything — click what you’ll actually analyze.
For a line chart, click visible markers if there are markers, or click at each gridline intersection if the line is smooth. For a scatter plot, click every dot. For a bar chart, click the top edge of each bar.
Zoom in. A point placed 3 pixels off on a 600-pixel-wide chart is a 0.5% error baked in before you’ve done anything else. Most digitizers, including ours, support pan and zoom. Use them.
If the chart has multiple series, work one series at a time and group your points before moving on. This becomes critical at export — without grouping, a 4-series chart becomes one undifferentiated cloud of (x, y) pairs.
When you have hundreds of points
Manual clicking does not scale past ~50 points. For dense scatter plots, line charts with no markers, or any case where you’d otherwise be clicking for an hour, use color-based automatic extraction.
DataFromChart includes a WebPlotDigitizer-style color picker: select the color of the series, set a tolerance, and the tool finds every pixel matching that color and snaps points along it. We benchmarked this against manual placement on a 200-point IPCC temperature anomaly chart and the mean absolute error was 0.02 degrees — below the line thickness.
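The color-matching idea can be sketched in a few lines. This is a minimal illustration, not the actual algorithm of DataFromChart or WebPlotDigitizer. It assumes the image is already decoded into rows of (r, g, b) tuples, uses Euclidean RGB distance as the tolerance metric, and emits one point per pixel column at the mean matching row. The names color_distance and extract_series are hypothetical.

```python
import math

def color_distance(a, b):
    """Euclidean distance between two (r, g, b) tuples."""
    return math.dist(a, b)

def extract_series(pixels, target, tolerance):
    """Return one (col, row) point per column that contains matching pixels.

    pixels: list of rows, each row a list of (r, g, b) tuples (assumed layout).
    The row coordinate is the mean of all matching rows in that column,
    which lands on the center of a line that is several pixels thick.
    """
    if not pixels:
        return []
    points = []
    for col in range(len(pixels[0])):
        rows = [r for r, row in enumerate(pixels)
                if color_distance(row[col], target) <= tolerance]
        if rows:
            points.append((col, sum(rows) / len(rows)))
    return points

# Synthetic image, 5 columns by 4 rows: white background with a
# 2-pixel-thick blue line in columns 1 to 3.
WHITE, BLUE = (255, 255, 255), (0, 0, 255)
image = [[WHITE] * 5 for _ in range(4)]
for col in (1, 2, 3):
    image[1][col] = BLUE
    image[2][col] = BLUE

print(extract_series(image, BLUE, tolerance=30))
```

The points come back in pixel coordinates, so they still need the axis calibration from Step 3 before they mean anything.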
Try it on a chart you actually need. Open the extractor, upload your image, and you’ll have a CSV in under three minutes. No login required for a single export.
Step 3: calibrate the axes
Calibration is two known points per axis. Anything more is overkill; anything less is impossible.
For the X axis, drop a start line on a tick where you know the X value, and an end line on another tick where you know the X value. Type those values in. Repeat for Y.
The longer the calibration interval, the smaller the percentage error in your data. If the X axis spans 1990–2020 and you calibrate at 1995 and 2000, a one-pixel misplacement at either endpoint is magnified as you extrapolate away from the calibrated interval: by 2020 the error is about five times what the same misplacement would cause with endpoints at 1990 and 2020. Calibrate at the leftmost and rightmost visible ticks instead.
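A worked example makes the effect of the calibration interval concrete. The geometry here is assumed for illustration: the 1990–2020 axis is 600 pixels wide (20 pixels per year) and the end calibration line is dropped one pixel too far right.

```python
def px_to_value(px, start_px, end_px, start_value, end_value):
    """Linear interpolation from pixel position to axis value."""
    fraction = (px - start_px) / (end_px - start_px)
    return start_value + fraction * (end_value - start_value)

def true_px(year):
    """Assumed geometry: 1990 sits at pixel 0 and 2020 at pixel 600."""
    return (year - 1990) * 20

def error_at_2020(cal_start_year, cal_end_year, misplacement_px=1):
    """Error in years at 2020 when the end calibration line is misplaced."""
    estimate = px_to_value(
        true_px(2020),
        true_px(cal_start_year),
        true_px(cal_end_year) + misplacement_px,
        cal_start_year,
        cal_end_year,
    )
    return abs(estimate - 2020)

narrow = error_at_2020(1995, 2000)  # 100 px calibration interval
wide = error_at_2020(1990, 2020)    # 600 px calibration interval
print(f"narrow: {narrow:.3f} years, wide: {wide:.3f} years")
```

Under these assumptions the narrow interval is off by roughly a quarter of a year at 2020, about five times the error of the full-width calibration.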
Log axes
For log-scale axes, calibrate at two visible powers of ten. The calibration inputs are the same as for a linear axis: two real values and their two pixel positions. The scale type only changes how the tool interpolates between them.
If your digitizer doesn’t have a “log axis” toggle, calibrate as if linear but type the exponents instead of the tick values (0 for the tick at 1, 3 for the tick at 1000), export, then take 10^value in a spreadsheet. The math is identical. DataFromChart supports both linear and logarithmic axes natively.
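The exponentiation workaround depends on what you type at calibration time. The sketch below uses an assumed pixel layout and contrasts calibrating with exponents against the common mistake of typing the raw tick values into a linear axis.

```python
def px_to_value(px, start_px, end_px, start_value, end_value):
    """Linear interpolation from pixel position to axis value."""
    fraction = (px - start_px) / (end_px - start_px)
    return start_value + fraction * (end_value - start_value)

# Assumed layout: the tick for 1 is at pixel 500, the tick for 1000 at pixel 200.
TICK_1_PX, TICK_1000_PX = 500, 200
point_px = 400  # one third of the way up the log axis, so the true value is 10

# Workaround: calibrate with exponents (log10(1) = 0, log10(1000) = 3),
# then exponentiate the exported number.
exponent = px_to_value(point_px, TICK_1_PX, TICK_1000_PX, 0, 3)
recovered = 10 ** exponent

# Mistake: calibrate with the raw tick values, which treats the axis as linear.
naive = px_to_value(point_px, TICK_1_PX, TICK_1000_PX, 1, 1000)

print(recovered, naive)
```

The exponent route recovers 10, while the raw-value route lands near 334 and exponentiating that number would be meaningless.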
Non-orthogonal axes
If the chart is rotated, skewed, or the X and Y axes aren’t perpendicular (rare in scientific publishing, common in old scanned figures), you need a four-point calibration, not two. Most modern digitizers handle this implicitly by calibrating X start, X end, Y start, Y end as four independent line positions rather than two crossed axes.
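One way to handle rotated or skewed axes is an affine decomposition: express each point’s offset as a combination of the X-axis direction and the Y-axis direction. The sketch below is an assumption about how such a calibration could work, not a description of any particular digitizer, and calibrate_affine is a hypothetical name. It corrects rotation and skew but not perspective distortion.

```python
def calibrate_affine(x_ticks, y_ticks):
    """Build a pixel -> (x, y) converter from two ticks per axis.

    x_ticks: two ((px, py), value) pairs for points on the X axis.
    y_ticks: two ((px, py), value) pairs for points on the Y axis.
    """
    (xa, xa_val), (xb, xb_val) = x_ticks
    (ya, ya_val), (yb, yb_val) = y_ticks
    ax, ay = xb[0] - xa[0], xb[1] - xa[1]  # X-axis direction in pixels
    bx, by = yb[0] - ya[0], yb[1] - ya[1]  # Y-axis direction in pixels
    det = ax * by - ay * bx
    if det == 0:
        raise ValueError("axes are parallel; calibration is degenerate")

    def convert(point):
        # Offset from the first X tick, split into X- and Y-direction parts.
        rx, ry = point[0] - xa[0], point[1] - xa[1]
        s = (rx * by - ry * bx) / det  # fraction along the X axis
        # Offset from the first Y tick, split the same way.
        qx, qy = point[0] - ya[0], point[1] - ya[1]
        u = (ax * qy - ay * qx) / det  # fraction along the Y axis
        return (xa_val + s * (xb_val - xa_val),
                ya_val + u * (yb_val - ya_val))

    return convert

# Hypothetical scan: the X axis is level, the Y axis leans 40 px to the right.
convert = calibrate_affine(
    x_ticks=(((100, 500), 0), ((500, 500), 10)),
    y_ticks=(((100, 500), 0), ((140, 100), 100)),
)
print(convert((320, 300)))
```

The sample point sits halfway along both axis directions, so it converts to x = 5 and y = 50 despite the leaning Y axis.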
Step 4: export
Once axes are calibrated, every point you placed converts from pixel position to real value via linear interpolation. The formula is:
value = ((point_px - axis_start_px) / (axis_end_px - axis_start_px)) * (end_value - start_value) + start_value
For log axes, the same formula applies in log space: the tool takes log10 of your axis values, interpolates, then raises 10 to the result.
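The conversion is short enough to write out in full. This is a minimal sketch of the formula with the log-space variant behind a flag. The function name pixel_to_value and its log parameter are illustrative choices, not DataFromChart’s API.

```python
import math

def pixel_to_value(point_px, axis_start_px, axis_end_px,
                   start_value, end_value, log=False):
    """Convert a pixel coordinate to an axis value.

    With log=True the interpolation runs on log10 of the calibration values
    and the result is raised back to a power of ten.
    """
    if axis_end_px == axis_start_px:
        raise ValueError("calibration lines must be at different pixel positions")
    if log:
        if start_value <= 0 or end_value <= 0:
            raise ValueError("log axes need positive calibration values")
        start_value, end_value = math.log10(start_value), math.log10(end_value)
    fraction = (point_px - axis_start_px) / (axis_end_px - axis_start_px)
    value = fraction * (end_value - start_value) + start_value
    return 10 ** value if log else value

# Image Y coordinates grow downward, so the start line can have the larger pixel value.
linear_y = pixel_to_value(350, 500, 100, 0.0, 80.0)
log_y = pixel_to_value(300, 500, 100, 1.0, 10000.0, log=True)
print(linear_y, log_y)
```

With these assumed calibration lines, the linear point converts to 30 and the log point, halfway between 1 and 10000 in pixel terms, converts to 100.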
CSV is fine for most analyses. XLSX is better when you want to ship the chart with the data, since DataFromChart embeds the original chart image and the axis labels (with units) into the workbook. That matters for reproducibility — the next person to open the file can verify visually that the extracted (x, y) pairs come from where you said they did. See our chart screenshot to Excel walkthrough for what the file contains.
Tricky cases
Most charts go through the four steps without incident. These are the ones that don’t.
Log axes
Already covered above. The key trap is calibrating at non-power-of-ten ticks: don’t. If the visible ticks are 1, 10, 100, 1000, calibrate at 1 and 1000. For the full walkthrough including semi-log vs log-log and ln-vs-log10 confusions, see our log chart extraction guide.
Multi-series charts
Work one series at a time, label the group, then move on. Common failure: a 5-series chart exported as one column of 200 points with no series identifier. Recovering series identity post-export requires re-running the digitization. Don’t skip this.
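A long-format export with an explicit series column avoids the problem. The sketch below shows one possible layout using Python’s csv module. The column names and the export_long_csv helper are assumptions for illustration, not the file format DataFromChart produces.

```python
import csv
import io

def export_long_csv(series):
    """Write {series_name: [(x, y), ...]} as CSV text with a series column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["series", "x", "y"])
    for name, points in series.items():
        for x, y in points:
            writer.writerow([name, x, y])
    return buffer.getvalue()

# Hypothetical digitized points, grouped as they were placed.
groups = {
    "control": [(0, 1.0), (1, 1.4)],
    "treated": [(0, 1.1), (1, 2.3)],
}
csv_text = export_long_csv(groups)
print(csv_text)
```

Every row carries its series label, so the file can be filtered or pivoted later without going back to the image.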
Color-based extraction shines here because each color is implicitly a series. Run it five times — once per color — and you have five labeled groups without manually grouping anything. For dense scatter specifically — where each color is a cluster of hundreds of points — see our scatter plot extraction guide.
Low-resolution scans
Below ~600 pixels wide, individual data points become ambiguous. Two failures are common: gridlines and data lines merge into one color band, and JPEG blocking introduces step-artifacts that the eye reads as data. Both inflate error.
Mitigations: zoom the chart in your browser before screenshotting (vector figures re-render at the higher resolution; raster images are only interpolated, which helps less), or request the original. If neither is possible, expect 3–5% noise on extracted values and report that uncertainty.
Color-based extraction failures
Color picking fails when the series shares its color with axes, gridlines, or text. Set the tolerance lower, or mask the offending areas with a quick crop before re-uploading. It also fails on anti-aliased lines where the line color blends to white at the edges — keep tolerance high enough to catch the blended pixels, but not so high that you catch the gridline.
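The usable tolerance window can be estimated before you start. The sketch below assumes a pure blue series, a light gray gridline, a white background, and Euclidean RGB distance as the tolerance metric. Real tools may measure color differently, so treat the numbers as illustrative.

```python
import math

def blend_with_white(color, coverage):
    """Color of an edge pixel where the line covers `coverage` (0 to 1) of the pixel."""
    return tuple(round(coverage * c + (1 - coverage) * 255) for c in color)

SERIES = (0, 0, 255)        # assumed series color: pure blue
GRIDLINE = (200, 200, 200)  # assumed gridline color: light gray

half_covered = blend_with_white(SERIES, 0.5)
edge_distance = math.dist(half_covered, SERIES)
grid_distance = math.dist(GRIDLINE, SERIES)
print(half_covered, round(edge_distance), round(grid_distance))
```

Under these assumptions a half-covered edge pixel sits about 181 units from the series color and the gridline about 288, so a tolerance between the two catches the anti-aliased edge without picking up the grid.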
Stacked bars and area charts
These need a different mental model: each “value” is the difference between two visible edges, not a single point. Click both the top and the bottom of each band, then subtract in a spreadsheet. Don’t try to eyeball the absolute heights.
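The subtraction step is the same in a spreadsheet or in code. A minimal sketch, assuming the edges of one stacked bar have already been digitized and converted to axis units, listed from the baseline upward. The band_values helper and the sample numbers are hypothetical.

```python
def band_values(edges):
    """Convert stacked edge heights (bottom to top, in axis units) into per-band values."""
    return [upper - lower for lower, upper in zip(edges, edges[1:])]

# Hypothetical digitized edges for one stacked bar: baseline, then the top of each band.
edges = [0.0, 12.5, 30.0, 41.0]
print(band_values(edges))
```

Four edges yield three band values, and their sum equals the height of the full bar, which is a quick consistency check.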
Accuracy tips
Three things drive accuracy. None of them are clever.
Source resolution. A chart at 300+ DPI behaves; a chart at 72 DPI does not. Re-render from PDF when you can.
Endpoint placement. Calibration error scales with the inverse of the calibration interval. Always calibrate at the longest visible interval — leftmost to rightmost tick on X, topmost to bottommost tick on Y.
Calibration value precision. If the axis label reads “1.0e6” and you type “1000000” — fine. If you type “1e6” and the parser reads it as the string “1e6” — not fine. Most modern digitizers parse scientific notation; verify yours does before trusting the output.
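For reference, this is what correct handling of typed calibration values looks like, sketched with Python’s built-in float(). The parse_calibration_value helper is hypothetical and not any digitizer’s API. To check a real tool, type 1e6 at a tick and confirm that a point placed on that tick exports as 1000000.

```python
import math

def parse_calibration_value(text):
    """Parse a typed calibration value; raise ValueError if it is not a finite number."""
    value = float(text.strip())  # float() accepts "1e6", "1.0e6" and "1000000"
    if not math.isfinite(value):
        raise ValueError(f"not a finite number: {text!r}")
    return value

for typed in ("1.0e6", "1e6", "1000000"):
    print(typed, "->", parse_calibration_value(typed))
```

Input with thousands separators, such as 1,000,000, raises ValueError here, which is safer than silently storing a string.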
After exporting, plot the extracted data on the same axes as the original chart and overlay them. If the curves diverge systematically (a shift, a stretch, or a gap that grows toward one end), you have a calibration problem; if only isolated points are off, it’s a clicking problem.
What tool to use
We’re biased — we built DataFromChart. The honest summary: it’s the best fit if you want XLSX export with the chart embedded, want color-based extraction without installing anything, and want a UI that works on a 13-inch laptop. WebPlotDigitizer is the long-standing reference and is still excellent for complex axis types. A full comparison lives in our WebPlotDigitizer alternatives roundup.
For academic work specifically — systematic reviews, meta-analyses, dose-response work — the tool matters less than the methods reporting. We cover that in detail in our data extraction for meta-analysis guide.
Open the extractor, drop in your chart, and you’ll have clean CSV or XLSX in the time it took to read this paragraph. No installation, no account required for a quick run.
FAQ
Can I extract data from a graph image for free?
Yes. DataFromChart is free for the core extraction workflow — upload, points, axes, export. WebPlotDigitizer is free and open source. Most digitizers in the category have a free tier that covers single-chart use.
How accurate is data extracted from a chart image?
On a clean source image with careful calibration, expect 0.5–2% mean absolute error against the underlying values. Below 600px source width or with sloppy axis calibration, errors of 5%+ are common. Always overlay your extracted data on the original to sanity-check.
What if the chart uses a log scale?
Calibrate at two visible powers of ten and toggle the axis to logarithmic in your digitizer. If the tool has no log toggle, calibrate as linear using the exponents (0 for the tick at 1, 3 for the tick at 1000) and exponentiate post-export. The result is identical.
Can I extract data from a hand-drawn or sketched chart?
Yes, with caveats. Axis calibration still works as long as you can place two known values per axis. Accuracy will be poor (often 5–10% error) because hand-drawn charts have inconsistent line widths and rarely use precise tick spacing.
Can I extract data from a 3D chart?
Not reliably. 3D bar charts and 3D pie charts both project values through perspective, which destroys the linear pixel-to-value mapping that digitizers rely on. Recreate the chart as 2D if you can; otherwise expect significant error.
How do I extract data from a chart in a PDF?
Export the PDF page as PNG (300 DPI), then upload to your digitizer. The full walkthrough is in our PDF chart extraction guide.
What’s the difference between CSV and XLSX export?
CSV is plain text — pure (x, y) values. XLSX preserves the chart image alongside the data and includes axis labels with units, which makes it easier to verify and share. We dig into this in chart screenshot to Excel.
How do I cite digitized data in a publication?
Cite the original figure (the paper and figure number) as the data source. Cite the digitization tool in the methods section. For systematic reviews and meta-analyses, follow PRISMA reporting requirements — full detail in our meta-analysis guide.