Field Notes · 2026-04-06 · 6 min read

On-device OCR is already good enough

Developers default to cloud OCR for anything serious. But Apple's Vision framework, running entirely on-device, hits the accuracy bar most apps need for whiteboard handwriting. Here's the data.

When I started building BoardSnap, I assumed I'd need cloud OCR for anything beyond basic printed text. Whiteboard handwriting — especially fast, abbreviated, marker-on-white with varying quality — seemed like exactly the case where you'd need a heavy cloud model.

I was wrong. Apple's Vision framework OCR is better than I expected — good enough that it's the primary OCR layer in BoardSnap, with the AI model handling interpretation rather than reading.

### What the Vision framework provides

Apple's Vision framework includes VNRecognizeTextRequest, which runs entirely on-device, using the Neural Engine where the hardware supports it. It supports two accuracy levels:

  • Fast: lower accuracy, very fast — useful for real-time scanning feedback
  • Accurate: higher accuracy, slower — used for final processing

For BoardSnap, we run Fast mode during the live viewfinder (to show users that text is being detected) and Accurate mode on the final captured image.
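The setup is only a few lines. Here's a minimal sketch of a helper that could back both passes; the function name and structure are mine, not BoardSnap's actual code:

```swift
import Vision

// Minimal sketch: one helper backing both passes. Pass .fast for the
// live viewfinder and .accurate for the final captured image.
func recognizeText(in cgImage: CGImage,
                   level: VNRequestTextRecognitionLevel) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = level
    request.usesLanguageCorrection = true  // helps with handwriting, at a small speed cost

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Each observation carries ranked candidates; take the top string.
    return (request.results ?? []).compactMap { observation in
        observation.topCandidates(1).first?.string
    }
}
```

Swapping modes is a one-argument change, which is what makes the fast-preview, accurate-capture split cheap to implement.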

### The accuracy numbers

I tested Vision framework OCR on 80 whiteboard photos from real meetings — a mix of printing and cursive handwriting, multiple marker colors, various lighting conditions.

Overall results (Accurate mode):

  • Character accuracy: 91.3%
  • Word accuracy (at least 80% of characters correct): 94.1%
  • Sentence-level accuracy (complete phrase correctly captured): 87.2%

For comparison: Google Cloud Vision on the same test set returned:

  • Character accuracy: 93.7%
  • Word accuracy: 95.4%
  • Sentence accuracy: 89.8%

The delta is real: Google Cloud Vision is about 2–3 points better on every metric. But the gap is smaller than I expected, and on-device comes with no network round-trip and no privacy exposure, since the data never leaves the device.
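The definitions above map cleanly onto standard edit-distance scoring. Here's one way these numbers could be computed; this is my scoring sketch under those definitions, not the actual test harness:

```swift
// Standard Levenshtein edit distance, single-row dynamic programming.
func editDistance(_ a: String, _ b: String) -> Int {
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }
    let a = Array(a), b = Array(b)
    var row = Array(0...b.count)
    for i in 1...a.count {
        var prev = row[0]     // d[i-1][j-1]
        row[0] = i
        for j in 1...b.count {
            let cur = row[j]  // d[i-1][j]
            row[j] = min(cur + 1,                                // deletion
                         row[j - 1] + 1,                         // insertion
                         prev + (a[i - 1] == b[j - 1] ? 0 : 1))  // substitution
            prev = cur
        }
    }
    return row[b.count]
}

// Character accuracy: 1 - (edit distance / reference length).
func characterAccuracy(reference: String, recognized: String) -> Double {
    guard !reference.isEmpty else { return recognized.isEmpty ? 1 : 0 }
    let dist = editDistance(reference, recognized)
    return max(0, 1 - Double(dist) / Double(reference.count))
}

// Word accuracy per the post's definition: a word counts as correct if
// at least 80% of its characters match. Assumes pre-aligned word lists;
// unmatched reference words count as wrong.
func wordAccuracy(reference: [String], recognized: [String]) -> Double {
    guard !reference.isEmpty else { return 1 }
    let correct = zip(reference, recognized)
        .filter { characterAccuracy(reference: $0, recognized: $1) >= 0.8 }
        .count
    return Double(correct) / Double(reference.count)
}
```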

### Where on-device OCR fails

The failures are concentrated in three areas:

1. Heavily abbreviated text. Whiteboards are full of abbreviations that real humans decode from context. "Q3 conv. rate → Mktg" might mean "Q3 conversion rate needs Marketing attention." Vision reads the characters correctly but has no semantic model to expand the abbreviations. The AI model handles this interpretation step — Vision gives us the raw characters, the AI understands what they mean.

2. Very small text. Notes in the margin, text inside small boxes, things written in 8-point-equivalent marker size. Vision struggles with these at typical whiteboard-photograph distances. The fix: move closer, or snap a close-up of the section.

3. Light-on-light contrast. Yellow marker on a white board, light gray on a light board. The contrast problem affects both on-device and cloud OCR; it's fundamentally an input quality problem, not a model problem, though a contrast-boosting pre-processing pass can sometimes recover borderline cases (sketched below).
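For that third failure mode, some borderline boards can be rescued before OCR ever runs. A possible mitigation, and this is my assumption rather than anything BoardSnap necessarily ships, is a contrast boost with Core Image's built-in color controls:

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Hypothetical pre-processing pass: stretch contrast before handing
// the frame to Vision. Not BoardSnap's actual pipeline.
func boostContrast(_ input: CIImage, contrast: Float = 1.6) -> CIImage {
    let filter = CIFilter.colorControls()
    filter.inputImage = input
    filter.contrast = contrast  // >1 stretches light strokes away from the white background
    return filter.outputImage ?? input
}
```

It won't conjure signal that isn't in the pixels, and the right adjustment is color-dependent (a plain grayscale pass, for instance, makes yellow-on-white worse, since yellow is nearly as bright as white), but for light-gray marker it can move a board from unreadable to marginal.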

### How BoardSnap uses it

Vision framework OCR produces the text layer. The BoardSnap AI receives both the image and the Vision-extracted text as input. This two-channel approach gives the AI more to work with: the image provides spatial context (where things are relative to each other, what's circled, what arrows point to), the text layer provides the raw character content.

The AI uses the text layer as a high-confidence anchor and the image as context for resolving ambiguity. "This phrase appears in the upper left quadrant, with an arrow pointing to the list below" is the kind of spatial signal the text layer can't provide but the image can.
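Vision hands over both channels' raw material in one pass: each VNRecognizedTextObservation carries the string candidates and a normalized bounding box. A minimal sketch of packaging that text layer for the model; the RecognizedLine struct is hypothetical, named here only for illustration:

```swift
import Vision

// Hypothetical payload shape for the two-channel input: the image goes
// to the model as-is, and each recognized string is tagged with where
// it sits on the board.
struct RecognizedLine {
    let text: String
    let confidence: Float
    let boundingBox: CGRect  // normalized [0, 1] coordinates, origin at bottom-left
}

func textLayer(from observations: [VNRecognizedTextObservation]) -> [RecognizedLine] {
    observations.compactMap { observation in
        guard let candidate = observation.topCandidates(1).first else { return nil }
        return RecognizedLine(text: candidate.string,
                              confidence: candidate.confidence,
                              boundingBox: observation.boundingBox)
    }
}
```

The bounding boxes are what let the model say things like "upper left quadrant": quadrant tests are simple comparisons on the normalized rect's midpoint.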

### When you'd want cloud OCR

For legal documents, medical records, or any use case where character-level accuracy is the critical metric and performance is secondary: use cloud OCR. The 2–3 point accuracy improvement is worth the latency and privacy tradeoffs when the stakes are high.

For real-time feedback, privacy-sensitive contexts, or apps where the semantic interpretation of the content is more important than perfect transcription: on-device is fine. In some cases it's better, because you get the answer faster and keep the data local.

BoardSnap's use case — whiteboard content that gets semantically interpreted by an AI model — is exactly the case where on-device is the right call.

Snap your first board today.

See the workflow this post talks about — free on the App Store.

Free · 1 project, 30 boards
Pro · $9.99/mo · everything unlimited
Pro · $69.99/yr · save 42%