# VisionKit vs Google ML Kit for whiteboard detection
When I started building BoardSnap, picking the document scanner was a real decision: VisionKit or Google ML Kit. This is the comparison I actually ran, not the marketing version.
BoardSnap is iOS-only. That's a positioning decision I made intentionally — but it also meant I had a genuine choice between Apple's VisionKit and Google's ML Kit for the whiteboard scanning pipeline. Both run on-device. Both do document detection. Both are free.
I spent about two weeks testing both before committing to VisionKit.
### What they're both trying to do
Document scanning on a phone involves two distinct problems:
- Detection — find the rectangular document (or whiteboard) in the camera frame, draw a bounding quad around it, and track it as the phone moves.
- Rectification — once the user captures, use the quad to perform perspective correction, transforming the tilted photo into a flat, top-down image.
Both VisionKit and ML Kit handle both steps. The high-level APIs abstract a lot of the complexity. Where they differ is in the details.
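To make the rectification step concrete, here's a minimal sketch using Core Image's CIPerspectiveCorrection filter, which is the kind of transform both scanners apply once they have a quad. The rectify function and its parameters are mine for illustration; both high-level APIs hide this step entirely.

```swift
import CoreImage

// Minimal sketch: flatten a tilted photo into a top-down image given the
// four corners of a detected quad. The function name and parameters are
// illustrative; neither VisionKit nor ML Kit asks you to do this yourself.
// Note: Core Image coordinates have their origin at the bottom-left.
func rectify(_ image: CIImage,
             topLeft: CGPoint, topRight: CGPoint,
             bottomLeft: CGPoint, bottomRight: CGPoint) -> CIImage? {
    guard let filter = CIFilter(name: "CIPerspectiveCorrection") else { return nil }
    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(CIVector(cgPoint: topLeft), forKey: "inputTopLeft")
    filter.setValue(CIVector(cgPoint: topRight), forKey: "inputTopRight")
    filter.setValue(CIVector(cgPoint: bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(CIVector(cgPoint: bottomRight), forKey: "inputBottomRight")
    return filter.outputImage
}
```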
### VisionKit: what it gets right
Apple's document scanner (VNDocumentCameraViewController) is deeply integrated with the operating system. It uses the same computer vision models that power Notes.app document scanning, which has been through years of real-world iteration across hundreds of millions of iPhones.
The quad detection is excellent on whiteboards specifically — better than on photographs or printed documents, in my testing. I think this is because whiteboards have high-contrast edges, and VisionKit's models are tuned for document edges rather than photographic depth cues.
The API is also extremely simple. VNDocumentCameraViewController is a modal UIKit controller that handles the full scanning flow — live preview, edge detection, capture, and cropping. You present it, implement a delegate, and get back VNDocumentCameraScan with pre-corrected images. The whole integration is under 50 lines of Swift.
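Here's a sketch of that integration. The ScanViewController class and presentScanner method are my names for illustration; the VisionKit types and delegate callbacks are the actual API.

```swift
import UIKit
import VisionKit

// Sketch of a minimal VisionKit integration. Class and method names are
// illustrative; the VisionKit API calls are real.
final class ScanViewController: UIViewController, VNDocumentCameraViewControllerDelegate {

    func presentScanner() {
        guard VNDocumentCameraViewController.isSupported else { return }
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        present(scanner, animated: true)
    }

    // Called after capture. Each page comes back already detected,
    // cropped, and perspective-corrected.
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        for index in 0..<scan.pageCount {
            let corrected = scan.imageOfPage(at: index)
            // Hand the corrected UIImage to the downstream pipeline here.
            _ = corrected
        }
        controller.dismiss(animated: true)
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error) {
        controller.dismiss(animated: true)
    }
}
```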
Accuracy on real whiteboard captures was 94% in my test set: 50 photos from real meetings, across different lighting conditions, marker colors, and board sizes. VisionKit correctly detected and rectified 47. The three failures were all edge cases: a very light board in a sun-washed room, a board with a large dark poster next to it that confused the edge detector, and a board shot almost directly from the side (an inherently ambiguous case).
### Google ML Kit: what it gets right
ML Kit's document scanner is cross-platform. If you're building a React Native or Flutter app, ML Kit gets you whiteboard scanning on both iOS and Android without maintaining two native pipelines. That's a real advantage for certain teams.
ML Kit also gives you more control. The GmsDocumentScanner API lets you configure the scanner mode, set page limits, and control whether the result goes to a URI or a bitmap. For apps that need to do further image processing after scanning, that control is valuable.
Accuracy in my test set: 88%. ML Kit failed on 6 of the same 50 photos, including two that VisionKit handled correctly. The ML Kit failures were concentrated on low-contrast whiteboards (light markers on a slightly yellowed board), a genuine real-world case in older conference rooms.
### What VisionKit doesn't do
VisionKit can't be embedded in a custom camera UI. The VNDocumentCameraViewController is a full-screen modal. You can't take an existing camera feed, run VisionKit detection on it, and draw your own quad overlay. The detection and the UI are bundled.
For BoardSnap, this was actually fine: I want the clean capture flow, not a custom UI. But if you need a custom camera experience (say, a scanning view embedded inside a larger interface), VisionKit forces you to either use the standard modal or build your own detection from scratch using lower-level Vision framework APIs, as sketched below.
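For reference, the from-scratch path looks roughly like this: run Vision's VNDetectDocumentSegmentationRequest (iOS 15 and later) on each camera frame and draw your own overlay from the returned corners. The detectBoardQuad function is a hypothetical sketch, not BoardSnap code.

```swift
import Vision

// Hypothetical sketch of per-frame detection with the lower-level Vision
// framework. You would call this for each frame from your own camera feed.
func detectBoardQuad(in pixelBuffer: CVPixelBuffer) throws -> VNRectangleObservation? {
    let request = VNDetectDocumentSegmentationRequest() // iOS 15+
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])
    // Corner points (topLeft, topRight, ...) come back in normalized
    // [0, 1] image coordinates; scale them to your preview layer to draw.
    return request.results?.first
}
```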
ML Kit can be run on individual frames from any camera feed, which makes it more composable.
VisionKit is iOS-only. If you ever want an Android version, you've painted yourself into a corner. This is the main reason to think hard before committing — though if you're building iOS-first intentionally, it's not a real objection.
### Why I chose VisionKit
Three reasons:
1. Accuracy matters more than composability. For a whiteboard-to-action-plan app, scan quality is the foundation of everything downstream. A six-point accuracy gap (94% vs 88%) compounds: if 12% of captures need manual correction instead of 6%, users hit the failure case twice as often, and that's a different product experience.
2. The API simplicity translates to maintainability. I'm a solo builder. Code I don't have to write is code I don't have to debug. VisionKit's 50-line integration beats ML Kit's more configurable but more verbose setup for my use case.
3. iOS-only is a feature, not a compromise. I made this bet on purpose. iPhone-first means I can use every Apple platform advantage — VisionKit, the Neural Engine, the camera hardware assumptions — without designing for a lowest common denominator. ML Kit's cross-platform strength is a weakness for me specifically.
### The bottom line
If you're building iOS-first and you want whiteboard or document detection: use VisionKit. The accuracy is better, the integration is simpler, and you get the benefit of Apple's continuous model improvements through OS updates.
If you're building cross-platform or you need a composable camera pipeline: ML Kit is the right call. The accuracy delta is real but not catastrophic, and the flexibility is worth it for the right architecture.
BoardSnap uses VisionKit. If I were building the Android version of this app, I'd use ML Kit and accept the tradeoff.
### Frequently asked
Is VisionKit free to use?
Yes. VisionKit is part of Apple's SDK and available to any iOS developer with no additional licensing cost. It runs entirely on-device with no network call required.
Can VisionKit scan more than one page at a time?
Yes — VNDocumentCameraViewController supports multi-page scanning. Each board snap in BoardSnap is a single page, but the API can handle multi-page document flows natively.
Why doesn't BoardSnap support Android?
BoardSnap is intentionally iOS-first. The bet is that deep integration with Apple hardware — VisionKit, the Neural Engine, the iPhone camera — produces a better product than a cross-platform compromise. Android is not planned for the near term.
Snap your first board today.
See the workflow this post talks about — free on the App Store.