Making the camera feel like a document scanner
A photo of a whiteboard and a scanned whiteboard are completely different things. Here's how VisionKit plus UX decisions close the gap and make the iPhone camera feel purpose-built.
The iPhone camera takes excellent photos. But photos and scans are different things, and the difference matters when the goal is to extract text from a whiteboard.
A photo captures what the camera sees — the perspective, the lighting, the angle. A scan captures a normalized representation of the document — corrected perspective, even exposure, flat orientation. The same whiteboard, photographed and scanned, produces meaningfully different results for OCR and AI analysis.
Making the BoardSnap camera feel like a scanner — not a camera — required specific choices at both the technical and UX levels.
### Technical: VisionKit's document camera
VisionKit's VNDocumentCameraViewController is Apple's document scanner, adapted for our use. It does three things that a plain camera doesn't (minimal code sketches follow the list):
1. Real-time quad detection. As you move the camera, VisionKit continuously finds the bounding rectangle of the whiteboard and draws it on the viewfinder. This is the yellow quad that users see. It's telling them: "I see the whiteboard, and here's how I'm going to crop it."
The quad makes the intent of the capture explicit in a way that a plain camera viewfinder doesn't. With a camera app, you frame the shot and hope for the best. With VisionKit, you can see the detection happening and adjust your position until the quad covers what you want.
2. Perspective correction. When you capture with VisionKit, it applies a homographic transformation based on the quad corners — flattening the perspective so the captured image looks like a top-down scan, regardless of the angle you photographed from.
This is the most important step. A photo taken at a 30-degree angle has severe perspective distortion: the text on the near side of the board is much larger than the text on the far side, and OCR struggles with distorted text. VisionKit's correction produces a flat, uniform image where text is consistent in size across the whole board. (For a sense of what that transform amounts to, see the Core Image sketch after this list.)
3. Exposure normalization. VisionKit adjusts exposure to optimize for document legibility — darker backgrounds, higher contrast on text — rather than for photographic aesthetics.
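Wiring this up takes very little code. Here's a minimal sketch of presenting the document camera and receiving the corrected pages, assuming a UIKit host; `CaptureCoordinator` and the pipeline hand-off are illustrative names, not BoardSnap's actual implementation:

```swift
import UIKit
import VisionKit

// Minimal sketch of presenting the document camera and receiving pages.
// `CaptureCoordinator` is an illustrative name, not BoardSnap's code.
final class CaptureCoordinator: NSObject, VNDocumentCameraViewControllerDelegate {

    func presentScanner(from presenter: UIViewController) {
        // Not every device supports the document camera (e.g. the Simulator).
        guard VNDocumentCameraViewController.isSupported else { return }
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        presenter.present(scanner, animated: true)
    }

    // Pages arrive already quad-cropped, perspective-corrected, and
    // exposure-normalized: the scan, not the photo.
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        for page in 0..<scan.pageCount {
            let corrected = scan.imageOfPage(at: page)
            handOffToPipeline(corrected) // hypothetical downstream call
        }
        controller.dismiss(animated: true)
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }

    private func handOffToPipeline(_ image: UIImage) {
        // Placeholder for the OCR / AI analysis step.
    }
}
```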
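VisionKit performs the perspective correction internally, but if you're curious what that homography amounts to, Core Image's CIPerspectiveCorrection filter does roughly the same flattening given four corner points. A sketch, assuming the corners come from a Vision rectangle observation rather than from BoardSnap's actual pipeline:

```swift
import CoreImage
import Vision

// Sketch of the flattening VisionKit does internally, approximated with
// Core Image. `observation` would come from a VNDetectRectanglesRequest;
// this is illustrative, not BoardSnap's actual pipeline.
func flatten(_ image: CIImage, using observation: VNRectangleObservation) -> CIImage {
    // Vision reports corners in normalized (0...1), bottom-left-origin
    // coordinates, which matches Core Image's space; just scale to pixels.
    let size = image.extent.size
    func scaled(_ p: CGPoint) -> CIVector {
        CIVector(x: p.x * size.width, y: p.y * size.height)
    }

    let filter = CIFilter(name: "CIPerspectiveCorrection")!
    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(scaled(observation.topLeft), forKey: "inputTopLeft")
    filter.setValue(scaled(observation.topRight), forKey: "inputTopRight")
    filter.setValue(scaled(observation.bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(scaled(observation.bottomRight), forKey: "inputBottomRight")
    return filter.outputImage ?? image
}
```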
### UX: making it feel intentional
The technical capabilities are necessary but not sufficient. The UX has to make the scanning intent explicit and guide the user toward good captures.
The viewfinder border. We added a subtle border to the camera viewfinder — rounded corners, gradient stroke — that distinguishes the scanning view from the standard camera app. This signals "you're in document mode" before a word of copy is read.
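In SwiftUI terms, that treatment is just a clip shape plus an overlay on the preview. A rough sketch, with `preview` standing in for whatever view hosts the camera feed and the corner radius and gradient colors chosen arbitrarily:

```swift
import SwiftUI

// Sketch of the viewfinder treatment: rounded corners plus a gradient
// stroke over the camera preview. `preview` stands in for whatever view
// hosts the feed; the radius and colors here are illustrative.
struct ScannerViewfinder<Preview: View>: View {
    let preview: Preview

    var body: some View {
        preview
            .clipShape(RoundedRectangle(cornerRadius: 24, style: .continuous))
            .overlay(
                RoundedRectangle(cornerRadius: 24, style: .continuous)
                    .strokeBorder(
                        LinearGradient(colors: [.yellow.opacity(0.9), .orange.opacity(0.5)],
                                       startPoint: .topLeading,
                                       endPoint: .bottomTrailing),
                        lineWidth: 2
                    )
            )
            .padding(12)
    }
}
```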
The quad color. Apple's default VisionKit quad is yellow. We kept it yellow — it's distinctive and high-visibility against most whiteboard backgrounds. But we changed the quad behavior: it animates to a more prominent state when confidence is high (thicker stroke, full color) and a subtle state when confidence is low (thin stroke, desaturated). This gives users a legibility signal without text UI.
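Worth noting: the stock VNDocumentCameraViewController doesn't expose its quad for styling, so behavior like this implies drawing your own overlay fed by Vision's rectangle detection. A sketch of the confidence-to-style mapping, with a threshold that is illustrative rather than BoardSnap's actual value:

```swift
import SwiftUI
import Vision

// Sketch of mapping detection confidence to quad styling. Assumes a
// custom overlay driven by VNDetectRectanglesRequest results; the 0.8
// threshold is arbitrary, not BoardSnap's tuned value.
struct QuadStyle {
    let lineWidth: CGFloat
    let color: Color
}

func quadStyle(for confidence: VNConfidence) -> QuadStyle {
    if confidence > 0.8 {
        // High confidence: thicker stroke, full color.
        return QuadStyle(lineWidth: 4, color: .yellow)
    } else {
        // Low confidence: thin stroke, desaturated.
        return QuadStyle(lineWidth: 1.5, color: .yellow.opacity(0.4))
    }
}
```

Applying the returned style inside a `withAnimation` block gets the animated transition between the two states.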
The scanning progress indicator. When the user captures, there's a brief scanning animation — a horizontal line sweeping the board image — before the confirmation screen. This animation serves no functional purpose (the processing doesn't work that way). It serves a perceptual purpose: it makes the capture feel like a scan rather than a snapshot.
Research on scanner UX shows that adding a visual scan animation to what is functionally an instant capture increases user confidence in the output quality. Users believe a scan was performed, and that belief changes how they evaluate the output.
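The sweep itself is a few lines of SwiftUI. A sketch, with the image and the timing as placeholder values:

```swift
import SwiftUI

// Sketch of the scan-line sweep over the captured image. Purely
// perceptual, as described above; the image and duration are placeholders.
struct ScanSweepOverlay: View {
    let captured: Image
    @State private var progress: CGFloat = 0

    var body: some View {
        captured
            .resizable()
            .scaledToFit()
            .overlay(
                GeometryReader { geo in
                    Rectangle()
                        .fill(Color.yellow.opacity(0.8))
                        .frame(height: 2)
                        .offset(y: progress * geo.size.height)
                }
            )
            .onAppear {
                withAnimation(.easeInOut(duration: 1.2)) {
                    progress = 1 // sweep from top to bottom once
                }
            }
    }
}
```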
The confirmation screen. After capture, showing the perspective-corrected image — not the original photo — is the critical moment where the user experiences the scan vs. photo difference. The corrected image looks like a document. The user sees evidence that something happened beyond just taking a picture.
### The result
Users who've tried the feature describe BoardSnap as "a scanner," not "a camera app." They use the word "scan" when describing what they do. This is the language of document capture, not photography — and it's a direct result of the technical and UX choices that make the camera behave like a dedicated scanner.
For OCR and AI analysis, the difference is in the output quality: perspective-corrected images with normalized exposure produce consistently better AI summaries than raw photos. The UX choices that make it feel like a scanner also make the captures work like scanner inputs.
Snap your first board today.
See the workflow this post talks about — free on the App Store.