For ML Engineers · Training workflow

Training workflows for ML engineers who make reproducible runs routine.

Designing a training workflow on a whiteboard lets the team align on data splits, compute requirements, evaluation protocol, and the checkpointing strategy before anyone writes config files. BoardSnap captures the full workflow spec before the session ends.

Download on the App Store. Free to start; Pro from $9.99/mo or $69.99/yr.

Why ML engineers love this workflow

Reproducible ML training is a systems problem: the data split logic, the randomness seeds, the compute allocation, the evaluation protocol, and the experiment tracking configuration all need to be designed and documented before the first run. Ad-hoc training leads to results that can't be reproduced and experiments that can't be compared.
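The seed-and-split point is concrete: if the split logic is seeded and documented, every engineer regenerates the identical split. A minimal sketch of the idea (illustrative code, not BoardSnap functionality; function and field names are assumptions):

```python
import random

def split_indices(n, val_frac=0.1, test_frac=0.1, seed=42):
    """Deterministic train/val/test split: same seed, same split, every run."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return {
        "test": idx[:n_test],
        "val": idx[n_test:n_test + n_val],
        "train": idx[n_test + n_val:],
    }

# Two calls with the same seed reproduce the identical split.
assert split_indices(1000) == split_indices(1000)
```

Documenting the seed and the split function on the whiteboard, rather than leaving them buried in a script, is what makes the split part of the shared protocol.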

BoardSnap reads the training workflow diagram, the data split strategy, the compute configuration, the evaluation steps, and the checkpointing and tracking requirements, then produces a structured training spec. Every engineer on the team trains with the same protocol.

The exact flow

  1. Design the data split strategy

    Draw the train/validation/test split. Note the split method — temporal, stratified, by entity. Mark any leakage risks.

  2. Specify the training loop

    Write the optimizer, learning rate schedule, batch size, and stopping criteria. These are the training hyperparameters that need to be documented.

  3. Define the evaluation protocol

    List the metrics, the evaluation frequency, and the validation set used for model selection. The evaluation protocol is the basis for comparison.

  4. Plan checkpointing and experiment tracking

    Write what gets checkpointed, how often, and where. Note the experiment tracking system and what metadata to log.

  5. Snap the training workflow

    Open BoardSnap and capture. The full training spec is documented before anyone writes a config file.
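Steps 1 through 4 together amount to a single structured spec. One way to picture the result, sketched as a Python dataclass (the field names here are hypothetical, not BoardSnap's actual output format):

```python
from dataclasses import dataclass, field, asdict

# Hypothetical spec shape covering steps 1-4 of the flow above.
@dataclass
class TrainingSpec:
    # Step 1: data split strategy
    split_method: str            # e.g. "temporal", "stratified", "by_entity"
    leakage_notes: str
    # Step 2: training loop
    optimizer: str
    lr_schedule: str
    batch_size: int
    stopping: str
    # Step 3: evaluation protocol
    metrics: list = field(default_factory=list)
    eval_every_n_steps: int = 1000
    # Step 4: checkpointing and experiment tracking
    checkpoint_every_n_steps: int = 5000
    tracking_backend: str = "mlflow"

spec = TrainingSpec(
    split_method="temporal",
    leakage_notes="user features computed only from pre-split data",
    optimizer="AdamW",
    lr_schedule="cosine, 5% warmup",
    batch_size=256,
    stopping="early stop on val AUC, patience 3",
    metrics=["auc", "logloss"],
)
print(asdict(spec))  # the shareable spec any engineer can train from
```

Whether the spec lives as a dataclass, a YAML file, or a BoardSnap document, the point is the same: every field is decided at the whiteboard, before the first run.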

What you'll get out of it

  • Training protocol is documented before the first run — ensuring reproducibility
  • Data leakage risks are identified and addressed in the design phase
  • Evaluation protocol is consistent across experiments — enabling fair comparison
  • Compute requirements are specified — no surprise GPU bill
  • The training spec is shareable with any engineer who needs to reproduce a run

Frequently asked

Can BoardSnap read training hyperparameters and configuration written on a whiteboard?

Yes. Numeric parameters — learning rates, batch sizes, epoch counts — are captured as written alongside their labels. The training configuration is documented as a structured spec.

How does this help with experiment reproducibility?

When the training protocol — data split, hyperparameters, evaluation criteria — is documented before runs start, any team member can reproduce the experiment from the spec. The BoardSnap document is the experiment registration.

What if the training workflow changes based on early experiment results?

Snap the updated workflow design after each significant change. The history of training workflow designs is preserved in your project — you can see how the protocol evolved in response to experiment results.

Can I use this alongside a formal experiment tracking tool like MLflow or W&B?

Yes. BoardSnap captures the design intent and qualitative decisions. MLflow or W&B captures the quantitative run results. Use both — the whiteboard document is the 'why' that the tracking tool doesn't capture.
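One hedged way to wire the two together: carry the whiteboard's qualitative decisions into the tracking tool as run tags, so the 'why' travels with the run. The tag names and document reference below are made-up examples; `mlflow.set_tags` is a real MLflow call:

```python
# Design-intent metadata from the whiteboard session (hypothetical values).
design_intent = {
    "split_method": "temporal",
    "split_rationale": "same user must not appear in both train and test",
    "model_selection": "best val AUC at epoch end",
    "design_doc": "boardsnap training-workflow snap",  # pointer back to the snap
}

# Inside a tracked run you would attach it like so:
#   import mlflow
#   with mlflow.start_run():
#       mlflow.set_tags(design_intent)
#       ...training loop, mlflow.log_metric(...) per eval step...
```

The tracking tool then answers "what happened," and the tagged design intent answers "why it was set up that way."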

ML Engineers: try this on your next training workflow.

Three taps. Action items in your hand before the room clears.

Free · 1 project, 30 boards
Pro · $9.99/mo · everything unlimited
Pro · $69.99/yr · save 42%

BoardSnap · Free on the App Store