Goal
Create the fastest jigsaw puzzle solver AI in the world, with a physical robot that can autonomously solve puzzles in real time. This is a multi-phase research project combining computer vision, supervised learning, reinforcement learning, and robotics.
Phase 1: Data Generation ✅
Built a native macOS app that generates high-quality jigsaw puzzle training data at scale.
- Openverse integration: search and download tens of thousands of Creative Commons images directly into projects, with licence and attribution preserved
- Puzzle cutting engine that builds realistic interlocking edges from CGPath cubic Bézier curves with randomised control points
- Adjacent pieces share edge curves (forward/reversed traversal) for perfect interlocking fit
- Configurable grid sizes from 1x2 to 100x100
- AI normalization pipeline: centre-crop, resize, and pad all pieces to a uniform square canvas for ML training
- Batch processing: process multiple images in one go with per-item progress tracking
- Pieces exported as transparent PNGs with metadata JSON (bounding boxes, neighbour lists, grid positions)
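The shared-edge idea above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the app's Swift/CGPath code, and it simplifies each edge to a single cubic Bézier segment; the names `CubicBezier` and `random_edge`, the control-point placement at one third and two thirds of the edge, and the `jitter` parameter are all assumptions made for illustration, not the app's actual implementation.

```python
import random
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class CubicBezier:
    """One cubic Bezier segment: start p0, control points c1 and c2, end p3."""
    p0: Point
    c1: Point
    c2: Point
    p3: Point

    def reversed(self) -> "CubicBezier":
        # Same geometric curve, traversed from p3 back to p0.
        return CubicBezier(self.p3, self.c2, self.c1, self.p0)

    def point_at(self, t: float) -> Point:
        # Standard cubic Bernstein weights for parameter t in [0, 1].
        u = 1.0 - t
        weights = (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)
        points = (self.p0, self.c1, self.c2, self.p3)
        x = sum(w * p[0] for w, p in zip(weights, points))
        y = sum(w * p[1] for w, p in zip(weights, points))
        return (x, y)


def random_edge(start: Point, end: Point, jitter: float,
                rng: random.Random) -> CubicBezier:
    """Hypothetical helper: place control points along the edge, then jitter them."""
    def jittered(fraction: float) -> Point:
        x = start[0] + fraction * (end[0] - start[0]) + rng.uniform(-jitter, jitter)
        y = start[1] + fraction * (end[1] - start[1]) + rng.uniform(-jitter, jitter)
        return (x, y)

    return CubicBezier(start, jittered(1 / 3), jittered(2 / 3), end)


# One curve is generated per shared edge; each neighbour uses one direction.
rng = random.Random(42)
shared = random_edge((0.0, 0.0), (0.0, 100.0), jitter=20.0, rng=rng)
left_piece_edge = shared              # forward traversal
right_piece_edge = shared.reversed()  # reversed traversal of the same curve
```

Because both pieces trace the same control points, the two outlines coincide exactly along the shared edge, which is what guarantees a gap-free fit.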
Phase 2: Dataset Generation ✅
Structured ML dataset creation from 2-piece puzzles with proper data science methodology.
- Four pair categories: correct match, wrong shape match (edge shapes fit, but the pieces come from different images), wrong image match (same image, but the edge shapes do not fit), wrong nothing (neither image nor edge shape matches)
- Image-level train/validation/test splits to prevent data leakage
- Grid edges are shared across images, which enables shape-match pairs between different images as harder negatives
- Larger grids produce more pair positions and diverse piece types (corners, edges, interior)
- Datasets persisted as independent entities (survive project deletion), exportable to external directories
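As a rough sketch of the two ideas above, the snippet below assigns every image to exactly one split and names the four pair categories. The hash-based bucketing, the split fractions, and the function names `split_for_image` and `pair_category` are assumptions chosen for illustration; the app may assign splits differently.

```python
import hashlib


def split_for_image(image_id: str, valid_frac: float = 0.1,
                    test_frac: float = 0.1) -> str:
    """Assign a whole image to one split, so its pieces never straddle splits.

    Hashing the image ID (an assumed scheme) keeps the assignment stable
    across runs without storing a lookup table.
    """
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + valid_frac:
        return "valid"
    return "train"


def pair_category(same_image: bool, edges_fit: bool) -> str:
    """Map a candidate pair onto the four categories listed above."""
    if same_image and edges_fit:
        return "correct match"
    if edges_fit:
        return "wrong shape match"   # shapes fit, images differ
    if same_image:
        return "wrong image match"   # same image, shapes do not fit
    return "wrong nothing"           # neither matches
```

Splitting by image rather than by pair matters because pieces cut from one image share colour and texture statistics; if they appeared in both training and test data, test accuracy would overstate how well the model generalises.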
Phase 3: Model Architecture and Training ✅
Siamese Neural Network training infrastructure built directly into the app.
- Architecture presets: reusable SNN configurations with 3 built-in defaults (Quick Test, Recommended, High Capacity)
- Configurable ConvBlocks, comparison methods, and hyperparameters
- Auto-generates self-contained PyTorch training scripts + requirements.txt
- Local training: auto venv creation, pip install, subprocess execution with live epoch progress
- Cloud training via SSH: uploads dataset + scripts via scp, runs on remote GPU instances, streams live progress, auto-downloads results
- Experiment tracking: script SHA-256 hashes, preset names, notes, timestamps, metrics comparison table
- Auto-imports metrics.json and Core ML model on completion
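The experiment-tracking entry can be pictured as a small record keyed by the hash of the generated training script. The sketch below is an assumption about what such a record might look like; the function name `experiment_record` and all field names are hypothetical and are not taken from the app.

```python
import hashlib
from datetime import datetime, timezone


def experiment_record(script_text: str, preset_name: str,
                      notes: str, metrics: dict) -> dict:
    """Build one tracking entry; field names here are illustrative only."""
    return {
        # Identical scripts always hash to the same value, so two runs can
        # be checked for "same code" without diffing the files.
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "preset": preset_name,
        "notes": notes,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }
```

Storing the script hash alongside the metrics makes it possible to tell whether a change in results came from a change in the generated code or only from different data or random seeds.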
Phase 4: Piece Matching Model (next)
Train the Siamese Neural Network on generated datasets to learn piece compatibility. Evaluate matching accuracy across different puzzle complexities and ambiguous textures.
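One way the per-complexity evaluation could be organised is sketched below: model scores are thresholded into match or no-match predictions, and accuracy is reported per group (for example, per grid size). The record layout, the 0.5 threshold, and the name `accuracy_by_group` are assumptions for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records, threshold: float = 0.5) -> dict:
    """Compute accuracy per group.

    records: iterable of (group, score, label) tuples, where score is the
    model's match score and label is 1 for a correct match, 0 otherwise.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, score, label in records:
        predicted = 1 if score >= threshold else 0
        hits[group] += int(predicted == label)
        totals[group] += 1
    return {group: hits[group] / totals[group] for group in totals}
```

Grouping by puzzle complexity, or by pair category, helps show whether errors concentrate in the harder cases, such as shape-match negatives or low-texture regions, rather than being hidden in a single overall number.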
Phase 5: Assembly Solver (planned)
Develop an RL agent or optimization-based solver that uses pairwise match scores to plan and execute full puzzle assembly.
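Since this phase is still planned, the following is only a baseline sketch of the optimization-based option, not the project's solver. It exhaustively searches all layouts of a tiny grid and keeps the one with the highest total pairwise score. It assumes pieces are not rotated and that two hypothetical score tables are available: `right_score[a][b]` for piece `b` sitting directly right of `a`, and `below_score[a][b]` for `b` sitting directly below `a`.

```python
from itertools import permutations


def solve_brute_force(n_rows: int, n_cols: int, right_score, below_score):
    """Return (layout, total) maximising summed neighbour scores.

    Tries every permutation, so the cost grows factorially; this is only
    usable for very small grids and serves as a reference baseline.
    """
    n = n_rows * n_cols
    best_layout, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = 0.0
        for idx, piece in enumerate(perm):
            row, col = divmod(idx, n_cols)
            if col + 1 < n_cols:   # score against the right-hand neighbour
                total += right_score[piece][perm[idx + 1]]
            if row + 1 < n_rows:   # score against the neighbour below
                total += below_score[piece][perm[idx + n_cols]]
        if total > best_total:
            best_layout, best_total = perm, total
    grid = [list(best_layout[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
    return grid, best_total
```

An exact baseline like this is useful mainly for checking faster heuristics or an RL agent on small puzzles, where the true optimum can still be enumerated.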
Phase 6: Physical Robot (planned)
Add computer vision for detecting physical pieces with an overhead camera, and integrate a robot arm with a vacuum gripper for pick-rotate-place operations, bridging the virtual and physical worlds.
Tech Decisions
- Native macOS app in Swift + SwiftUI for maximum performance on image processing
- Pure native implementation built on Core Graphics, with no external dependencies
- PyTorch for model training with Core ML export for on-device inference
- SSH-based cloud training for GPU access without leaving the app
- Four-level project hierarchy: Project > Cut > CutImageResult > Pieces
- Full persistence layer with JSON manifests for all entities
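To make the four-level hierarchy and its JSON manifests concrete, here is a language-neutral illustration in Python (the app itself is written in Swift). The class names follow the hierarchy above, but every field name and the `to_manifest` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Piece:
    piece_id: str
    row: int
    col: int
    neighbours: list[str] = field(default_factory=list)


@dataclass
class CutImageResult:
    image_name: str
    pieces: list[Piece] = field(default_factory=list)


@dataclass
class Cut:
    rows: int
    cols: int
    results: list[CutImageResult] = field(default_factory=list)


@dataclass
class Project:
    name: str
    cuts: list[Cut] = field(default_factory=list)


def to_manifest(project: Project) -> str:
    """Serialise the whole Project > Cut > CutImageResult > Pieces tree."""
    return json.dumps(asdict(project), indent=2)
```

Nesting the levels this way means one cut definition (grid size and edge curves) can be applied to many images, with each image's pieces stored under its own result.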
Outcome
Built a native macOS puzzle generator app that produces training data at scale, with interlocking pieces cut along Bézier curves.