Mihaly Dani

Goal

Create the fastest jigsaw puzzle solver AI in the world, with a physical robot that can autonomously solve puzzles in real time. This is a multi-phase research project combining computer vision, supervised learning, reinforcement learning, and robotics.

Phase 1: Data Generation ✅

Built a native macOS app that generates high-quality jigsaw puzzle training data at scale.

Openverse integration: search and download tens of thousands of Creative Commons images directly into projects, with licence and attribution preserved
Puzzle cutting engine that builds realistic interlocking edges from CGPath cubic Bézier curves with randomised control points
Adjacent pieces share edge curves (forward/reversed traversal) for perfect interlocking fit
Configurable grid sizes from 1x2 to 100x100
AI normalization pipeline: centre-crop, resize, and pad all pieces to a uniform square canvas for ML training
Batch processing: process multiple images in one go with per-item progress tracking
Pieces exported as transparent PNGs with metadata JSON (bounding boxes, neighbour lists, grid positions)
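
The shared-edge idea above can be illustrated with a minimal sketch. It is written in Python rather than the app's Swift/CGPath code, and it uses a single cubic Bézier segment per edge, which is a simplification: a real jigsaw tab would chain several segments. The names make_edge, sample_edge and the jitter parameter are hypothetical. The point it shows is that reversing the four control points yields the same curve traversed in the opposite direction, so two neighbouring pieces that share one set of control points fit exactly.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def make_edge(start: Point, end: Point, rng: random.Random,
              jitter: float = 0.25) -> List[Point]:
    """Return the four control points of one cubic Bezier edge.

    The two inner control points are displaced randomly along and across
    the edge; `jitter` (hypothetical) is a fraction of the edge length.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    nx, ny = -dy, dx  # perpendicular to the edge, same length as the edge
    inner = []
    for frac in (1 / 3, 2 / 3):
        along = rng.uniform(-jitter, jitter)
        across = rng.uniform(-jitter, jitter)
        inner.append((
            start[0] + dx * (frac + along) + nx * across,
            start[1] + dy * (frac + along) + ny * across,
        ))
    return [start, inner[0], inner[1], end]


def bezier_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    a, b, c, d = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return (
        a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
        a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1],
    )


def sample_edge(edge: List[Point], steps: int = 8,
                reverse: bool = False) -> List[Point]:
    """Sample the edge outline; reverse=True walks it from end to start."""
    pts = edge[::-1] if reverse else edge
    return [bezier_point(pts[0], pts[1], pts[2], pts[3], i / steps)
            for i in range(steps + 1)]


# One edge, shared by two neighbours: the first piece traces it forward,
# the second traces the same control points in reverse.
shared = make_edge((0.0, 0.0), (1.0, 0.0), random.Random(0))
outline_a = sample_edge(shared)
outline_b = sample_edge(shared, reverse=True)
```

Here outline_b visits the same points as outline_a in the opposite order (up to floating-point rounding), which is what lets each piece's outline be drawn in a consistent winding direction while the boundary itself stays identical.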

Phase 2: Dataset Generation ✅

Structured ML dataset creation from 2-piece puzzles, with leakage-free splits and deliberately hard negative examples.

Four pair categories: correct match; wrong shape match (same edge shape, different image); wrong image match (same image, different edge shape); wrong nothing (different image and different edge shape)
Image-level train/validation/test splits to prevent data leakage
Shared grid edges enable shape-match pairs across images for harder negatives
Larger grids produce more pair positions and diverse piece types (corners, edges, interior)
Datasets persisted as independent entities (survive project deletion), exportable to external directories
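
The split and labelling logic described above could look roughly like the following Python sketch. It is not the app's implementation: the field names image_id and edge_id, the category identifiers, and the 70/15/15 default proportions are all assumptions made for illustration. The key property is that whole images are assigned to a split before any pairs are built, so no image contributes pieces to more than one split.

```python
import random
from typing import Dict, List


def split_images(image_ids: List[str], seed: int = 0,
                 train_pct: int = 70, valid_pct: int = 15) -> Dict[str, List[str]]:
    """Assign whole images to train/valid/test (percentages are assumptions)."""
    ids = sorted(image_ids)             # stable starting order
    random.Random(seed).shuffle(ids)    # reproducible shuffle
    n_train = len(ids) * train_pct // 100
    n_valid = len(ids) * valid_pct // 100
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }


def pair_category(left: dict, right: dict) -> str:
    """Label a (left, right) piece pair with one of the four categories.

    Each piece is a dict with hypothetical keys:
      image_id - which source image the piece was cut from
      edge_id  - which shared edge curve the piece was cut along
    """
    same_image = left["image_id"] == right["image_id"]
    same_edge = left["edge_id"] == right["edge_id"]
    if same_image and same_edge:
        return "correct"
    if same_edge:
        return "wrong_shape_match"   # pieces interlock, pictures do not continue
    if same_image:
        return "wrong_image_match"   # same picture, edges do not interlock
    return "wrong_nothing"
```

Pairs would then be generated only among images inside a single split, for example by calling pair_category on pieces drawn from splits["train"].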

Phase 3: Model Architecture and Training ✅

Siamese Neural Network (SNN) training infrastructure built directly into the app.

Architecture presets: reusable SNN configurations with 3 built-in defaults (Quick Test, Recommended, High Capacity)
Configurable ConvBlocks, comparison methods, and hyperparameters
Auto-generates self-contained PyTorch training scripts + requirements.txt
Local training: auto venv creation, pip install, subprocess execution with live epoch progress
Cloud training via SSH: uploads dataset + scripts via scp, runs on remote GPU instances, streams live progress, auto-downloads results
Experiment tracking: script SHA-256 hashes, preset names, notes, timestamps, metrics comparison table
Auto-imports metrics.json and Core ML model on completion
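
As an illustration of the experiment-tracking item above, here is a small Python sketch of what one tracked record might contain. The function names and the record layout are hypothetical; only the listed ingredients (script SHA-256 hash, preset name, notes, timestamp, metrics) come from the feature list. Hashing the generated script means two runs can be compared knowing whether they executed byte-identical training code.

```python
import hashlib
import json
from datetime import datetime, timezone


def experiment_record(script_text: str, preset_name: str,
                      notes: str, metrics: dict) -> dict:
    """Build one experiment entry (layout is an assumption, not the app's schema)."""
    return {
        # Identical scripts always hash to the same 64-character hex digest.
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "preset": preset_name,
        "notes": notes,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }


def to_manifest_json(record: dict) -> str:
    """Serialise a record with sorted keys so diffs between runs stay readable."""
    return json.dumps(record, sort_keys=True, indent=2)
```

A comparison table can then group runs by script_sha256 to separate changes in the training code from changes in the data or hyperparameters.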

Phase 4: Piece Matching Model (next)

Train the Siamese Neural Network on generated datasets to learn piece compatibility. Evaluate matching accuracy across different puzzle complexities and ambiguous textures.
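
One way the evaluation could be broken down is per pair category, since a model can score well overall while failing on the hard negatives. The Python sketch below assumes the model emits a match score in [0, 1] per pair and that a fixed threshold of 0.5 separates "match" from "no match"; both the threshold and the category identifiers are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_category(examples: Iterable[Tuple[str, float]],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy separately for each pair category.

    `examples` yields (category, score) tuples; only the "correct"
    category counts as a true match, every other category is a negative.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, score in examples:
        predicted_match = score >= threshold
        is_match = category == "correct"
        totals[category] += 1
        hits[category] += int(predicted_match == is_match)
    return {category: hits[category] / totals[category] for category in totals}
```

A low accuracy on the shape-match negatives, for instance, would suggest the model leans on edge geometry and ignores image content.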

Phase 5: Assembly Solver (planned)

Develop an RL agent or optimization-based solver that uses pairwise match scores to plan and execute full puzzle assembly.
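
To make the optimisation-based option concrete, here is a deliberately small Python sketch of a greedy solver. It is not the planned solver: it handles only a single row of pieces (a 1xN strip), and the scores dictionary is a hypothetical stand-in for the matching model's output, where scores[(a, b)] rates how well piece b fits immediately to the right of piece a. The strategy is to accept the highest-scoring links first, skipping any link that would give a piece two neighbours on one side or close a loop.

```python
from typing import Dict, Hashable, List, Tuple


def greedy_strip_order(scores: Dict[Tuple[Hashable, Hashable], float]) -> List[Hashable]:
    """Order pieces left to right by greedily accepting the best pairwise links."""
    pieces = sorted({p for pair in scores for p in pair})
    right_of = {}  # piece -> its accepted right neighbour
    left_of = {}   # piece -> its accepted left neighbour

    def chain_head(piece):
        # Walk left until reaching the first piece of this chain.
        while piece in left_of:
            piece = left_of[piece]
        return piece

    for (a, b), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if a == b or a in right_of or b in left_of:
            continue  # one of the two sides is already taken
        if chain_head(a) == b:
            continue  # linking a -> b would close the chain into a loop
        right_of[a] = b
        left_of[b] = a

    # Read out each chain from its head; with a complete score table
    # there is exactly one chain.
    order = []
    for head in pieces:
        if head in left_of:
            continue
        piece = head
        while True:
            order.append(piece)
            if piece not in right_of:
                break
            piece = right_of[piece]
    return order
```

Greedy link selection can lock in an early mistake, which is one motivation for the RL or global-optimisation approaches mentioned above; a full grid solver would also need vertical neighbours and piece rotation.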

Phase 6: Physical Robot (planned)

Detect physical pieces with computer vision from an overhead camera, and integrate a robot arm with a vacuum gripper for pick-rotate-place operations, bridging the virtual and physical worlds.

Tech Decisions

Native macOS app in Swift + SwiftUI for maximum performance on image processing
Pure native implementation built on Core Graphics, with no external dependencies
PyTorch for model training with Core ML export for on-device inference
SSH-based cloud training for GPU access without leaving the app
Four-level project hierarchy: Project > Cut > CutImageResult > Pieces
Full persistence layer with JSON manifests for all entities

Outcome

Built a native macOS puzzle generator app that produces training data at scale, with interlocking pieces cut along Bézier curves.

Jigsaw Puzzle Solver AI