Taking our ESP32 LLM from prototype to badge: Q8_0 quantization, inline PIE assembly achieving 16 int8 MACs per cycle, three-phase training, and SAM robotic TTS. This is where the badge gets its voice.
When speech-to-text models proved too big for our 8 MB of PSRAM, we pivoted to a clever alternative: contrastive learning that matches spoken audio directly to pre-embedded intents. Here's how we trained that system.
How we trained a custom 'Hey Daisy' wake word detector using confusable negatives and synthetic voice generation, then deployed it on an ESP32-S3 with EdgeNeuron TFLite.