The journey from Gemma 3 4B (17.5 tok/s CPU) to Gemma 4 E2B (25.5 tok/s GPU) on the Jetson Orin Nano. Covers model testing, QAT quantization, the JetPack CUDA rabbithole, CMA traps, and the keepalive architecture that makes it all work.
Run local LLMs with Ollama, manage conversations via OpenwebUI, and get AI code completion in VS Code with Continue. A complete local AI stack setup guide.
How to run Fabric with local models through LM Studio for custom AI patterns and workflows. Setup, integration, and practical use cases for prompt-based automation.