Llm on Julien.cloud

Recent content in Llm on Julien.cloud
https://julien.cloud/tags/llm/
© 2026 Julien
Last updated: Tue, 09 Jun 2026 12:00:00 +0200

Running Ollama on a Jetson Orin Nano: From Gemma 3 to Gemma 4 with GPU Acceleration
https://julien.cloud/blog/jetson-nano-ollama-edge-inference/
Tue, 09 Jun 2026 12:00:00 +0200
The journey from Gemma 3 4B (17.5 tok/s CPU) to Gemma 4 E2B (25.5 tok/s GPU) on the Jetson Orin Nano. Covers model testing, QAT quantization, the JetPack CUDA rabbithole, CMA traps, and the keepalive architecture that makes it all work.

LLM Gateway for OpenCode: Building a Local LiteLLM Router
https://julien.cloud/blog/llm-gateway-for-opencode-building-a-local-litellm-router/
Sun, 07 Jun 2026 12:00:00 +0200
27 models from 5 providers in LiteLLM, exposed to OpenCode through smart routers that pick the right model tier by prompt content, not context size. Runs locally via Docker with caching, spend tracking, and one endpoint.

OpenCode Go: Can $10/Month Open Models Replace Frontier APIs?
https://julien.cloud/blog/opencode-go-models-2026/
Sat, 30 May 2026 23:30:00 +0200
12 open coding models benchmarked against Claude and GPT-5.5. DeepSeek V4 Flash handles 70% of tasks at 12x cheaper than DeepSeek V4 Pro. MiMo-V2.5 is now the cheapest high-volume option at 30,100 req/5h. Qwen3.7 Max leads on SWE-bench Pro (60.6%). Kimi K2.6 leads on agentic coding. Here’s how to route between them.

Unveiling the World of AI Chatbots: A Diverse Exploration
https://julien.cloud/blog/unveiling-the-world-of-ai-chatbots-a-diverse-exploration/
Mon, 03 Mar 2025 20:16:48 +0000
Beyond ChatGPT: a curated list of 20+ AI chatbot platforms covering frontier models, research tools, and developer-focused interfaces with their unique strengths.

Boost Your AI Workflow: A Guide to Using Ollama, OpenwebUI, and Continue
https://julien.cloud/blog/boost-your-ai-workflow-with-ollama-openwebui-and-continue/
Thu, 25 Jul 2024 18:56:33 +0000
Run local LLMs with Ollama, manage conversations via OpenwebUI, and get AI code completion in VS Code with Continue. A complete local AI stack setup guide.

Leveraging Fabric and LM Studio for Advanced AI
https://julien.cloud/blog/leveraging-fabric-and-lm-studio-for-advanced-ai/
Thu, 06 Jun 2024 20:28:43 +0000
How to run Fabric with local models through LM Studio for custom AI patterns and workflows. Setup, integration, and practical use cases for prompt-based automation.