[{"data":1,"prerenderedAt":391},["ShallowReactive",2],{"post-local-ai-lm-studio-gemma-prospecting-en":3,"surround-/en/blog/local-ai-lm-studio-gemma-prospecting-en":386},{"id":4,"title":5,"body":6,"canonical_id":368,"category":369,"date_created":370,"date_updated":370,"description":371,"extension":372,"head":373,"image":374,"lang":375,"layout":376,"meta":377,"navigation":378,"ogImage":373,"path":379,"reading_time":380,"robots":373,"schemaOrg":373,"seo":381,"sitemap":383,"stem":384,"__hash__":385},"blog/en/blog/16.local-ai-lm-studio-gemma-prospecting.md","I tested local AI: the future against exploding token costs",{"type":7,"value":8,"toc":357},"minimark",[9,19,22,38,43,70,80,83,107,120,124,140,146,178,185,188,192,195,198,209,212,215,234,237,241,244,264,267,271,274,300,303,307,310,313,316,324,328],[10,11,12,13,18],"p",{},"Over the past year, I've built a series of projects with and around AI: agents to find content ideas, ",[14,15,17],"a",{"href":16},"/en/blog/mcp-server-saas-feedback","MCP servers",", scripts that qualify data, automations.",[10,20,21],{},"More and more, I'm running into exploding token costs. Claude or Codex subscriptions aren't enough anymore, and API costs are downright exorbitant for some models (Opus, I'm looking at you…).",[10,23,24,25,29,30,33,34,37],{},"That's why I tested whether ",[26,27,28],"strong",{},"local AI"," could be a solid alternative. I installed ",[26,31,32],{},"LM Studio",", loaded Google's ",[26,35,36],{},"Gemma 4",", and plugged it into a small internal prospecting POC. 
Here's how it went.",[39,40,42],"h2",{"id":41},"lm-studio-running-a-local-ai-without-touching-the-terminal","LM Studio: running a local AI without touching the terminal",[44,45,49,50],"div",{"className":46},[47,48],"text-center","mx-auto","\n    ",[51,52,53,54,53,59,53,63,49],"picture",{},"\n      ",[55,56],"source",{"srcSet":57,"type":58},"/img/blog/illustration/lm-studio-interface.avif","image/avif",[55,60],{"srcSet":61,"type":62},"/img/blog/illustration/lm-studio-interface.webp","image/webp",[64,65],"img",{"src":66,"alt":67,"loading":68,"className":69},"/img/blog/illustration/lm-studio-interface.jpg","LM Studio interface with the local API server running","lazy",[48],[10,71,72,79],{},[26,73,74],{},[14,75,32],{"href":76,"rel":77},"https://lmstudio.ai/",[78],"nofollow"," is a free desktop app (Mac, Windows, Linux) that lets you download and run open source LLMs. No command line, no Docker, no Python to install.",[10,81,82],{},"The interface is simple:",[84,85,86,94,101],"ul",{},[87,88,89,90,93],"li",{},"A built-in ",[26,91,92],{},"model store"," (connected to Hugging Face) to download Llama, Mistral, DeepSeek, Qwen, Gemma…",[87,95,96,97,100],{},"A ",[26,98,99],{},"chat interface"," to talk directly with the loaded model.",[87,102,96,103,106],{},[26,104,105],{},"local API server",", OpenAI-compatible, that you activate with one click.",[10,108,109,110,114,115,119],{},"That last point is what made me choose LM Studio. Once you tick a box in the ",[111,112,113],"em",{},"Developer"," tab, LM Studio exposes a server on ",[116,117,118],"code",{},"localhost:1234"," with the same API format as OpenAI's. As simple as it is effective. 
An app written to call the OpenAI API can be pointed at the local LLM just by changing the base URL.",[39,121,123],{"id":122},"why-googles-gemma-4","Why Google's Gemma 4",[44,125,49,127],{"className":126},[47,48],[51,128,53,129,53,132,53,135,49],{},[55,130],{"srcSet":131,"type":58},"/img/blog/illustration/gemma4.avif",[55,133],{"srcSet":134,"type":62},"/img/blog/illustration/gemma4.webp",[64,136],{"src":137,"alt":138,"loading":68,"className":139},"/img/blog/illustration/gemma4.jpg","Google Gemma 4",[48],[10,141,142,143,145],{},"For the model, I picked ",[26,144,36],{},", released in early April 2026 by Google DeepMind, under the Apache 2.0 license. The Gemma 4 family comes in several flavors depending on the machine you have at hand:",[84,147,148,154,160,166,172],{},[87,149,150,153],{},[26,151,152],{},"E2B"," (2.3 billion effective parameters): designed for mobile and edge devices.",[87,155,156,159],{},[26,157,158],{},"E4B"," (4.5 billion): a good balance for a standard laptop.",[87,161,162,165],{},[26,163,164],{},"12B Unified",": multimodal (text, image, audio).",[87,167,168,171],{},[26,169,170],{},"26B A4B"," (Mixture-of-Experts): 3.8B parameters activated per token, for more advanced reasoning.",[87,173,174,177],{},[26,175,176],{},"31B Dense",": the most capable version, for beefier machines.",[10,179,180,181,184],{},"On my ",[26,182,183],{},"2023 MacBook Pro with 36 GB of RAM",", I was able to run the E4B and 12B versions without trouble.",[10,186,187],{},"On quality, let's be honest: Gemma 4 doesn't replace Claude Opus or GPT on complex reasoning or long-form writing. But on targeted, repetitive tasks (classifying, extracting, scoring, summarizing), the answers are clean and the latency stays reasonable.",[39,189,191],{"id":190},"the-test-plugging-a-prospecting-web-app-into-the-local-ai","The test: plugging a prospecting web app into the local AI",[10,193,194],{},"Beyond offline chat, what interested me most was application integration. 
Connecting a local AI to a web app is exactly the kind of task where I usually rely on a cloud model like GPT, Gemini or Sonnet. For this test, I had an internal POC in mind: a prospecting web app that pulls leads and qualifies them via AI.",[10,196,197],{},"The requirements are classic:",[84,199,200,203,206],{},[87,201,202],{},"Pull a list of leads (company name, industry, size, website).",[87,204,205],{},"Ask an LLM to score each lead against criteria I define.",[87,207,208],{},"Categorize the leads and prioritize the best ones.",[10,210,211],{},"At a few hundred leads a day, running this through the OpenAI or Anthropic API costs a few euros daily, and that adds up over 365 days a year. With a local AI, the marginal cost per request drops to practically zero.",[10,213,214],{},"The technical workflow was simple:",[216,217,218,221,228,231],"ol",{},[87,219,220],{},"I turn on the API server in LM Studio.",[87,222,223,224,227],{},"In my web app, I replace the OpenAI URL with ",[116,225,226],{},"http://localhost:1234/v1"," in the SDK client.",[87,229,230],{},"I send my qualification prompts exactly as before.",[87,232,233],{},"Gemma 4 returns a score and a category in JSON format.",[10,235,236],{},"It was impressively easy: as long as LM Studio is running, I have a usable AI, with or without internet. And as a bonus, the prospects' data never leaves my machine, which is reassuring from a GDPR standpoint.",[39,238,240],{"id":239},"the-limits-i-ran-into","The limits I ran into",[10,242,243],{},"Local AI isn't magic, and there are a few caveats to keep in mind:",[84,245,246,252,258],{},[87,247,248,251],{},[26,249,250],{},"You need a powerful machine."," My 2023 MacBook with 36 GB of RAM holds up well, but on an entry-level laptop, you're limited to the smaller models.",[87,253,254,257],{},[26,255,256],{},"Latency is still higher"," than a cloud API call to a comparable model, since data centers run far more powerful GPUs.",[87,259,260,263],{},[26,261,262],{},"Open source models don't yet rival"," frontier models on complex tasks. 
For lead qualification, it's more than enough. For coding, I keep using Codex and Claude Code.",[10,265,266],{},"For me, local AI isn't a universal replacement for cloud APIs, but a complement for specific use cases.",[39,268,270],{"id":269},"why-i-think-its-going-to-become-a-standard","Why I think it's going to become a standard",[10,272,273],{},"Despite these limits, I think local AI will play a growing role in the years to come. A few reasons:",[84,275,276,282,288,294],{},[87,277,278,281],{},[26,279,280],{},"Cloud costs."," Token prices are dropping, but the volumes consumed are exploding even faster with agents. Bringing some repetitive workloads in-house starts to make economic sense.",[87,283,284,287],{},[26,285,286],{},"Consumer hardware is improving."," What requires a MacBook Pro M3 with 36 GB today will probably run on an entry-level machine in a few years. Consumer chips increasingly include dedicated AI accelerators.",[87,289,290,293],{},[26,291,292],{},"Open source models are improving fast"," in their quality-to-size ratio. Gemma, Llama, Qwen and Mistral are closing part of the gap with closed models.",[87,295,296,299],{},[26,297,298],{},"The data question."," For sensitive industries (healthcare, legal, finance, HR), sending data to OpenAI or Anthropic is legally tricky. Local AI addresses part of that problem.",[10,301,302],{},"The pattern that seems to be emerging is a mix: local models for volume and repetitive tasks, cloud models for complexity and advanced reasoning. The right tool for the right job.",[39,304,306],{"id":305},"conclusion","Conclusion",[10,308,309],{},"This first dive into local AI made me want to dig deeper. 
LM Studio makes it very simple to install a model and expose it through a local API, and Gemma 4 is capable enough for targeted tasks like lead qualification.",[10,311,312],{},"For my next projects involving high volumes of repetitive tasks, local AI will clearly be one of the candidates for my stack.",[10,314,315],{},"What about you: have you already worked out what your cloud AI calls cost you, or tried a local setup?",[10,317,318,319,323],{},"📌 If you're thinking about where AI (local or cloud) fits in your product, ",[14,320,322],{"href":321},"/en/services/product-engineering","discover my Product Engineering services",".",[39,325,327],{"id":326},"sources","Sources",[84,329,330,336,343,350],{},[87,331,332],{},[14,333,335],{"href":76,"rel":334},[78],"LM Studio – Local AI on your computer",[87,337,338],{},[14,339,342],{"href":340,"rel":341},"https://lmstudio.ai/docs/developer/core/server",[78],"LM Studio as a Local LLM API Server – Documentation",[87,344,345],{},[14,346,349],{"href":347,"rel":348},"https://deepmind.google/models/gemma/gemma-4/",[78],"Gemma 4 – Google DeepMind",[87,351,352],{},[14,353,356],{"href":354,"rel":355},"https://codersera.com/blog/gemma-4-complete-guide-2026/",[78],"Gemma 4 Complete Guide 2026 – Codersera",{"title":358,"searchDepth":359,"depth":359,"links":360},"",2,[361,362,363,364,365,366,367],{"id":41,"depth":359,"text":42},{"id":122,"depth":359,"text":123},{"id":190,"depth":359,"text":191},{"id":239,"depth":359,"text":240},{"id":269,"depth":359,"text":270},{"id":305,"depth":359,"text":306},{"id":326,"depth":359,"text":327},"16","ai","2026-06-08T12:00:00.000Z","Over the past year, I've built a series of projects with and around AI: agents to find content ideas, MCP servers, scripts that qualify data, automations.","md",null,"/img/blog/blog16.jpg","en","page",{},true,"/en/blog/local-ai-lm-studio-gemma-prospecting",6,{"description":382,"title":5},"Feedback on running an AI locally with LM Studio and Google's Gemma 4 for a prospecting web 
app.",{"loc":379},"en/blog/16.local-ai-lm-studio-gemma-prospecting","roVtqksZ-zyDdmI6w13dKujmnlzPERPqp88fTTfBCkU",[373,387],{"title":388,"path":389,"stem":390,"children":-1},"What if AI content production meant the death of SEO?","/en/blog/ai-content-production-death-of-seo","en/blog/15.ai-content-production-death-of-seo",1780919246445]