The promise of local AI fascinates me: the ability to run powerful AI models directly on my own hardware, without relying on cloud services. For me, this goes beyond just technical freedom; it also touches on the growing debate surrounding the ethical implications of AI, such as privacy, transparency, and the responsible use of these technologies. How can I ensure I maintain control and make ethically sound choices? My search for a concrete solution led me to LocalAI, and I'd like to share my initial experiences with it.
The Leap of Faith with LocalAI: The All-in-One Approach
My goal was to have a working local AI environment as quickly as possible. LocalAI proved to be an excellent option for this with its "All-in-One" approach. What I found appealing was that it involves a Docker container that downloads and configures everything needed—from the AI engine to the necessary dependencies—in one go. I prepared myself for a bit of a wait, because downloading AI models, as I expected, takes a considerable amount of time. A good cup of coffee was certainly a welcome addition while the gigabytes were rolling in.
Below is an example of the docker-compose.yaml I used. If you have a modern NVIDIA card, use one of the NVIDIA images and be sure to uncomment the deploy section.
services:
  api:
    image: localai/localai:latest-aio-cpu
    # For a specific version:
    # image: localai/localai:v2.29.0-aio-cpu
    # For NVIDIA GPUs, uncomment one of the following (CUDA 11 or CUDA 12):
    # image: localai/localai:v2.29.0-aio-gpu-nvidia-cuda-11
    # image: localai/localai:v2.29.0-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 5m
      timeout: 20m
      retries: 5
    ports:
      - 8510:8080
    environment:
      - DEBUG=true
      # ...
    volumes:
      - /mnt/localai/models:/models:cached
    # Uncomment the following section when running with NVIDIA GPUs
    #deploy:
    #  resources:
    #    reservations:
    #      devices:
    #        - driver: nvidia
    #          count: 1
    #          capabilities: [gpu]
The NVIDIA card is not available inside Docker containers out of the box. To use it in a container, add the NVIDIA runtime to /etc/docker/daemon.json (merging it with any settings already in that file):
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
Don't forget to restart the Docker daemon afterwards, for example with sudo systemctl restart docker on systemd-based systems.
Exploring the Local AI Environment
After the initial setup and the download of the base models, the LocalAI web interface became available. This is where the magic happens: it gives a clear overview of all the components, and new models can be downloaded straight away.
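Besides the web interface, LocalAI exposes an OpenAI-compatible API, so a quick sanity check from a script is also possible. Below is a minimal sketch, assuming the 8510:8080 port mapping from the compose file above, the /readyz endpoint the healthcheck already uses, and the standard /v1/models listing (exact behaviour may differ per LocalAI version):

import requests

BASE_URL = "http://localhost:8510"  # host port from the compose file above

# Readiness probe, the same endpoint the compose healthcheck calls
ready = requests.get(f"{BASE_URL}/readyz", timeout=10)
print("ready:", ready.status_code == 200)

# List the models this instance currently knows about (OpenAI-compatible endpoint)
models = requests.get(f"{BASE_URL}/v1/models", timeout=10).json()
for model in models.get("data", []):
    print(model["id"])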
My first interaction was with a locally downloaded chat model, comparable to GPT-4. I clicked the chat button and asked a simple question. The speed of the response obviously depends heavily on the available hardware, and I could immediately tell that my system was working hard. The answer arrived, and it gave me an immediate sense of control over the AI.
Curious about the performance of specific models, I immediately downloaded Google's latest local AI model, Gemma 3 (gemma-3-1b-it). This model handles questions surprisingly smoothly: the response was fast and accurate, which bodes well for future experiments.
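Because the API is OpenAI compatible, the same model can also be queried from code rather than through the web interface. A minimal sketch, assuming the model is installed under the name gemma-3-1b-it and using the 8510 port mapping from the compose file:

import requests

BASE_URL = "http://localhost:8510"

payload = {
    "model": "gemma-3-1b-it",  # the name under which the model was installed
    "messages": [
        {"role": "user", "content": "Explain in one sentence what LocalAI is."}
    ],
}

# OpenAI-compatible chat completions endpoint
response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])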
What I've noticed so far is that the large language models (LLMs) work excellently without extensive prompts or additional data. The responses are fast, and the models perform their tasks efficiently. However, as soon as I include additional contextual data, such as information from my Home Assistant setup, the model takes considerably longer to respond. There is an interesting tension between the depth of the context and the speed of the response.
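To give an idea of what that additional contextual data looks like in practice, here is a sketch in which made-up Home Assistant sensor states are passed along as a system message; the entity names and values are purely hypothetical, and the larger the injected context, the longer the model needs to respond:

import json
import requests

BASE_URL = "http://localhost:8510"

# Hypothetical Home Assistant states, purely for illustration
home_context = {
    "sensor.living_room_temperature": "21.5 °C",
    "light.kitchen": "on",
    "binary_sensor.front_door": "closed",
}

payload = {
    "model": "gemma-3-1b-it",
    "messages": [
        {
            "role": "system",
            "content": "You are a home assistant. Current states:\n"
            + json.dumps(home_context, ensure_ascii=False),
        },
        {"role": "user", "content": "Is the front door closed, and is any light still on?"},
    ],
}

response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=300)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])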
A New Chapter in Digital Experiments
My first goal has been achieved: I now have a functioning local AI environment. LocalAI's all-in-one approach has made setting up this environment surprisingly straightforward. It's clear there's still much to discover about using local AI for specific tasks. But the foundation has been laid, and I'm looking forward to further experimenting with the possibilities this local AI offers me. I'm also looking forward to exploring the additional features that the LocalAGI and LocalRecall extensions bring.
LocalAI: No cloud, no limits, no compromise.