1. Introduction
The rise of Large Language Models (LLMs) like GPT-4, LLaMA, and Mistral has transformed the way developers build intelligent applications. From natural language understanding to AI assistants, these models are becoming central to software development. But most of them rely on cloud-based APIs, which raises concerns about privacy, cost, and dependency on external providers.
This is where Ollama comes in. Ollama is an open-source project that makes it simple to run, manage, and interact with large language models locally on your own machine. Whether you’re experimenting with AI, developing an application, or just want to avoid sending sensitive data to the cloud, Ollama provides a practical solution.
In this guide, we’ll explore what Ollama is, why it matters, how to install and configure it, and how to interact with models using Bash and Python. By the end, you’ll have a full setup ready to run locally hosted LLMs, automated to start with your system, and connected to your development workflow.
2. What Is Ollama and Why Choose It?
At its core, Ollama is a local runtime and model manager for large language models. Instead of relying on remote APIs, Ollama allows you to pull, run, and interact with models directly on your computer.
Key Features:
- Local-first approach → Your data never leaves your machine.
- Prebuilt models available → You can quickly pull models like llama2, mistral, codellama, and more.
- Simple CLI interface → Easy commands for pulling, running, and managing models.
- REST API → Can integrate with your applications seamlessly.
- Cross-platform → Works on macOS, Linux, and Windows (with WSL support).
Why Choose Ollama?
- Privacy → Keep conversations and data local.
- Performance → With a powerful GPU, you can run models at good speeds without internet latency.
- Flexibility → Experiment with multiple models and compare outputs.
- Cost efficiency → Avoid paying per-token API charges.
- Integration → Use Ollama in scripts, automation pipelines, or backend services.
If you’ve ever wanted to run your own “ChatGPT” locally, Ollama is one of the best ways to do it.
3. Installation
Ollama is straightforward to install. The exact method depends on your operating system.
macOS
Run the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
This will download the Ollama binary and set it up on your system.
Linux
For Linux distributions:
curl -fsSL https://ollama.com/install.sh | sh
The same script works across most modern Linux distros. It places Ollama in /usr/local/bin.
You may need to restart your terminal or run:
source ~/.bashrc
to make sure the ollama command is available.
Windows
Currently, Ollama supports Windows via WSL2. You’ll need to install Ubuntu (or another supported distro) inside WSL and then run the Linux installation script.
The installation script will ask for your sudo password so it can elevate privileges and create the required service files.
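As a rough sketch of the Windows route (assuming WSL2 is already enabled and Ubuntu as the distro), the steps look roughly like this:
# In an elevated PowerShell or Command Prompt: install Ubuntu under WSL2
wsl --install -d Ubuntu
# Then, inside the Ubuntu shell, run the same Linux install script
curl -fsSL https://ollama.com/install.sh | sh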
4. Running Ollama
After installation, you can confirm Ollama is working with:
ollama --version
To run a model (example: LLaMA 2):
ollama run llama2
This command will:
- Pull the llama2 model if it's not already downloaded.
- Start an interactive session where you can chat with the model directly.
You can exit with Ctrl+D or by typing /bye (Ctrl+C stops the current response).
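You can also pass a prompt directly on the command line for a one-off answer instead of an interactive session, for example:
ollama run llama2 "Summarize the benefits of running LLMs locally in one sentence."
The model prints its completion and exits, which makes this form handy in shell scripts.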
5. Adding Ollama as a Service and Making It Autostart
For server setups or developers who want Ollama always available, running it as a service makes sense.
Linux (systemd)
Create a new systemd service file:
sudo nano /etc/systemd/system/ollama.service
Paste:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
[Install]
WantedBy=default.target
The ExecStart line assumes the install script placed the binary at /usr/local/bin/ollama and created an ollama user and group; adjust these values if your setup differs. Then reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
Now Ollama will run in the background and autostart on boot.
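To confirm everything is working, you can check the service status and hit the local API; a quick sanity check might look like this:
sudo systemctl status ollama
curl http://localhost:11434/
# The server should answer with a short "Ollama is running" message when it is reachable.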
macOS
On macOS, you can use launchd or rely on starting it manually. The default installer often sets Ollama to autostart already.
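If you prefer to manage it yourself, a minimal LaunchAgent sketch is shown below; the label, plist path, and binary location are assumptions, so adjust them to match your install. Save it as ~/Library/LaunchAgents/com.ollama.serve.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Hypothetical label; any unique reverse-DNS name works -->
    <key>Label</key>
    <string>com.ollama.serve</string>
    <!-- Assumed binary location; check yours with `which ollama` -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
Then load it with launchctl load ~/Library/LaunchAgents/com.ollama.serve.plist.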
6. How to Search for Models
Ollama provides a simple way to discover available models.
Run:
ollama list
This shows the models you’ve already pulled.
For browsing available models online, you can visit:
👉 https://ollama.com/search
7. Differences Between Models
Not all models are equal. The main differences come down to:
- Architecture
- LLaMA 2: General-purpose conversational AI.
- Mistral: Fast, lightweight, optimized for smaller setups.
- CodeLlama: Optimized for programming tasks.
- Size
Models come in different parameter sizes (7B, 13B, 70B). Larger models are more powerful but require more GPU and system memory.
- Use Case
- Chat → llama2, mistral
- Coding → codellama
- Creative writing → llama2-creative
- Fast inference on smaller machines → mistral
Choosing a model depends on your hardware and task.
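A quick way to decide is to send the same prompt to a couple of candidates and compare the answers and speed on your own hardware; a small sketch, assuming both models are already pulled:
#!/usr/bin/env bash
# Ask each local model the same question and print its answer
for model in llama2 mistral; do
  echo "=== $model ==="
  ollama run "$model" "Explain recursion in one sentence."
done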
8. How to Pull Models
To download a model:
ollama pull llama2
This fetches the latest version of LLaMA 2.
For a specific size:
ollama pull llama2:13b
Models are stored locally in Ollama’s data directory, so once downloaded, you don’t need the internet to use them.
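Because model files can run to several gigabytes each, it is worth keeping an eye on local storage; a few useful commands:
ollama list            # show downloaded models and their sizes
ollama show llama2     # print a model's details (parameters, template, license)
ollama rm llama2:13b   # delete a model you no longer need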
9. Request Examples in Bash and Python
Ollama runs a REST API on port 11434 by default when the service is running.
Example: Bash (using curl)
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a haiku about Docker containers",
  "stream": false
}'
With "stream": false, the output is a single JSON object containing the model's response; without it, the API streams a series of JSON objects, one chunk at a time.
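If you have jq installed, you can pipe the same request through it to pull out just the generated text, for example:
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a haiku about Docker containers",
  "stream": false
}' | jq -r '.response'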
Example: Python
import requests
url = "http://localhost:11434/api/generate"
payload = {
    "model": "llama2",
    "prompt": "Explain quantum computing in simple terms",
    "stream": False,  # ask for a single JSON response instead of a stream of chunks
}
response = requests.post(url, json=payload)
print(response.json()["response"])
This integrates Ollama into any Python project. You can build chatbots, automation scripts, or even plug it into your backend.
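For multi-turn conversations, the API also exposes a chat-style endpoint that takes a list of messages instead of a single prompt; a minimal sketch with curl:
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {"role": "user", "content": "What is a container image?"}
  ],
  "stream": false
}'
The reply contains a message object with the assistant's answer, which you can append to the list and send back on the next turn to keep the conversation going.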
10. Conclusion
Ollama is a powerful tool for anyone who wants to run and manage large language models locally. It eliminates cloud dependency, improves privacy, and provides a flexible way to experiment with AI.
We covered:
- What Ollama is and why it matters
- Installation on different platforms
- Running models and making Ollama a service
- Searching, comparing, and pulling models
- Request examples with Bash and Python
By now, you should have a working Ollama setup capable of running LLMs directly on your own hardware. Whether you’re building AI-powered apps, experimenting with coding assistants, or just curious about LLMs, Ollama gives you the control and freedom to do it locally.
In a world where AI is increasingly centralized, tools like Ollama put power back in the hands of developers.