1. Introduction
The rise of Large Language Models (LLMs) like GPT-4, LLaMA, and Mistral has transformed the way developers build intelligent applications. From natural language understanding to AI assistants, these models are becoming central to software development. But most of them rely on cloud-based APIs, which raises concerns about privacy, cost, and dependency on external providers.
This is where Ollama comes in. Ollama is an open-source project that makes it simple to run, manage, and interact with large language models locally on your own machine. Whether you’re experimenting with AI, developing an application, or just want to avoid sending sensitive data to the cloud, Ollama provides a practical solution.
In this guide, we’ll explore what Ollama is, why it matters, how to install and configure it, and how to interact with models using Bash and Python. By the end, you’ll have a full setup ready to run locally hosted LLMs, automated to start with your system, and connected to your development workflow.
2. What Is Ollama and Why Choose It?
At its core, Ollama is a local runtime and model manager for large language models. Instead of relying on remote APIs, Ollama allows you to pull, run, and interact with models directly on your computer.
Key Features:
- Local-first approach → Your data never leaves your machine.
- Prebuilt models available → You can quickly pull models like llama2, mistral, codellama, and more.
- Simple CLI interface → Easy commands for pulling, running, and managing models.
- REST API → Can integrate with your applications seamlessly.
- Cross-platform → Works on macOS, Linux, and Windows (with WSL support).
Why Choose Ollama?
- Privacy → Keep conversations and data local.
- Performance → With a powerful GPU, you can run models at good speeds without internet latency.
- Flexibility → Experiment with multiple models and compare outputs.
- Cost efficiency → Avoid paying per-token API charges.
- Integration → Use Ollama in scripts, automation pipelines, or backend services.
If you’ve ever wanted to run your own “ChatGPT” locally, Ollama is one of the best ways to do it.
3. Installation
Ollama is straightforward to install. The exact method depends on your operating system.
macOS
Run the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
This will download the Ollama binary and set it up on your system.
Linux
For Linux distributions:
curl -fsSL https://ollama.com/install.sh | sh
The same script works across most modern Linux distros. It places Ollama in /usr/local/bin.
You may need to restart your terminal or run:
source ~/.bashrc
to make sure the ollama command is available.
Windows
Currently, Ollama supports Windows via WSL2. You’ll need to install Ubuntu (or another supported distro) inside WSL and then run the Linux installation script.
The installation script will ask for your sudo password so it can elevate privileges and create the required service files.
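As a rough sketch of the Windows route (assuming WSL2 is already enabled and Ubuntu as the distro), the steps look roughly like this:
# In an elevated PowerShell or Command Prompt: install Ubuntu under WSL2
wsl --install -d Ubuntu
# Then, inside the Ubuntu shell, run the same Linux install script
curl -fsSL https://ollama.com/install.sh | sh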
4. Running Ollama
After installation, you can confirm Ollama is working with:
ollama --version
To run a model (example: LLaMA 2):
ollama run llama2
This command will:
- Pull the llama2 model if it's not already downloaded.
- Start an interactive session where you can chat with the model directly.
You can exit with Ctrl+D or by typing /bye (Ctrl+C stops the current response).
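You can also pass a prompt directly on the command line for a one-off answer instead of an interactive session, for example:
ollama run llama2 "Summarize the benefits of running LLMs locally in one sentence."
The model prints its completion and exits, which makes this form handy in shell scripts.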
5. Adding Ollama as a Service and Making It Autostart
For server setups or developers who want Ollama always available, running it as a service makes sense.
Linux (systemd)
Create a new systemd service file:
sudo nano /etc/systemd/system/ollama.service
Paste:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
[Install]
WantedBy=default.target
The ExecStart line assumes the install script placed the binary at /usr/local/bin/ollama and created an ollama user and group; adjust these values if your setup differs. Then reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
Now Ollama will run in the background and autostart on boot.
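To confirm everything is working, you can check the service status and hit the local API; a quick sanity check might look like this:
sudo systemctl status ollama
curl http://localhost:11434/
# The server should answer with a short "Ollama is running" message when it is reachable.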
macOS
On macOS, you can use launchd or rely on starting it manually. The default installer often sets Ollama to autostart already.
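If you prefer to manage it yourself, a minimal LaunchAgent sketch is shown below; the label, plist path, and binary location are assumptions, so adjust them to match your install. Save it as ~/Library/LaunchAgents/com.ollama.serve.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Hypothetical label; any unique reverse-DNS name works -->
    <key>Label</key>
    <string>com.ollama.serve</string>
    <!-- Assumed binary location; check yours with `which ollama` -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
Then load it with launchctl load ~/Library/LaunchAgents/com.ollama.serve.plist.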
6. How to Search for Models
Ollama provides a simple way to discover available models.
Run:
ollama list
This shows the models you’ve already pulled.
For browsing available models online, you can visit:
👉 https://ollama.com/search
7. Differences Between Models
Not all models are equal. The main differences come down to:
- Architecture
- LLaMA 2: General-purpose conversational AI.
- Mistral: Fast, lightweight, optimized for smaller setups.
- CodeLlama: Optimized for programming tasks.
- Size
Models come in different parameter sizes (7B, 13B, 70B). Larger models are more powerful but require more GPU and system memory.
- Use Case
- Chat → llama2, mistral
- Coding → codellama
- Creative writing → llama2-creative
- Fast inference on smaller machines → mistral
Choosing a model depends on your hardware and task.
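A quick way to decide is to send the same prompt to a couple of candidates and compare the answers and speed on your own hardware; a small sketch, assuming both models are already pulled:
#!/usr/bin/env bash
# Ask each local model the same question and print its answer
for model in llama2 mistral; do
  echo "=== $model ==="
  ollama run "$model" "Explain recursion in one sentence."
done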
8. How to Pull Models
To download a model:
ollama pull llama2
This fetches the latest version of LLaMA 2.
For a specific size:
ollama pull llama2:13b
Models are stored locally in Ollama’s data directory, so once downloaded, you don’t need the internet to use them.
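Because model files can run to several gigabytes each, it is worth keeping an eye on local storage; a few useful commands:
ollama list            # show downloaded models and their sizes
ollama show llama2     # print a model's details (parameters, template, license)
ollama rm llama2:13b   # delete a model you no longer need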
9. Request Examples in Bash and Python
Ollama runs a REST API on port 11434 by default when the service is running.
Example: Bash (using curl)
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a haiku about Docker containers",
  "stream": false
}'
With "stream": false, the output is a single JSON object containing the model's response; without it, the API streams a series of JSON objects, one chunk at a time.
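If you have jq installed, you can pipe the same request through it to pull out just the generated text, for example:
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a haiku about Docker containers",
  "stream": false
}' | jq -r '.response'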
Example: Python
import requests
url = "http://localhost:11434/api/generate"
payload = {
    "model": "llama2",
    "prompt": "Explain quantum computing in simple terms",
    "stream": False,  # ask for a single JSON response instead of a stream of chunks
}
response = requests.post(url, json=payload)
print(response.json()["response"])
This integrates Ollama into any Python project. You can build chatbots, automation scripts, or even plug it into your backend.
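For multi-turn conversations, the API also exposes a chat-style endpoint that takes a list of messages instead of a single prompt; a minimal sketch with curl:
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {"role": "user", "content": "What is a container image?"}
  ],
  "stream": false
}'
The reply contains a message object with the assistant's answer, which you can append to the list and send back on the next turn to keep the conversation going.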
10. Conclusion
Ollama is a powerful tool for anyone who wants to run and manage large language models locally. It eliminates cloud dependency, improves privacy, and provides a flexible way to experiment with AI.
We covered:
- What Ollama is and why it matters
- Installation on different platforms
- Running models and making Ollama a service
- Searching, comparing, and pulling models
- Request examples with Bash and Python
By now, you should have a working Ollama setup capable of running LLMs directly on your own hardware. Whether you’re building AI-powered apps, experimenting with coding assistants, or just curious about LLMs, Ollama gives you the control and freedom to do it locally.
In a world where AI is increasingly centralized, tools like Ollama put power back in the hands of developers.