Integrating local large language models (LLMs) into your Python projects using Ollama is a great strategy for improving privacy, reducing costs, and building offline-capable AI-powered apps.
Ollama is an open-source platform that makes it straightforward to run modern LLMs locally on your machine. Once you’ve set up Ollama and pulled the models you want to use, you can connect to them from Python using the ollama library.
Here’s a quick demo:
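The snippet below is a minimal sketch of that workflow using the ollama library's chat() function. It assumes Ollama is running and the llama3.2 model has already been pulled, both of which you'll set up in the steps ahead:

import ollama

# Send a single-turn chat message to a locally running model.
# Assumes Ollama is running and llama3.2 has been pulled.
response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "In one sentence, what is a Python decorator?"}
    ],
)

print(response["message"]["content"])

The model's reply comes back in the response's message field, so the final line prints just the generated text.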
In this tutorial, you’ll integrate local LLMs into your Python projects using the Ollama platform and its Python SDK.
You’ll first set up Ollama and pull a couple of LLMs. Then, you’ll learn how to use chat, text generation, and tool calling from your Python code. These skills will enable you to build AI-powered apps that run locally, improving privacy and cost efficiency.
Prerequisites
To work through this tutorial, you’ll need the following resources and setup:
- Ollama installed and running: You’ll need Ollama to use local LLMs. You’ll get to install it and set it up in the next section.
- Python 3.8 or higher: You’ll be using Ollama’s Python software development kit (SDK), which requires Python 3.8 or higher. If you haven’t already, install Python on your system to fulfill this requirement. You can verify your version with the quick check after this list.
- Models to use: You’ll use llama3.2:latest and codellama:latest in this tutorial. You’ll download them in the next section.
- Capable hardware: You need relatively powerful hardware to run Ollama’s models locally, as they may require considerable resources, including memory, disk space, and CPU power. You may not need a GPU for this tutorial, but local models will run much faster if you have one.
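If you're unsure whether your Python installation meets the version requirement, the short snippet below will tell you. It's only a convenience check, not part of the tutorial's sample code:

import sys

# The Ollama Python SDK requires Python 3.8 or higher.
if sys.version_info >= (3, 8):
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is good to go.")
else:
    print("Please upgrade to Python 3.8 or higher before continuing.")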
With these prerequisites in place, you’re ready to connect local models to your Python code using Ollama.
Step 1: Set Up Ollama, Models, and the Python SDK
Before you can talk to a local model from Python, you need Ollama running and at least one model downloaded. In this step, you’ll install Ollama, start its background service, and pull the models you’ll use throughout the tutorial.
Get Ollama Running
To get started, navigate to Ollama’s download page and grab the installer for your current operating system. You’ll find installers for Windows 10 or newer and macOS 14 Sonoma or newer. Run the appropriate installer and follow the on-screen instructions. For Linux users, the installation process differs slightly, as you’ll learn soon.
On Windows, Ollama runs in the background after installation, and the CLI becomes available right away. If this doesn’t happen automatically, go to the Start menu, search for Ollama, and run the app.
On macOS, the app manages the CLI and setup details, so you just need to launch Ollama.app.
If you’re on Linux, install Ollama with the following command:
$ curl -fsSL https://ollama.com/install.sh | sh
Once the process is complete, you can verify the installation by running:
$ ollama -v
If this command works, then the installation was successful. Next, start Ollama’s service by running the command below:
$ ollama serve
That’s it! You’re now ready to start using Ollama on your local machine. In some Linux distributions, such as Ubuntu, this final command may not be necessary because Ollama starts automatically once the installation completes. In that case, running the command above will result in an error since the service is already running.
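Once the Ollama service is up, you can also confirm that Python will be able to reach it. The sketch below assumes you’ve installed the Python SDK with pip install ollama, which you’ll do later in this step, and it simply lists whichever models are currently available on your machine:

import ollama

# Ask the local Ollama service which models you've pulled so far.
# If the service isn't running, this call fails with a connection error.
# The list will be empty until you pull your first model.
for model in ollama.list().models:
    print(model.model)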



