Jupyter AI with Local Inference Using Ollama

Building an AI chatbot with local inference using Ollama on a Debian machine

More details on Ollama installation are available here: Ollama on Linux

Create an installation directory, for example, ~/local

mkdir -p ~/local

To install Ollama locally on Debian as a non-root user with a csh shell, follow these steps:

curl -fsSL https://ollama.com/download/ollama-linux-amd64.tgz | tar -xz -C ~/local

Add the ollama executable to your PATH:

echo 'set path = (~/local/bin $path)' >> ~/.cshrc

Source the .cshrc file:

source ~/.cshrc
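You can confirm that the shell now finds the executable:

which ollama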

Set up the Ollama server to run in the background as a systemd user service.

Create this file, creating any missing parent directories first (see the command below):

`~/.config/systemd/user/ollama.service`
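If the parent directory does not yet exist, create it first (this is the standard location for systemd user units):

mkdir -p ~/.config/systemd/user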

And add this text into it (replace /home/user with your home directory):

[Unit]
Description=Ollama LLM Server
After=network.target

[Service]
ExecStart=/home/user/local/bin/ollama serve
Restart=always
RestartSec=3

[Install]
WantedBy=default.target
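After creating or editing the unit file, have the user systemd instance reload its unit files so it picks up the new service:

systemctl --user daemon-reload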

Start the service with:

systemctl --user start ollama.service

Enable the service so it starts automatically at login (see the note on boot-time start below):

systemctl --user enable ollama.service
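Note: a systemd user service normally starts only when you log in. To have it start at boot without an active login session, lingering must be enabled for your account (this may require administrator privileges):

loginctl enable-linger $USER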

Check the status of the service:

systemctl --user status ollama.service

Check the Ollama version to confirm it’s running: 

ollama -v

You can now download (pull) one or more models:

ollama pull llama3

ollama pull deepseek-coder

ollama pull deepseek-coder:6.7b

ollama pull qwen2
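To quickly verify that a pulled model works, you can start an interactive session with it (llama3 is just one of the models pulled above; type /bye to exit):

ollama run llama3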

The models are downloaded to this directory:

~/.ollama

List currently available/downloaded models:

ollama list

List models that are currently loaded and running:

ollama ps

Create a Python virtual environment:

python3 -m venv jaienv

(Replace jaienv with your desired environment name).

Activate the virtual environment (in csh):

source jaienv/bin/activate.csh

Once activated, your shell prompt should change to indicate the active environment (e.g., (jaienv)). Install JupyterLab and Jupyter AI:

pip install jupyterlab~=4.0
pip install "jupyter-ai[all]"

Register the environment as a Jupyter kernel: This step makes your new virtual environment and its installed packages available as a selectable kernel within the Jupyter notebook interface.

python3 -m ipykernel install --user --name=jaienv --display-name "Jupyter AI Environment"

Start JupyterLab:

jupyter lab
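Once JupyterLab is up, Jupyter AI can talk to the local Ollama server either through the chat panel (where Ollama can be selected as the provider in its settings, if your jupyter-ai version supports it) or through the %%ai magics in a notebook. A minimal sketch, assuming your jupyter-ai version includes the Ollama provider and the llama3 model has been pulled. In one notebook cell, load the Jupyter AI magics:

%load_ext jupyter_ai_magics

Then, in a separate cell, send a prompt to the local model (the %%ai line must be the first line of the cell):

%%ai ollama:llama3
Explain what a Python virtual environment is.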

Alternatively, use your own chat script:

import requests
import json
import os
import sys
import uuid

# Configuration
OLLAMA_API_URL = "http://localhost:11434/api/chat"
# MODEL_NAME can be overridden by the first command-line argument (see main());
# otherwise this default is used.
MODEL_NAME = "deepseek-coder:1.3b"

# System message to define the model's behavior
SYSTEM_PROMPT = "You are a helpful assistant running on a local machine."

# List to maintain conversation history
history = []

def chat_with_ollama(user_input):
    """Sends the current conversation history to Ollama and gets a response."""
    global history, MODEL_NAME

    # Add the user's message to the history
    history.append({"role": "user", "content": user_input})

    data = {
        "model": MODEL_NAME,  # Uses the model name selected at startup
        "messages": history,
        "stream": False  # Return the full response in one piece
    }

    try:
        response = requests.post(OLLAMA_API_URL, json=data, timeout=600)
        response.raise_for_status() 
        
        result = response.json()
        message = result['message']
        
        # Add the assistant's message to the history
        history.append(message)
        
        return message['content']

    except requests.exceptions.RequestException as e:
        print(f"\nError communicating with Ollama: {e}")
        print(f"Please ensure Ollama is installed, running, and the model '{MODEL_NAME}' is pulled.")
        sys.exit(1)

def save_output_to_file(content):
    """Saves the last assistant response to a text file."""

    uid = uuid.uuid4().hex
    filename = f"chat_output_{uid}.txt"
    try:
        with open(filename, "a") as f:
            f.write("-" * 40 + "\n")
            f.write(f"Model Used: {MODEL_NAME}\n")
            f.write(f"User Prompt: {history[-2]['content']}\n")
            f.write(f"Assistant Response:\n{content}\n")
            f.write("-" * 40 + "\n\n")
        print(f"[System] Successfully saved the last response to '{filename}'.")
    except IOError as e:
        print(f"[System] Error saving file: {e}")

def main():
    """Main loop for the chat interface."""
    global MODEL_NAME, history

    # Check for command line argument for model name
    if len(sys.argv) > 1:
        MODEL_NAME = sys.argv[1]
    
    # Initialize history with system prompt after model name is set
    history.append({"role": "system", "content": SYSTEM_PROMPT})

    print(f"--- Simple Ollama Chat Interface (Model: {MODEL_NAME}) ---")
    print("Type 'exit' or 'quit' to end the chat.")
    print("Type 'save' to save the last assistant response to a file.")
    print("------------------------------------------------------------------")

    while True:
        try:
            user_input = input("\nYou: ")
            
            if user_input.strip().lower() in ['exit', 'quit']:
                print("\nGoodbye!")
                break
            
            if user_input.strip().lower() == 'save':
                if len(history) >= 2:
                    save_output_to_file(history[-1]['content'])
                else:
                    print("[System] No response to save yet. Start chatting first.")
                continue

            if not user_input.strip():
                continue

            print(f"Ollama ({MODEL_NAME}): ", end='', flush=True)
            response_text = chat_with_ollama(user_input)
            print(response_text)
            
        except (KeyboardInterrupt, EOFError):
            print("\nExiting chat.")
            break

if __name__ == "__main__":
    main()
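A minimal usage example, assuming the script above is saved as ollama_chat.py (a filename chosen here for illustration) and the requested model has already been pulled:

python3 ollama_chat.py deepseek-coder:6.7b

If no model name is passed on the command line, the script falls back to the default defined near the top of the file.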