With the rise of local AI models, tools like Ollama make it easy to run powerful models on your machine. Combine that with Spring Boot, and you can build your own AI-powered backend API—without relying on external services.
In this guide, you’ll learn how to:
- Run Ollama locally
- Integrate it with Spring Boot
- Build a REST API that talks to LLMs
- Handle real-world use cases
📌 Architecture Overview
Client (Postman / UI)
↓
Spring Boot REST API
↓
Ollama Local API (localhost:11434)
↓
LLM Model (llama3 / mistral / etc.)
👉 Your Spring Boot app acts as a middleware layer between users and the LLM.
⚙️ Step 1: Run Ollama
Install Ollama, then start a model (it is downloaded automatically on first run):
ollama run llama3
👉 The Ollama server now listens at:
http://localhost:11434
🌐 Step 2: Ollama API Endpoint
Send a POST request to:
http://localhost:11434/api/generate
Sample request ("stream": false makes Ollama return a single JSON object instead of streaming chunks):
{
  "model": "llama3",
  "prompt": "Explain Java threads",
  "stream": false
}
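For reference, the non-streaming response looks roughly like this (abridged; Ollama also returns additional metadata fields):
{
  "model": "llama3",
  "created_at": "2024-01-01T00:00:00Z",
  "response": "Java threads are...",
  "done": true
}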
🏗️ Step 3: Create Spring Boot Project
Use dependencies:
- Spring Web
- Lombok (optional)
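For a Maven build, these map to the following pom.xml entries (Lombok marked optional, matching the list above):
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>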
📦 Step 4: Create Request/Response DTOs
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Records keep the DTOs concise and give Jackson constructors for free.
// stream = false asks Ollama for a single JSON response instead of a stream.
record OllamaRequest(String model, String prompt, boolean stream) {}

// Ollama also returns fields like created_at and done; ignore what we don't map.
@JsonIgnoreProperties(ignoreUnknown = true)
record OllamaResponse(String response) {}
🔌 Step 5: Service Layer (Calling Ollama)
Using Spring’s modern RestClient (available since Spring Boot 3.2):
import org.springframework.http.MediaType;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;

@Service
public class OllamaService {

    private final RestClient restClient = RestClient.create("http://localhost:11434");

    public String generate(String prompt) {
        // "stream": false → one JSON object instead of newline-delimited chunks.
        // Note: formatted() does not escape quotes in the prompt; see the typed
        // variant below for safer serialization.
        String requestBody = """
                {
                  "model": "llama3",
                  "prompt": "%s",
                  "stream": false
                }
                """.formatted(prompt);

        return restClient.post()
                .uri("/api/generate")
                .contentType(MediaType.APPLICATION_JSON)
                .body(requestBody)
                .retrieve()
                .body(String.class);
    }
}
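Since hand-built JSON breaks on prompts containing quotes or newlines, here is a sketch of the same call using the Step 4 DTOs and letting Jackson handle serialization (add it inside OllamaService; the method name is illustrative):
public String generateTyped(String prompt) {
    OllamaRequest request = new OllamaRequest("llama3", prompt, false);
    OllamaResponse response = restClient.post()
            .uri("/api/generate")
            .contentType(MediaType.APPLICATION_JSON)
            .body(request)       // Jackson serializes and escapes the prompt
            .retrieve()
            .body(OllamaResponse.class);
    return response.response();
}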
🎯 Step 6: REST Controller
import org.springframework.web.bind.annotation.*;
@RestController
@RequestMapping("/api/ai")
public class AIController {
private final OllamaService service;
public AIController(OllamaService service) {
this.service = service;
}
@GetMapping("/ask")
public String ask(@RequestParam String prompt) {
return service.generate(prompt);
}
}
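GET with a query parameter is handy for quick tests, but prompts with special characters fit better in a request body. A minimal POST variant reusing the Step 4 DTO (the /chat path is illustrative):
@PostMapping("/chat")
public String chat(@RequestBody OllamaRequest request) {
    return service.generate(request.prompt());
}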
▶️ Step 7: Test Your API
👉 Call it from a browser or Postman (both URL-encode the spaces in the prompt for you):
http://localhost:8080/api/ai/ask?prompt=Explain multithreading in Java
🔄 Step 8: Dynamic Model Selection
Make your API flexible:
public String generate(String model, String prompt) {
    String body = """
            {
              "model": "%s",
              "prompt": "%s",
              "stream": false
            }
            """.formatted(model, prompt);

    return restClient.post()
            .uri("/api/generate")
            .contentType(MediaType.APPLICATION_JSON)
            .body(body)
            .retrieve()
            .body(String.class);
}
👉 Now you can use:
- llama3 → general purpose
- mistral → fast responses
- codellama → coding tasks
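To expose this, the controller endpoint can take the model as an optional parameter (the defaultValue is an assumption; adjust it to whichever model you pulled):
@GetMapping("/ask")
public String ask(@RequestParam(defaultValue = "llama3") String model,
                  @RequestParam String prompt) {
    return service.generate(model, prompt);
}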
⚡ Step 9: Async Processing (Important)
Use CompletableFuture for non-blocking calls. Without an explicit executor, supplyAsync runs on the shared ForkJoinPool, so pass a dedicated pool for long-running LLM calls (the pool size here is illustrative):
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

private final Executor llmExecutor = Executors.newFixedThreadPool(4);

public CompletableFuture<String> generateAsync(String prompt) {
    return CompletableFuture.supplyAsync(() -> generate(prompt), llmExecutor);
}
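Spring MVC can return the future directly; the servlet thread is released while Ollama generates (the /ask-async path is illustrative):
@GetMapping("/ask-async")
public CompletableFuture<String> askAsync(@RequestParam String prompt) {
    // Spring completes the HTTP response when the future completes
    return service.generateAsync(prompt);
}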
🧠 Real-World Use Cases
🤖 AI Chatbot
- Use llama3 for conversation
💻 Coding Assistant
- Use codellama
📄 Content Generator
- Blogs, emails, documentation
🧩 Agentic AI
- Decision-making workflows
🔐 Best Practices
✔ Run Ollama on the same server for low latency
✔ Use a dedicated thread pool (ExecutorService)
✔ Add timeout handling (see the sketch after this list)
✔ Cache frequent responses
✔ Validate user input
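For the timeout point, a minimal sketch that configures the RestClient's request factory (the values are illustrative; LLM generation can take a while, so the read timeout is generous):
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestClient;

SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(5_000);   // ms: fail fast if the server is unreachable
factory.setReadTimeout(120_000);    // ms: allow slow generations to finish

RestClient restClient = RestClient.builder()
        .baseUrl("http://localhost:11434")
        .requestFactory(factory)
        .build();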
⚠️ Common Issues
❌ Empty Response
👉 Ollama streams newline-delimited JSON by default → set "stream": false (as above) or parse the chunks
❌ Slow Performance
👉 Use smaller models like:
- mistral
- phi3
❌ Memory Issues
👉 Large models require high RAM
🚀 Advanced Enhancements
- 🔄 Streaming response handling
- 🧠 Context memory (chat history) → request sketch below
- 🔍 RAG (Retrieval-Augmented Generation)
- 🔐 Authentication layer
- 📊 Logging + monitoring
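For context memory, Ollama also exposes a /api/chat endpoint that accepts the running conversation as a list of messages, e.g.:
{
  "model": "llama3",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Explain Java threads" },
    { "role": "assistant", "content": "Java threads are..." },
    { "role": "user", "content": "Show a code example" }
  ]
}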
🎯 Conclusion
By integrating Ollama with Spring Boot, you can:
- Build private AI systems
- Avoid external API costs
- Create scalable AI-powered applications
👉 This is perfect for:
- Enterprise apps
- Internal tools
- AI experiments