Ollama + Spring Boot Integration: Build Your Own Local AI API

With the rise of local AI models, tools like Ollama make it easy to run powerful models on your machine. Combine that with Spring Boot, and you can build your own AI-powered backend API—without relying on external services.

In this guide, you’ll learn how to:

  • Run Ollama locally
  • Integrate it with Spring Boot
  • Build a REST API that talks to LLMs
  • Handle real-world use cases

📌 Architecture Overview

Client (Postman / UI)
        ↓
Spring Boot REST API
        ↓
Ollama Local API (localhost:11434)
        ↓
LLM Model (llama3 / mistral / etc.)

👉 Your Spring Boot app acts as a middleware layer between users and the LLM.


⚙️ Step 1: Run Ollama

Install Ollama, then pull and start a model:

ollama run llama3

👉 This starts the Ollama server at:

http://localhost:11434

🌐 Step 2: Ollama API Endpoint

Send a POST request to:

http://localhost:11434/api/generate

Sample Request (setting "stream": false makes Ollama return one complete JSON object instead of a stream of chunks):

{
  "model": "llama3",
  "prompt": "Explain Java threads",
  "stream": false
}
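With "stream": false in the request, Ollama answers with a single JSON object. Its shape looks roughly like this (values here are illustrative, and additional metadata fields are omitted):

```json
{
  "model": "llama3",
  "created_at": "2024-05-01T12:00:00Z",
  "response": "Java threads are units of execution...",
  "done": true
}
```

The generated text lives in the "response" field — that's what your DTO will map.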

🏗️ Step 3: Create Spring Boot Project

Use dependencies:

  • Spring Web
  • Lombok (optional)

📦 Step 4: Create Request/Response DTO

class OllamaRequest {
    private String model;
    private String prompt;
    // add getters/setters, or annotate with Lombok's @Data
}

class OllamaResponse {
    private String response;
    // add getters/setters, or annotate with Lombok's @Data
}

🔌 Step 5: Service Layer (Calling Ollama)

Using Spring’s modern HTTP client, RestClient (available since Spring Framework 6.1 / Spring Boot 3.2):

import org.springframework.http.MediaType;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;

@Service
public class OllamaService {

    private final RestClient restClient = RestClient.create("http://localhost:11434");

    public String generate(String prompt) {
        // "stream": false makes Ollama return one complete JSON object.
        // Note: prompts containing quotes or newlines must be escaped
        // before being interpolated into the JSON body.
        String requestBody = """
                {
                  "model": "llama3",
                  "prompt": "%s",
                  "stream": false
                }
                """.formatted(prompt);

        return restClient.post()
                .uri("/api/generate")
                .contentType(MediaType.APPLICATION_JSON)
                .body(requestBody)
                .retrieve()
                .body(String.class);
    }
}
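One caveat with building the body via string formatting: a prompt containing quotes, backslashes, or newlines will break the JSON. A minimal escaping helper is sketched below (the class name `JsonEscape` is ours; a production app would serialize a DTO with Jackson's ObjectMapper instead):

```java
public class JsonEscape {

    // Escape the characters that would break a JSON string literal.
    // Covers quotes, backslashes and common control characters;
    // prefer a real JSON serializer (e.g. Jackson) in production.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"'  -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default   -> sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Call `JsonEscape.escape(prompt)` before passing the prompt to `formatted(...)`.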

🎯 Step 6: REST Controller

import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/ai")
public class AIController {

private final OllamaService service;

public AIController(OllamaService service) {
this.service = service;
}

@GetMapping("/ask")
public String ask(@RequestParam String prompt) {
return service.generate(prompt);
}
}

▶️ Step 7: Test Your API

👉 Call from browser/Postman (spaces in the query string must be URL-encoded):

http://localhost:8080/api/ai/ask?prompt=Explain%20multithreading%20in%20Java

🔄 Step 8: Dynamic Model Selection

Make your API flexible:

public String generate(String model, String prompt) {
    String body = """
            {
              "model": "%s",
              "prompt": "%s",
              "stream": false
            }
            """.formatted(model, prompt);

    return restClient.post()
            .uri("/api/generate")
            .contentType(MediaType.APPLICATION_JSON)
            .body(body)
            .retrieve()
            .body(String.class);
}

👉 Now you can use:

  • llama3 → general
  • mistral → fast
  • codellama → coding
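The use-case-to-model mapping above can be captured in a small helper, so callers pick a use case instead of a raw model name (class and method names here are our own sketch):

```java
import java.util.Map;

public class ModelSelector {

    // use case → Ollama model name, per the list above
    private static final Map<String, String> MODELS = Map.of(
            "general", "llama3",
            "fast", "mistral",
            "coding", "codellama");

    // Fall back to a general-purpose model for unknown use cases.
    public static String resolve(String useCase) {
        return MODELS.getOrDefault(useCase, "llama3");
    }
}
```

The controller can then accept `useCase` as a request parameter and call `generate(ModelSelector.resolve(useCase), prompt)`.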

⚡ Step 9: Async Processing (Important)

Use CompletableFuture so the calling thread isn’t blocked while the model generates:

import java.util.concurrent.CompletableFuture;

public CompletableFuture<String> generateAsync(String prompt) {
    return CompletableFuture.supplyAsync(() -> generate(prompt));
}

🧠 Real-World Use Cases

🤖 AI Chatbot

  • Use llama3 for conversation

💻 Coding Assistant

  • Use codellama

📄 Content Generator

  • Blogs, emails, documentation

🧩 Agentic AI

  • Decision-making workflows

🔐 Best Practices

✔ Run Ollama on the same server for low latency
✔ Use a thread pool (ExecutorService) for concurrent requests
✔ Add timeout handling
✔ Cache frequent responses
✔ Validate user input
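The “cache frequent responses” practice can be sketched with a ConcurrentHashMap (class name `PromptCache` is ours; note that caching by prompt only makes sense when identical prompts should return identical answers):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PromptCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Returns the cached answer for a prompt,
    // invoking the generator only on a cache miss.
    public String getOrGenerate(String prompt, Function<String, String> generator) {
        return cache.computeIfAbsent(prompt, generator);
    }

    public int size() {
        return cache.size();
    }
}
```

Wrap the service call as `cache.getOrGenerate(prompt, p -> ollamaService.generate(p))`; a production setup would add eviction (e.g. Caffeine or Spring’s `@Cacheable`).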


⚠️ Common Issues

❌ Empty Response

👉 Ollama streams responses by default → set "stream": false in the request, or parse the newline-delimited JSON chunks
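If you don’t disable streaming, each line of the response is a JSON chunk carrying a piece of the answer in its "response" field, and the fragments must be stitched together. A minimal regex-based sketch is below (class name `StreamJoin` is ours; a real app should parse each line with a JSON library like Jackson instead):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamJoin {

    // Matches the "response" field of each streamed JSON chunk,
    // allowing escaped characters inside the string value.
    private static final Pattern RESPONSE =
            Pattern.compile("\"response\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Concatenates the "response" fragments from newline-delimited chunks.
    public static String join(String ndjson) {
        StringBuilder sb = new StringBuilder();
        Matcher m = RESPONSE.matcher(ndjson);
        while (m.find()) {
            sb.append(m.group(1));
        }
        return sb.toString();
    }
}
```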


❌ Slow Performance

👉 Use smaller models like:

  • mistral
  • phi3

❌ Memory Issues

👉 Large models require a lot of RAM → switch to a smaller or quantized model


🚀 Advanced Enhancements

  • 🔄 Streaming response handling
  • 🧠 Context memory (chat history)
  • 🔍 RAG (Retrieval-Augmented Generation)
  • 🔐 Authentication layer
  • 📊 Logging + monitoring

🎯 Conclusion

By integrating Ollama with Spring Boot, you can:

  • Build private AI systems
  • Avoid external API costs
  • Create scalable AI-powered applications

👉 This is perfect for:

  • Enterprise apps
  • Internal tools
  • AI experiments