Introduction: The Shift to Data Sovereignty in AI #
In the rapidly evolving landscape of Generative AI, Retrieval-Augmented Generation (RAG) has emerged as the standard architecture for grounding Large Language Models (LLMs) with proprietary data. While services like Pinecone or Weaviate Cloud offer convenience, they introduce significant challenges regarding data privacy, latency, and, most notably, cost at scale.
For enterprise Java developers, Spring AI combined with a self-hosted Milvus instance represents the “Holy Grail” of vector search: open-source, massive scalability, and complete control over your data infrastructure.
This guide provides a comprehensive, step-by-step walkthrough of setting up a production-ready Milvus node using Docker and integrating it with a Spring Boot application using Spring AI. We will go beyond “Hello World” and discuss architectural trade-offs, index types, and metadata filtering strategies.
Why Milvus and Spring AI? #
Before writing code, it is crucial to understand why this specific stack is gaining traction in the US and EU enterprise sectors.
1. Milvus: The Cloud-Native Vector Database #
Milvus is not just a wrapper around Lucene. It is a cloud-native vector database built from the ground up to separate storage and computation.
- Scalability: It can handle billions of vectors.
- Performance: It utilizes advanced indexing algorithms (HNSW, IVF_FLAT) accelerated by SIMD instructions.
- Ecosystem: It supports a rich set of SDKs and integrates seamlessly with the broader AI ecosystem.
2. Spring AI: The Portable Service Abstraction #
Spring AI brings the “Write Once, Run Anywhere” philosophy to AI engineering. By implementing the VectorStore interface, Spring AI allows you to switch between vector databases (e.g., from simple in-memory testing to Milvus production) with zero code changes—only configuration tweaks.
Part 1: Infrastructure Setup (Docker & Milvus) #
To simulate a production environment, we will not run Milvus in “Embedded” mode (which is for testing). We will set up a Standalone Milvus instance using Docker Compose.
The Architecture of Standalone Milvus #
A standalone Milvus setup actually consists of three components:
- Milvus: The core engine handling vector computation.
- Etcd: Stores metadata and handles service discovery.
- MinIO (S3 compatible): Stores the actual persistence data (logs and index files).
docker-compose.yml #
Create a directory named milvus-env and add the following file. We will also include Attu, an excellent GUI for managing Milvus.
```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.13
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: milvus-etcd:2379
      MINIO_ADDRESS: milvus-minio:9000
      MINIO_ACCESS_KEY_ID: minioadmin
      MINIO_SECRET_ACCESS_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

  attu:
    container_name: attu
    image: zilliz/attu:v2.3.10
    environment:
      MILVUS_URL: milvus-standalone:19530
    ports:
      - "8000:3000"
    depends_on:
      - "standalone"

networks:
  default:
    name: milvus
```
Launching the Stack #
Run the following command in your terminal:
```shell
docker-compose up -d
```
Once running, verify the installation:
- MinIO Console: http://localhost:9001 (User/Pass: minioadmin)
- Attu UI: http://localhost:8000. Connect to Milvus using the default standalone address. Since Attu runs in the same Docker network, it connects automatically. If accessing from the host, ensure the port mapping is correct.
Note: If you are running on an Apple Silicon (M1/M2/M3) chip, Milvus runs via Rosetta 2 seamlessly, but ensure your Docker Desktop allows experimental features if you encounter platform warnings.
Part 2: Spring Boot Project Configuration #
Now that our database is running, let’s configure the application. We assume you are using Java 17+ and Spring Boot 3.2+.
1. Maven Dependencies #
We need two primary starters: one for the embedding model (to turn text into vectors) and one for the Milvus integration.
```xml
<dependencyManagement>
    <dependencies>
        <!-- Spring AI BOM: manages versions for all Spring AI artifacts -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- OpenAI for Embeddings (you can also use Ollama or Transformers) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <!-- Milvus Vector Store -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-milvus-store-spring-boot-starter</artifactId>
    </dependency>
    <!-- Spring Boot Starter Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>
```

Note that the BOM must sit in `dependencyManagement` (with `scope: import`), not in the regular `dependencies` block, so that the individual Spring AI starters can omit their versions.
Note: As Spring AI is rapidly evolving, ensure you have the Spring Milestones and Snapshots repositories configured in your pom.xml.
2. Application Configuration (application.yml) #
This is where the magic happens. We configure Spring AI to talk to our local Milvus instance.
```yaml
spring:
  application:
    name: spring-ai-milvus-demo
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      embedding:
        options:
          model: text-embedding-3-small # Cost-effective model
    vectorstore:
      milvus:
        client:
          host: localhost
          port: 19530
          username: "" # Default is empty for standalone
          password: ""
        collection-name: vector_store
        embedding-dimension: 1536 # CRITICAL: must match the embedding model's dimension
        index-type: IVF_FLAT      # Index algorithm
        metric-type: COSINE       # Similarity metric
```
Configuration Deep Dive:
- `embedding-dimension`: The most common source of errors. With OpenAI's `text-embedding-3-small`, the dimension is 1536. With `llama3` via Ollama, it might be 4096. If this doesn't match the collection schema, Milvus will reject the insert.
- `metric-type`: `COSINE` is generally preferred for NLP tasks because it measures the angle between vectors (semantic similarity) regardless of magnitude. `L2` (Euclidean distance) is better for strict matching.
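To make the `COSINE` vs. `L2` distinction concrete, here is a small, self-contained Java sketch (toy 3-dimensional vectors, not real embeddings) showing that cosine similarity ignores magnitude while Euclidean distance does not:

```java
public class MetricDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|); depends only on direction.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // L2 (Euclidean) distance: sensitive to magnitude.
    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] v = {1, 2, 3};
        double[] scaled = {2, 4, 6}; // same direction, twice the magnitude

        System.out.println(cosine(v, scaled)); // 1.0: identical direction
        System.out.println(l2(v, scaled));     // ~3.74: magnitude matters
    }
}
```

A document that repeats the same phrase twice produces an embedding with a similar direction but different magnitude; under `COSINE` it still scores as semantically close.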
Part 3: Implementation - The Vector Service #
Let’s create a service that handles document ingestion (ETL) and retrieval (Search).
1. The Service Layer #
We will inject the VectorStore interface. This is the beauty of Spring AI: our code doesn’t technically know it’s using Milvus.
```java
package com.springdevpro.milvus.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;

@Service
public class RagService {

    private final VectorStore vectorStore;

    @Autowired
    public RagService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    /**
     * Ingests data into Milvus.
     * In a real app, this would parse PDFs or JSONs.
     */
    public void loadKnowledgeBase(List<String> textChunks) {
        List<Document> documents = textChunks.stream()
                .map(content -> new Document(content, Map.of("ingestion_date", "2024-05-22")))
                .toList();

        // This call:
        // 1. Calls OpenAI to get embeddings
        // 2. Connects to Milvus
        // 3. Checks if the collection exists (creates it if not)
        // 4. Inserts the vectors
        vectorStore.add(documents);
    }

    /**
     * Semantic search: top 5 results above a 0.7 similarity threshold.
     */
    public List<Document> search(String query) {
        return vectorStore.similaritySearch(
                SearchRequest.query(query)
                        .withTopK(5)
                        .withSimilarityThreshold(0.7)
        );
    }
}
```
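The service above expects pre-chunked text. As a minimal illustration of what chunking means (for production, Spring AI ships proper splitters such as `TokenTextSplitter`; this naive character-window version is only a sketch), a fixed-size splitter with overlap could look like:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveChunker {

    // Splits text into fixed-size character windows with overlap, so that
    // content cut at a boundary still appears intact in the next chunk.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= overlap) {
            throw new IllegalArgumentException("size must exceed overlap");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(start + size, text.length())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4, 2)); // [abcd, cdef, efgh, ghij, ij]
    }
}
```

The overlap matters for RAG quality: without it, a sentence split across two chunks may not be retrievable from either.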
2. The Controller #
Expose endpoints to test our setup.
```java
package com.springdevpro.milvus.controller;

import com.springdevpro.milvus.service.RagService;
import org.springframework.ai.document.Document;
import org.springframework.web.bind.annotation.*;

import java.util.List;
import java.util.stream.Collectors;

@RestController
@RequestMapping("/api/rag")
public class MilvusController {

    private final RagService ragService;

    public MilvusController(RagService ragService) {
        this.ragService = ragService;
    }

    @PostMapping("/ingest")
    public String ingest(@RequestBody List<String> chunks) {
        ragService.loadKnowledgeBase(chunks);
        return "Indexed " + chunks.size() + " documents into Milvus.";
    }

    @GetMapping("/search")
    public List<String> search(@RequestParam String query) {
        List<Document> results = ragService.search(query);
        return results.stream()
                .map(Document::getContent)
                .collect(Collectors.toList());
    }
}
```
Part 4: Advanced Milvus Setup & Tuning #
Getting it running is one thing; making it production-ready is another. Here are the critical considerations for running Spring AI with Milvus in production.
1. Understanding Index Types #
In application.yml, we specified IVF_FLAT. Why?
- FLAT: 100% recall (perfect accuracy) but slow on large datasets because it brute-force scans every vector.
- IVF_FLAT (Inverted File): Divides vectors into clusters (Voronoi cells). Search only checks the closest clusters. Much faster, slight loss in accuracy.
- HNSW (Hierarchical Navigable Small World): The industry standard for high performance. It builds a multi-layer graph. It is incredibly fast but consumes more memory.
Recommendation: For datasets < 1M vectors, IVF_FLAT is fine. For > 1M or high-concurrency low-latency needs, switch config to HNSW.
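To build intuition for why FLAT slows down, here is a brute-force top-k search in plain Java (toy 2-dimensional vectors, cosine scoring): every stored vector must be scored for every query, which is exactly the O(n) linear scan that IVF avoids by probing only the nearest clusters.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

public class FlatScan {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Brute-force "FLAT" search: scores all n vectors, O(n * dim) per query.
    static int[] topK(double[][] store, double[] query, int k) {
        return IntStream.range(0, store.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -cosine(store[i], query)))
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        double[][] store = {
                {1, 0},     // index 0
                {0, 1},     // index 1
                {0.9, 0.1}  // index 2: closest in direction to the query below
        };
        int[] best = topK(store, new double[]{0.9, 0.11}, 2);
        System.out.println(best[0]); // 2
    }
}
```

IVF trades this exhaustive scan for a coarse quantizer: vectors are assigned to clusters up front, and a query only scores vectors inside the `nprobe` nearest clusters, which is where the slight recall loss comes from.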
To use HNSW in Spring AI, you simply update the YAML:
```yaml
index-type: HNSW
index-parameters: '{"M":16,"efConstruction":200}'
```
2. Metadata Filtering (The “Hybrid Search”) #
Pure vector search isn’t enough. Often, you want to “Find contracts similar to X but only from year 2023”.
Spring AI supports the Filter Expression Language. Milvus handles this efficiently by using scalar indexes alongside vector indexes.
```java
import org.springframework.ai.vectorstore.filter.Filter.Expression;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;

public List<Document> searchWithFilter(String query, String year) {
    FilterExpressionBuilder b = new FilterExpressionBuilder();
    Expression filter = b.eq("ingestion_date", year).build();

    return vectorStore.similaritySearch(
            SearchRequest.query(query)
                    .withTopK(5)
                    .withFilterExpression(filter)
    );
}
```
Note: Ensure your metadata keys in the Document object do not contain special characters that conflict with Milvus schema rules.
3. Consistency Levels #
Milvus offers tunable consistency (Strong, Bounded, Session, Eventually). By default, Milvus might prioritize speed over immediate consistency. If you insert a document and immediately search for it, you might miss it. To fix this in testing, you often need to force a sync or wait a few milliseconds. In Spring AI, the default implementation typically handles the necessary “flush” operations for you, but be aware of the “Bounded Staleness” concept in distributed systems.
Troubleshooting Common Issues #
1. io.milvus.exception.ServerException: dimension mismatch #
Cause: You created a collection with OpenAI embeddings (1536 dim), then tried to switch to a local Ollama model (4096 dim) without dropping the collection.
Fix: Connect to Attu (localhost:8000), delete the collection vector_store, and restart your Spring Boot app. Milvus collections are immutable regarding dimension.
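A cheap way to fail fast on this is to validate vector length against the collection's expected dimension before inserting. The following guard is plain Java and purely illustrative (the `DimensionGuard` class and its wiring are assumptions, not a Spring AI API):

```java
public class DimensionGuard {

    private final int expectedDimension;

    public DimensionGuard(int expectedDimension) {
        this.expectedDimension = expectedDimension;
    }

    // Throws early with a readable message instead of letting the
    // vector store reject the insert with a server-side error.
    public float[] validate(float[] embedding) {
        if (embedding.length != expectedDimension) {
            throw new IllegalArgumentException(
                    "Embedding dimension " + embedding.length
                            + " does not match collection dimension " + expectedDimension
                            + ". Did you switch embedding models without dropping the collection?");
        }
        return embedding;
    }
}
```

Wiring the expected dimension from the same property that configures the store (`embedding-dimension` in application.yml) keeps the two from drifting apart when you swap embedding models.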
2. Connection Refused #
Cause: Docker networking issues.
Fix: Ensure host: localhost works if running the JAR outside Docker. If running the Spring App inside a container, use host: milvus-standalone.
3. Metadata Search Fails #
Cause: Older Milvus versions require explicit scalar indexing for efficient filtering on some fields; newer versions handle dynamic schemas better.
Fix: Enable dynamic schema in the Milvus configuration if you have unpredictable metadata fields. Spring AI enables dynamic schema by default for Milvus.
Conclusion: The Business Case for Self-Hosting #
Integrating Spring AI with Milvus moves your organization from “AI experimentation” to “AI ownership.”
- Cost Control: You are not paying per read/write unit (like in AWS DynamoDB or Pinecone). Your cost is simply the EC2/VM cost.
- Privacy: Your vectors (mathematical representations of your IP) never leave your VPC.
- Performance: Network latency is minimized when your Vector Store sits in the same Kubernetes cluster as your Spring Boot services.
This setup forms the backbone of a robust RAG pipeline. In upcoming articles, we will explore how to add Redis for caching vector results and how to use Spring Cloud Gateway to rate-limit access to your expensive LLM APIs.
Stay tuned to Spring DevPro for more architecture drills.