Flora | Your Data. Your AI. Your Infrastructure.

Get running in minutes

The default deployment ships as a composable stack with clear service boundaries. Start with Docker Compose for local validation, then promote to Kubernetes with Helm when you are ready for production controls.

Architecture overview

A typical Flora environment is built around independent services that can scale separately by workload profile.

Doctor handles ingestion and normalized extraction from enterprise document sources.
TEI (Text Embeddings Inference) generates embeddings locally and feeds vectors into Qdrant collections.
Qdrant enforces retrieval-time constraints with metadata and role-aware filters.
vLLM serves low-latency generation with PagedAttention for efficient GPU usage.

Installation

Use one of the following commands to bring up Flora in a controlled environment. Replace image tags and values files according to your release policy.

Ingestion Pipeline

Doctor connectors can be configured for internal knowledge systems so that parsing and normalization happen entirely inside your network perimeter.

Supported formats

Out of the box, Flora supports PDF, DOCX, TXT, and structured metadata attachments for traceable ingestion.

Embeddings

Deploy TEI close to your API layer to reduce embedding latency and keep all vectorization traffic internal.
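As a rough illustration of keeping vectorization traffic internal, the sketch below builds a request for TEI's /embed route using only the Python standard library. The host name tei.internal and port 8080 are placeholders, not Flora defaults; substitute whatever internal address your deployment exposes.

```python
import json
import urllib.request

# Assumption: TEI is reachable at an internal-only address. This URL is a placeholder.
TEI_URL = "http://tei.internal:8080"


def build_embed_request(texts, base_url=TEI_URL):
    """Build (but do not send) a POST request for TEI's /embed route."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url=TEI_URL, timeout=10.0):
    """Send the request and return the parsed JSON (one vector per input text)."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Separating request construction from sending makes the payload easy to inspect in tests without a running TEI instance.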

Model selection

Choose multilingual or domain-specific models based on recall targets, memory budget, and throughput requirements.
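One input to the memory budget that is easy to quantify is the output dimension of the embedding model, since it sets the raw size of every stored vector. The helper below is a back-of-the-envelope sketch that assumes float32 storage. It ignores index structures, payloads, and replication, so treat the result as a lower bound rather than a sizing guarantee.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors; 4 bytes per value assumes float32."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Example: one million chunks at two common embedding widths.
for dim in (384, 768):
    size = to_gib(raw_vector_bytes(1_000_000, dim))
    print(f"dim={dim}: {size:.2f} GiB")
# dim=384: 1.43 GiB
# dim=768: 2.86 GiB
```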

Vector Storage

Define shard and replication strategy in Qdrant according to collection size, SLA, and fault tolerance targets.
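Shard count and replication factor are set when a collection is created. The sketch below assembles a request body of the kind Qdrant's create-collection REST endpoint (PUT /collections/{name}) accepts. The specific numbers are illustrative assumptions, not recommendations; derive real values from your own sizing and availability targets.

```python
def collection_config(vector_size: int, shard_number: int,
                      replication_factor: int, distance: str = "Cosine") -> dict:
    """Build a create-collection body with explicit sharding and replication."""
    if vector_size < 1 or shard_number < 1 or replication_factor < 1:
        raise ValueError("vector_size, shard_number and replication_factor must be >= 1")
    return {
        "vectors": {"size": vector_size, "distance": distance},
        "shard_number": shard_number,              # how the collection is partitioned
        "replication_factor": replication_factor,  # copies kept of each shard
    }


# Illustrative values only: 768-dim vectors, 4 shards, 2 copies of each shard.
config = collection_config(vector_size=768, shard_number=4, replication_factor=2)
```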

RBAC filtering

Attach role metadata at ingestion time and apply payload filters during retrieval so unauthorized chunks never reach generation.
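A minimal sketch of this pattern follows, assuming each chunk's payload carries a list of roles under a field called allowed_roles. That field name is hypothetical and not a documented Flora convention. The first function builds a Qdrant-style payload filter using a "match any" condition; the second mirrors the same rule locally so the behavior can be checked without a running Qdrant instance.

```python
def role_filter(user_roles) -> dict:
    """Qdrant-style filter: keep points whose allowed_roles overlaps the user's roles."""
    roles = sorted(set(user_roles))
    if not roles:
        # Deny by default: refuse to build an unfiltered query for a role-less user.
        raise ValueError("user has no roles; refusing to build an open filter")
    return {"must": [{"key": "allowed_roles", "match": {"any": roles}}]}


def is_visible(payload: dict, user_roles) -> bool:
    """Local mirror of the filter: True if the chunk shares at least one role."""
    return bool(set(payload.get("allowed_roles", [])) & set(user_roles))


chunk = {"text": "Q3 salary bands", "allowed_roles": ["hr"]}
print(is_visible(chunk, ["engineering"]))  # False
print(is_visible(chunk, ["hr", "finance"]))  # True
```

Applying the filter inside the vector query, rather than filtering results afterwards, is what keeps unauthorized chunks out of the generation context.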

Inference

Tune max model length, batch size, and request concurrency to stabilize latency under peak load.
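In vLLM these knobs correspond roughly to the --max-model-len, --max-num-batched-tokens, and --max-num-seqs engine arguments. The sketch below assembles a vllm serve command line from explicit values. The model name and the numbers are placeholders chosen for illustration, and the mapping of "batch size" to the batched-token limit is an interpretation rather than something stated by Flora.

```python
def vllm_serve_args(model: str, max_model_len: int,
                    max_num_seqs: int, max_num_batched_tokens: int) -> list:
    """Assemble a `vllm serve` argument list; does not launch anything."""
    for name, value in (("max_model_len", max_model_len),
                        ("max_num_seqs", max_num_seqs),
                        ("max_num_batched_tokens", max_num_batched_tokens)):
        if value < 1:
            raise ValueError(f"{name} must be positive")
    return [
        "vllm", "serve", model,
        "--max-model-len", str(max_model_len),                    # longest prompt + output
        "--max-num-seqs", str(max_num_seqs),                      # concurrent sequences per step
        "--max-num-batched-tokens", str(max_num_batched_tokens),  # token budget per step
    ]


# Placeholder model name and illustrative limits.
args = vllm_serve_args("my-org/my-model", 8192, 64, 8192)
print(" ".join(args))
```

Keeping the limits in one function makes it easier to version them alongside load-test results and to change one value at a time.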

PagedAttention

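PagedAttention stores the KV cache in fixed-size blocks that do not need to be contiguous in GPU memory, much like virtual-memory paging in an operating system. This reduces fragmentation and lets vLLM fit more concurrent requests into the same memory, allowing high-throughput serving with predictable GPU utilization.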
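The paging idea can be illustrated with a toy block allocator. This is a conceptual sketch only: it tracks block bookkeeping in plain Python and is not vLLM's implementation, and the class and method names are invented for the example.

```python
class PagedKVCache:
    """Toy model: each sequence maps to a table of fixed-size physical blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}                      # seq_id -> [block ids]
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id: str) -> None:
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new block is needed only when the last one is full (or none exists).
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
cache.append_token("a")
cache.append_token("a")
cache.append_token("b")
cache.append_token("a")           # "a" grows into a non-adjacent block
print(cache.block_tables)         # {'a': [3, 1], 'b': [2]}
```

Because sequence "a" ends up in blocks 3 and 1, the example shows that a sequence can grow without a contiguous reservation, which is the property that limits wasted memory.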

API Reference

The Flora API surface includes ingestion, search, answer generation, and admin endpoints. Use this section to integrate services and automate platform workflows.
