Predicato is an open-source framework for building knowledge graphs that evolve over time. Extract entities and relationships from documents using local ML models, store them with full temporal context, and query across your entire knowledge base — all without sending a single byte to an external API.
Go library with a Python client. Embedded databases. Local ML via Rust FFI. No API keys, no vendor lock-in, no recurring costs.
Most knowledge systems require cloud APIs for every extraction and query. Predicato runs entirely on your hardware: no API keys, no per-call costs, no data leaving your network.
Embeddings (Qwen3-0.6B), text generation (SmolLM2), NER (GLiNER), and reranking all run locally via Rust FFI. Models auto-download from HuggingFace and cache locally. Zero external API calls required.
Every fact tracks two time dimensions: when it was recorded and when it was actually true. Query "what did we know as of last Tuesday" or "show me facts that were later corrected." Essential for compliance and audit trails.
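As a rough illustration of the two time dimensions, the sketch below models a fact with a valid-time interval (when it was true) and a transaction-time interval (when the system held it). The `Fact` class, its field names, the sample data, and both helper functions are assumptions made for this example; they are not Predicato's actual schema or client API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical bi-temporal fact record (illustration only)."""
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                      # when the fact became true
    valid_to: Optional[datetime]              # None means still true
    recorded_at: datetime                     # when the system learned it
    superseded_at: Optional[datetime] = None  # when a later record replaced it


def known_as_of(facts: List[Fact], as_of: datetime) -> List[Fact]:
    """Return the facts the system believed at transaction time `as_of`."""
    return [
        f for f in facts
        if f.recorded_at <= as_of
        and (f.superseded_at is None or f.superseded_at > as_of)
    ]


def later_corrected(facts: List[Fact]) -> List[Fact]:
    """Return the facts that were replaced by a later record."""
    return [f for f in facts if f.superseded_at is not None]


# Sample data: the headquarters was first recorded as Boston, then corrected.
facts = [
    Fact("Acme", "headquartered_in", "Boston",
         valid_from=datetime(2020, 1, 1), valid_to=None,
         recorded_at=datetime(2024, 3, 1),
         superseded_at=datetime(2024, 3, 10)),
    Fact("Acme", "headquartered_in", "Austin",
         valid_from=datetime(2020, 1, 1), valid_to=None,
         recorded_at=datetime(2024, 3, 10)),
]

before_fix = known_as_of(facts, datetime(2024, 3, 5))
after_fix = known_as_of(facts, datetime(2024, 3, 12))
```

Here `before_fix` contains only the Boston record and `after_fix` only the Austin record, while `later_corrected(facts)` returns the Boston record. Keeping superseded rows instead of overwriting them is what makes the "as of" question answerable.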
No per-query fees. No per-extraction charges. No monthly API bills that scale with usage. Extract once, query forever. Your knowledge, your infrastructure, your budget.
Predicato separates expensive extraction from flexible modeling — extract facts once, build multiple graph views without re-processing.
Documents, conversations, and events are processed through entity extraction (GLiNER + NLP) to produce entities, relationships, embeddings, contextual triples, and conditional rules. These persist durably in the fact store alongside their vector embeddings (PostgreSQL in production, DuckDB when embedded). Extraction is the expensive step, so it runs only once per source.
Backends: PostgreSQL + VectorChord (production), DuckDB (embedded)
Facts from the store are resolved (entity deduplication, relationship merging), enriched with community detection, and assembled into a queryable graph. Regenerate the graph with different parameters without re-extracting — experiment freely.
Backends: CozoDB, DuckDB+DuckPGQ, Ladybug (embedded); Neo4j, Memgraph (external)
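The split between the two layers can be sketched in a few lines of Python: a fixed list of extracted facts, and a cheap function that assembles different graph views from it. Everything here is hypothetical (the tuple layout, the `confidence` value, `build_graph`, and its parameters) and stands in for whatever the real fact store and graph builder expose.

```python
from collections import defaultdict

# Hypothetical fact store rows: (subject, predicate, object, confidence).
# In the real system these would come from the one-time extraction step.
FACT_STORE = [
    ("Alice", "works_at", "Acme", 0.95),
    ("Acme", "located_in", "Austin", 0.90),
    ("Alice", "knows", "Bob", 0.40),
]


def build_graph(facts, min_confidence=0.0, predicates=None):
    """Assemble an adjacency map from stored facts without re-extracting."""
    graph = defaultdict(list)
    for subject, predicate, obj, confidence in facts:
        if confidence < min_confidence:
            continue  # drop low-confidence edges from this view
        if predicates is not None and predicate not in predicates:
            continue  # keep only the requested relationship types
        graph[subject].append((predicate, obj))
    return dict(graph)


# Two views over the same stored facts, built with different parameters.
strict_view = build_graph(FACT_STORE, min_confidence=0.8)
full_view = build_graph(FACT_STORE)
```

Both views read the same stored rows, so changing a threshold or a predicate filter costs a rebuild of the graph, not another pass over the source documents.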
Extract structured knowledge from unstructured sources. Search across it intelligently. Keep it current as the world changes.
Feed in PDFs, text, HTML, or any document. Predicato extracts entities, relationships, contextual triples (with conditions, temporality, certainty, and scope), and conditional rules (IF-THEN-UNLESS patterns) — all with source attribution.
Combine semantic similarity (cosine on embeddings), BM25 keyword matching, and graph traversal in a single query. Five reranking strategies: RRF, MMR, cross-encoder, node distance, and episode mentions.
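Of the reranking strategies listed, reciprocal rank fusion (RRF) is the easiest to show in isolation: each retriever contributes 1 / (k + rank) for every result it returns, and the summed scores decide the final order. The sketch below is a generic RRF implementation, not Predicato's code. The function name, the sample result lists, and the constant k = 60 (a value commonly used for RRF) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list using RRF.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    Ties are broken by ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists from the three retrieval signals.
semantic_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
graph_hits = ["b", "a", "e"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits, graph_hits])
```

RRF only needs ranks, not raw scores, which is why it suits mixing cosine similarities, BM25 scores, and graph distances that live on different scales.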
Automatically identify and merge duplicate entities across sources. "Dr. Smith," "John Smith MD," and "J. Smith, Internal Medicine" resolve to the same node in the graph.
Add new documents to an existing graph without reprocessing everything. Predicato resolves new entities against existing ones and integrates new relationships into the graph incrementally.
Discover clusters of related entities automatically. Useful for identifying topic areas, organizational structures, or conceptual groupings within your knowledge base.
Predicato extracts IF-THEN-UNLESS patterns from documents: "If CPT 70553 AND outpatient setting, THEN prior auth required, UNLESS emergency." These rules are queryable and auditable.
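One way to picture such a rule as data is shown below: a set of conditions that must all hold, a conclusion, and a set of exceptions that each block the rule. The `ConditionalRule` class, the string labels for conditions, and the source file name are hypothetical and chosen only for this sketch. The format Predicato actually stores is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalRule:
    """Hypothetical IF-THEN-UNLESS rule (illustration only)."""
    conditions: frozenset               # IF: every condition must be present
    conclusion: str                     # THEN: what follows
    exceptions: frozenset = frozenset() # UNLESS: any one of these blocks it
    source: str = ""                    # origin document, for auditability

    def applies(self, context) -> bool:
        """True when all conditions hold and no exception is present."""
        context = set(context)
        return self.conditions <= context and not (self.exceptions & context)


# The example rule from the text, encoded with made-up labels.
rule = ConditionalRule(
    conditions=frozenset({"cpt:70553", "setting:outpatient"}),
    conclusion="prior_auth_required",
    exceptions=frozenset({"emergency"}),
    source="hypothetical-policy.pdf",
)

routine = rule.applies({"cpt:70553", "setting:outpatient"})
urgent = rule.applies({"cpt:70553", "setting:outpatient", "emergency"})
```

With this encoding `routine` is true and `urgent` is false, and the `source` field keeps the link back to the originating document.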
Start embedded, scale to external services. Swap any component without changing your application code.
Graph database: CozoDB, DuckDB+DuckPGQ, Ladybug (embedded); Neo4j, Memgraph (external)
Embeddings: go-candle running Qwen3-0.6B (local); OpenAI, Gemini (cloud)
Text generation: go-candle running SmolLM2-360M (local); OpenAI, any compatible API (cloud)
NER: GLiNER on ONNX (local); GLiNER2 API, LLM-based (cloud)
Give your AI agents persistent, queryable memory that tracks what they've learned over time. Multi-turn conversations become knowledge that persists across sessions.
Go beyond vector search. Predicato's hybrid search combines semantic, keyword, and graph-traversal signals with five reranking strategies, so it can surface results that embedding similarity alone would miss.
Bi-temporal tracking means you always know what you knew and when. Source attribution traces every fact back to its origin document. Essential for regulated industries.
Air-gapped environments, on-premises installations, or simply reducing cloud spend. Predicato's embedded stack means no internet required after initial model download.
Built for real workloads, not just demos.
If a cloud provider is configured and a call to it fails, Predicato falls back to local models automatically. Token usage tracking and cost calculation are built in.
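The fallback behaviour can be pictured as a thin wrapper that tries the configured provider first and switches to the local model on any error. This is a minimal sketch of that pattern under stated assumptions: `FallbackGenerator`, `failing_cloud`, and `local_model` are hypothetical stand-ins, and the real library may select providers differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FallbackGenerator:
    """Hypothetical wrapper: try the primary provider, then the fallback."""
    primary: Callable[[str], str]
    fallback: Callable[[str], str]
    primary_failures: int = 0  # simple counter for monitoring

    def generate(self, prompt: str) -> str:
        try:
            return self.primary(prompt)
        except Exception:
            # Any provider error (network, quota, timeout) triggers fallback.
            self.primary_failures += 1
            return self.fallback(prompt)


def failing_cloud(prompt: str) -> str:
    """Stand-in for a cloud call that is currently unavailable."""
    raise ConnectionError("cloud provider unreachable")


def local_model(prompt: str) -> str:
    """Stand-in for a local model call."""
    return f"local:{prompt}"


generator = FallbackGenerator(primary=failing_cloud, fallback=local_model)
answer = generator.generate("summarize")
```

Counting failures in the wrapper gives operators a signal that the cloud path is degraded even though callers still receive answers.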
GroupID-based isolation lets multiple applications or organizations share a single Predicato instance without data leakage.
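Conceptually, group isolation means every read and write carries a group identifier and queries never cross it. The toy in-memory store below shows the idea. `GroupScopedStore`, its methods, and the tenant names are invented for this sketch and do not reflect how Predicato implements or exposes GroupID.

```python
from collections import defaultdict


class GroupScopedStore:
    """Hypothetical store that partitions facts by group ID."""

    def __init__(self):
        self._facts_by_group = defaultdict(list)

    def add(self, group_id, subject, predicate, obj):
        """Record a fact inside one group's partition."""
        self._facts_by_group[group_id].append((subject, predicate, obj))

    def query(self, group_id, subject):
        """Return facts about `subject`, restricted to one group."""
        # .get avoids creating an empty partition on read.
        rows = self._facts_by_group.get(group_id, [])
        return [row for row in rows if row[0] == subject]


store = GroupScopedStore()
store.add("tenant-a", "Alice", "works_at", "Acme")
store.add("tenant-b", "Alice", "works_at", "Globex")

tenant_a_rows = store.query("tenant-a", "Alice")
```

The same subject name exists in both groups, yet `tenant_a_rows` holds only the fact written under `tenant-a`.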
Write-ahead logging for embedded databases. Telemetry with DB persistence for operational monitoring. MCP server support for tool integration.