Data Model
Hoard stores indexed content in a structured data model with entities (documents) and chunks (text spans).
Schema Overview
┌─────────────────┐ ┌─────────────────┐│ entities │──────<│ chunks ││ (documents) │ │ (text spans) │└─────────────────┘ └─────────────────┘ │ │ ▼ ▼┌─────────────────┐ ┌─────────────────┐│ entities_fts │ │ chunks_fts │└─────────────────┘ └─────────────────┘ │ ▼ ┌─────────────────┐ │ embeddings │ └─────────────────┘Entities Table
Entities represent documents — files, notes, bookmarks, etc.
CREATE TABLE entities ( id TEXT PRIMARY KEY, source TEXT NOT NULL, source_id TEXT NOT NULL, entity_type TEXT NOT NULL, title TEXT, uri TEXT, mime_type TEXT, tags TEXT, -- JSON array tags_text TEXT, -- Space-joined for FTS metadata JSON, sensitivity TEXT DEFAULT 'normal', created_at DATETIME, updated_at DATETIME, synced_at DATETIME, last_seen_at DATETIME, tombstoned_at DATETIME, content_hash TEXT, -- SHA256[:32] connector_name TEXT, -- Provenance connector_version TEXT, UNIQUE(source, source_id));Entity Fields
| Field | Type | Description |
|---|---|---|
id | TEXT | SHA256 hash of source:source_id (first 32 hex chars) |
source | TEXT | Connector source name (e.g., “obsidian”) |
source_id | TEXT | Stable ID within source |
entity_type | TEXT | Type: “document”, “note”, “bookmark” |
title | TEXT | Human-readable title |
uri | TEXT | Link to original (file://, obsidian://) |
mime_type | TEXT | MIME type (text/markdown, etc.) |
tags | JSON | Array of tags |
metadata | JSON | Source-specific fields |
sensitivity | TEXT | ”normal”, “sensitive”, or “secret” |
content_hash | TEXT | SHA256 hash for change detection |
connector_name | TEXT | Which connector indexed this |
Chunks Table
Chunks are text spans within entities — the unit of search and retrieval.
CREATE TABLE chunks ( id TEXT PRIMARY KEY, -- {entity_id}:{chunk_index} entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE, chunk_index INTEGER NOT NULL, content TEXT NOT NULL, content_hash TEXT NOT NULL, -- SHA256[:32] char_offset_start INTEGER, char_offset_end INTEGER, chunk_type TEXT DEFAULT 'semantic', created_at DATETIME DEFAULT CURRENT_TIMESTAMP, UNIQUE(entity_id, chunk_index));Chunk Fields
| Field | Type | Description |
|---|---|---|
id | TEXT | Format: {entity_id}:{index} |
entity_id | TEXT | Parent entity reference |
chunk_index | INTEGER | Position in document (0-based) |
content | TEXT | The text content |
content_hash | TEXT | Hash for deduplication |
char_offset_start | INTEGER | Start position in original |
char_offset_end | INTEGER | End position in original |
chunk_type | TEXT | How chunk was created |
Chunk ID Format
Chunk IDs are predictable and stable:
{entity_id}:{chunk_index}
Example: abc123def456:2 ──────────── │ │ └── Third chunk (0-indexed) └── Entity ID (SHA256 hash prefix)This format enables:
- Precise citations
- Direct chunk retrieval
- Stable references over time
Identity Rules
Source IDs must be stable across content edits:
| Source | source_id Format | Stability |
|---|---|---|
| Local files | Absolute path | Rename = new entity |
| Obsidian | {vault}:{relative_path} | Rename = new entity |
| Chrome bookmarks | Bookmark ID | Stable |
| Notion export | Page UUID from filename | Stable |
Tombstoning
When content is deleted or moved:
- On sync:
last_seen_at = now() - On disappearance:
tombstoned_at = now()
Tombstoned entities remain in the database indefinitely (soft delete). This preserves history and allows recovery if files are temporarily moved.
Full-Text Search Tables
FTS5 virtual tables for fast text search:
-- Entity title and tagsCREATE VIRTUAL TABLE entities_fts USING fts5( title, tags_text, content='entities');
-- Chunk contentCREATE VIRTUAL TABLE chunks_fts USING fts5( content, content='chunks');Triggers keep FTS tables in sync with main tables.
Embeddings Table
Vector embeddings for semantic search (optional):
CREATE TABLE embeddings ( chunk_id TEXT PRIMARY KEY REFERENCES chunks(id), model TEXT NOT NULL, vector BLOB NOT NULL, -- Float32 array dims INTEGER NOT NULL, created_at DATETIME DEFAULT CURRENT_TIMESTAMP);Orchestrator Tables (Beta)
Orchestrator stores coordination data alongside entities and memory.
Agents and Capabilities
CREATE TABLE agents ( id TEXT PRIMARY KEY, name TEXT NOT NULL UNIQUE, agent_type TEXT NOT NULL, status TEXT NOT NULL, registered_at DATETIME, last_heartbeat_at DATETIME, scopes TEXT, metadata JSON);
CREATE TABLE agent_capabilities ( agent_id TEXT NOT NULL, capability TEXT NOT NULL, proficiency TEXT DEFAULT 'standard');Tasks, Dependencies, and Artifacts
CREATE TABLE tasks ( id TEXT PRIMARY KEY, workflow_id TEXT, workflow_step_id TEXT, name TEXT NOT NULL, status TEXT NOT NULL, assigned_agent_id TEXT, input_data JSON, output_summary TEXT, estimated_cost_usd REAL);
CREATE TABLE task_dependencies ( task_id TEXT NOT NULL, depends_on_task_id TEXT NOT NULL, dependency_type TEXT DEFAULT 'completion');
CREATE TABLE task_artifacts ( id TEXT PRIMARY KEY, task_id TEXT NOT NULL, artifact_type TEXT NOT NULL, name TEXT NOT NULL, content_text TEXT, content_blob_path TEXT);Events and Cost Ledger
CREATE TABLE events ( id TEXT PRIMARY KEY, event_type TEXT NOT NULL, agent_id TEXT, task_id TEXT, workflow_id TEXT, payload JSON NOT NULL, published_at DATETIME, expires_at DATETIME);
CREATE TABLE cost_ledger ( id INTEGER PRIMARY KEY AUTOINCREMENT, agent_id TEXT NOT NULL, task_id TEXT, workflow_id TEXT, model TEXT NOT NULL, provider TEXT NOT NULL, tokens_input INTEGER, tokens_output INTEGER, estimated_cost_usd REAL);
CREATE TABLE cost_budgets ( scope_type TEXT NOT NULL, scope_id TEXT, period TEXT NOT NULL, amount_usd REAL NOT NULL);Workflows and Steps
CREATE TABLE workflows ( id TEXT PRIMARY KEY, name TEXT NOT NULL, definition JSON NOT NULL, status TEXT NOT NULL, trigger_type TEXT);
CREATE TABLE workflow_steps ( id TEXT PRIMARY KEY, workflow_id TEXT NOT NULL, step_key TEXT NOT NULL, status TEXT NOT NULL, task_id TEXT);Orchestrator Statuses
Task statuses:
pendingqueuedclaimedrunningcompletedfailedcancelledtimed_out
Workflow statuses:
draftrunningpausedcompletedfailedcancelled
Workflow step statuses:
pendingreadyrunningcompletedfailedskipped
Search Results
Search returns chunks grouped by entity:
{ "results": [ { "entity_id": "abc-123", "entity_title": "Project Notes", "source": "local_files", "uri": "file:///path/to/file.md", "chunks": [ { "chunk_id": "abc-123:2", "content": "...", "score": 0.87, "char_offset_start": 1200, "char_offset_end": 1850 } ] } ], "next_cursor": null}Note: Results use uri (not entity_uri).
Next Steps
- Chunking — How documents are split
- Search — How search uses this model
- Connector SDK — EntityInput and ChunkInput types