Skip to content

Data Model

Hoard stores indexed content in a structured data model with entities (documents) and chunks (text spans).

Schema Overview

┌─────────────────┐ ┌─────────────────┐
│ entities │──────<│ chunks │
│ (documents) │ │ (text spans) │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ entities_fts │ │ chunks_fts │
└─────────────────┘ └─────────────────┘
┌─────────────────┐
│ embeddings │
└─────────────────┘

Entities Table

Entities represent documents — files, notes, bookmarks, etc.

CREATE TABLE entities (
id TEXT PRIMARY KEY,
source TEXT NOT NULL,
source_id TEXT NOT NULL,
entity_type TEXT NOT NULL,
title TEXT,
uri TEXT,
mime_type TEXT,
tags TEXT, -- JSON array
tags_text TEXT, -- Space-joined for FTS
metadata JSON,
sensitivity TEXT DEFAULT 'normal',
created_at DATETIME,
updated_at DATETIME,
synced_at DATETIME,
last_seen_at DATETIME,
tombstoned_at DATETIME,
content_hash TEXT, -- SHA256[:32]
connector_name TEXT, -- Provenance
connector_version TEXT,
UNIQUE(source, source_id)
);

Entity Fields

FieldTypeDescription
idTEXTSHA256 hash of source:source_id (first 32 hex chars)
sourceTEXTConnector source name (e.g., “obsidian”)
source_idTEXTStable ID within source
entity_typeTEXTType: “document”, “note”, “bookmark”
titleTEXTHuman-readable title
uriTEXTLink to original (file://, obsidian://)
mime_typeTEXTMIME type (text/markdown, etc.)
tagsJSONArray of tags
metadataJSONSource-specific fields
sensitivityTEXT”normal”, “sensitive”, or “secret”
content_hashTEXTSHA256 hash for change detection
connector_nameTEXTWhich connector indexed this

Chunks Table

Chunks are text spans within entities — the unit of search and retrieval.

CREATE TABLE chunks (
id TEXT PRIMARY KEY, -- {entity_id}:{chunk_index}
entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
content_hash TEXT NOT NULL, -- SHA256[:32]
char_offset_start INTEGER,
char_offset_end INTEGER,
chunk_type TEXT DEFAULT 'semantic',
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
UNIQUE(entity_id, chunk_index)
);

Chunk Fields

FieldTypeDescription
idTEXTFormat: {entity_id}:{index}
entity_idTEXTParent entity reference
chunk_indexINTEGERPosition in document (0-based)
contentTEXTThe text content
content_hashTEXTHash for deduplication
char_offset_startINTEGERStart position in original
char_offset_endINTEGEREnd position in original
chunk_typeTEXTHow chunk was created

Chunk ID Format

Chunk IDs are predictable and stable:

{entity_id}:{chunk_index}
Example: abc123def456:2
──────────── │
│ └── Third chunk (0-indexed)
└── Entity ID (SHA256 hash prefix)

This format enables:

  • Precise citations
  • Direct chunk retrieval
  • Stable references over time

Identity Rules

Source IDs must be stable across content edits:

Sourcesource_id FormatStability
Local filesAbsolute pathRename = new entity
Obsidian{vault}:{relative_path}Rename = new entity
Chrome bookmarksBookmark IDStable
Notion exportPage UUID from filenameStable

Tombstoning

When content is deleted or moved:

  1. On sync: last_seen_at = now()
  2. On disappearance: tombstoned_at = now()

Tombstoned entities remain in the database indefinitely (soft delete). This preserves history and allows recovery if files are temporarily moved.

Full-Text Search Tables

FTS5 virtual tables for fast text search:

-- Entity title and tags
CREATE VIRTUAL TABLE entities_fts USING fts5(
title, tags_text, content='entities'
);
-- Chunk content
CREATE VIRTUAL TABLE chunks_fts USING fts5(
content, content='chunks'
);

Triggers keep FTS tables in sync with main tables.

Embeddings Table

Vector embeddings for semantic search (optional):

CREATE TABLE embeddings (
chunk_id TEXT PRIMARY KEY REFERENCES chunks(id),
model TEXT NOT NULL,
vector BLOB NOT NULL, -- Float32 array
dims INTEGER NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Orchestrator Tables (Beta)

Orchestrator stores coordination data alongside entities and memory.

Agents and Capabilities

CREATE TABLE agents (
id TEXT PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
agent_type TEXT NOT NULL,
status TEXT NOT NULL,
registered_at DATETIME,
last_heartbeat_at DATETIME,
scopes TEXT,
metadata JSON
);
CREATE TABLE agent_capabilities (
agent_id TEXT NOT NULL,
capability TEXT NOT NULL,
proficiency TEXT DEFAULT 'standard'
);

Tasks, Dependencies, and Artifacts

CREATE TABLE tasks (
id TEXT PRIMARY KEY,
workflow_id TEXT,
workflow_step_id TEXT,
name TEXT NOT NULL,
status TEXT NOT NULL,
assigned_agent_id TEXT,
input_data JSON,
output_summary TEXT,
estimated_cost_usd REAL
);
CREATE TABLE task_dependencies (
task_id TEXT NOT NULL,
depends_on_task_id TEXT NOT NULL,
dependency_type TEXT DEFAULT 'completion'
);
CREATE TABLE task_artifacts (
id TEXT PRIMARY KEY,
task_id TEXT NOT NULL,
artifact_type TEXT NOT NULL,
name TEXT NOT NULL,
content_text TEXT,
content_blob_path TEXT
);

Events and Cost Ledger

CREATE TABLE events (
id TEXT PRIMARY KEY,
event_type TEXT NOT NULL,
agent_id TEXT,
task_id TEXT,
workflow_id TEXT,
payload JSON NOT NULL,
published_at DATETIME,
expires_at DATETIME
);
CREATE TABLE cost_ledger (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_id TEXT NOT NULL,
task_id TEXT,
workflow_id TEXT,
model TEXT NOT NULL,
provider TEXT NOT NULL,
tokens_input INTEGER,
tokens_output INTEGER,
estimated_cost_usd REAL
);
CREATE TABLE cost_budgets (
scope_type TEXT NOT NULL,
scope_id TEXT,
period TEXT NOT NULL,
amount_usd REAL NOT NULL
);

Workflows and Steps

CREATE TABLE workflows (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
definition JSON NOT NULL,
status TEXT NOT NULL,
trigger_type TEXT
);
CREATE TABLE workflow_steps (
id TEXT PRIMARY KEY,
workflow_id TEXT NOT NULL,
step_key TEXT NOT NULL,
status TEXT NOT NULL,
task_id TEXT
);

Orchestrator Statuses

Task statuses:

  • pending
  • queued
  • claimed
  • running
  • completed
  • failed
  • cancelled
  • timed_out

Workflow statuses:

  • draft
  • running
  • paused
  • completed
  • failed
  • cancelled

Workflow step statuses:

  • pending
  • ready
  • running
  • completed
  • failed
  • skipped

Search Results

Search returns chunks grouped by entity:

{
"results": [
{
"entity_id": "abc-123",
"entity_title": "Project Notes",
"source": "local_files",
"uri": "file:///path/to/file.md",
"chunks": [
{
"chunk_id": "abc-123:2",
"content": "...",
"score": 0.87,
"char_offset_start": 1200,
"char_offset_end": 1850
}
]
}
],
"next_cursor": null
}

Note: Results use uri (not entity_uri).

Next Steps