Data Model

Hoard stores indexed content in a structured data model with entities (documents) and chunks (text spans).

Schema Overview

┌─────────────────┐       ┌─────────────────┐
│    entities     │──────<│     chunks      │
│  (documents)    │       │  (text spans)   │
└─────────────────┘       └─────────────────┘
         │                        │
         ▼                        ▼
┌─────────────────┐       ┌─────────────────┐
│  entities_fts   │       │   chunks_fts    │
└─────────────────┘       └─────────────────┘
                                  │
                                  ▼
                          ┌─────────────────┐
                          │   embeddings    │
                          └─────────────────┘

Entities Table

Entities represent documents — files, notes, bookmarks, etc.

CREATE TABLE entities (
    id TEXT PRIMARY KEY,
    source TEXT NOT NULL,
    source_id TEXT NOT NULL,
    entity_type TEXT NOT NULL,
    title TEXT,
    uri TEXT,
    mime_type TEXT,
    tags TEXT,                    -- JSON array
    tags_text TEXT,               -- Space-joined for FTS
    metadata JSON,
    sensitivity TEXT DEFAULT 'normal',
    created_at DATETIME,
    updated_at DATETIME,
    synced_at DATETIME,
    last_seen_at DATETIME,
    tombstoned_at DATETIME,
    content_hash TEXT,            -- SHA256[:32]
    connector_name TEXT,          -- Provenance
    connector_version TEXT,
    UNIQUE(source, source_id)
);

Entity Fields

Field	Type	Description
`id`	TEXT	SHA256 hash of `source:source_id` (first 32 hex chars)
`source`	TEXT	Connector source name (e.g., “obsidian”)
`source_id`	TEXT	Stable ID within source
`entity_type`	TEXT	Type: “document”, “note”, “bookmark”
`title`	TEXT	Human-readable title
`uri`	TEXT	Link to original (file://, obsidian://)
`mime_type`	TEXT	MIME type (text/markdown, etc.)
`tags`	JSON	Array of tags
`metadata`	JSON	Source-specific fields
`sensitivity`	TEXT	”normal”, “sensitive”, or “secret”
`content_hash`	TEXT	SHA256 hash for change detection
`connector_name`	TEXT	Which connector indexed this

Chunks Table

Chunks are text spans within entities — the unit of search and retrieval.

CREATE TABLE chunks (
    id TEXT PRIMARY KEY,          -- {entity_id}:{chunk_index}
    entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    content_hash TEXT NOT NULL,   -- SHA256[:32]
    char_offset_start INTEGER,
    char_offset_end INTEGER,
    chunk_type TEXT DEFAULT 'semantic',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(entity_id, chunk_index)
);

Chunk Fields

Field	Type	Description
`id`	TEXT	Format: `{entity_id}:{index}`
`entity_id`	TEXT	Parent entity reference
`chunk_index`	INTEGER	Position in document (0-based)
`content`	TEXT	The text content
`content_hash`	TEXT	Hash for deduplication
`char_offset_start`	INTEGER	Start position in original
`char_offset_end`	INTEGER	End position in original
`chunk_type`	TEXT	How chunk was created

Chunk ID Format

Chunk IDs are predictable and stable:

{entity_id}:{chunk_index}

Example: abc123def456:2
         ────────────  │
              │        └── Third chunk (0-indexed)
              └── Entity ID (SHA256 hash prefix)

This format enables:

Precise citations
Direct chunk retrieval
Stable references over time

Identity Rules

Source IDs must be stable across content edits:

Source	source_id Format	Stability
Local files	Absolute path	Rename = new entity
Obsidian	`{vault}:{relative_path}`	Rename = new entity
Chrome bookmarks	Bookmark ID	Stable
Notion export	Page UUID from filename	Stable

Tombstoning

When content is deleted or moved:

On sync: last_seen_at = now()
On disappearance: tombstoned_at = now()

Tombstoned entities remain in the database indefinitely (soft delete). This preserves history and allows recovery if files are temporarily moved.

Full-Text Search Tables

FTS5 virtual tables for fast text search:

-- Entity title and tags
CREATE VIRTUAL TABLE entities_fts USING fts5(
    title, tags_text, content='entities'
);

-- Chunk content
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    content, content='chunks'
);

Triggers keep FTS tables in sync with main tables.

Embeddings Table

Vector embeddings for semantic search (optional):

CREATE TABLE embeddings (
    chunk_id TEXT PRIMARY KEY REFERENCES chunks(id),
    model TEXT NOT NULL,
    vector BLOB NOT NULL,         -- Float32 array
    dims INTEGER NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Orchestrator Tables (Beta)

Orchestrator stores coordination data alongside entities and memory.

Agents and Capabilities

CREATE TABLE agents (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    agent_type TEXT NOT NULL,
    status TEXT NOT NULL,
    registered_at DATETIME,
    last_heartbeat_at DATETIME,
    scopes TEXT,
    metadata JSON
);

CREATE TABLE agent_capabilities (
    agent_id TEXT NOT NULL,
    capability TEXT NOT NULL,
    proficiency TEXT DEFAULT 'standard'
);

Tasks, Dependencies, and Artifacts

CREATE TABLE tasks (
    id TEXT PRIMARY KEY,
    workflow_id TEXT,
    workflow_step_id TEXT,
    name TEXT NOT NULL,
    status TEXT NOT NULL,
    assigned_agent_id TEXT,
    input_data JSON,
    output_summary TEXT,
    estimated_cost_usd REAL
);

CREATE TABLE task_dependencies (
    task_id TEXT NOT NULL,
    depends_on_task_id TEXT NOT NULL,
    dependency_type TEXT DEFAULT 'completion'
);

CREATE TABLE task_artifacts (
    id TEXT PRIMARY KEY,
    task_id TEXT NOT NULL,
    artifact_type TEXT NOT NULL,
    name TEXT NOT NULL,
    content_text TEXT,
    content_blob_path TEXT
);

Events and Cost Ledger

CREATE TABLE events (
    id TEXT PRIMARY KEY,
    event_type TEXT NOT NULL,
    agent_id TEXT,
    task_id TEXT,
    workflow_id TEXT,
    payload JSON NOT NULL,
    published_at DATETIME,
    expires_at DATETIME
);

CREATE TABLE cost_ledger (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_id TEXT NOT NULL,
    task_id TEXT,
    workflow_id TEXT,
    model TEXT NOT NULL,
    provider TEXT NOT NULL,
    tokens_input INTEGER,
    tokens_output INTEGER,
    estimated_cost_usd REAL
);

CREATE TABLE cost_budgets (
    scope_type TEXT NOT NULL,
    scope_id TEXT,
    period TEXT NOT NULL,
    amount_usd REAL NOT NULL
);

Workflows and Steps

CREATE TABLE workflows (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    definition JSON NOT NULL,
    status TEXT NOT NULL,
    trigger_type TEXT
);

CREATE TABLE workflow_steps (
    id TEXT PRIMARY KEY,
    workflow_id TEXT NOT NULL,
    step_key TEXT NOT NULL,
    status TEXT NOT NULL,
    task_id TEXT
);

Orchestrator Statuses

Task statuses:

pending
queued
claimed
running
completed
failed
cancelled
timed_out

Workflow statuses:

draft
running
paused
completed
failed
cancelled

Workflow step statuses:

pending
ready
running
completed
failed
skipped

Search Results

Search returns chunks grouped by entity:

{
  "results": [
    {
      "entity_id": "abc-123",
      "entity_title": "Project Notes",
      "source": "local_files",
      "uri": "file:///path/to/file.md",
      "chunks": [
        {
          "chunk_id": "abc-123:2",
          "content": "...",
          "score": 0.87,
          "char_offset_start": 1200,
          "char_offset_end": 1850
        }
      ]
    }
  ],
  "next_cursor": null
}

Note: Results use uri (not entity_uri).

Next Steps

Chunking — How documents are split
Search — How search uses this model
Connector SDK — EntityInput and ChunkInput types