Search
Hoard uses hybrid search combining keyword (BM25) and semantic (vector) matching for optimal results.
Hybrid Search Pipeline
┌─────────────────────────────────────────┐
│         Query: "meeting notes"          │
└───────────────────┬─────────────────────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
┌───────────────┐       ┌───────────────┐
│  BM25 Search  │       │ Vector Search │
│  (keywords)   │       │  (semantic)   │
└───────────────┘       └───────────────┘
        │                       │
        └───────────┬───────────┘
                    ▼
        ┌───────────────────────┐
        │ Reciprocal Rank Fusion│
        │   (merge rankings)    │
        └───────────────────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │    Group by Entity    │
        └───────────────────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │    Return Results     │
        └───────────────────────┘
Unified Results (Documents + Memory)
Search can return both indexed documents and memory entries. Use types in MCP or --types / --no-memory in the CLI to filter.
Results include:
- result_type to distinguish entity vs memory
- source for the connector name or memory
- One chunk for memory entries
BM25 Search
BM25 (Best Matching 25) is a keyword-based ranking algorithm:
- Exact matching — Finds documents containing query terms
- Term frequency — More occurrences = higher rank
- Document length normalization — Long docs don’t dominate
- IDF weighting — Rare terms matter more
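To make the four properties above concrete, here is a minimal pure-Python sketch of the textbook BM25 formula. It is illustrative only: Hoard delegates scoring to SQLite FTS5, and the function name `bm25_scores`, the whitespace tokenizer, and the parameter values `k1=1.2`, `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook BM25 (higher is better)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # exact matching: absent terms contribute nothing
            # IDF weighting: rare terms get a larger weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization: long documents are penalized via len/avg_len.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "meeting notes from the project meeting",
    "conference agenda",
    "notes",
]
scores = bm25_scores("meeting notes", docs)
```

The second document scores zero because it shares no terms with the query, which is the literal-matching weakness described later in this section.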
Implemented via SQLite FTS5:
SELECT * FROM chunks_fts
WHERE chunks_fts MATCH 'meeting notes'
ORDER BY rank;
Strengths:
- Fast — Pure SQL
- Precise — Exact keyword matches
- No model needed — Works immediately
Weaknesses:
- Literal — “meeting” won’t find “conference”
- No synonyms — Requires exact terms
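The FTS5 query in this section can be tried from Python's standard `sqlite3` module. This is a sketch under assumptions: the single `content` column and the helper names are hypothetical, not Hoard's actual schema, and it requires an SQLite build with FTS5 enabled (most Python distributions ship one).

```python
import sqlite3


def build_demo_index():
    """Create an in-memory FTS5 table with a few sample chunks (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(content)")
    conn.executemany(
        "INSERT INTO chunks_fts (content) VALUES (?)",
        [
            ("Notes from the weekly meeting",),
            ("Conference agenda and speakers",),
            ("Meeting notes: budget review meeting",),
        ],
    )
    return conn


def search_chunks(conn, query):
    """Return matching chunk texts, best match first (FTS5 orders by its rank column)."""
    rows = conn.execute(
        "SELECT content FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_index()
results = search_chunks(conn, "meeting notes")
```

Only the two chunks containing both "meeting" and "notes" come back; the conference chunk is skipped, which illustrates the literal-matching weakness listed above.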
Vector Search
Vector search uses embeddings for semantic similarity:
- Embed query — Convert to vector (e.g., 384 dimensions)
- Compare — Find nearest chunk vectors
- Rank — Order by cosine similarity
Default model: sentence-transformers/all-MiniLM-L6-v2
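The compare and rank steps can be sketched as a brute-force cosine-similarity scan. The toy 3-dimensional vectors below stand in for real embeddings (which would have, for example, 384 dimensions), and the names `cosine_similarity` and `rank_chunks` are hypothetical, not part of Hoard's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, limit=5):
    """Score every chunk vector against the query and return the top matches."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy vectors standing in for embeddings of three chunks.
chunk_vecs = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
ranked = rank_chunks([1.0, 0.0, 0.0], chunk_vecs)
```

Chunk "b" ranks just below "a" even though its vector is not identical to the query, which is the fuzzy, meaning-based behavior that keyword matching lacks.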
Strengths:
- Semantic — “meeting” finds “conference”
- Conceptual — Understands meaning
- Fuzzy — Handles paraphrasing
Weaknesses:
- Requires model download (~90MB)
- Slower than pure keyword
- May miss exact matches
Reciprocal Rank Fusion
RRF merges BM25 and vector rankings:
RRF_score = 1/(k + rank_bm25) + 1/(k + rank_vector)
Where k = 60 (standard constant)
This ensures:
- Documents ranked highly by both methods score best
- Neither method dominates unfairly
- Diverse results from both approaches
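A short sketch of the fusion step, using the formula above with 1-based ranks. One assumption is labeled in the code: a chunk that appears in only one ranking simply gets no contribution from the other, which is the usual RRF convention but is not stated in this document. The function name `rrf_merge` is hypothetical.

```python
def rrf_merge(bm25_ids, vector_ids, k=60):
    """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Assumption: a chunk missing from one list contributes 0 for that list.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


bm25_ids = ["a", "b", "c"]    # keyword ranking, best first
vector_ids = ["b", "d", "a"]  # semantic ranking, best first
merged = rrf_merge(bm25_ids, vector_ids)
# Order of merged IDs: b, a, d, c
```

Here "b" (ranks 2 and 1) scores 1/62 + 1/61 and edges out "a" (ranks 1 and 3) at 1/61 + 1/63, while "c" and "d", each found by only one method, trail behind.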
Corpus Size & Prefilter
| Size | Chunks | Backend |
|---|---|---|
| Any | All sizes | SQLite brute-force |
Hoard uses SQLite for all vector operations (no external vector DBs like FAISS or Chroma).
Prefilter Strategy
When corpus exceeds 50,000 chunks:
- BM25 retrieves top candidates (configurable via prefilter_limit)
- Vector search runs on candidates only
- Results merged with RRF
This avoids scanning all embeddings for large corpora.
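The candidate-selection decision can be sketched as follows. The function name `vector_candidates` and its parameters are hypothetical; only the 50,000-chunk threshold and the `prefilter_limit` setting come from this page.

```python
def vector_candidates(all_chunk_ids, bm25_ranked_ids, threshold=50_000, prefilter_limit=1000):
    """Pick which chunks the vector search should score.

    Small corpus: scan every chunk. Large corpus: scan only the top BM25 hits.
    """
    if len(all_chunk_ids) <= threshold:
        return list(all_chunk_ids)
    return list(bm25_ranked_ids[:prefilter_limit])


# Tiny numbers so the behavior is easy to see.
all_ids = list(range(10))
bm25_ranked = [3, 1, 7, 2]
small = vector_candidates(all_ids, bm25_ranked, threshold=50)
large = vector_candidates(all_ids, bm25_ranked, threshold=5, prefilter_limit=2)
```

The trade-off is that, above the threshold, a chunk with no keyword overlap never reaches the vector stage, so raising `prefilter_limit` trades speed for semantic recall.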
Search Options
Basic Search
hoard search "meeting notes"
Limit Results
hoard search "query" --limit 5
Filter by Source
hoard search "query" --source obsidian
Filter by Result Type
hoard search "query" --types entity
hoard search "query" --types memory
hoard search "query" --no-memory
Search via MCP
The search tool accepts:
{
  "name": "search",
  "arguments": {
    "query": "meeting notes",
    "limit": 20,
    "types": ["entity", "memory"],
    "source": "obsidian"
  }
}
Result Format
Results are grouped by entity:
{
  "results": [
    {
      "result_type": "entity",
      "entity_id": "abc-123",
      "entity_title": "Project Notes",
      "source": "local_files",
      "uri": "file:///path/to/file.md",
      "chunks": [
        {
          "chunk_id": "abc-123:2",
          "content": "In the meeting, we discussed...",
          "score": 0.87,
          "char_offset_start": 1200,
          "char_offset_end": 1850
        }
      ]
    },
    {
      "result_type": "memory",
      "entity_id": "5421d0503fadb55a413761f3745891ac",
      "entity_title": "user_preferences",
      "source": "memory",
      "memory_key": "user_preferences",
      "chunks": [
        {
          "chunk_id": "5421d0503fadb55a413761f3745891ac",
          "content": "Prefers concise responses.",
          "score": 0.71
        }
      ]
    }
  ],
  "next_cursor": null
}
Note: Results use uri (not entity_uri) and do not include totals. Use next_cursor for pagination.
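The grouping step that produces this shape can be sketched as below. This is an assumption-laden illustration, not Hoard's implementation: the function name `group_by_entity` is hypothetical, and ordering entities by their best-scoring chunk is a guess at reasonable behavior. The cap of 3 mirrors the `max_chunks_per_entity` setting.

```python
from collections import defaultdict


def group_by_entity(chunks, max_chunks_per_entity=3):
    """Group scored chunks under their entity, keeping only the best few per entity."""
    grouped = defaultdict(list)
    # Visit chunks best-first so each entity keeps its highest-scoring chunks,
    # and entities end up ordered by their best chunk (dicts preserve insertion order).
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        bucket = grouped[chunk["entity_id"]]
        if len(bucket) < max_chunks_per_entity:
            bucket.append(chunk)
    return [{"entity_id": entity_id, "chunks": kept} for entity_id, kept in grouped.items()]


chunks = [
    {"entity_id": "e1", "chunk_id": "e1:0", "score": 0.5},
    {"entity_id": "e2", "chunk_id": "e2:0", "score": 0.8},
    {"entity_id": "e1", "chunk_id": "e1:1", "score": 0.9},
    {"entity_id": "e1", "chunk_id": "e1:2", "score": 0.4},
]
results = group_by_entity(chunks, max_chunks_per_entity=2)
```

With a cap of 2, entity "e1" keeps its chunks scored 0.9 and 0.5 and drops the 0.4 chunk, and it is listed before "e2" because its best chunk scores higher.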
Enabling Vector Search
Vector search is optional. To enable:
# Install dependencies
pip install hoard[vectors]

# Build embeddings
hoard embeddings build
Search Tips
- Use multiple terms — “project meeting notes” beats “notes”
- Be specific — Include distinguishing words
- Try variations — If no results, try synonyms
- Check sync — New content needs hoard sync
Performance Tuning
In ~/.hoard/config.yaml:
search:
  rrf_k: 60                 # RRF constant (higher = more even blending)
  max_chunks_per_entity: 3  # Max chunks returned per entity

vectors:
  prefilter_limit: 1000     # BM25 candidates when corpus > 50K chunks
Next Steps
- MCP Interface — How AI tools use search
- Configuration — Search settings
- MCP Tools — All search parameters