# NihAI: Self-Building Medical FAQ (Weaviate-backed)
NihAI is a RAG system purpose-built for concise, medically safe FAQ answers. It first searches a curated FAQ collection. If coverage is insufficient, an LLM agent expands the knowledge base by proposing new canonical questions, answering them via Perplexity with citations, upserting the results to Weaviate, and then re-running RAG.
## System Overview

- Storage: Weaviate v4 Collections API, no server-side vectorizers. The client provides vectors.
- Indexing: HNSW + cosine. The vector is computed from `[condition_area] question + "\n\n" + answer` using `all-MiniLM-L6-v2` (CPU device for stability on macOS). The `condition_area` prefix (when available) helps cluster related medical conditions.
- API: FastAPI routes under `/search/nihai/*`.
- Ingestion: JSON and Markdown via CLI or programmatic helpers.
- Governance: auto-ingested answers remain unverified until reviewed by approvers.
- Agent: an LLM-managed Ask flow decides when to search, synthesize, and write new FAQs.
- UI: see NihAI UI for a local React admin/explorer running on `http://localhost:5176` that talks to the backend at `http://localhost:8000`.
## Collections and Schema

### `NihAI` collection

- `question: text` – canonical FAQ prompt
- `answer: text` – concise, safety-forward answer
- `references: text[]` – canonicalized identifiers/URLs (e.g., `https://...`, `nihai://<uuid>`)
- `is_verified: boolean` – doctor-reviewed status (default `false`)
- `approvers: text` – comma-separated reviewer names
- `source: text` – e.g., `perplexity`, `import`, `md_ingest`
- `accuracy_confidence: number` – 0.0–1.0 heuristic for review priority
- `is_active: boolean` – soft delete/supersede flag
- `condition_area: text` – high-level medical condition or area (e.g., prenatal, diabetes, cardiology)
- `topic: text` – subtopic within the condition area (e.g., nutrition, medication)
- Vector: provided by the client; computed from `[condition_area] question + "\n\n" + answer`; HNSW + cosine distance
### `Reference` collection

- `title: text`
- `url: text` – canonical locator (use `nihai://<uuid>` for internal references)
- `source: text` – domain or logical source (e.g., `mayoclinic.org`, `NihAI`)
- `content_snippet: text` – optional excerpt

Entries keep a denormalized `references` list for export/back-compat. Query-time expansion into full `Reference` objects is optional.
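A hedged sketch of creating the `NihAI` collection with the weaviate-client v4 Collections API, with client-supplied vectors and HNSW + cosine as described above. The connection call and exact index settings are assumptions, not the project's actual bootstrap code:

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property, VectorDistances

client = weaviate.connect_to_local()  # assumes a local instance
client.collections.create(
    "NihAI",
    vectorizer_config=Configure.Vectorizer.none(),  # client provides vectors
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE
    ),
    properties=[
        Property(name="question", data_type=DataType.TEXT),
        Property(name="answer", data_type=DataType.TEXT),
        Property(name="references", data_type=DataType.TEXT_ARRAY),
        Property(name="is_verified", data_type=DataType.BOOL),
        Property(name="approvers", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="accuracy_confidence", data_type=DataType.NUMBER),
        Property(name="is_active", data_type=DataType.BOOL),
        Property(name="condition_area", data_type=DataType.TEXT),
        Property(name="topic", data_type=DataType.TEXT),
    ],
)
client.close()
```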
## Retrieval Flow

1. Compute the query embedding with the same model used for entries. If `condition_area` is specified, it is prepended to the query as `[condition_area] query` for better semantic matching.
2. Run KNN against `NihAI` (top-k). Apply a similarity threshold and optional filters (`condition_area`, `topic`).
3. If the top score ≥ threshold, return results directly.
4. Otherwise, trigger knowledge expansion.

Tunable parameters: top-k, similarity threshold, search-time `ef`, and HNSW `M`.
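The decision in steps 3–4 can be sketched as a small pure function. Note that Weaviate reports cosine *distance*, so similarity is `1 - distance`; the threshold value below is an assumption to tune, not a project constant:

```python
SIM_THRESHOLD = 0.75  # assumption: tuned per corpus

def decide(hits: list[tuple[float, dict]]) -> str:
    """hits: (cosine_distance, properties) pairs, best match first,
    e.g. from collection.query.near_vector(...) with distance metadata."""
    if hits and (1.0 - hits[0][0]) >= SIM_THRESHOLD:
        return "answer"   # step 3: coverage is sufficient, return directly
    return "expand"       # step 4: trigger knowledge expansion

assert decide([(0.1, {"question": "..."})]) == "answer"   # similarity 0.9
assert decide([(0.6, {"question": "..."})]) == "expand"   # similarity 0.4
```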
## Ask Agent (LLM-managed) and Tools

Tools used by the Ask agent:

- `search_nihai` – non-writing RAG search against `NihAI`
- `search_perplexity` – external web search to synthesize a concise, safety-forward answer with sources
- `write_nihai` – write a new canonical Q&A into `NihAI`
- `depricate_nihai_faq` – mark entries inactive when consolidating narrow items

Agent behavior:

1. Attempt to answer with `search_nihai` using the user's latest question.
2. If insufficient, generate 1–3 candidate canonical questions (varying specificity) and test each with `search_nihai`.
3. If still insufficient, select one well-scoped canonical question and call `search_perplexity` to draft an answer with citations.
4. Call `write_nihai` to upsert the Q&A (dedupe by UUID v5) and normalize references. Prefer balanced, general canonical prompts (e.g., "what fruits can I eat during pregnancy"). When a general entry supersedes narrower ones, mark them inactive with `depricate_nihai_faq`.
5. Re-run `search_nihai` on the original question and return the final results.
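The behavior above can be sketched as a plain control loop. The tool callables are stand-ins for the real `search_nihai` / `search_perplexity` / `write_nihai` tools, and the helper signature is illustrative:

```python
def ask(question, search_nihai, search_perplexity, write_nihai, candidates):
    """Sketch of the Ask agent's decision flow (steps 1-5 above)."""
    hits = search_nihai(question)          # 1. try the knowledge base directly
    if hits:
        return hits
    for cand in candidates(question):      # 2. test candidate canonical questions
        hits = search_nihai(cand)
        if hits:
            return hits
    canonical = next(iter(candidates(question)), question)
    draft = search_perplexity(canonical)   # 3. draft an answer with citations
    write_nihai(canonical, draft["answer"], draft["references"])  # 4. upsert
    return search_nihai(question)          # 5. re-run RAG on the original question
```

In the real agent, the LLM also decides when to consolidate narrow entries with `depricate_nihai_faq`; that branch is omitted here.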
## API Endpoints

- `POST /search/nihai/search` – direct RAG search (no writes)
  - Body: `{ "query": str, "max_results"?: int, "collection"?: str, "condition_area"?: str, "topic"?: str }`
- `POST /search/nihai/ask` – expand-and-answer (LLM agent; may write)
  - Body (agent-style): see example payload in code docs
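As a sketch, the direct search endpoint can be exercised with `curl` against the local backend (port per the UI section above; the example payload values are illustrative):

```shell
curl -s -X POST http://localhost:8000/search/nihai/search \
  -H "Content-Type: application/json" \
  -d '{"query": "what fruits can I eat during pregnancy", "max_results": 3, "condition_area": "prenatal"}'
```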
## Ingestion

- JSON import: `python -m search.nihai.ingest --json /abs/path/seed.json --collection NihAI`
- JSON export (Git-friendly): `python -m search.nihai.export --out /abs/path/seed.json --collection NihAI`
- Markdown: `python -m search.nihai.md_ingest ./notes.md --collection NihAI`
- Programmatic helpers: `insert_nihai_entry`, `batch_insert_from_json`

Before running the CLIs, activate the venv so dependencies and environment variables are loaded.
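For a typical layout this looks like the following (the `.venv` path is an assumption; adjust to where your venv lives):

```shell
# create the venv once (skip if it already exists)
python3 -m venv .venv
# activate it so the CLI modules and weaviate-client are importable
source .venv/bin/activate
```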
## UUID Handling

- Deterministic UUIDs: all entries use UUID v5 identifiers generated from the question text via `uuid.uuid5(uuid.NAMESPACE_URL, question)`. This ensures consistency across imports and prevents duplicates.
- Weaviate compatibility: previous versions used raw SHA-256 hashes, which caused "422 Unprocessable Entity" errors from Weaviate. The UUID v5 format is now required for proper Weaviate integration.
- Deduplication: because UUIDs are deterministic in the question text, reimporting the same data updates existing entries rather than creating duplicates.
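For example (the helper name is illustrative, not from the codebase):

```python
import uuid

def nihai_uuid(question: str) -> str:
    """Deterministic, name-based UUID v5 for a canonical question."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, question))

# The same question always yields the same UUID, so re-imports upsert in place.
assert nihai_uuid("what fruits can I eat during pregnancy") == \
       nihai_uuid("what fruits can I eat during pregnancy")
```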
## Reference Flow (Perplexity → NihAI)

The flow from Perplexity's API to Weaviate storage:

```mermaid
sequenceDiagram
    participant User
    participant PAPI as Perplexity API
    participant PerplexityMedicalAPI
    participant Models as Typed Models
    participant W as Weaviate (NihAI)
    User->>PerplexityMedicalAPI: search_medical(query)
    PerplexityMedicalAPI->>PAPI: POST /chat/completions
    PAPI-->>PerplexityMedicalAPI: JSON {choices, citations, search_results}
    PerplexityMedicalAPI->>Models: parse to PerplexityMedicalResponse
    Note right of Models: sources: list[PerplexityReference]
    Models-->>PerplexityMedicalAPI: PerplexityMedicalResponse
    PerplexityMedicalAPI-->>W: write_nihai_faq(question, answer, references)
    W->>W: normalize_references_for_storage()
    W-->>W: store TEXT_ARRAY of JSON strings
```

See `docs/search/perplexity_reference_flow.md` for a detailed, step-by-step breakdown.
## Configuration

- Env: `WEAVIATE_HOST`, `WEAVIATE_HTTP_PORT`, `WEAVIATE_GRPC_PORT`.
- Service (local): Docker `weaviate` service exposing HTTP 8282 → 8080, gRPC 50051 → 50051.
- Python: `weaviate-client` v4 in `requirements.txt`.
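A sketch of how the backend might resolve these settings; the default values follow the local Docker mapping above, and the constant names are illustrative:

```python
import os

# Defaults match the local Docker mapping: HTTP 8282, gRPC 50051.
WEAVIATE_HOST = os.getenv("WEAVIATE_HOST", "localhost")
WEAVIATE_HTTP_PORT = int(os.getenv("WEAVIATE_HTTP_PORT", "8282"))
WEAVIATE_GRPC_PORT = int(os.getenv("WEAVIATE_GRPC_PORT", "50051"))

# With weaviate-client v4, a connection would then look like:
#   client = weaviate.connect_to_local(host=WEAVIATE_HOST,
#                                      port=WEAVIATE_HTTP_PORT,
#                                      grpc_port=WEAVIATE_GRPC_PORT)
```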
## Operations & Maintenance

- Backups: snapshot the `weaviate_data` volume.
- Observability: track ingestion counts, search latency, and errors.
- Health: readiness checks on Weaviate.
- Tasks: deduplicate near-duplicates; split overly broad entries; flag missing/outdated references for review.
- Safety: automatic changes keep `is_verified=false` and preserve history. Approvers can flip verification and record their names.
## Testing
- Round-trip validation: ingest sample JSON and verify retrieval.
- API tests should hit endpoints (not modules) and fail hard if services are down.
- Performance: verify embedding + KNN latency and tune HNSW params.