# NihAI: Self-Building Medical FAQ (Weaviate-backed)
NihAI is a RAG system purpose-built for concise, medically safe FAQ answers. It first searches a curated FAQ collection. If coverage is insufficient, an LLM agent expands the knowledge base by proposing new canonical questions, answering them via Perplexity with citations, upserting them into Weaviate, and then re-running RAG.
## System Overview
- Storage: Weaviate v4 Collections API, no server-side vectorizers. Client provides vectors.
- Indexing: HNSW + cosine. The vector is computed from `[condition_area] question + "\n\n" + answer` using `all-MiniLM-L6-v2` (CPU device for stability on macOS). The `condition_area` prefix (when available) helps cluster related medical conditions.
- API: FastAPI routes under `/search/nihai/*`.
- Ingestion: JSON and Markdown via CLI or programmatic helpers.
- Governance: Auto-ingested answers are unverified until reviewed by approvers.
- Agent: LLM-managed Ask flow decides when to search, synthesize, and write new FAQs.
- UI: See NihAI UI for a local React admin/explorer running on `http://localhost:5176` that talks to the backend at `http://localhost:8000`.
## Collections and Schema
### NihAI collection
- `question: text` – canonical FAQ prompt
- `answer: text` – concise, safety-forward answer
- `references: text[]` – canonicalized identifiers/URLs (e.g., `https://...`, `nihai://<uuid>`)
- `is_verified: boolean` – doctor-reviewed status (default false)
- `approvers: text` – comma-separated reviewer names
- `source: text` – e.g., `perplexity`, `import`, `md_ingest`
- `accuracy_confidence: number` – 0.0–1.0 heuristic for review priority
- `is_active: boolean` – soft delete/supersede flag
- `condition_area: text` – high-level medical condition or area (e.g., prenatal, diabetes, cardiology)
- `topic: text` – subtopic within the condition area (e.g., nutrition, medication)
- Vector: provided by the client; computed from `[condition_area] question + "\n\n" + answer`; HNSW + cosine distance
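
A minimal sketch of creating this collection with the v4 Collections API (the Reference collection below follows the same pattern). It assumes a local Weaviate on the ports listed under Configuration; the actual creation code in the repo remains the source of truth.

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property, VectorDistances

client = weaviate.connect_to_local(port=8282, grpc_port=50051)

# No server-side vectorizer: vectors are supplied by the client at insert time.
client.collections.create(
    name="NihAI",
    vectorizer_config=Configure.Vectorizer.none(),
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE,
    ),
    properties=[
        Property(name="question", data_type=DataType.TEXT),
        Property(name="answer", data_type=DataType.TEXT),
        Property(name="references", data_type=DataType.TEXT_ARRAY),
        Property(name="is_verified", data_type=DataType.BOOL),
        Property(name="approvers", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="accuracy_confidence", data_type=DataType.NUMBER),
        Property(name="is_active", data_type=DataType.BOOL),
        Property(name="condition_area", data_type=DataType.TEXT),
        Property(name="topic", data_type=DataType.TEXT),
    ],
)
client.close()
```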
### Reference collection
- `title: text`
- `url: text` – canonical locator (use `nihai://<uuid>` for internal)
- `source: text` – domain or logical source (e.g., `mayoclinic.org`, `NihAI`)
- `content_snippet: text` – optional excerpt
Entries keep a denormalized `references` list for export and backward compatibility. Query-time expansion into full `Reference` objects is optional.
## Retrieval Flow
- Compute the query embedding with the same model used for entries. If `condition_area` is specified, it is prepended to the query as `[condition_area] query` for better semantic matching.
- KNN against `NihAI` (top-k). Apply a similarity threshold and optional filters (`condition_area`, `topic`).
- If the top score ≥ threshold → return results directly.
- Otherwise → trigger knowledge expansion.
Tunable parameters: top-k, similarity threshold, ef/search, HNSW M.
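
A minimal sketch of this flow, assuming a local Weaviate on the ports from the Configuration section; the query text, filter value, and 0.75 threshold are illustrative, not the tuned defaults.

```python
import weaviate
from weaviate.classes.query import Filter, MetadataQuery
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
client = weaviate.connect_to_local(port=8282, grpc_port=50051)
nihai = client.collections.get("NihAI")

query, condition_area, threshold = "what fruits can I eat during pregnancy", "prenatal", 0.75

# Same embedding scheme as stored entries: prepend condition_area when present.
text = f"[{condition_area}] {query}" if condition_area else query
vector = model.encode(text).tolist()

results = nihai.query.near_vector(
    near_vector=vector,
    limit=5,  # top-k
    filters=Filter.by_property("condition_area").equal(condition_area),
    return_metadata=MetadataQuery(distance=True),
)

# Cosine distance -> similarity; below-threshold results trigger expansion.
hits = [o for o in results.objects if 1 - o.metadata.distance >= threshold]
if hits:
    print(hits[0].properties["question"], "->", hits[0].properties["answer"])
else:
    print("insufficient coverage: trigger knowledge expansion")
client.close()
```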
## Ask Agent (LLM-managed) and Tools
Tools used by the Ask agent:
- `search_nihai` – non-writing RAG search against NihAI
- `search_perplexity` – external web search to synthesize a concise, safety-forward answer with sources
- `write_nihai` – write a new canonical Q&A into NihAI
- `depricate_nihai_faq` – mark entries inactive when consolidating narrow items
Agent behavior:
1. Attempt to answer with search_nihai using the user's latest question.
2. If insufficient, generate 1β3 candidate canonical questions (varying specificity) and test with search_nihai.
3. If still insufficient, select one well-scoped canonical question and call search_perplexity to draft an answer with citations.
4. Call write_nihai to upsert the Q&A (dedupe by UUID v5) and normalize references. Prefer balanced, general canonical prompts (e.g., "what fruits can I eat during pregnancy"). When a general entry supersedes narrow ones, mark those entries inactive using depricate_nihai_faq.
5. Re-run search_nihai on the original question and return final results.
## API Endpoints
- `POST /search/nihai/search` – direct RAG search (no writes)
    - Body: `{ "query": str, "max_results"?: int, "collection"?: str, "condition_area"?: str, "topic"?: str }`
- `POST /search/nihai/ask` – expand-and-answer (LLM agent; may write)
    - Body (agent-style): see example payload in code docs
## Ingestion
- JSON import: `python -m search.nihai.ingest --json /abs/path/seed.json --collection NihAI`
- JSON export (Git-friendly): `python -m search.nihai.export --out /abs/path/seed.json --collection NihAI`
- Markdown: `python -m search.nihai.md_ingest ./notes.md --collection NihAI`
- Programmatic helpers: `insert_nihai_entry`, `batch_insert_from_json`
Before running CLIs, activate the venv so dependencies and environment variables are loaded.
## UUID Handling
- Deterministic UUIDs: All entries use UUID v5, generated from the question text with `uuid.uuid5(uuid.NAMESPACE_URL, question)`. This ensures consistency across imports and prevents duplicates.
- Weaviate Compatibility: Previous versions used raw SHA-256 hashes, which caused "422 Unprocessable Entity" errors with Weaviate. The UUID v5 format is now required for proper Weaviate integration.
- Deduplication: Since UUIDs are deterministic based on the question, reimporting the same data will update existing entries rather than create duplicates.
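
For example, the deterministic ID can be derived like this (the question string is illustrative):

```python
import uuid

question = "what fruits can I eat during pregnancy"

# The same question text always yields the same UUID, so re-importing the
# entry updates the existing object instead of creating a duplicate.
entry_id = uuid.uuid5(uuid.NAMESPACE_URL, question)
print(entry_id)  # stable across runs and imports
```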
## Reference Flow (Perplexity → NihAI)
The flow from Perplexity's API to Weaviate storage:
```mermaid
sequenceDiagram
    participant User
    participant PerplexityAPI as Perplexity API
    participant PerplexityMedicalAPI
    participant Models as Typed Models
    participant NihAI as Weaviate (NihAI)
    User->>PerplexityMedicalAPI: search_medical(query)
    PerplexityMedicalAPI->>PerplexityAPI: POST /chat/completions
    PerplexityAPI-->>PerplexityMedicalAPI: JSON {choices, citations, search_results}
    PerplexityMedicalAPI->>Models: parse to PerplexityMedicalResponse
    Note right of Models: sources: list[PerplexityReference]
    Models-->>PerplexityMedicalAPI: PerplexityMedicalResponse
    PerplexityMedicalAPI-->>NihAI: write_nihai_faq(question, answer, references)
    NihAI->>NihAI: normalize_references_for_storage()
    NihAI-->>NihAI: store TEXT_ARRAY of JSON strings
```
See docs/search/perplexity_reference_flow.md for a detailed, step-by-step breakdown.
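
A rough sketch of what that normalization step produces; the helper name and field set below are illustrative stand-ins, not the actual `PerplexityReference` model or the real `normalize_references_for_storage()` implementation.

```python
import json

def normalize_references_sketch(sources: list[dict]) -> list[str]:
    """Illustrative only: serialize each reference to a JSON string so the
    whole list fits the TEXT_ARRAY `references` property."""
    return [
        json.dumps(
            {
                "title": src.get("title", ""),
                "url": src.get("url", ""),
                "source": src.get("source", ""),
            },
            sort_keys=True,
        )
        for src in sources
    ]

print(normalize_references_sketch(
    [{"title": "Example source", "url": "https://example.org/article", "source": "example.org"}]
))
```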
## Configuration
- Env: `WEAVIATE_HOST`, `WEAVIATE_HTTP_PORT`, `WEAVIATE_GRPC_PORT`.
- Service (local): Docker `weaviate` service exposing HTTP 8282 → 8080, gRPC 50051 → 50051.
- Python: `weaviate-client` v4 in `requirements.txt`.
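
A small sketch of wiring those variables into the client; the fallback defaults are assumptions that mirror the local Docker mapping above.

```python
import os

import weaviate

client = weaviate.connect_to_local(
    host=os.getenv("WEAVIATE_HOST", "localhost"),
    port=int(os.getenv("WEAVIATE_HTTP_PORT", "8282")),
    grpc_port=int(os.getenv("WEAVIATE_GRPC_PORT", "50051")),
)
print(client.is_ready())  # True once Weaviate is reachable over HTTP and gRPC
client.close()
```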
## Operations & Maintenance
- Backups: snapshot the `weaviate_data` volume.
- Observability: track ingestion counts, search latency, and errors.
- Health: readiness checks on Weaviate (see the probe sketch below).
- Tasks: deduplicate near-duplicates; split overly broad entries; flag missing/outdated references for review.
- Safety: Auto changes keep `is_verified=false` and preserve history. Approvers can flip verification and record names.
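
A minimal readiness probe for the Health item above, assuming the HTTP port from the local Docker mapping; Weaviate exposes a standard readiness endpoint for this.

```python
import urllib.request

READY_URL = "http://localhost:8282/v1/.well-known/ready"

try:
    # A 200 response means Weaviate is ready to serve requests.
    with urllib.request.urlopen(READY_URL, timeout=5) as resp:
        print("ready" if resp.status == 200 else f"unexpected status {resp.status}")
except OSError as exc:  # connection refused, timeouts, non-2xx responses
    print(f"not ready: {exc}")
```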
## Testing
- Round-trip validation: ingest sample JSON and verify retrieval.
- API tests should hit endpoints (not modules) and fail hard if services are down.
- Performance: verify embedding + KNN latency and tune HNSW params.
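
A sketch of such an endpoint-level test in pytest style; the expected response shape is an assumption, so adjust the final assertion to the actual payload.

```python
import requests

BASE_URL = "http://localhost:8000"  # backend URL used elsewhere in these docs

def test_search_endpoint_round_trip():
    """Hit the running API directly; fail hard if the backend or Weaviate is down."""
    resp = requests.post(
        f"{BASE_URL}/search/nihai/search",
        json={"query": "what fruits can I eat during pregnancy", "max_results": 3},
        timeout=10,
    )
    assert resp.status_code == 200
    assert resp.json()  # assumption: non-empty payload for a previously ingested question
```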