# NihAI: Self-Building Medical FAQ (Weaviate-backed)
NihAI is a RAG system purpose-built for concise, medically safe FAQ answers. It first searches a curated FAQ collection. If coverage is insufficient, an LLM agent expands the knowledge base by proposing new canonical questions, answering them via Perplexity with citations, upserting the results to Weaviate, and then re-running RAG.
## System Overview

- Storage: Weaviate v4 Collections API, no server-side vectorizers. The client provides vectors.
- Indexing: HNSW + cosine. The vector is computed from `[condition_area] question + "\n\n" + answer` using `all-MiniLM-L6-v2` (CPU device for stability on macOS). The `condition_area` prefix (when available) helps cluster related medical conditions.
- API: FastAPI routes under `/search/nihai/*`.
- Ingestion: JSON and Markdown via CLI or programmatic helpers.
- Governance: auto-ingested answers remain unverified until reviewed by approvers.
- Agent: an LLM-managed Ask flow decides when to search, synthesize, and write new FAQs.
- UI: see NihAI UI for a local React admin/explorer running on `http://localhost:5176` that talks to the backend at `http://localhost:8000`.
## Collections and Schema

### `NihAI` collection

- `question: text` – canonical FAQ prompt
- `answer: text` – concise, safety-forward answer
- `references: text[]` – canonicalized identifiers/URLs (e.g., `https://...`, `nihai://<uuid>`)
- `is_verified: boolean` – doctor-reviewed status (default `false`)
- `approvers: text` – comma-separated reviewer names
- `source: text` – e.g., `perplexity`, `import`, `md_ingest`
- `accuracy_confidence: number` – 0.0–1.0 heuristic for review priority
- `is_active: boolean` – soft delete/supersede flag
- `condition_area: text` – high-level medical condition or area (e.g., prenatal, diabetes, cardiology)
- `topic: text` – subtopic within the condition area (e.g., nutrition, medication)
- Vector: provided by the client; computed from `[condition_area] question + "\n\n" + answer`; HNSW + cosine distance
### `Reference` collection

- `title: text`
- `url: text` – canonical locator (use `nihai://<uuid>` for internal references)
- `source: text` – domain or logical source (e.g., `mayoclinic.org`, `NihAI`)
- `content_snippet: text` – optional excerpt

Entries keep a denormalized `references` list for export/back-compat. Query-time expansion into full `Reference` objects is optional.
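A hedged sketch of creating the `NihAI` collection with the weaviate-client v4 Collections API, with client-supplied vectors and HNSW + cosine as described above. The connection call and exact index settings are assumptions, not the project's actual bootstrap code:

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property, VectorDistances

client = weaviate.connect_to_local()  # assumes a local instance
client.collections.create(
    "NihAI",
    vectorizer_config=Configure.Vectorizer.none(),  # client provides vectors
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE
    ),
    properties=[
        Property(name="question", data_type=DataType.TEXT),
        Property(name="answer", data_type=DataType.TEXT),
        Property(name="references", data_type=DataType.TEXT_ARRAY),
        Property(name="is_verified", data_type=DataType.BOOL),
        Property(name="approvers", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="accuracy_confidence", data_type=DataType.NUMBER),
        Property(name="is_active", data_type=DataType.BOOL),
        Property(name="condition_area", data_type=DataType.TEXT),
        Property(name="topic", data_type=DataType.TEXT),
    ],
)
client.close()
```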
## Retrieval Flow

1. Compute the query embedding with the same model used for entries. If `condition_area` is specified, it is prepended to the query as `[condition_area] query` for better semantic matching.
2. Run KNN against `NihAI` (top-k). Apply a similarity threshold and optional filters (`condition_area`, `topic`).
3. If the top score ≥ threshold, return results directly.
4. Otherwise, trigger knowledge expansion.

Tunable parameters: top-k, similarity threshold, search-time `ef`, and HNSW `M`.
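The decision in steps 3–4 can be sketched as a small pure function. Note that Weaviate reports cosine *distance*, so similarity is `1 - distance`; the threshold value below is an assumption to tune, not a project constant:

```python
SIM_THRESHOLD = 0.75  # assumption: tuned per corpus

def decide(hits: list[tuple[float, dict]]) -> str:
    """hits: (cosine_distance, properties) pairs, best match first,
    e.g. from collection.query.near_vector(...) with distance metadata."""
    if hits and (1.0 - hits[0][0]) >= SIM_THRESHOLD:
        return "answer"   # step 3: coverage is sufficient, return directly
    return "expand"       # step 4: trigger knowledge expansion

assert decide([(0.1, {"question": "..."})]) == "answer"   # similarity 0.9
assert decide([(0.6, {"question": "..."})]) == "expand"   # similarity 0.4
```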
## Ask Agent (LLM-managed) and Tools

Tools used by the Ask agent:

- `search_nihai` – non-writing RAG search against `NihAI`
- `search_perplexity` – external web search to synthesize a concise, safety-forward answer with sources
- `write_nihai` – write a new canonical Q&A into `NihAI`
- `depricate_nihai_faq` – mark entries inactive when consolidating narrow items

Agent behavior:

1. Attempt to answer with `search_nihai` using the user's latest question.
2. If insufficient, generate 1–3 candidate canonical questions (varying specificity) and test each with `search_nihai`.
3. If still insufficient, select one well-scoped canonical question and call `search_perplexity` to draft an answer with citations.
4. Call `write_nihai` to upsert the Q&A (dedupe by UUID v5) and normalize references. Prefer balanced, general canonical prompts (e.g., "what fruits can I eat during pregnancy"). When a general entry supersedes narrower ones, mark them inactive with `depricate_nihai_faq`.
5. Re-run `search_nihai` on the original question and return the final results.
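The behavior above can be sketched as a plain control loop. The tool callables are stand-ins for the real `search_nihai` / `search_perplexity` / `write_nihai` tools, and the helper signature is illustrative:

```python
def ask(question, search_nihai, search_perplexity, write_nihai, candidates):
    """Sketch of the Ask agent's decision flow (steps 1-5 above)."""
    hits = search_nihai(question)          # 1. try the knowledge base directly
    if hits:
        return hits
    for cand in candidates(question):      # 2. test candidate canonical questions
        hits = search_nihai(cand)
        if hits:
            return hits
    canonical = next(iter(candidates(question)), question)
    draft = search_perplexity(canonical)   # 3. draft an answer with citations
    write_nihai(canonical, draft["answer"], draft["references"])  # 4. upsert
    return search_nihai(question)          # 5. re-run RAG on the original question
```

In the real agent, the LLM also decides when to consolidate narrow entries with `depricate_nihai_faq`; that branch is omitted here.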
## API Endpoints

- `POST /search/nihai/search` – direct RAG search (no writes)
  - Body: `{ "query": str, "max_results"?: int, "collection"?: str, "condition_area"?: str, "topic"?: str }`
- `POST /search/nihai/ask` – expand-and-answer (LLM agent; may write)
  - Body (agent-style): see example payload in code docs
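As a sketch, the direct search endpoint can be exercised with `curl` against the local backend (port per the UI section above; the example payload values are illustrative):

```shell
curl -s -X POST http://localhost:8000/search/nihai/search \
  -H "Content-Type: application/json" \
  -d '{"query": "what fruits can I eat during pregnancy", "max_results": 3, "condition_area": "prenatal"}'
```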
## Ingestion

- JSON import: `python -m search.nihai.ingest --json /abs/path/seed.json --collection NihAI`
- JSON export (Git-friendly): `python -m search.nihai.export --out /abs/path/seed.json --collection NihAI`
- Markdown: `python -m search.nihai.md_ingest ./notes.md --collection NihAI`
- Programmatic helpers: `insert_nihai_entry`, `batch_insert_from_json`

Before running the CLIs, activate the venv so dependencies and environment variables are loaded.
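For a typical layout this looks like the following (the `.venv` path is an assumption; adjust to where your venv lives):

```shell
# create the venv once (skip if it already exists)
python3 -m venv .venv
# activate it so the CLI modules and weaviate-client are importable
source .venv/bin/activate
```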
## UUID Handling

- Deterministic UUIDs: all entries use UUID v5 identifiers generated from the question text via `uuid.uuid5(uuid.NAMESPACE_URL, question)`. This ensures consistency across imports and prevents duplicates.
- Weaviate compatibility: previous versions used raw SHA-256 hashes, which caused "422 Unprocessable Entity" errors from Weaviate. The UUID v5 format is now required for proper Weaviate integration.
- Deduplication: because UUIDs are deterministic in the question text, reimporting the same data updates existing entries rather than creating duplicates.
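For example (the helper name is illustrative, not from the codebase):

```python
import uuid

def nihai_uuid(question: str) -> str:
    """Deterministic, name-based UUID v5 for a canonical question."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, question))

# The same question always yields the same UUID, so re-imports upsert in place.
assert nihai_uuid("what fruits can I eat during pregnancy") == \
       nihai_uuid("what fruits can I eat during pregnancy")
```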
## Reference Flow (Perplexity → NihAI)

The flow from Perplexity's API to Weaviate storage:

```mermaid
sequenceDiagram
    participant User
    participant PAPI as Perplexity API
    participant PerplexityMedicalAPI
    participant Models as Typed Models
    participant W as Weaviate (NihAI)
    User->>PerplexityMedicalAPI: search_medical(query)
    PerplexityMedicalAPI->>PAPI: POST /chat/completions
    PAPI-->>PerplexityMedicalAPI: JSON {choices, citations, search_results}
    PerplexityMedicalAPI->>Models: parse to PerplexityMedicalResponse
    Note right of Models: sources: list[PerplexityReference]
    Models-->>PerplexityMedicalAPI: PerplexityMedicalResponse
    PerplexityMedicalAPI-->>W: write_nihai_faq(question, answer, references)
    W->>W: normalize_references_for_storage()
    W-->>W: store TEXT_ARRAY of JSON strings
```

See `docs/search/perplexity_reference_flow.md` for a detailed, step-by-step breakdown.
## Configuration

- Env: `WEAVIATE_HOST`, `WEAVIATE_HTTP_PORT`, `WEAVIATE_GRPC_PORT`.
- Service (local): Docker `weaviate` service exposing HTTP 8282 → 8080, gRPC 50051 → 50051.
- Python: `weaviate-client` v4 in `requirements.txt`.
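A sketch of how the backend might resolve these settings; the default values follow the local Docker mapping above, and the constant names are illustrative:

```python
import os

# Defaults match the local Docker mapping: HTTP 8282, gRPC 50051.
WEAVIATE_HOST = os.getenv("WEAVIATE_HOST", "localhost")
WEAVIATE_HTTP_PORT = int(os.getenv("WEAVIATE_HTTP_PORT", "8282"))
WEAVIATE_GRPC_PORT = int(os.getenv("WEAVIATE_GRPC_PORT", "50051"))

# With weaviate-client v4, a connection would then look like:
#   client = weaviate.connect_to_local(host=WEAVIATE_HOST,
#                                      port=WEAVIATE_HTTP_PORT,
#                                      grpc_port=WEAVIATE_GRPC_PORT)
```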
## Operations & Maintenance

- Backups: snapshot the `weaviate_data` volume.
- Observability: track ingestion counts, search latency, and errors.
- Health: readiness checks on Weaviate.
- Tasks: deduplicate near-duplicates; split overly broad entries; flag missing/outdated references for review.
- Safety: automatic changes keep `is_verified=false` and preserve history. Approvers can flip verification and record their names.
## Testing
- Round-trip validation: ingest sample JSON and verify retrieval.
- API tests should hit endpoints (not modules) and fail hard if services are down.
- Performance: verify embedding + KNN latency and tune HNSW params.