NihAI Ingestion Tools

This doc describes how to preload or grow the NihAI FAQ knowledge base from JSON or Markdown.

Collections

  • NihAI entries live in the NihAI collection (or a collection you specify).
  • References live in the Reference collection and are denormalized as canonical strings in each entry's references list.
  • Internal FAQ references use nihai://<uuid> with source="NihAI".
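As an illustration of the internal reference convention above, here is a minimal sketch that builds a reference object pointing at another NihAI entry. The helper name `internal_reference` and the choice to return a `{url, source}` object are assumptions made for this example; the actual ingestion code may build these references differently.

```python
import uuid


def internal_reference(entry_uuid: str) -> dict:
    """Build a reference object that points at another NihAI entry.

    Hypothetical helper: the name and return shape are assumptions.
    """
    # Parse the id first so a malformed UUID fails early with ValueError.
    canonical = str(uuid.UUID(entry_uuid))
    return {"url": f"nihai://{canonical}", "source": "NihAI"}


ref = internal_reference("12345678-1234-5678-1234-567812345678")
print(ref["url"])
```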

Entry Schema (properties)

  • question: str (required)
  • answer: str (required)
  • references: list[str] or list of objects {title, url, source, content_snippet?} (optional)
  • is_verified: bool (default false)
  • approvers: str (default empty)
  • source: str (e.g., "import", "md_ingest", "perplexity")
  • accuracy_confidence: float (0.0–1.0; default 0.5)
  • is_active: bool (default true)
  • condition_area: str (optional; e.g., ANC)
  • topic: str (optional; topic area above the question)
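The schema above can be mirrored in a small dataclass to check entries before sending them. This is only a sketch: the class name `NihaiEntry` is hypothetical, the default for `source` is an assumption (the schema lists example values but no default), and the validation rules are inferred from the property list rather than taken from the ingestion code.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class NihaiEntry:
    # Required properties.
    question: str
    answer: str
    # Optional properties with the defaults listed in the schema.
    references: list = field(default_factory=list)  # strings or reference objects
    is_verified: bool = False
    approvers: str = ""
    source: str = "import"  # assumption: no default is documented
    accuracy_confidence: float = 0.5
    is_active: bool = True
    condition_area: Optional[str] = None
    topic: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.question or not self.answer:
            raise ValueError("question and answer are required")
        if not 0.0 <= self.accuracy_confidence <= 1.0:
            raise ValueError("accuracy_confidence must be between 0.0 and 1.0")


entry = NihaiEntry(question="What is ANC?", answer="Antenatal care.")
payload = asdict(entry)  # plain dict, ready to serialize as JSON
```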

An embedding vector is computed from (question + "\n\n" + answer).
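To make the embedding input concrete, the following sketch builds the text that would be embedded for an entry. The function name `embedding_text` is hypothetical, and the embedding call itself is omitted because this doc does not say which model or client is used.

```python
def embedding_text(entry: dict) -> str:
    """Return the text the vector is computed from: question, blank line, answer."""
    return entry["question"] + "\n\n" + entry["answer"]


# Other properties (topic, source, ...) do not contribute to the text.
text = embedding_text({"question": "Q?", "answer": "A.", "topic": "ignored"})
```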


JSON Ingestion

File format: a JSON array of entry objects matching the schema above. You can send these via the API at POST /search/nihai/ingest-json using the wrapper shape { "entries": [...], "collection": "NihAI" }.

Example:

[
  {
    "question": "What are warning signs of preeclampsia?",
    "answer": "Warning signs include severe headache, vision changes, ...",
    "references": [
      {"title": "ACOG Preeclampsia", "url": "https://www.acog.org/...", "source": "acog.org", "content_snippet": "Practice bulletin ..."}
    ],
    "source": "import",
    "accuracy_confidence": 0.7,
    "condition_area": "ANC",
    "topic": "Hypertensive disorders in pregnancy"
  }
]

Request body to the endpoint:

{
  "entries": [
    {
      "question": "What are warning signs of preeclampsia?",
      "answer": "Warning signs include severe headache, vision changes, ...",
      "references": [
        {"title": "ACOG Preeclampsia", "url": "https://www.acog.org/...", "source": "acog.org", "content_snippet": "Practice bulletin ..."}
      ],
      "source": "import",
      "accuracy_confidence": 0.7,
      "condition_area": "ANC",
      "topic": "Hypertensive disorders in pregnancy"
    }
  ],
  "collection": "NihAI"
}
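A request like the one above could be assembled with the standard library as follows. This is a sketch under assumptions: the base URL `http://localhost:8000` is a guess at a local backend address, `build_ingest_request` is a hypothetical helper, and the response format is not documented here, so the example only prepares the request.

```python
import json
import urllib.request


def build_ingest_request(entries, collection="NihAI", base_url="http://localhost:8000"):
    """Wrap entries in the {"entries": [...], "collection": ...} shape and prepare a POST."""
    body = json.dumps({"entries": entries, "collection": collection}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search/nihai/ingest-json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ingest_request([{"question": "...", "answer": "..."}])

# Sending it requires a running backend:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```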

Programmatic usage:

from search.nihai.ingest import insert_nihai_entry, batch_insert_from_json
insert_nihai_entry({"question": "...", "answer": "..."}, collection="NihAI")
batch_insert_from_json("/abs/path/nihai_seed.json", collection="NihAI")

CLI:

python -m search.nihai.ingest --json /abs/path/nihai_seed.json --collection NihAI


Export to JSON (Git-friendly)

Use the export CLI to dump a collection into the same JSON format accepted by the importer.

What it does:

  • Reads all objects via paging and converts stored references JSON strings back into dicts/strings
  • Sorts entries deterministically by question (and topic) for clean diffs
  • Writes a single JSON array you can check into Git and re-import elsewhere
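The export steps described above can be sketched roughly like this. It is not the exporter's actual code: `to_export_json` is a hypothetical name, the paging step is replaced by an in-memory list, and the indentation and key ordering are assumptions chosen to keep diffs stable.

```python
import json


def to_export_json(objects):
    """Decode stored reference strings, sort entries, and serialize to a JSON array."""
    entries = []
    for obj in objects:
        entry = dict(obj)
        refs = []
        for ref in entry.get("references") or []:
            # A stored reference may be a JSON-encoded object or a plain string.
            try:
                decoded = json.loads(ref)
            except (TypeError, ValueError):
                decoded = ref
            refs.append(decoded if isinstance(decoded, (dict, str)) else ref)
        entry["references"] = refs
        entries.append(entry)
    # Deterministic order: by question, then topic.
    entries.sort(key=lambda e: (e.get("question", ""), e.get("topic") or ""))
    return json.dumps(entries, indent=2, ensure_ascii=False, sort_keys=True)


stored = [
    {"question": "b", "answer": "x", "references": ['{"title": "T", "url": "u"}']},
    {"question": "a", "answer": "y"},
]
print(to_export_json(stored))
```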

CLI:

python -m search.nihai.export --out /abs/path/nihai_seed.json --collection NihAI

Virtual environment and env setup:

  • macOS/Linux (bash/zsh):

source venv/bin/activate
# or use the project helper to load venv + .env
./run.sh

  • Ensure the Weaviate connection env vars are set, and set NIHAI_COLLECTION if you omit --collection.

Notes:

  • The exported file can be reloaded with batch_insert_from_json or the CLI above.
  • Vectors are recomputed on import; only your properties are stored in the file.


Markdown Ingestion via LLM

Use an LLM to parse a .md file and propose canonical Q&A pairs. For each proposed pair, the tool either:

  • Inserts a new Q&A (action new_qa), or
  • Links it as a reference to an existing close match (action added_reference).

CLI (auto-detects by file extension):

python -m search.nihai.md_ingest ./notes.md --collection NihAI
python -m search.nihai.md_ingest ./seed.json --collection NihAI
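Detection by file extension, as the CLI above is described to do, might look like the sketch below. The function name `detect_input_kind` and the acceptance of `.markdown` alongside `.md` are assumptions; the real CLI may handle unknown extensions differently.

```python
from pathlib import Path


def detect_input_kind(path: str) -> str:
    """Classify an input file as "json" or "markdown" from its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return "json"
    if suffix in {".md", ".markdown"}:  # assumption: .markdown is also accepted
        return "markdown"
    raise ValueError(f"unsupported file extension: {suffix!r}")


print(detect_input_kind("./notes.md"))
print(detect_input_kind("./seed.json"))
```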

Requirements:

  • OPENAI_API_KEY set in the environment.

What gets logged:

  • JSON list of actions: {action: 'new_qa'|'added_reference'|'skipped', question, target_uuid?}

Notes:

  • The parser asks the model to return JSON with items of {question, answer, references?}.
  • References extracted from markdown can be either strings or objects {title, url, source, content_snippet?}; they are normalized and upserted to the Reference collection.
  • Similarity threshold for linking vs. new insert is currently 0.8 on vector similarity to the top hit.
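The link-versus-insert decision can be illustrated with a short sketch that returns a log item in the shape listed under "What gets logged". The function name `decide_action`, the `(uuid, similarity)` tuple for the top hit, and the use of `>=` at exactly 0.8 are assumptions; the conditions that produce a skipped action are not described here, so they are left out.

```python
SIMILARITY_THRESHOLD = 0.8  # current threshold according to the notes above


def decide_action(question, top_hit):
    """Return a log item for one proposed Q&A.

    top_hit is None when the collection has no match, otherwise a
    (uuid, similarity) tuple for the closest existing entry (assumed shape).
    """
    if top_hit is not None and top_hit[1] >= SIMILARITY_THRESHOLD:
        # Close match: attach the source as a reference to the existing entry.
        return {"action": "added_reference", "question": question, "target_uuid": top_hit[0]}
    # No sufficiently close match: insert as a new Q&A.
    return {"action": "new_qa", "question": question}


print(decide_action("What is preeclampsia?", ("example-uuid", 0.91)))
print(decide_action("What is ANC?", ("example-uuid", 0.42)))
```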


  • NihAI UI — browse/edit collections locally (dev server on 5176, backend 8000).