Voice Interfaces

ASHAI provides two voice interfaces for different interaction patterns:

Standard Voice Chat (/voice)

Browser-based voice interface using Web Speech APIs for basic voice interaction.

Features:

  - Voice-to-text input using the Web Speech API
  - Text-to-speech responses with natural voice synthesis
  - Same comprehensive tool access as text-based agents
  - Patient profile integration
  - Mobile-friendly responsive design

Best for: Direct patient consultations, accessibility needs, hands-free operation
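As a rough illustration of the browser side, the sketch below shows how voice input and spoken output can be wired up with the Web Speech API. It is not ASHAI's actual client code: the helper names (`getRecognitionCtor`, `isVoiceSupported`, `listenOnce`, `speak`) and the `en-US` default are assumptions made for this example. Only `SpeechRecognition` / `webkitSpeechRecognition`, `SpeechSynthesisUtterance`, and `speechSynthesis` are real browser APIs.

```typescript
// Hypothetical helpers for illustration; not taken from the ASHAI codebase.

/** Returns the browser's speech recognition constructor, or undefined if unsupported. */
function getRecognitionCtor(scope: any = globalThis): any {
  // Chromium-based browsers expose the prefixed webkitSpeechRecognition.
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
}

/** True when both speech input and speech output are available in the given scope. */
function isVoiceSupported(scope: any = globalThis): boolean {
  return Boolean(getRecognitionCtor(scope)) && "speechSynthesis" in scope;
}

/** Listens for a single utterance and resolves with its transcript. */
function listenOnce(lang = "en-US"): Promise<string> {
  return new Promise((resolve, reject) => {
    const Ctor = getRecognitionCtor();
    if (!Ctor) {
      reject(new Error("Speech recognition is not supported in this browser"));
      return;
    }
    const recognition = new Ctor();
    recognition.lang = lang;
    recognition.interimResults = false; // deliver only the final result
    recognition.onresult = (event: any) => resolve(event.results[0][0].transcript);
    recognition.onerror = (event: any) => reject(new Error(event.error));
    recognition.start();
  });
}

/** Reads a reply aloud using the browser's default voice for the language. */
function speak(text: string, lang = "en-US"): void {
  const scope: any = globalThis;
  const utterance = new scope.SpeechSynthesisUtterance(text);
  utterance.lang = lang;
  scope.speechSynthesis.speak(utterance);
}
```

A page would typically call `isVoiceSupported()` first and fall back to text input when it returns `false`, since speech recognition support varies between browsers.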

Realtime Voice Chat (/voice-realtime)

Advanced WebSocket-based interface designed for ChatGPT-style real-time voice interaction.

Features:

  - Real-time voice streaming with the OpenAI Realtime API
  - Continuous conversation flow
  - Low-latency responses
  - Advanced voice processing

Best for: Natural conversation flow

Requires: OpenAI Realtime API access

Getting Started

  1. Start the server: ./run.sh
  2. Access the voice interfaces:
     - Standard: http://localhost:8000/voice
     - Realtime: http://localhost:8000/voice-realtime
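To confirm that the server is up before opening a browser, a small script can request both pages. The sketch below is a convenience check rather than part of ASHAI. The helper names (`voiceUrl`, `checkVoiceInterfaces`) are hypothetical, and it assumes both pages answer a plain GET request with a 2xx status and that a global `fetch` is available (Node 18+ or a browser).

```typescript
// Paths and port come from the steps above; the helper names are hypothetical.
const VOICE_PATHS = { standard: "/voice", realtime: "/voice-realtime" } as const;

type VoiceKind = keyof typeof VOICE_PATHS;

/** Builds the URL of a voice interface for a given server origin. */
function voiceUrl(kind: VoiceKind, origin = "http://localhost:8000"): string {
  return new URL(VOICE_PATHS[kind], origin).toString();
}

/** Requests each interface page and records whether it answered with a 2xx status. */
async function checkVoiceInterfaces(
  origin = "http://localhost:8000"
): Promise<Record<string, boolean>> {
  const status: Record<string, boolean> = {};
  for (const kind of Object.keys(VOICE_PATHS) as VoiceKind[]) {
    try {
      const response = await fetch(voiceUrl(kind, origin));
      status[kind] = response.ok;
    } catch {
      status[kind] = false; // server not running or unreachable
    }
  }
  return status;
}
```

Calling `checkVoiceInterfaces().then(console.log)` after `./run.sh` has started should report `true` for both entries if the assumption about GET responses holds.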

Both interfaces provide access to the same underlying medical knowledge bases and AI capabilities through voice interaction.