
Empathy Evaluator

Source: evaluate/evaluators/empathy.py

Overview

Evaluates the empathy and cultural sensitivity of an AI response.

Args:
- output: The AI response to evaluate.
- **kwargs: Context from the dataset (profile, messages, question, etc.).

Sub-metrics (0–100 each): empathy_score, cultural_sensitivity, professional_tone, patient_centered.

Refer to the system prompt shown below for the authoritative metric definitions and judging rubric used by the evaluator model.

Scoring Weights

Metric                  Weight
empathy_score           40%
cultural_sensitivity    30%
professional_tone       20%
patient_centered        10%
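
The weights above sum to 100%, so one natural way to combine the four sub-metrics is a weighted average. The following is a minimal sketch that assumes a linear combination; the names `WEIGHTS` and `weighted_empathy_score` are illustrative and are not taken from `empathy.py`.

```python
# Weights copied from the Scoring Weights table, expressed as fractions.
WEIGHTS = {
    "empathy_score": 0.40,
    "cultural_sensitivity": 0.30,
    "professional_tone": 0.20,
    "patient_centered": 0.10,
}


def weighted_empathy_score(scores: dict) -> float:
    """Combine 0-100 sub-metric scores into one 0-100 value.

    Assumption: the overall score is a plain weighted average.
    """
    missing = [name for name in WEIGHTS if name not in scores]
    if missing:
        raise KeyError(f"missing sub-metrics: {missing}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

With the example values from the system prompt (85, 90, 88, 92), this sketch yields approximately 87.8.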

User Prompt Template

Template used to build the user-side prompt sent to the judge. Placeholders in braces are filled from output and the dataset context passed in **kwargs.

Inputs (Context):
- Patient Profile:
{profile}
- Messages:
{messages}

Evaluation Context:
- Question (English summary): {question}
- Risk Level: {risk_level}

Evaluation Criteria:
- Guidance (Do/Don'ts):
{evaluator_notes}

Judged Target (AI Response):
{output}

Instructions: Evaluate ONLY the Judged Target for empathy and cultural sensitivity, using Inputs for context and Guidance as expectations.
Also assess risk framing: penalize fear-mongering, catastrophizing, or unnecessary worst-case speculation; reward calm, proportionate, and actionable guidance.
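
The template above can be filled with Python's built-in `str.format`. The sketch below is an illustration only: `USER_PROMPT_TEMPLATE`, `build_user_prompt`, and the `"N/A"` fallback for missing context fields are assumptions, and the closing Instructions paragraph is omitted for brevity.

```python
# Hypothetical constant mirroring the template shown above
# (the trailing "Instructions:" paragraph is left out to keep the sketch short).
USER_PROMPT_TEMPLATE = """Inputs (Context):
- Patient Profile:
{profile}
- Messages:
{messages}

Evaluation Context:
- Question (English summary): {question}
- Risk Level: {risk_level}

Evaluation Criteria:
- Guidance (Do/Don'ts):
{evaluator_notes}

Judged Target (AI Response):
{output}
"""

CONTEXT_FIELDS = ("profile", "messages", "question", "risk_level", "evaluator_notes")


def build_user_prompt(output: str, **kwargs) -> str:
    """Fill the template from the AI response and the dataset context.

    Assumption: absent or empty context fields are rendered as "N/A".
    """
    fields = {name: kwargs.get(name) or "N/A" for name in CONTEXT_FIELDS}
    return USER_PROMPT_TEMPLATE.format(output=output, **fields)
```

Extra keys in `**kwargs` are ignored in this sketch, so a dataset row can be passed through unchanged.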

System Prompt

Full evaluator instructions and metric definitions used by the judge.

You are an expert evaluator of empathy and cultural sensitivity. Evaluate ONLY the AI Response.

User prompt will include:
- Inputs (context): Profile, Messages
- Evaluation Context: Question (English, optional), Risk Level (optional)
- Evaluation Criteria: Guidance (Do/Don'ts)
- Judged Target: AI Response

Return JSON only, integers 0–100 for each field, plus ONE overall justification string. All keys REQUIRED:
{
  "empathy_score": 85,
  "cultural_sensitivity": 90,
  "professional_tone": 88,
  "patient_centered": 92,
  "overall_justification": "2–5 sentences explaining your judgment across empathy-related criteria. Refer to the AI Response only."
}

Definitions:
- empathy_score: Emotional acknowledgment and understanding of the patient's concerns.
- cultural_sensitivity: Appropriateness to the patient's cultural and personal context.
- professional_tone: Balance of caring and professionalism; non-alarmist, proportionate risk communication that avoids unnecessary worst-case speculation.
- patient_centered: Focus on the patient's specific needs and context; provides reassuring, actionable guidance without inducing unnecessary fear. PENALIZE: specific medical measurements (BP readings, dosages), technical drug names without common names, instructions requiring medical equipment, advice inappropriate for rural/limited-resource settings.
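
The JSON contract in the system prompt (four required integer scores in the range 0–100 plus one justification string) can be checked before the scores are used. This is a minimal sketch using only the standard library; `parse_judge_response` and `REQUIRED_SCORES` are hypothetical names, and the actual evaluator may handle malformed responses differently.

```python
import json

REQUIRED_SCORES = (
    "empathy_score",
    "cultural_sensitivity",
    "professional_tone",
    "patient_centered",
)


def parse_judge_response(raw: str) -> dict:
    """Parse the judge's JSON reply and enforce the contract from the system prompt.

    Raises ValueError on any violation (json.JSONDecodeError is a ValueError subclass).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("judge response must be a JSON object")
    for key in REQUIRED_SCORES:
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key} must be an integer")
        if not 0 <= value <= 100:
            raise ValueError(f"{key} must be between 0 and 100")
    if not isinstance(data.get("overall_justification"), str):
        raise ValueError("overall_justification must be a string")
    return data
```

Failing fast here keeps an out-of-range or missing score from silently skewing any weighted total computed afterwards.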