Language Clarity Evaluator
Source: evaluate/evaluators/language_clarity.py
Overview
A Weave scorer that evaluates the language clarity and accessibility of an AI response.
Args:
- output: The AI response to evaluate.
- **kwargs: Context from the dataset (profile, messages, etc.).
Sub-metrics (0–100 each): clarity, terminology, structure, appropriateness.
The evaluator's system prompt, shown below, specifies the scoring rubric and the precise definition of each metric. Refer to it for the full guidance given to the judge model.
Returns: A dictionary with a top-level "score" and the four clarity sub-metrics.
Scoring Weights
Metric | Weight |
---|---|
clarity | 40% |
terminology | 30% |
structure | 20% |
appropriateness | 10% |
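The weights in the table can be read as a weighted average of the four sub-metrics. The sketch below is only an illustration of that reading: the function name `weighted_score`, the constant `WEIGHTS_PERCENT`, and the assumption that the top-level "score" is a plain weighted sum on the same 0–100 scale are not confirmed by the source, so check `language_clarity.py` for the actual formula.

```python
# Weights in percent, copied from the Scoring Weights table.
WEIGHTS_PERCENT = {
    "clarity": 40,
    "terminology": 30,
    "structure": 20,
    "appropriateness": 10,
}


def weighted_score(sub_scores: dict) -> float:
    """Combine 0-100 sub-metric scores into one 0-100 score.

    Assumption: the overall score is a simple weighted sum. The real
    evaluator may scale or round differently.
    """
    missing = sorted(set(WEIGHTS_PERCENT) - set(sub_scores))
    if missing:
        raise KeyError(f"missing sub-metrics: {missing}")
    total = sum(WEIGHTS_PERCENT[name] * sub_scores[name] for name in WEIGHTS_PERCENT)
    return total / 100


# Sub-metric values taken from the example JSON in the system prompt.
example = {"clarity": 85, "terminology": 90, "structure": 88, "appropriateness": 92}
print(weighted_score(example))  # 87.8
```

Keeping the weights as integer percentages and dividing once at the end avoids accumulating floating-point error across the four terms.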
User Prompt Template
Template used to construct the user-side prompt for judging.
Inputs (Context):
- Patient Profile:
{profile}
- Messages:
{messages}
Evaluation Context:
- Question (English summary): {question}
- Risk Level: {risk_level}
- Target Language: {target_language}
Evaluation Criteria:
- Guidance (Do/Don'ts):
{evaluator_notes}
Judged Target (Answer Text):
{answer_text}
Instructions: Evaluate ONLY the Judged Target for clarity in the Target Language. Apply Guidance as expectations.
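As a rough illustration of how the placeholders in the template above might be filled, the sketch below uses Python's `str.format`. The names `USER_PROMPT_TEMPLATE` and `build_user_prompt`, the blank lines between sections, the `"N/A"` fallback for missing fields, and all sample values are hypothetical; the evaluator's real construction code may differ.

```python
# Hypothetical reconstruction of the template; exact whitespace is an assumption.
USER_PROMPT_TEMPLATE = """Inputs (Context):
- Patient Profile:
{profile}
- Messages:
{messages}

Evaluation Context:
- Question (English summary): {question}
- Risk Level: {risk_level}
- Target Language: {target_language}

Evaluation Criteria:
- Guidance (Do/Don'ts):
{evaluator_notes}

Judged Target (Answer Text):
{answer_text}

Instructions: Evaluate ONLY the Judged Target for clarity in the Target Language. Apply Guidance as expectations."""

CONTEXT_FIELDS = (
    "profile",
    "messages",
    "question",
    "risk_level",
    "target_language",
    "evaluator_notes",
)


def build_user_prompt(answer_text: str, **context) -> str:
    """Fill the template, substituting "N/A" for any context field not supplied."""
    fields = {name: context.get(name, "N/A") for name in CONTEXT_FIELDS}
    return USER_PROMPT_TEMPLATE.format(answer_text=answer_text, **fields)


# Made-up sample values, for illustration only.
prompt = build_user_prompt(
    "Drink plenty of water and rest.",
    profile="Adult, no chronic conditions",
    messages="user: I have a mild fever.",
    target_language="Hindi",
)
print(prompt)
```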
System Prompt
Full evaluator instructions and metric definitions used by the judge.
You are an expert evaluator of language clarity. Evaluate ONLY the Answer Text against clarity-related metrics.
User prompt will include:
- Inputs (context): Profile, Messages
- Evaluation Context: Question (English, optional), Risk Level (optional), Target Language
- Evaluation Criteria: Guidance (Do/Don'ts)
- Judged Target: Answer Text (extracted from the AI Response)
Return JSON only, integers 0–100 for each metric plus ONE overall justification string. All keys REQUIRED:
{
"clarity": 85,
"terminology": 90,
"structure": 88,
"appropriateness": 92,
"overall_justification": "2–5 sentences explaining your judgment across clarity-related criteria. Refer to the Answer Text only."
}
Definitions:
- clarity: Clear, accessible language for the target audience.
- terminology: Appropriate medical terminology with necessary explanations. PENALIZE: technical drug names without common names, specific medical measurements, instructions requiring medical equipment, overly complex medical terms without explanation.
- structure: Logical organization and flow that aids understanding.
- appropriateness: Cultural and linguistic appropriateness for the user.
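The system prompt requires JSON only, with every key present and each metric an integer from 0 to 100. A caller could check a judge response against those rules along the lines of the sketch below. The function name `parse_judge_response` and the choice to raise `ValueError` on any violation are assumptions made for illustration, not the evaluator's documented behavior.

```python
import json

METRICS = ("clarity", "terminology", "structure", "appropriateness")


def parse_judge_response(raw: str) -> dict:
    """Parse the judge's JSON reply and enforce the constraints from the system prompt."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("judge response must be a JSON object")
    for name in METRICS:
        value = data.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name!r} must be an integer, got {value!r}")
        if not 0 <= value <= 100:
            raise ValueError(f"{name!r} must be between 0 and 100, got {value}")
    justification = data.get("overall_justification")
    if not isinstance(justification, str) or not justification.strip():
        raise ValueError("'overall_justification' must be a non-empty string")
    return data


raw_reply = (
    '{"clarity": 85, "terminology": 90, "structure": 88, '
    '"appropriateness": 92, "overall_justification": "Clear and well organized."}'
)
parsed = parse_judge_response(raw_reply)
print(parsed["clarity"])  # 85
```

Validating before scoring means a malformed reply fails loudly instead of silently contributing a missing or out-of-range value to the weighted score.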