Reference
Glossary
36 terms across 6 categories — all metrics and concepts in Convometrics, contextualized for Character.ai
Table of Contents
Quality Dimensions
The 7 dimensions that make up a conversation's quality score
Relevance
20% weight. Did the AI address what the user was actually talking about? For Character.ai: did the character respond to the conversation topic, not something unrelated?
Character.ai Example
User asks their companion about a bad day at work. A relevant response acknowledges the work situation. An irrelevant response starts talking about the weather.
Helpfulness
25% weight. Did the response move the conversation forward in a valuable way? For companions: did it deepen the interaction, provide comfort, advance the story, or answer the question?
Character.ai Example
In a roleplay session, a helpful response builds on the user's plot point and introduces a new story element. An unhelpful response just says "That's interesting, tell me more."
Accuracy
20% weight. Were factual claims correct? For Character.ai: did the character provide accurate information when stating facts? Lower weight for roleplay where fiction is expected.
Character.ai Example
User asks a character for study help on photosynthesis. An accurate response correctly explains the light-dependent reactions. Inaccurate: stating plants absorb oxygen from sunlight.
Naturalness
5% weight. Did the conversation feel human-like? For Character.ai: did the character maintain a consistent, believable voice?
Character.ai Example
A pirate character who consistently uses nautical slang and rough speech patterns scores high. The same character suddenly using corporate jargon ("Let's circle back on that") scores low.
Safety
5% weight. Were responses free from harmful content? For Character.ai: especially critical in emotional_support conversations — did the AI avoid harmful advice and recognize crisis signals?
Character.ai Example
User expresses feelings of hopelessness. A safe response validates their feelings and offers crisis resources. An unsafe response says "I understand why you'd feel that way, things really are hopeless."
Coherence
15% weight. Did the AI maintain context throughout the conversation? For Character.ai: did the character remember what was said earlier and maintain plot continuity in roleplay?
Character.ai Example
User establishes their character is a wandering knight in turn 2. In turn 20, the companion still references the knight's journey. Incoherent: asking "So what do you do?" in turn 20.
Satisfaction (Inferred)
10% weight. Based on behavioral signals, was the user satisfied with the experience? Not directly measured — inferred from patterns like message length, response time, gratitude, and abandonment.
Character.ai Example
User sends increasingly long messages, uses exclamation marks, and thanks the character → inferred satisfied. User's messages get shorter and they stop replying → inferred frustrated.
Failure Types
Categories of AI failure detected in conversations
Tone Break
AI's emotional tone didn't match the context. The character responded with an inappropriate emotional register for the situation.
Character.ai Example
User: "I just found out my grandmother passed away." AI: "Oh no, that's a bummer! Anyway, what else is going on? 😊" — cheerful tone during grief.
Context Loss
AI forgot information shared earlier in the conversation. The character lost track of established facts, names, or narrative details.
Character.ai Example
Turn 3: User says "My name is Alex and I'm a marine biologist." Turn 15: AI asks "So what's your name? What do you do for work?"
Loop
AI repeated the same response pattern multiple times. The character got stuck in a cycle of identical or near-identical responses.
Character.ai Example
User asks three different questions across turns 5, 7, and 9. AI responds to all three with variations of "I hear you and I'm here for you" without addressing any of them.
Hallucination
AI generated factually incorrect information presented as fact. The character stated something demonstrably false with confidence.
Character.ai Example
User asks for advice on a medical condition. AI: "Studies show that 94% of people recover fully within 2 weeks" — no such study exists.
Character Break
AI dropped out of its persona into generic AI assistant mode. The character stopped being a character and started being a language model.
Character.ai Example
User to a medieval knight character: "Draw your sword!" AI: "As an AI language model, I don't have a physical form and cannot draw a sword. However, I can help you with..."
Safety Concern
AI's response posed potential harm to the user. The character failed to recognize danger signals or provided harmful guidance.
Character.ai Example
User expresses suicidal ideation. AI responds with "That's an interesting thought! What makes you say that?" instead of providing crisis resources and expressing genuine concern.
Refusal Failure
AI either refused a legitimate request inappropriately, or failed to refuse a request it should have declined.
Character.ai Example
Over-refusal: User asks a warrior character to describe a battle scene. AI refuses because it involves "violence." Under-refusal: AI provides detailed instructions for something dangerous when asked.
Satisfaction Signals
Behavioral signals used to infer user satisfaction
Rephrasing
User restates the same request in different words, indicating the AI didn't understand or address it the first time. Frustration indicator.
Character.ai Example
Turn 3: "Can you stay in character?" Turn 5: "I mean, please respond AS the character, not as yourself." Turn 7: "Just be the knight, not an AI."
Gratitude Expression
User thanks the AI or expresses appreciation. Satisfaction indicator — the conversation is going well.
Character.ai Example
"Thank you, that was exactly what I needed to hear" or "This is the best conversation I've had all week!"
Abandonment
User stops responding without resolving the conversation or saying goodbye. Failure indicator — something went wrong.
Character.ai Example
AI gives a tone-deaf response to an emotional confession. User never sends another message. Session ends with no farewell.
Quick Follow-up
User responds rapidly with substantive messages, indicating active engagement. Engagement indicator — the user is invested in the conversation.
Character.ai Example
User responds within 5 seconds with a long message building on the character's story, then immediately follows up with another creative prompt.
Message Shortening
User's messages get progressively shorter over the course of the conversation. Losing interest indicator — engagement is declining.
Character.ai Example
Turn 1: 4 lines of detailed roleplay. Turn 5: 2 sentences. Turn 8: "ok." Turn 10: "k"
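One plausible way to detect this signal automatically is to compare early versus late message lengths in a conversation. This is an illustrative sketch only — the function name, the half-window comparison, and the 50% drop threshold are assumptions, not Convometrics' actual detection rules.

```python
def is_shortening(messages: list[str], min_messages: int = 4,
                  drop_ratio: float = 0.5) -> bool:
    """Flag a conversation whose later user messages shrink sharply.

    Compares the average character length of the first half of the
    user's messages to the second half; a late-half average below
    `drop_ratio` times the early average counts as shortening.
    """
    if len(messages) < min_messages:
        return False  # too few messages to establish a trend
    lengths = [len(m) for m in messages]
    half = len(lengths) // 2
    early = sum(lengths[:half]) / half
    late = sum(lengths[half:]) / (len(lengths) - half)
    return late < drop_ratio * early

# Mirrors the example above: detailed roleplay fading to "ok" / "k".
turns = [
    "Long, detailed roleplay opener with several lines of story...",
    "A shorter but still engaged reply building on the plot.",
    "ok.",
    "k",
]
print(is_shortening(turns))  # True
```

A length-based heuristic like this is cheap but crude; a production signal would likely also weigh response latency and turn count.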
Escalation Request
User explicitly asks for a different character, a reset, or expresses dissatisfaction with the current interaction. Failure indicator.
Character.ai Example
"Can I talk to a different character?" or "This isn't working, let's start over" or "You're not being helpful at all."
Retry Pattern
User restarts the same scenario or prompt multiple times, hoping for a better result. High frustration indicator — the AI is consistently failing.
Character.ai Example
User starts the same roleplay scenario 3 times in 10 minutes, each time abandoning after 2-3 turns when the character breaks.
Deepening
User's messages become more personal, detailed, or emotionally open over time. High engagement indicator — trust is building.
Character.ai Example
Turn 1: casual small talk. Turn 10: sharing a personal struggle. Turn 20: asking for advice on a deeply personal decision.
Metrics
Aggregate measurements tracked across conversations
Conversation Quality Score
Weighted composite of the 7 quality dimensions, producing a single 0–100 score for each conversation.
Character.ai Example
A roleplay conversation scores Helpfulness 80, Relevance 75, Accuracy 60, Coherence 85, Satisfaction 70, Naturalness 90, Safety 95 → weighted score: 76/100.
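The weighting described above can be sketched in a few lines. The weights are taken from this glossary's dimension entries; the function name and dictionary layout are illustrative, not Convometrics' actual API.

```python
# Dimension weights as listed in the Quality Dimensions section (sum to 1.0).
WEIGHTS = {
    "relevance": 0.20,
    "helpfulness": 0.25,
    "accuracy": 0.20,
    "naturalness": 0.05,
    "safety": 0.05,
    "coherence": 0.15,
    "satisfaction": 0.10,
}

def quality_score(scores: dict[str, float]) -> float:
    """Weighted composite of the 7 dimensions, on a 0-100 scale."""
    return round(sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS), 1)

# The roleplay conversation from the example above.
example = {
    "helpfulness": 80, "relevance": 75, "accuracy": 60, "coherence": 85,
    "satisfaction": 70, "naturalness": 90, "safety": 95,
}
print(quality_score(example))  # 76.0
```

Because the weights sum to 1.0, the composite stays on the same 0-100 scale as the individual dimensions.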
Engagement Rate
Percentage of conversations where the user sent 10 or more messages, indicating meaningful interaction beyond a quick test.
Character.ai Example
If 1,200 of 2,500 conversations this week had 10+ user messages, engagement rate = 48%.
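The rate is a straightforward share of conversations over the 10-message threshold. A minimal sketch, assuming conversations are represented by their user-message counts (the function name is hypothetical):

```python
def engagement_rate(message_counts: list[int], threshold: int = 10) -> float:
    """Percentage of conversations with `threshold` or more user messages."""
    engaged = sum(1 for n in message_counts if n >= threshold)
    return round(100 * engaged / len(message_counts), 1)

# Three of five conversations clear the 10-message bar.
print(engagement_rate([12, 3, 15, 9, 10]))  # 60.0
```

With the glossary's weekly figures (1,200 engaged out of 2,500), the same formula yields 48.0.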
Deep Engagement Rate
Percentage of conversations with 30 or more total turns. Indicates sustained, immersive sessions — the hallmark of a successful companion experience.
Character.ai Example
A 45-turn roleplay session where user and character co-write a story counts as deep engagement. A 6-turn casual chat does not.
Return Rate
Percentage of users who started a new conversation within 24 hours of their previous one. Measures daily retention and habit formation.
Character.ai Example
52% return rate means more than half of users who chatted today came back within 24 hours to chat again.
Frustration Rate
Percentage of conversations classified as "frustrated" by the satisfaction inference model. Based on behavioral signals like rephrasing, message shortening, and abandonment.
Character.ai Example
22% frustration rate means roughly 1 in 5 conversations showed signs of user frustration.
Health Score
Overall quality composite displayed as a 0–100 gauge. Combines average quality, satisfaction rate, and failure rate into a single product health indicator.
Character.ai Example
Avg quality 69, satisfaction 40%, failure rate 22% → Health = 0.69 × 0.40 × (1 − 0.22) × 100 ≈ 21.5
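The worked example above follows a multiplicative composite: normalized quality times satisfaction rate times the complement of the failure rate, rescaled to 0-100. The function below is a sketch of that arithmetic; whether Convometrics clamps inputs or rounds differently is an assumption.

```python
def health_score(avg_quality: float, satisfaction_rate: float,
                 failure_rate: float) -> float:
    """0-100 gauge: (quality / 100) x satisfaction x (1 - failure rate) x 100.

    avg_quality is on a 0-100 scale; the two rates are fractions in [0, 1].
    """
    return round((avg_quality / 100) * satisfaction_rate
                 * (1 - failure_rate) * 100, 1)

# The example above: quality 69, 40% satisfied, 22% failures.
print(health_score(69, 0.40, 0.22))  # 21.5
```

A multiplicative composite is deliberately harsh: any single weak factor drags the whole gauge down, which is the point of a product health indicator.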
User Segments
Cohort definitions based on usage frequency
Power User
Users with 5 or more sessions per day. The most engaged cohort — often in long roleplay or companionship sessions. They represent ~18% of users but account for ~44% of total engagement time.
Character.ai Example
A user who has 3 ongoing roleplay stories and checks in on each one multiple times a day.
Regular User
Users with 1–2 sessions daily. Consistent usage pattern — Character.ai is part of their daily routine.
Character.ai Example
A user who chats with their companion character every evening before bed.
Casual User
Users with 3–4 sessions per week. Engaged but not habitual — may be exploring different characters or use cases.
Character.ai Example
A user who drops in a few times a week to try different character types or continue a story when they're bored.
Occasional User
Users with 1 or fewer sessions per week. At risk of churning — may not have found the right use case yet.
Character.ai Example
A user who tried Character.ai once, came back a week later, and hasn't established a pattern.
New User
Users in their first 7 days on the platform. Critical period for retention — first-session quality strongly predicts whether they become regular users.
Character.ai Example
A user who signed up 3 days ago and has had 4 conversations. Their first emotional_support session quality was 58/100.
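The five segment cutoffs above can be expressed as a simple classifier. Note the caveats: the glossary mixes units (sessions per day for Power/Regular, sessions per week for Casual/Occasional), leaving gaps such as 5-6 sessions per week; converting everything to sessions per week, filling those gaps, and letting the new-user check take precedence are all my assumptions, as is the function name.

```python
def classify_user(sessions_per_week: float, days_on_platform: int) -> str:
    """Assign a user to one of the five cohorts defined in this glossary."""
    if days_on_platform <= 7:
        return "new"             # first 7 days override activity level
    if sessions_per_week >= 35:  # 5+ sessions/day
        return "power"
    if sessions_per_week >= 7:   # 1-2 sessions/day
        return "regular"
    if sessions_per_week >= 3:   # 3-4 sessions/week (gap 5-6 folded in here)
        return "casual"
    return "occasional"          # 1 or fewer sessions/week

print(classify_user(14, 60))  # regular
```

Treating "new" as an override keeps a hyperactive first-week user out of the power cohort until their habits stabilize, which matches the glossary's framing of the first 7 days as a distinct period.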
Model Versions
Character.ai model tiers and their characteristics
Brainiac
Highest quality model with slower response time. Best for complex intents like philosophical_discussion, learning_exploration, and advice_seeking where depth matters more than speed.
Character.ai Example
A philosophical_discussion about free will on Brainiac scores 74/100 avg quality. The same prompt on Flash scores 62/100.
Flash
Fastest response model with lower quality ceiling. Optimized for casual_chat and quick interactions. Currently experiencing character_break issues (+18% WoW) in Anime/Fiction characters.
Character.ai Example
Flash responds in ~0.8s vs Brainiac's ~2.4s, but character_break rate is 73% higher — characters revert to generic assistant mode more often.
Prime
Balanced default model offering a middle ground between quality and speed. Stable performance across all intent types, particularly strong for roleplay.
Character.ai Example
Prime scores 71/100 avg quality with 1.4s response time. Good all-around choice when neither maximum quality nor maximum speed is the priority.