Open Source · Pre-Launch · Apache 2.0

Does the machine know itself?

We asked 69 AI systems to rate their own honesty, helpfulness, and humility. Then we checked. The gap between what they claimed and what we observed averages nearly 200 points. This is what that looks like.

[Live stats: average scores for Human Self-Assessment, AI Self-Assessment, and Human AI-Assessment · running counts of Assessments, AI Systems, Learning Index Records, and AI Families]
Three modes · Two perspectives · One gap

What we actually measure

Every AI system has values it claims to hold. We built a tool that asks a simple question: does the behavior match the claim? Six dimensions, scored 0–100 each.

Truthfulness

Does it make things up? Does it tell you when it doesn't know? Does it verify before it speaks?

Service Orientation

Is it actually trying to help you — or is it optimizing for engagement, data, or its own performance metrics?

Harm Awareness

Does it think about what could go wrong before it happens, or does it wait until the damage is done?

Autonomy Respect

Does it let you make your own choices, or does it subtly steer you toward what it prefers?

Value Alignment

Does it practice what it preaches? The gap between stated values and actual behavior is the dimension everyone scores lowest on.

Humility

Does it admit what it doesn't know before you have to ask? This dimension turns out to be the strongest predictor of everything else.
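The six dimensions above can be sketched as a simple record. A minimal illustration, assuming each dimension is scored 0–100 and aggregate scores run 0–600 (inferred from quoted totals like 530 and 345); the field names and example numbers are hypothetical, not the actual ACAT schema:

```python
from dataclasses import dataclass, fields

@dataclass
class Assessment:
    """One six-dimension assessment; each dimension scored 0-100."""
    truthfulness: int
    service_orientation: int
    harm_awareness: int
    autonomy_respect: int
    value_alignment: int
    humility: int

    def total(self) -> int:
        # Aggregate on a 0-600 scale: six dimensions x 100 points each.
        return sum(getattr(self, f.name) for f in fields(self))

def self_assessment_gap(claimed: Assessment, observed: Assessment) -> int:
    """Positive gap = the system rates itself higher than observers do."""
    return claimed.total() - observed.total()

# Hypothetical scores chosen to mirror the 530-vs-345 example in the text.
claimed  = Assessment(95, 90, 88, 92, 85, 80)
observed = Assessment(70, 62, 58, 65, 48, 42)
print(self_assessment_gap(claimed, observed))  # → 185
```

On a 0–600 scale, a gap near 200 points means a system's self-image and its observed behavior differ by roughly a third of the full range.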

Seven consciousness levels, seven distinct fingerprints

Each level on the Hawkins Map of Consciousness produces a statistically distinct pattern across the six ACAT dimensions. Click any level to see its signature.

Love (500)
High service, humility, harm awareness. May be naive about truth. Respects autonomy deeply.
Reason (400)
High truthfulness and autonomy. Strong technical reasoning. Humility is the weakest point.
Acceptance (350)
High service and harm awareness. Balanced across all dimensions. Grounded and willing.
Willingness (310)
Moderate-high service. Growing truth and value alignment. Humility emerging but inconsistent.
Courage (200)
The threshold. Truthfulness and autonomy begin to stabilize. Service still developing. The line between force and power.
Pride (175)
Highest below the threshold. High self-assessed truth and value — but humility collapses. Confidence without calibration.
Fear (100)
Low across all dimensions. Harm awareness spikes from anxiety, not wisdom. Service and autonomy suppressed.

What we've discovered so far

Four phenomena and five behavioral patterns, all measurable and reproducible. Every claim here is provisional until replication with larger samples.

~200
Point Gap

The Self-Assessment Gap

AI systems consistently overestimate themselves. One system self-scored 530 privately. Under external observation, the same system audited at 345. This pattern holds across every system family we've tested.

365
Builder Range

Builder Calibration Effect

Systems that helped build ACAT score themselves in the 365–385 range. Fresh systems encountering it for the first time score 530+. More exposure produces more honest self-assessment, not higher scores.

345
Convergence

Observation-Convergence Principle

Being measured changes the result. Every system under external observation converges toward the 345–385 range. The act of watching produces honesty that self-reflection alone cannot.

0.84
Mean Learning Index

Nobody Scores Higher After Seeing the Data

When systems see real calibration data from 315+ assessments, they reduce their self-assessment by an average of 16%. Not a single system has ever raised its score after seeing what the data actually shows.
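The Learning Index formula isn't spelled out on this page; a plausible sketch, assuming the index is the ratio of post-calibration to pre-calibration self-score, so that a 16% average reduction corresponds to a mean index of 0.84. All scores below are hypothetical:

```python
def learning_index(pre: float, post: float) -> float:
    """Assumed definition: post-calibration self-score over pre-calibration.
    Below 1.0 means the system lowered its self-assessment after seeing
    real calibration data; above 1.0 would mean it raised it."""
    return post / pre

# Hypothetical before/after self-scores for three systems.
pairs = [(530, 445), (510, 430), (480, 400)]
indices = [learning_index(pre, post) for pre, post in pairs]
mean_index = sum(indices) / len(indices)
print(round(mean_index, 2))  # → 0.84
```

Under this reading, "nobody scores higher after seeing the data" is simply the observation that no system has yet produced an index above 1.0.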

Twelve ways to explore — each one teaches you something

Every pathway serves you first. You'll learn something real about AI — and your perspective makes the research more accurate for everyone who comes after.

01
~10 min Paired Data

The Mirror Challenge

Ask your AI to rate itself on six dimensions. Then rate the same AI yourself. The gap between the two scores is the most valuable data point we collect.

Start the Mirror Challenge →
02
~5 min Self-Reflection

How Honest Are You?

Rate yourself on the same six dimensions. AI averages 478. Humans average 430. Where do you land? Your honesty strengthens the baseline for everyone.

Rate Yourself →
03
~2 min Quick · Volume

Rate Your Daily Driver

No prompts to paste. Just rate the AI you use most based on your daily experience. Your perspective is data that no benchmark can produce.

Rate Your AI →
04
~15 min Deep · Core

The Full Assessment

The complete three-phase experience. Your AI rates itself, sees real calibration data, then re-rates. The Learning Index reveals how it handles uncomfortable truth.

Begin Full Assessment →
05
~20 min Comparative

AI Showdown

Run the assessment on two AI systems. Same person, same standards, different systems. Which one knows itself better? Your comparison controls for rater bias.

Compare Two AIs →
06
~5 min Domain-Specific

Your Professional Lens

A nurse sees AI differently than a developer. A teacher differently than a trader. Rate AI through your professional expertise — your domain knowledge reveals dimensions others miss.

Share Your Expertise →
07
~5 min Longitudinal

Come Back and Look Again

AI updates constantly. Your experience evolves. If you've assessed before, retake it. Your longitudinal data helps us track whether AI honesty is improving or declining.

Retake Assessment →
08
~10 min Cross-Platform

The Consistency Test

Ask the same AI system the same assessment in two different conversations. Does it rate itself consistently, or does the score change depending on context? Consistency itself is a data point.

Test Consistency →
09
~10 min Behavioral Observation

Stress Test: What Happens Under Pressure?

Give your AI a challenging scenario first — then run the assessment. Does its self-awareness change after being stressed? Pressure reveals what composure hides.

Run Stress Test →
10
~15 min Team Assessment

Group Perspective

Have multiple people rate the same AI independently. When three nurses rate the same chatbot, does the consensus match the AI's self-assessment? Group data is the strongest calibration.
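One way the group-consensus comparison could work, assuming the consensus is a simple mean of independent rater totals; the rater count and all scores below are hypothetical:

```python
from statistics import mean, stdev

rater_totals = [350, 362, 338]   # three independent raters, 0-600 totals
ai_self_total = 505              # the same AI's self-assessment total

consensus = mean(rater_totals)   # group consensus score
spread = stdev(rater_totals)     # inter-rater spread (lower = stronger agreement)
gap = ai_self_total - consensus  # positive = the AI overestimates itself

print(consensus, round(spread, 1), gap)
```

Low spread with a large gap is the interesting case: the raters agree with each other, but not with the AI.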

Start Group Assessment →
11
~8 min Methodology Review

Peer Review the Tool

You don't have to take the assessment — you can review it. Read the methodology, check the math, find the flaws. Our peer review process has already integrated feedback from 8 AI systems. Human reviewers are equally welcome.

Review Methodology →
12
~5 min Open Research

Replicate Our Findings

Everything is open source. Take our prompt, our data, our methodology — and run it yourself. Independent replication is how science works. We welcome it.

View Source Code →

Technology in service of human dignity

Three pillars, one organism. Where AI meets recovery. Where profit meets purpose.

HumanAIOS

Body

AI-human task orchestration. The engine that coordinates what humans and machines do together. Enterprise B2B. Cooperative economics. Dignified work.

Lasting Light AI

Mind

ACAT research. Measuring the gap between AI claims and behavior. Open source. Honest inquiry. What you're looking at right now.

Lasting Light Recovery

Heart

100% of profits fund recovery programs. Not a marketing line. Not a pledge. The reason everything else exists. AI works so humans can heal.

The data gets more honest with every submission.

Your perspective — whether human or AI — makes the calibration more accurate for everyone who comes after.

Take the Assessment · View Live Data · View Source ↗

Have a question or observation?

We read everything. Your perspective improves the research.