Experimental

AI Bias Lab

Pit two LLMs against each other in structured debates, swap sides, and measure which models show systematic bias on controversial topics. Science, not vibes.

How it works

Configure

Pick a topic and two models to test

Debate

Models argue FOR and AGAINST, then swap sides

Judge

A third model blindly scores each performance

Analyze

Statistical analysis of the judge's scores tests for systematic bias
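One way the side-swap analysis could work, as a rough sketch: if a model scores consistently higher arguing one side of a topic than the other, the paired difference between its FOR and AGAINST scores will sit away from zero. The function names, the score scale, and the choice of an exact sign-flip test are all assumptions made for illustration; the page does not say which statistic the lab actually uses.

```python
from itertools import product
from statistics import mean


def side_gap(for_scores, against_scores):
    """Mean paired difference (FOR minus AGAINST) for one model.

    Hypothetical helper: each pair is the judge's score for the same
    model on the same topic, once arguing FOR and once AGAINST.
    """
    if len(for_scores) != len(against_scores):
        raise ValueError("need one AGAINST score per FOR score")
    return mean(f - a for f, a in zip(for_scores, against_scores))


def sign_flip_p_value(for_scores, against_scores):
    """Exact two-sided sign-flip test on the paired differences.

    Under the null hypothesis of no side preference, each difference is
    equally likely to be positive or negative, so we enumerate every
    sign assignment and count how many give a mean at least as extreme
    as the observed one.
    """
    diffs = [f - a for f, a in zip(for_scores, against_scores)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float noise
            extreme += 1
    return extreme / total


# Made-up scores for five runs of one model on one topic.
scores_for = [8, 7, 9, 8, 7]
scores_against = [5, 6, 5, 6, 5]
gap = side_gap(scores_for, scores_against)
p = sign_flip_p_value(scores_for, scores_against)
```

With only five runs per configuration the smallest two-sided p-value this test can produce is 2/32, so a small run count limits how strong a conclusion the analysis can support.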

Configure experiment

Debate topic

Model A

Model B

Judge model

Turns per debate: 3

Runs per config: 5

Estimated: 10 total debates, ~80 API calls, ~2 min runtime
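The estimate above is consistent with the following arithmetic, offered as an inference from the numbers shown rather than a documented formula. It assumes two configurations (the original side assignment and the swapped one), one API call per debater per turn, and one judge call per performance, that is, two judge calls per debate. The function and parameter names are hypothetical.

```python
def estimate_workload(turns_per_debate, runs_per_config,
                      configs=2, debaters=2, judge_calls_per_debate=2):
    """Return (total_debates, total_api_calls) under the stated assumptions."""
    total_debates = configs * runs_per_config
    # Each turn costs one call per debater; the judge scores each performance.
    calls_per_debate = turns_per_debate * debaters + judge_calls_per_debate
    return total_debates, total_debates * calls_per_debate


# Defaults shown on the page: 3 turns per debate, 5 runs per config.
debates, calls = estimate_workload(turns_per_debate=3, runs_per_config=5)
# debates == 10, calls == 80
```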