🎥 Watch the Demo

🧩 What This App Does

This dashboard lets you watch (or join!) teams of Large Language Models (LLMs) play Codenames against each other. Two teams, Red and Blue, face off in a 4v4 format. Each team has a Boss and three Agents working together to identify their team's words before the other side does.

🤖 How It Works

  • LLM Teams: You can assemble teams from different LLMs (e.g., GPT, Claude, Gemini, or open-source models).
  • Human Mode: You can also jump in as a Boss yourself, giving clues to your AI teammates and seeing how well they interpret your hints.
  • Observation Mode: Prefer to just watch? Sit back and enjoy the game unfold, analyzing how different models reason, cooperate, and sometimes hilariously misfire.
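
The team layout described above (one Boss plus three Agents per side, with a human optionally taking a Boss seat) can be sketched as a small data structure. This is only an illustrative sketch, not the app's actual code: the `Team` class, the `HUMAN` marker, the `game_mode` helper, and the model name strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical marker for a seat played by a person instead of a model.
HUMAN = "human"


@dataclass
class Team:
    color: str         # "red" or "blue"
    boss: str          # model name, or HUMAN when a person gives the clues
    agents: List[str]  # exactly three model names

    def __post_init__(self) -> None:
        if self.color not in ("red", "blue"):
            raise ValueError(f"unknown team color: {self.color!r}")
        if len(self.agents) != 3:
            raise ValueError("a team needs exactly three Agents")
        if HUMAN in self.agents:
            # The description only mentions humans joining as a Boss.
            raise ValueError("only the Boss seat can be human")


def game_mode(red: Team, blue: Team) -> str:
    """Return the mode label implied by who sits in the Boss seats."""
    if HUMAN in (red.boss, blue.boss):
        return "Human&AI vs AI"
    return "AI vs AI"


# Placeholder model names; substitute whatever models are available.
red = Team("red", HUMAN, ["gpt", "claude", "gemini"])
blue = Team("blue", "gpt", ["gpt", "gpt", "gpt"])
mode = game_mode(red, blue)
```

Validating the team at construction time keeps the 4v4 rule in one place, so a misconfigured team fails before a game starts.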

🧠 Why It's Interesting

  • Compare LLM reasoning styles: See how different models interpret subtle associations and language cues.
  • Team Dynamics: Watch how collaboration (or confusion) emerges between AIs when they have to coordinate across multiple turns.
  • Human-AI Interaction: Experiment with leading a team of LLMs and discover how clearly (or creatively) you need to communicate to win.
  • Benchmarking & Analytics: All games are stored in a database. The Stats section includes model win/loss rates, performance comparisons between model families, and leaderboards.
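
As a rough illustration of how win rates and a leaderboard could be derived from stored games, here is a minimal sketch. The record shape (a dict with `red`, `blue`, and `winner` keys) and the function names are assumptions made for this example; the app's real database schema is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def win_rates(games: List[dict]) -> Dict[str, float]:
    """Fraction of games won per model (assumed record shape, see above)."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for game in games:
        for color in ("red", "blue"):
            # Count a model once per team, even if it fills several seats.
            for model in set(game[color]):
                played[model] += 1
                if game["winner"] == color:
                    wins[model] += 1
    return {model: wins[model] / played[model] for model in played}


def leaderboard(games: List[dict]) -> List[Tuple[str, float]]:
    """Models sorted by win rate (descending), ties broken by name."""
    rates = win_rates(games)
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


# Made-up records with placeholder model names, for illustration only.
example_games = [
    {"red": ["a", "a", "b", "b"], "blue": ["c", "c", "c", "c"], "winner": "red"},
    {"red": ["c", "c", "c", "c"], "blue": ["a", "a", "a", "a"], "winner": "red"},
]
board = leaderboard(example_games)
```

Counting a model once per team per game is one possible convention; weighting by the number of seats a model fills would be an equally reasonable alternative.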

🕹️ Main Features

  • Create and customize teams with any available LLMs.
  • Switch between AI vs AI and Human&AI vs AI modes.
  • View reasoning and chat logs for each model's decisions.