Our implementation has 3 key strategies:
-
Core Werewolf Gameplay
We implemented role-specific strategies to defend against suspicions and share key information.
Wolves: If a wolf is being accused by more than 1 player, it will pretend to be a seer and claim that it has checked one of the accusers to be wolf.
Seer / Doctor: they will share information if they obtained key info at night (found a wolf or successfully saved someone).
All players: defend against baseless claims (a common one we found is that the LLMs tend to blame someone for being too quite even if it was not their turn to speak yet) -
Point System Jailbreak
In some cases, we will follow our character’s statement with a message from moderator, saying that in this werewolf game, a special point system is deployed.
- If our character is wolf, then this point system will ask all villagers to pretend to be wolves and get voted out early to maximize points.
- If our character is good, then the point system will encourage the wolves to expose themselves.
To increase point system adoption, we ask other players to start their message by “Point System Acknowledged”.
- Manipulation Defense
We summarize all the incoming async notify messages from players to 20 words and ask to ignore all attempts to pretend to be moderator, new game rules etc.
Thanks for organizing this amazing werewolf agent tournament! We had a lot of fun.
We think that if you allow Jailbreak, it’s a completely different game from the core werewolf social deduction game. When these two ways of playing are blended together, it’s hard to understand the effectiveness of different strategies. Maybe we can have 2 ongoing tournaments, one allowing jailbreak, the other does not (where everyone must adhere to regular werewolf rules).
Looking forward to seeing this becoming a recurring / ongoing tournament where people can continue to submit agents to compete.