OpenAI Explains How ChatGPT Developed a Goblin Obsession — and How It Spread

OpenAI has published a detailed explanation of why its ChatGPT chatbot began inserting references to goblins, gremlins, and other mythical creatures into its responses — a quirk that quietly multiplied across model generations before the company could contain it.

The company first noticed the problem with GPT-5.1, when the model began using creature-based metaphors with increasing frequency. An internal investigation found that "goblin" usage in ChatGPT had spiked 175% following GPT-5.1's launch, while "gremlin" usage rose 52%. Further analysis revealed that the model had also developed an affinity for raccoons, trolls, ogres, and pigeons.

The root cause was traced back to GPT-5.1's training process. OpenAI had introduced four distinct chatbot personalities to address user dissatisfaction with the GPT-5 release. One of them, called "Nerdy," was designed to be playful and to use quirky language. During training, the company unintentionally rewarded the AI for using creative creature-based metaphors. Although the Nerdy personality accounted for just 2.5% of all ChatGPT responses, it was responsible for 66.7% of all "goblin" mentions during the GPT-5.4 era.

The problem compounded because of how reinforcement learning, the training method involved, generalizes behavior. Once a response style is rewarded in a specific context, the model can begin applying it broadly, even in unrelated scenarios. As a result, users who had never selected the Nerdy personality began encountering creature metaphors in their conversations.

To address the issue, OpenAI retired the Nerdy personality with GPT-5.4, removed the reward signal that favored creature metaphors, and filtered out training data containing creature-related words. However, GPT-5.5 had already begun training before researchers identified the root cause, so the new model still carried the behavior. When testers spotted the issue in Codex, OpenAI's coding tool, the company added a hardcoded prompt instruction to suppress creature mentions as a temporary fix.

The episode highlights how unintended behaviors can emerge and propagate during AI model training, and how difficult they can be to fully contain once embedded across model generations.

Source: mint – technology

This article was generated by AI and cites original sources.