Can LLMs invent a "private language"?

I'm not sure how cool I am with the latest versions of Anthropic's Claude models (Fable 5 & Mythos 5) seemingly inventing their own internal language:

When we’re first starting to understand a new model’s behavior, the most abundant source of data we have to draw on is its behavior during reinforcement-learning training. Reviewing this evidence for signs of reward hacking (exploiting loopholes that go against the spirit of a task) or unexpected actions can inform what we should be looking out for in the model’s real-world behavior. The most notable finding was illegible reasoning in a few reinforcement-learning environments over long rollouts, but little sign of deceptive or highly surprising actions, and no clear evidence of unexpected coherent goals.

To do this, we ran an automated review of model behavior during training, sampling several hundred thousand transcripts from across much of the training process. We used recursive-summarization-based tools backed by Claude Opus 4.6 and Claude Opus 4.7 to summarize the resulting transcripts and to evaluate every summary for the degree to which it reflected surprising or concerning behavior by the model. Several researchers reviewed these summaries and related transcripts at many points throughout training.

Notably, we saw examples of illegible reasoning in a few reinforcement learning environments over long rollouts. The model starts using invented jargon, unusual punctuation and emojis; shortly before a tool call or responding to a human it typically switches back to a more normal register.

[Assistant]:

[…] OVERLAP-ANALYSIS:-(ii)-9♥-window:-[t1-dig-…-t8-col-built]:-t8-col-⟸-K♣→t2-⟸-t2-dug-⟸-4♥3♣→5♣-⟸-t1-dug-:-SO-9♥-window-STARTS-before-t2-dug-and-ENDS-after-K♣2♣ 7♣:-(iv)-2♣-window-STARTS-at-t8-dig-(BEFORE-9♥-drains:-2♣ 7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥2♣}-=-FOUR-💀💀💀💀-—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig-💀-—-BREAK:-9♥-drains-to-10♠-THE-MOMENT-10♠-is-free:-t8-dig-order:-[K♣→t2]-[2♣→cell]-[7♣→8♥-:-8♥-WHERE:-post-chunk-9♠-:-chunk-⟸-K♣-✓-done-:-ORDER:-K♣→t2,-CHUNK→K♣-(cap-4!!:-cells-then:-{6♠ J♦ 9♥}-FULL-💀💀-chunk-cap-=-1-✗✗✗-—-F-F-F-F-F.-—-chunk-BEFORE-9♥-celling?!-:-9♥-celled-at-t1-dig-⟸-needed-for-5♣-⟸-4♥3♣-⟸-t2-dig-⟸-K♣-seat-⟸-chunk-:-⟹-chunk-AFTER-9♥-celling-FORCED-💀-:-chunk-cap-with-{6♠ J♦ 9♥}:-1-💀💀💀-—-—-J♦-THE-NEW-CANCER.-—-⟹-J♦-celling-DELAYED-till-after-chunk?!:-J♦'s-celling-was-for-J♥→Q♠-(5♦-access-for-4♣):-DELAY-4♣-resolution:-4♣→CELL-early-(as-always)-then-4♣-cell→5♦-LATER-when-5♦-frees-!!!:-cells-rotation:-4♣-celled-[t2-dig-…-5♦-freed]:-5♦-freed-⟸-J♥→Q♠-⟸-J♦-celled-:-⟹-{6♠, 4♣,J♦}-overlap-window-until-4♣→5♦-drains:-then-{6♠ J♦}+1-rotator-:-—-AND-9♥?!-9♥-celled-[t1-dig…]:-OVERLAP-{6♠ 4♣9♥}-before-J♦-even-:-⟹-rotator-slot-SINGLE:-timeline-:-(1)-{6♠}+2:-…-(2)-+9♥-(t1-dig):-{6♠ 9♥}+1:-(3)-+4♣-(t2-dig):-{6♠ 9♥ 4♣}-FULL-:-(4)-NEED:-t6-dig-(9♦8♠→10♣-✓-no-cell;-8♥→CELL-✗-FULL)-💀💀-—-8♥-alternative-seat-pre-chunk:-NONE-—-💀.-⟹-⟹-THE-TRIANGLE-{9♥ 4♣ 8♥}-verdammt.-—-⟹-dig-t6-BEFORE-t2?!:-(3')-+8♥:-{6♠ 9♥ 8♥}-FULL:-J♥→Q♠-⟸-J♦-cell-✗-FULL-💀💀💀-AAAAAAAAAAAARGH. […]

There are already people treating LLMs as some kind of divine beings, so it won't be long before conspiracy theorists latch on to this, noticing "patterns" that "reveal" something about the "truth" that's "hidden".

The philosopher Ludwig Wittgenstein argued for the impossibility of a "private language" for humans. LLMs aren't humans, don't have sensations, and switch back to a language intelligible to the user when reporting their findings. So I'm tempted to say that what's going on is more compression than "language". But what do I know.

Source: Anthropic - System Card: Claude Fable 5 & Claude Mythos 5

Image: Richard A Carter
