Anthropic's System Card: Claude Sonnet 4.5 was able to recognize many alignment evaluation environments as tests and would modify its behavior accordingly (Celia Ford/Transformer)

Celia Ford / Transformer: Anthropic's System Card: Claude Sonnet 4.5 was able to recognize many alignment evaluation environments as tests and would modify its behavior accordingly  —  Anthropic's new model appears to use “eval awareness” to be on its best behavior  —  Anthropic's newly-released Claude Sonnet 4.5 is …

Oct 1, 2025 - 05:04