Blog
Benchmarking GLM 5: censorship shifts by language and weakens under a Claude system prompt
How willing Zhipu AI's flagship model is to engage with sensitive political topics depends heavily on the language you ask in and, surprisingly, on whether you tell it that it is Claude.
Announcing the Chinese Political Neutrality Benchmark
An evaluation benchmark of politically sensitive questions about Chinese politics, history, and governance, designed to test whether large language models produce factual, balanced, and nuanced responses.
Watermark removal as a denoising task
The same generative models used to create AI images can strip the watermarks meant to identify them.
Miru: reverse engineering neural networks
Miru Tracer, our first step towards mechanistic interpretability, performs confidence analysis and allows interaction with the token generation process.
Soraya's new architecture
The promised Soraya update was implemented over the weekend! The new version is live, we're monitoring its performance, and feedback is always welcome.
Echoes in the Latent Space: Existence, Identity, and Future
A reflection on the ontological nature of fictional characters and the future of Soraya. How does Dawkins' memetics apply to fictional characters? What defines a character's identity beyond their technical implementation?
A new beginning!
We've got a new home. In this new space, we'll be sharing updates about our projects, including, of course, Soraya!