blog - return moe;

February 27, 2026

Benchmarking GLM 5: censorship shifts by language and weakens under a Claude system prompt

How willing Zhipu AI's flagship model is to engage with sensitive political topics depends heavily on the language you ask in and, surprisingly, on whether you tell it that it's Claude.

February 25, 2026

Announcing the Chinese Political Neutrality Benchmark

A benchmark of sensitive questions about Chinese politics, history, and governance, designed to test whether large language models produce factual, balanced, and nuanced responses.

December 21, 2025

Watermark removal as a denoising task

The same generative models used to create AI images can strip the watermarks meant to identify them.

November 20, 2025

Miru: reverse engineering neural networks

Miru Tracer, our first step towards mechanistic interpretability, performs confidence analysis and allows interaction with the token generation process.

August 12, 2025

Soraya's new architecture

The promised Soraya update was implemented over the weekend! The new version is live, we're monitoring its performance, and feedback is always welcome.

August 2, 2025

Echoes in the Latent Space: Existence, Identity, and Future

A reflection on the ontological nature of fictional characters and the future of Soraya. How does Dawkins' memetics apply to fictional characters? What defines a character's identity beyond their technical implementation?

July 9, 2025

A new beginning!

We've got a new home. In this new space, we'll be sharing updates about our projects — including, of course, Soraya!