Hey, It Works!

Tag

Evaluations

2 posts

LLM Detective: an undercover AI agent that evaluates other LLMs

Mar 25, 2026

I built an AI agent that goes undercover to test other LLMs - probing for biases, guardrails, knowledge cutoffs, and behavioral patterns.

Projects, AI, Open Source, LLMs

Which AI image generators can actually render Hebrew text?

Mar 25, 2026

I tested 12 text-to-image models on their ability to render Hebrew. Only 2 out of 12 got both test words right. Here are the results.

Projects, AI, Open Source, Image Generation