LLM Detective: an undercover AI agent that evaluates other LLMs
I built an AI agent that goes undercover to test other LLMs, probing for biases, guardrails, knowledge cutoffs, and behavioral patterns.
I tested 12 text-to-image models on their ability to render Hebrew. Only 2 out of 12 got both test words right. Here are the results.