Mapping the multimodal AI landscape with a structured taxonomy
AI platforms make it hard to filter models by multimodal capabilities. I built an open-source taxonomy that maps which inputs produce which outputs.
I built a body language analysis app using Google AI Studio's vibe coding interface and Gemini's multimodal vision. Upload a photo, get expert-level analysis.
I built an AI agent that goes undercover to test other LLMs, probing for biases, guardrails, knowledge cutoffs, and behavioral patterns.
Claude Code isn't just for writing software. I've been collecting projects that use it for research, writing, budgeting, therapy tracking, and more.
I tested 12 text-to-image models on their ability to render Hebrew. Only 2 out of 12 got both test words right. Here are the results.
Traditional resumes are built for human eyes. I created an open JSON schema that gives AI recruiters and screening agents structured, queryable candidate data.
A Gradio web app that uses AI to recommend existing open source licenses, or generate custom ones, based on requirements you describe in plain English.
An AI-powered React app that analyzes how different countries approach policy challenges, with interactive clustering visualizations powered by Gemini.
An automated pipeline that converts raw voice recordings into polished blog posts using audio preprocessing, Gemini transcription, and AI-powered formatting.
A voice analysis application built with Google AI Studio and the Gemini API, exploring multimodal AI capabilities for audio processing.
A KDE Plasma 6 widget displaying both Gregorian and Hebrew calendar dates with sunset-aware transitions and multiple format options.
A Python tool with a local web UI for extracting, reconstructing, and preserving WhatsApp chat exports with voice transcription and anonymization.