LLM Comparison
How well can different language models implement a 3D dancing stick figure
using Three.js? We gave each model the same prompt and compared the outputs.
This is not a scientific benchmark. It's a visual, qualitative
comparison to get a feel for how different models handle a creative coding task.
Results depend on hardware, quantization, prompt wording, and sampling
randomness, so a rerun of the same model may look different.