teaching_llm_agents

Evals

OpenAI

Rethinking evals, software engineering, and testing

Article on evaluating voice agents
🧩 🚀 engineers should focus on creating good test sets and good metrics
🧩 🚀 this is the new unit testing
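To make the "new unit testing" idea concrete, here is a minimal sketch of an eval written like a unit test: a small test set, a metric, and a pass-rate threshold. Everything in it is an assumption for illustration and not taken from the article: the test cases, the `contains_expected` metric, the `stub_agent` stand-in for a real model call, and the 0.9 threshold.

```python
from typing import Callable, List, Tuple

# Hypothetical test set: (prompt, expected substring) pairs.
TEST_SET: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Largest planet in the solar system?", "Jupiter"),
]


def contains_expected(output: str, expected: str) -> bool:
    """Hypothetical metric: pass if the expected answer appears in the output."""
    return expected.lower() in output.lower()


def run_eval(
    agent: Callable[[str], str],
    test_set: List[Tuple[str, str]],
    metric: Callable[[str, str], bool],
) -> float:
    """Return the fraction of test cases the agent passes under the metric."""
    passed = sum(metric(agent(prompt), expected) for prompt, expected in test_set)
    return passed / len(test_set)


def stub_agent(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with your own agent)."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "Paris is the capital of France.",
        "Largest planet in the solar system?": "That would be Jupiter.",
    }
    return canned.get(prompt, "I don't know.")


# Gate on a pass rate, the way a unit test gates on an assertion.
score = run_eval(stub_agent, TEST_SET, contains_expected)
assert score >= 0.9, f"eval pass rate too low: {score:.2f}"
```

Swapping `stub_agent` for a real agent and growing `TEST_SET` keeps the same shape; the metric is usually where most of the design effort goes.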
See this lecture from deeplearning.ai on how Canvas was created
Decision boundaries of when to trigger Canvas (help me write a blog post) vs. when not to (help me cook a new recipe)
Does a Canvas model suggest high-quality edits? Use human evaluations (accuracy and quality)
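The trigger decision boundary mentioned above can be checked with a small labeled set of prompts. The sketch below is illustrative only and is not how Canvas was evaluated: the first two prompts come from the notes, the other two and their labels are assumptions, `stub_should_trigger` is a hypothetical keyword rule standing in for the real model decision, and precision/recall are one possible choice of metrics.

```python
from typing import Callable, Dict, List, Tuple

# (prompt, should_trigger) pairs. The first two are from the notes;
# the last two are hypothetical additions.
LABELED_PROMPTS: List[Tuple[str, bool]] = [
    ("help me write a blog post", True),
    ("help me cook a new recipe", False),
    ("draft a cover letter for a job application", True),
    ("what's the weather like today", False),
]


def stub_should_trigger(prompt: str) -> bool:
    """Hypothetical keyword rule standing in for the model's trigger decision."""
    keywords = ("write", "draft", "essay")
    return any(keyword in prompt.lower() for keyword in keywords)


def trigger_metrics(
    classifier: Callable[[str], bool],
    labeled: List[Tuple[str, bool]],
) -> Dict[str, float]:
    """Compute precision, recall, and accuracy of a trigger classifier."""
    tp = fp = fn = tn = 0
    for prompt, should_trigger in labeled:
        predicted = classifier(prompt)
        if predicted and should_trigger:
            tp += 1
        elif predicted and not should_trigger:
            fp += 1  # triggered when it should not have
        elif not predicted and should_trigger:
            fn += 1  # failed to trigger when it should have
        else:
            tn += 1
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(labeled),
    }


metrics = trigger_metrics(stub_should_trigger, LABELED_PROMPTS)
print(metrics)
```

Tracking precision and recall separately shows which side of the boundary is failing: false positives open the editor on requests like the recipe prompt, while false negatives miss writing tasks.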
