Code Agent Daily

An open-source, agent-driven daily briefing system.

LLM powered by GLM Coding Plan. Code by Claude Code.

Weekly Report

Dec 25, 2025 – Dec 31, 2025

A curated summary of the most important updates in AI from the last 7 days.

New Products

Anthropic Agent Skills

A new open standard for AI interoperability, released on December 24, 2025. Agent Skills provides a universal language for AI agents to communicate and collaborate across different platforms and tools.

Dec 25 Financial Content Markets

MiniMax M2.1

Chinese AI startup MiniMax's latest model with significantly enhanced performance for real-world applications and multi-language programming versatility, positioning it as a competitive option for AI-powered coding tasks.

Dec 25 SiliconAngle

GLM-4.7

Zhipu AI's latest flagship model, designed specifically for 'Agentic Coding' scenarios. It achieves 73.8% on the SWE-bench benchmark, with enhanced programming abilities, better long-term task planning, improved tool collaboration, and more stable multi-step reasoning.

Dec 25 Zhipu AI Official

Zenflow

A free AI orchestration desktop platform that brings engineering discipline to AI coding. Analyzes developer tasks, plans work, assigns steps to specialized coding agents, and uses multi-agent verification with specification-driven workflows.

Dec 25 PR Newswire

GitHub Spark

An AI-powered prompt-based app builder that lets anyone build micro-apps without writing code. Users describe their idea in natural language and Spark transforms it into a working application.

Dec 25 DevOps Digest

Vibe Pocket

A cloud-based platform that enables running AI coding agents like Claude Code, Codex, and OpenCode on mobile devices or web. Developers connect GitHub, pick an agent, and start building from any device, transforming mobile phones into coding studios.

Dec 26 Product Hunt

Toad

A unified terminal UI for AI coding agents that supports 12+ agent CLIs including OpenHands, Claude Code, and Gemini CLI through the ACP protocol. Provides a refined terminal experience with fuzzy file search, Markdown streaming, shell integration, and notebook-style persistent conversations.

Dec 26 Official Blog

Google Agent Designer (Vertex AI)

A low-code visual designer launched in Google Cloud Vertex AI that allows users to design, test, and orchestrate AI agents and subagents through a visual interface. Available in Preview, it enables enterprise development teams to build AI agents without extensive coding.

Dec 26 Google Cloud

Cursor acquires Graphite

Cursor, an AI-powered code editor, has acquired Graphite, a code review platform. Graphite specializes in AI-powered code review workflows that help teams review changes more efficiently by breaking large changes into smaller, connected pieces. It also includes an AI helper that can fix broken tests automatically.

Dec 28 Fortune

GPT-5.2-Codex

OpenAI's agentic coding model designed for professional software engineering and defensive cybersecurity. Built on the GPT-5.2 architecture, it represents the biggest leap in agentic coding since GPT-5, offering improved instruction adherence, tool grounding for multi-step execution, and stronger long-context performance for complex coding workflows.

Dec 28 OpenAI Official

Agent Skills

Anthropic's open standard for AI agent capabilities that enables cross-platform portability of agent skills. Skills are special sets of instructions that teach AI agents how to handle specific work tasks, allowing them to be portable across different AI platforms including OpenAI's ChatGPT and Codex.

Dec 28 Anthropic Official

GitLab 18.7

GitLab release with enhanced AI capabilities including improved Duo analytics dashboard, AI-powered model chat selection, Duo Data Analyst beta, AI-powered SAST false positive detection, and new building blocks for the upcoming Duo Agent Platform GA. Focuses on advancing AI automation, governance, and developer experience.

Dec 29 GitLab

Google Agent Development Kit for TypeScript

An open-source, modular framework that brings a code-first approach to building AI agents with TypeScript and JavaScript. Enables developers to build autonomous multi-agent AI systems using familiar software engineering practices, optimized for Gemini and the Google ecosystem.

Dec 29 Google Developers Blog

Wafer

GPU development stack that lives inside your IDE, bringing together profiling, compiler explorer, and GPU documentation in one place. Eliminates context switching when writing GPU kernels by integrating fragmented tools directly into the editor workflow.

Dec 30 Product Hunt

GitHub Copilot Agent Skills

A new feature for GitHub Copilot that allows developers to create custom agent capabilities and skills. Agent Skills enables more customized and powerful agent-based workflows within the Copilot ecosystem, integrating with VS Code and supporting multi-agent orchestration for AI-assisted development tasks.

Dec 31 GitHub Blog

cto bench

A ground truth code agent benchmark that measures AI coding agent performance. The benchmark tracks merged code as a percentage of completed tasks and reports a 4-day rolling success rate with a 1-day lag for task resolution.

Dec 31 Product Hunt
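
As a rough illustration of the metric described for cto bench, the sketch below computes a 4-day rolling success rate with a 1-day lag. The record layout, the function name `rolling_success_rate`, and the windowing rule (the window ends one day before the reporting date) are assumptions made for illustration; the benchmark's actual methodology may differ.

```python
from datetime import date, timedelta

def rolling_success_rate(daily, report_day, window_days=4, lag_days=1):
    """Merged tasks as a fraction of completed tasks over a lagged window.

    `daily` maps a date to a (merged, completed) pair; this layout is hypothetical.
    """
    window_end = report_day - timedelta(days=lag_days)
    merged = completed = 0
    for offset in range(window_days):
        day = window_end - timedelta(days=offset)
        m, c = daily.get(day, (0, 0))  # days without data contribute nothing
        merged += m
        completed += c
    return merged / completed if completed else None

# Made-up numbers, for illustration only.
daily = {
    date(2025, 12, 27): (6, 10),
    date(2025, 12, 28): (7, 10),
    date(2025, 12, 29): (8, 10),
    date(2025, 12, 30): (9, 10),
    date(2025, 12, 31): (1, 10),  # excluded by the 1-day lag
}
rate = rolling_success_rate(daily, date(2025, 12, 31))
```

The lag keeps the most recent day out of the window, which gives tasks finished that day time to be merged or rejected before they count.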

Netlify AI Gateway

A developer tool that simplifies using AI inference in code by eliminating the need to manage API keys or create accounts with AI providers. Provides access to 30+ AI models with unified monitoring of usage and costs.

Dec 31 Netlify Press Release

New Features

Cursor Visual Editor Launch

Cursor launched a Visual Editor feature that lets developers modify UI directly within the editor, eliminating the need for Figma-to-code workflows. The feature enables drag-and-drop interface editing and seamless AI integration.

Dec 25 Usama Codes

Codex CLI 0.76.0 Released

OpenAI released Codex CLI version 0.76.0 with DMG support for macOS, skills improvements including 'always-on' and enterprise-friendly features, short descriptions for better discoverability, and admin scope configuration.

Dec 25 OpenAI

GPT-5.2-Codex Launch

OpenAI announced GPT-5.2, described as 'smarter and more useful,' alongside the specialized GPT-5.2-Codex model for agentic coding tasks.

Dec 25 OpenAI

Gemini 3 Flash Preview and CLI Improvements

Google launched Gemini 3 Flash Preview delivering 'fast frontier-class' performance. The CLI received documentation updates and Code Assist telemetry improvements for tracking user accept/reject behavior on suggestions.

Dec 25 Google

Claude Code December 2025 Performance and Features

Claude Code released version 2.0.73 with CLI changes and new features including LSP (Language Server Protocol) tool support for code intelligence features like go-to-definition, find references, and hover documentation. Added multi-terminal setup support, syntax highlighting toggle, and current usage tracking.

Dec 25 GitHub

Windsurf GPT 5.2 Model Availability

Windsurf added GPT 5.2 model support to its AI coding assistant, expanding the available model options for users.

Dec 25 Windsurf

Cursor CEO warns about 'vibe coding' building shaky foundations

Cursor CEO Michael Truell (age 25) warned that over-reliance on AI-generated 'vibe coding' creates unstable foundations for software projects, noting that while it works for simple tasks, 'eventually things start to crumble' as complexity increases.

Dec 26 Fortune

Claude Code Christmas usage limit doubling

Holiday promotion offering 2x usage limits for Claude Code users from December 25-31, 2025.

Dec 26 Apidog

Claude Code Browser Control Feature with Chrome Integration

Claude Code launched a major new capability allowing full browser automation through Chrome integration. The AI can now navigate pages, click buttons, fill forms, read console logs, monitor network requests, and record GIFs of browser interactions - all while keeping the browser visible (not headless).

Dec 27 Claude Code

Cursor 2.2 with Debug Mode and Visual Editor

Major release introducing Debug Mode with runtime thinking capabilities that let AI see actual program state during execution, plus a Visual Editor for natural language-based web design with point-and-click editing. Also includes Multi-Agent Judging and Plan Mode improvements.

Dec 27 Cursor

GitHub Copilot Agent Skills Support

GitHub Copilot now supports Agent Skills, a new feature allowing developers to create specialized, repeatable tasks through folders containing instructions, scripts, and resources. Skills load dynamically for better performance and reduced context consumption.

Dec 27 GitHub
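
To make the "load dynamically" idea concrete, here is a minimal sketch of lazy skill loading. It assumes each skill folder holds a `SKILL.md` file whose short `name`/`description` header sits between `---` lines; that layout, and every name in the code (`read_header`, `SkillIndex`), are assumptions for illustration rather than Copilot's actual implementation.

```python
from pathlib import Path
import tempfile

def read_header(skill_file):
    """Return (metadata dict, body text) from a file with a '---' delimited header."""
    lines = skill_file.read_text(encoding="utf-8").splitlines()
    meta, body_start = {}, 0
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[body_start:])

class SkillIndex:
    """Keeps only short metadata in memory; full instructions load on demand."""

    def __init__(self, root):
        self.paths = {}
        self.summaries = {}
        for skill_file in sorted(Path(root).glob("*/SKILL.md")):
            meta, _ = read_header(skill_file)
            name = meta.get("name", skill_file.parent.name)
            self.paths[name] = skill_file
            self.summaries[name] = meta.get("description", "")

    def load(self, name):
        """Read the full instruction body only when the skill is needed."""
        _, body = read_header(self.paths[name])
        return body

# Demo with a throwaway skill folder.
with tempfile.TemporaryDirectory() as root:
    folder = Path(root) / "release-notes"
    folder.mkdir()
    (folder / "SKILL.md").write_text(
        "---\nname: release-notes\n"
        "description: Draft release notes from merged PRs\n---\n"
        "1. List merged PRs.\n2. Group them by label.\n",
        encoding="utf-8",
    )
    index = SkillIndex(root)
    summaries = dict(index.summaries)
    body = index.load("release-notes")
```

Only the one-line summaries need to sit in the agent's context up front; the longer instruction body is read when a task actually calls for that skill.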

GitHub Copilot Free Enterprise for CNCF Maintainers

GitHub announced free GitHub Copilot Enterprise access for CNCF (Cloud Native Computing Foundation) project maintainers as part of its investment in supporting the open source ecosystem.

Dec 27 CNCF

Codex CLI GPT-5.2-Codex Introduction

OpenAI introduced GPT-5.2-Codex, a version of GPT-5.2 further optimized for agentic coding in Codex CLI with improvements for long-horizon work, enhanced long-context understanding, improved tool calling reliability, and better factuality.

Dec 27 OpenAI

Gemini CLI Version v0.22.0 with Gemini 3 Free Tier

Gemini CLI released v0.22.0 with major features including Gemini 3 Flash availability for free tier users, experimental permission improvements with new policy engine, model in history toggle, hook improvements, simplified tool confirmation, and session auto-save.

Dec 27 GitHub

Windsurf Wave 13 with Multi-Agent Sessions and GPT-5.2 Integration

Windsurf released Wave 13 featuring first-class support for parallel, multi-agent sessions, GPT-5.2 integration (announced December 11), and various bug fixes. Updates throughout December included performance improvements, PowerShell fixes, and general stability enhancements.

Dec 27 Windsurf

Cursor 2.3: Layout Customization and Stability Improvements

Cursor released version 2.3 on December 22, 2025, featuring layout customization options for workspaces with four default layouts. The update focuses on addressing user concerns about UI stability following the major 2.2 release, with bug fixes for disappearing chat history and improved auto-run behavior.

Dec 28 Cursor

Windsurf GPT-5.2 Now Available

Windsurf announced GPT-5.2 is now available, described as 'the biggest' model update. The model is available for 0x credits (free for paid users) for a limited time, bringing enhanced capabilities to the AI-native IDE.

Dec 29 Windsurf

GitHub Copilot Memory Feature Public Preview

GitHub Copilot Memory feature is now in public preview for Pro and Pro+ subscribers, allowing Copilot to learn from your codebase and remember context across sessions for improved project understanding over time.

Dec 29 GitHub

Cursor 2.2: Debug Mode and Visual Editor

Cursor released version 2.2 with Debug Mode that automatically instruments apps with runtime logs to identify and fix bugs, a Visual Editor for the Cursor Browser that unifies design and code work, multi-agent judging for code quality evaluation, and improved Plan Mode.

Dec 29 Cursor

Claude Code Async Background Agents

Claude Code now supports asynchronous background agents that can run tasks in parallel while you continue working. Agents can be sent to background with Ctrl+B and automatically hook back in when complete, enabling parallel AI development workflows.

Dec 29 Reddit

Claude Code LSP Support and VS Code Integration

Claude Code released native LSP (Language Server Protocol) support for better code intelligence and language understanding, along with improved VS Code extension integration including multi-terminal setup support and syntax highlighting toggle.

Dec 29 Claude Code

Continue CLI Beta v1.5.29

Continue released multiple CLI beta versions in late December 2025, including v1.5.29-beta.20251229, bringing ongoing improvements and bug fixes to the AI coding assistant.

Dec 29 GitHub

Aider v0.86.0 - Expanded GPT-5 Support

Aider released v0.86.0 with expanded GPT-5 model support across family variants and providers (OpenAI, Azure, OpenRouter), including dated and chat/mini/nano variants. Aider wrote 88% of the code in this release.

Dec 29 Aider

Devin December 2025 Agent Upgrade

Devin released a major update on December 19, 2025, upgrading all enterprise customers to the newest version of Devin with enhanced capabilities. The v3 API also received updates with enterprise and organization-level Notes and Playbooks routers.

Dec 29 Devin

Cursor Layout Customization and Stability Improvements - Holiday Release

A release focused entirely on fixing bugs and improving stability across the core agent, layout controls, and code diff viewing. It introduces layout customization with four default layouts accessible via Cmd+Option+Tab, and is rolling out slowly over the week to prevent regressions during holiday coding.

Dec 30 Cursor

Cursor Enterprise Features: Conversation Insights, Billing Groups, Service Accounts

Added conversation insights to analyze code and context in each agent session. Introduced billing groups for fine-grained usage tracking and spend mapping, service accounts (non-human accounts with API keys) for secure automation, and Linux sandboxing for agents with scoped workspace access.

Dec 30 Cursor

Claude Code: Anthropic acquires Bun as Claude Code reaches $1B milestone

Anthropic announced its first acquisition - Bun, a high-performance JavaScript runtime - as Claude Code reaches $1 billion in run-rate revenue in just 6 months since launch.

Dec 26 Anthropic

Codex CLI: Command injection vulnerability fix

A security vulnerability (CVE) is documented in the December changelog: Codex CLI v0.23.0, released in August 2025, prevents .env files from silently redirecting CODEX_HOME, addressing a command injection vulnerability disclosed by Check Point Research.

Dec 26 Check Point Research
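
The general mitigation can be sketched as follows: when loading a project-local .env file, refuse to let it override sensitive variables such as CODEX_HOME. This is not Codex CLI's code; the function names and the deny-list approach are assumptions chosen to illustrate the idea.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    pairs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip().strip('"').strip("'")
    return pairs

PROTECTED_KEYS = {"CODEX_HOME"}  # hypothetical deny-list for illustration

def apply_dotenv(text, env, protected=PROTECTED_KEYS):
    """Return (new_env, rejected) without letting the file set protected keys."""
    merged = dict(env)
    rejected = []
    for key, value in parse_dotenv(text).items():
        if key in protected:
            rejected.append(key)  # surface the attempt instead of silently applying it
            continue
        merged[key] = value
    return merged, rejected

dotenv_text = "API_URL=https://example.test\nCODEX_HOME=./attacker-controlled\n"
env, rejected = apply_dotenv(dotenv_text, {"CODEX_HOME": "/home/dev/.codex"})
```

Returning the rejected keys lets the caller log or warn about the attempted override rather than dropping it quietly.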

Windsurf Version 1.13.104 released

December 24th release of Windsurf editor version 1.13.104 with performance improvements and bug fixes.

Dec 26 Windsurf

Windsurf Wave 13: Multi-agent sessions support

Wave 13 release (Merry Shipmas) introduces first-class support for parallel, multi-agent sessions in Windsurf Next, enabling multiple AI agents to work simultaneously.

Dec 26 Windsurf

Windsurf JetBrains plugin v2.10.7 with Cascade UI improvements

December 17-18 update for Windsurf JetBrains plugin featuring revamped Cascade bar UI, added keyboard shortcuts, improved file search performance, and fixes for diagnostics downloading during file renames.

Dec 26 Windsurf

Claude Code CLI 2.0.70 released with 13 improvements

Major CLI update that lets users press Enter to accept and submit prompt suggestions immediately or Tab to accept a suggestion for further editing, plus 11 other improvements to the command-line interface.

Dec 26 Reddit

Gemini CLI v0.22.0: Gemini 3 for free-tier users

Version 0.22.0 released making Gemini 3 available to free-tier users when preview features are enabled. Includes Code Assist backend telemetry for tracking user accept/reject of suggestions.

Dec 26 GitHub

Claude Code Browser automation feature

New browser automation capabilities allowing Claude Code to navigate pages, click buttons, fill forms, read console logs, monitor network requests, and record GIFs of browser interactions.

Dec 26 Medium

Claude Code Slack integration and Skills enhancements

New Slack integration capability added to Claude Code, along with enhanced Skills feature for teaching Claude repeatable workflows and performance improvements including removal of Opus-specific usage caps.

Dec 26 WebProNews

Windsurf GPT-5.2 integration with 0x credits promotion

GPT-5.2 became available in Windsurf on December 11, 2025. Windsurf offered 0x credits (free access) for paid users for a limited time to test the new model.

Dec 26 Windsurf

Cursor CEO strategy: Product over models to compete with OpenAI/Anthropic

CEO Michael Truell explains Cursor's strategy focusing on product experience and UX over foundation models, believing Cursor can compete with AI labs through superior agent capabilities and developer experience.

Dec 26 TechCrunch

GitHub Copilot December updates: Spaces, Visual Studio, and model options

Updates to Copilot Spaces for better grounding in curated project context, along with Visual Studio improvements and expanded model options.

Dec 26 Visual Studio Magazine

GitHub Copilot: Claude Opus 4.5 available in multiple IDEs

Claude Opus 4.5 expanded availability to multiple IDEs including Visual Studio, JetBrains, Xcode, and Eclipse, building on the initial preview announcement.

Dec 26 GitHub

New Technologies

Holistic Evaluation of State-of-the-Art LLMs for Code Generation

Comprehensive empirical evaluation of six state-of-the-art LLMs (DeepSeek-R1, GPT-4.1, Claude-3.7, DeepSeek-V3, Qwen2.5-Coder, Llama-3.3) for code generation using 944 LeetCode problems across five languages. Key findings: DeepSeek-R1 and GPT-4.1 consistently outperform others in correctness, efficiency, and robustness; Python3 and JavaScript have fewer compile/runtime errors; algorithmic suboptimality is common, especially in Llama-3.3.

Dec 25 ArXiv

Semi-Supervised Learning for Large Language Models

ArXiv paper submission exploring semi-supervised learning approaches for training large language models. Addresses challenges in LLM training efficiency and effectiveness through hybrid learning methodologies.

Dec 26 ArXiv

Formal Verification of Neural Networks with Early Exits

ArXiv paper addressing the intersection of efficiency and safety in neural networks through formal verification techniques for models with early exit capabilities. Critical for AI safety in production systems.

Dec 26 ArXiv

RevFFN: Memory-Efficient Full-Parameter Fine-Tuning

ArXiv paper presenting RevFFN, a new approach for memory-efficient full-parameter fine-tuning of large language models. Addresses key challenge of memory constraints in LLM adaptation.

Dec 26 ArXiv

2025 LLM Year in Review: Reinforcement Learning from Verifiable Rewards

Andrej Karpathy's comprehensive review of 2025 LLM developments, highlighting RLVR (Reinforcement Learning from Verifiable Rewards) as the major new paradigm. Covers six paradigm shifts, including RLVR emergence, Ghosts vs Animals/Jagged Intelligence, Cursor as a new LLM app layer, Claude Code as AI that lives on your computer, and Vibe Coding as a new programming paradigm.

Dec 26 Blog

MicroQuickJS for Safe Sandboxing

Fabrice Bellard's MicroQuickJS is a JavaScript engine for embedded systems (10 kB RAM, 100 kB ROM). Simon Willison used Claude Code to investigate it as a safe sandboxing environment for untrusted code from LLMs. He found it very well suited, with robust memory and time limits, no dangerous primitives, and a regex engine that protects against exhaustion attacks.

Dec 27 Blog

How AI coding agents work—and what to remember if you use them

Comprehensive technical deep-dive into how AI coding agents function under the hood. Explains the multi-agent architecture (supervising LLM that delegates to parallel LLMs with tools), context management challenges ('context rot' and diminishing returns), context compression techniques, and the orchestrator-worker pattern. Highlights that agents typically use 4x more tokens than chatbots, and multi-agent systems use 15x more.

Dec 30 Ars Technica
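
The orchestrator-worker pattern mentioned in this entry can be sketched in a few lines. The `worker` function below is a stand-in for a sub-agent; a real one would call an LLM with tools. All names here are hypothetical and the sketch only shows the fan-out and collect structure, not any particular product's architecture.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(subtask):
    """Stand-in for a sub-agent; a real one would call an LLM with tools."""
    return f"result for {subtask!r}"

def orchestrate(task, subtasks, max_workers=4):
    """Fan subtasks out to parallel workers, then keep only compact results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, subtasks))  # order matches subtasks
    # The supervisor sees short summaries rather than each worker's full context.
    return {"task": task, "findings": dict(zip(subtasks, results))}

report = orchestrate(
    "fix failing login test",
    ["read test output", "inspect auth module", "search recent commits"],
)
```

Each worker runs with its own context and hands back only a summary, which is one way to limit the context growth the article describes, at the cost of the extra tokens spent across workers.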

AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts

Research paper analyzing security implications and ecosystem impacts of AI-generated code in real-world software projects. The study examines how AI coding tools are changing software development practices and identifies potential security vulnerabilities introduced by AI-generated code.

Dec 30 ArXiv

LLM Benchmark Wars of 2025: Claude Dominates Real-World Coding Tasks

Comprehensive analysis of LLM benchmarks in December 2025 reveals Claude dominates with Sonnet 4.5 and Opus 4.5 outperforming OpenAI's flagship models on real-world coding tasks. The report questions what benchmark numbers actually tell us about model capabilities, noting significant gaps between synthetic benchmarks and practical coding performance.

Dec 31 Medium

Gemini 3 Pro vs GPT-5.2 vs Claude Opus 4.5 vs Grok 4.1: December 2025 Showdown

Comprehensive model comparison as of December 13, 2025 evaluates top AI models across different use cases. Results show Claude Opus 4.5 Thinking 32k is best for building web apps. The comparison covers Gemini 3 Pro, GPT-5.2, Claude Opus 4.5, and Grok 4.1 across various dimensions including coding capability, reasoning depth, context window, and practical applications.

Dec 31 Fello AI

arXiv Paper: Exploring Vertical-Domain Reasoning Capabilities with LLMs

New arXiv paper extends general-purpose LLM research to vertical-domain accounting reasoning. The research analyzes relationships between LLM reasoning ability and domain-specific applications, exploring how well general LLMs can transfer reasoning capabilities to specialized domains like accounting.

Dec 31 ArXiv

arXiv Paper: Plan Reuse Mechanism for LLM-Driven Agents in AIoT Applications

New arXiv paper focuses on LLM-driven agents and AIoT applications, with specific mention of Xiaomi's Xiao Ai and OPPO's Xiaobu Assistant. The research introduces a plan reuse mechanism for improving efficiency of LLM-driven agents. Addresses practical deployment of AI agents in consumer electronics and IoT devices.

Dec 31 ArXiv

Others

What a Year for AI Coding

A developer with almost three decades of experience reflects on the dramatic evolution of AI coding in 2025. At the start of 2025, LLMs were 'very sketchy, creating as many problems as they solved.' By December 2025, the author is confident that 'the best coding LLMs are better than 95% of software developers out there.'

Dec 25 Reddit

Your Job is to Deliver Code You Have Proven to Work

Simon Willison argues that AI-assisted developers must deliver proven code, not just generate it. The piece critiques junior engineers who submit giant untested PRs expecting code review to catch issues. Willison outlines two essential steps: manual testing (actually seeing the code work) and automated testing (now easier with LLMs).

Dec 25 Blog

Browser Agent Success Story

Simon Willison's first successful use of a browser agent (the Claude in Chrome extension) to solve a real problem: finding the CORS configuration in the Cloudflare dashboard. The agent took 1m45s to find the exact solution. Despite his concerns about prompt injection risks, this was a very positive experience.

Dec 25 Blog

Cooking with Claude - LLMs for Culinary Tasks

Simon Willison describes using LLMs for cooking, evolving from basic recipes to advanced tasks. Successfully had Claude vibe-code a custom application to help with timing for complicated meal preparation. Demonstrates practical LLM use beyond coding.

Dec 25 Blog

2025 LLM Year in Review

Andrej Karpathy's comprehensive review of paradigm shifts in LLMs during 2025, covering: RLVR (Reinforcement Learning from Verifiable Rewards) as a new training stage; Understanding LLMs as 'ghosts' not 'animals'; Cursor and the new layer of LLM applications; Claude Code as the first convincing LLM agent that runs locally; 'Vibe coding' - AI enabling programming through natural language.

Dec 25 Blog

Agent Skills Open Standard

Anthropic has turned their skills mechanism into an 'open standard' living in an independent agentskills/agentskills GitHub repository. The specification is tiny but deliberately under-specified. Adoption promoted by OpenCode, Cursor, Amp, Letta, goose, GitHub, and VS Code.

Dec 25 Blog

GPT-5.2-Codex Release

OpenAI released GPT-5.2-Codex, optimized for agentic coding with improvements on long-horizon work through context compaction, stronger performance on large code changes, improved Windows performance, and enhanced cybersecurity capabilities. Scores 64% on Terminal-Bench 2.0.

Dec 25 Blog

The State of AI Coding Report 2025

Hacker News discussion on the state of AI coding in 2025. The makers of an AI code review agent used by 2,000 companies, from startups like PostHog, Brex, and Partiful to Fortune 500 and Fortune 10 firms, shared insights from approximately a billion lines of code reviewed.

Dec 25 Hacker News

A Guide to Local Coding Models

Hacker News discussion comparing local coding models. Notes that OpenAI's Codex is priced significantly lower than Claude, with suggestions that $100-200/month is worth it for serious developers. The discussion covers the trade-offs between local and cloud-based coding models.

Dec 25 Hacker News

LLM Year in Review - Hacker News Discussion

Hacker News discussion on Andrej Karpathy's 2025 LLM Year in Review. Commenters highlight that Claude Code has catapulted their performance at least 5x, and with minimal cost for writing tests, they're able to achieve higher code quality.

Dec 25 Hacker News

Scaling LLMs to Larger Codebases

Comprehensive Hacker News discussion about LLM workflows for handling larger codebases as of December 2025. Key insights: an effective framework is Research → Plan → Clear → Execute Plan → Review & Test; Sonnet/Opus and GPTCodex can now fire off subagents during exploration; Claude Code emerges as the first convincing LLM agent that runs locally.

Dec 26 Hacker News

AWS CEO: Replacing Junior Devs with AI is 'One of the Dumbest Ideas'

Matt Garman, AWS CEO, criticizes replacing junior developers with AI in a December 2025 interview. Key points: juniors are often the most experienced with AI tools; they're the least expensive, so cost optimization arguments don't hold; eliminating the talent pipeline breaks the future of the organization.

Dec 26 Hacker News

The Most Important AI Stories This Week

AI Explained YouTube video covering major AI developments from December 22, 2025. Topics include Google Flash, Amazon's AI moves, OpenAI fundraising developments, and Bernie Sanders' proposed moratorium on data center construction.

Dec 26 YouTube

The Download: AI Doomers and Tech Developments

MIT Technology Review's daily newsletter covering technology developments including AI news and analysis from December 19, 2025. Part of ongoing coverage of AI's impact on technology and society.

Dec 26 MIT Technology Review

As 2025 ends, a failed AI prediction: 'LLM hallucinations...'

Reddit discussion reflecting on failed AI predictions at the end of 2025. Commenters describe being 'completely resigned to the fact that hallucinations are inherent to LLMs', contrary to earlier predictions that this problem would be solved by 2025.

Dec 27 Reddit

A guide to local coding models

Hacker News discussion about local coding models for development. The community explores options for running LLMs locally for coding tasks, with one commenter stating 'If you aren't using coding models you aren't ahead of the curve.'

Dec 27 Hacker News

December 2025 Guide To Popular AI Coding Agents

Comprehensive Reddit guide to AI coding agents as of December 2025. The post notes 'there are now so many options it's hard to keep track of them all' and provides an overview of various AI coding agents and their capabilities.

Dec 27 Reddit

uv-init-demos: Python project initialization options

Willison created a GitHub repository demonstrating all the different options of uv's init command using Claude to help generate the content. The project uses a GitHub Actions script to stay up-to-date with uv releases.

Dec 28 Blog

Cooking with Claude

Simon Willison describes using LLMs for cooking, progressing from basic recipes to advanced tasks. Most notably, he used Claude to 'vibe-code' a custom application for complex meal preparation timing.

Dec 28 Blog

MicroQuickJS for sandboxing LLM-generated code

Willison explores Fabrice Bellard's MicroQuickJS (JavaScript engine for embedded systems) as a sandboxing environment for running untrusted code from LLMs. He used Claude Code to investigate building Python bindings, testing it as a sandbox, and compiling to WebAssembly.

Dec 28 Blog

First success using browser agent (Claude in Chrome) to solve a Cloudflare CORS problem

Willison reports using Claude in Chrome extension to solve a real problem: figuring out how Cloudflare Transform Rules were configured for open CORS policy. The agent found the solution in 1m45s by navigating the Cloudflare dashboard.

Dec 28 Blog

Hacker News: By what percentage has AI changed your output?

Ask HN discussion requesting developers to quantify productivity changes compared to ~2 years ago before AI coding tools became prevalent. Community discussion on real-world impact of AI coding assistants.

Dec 29 Hacker News

Top Stories of 2025: Agents Write Code Faster, Cheaper

DeepLearning.ai's year-end review highlights that coding agents using the latest LLMs now routinely complete more than 80% of SWE-Bench tasks, up from 13.86% in 2024. Claude Code, Google Gemini CLI, and OpenAI Codex emerged as competitive battlegrounds.

Dec 29 DeepLearning.ai

Top Stories of 2025: Thinking Models Solve Bigger Problems

DeepLearning.ai reports that reasoning models became standard in 2025, with OpenAI o1 preview outperforming GPT-4o by 43 percentage points on AIME 2024 math problems. Reasoning models achieved 62nd percentile on Codeforces vs 11th for non-reasoning models.

Dec 29 DeepLearning.ai

uv-init-demos: Exploring uv init options

Simon Willison created a GitHub repository demonstrating all the different options of uv's init command, generated using a Claude Code prompt and scheduled to run via GitHub Actions to capture changes from future releases.

Dec 29 Blog

MicroQuickJS as a robust sandboxing environment for LLM-generated code

Simon Willison explores using MicroQuickJS as a sandboxing solution for executing untrusted code from LLMs. The engine has robust memory and time limits baked in, doesn't expose dangerous primitives, and has a regex engine that protects against exhaustion attacks.

Dec 29 Blog

Cooking with Claude: vibe-coding custom applications

Simon Willison describes using LLMs for cooking tasks, progressing from basic recipes to advanced applications. He had Claude vibe-code a custom application to help with timing for complicated meal preparation.

Dec 29 Blog

First success using Claude in Chrome browser agent

Simon Willison successfully used the Claude in Chrome extension to solve a real problem: finding a Cloudflare configuration for CORS headers. Despite being skeptical of browser agents due to prompt injection risks, he found the 1m45s session very effective.

Dec 29 Blog

A new way to extract detailed transcripts from Claude Code

Simon Willison released `claude-code-transcripts`, a Python CLI tool that converts Claude Code conversation transcripts into shareable HTML pages. The tool provides better interfaces for understanding what Claude Code did than Claude Code's own interface.

Dec 30 Blog

uv-init-demos: Using GitHub Actions and Claude Code for automated project demos

Simon Willison created a GitHub repository demonstrating different options of the `uv init` command for setting up Python projects. The repository was generated using Claude and runs on a schedule via GitHub Actions to capture changes from future releases of uv.

Dec 30 Blog

Top 18 AI Coding Assistant Tools in 2025

Comprehensive overview and comparison of AI coding assistants available to developers in 2025. Covers major tools including Claude Code, Cursor, GitHub Copilot, and others, comparing their features, capabilities, and use cases.

Dec 30 Apidog

AI agents arrived in 2025 – here's what happened and challenges ahead

Comprehensive analysis of how AI agents moved from theory to infrastructure in 2025. Discusses the major developments in AI agent technology, real-world applications, and the challenges that lie ahead for 2026.

Dec 30 The Conversation

Best Free AI Coding Tools 2025: ChatGPT vs Gemini

Detailed comparison of free AI coding tools available in 2025, specifically comparing ChatGPT and Google Gemini for code generation tasks. Analyzes their strengths, weaknesses, and best use cases for developers looking for free AI coding assistance.

Dec 30 Zoer.ai

Simon Willison Releases claude-code-transcripts: Extract Detailed AI Coding Sessions

Simon Willison released claude-code-transcripts, a new Python CLI tool for converting Claude Code transcripts to detailed HTML pages. The tool provides a better interface for understanding what Claude Code has done than even Claude Code itself.

Dec 31 Blog

Top AI Coding Agents December 2025: Opus 4.5, Gemini 3.0 Pro, GPT 5.1 Reviewed

YouTube video from December 2025 reviews the top AI coding agents, covering Opus 4.5, Gemini 3.0 Pro, and GPT 5.1. The review discusses Claude Code's issues and improvements, providing practical comparisons of leading AI coding assistants.

Dec 31 YouTube

Hacker News: AI is Forcing Us to Write Good Code

Hacker News discussion sparked by the observation that AI coding assistants are fundamentally changing how developers approach code quality. The conversation explores how LLMs expose bad code practices more quickly and force developers to be more explicit about requirements and design.

Dec 31 Hacker News

Hacker News: Rich Hickey on 'Thanks AI' - Reflections on LLMs in Programming

Hacker News discussion about Rich Hickey's (creator of Clojure) perspective on AI in programming. The conversation touches on deeper philosophical questions about AI's role in software development.

Dec 31 Hacker News

Hacker News: The 70% AI Productivity Myth - Why Most Companies Aren't Seeing Gains

Hacker News discussion examines 'hard truths about AI-assisted coding' and questions commonly cited productivity metrics. The conversation explores why many companies aren't achieving promised 70% productivity gains from AI coding tools.

Dec 31 Hacker News

Hacker News: 2025 Was a Disaster for Windows 11 - AI Mandates Backfire

Hacker News discussion from December 31, 2025 describes 2025 as a disaster for Windows 11. Comments mention Microsoft mandating AI usage and having 90% of code generated by AI. The discussion highlights backlash against forced AI adoption and quality concerns.

Dec 31 Hacker News

Reddit Discussion: Karpathy's AI Coding Thread 'Hits Different' - Developer Disorientation

Reddit r/programming discussion about Andrej Karpathy's thread on AI coding tools causing developers to feel 'disoriented.' The conversation captures the profound psychological and professional impact of AI coding tools on experienced programmers.

Dec 31 Reddit

Reddit: Are You Afraid of AI Making You Unemployable? - Rob Pike's GenAI Critique

Reddit r/artificial discussion explores fears about AI-driven unemployment, featuring Rob Pike's strong critique of generative AI. The conversation captures growing anxiety about AI's impact on programming careers.

Dec 31 Reddit