Weekly Report
Dec 25, 2025 – Dec 31, 2025
A curated summary of the most important updates in AI from the last 7 days.
New Products
Anthropic Agent Skills
An open standard released on December 24, 2025 for packaging AI agent capabilities. Agent Skills gives agents a common format for task-specific instructions, so the same skill can be reused across different platforms and tools.
MiniMax M2.1
Chinese AI startup MiniMax's latest model with significantly enhanced performance for real-world applications and multi-language programming versatility, positioning it as a competitive option for AI-powered coding tasks.
GLM-4.7
Zhipu AI's latest flagship model, designed specifically for 'Agentic Coding' scenarios. It achieves 73.8% on the SWE-bench benchmark, with enhanced programming abilities, better long-term task planning, improved tool collaboration, and more stable multi-step reasoning.
Zenflow
A free AI orchestration desktop platform that brings engineering discipline to AI coding. Analyzes developer tasks, plans work, assigns steps to specialized coding agents, and uses multi-agent verification with specification-driven workflows.
GitHub Spark
An AI-powered prompt-based app builder that lets anyone build micro-apps without writing code. Users describe their idea in natural language and Spark transforms it into a working application.
Vibe Pocket
A cloud-based platform for running AI coding agents like Claude Code, Codex, and OpenCode from mobile devices or the web. Developers connect GitHub, pick an agent, and start building from any device, turning a mobile phone into a coding studio.
Toad
A unified terminal UI for AI coding agents that supports 12+ agent CLIs, including OpenHands, Claude Code, and Gemini CLI, through the Agent Client Protocol (ACP). Provides a refined terminal experience with fuzzy file search, Markdown streaming, shell integration, and notebook-style persistent conversations.
Google Agent Designer (Vertex AI)
A low-code visual designer launched in Google Cloud Vertex AI that allows users to design, test, and orchestrate AI agents and subagents through a visual interface. Available in Preview, it enables enterprise development teams to build AI agents without extensive coding.
Cursor acquires Graphite
Cursor, an AI-powered code editor, has acquired Graphite, a code review platform. Graphite specializes in AI-powered code review workflows that help teams review changes more efficiently by breaking large changes into smaller, connected pieces. It also includes an AI helper that can fix broken tests automatically.
GPT-5.2-Codex
OpenAI's agentic coding model designed for professional software engineering and defensive cybersecurity. Built on the GPT-5.2 architecture, it is described as the biggest leap in agentic coding since GPT-5, offering improved instruction adherence, tool grounding for multi-step execution, and stronger long-context performance for complex coding workflows.
Agent Skills
Anthropic's open standard for AI agent capabilities that enables cross-platform portability of agent skills. Skills are special sets of instructions that teach AI agents how to handle specific work tasks, allowing them to be portable across different AI platforms including OpenAI's ChatGPT and Codex.
GitLab 18.7
GitLab release with enhanced AI capabilities including improved Duo analytics dashboard, AI-powered model chat selection, Duo Data Analyst beta, AI-powered SAST false positive detection, and new building blocks for the upcoming Duo Agent Platform GA. Focuses on advancing AI automation, governance, and developer experience.
Google Agent Development Kit for TypeScript
An open-source, modular framework that brings a code-first approach to building AI agents with TypeScript and JavaScript. Enables developers to build autonomous multi-agent AI systems using familiar software engineering practices, optimized for Gemini and the Google ecosystem.
Wafer
GPU development stack that lives inside your IDE, bringing together profiling, compiler explorer, and GPU documentation in one place. Eliminates context switching when writing GPU kernels by integrating fragmented tools directly into the editor workflow.
GitHub Copilot Agent Skills
A new feature for GitHub Copilot that allows developers to create custom agent capabilities and skills. Agent Skills enables more customized and powerful agent-based workflows within the Copilot ecosystem, integrating with VS Code and supporting multi-agent orchestration for AI-assisted development tasks.
cto bench
A ground-truth benchmark that measures AI coding agent performance. It tracks merged code as a percentage of completed tasks and reports a 4-day rolling success rate, with a 1-day lag for task resolution.
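To make the metric concrete, here is a minimal Python sketch of one way a 4-day rolling success rate with a 1-day lag could be computed. The `DailyStats` layout, the function name, the sample counts, and the exact windowing are assumptions for illustration, not cto bench's published method.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DailyStats:
    """Hypothetical per-day counts for one agent."""
    day: date
    completed: int  # tasks the agent finished that day
    merged: int     # of those, tasks whose code was merged


def rolling_success_rate(stats, as_of, window_days=4, lag_days=1):
    """Merged / completed over a window that ends `lag_days` before `as_of`.

    Returns None when no tasks were completed in the window.
    """
    end = as_of - timedelta(days=lag_days)         # most recent day counted
    start = end - timedelta(days=window_days - 1)  # oldest day counted
    in_window = [s for s in stats if start <= s.day <= end]
    completed = sum(s.completed for s in in_window)
    merged = sum(s.merged for s in in_window)
    return merged / completed if completed else None


# Made-up sample data; the last day is excluded by the 1-day lag.
history = [
    DailyStats(date(2025, 12, 26), completed=10, merged=7),
    DailyStats(date(2025, 12, 27), completed=8, merged=6),
    DailyStats(date(2025, 12, 28), completed=12, merged=9),
    DailyStats(date(2025, 12, 29), completed=10, merged=8),
    DailyStats(date(2025, 12, 30), completed=5, merged=1),
]
rate = rolling_success_rate(history, as_of=date(2025, 12, 30))
print(rate)
```

The lag keeps the newest day out of the window, on the assumption that tasks finished that day may not have had time to be merged yet.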
Netlify AI Gateway
A developer tool that simplifies using AI inference in code by eliminating the need to manage API keys or create accounts with AI providers. Provides access to 30+ AI models with unified monitoring of usage and costs.
New Features
Cursor Visual Editor Launch
Cursor launched a Visual Editor feature that lets developers modify UI directly, without leaving the editor, eliminating the need for Figma-to-code workflows. The feature enables drag-and-drop interface editing and seamless AI integration.
Codex CLI 0.76.0 Released
OpenAI released Codex CLI version 0.76.0 with DMG support for macOS, skills improvements including 'always-on' and enterprise-friendly features, short descriptions for better discoverability, and admin scope configuration.
GPT-5.2-Codex Launch
OpenAI announced GPT-5.2, described as 'smarter and more useful,' alongside the specialized GPT-5.2-Codex model for agentic coding tasks.
Gemini 3 Flash Preview and CLI Improvements
Google launched Gemini 3 Flash Preview delivering 'fast frontier-class' performance. The CLI received documentation updates and Code Assist telemetry improvements for tracking user accept/reject behavior on suggestions.
Claude Code December 2025 Performance and Features
Claude Code released version 2.0.73 with CLI changes and new features including LSP (Language Server Protocol) tool support for code intelligence features like go-to-definition, find references, and hover documentation. Added multi-terminal setup support, syntax highlighting toggle, and current usage tracking.
Windsurf GPT-5.2 Model Availability
Windsurf added GPT-5.2 model support to its AI coding assistant, expanding the available model options for users.
Cursor CEO warns about 'vibe coding' building shaky foundations
Cursor CEO Michael Truell (age 25) warned that over-reliance on AI-generated 'vibe coding' creates unstable foundations for software projects, noting that while it works for simple tasks, 'eventually things start to crumble' as complexity increases.
Claude Code Christmas usage limit doubling
Holiday promotion offering 2x usage limits for Claude Code users from December 25-31, 2025.
Claude Code Browser Control Feature with Chrome Integration
Claude Code launched a major new capability allowing full browser automation through Chrome integration. The AI can now navigate pages, click buttons, fill forms, read console logs, monitor network requests, and record GIFs of browser interactions - all while keeping the browser visible (not headless).
Cursor 2.2 with Debug Mode and Visual Editor
Major release introducing Debug Mode with runtime thinking capabilities that let AI see actual program state during execution, plus a Visual Editor for natural language-based web design with point-and-click editing. Also includes Multi-Agent Judging and Plan Mode improvements.
GitHub Copilot Agent Skills Support
GitHub Copilot now supports Agent Skills, a new feature allowing developers to create specialized, repeatable tasks through folders containing instructions, scripts, and resources. Skills load dynamically for better performance and reduced context consumption.
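The folder-based, load-on-demand idea can be sketched in a few lines of Python. This is an illustration only: the one-`SKILL.md`-per-folder layout, the first-line summary convention, and the `index_skills` and `load_skill` helpers are assumptions made for the example, not Copilot's actual loader or the exact file format.

```python
import tempfile
from pathlib import Path


def index_skills(root: Path) -> dict[str, str]:
    """Read only the first line of each skill's SKILL.md (a cheap summary)."""
    index = {}
    for skill_file in sorted(root.glob("*/SKILL.md")):
        with skill_file.open(encoding="utf-8") as fh:
            index[skill_file.parent.name] = fh.readline().strip()
    return index


def load_skill(root: Path, name: str) -> str:
    """Load the full instructions only when the skill is actually needed."""
    return (root / name / "SKILL.md").read_text(encoding="utf-8")


# Build a throwaway skills directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    samples = [
        ("release-notes", "Draft release notes from merged PRs."),
        ("db-migration", "Write and verify a schema migration."),
    ]
    for name, summary in samples:
        (root / name).mkdir()
        (root / name / "SKILL.md").write_text(
            f"{summary}\n\nStep-by-step instructions would go here.\n",
            encoding="utf-8",
        )

    index = index_skills(root)                    # small: one line per skill
    full_text = load_skill(root, "db-migration")  # larger: loaded on demand
```

Keeping only the one-line summaries in context, and reading the full file when a task matches, is one plausible way to get the reduced context consumption the entry describes.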
GitHub Copilot Free Enterprise for CNCF Maintainers
GitHub announced free GitHub Copilot Enterprise access for CNCF (Cloud Native Computing Foundation) project maintainers as part of investment in open source ecosystem support.
Codex CLI GPT-5.2-Codex Introduction
OpenAI introduced GPT-5.2-Codex, a version of GPT-5.2 further optimized for agentic coding in Codex CLI with improvements for long-horizon work, enhanced long-context understanding, improved tool calling reliability, and better factuality.
Gemini CLI Version v0.22.0 with Gemini 3 Free Tier
Gemini CLI released v0.22.0 with major features including Gemini 3 Flash availability for free tier users, experimental permission improvements with new policy engine, model in history toggle, hook improvements, simplified tool confirmation, and session auto-save.
Windsurf Wave 13 with Multi-Agent Sessions and GPT-5.2 Integration
Windsurf released Wave 13 featuring first-class support for parallel, multi-agent sessions, GPT-5.2 integration (announced December 11), and various bug fixes. Updates throughout December included performance improvements, PowerShell fixes, and general stability enhancements.
Cursor 2.3: Layout Customization and Stability Improvements
Cursor released version 2.3 on December 22, 2025, featuring layout customization options for workspaces with four default layouts. The update focuses on addressing user concerns about UI stability following the major 2.2 release, with bug fixes for disappearing chat history and improved auto-run behavior.
Windsurf GPT-5.2 Now Available
Windsurf announced GPT-5.2 is now available, described as 'the biggest' model update. The model is available for 0x credits (free for paid users) for a limited time, bringing enhanced capabilities to the AI-native IDE.
GitHub Copilot Memory Feature Public Preview
GitHub Copilot Memory feature is now in public preview for Pro and Pro+ subscribers, allowing Copilot to learn from your codebase and remember context across sessions for improved project understanding over time.
Cursor 2.2: Debug Mode and Visual Editor
Cursor released version 2.2 with Debug Mode that automatically instruments apps with runtime logs to identify and fix bugs, a Visual Editor for the Cursor Browser that unifies design and code work, multi-agent judging for code quality evaluation, and improved Plan Mode.
Claude Code Async Background Agents
Claude Code now supports asynchronous background agents that can run tasks in parallel while you continue working. Agents can be sent to background with Ctrl+B and automatically hook back in when complete, enabling parallel AI development workflows.
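The general pattern (start a task, keep working, pick up the result when it finishes) can be sketched with Python's `asyncio`. The `background_agent` function and the log messages are hypothetical stand-ins; this is not how Claude Code is implemented.

```python
import asyncio


async def background_agent(task: str) -> str:
    """Stand-in for a long-running agent task (hypothetical)."""
    await asyncio.sleep(0.05)
    return f"done: {task}"


async def main() -> list[str]:
    log = []
    # Start the agent without waiting for it, like sending it to the background.
    agent = asyncio.create_task(background_agent("run test suite"))
    # Foreground work continues while the agent runs.
    log.append("foreground: editing README")
    await asyncio.sleep(0)
    log.append("foreground: reviewing diff")
    # The finished agent's result is collected when it is awaited.
    log.append(await agent)
    return log


events = asyncio.run(main())
print(events)
```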
Claude Code LSP Support and VS Code Integration
Claude Code released native LSP (Language Server Protocol) support for better code intelligence and language understanding, along with improved VS Code extension integration including multi-terminal setup support and syntax highlighting toggle.
Continue CLI Beta v1.5.29
Continue released multiple CLI beta versions in late December 2025, including v1.5.29-beta.20251229, bringing ongoing improvements and bug fixes to the AI coding assistant.
Aider v0.86.0 - Expanded GPT-5 Support
Aider released v0.86.0 with expanded GPT-5 model support across family variants and providers (OpenAI, Azure, OpenRouter), including dated and chat/mini/nano variants. Aider wrote 88% of the code in this release.
Devin December 2025 Agent Upgrade
Devin released a major update on December 19, 2025, upgrading all enterprise customers to the newest version of Devin with enhanced capabilities. The v3 API also received updates with enterprise and organization-level Notes and Playbooks routers.
Cursor Layout Customization and Stability Improvements - Holiday Release
A holiday release focused entirely on fixing bugs and improving stability across the core agent, layout controls, and code diff viewing. It introduces layout customization with four default layouts, accessible via Cmd+Option+Tab. The release rolls out slowly over the week to prevent regressions during holiday coding.
Cursor Enterprise Features: Conversation Insights, Billing Groups, Service Accounts
Added conversation insights to analyze the code and context in each agent session. Introduced billing groups for fine-grained usage tracking and spend mapping, service accounts (non-human accounts with API keys) for secure automation, and Linux sandboxing for agents with scoped workspace access.
Claude Code: Anthropic acquires Bun as Claude Code reaches $1B milestone
Anthropic announced its first acquisition - Bun, a high-performance JavaScript runtime - as Claude Code reaches $1 billion in run-rate revenue in just 6 months since launch.
Codex CLI: Command injection vulnerability fix
A security vulnerability (CVE) documented in the December changelog: a command injection issue disclosed by Check Point Research. The fix shipped in Codex CLI v0.23.0 (August 2025), which prevents .env files from silently redirecting CODEX_HOME.
Windsurf Version 1.13.104 released
December 24th release of Windsurf editor version 1.13.104 with performance improvements and bug fixes.
Windsurf Wave 13: Multi-agent sessions support
Wave 13 release (Merry Shipmas) introduces first-class support for parallel, multi-agent sessions in Windsurf Next, enabling multiple AI agents to work simultaneously.
Windsurf JetBrains plugin v2.10.7 with Cascade UI improvements
December 17-18 update for Windsurf JetBrains plugin featuring revamped Cascade bar UI, added keyboard shortcuts, improved file search performance, and fixes for diagnostics downloading during file renames.
Claude Code CLI 2.0.70 released with 13 improvements
A major CLI update: pressing Enter now accepts and immediately submits a prompt suggestion, Tab accepts it for editing, and 11 other improvements round out the release.
Gemini CLI v0.22.0: Gemini 3 for free-tier users
Version 0.22.0 released making Gemini 3 available to free-tier users when preview features are enabled. Includes Code Assist backend telemetry for tracking user accept/reject of suggestions.
Claude Code Browser automation feature
New browser automation capabilities allowing Claude Code to navigate pages, click buttons, fill forms, read console logs, monitor network requests, and record GIFs of browser interactions.
Claude Code Slack integration and Skills enhancements
New Slack integration capability added to Claude Code, along with enhanced Skills feature for teaching Claude repeatable workflows and performance improvements including removal of Opus-specific usage caps.
Windsurf GPT-5.2 integration with 0x credits promotion
GPT-5.2 became available in Windsurf on December 11, 2025. Windsurf offered 0x credits (free access) for paid users for a limited time to test the new model.
Cursor CEO strategy: Product over models to compete with OpenAI/Anthropic
CEO Michael Truell explains Cursor's strategy focusing on product experience and UX over foundation models, believing Cursor can compete with AI labs through superior agent capabilities and developer experience.
GitHub Copilot December updates: Spaces, Visual Studio, and model options
Updates to Copilot Spaces for better grounding in curated project context, along with Visual Studio improvements and expanded model options.
GitHub Copilot: Claude Opus 4.5 available in multiple IDEs
Claude Opus 4.5 expanded availability to multiple IDEs including Visual Studio, JetBrains, Xcode, and Eclipse, building on the initial preview announcement.
New Technologies
Holistic Evaluation of State-of-the-Art LLMs for Code Generation
Comprehensive empirical evaluation of six state-of-the-art LLMs (DeepSeek-R1, GPT-4.1, Claude-3.7, DeepSeek-V3, Qwen2.5-Coder, Llama-3.3) for code generation using 944 LeetCode problems across five languages. Key findings: DeepSeek-R1 and GPT-4.1 consistently outperform the others in correctness, efficiency, and robustness; Python3 and JavaScript have fewer compile/runtime errors; and algorithmic suboptimality is common, especially in Llama-3.3.
Semi-Supervised Learning for Large Language Models
ArXiv paper submission exploring semi-supervised learning approaches for training large language models. Addresses challenges in LLM training efficiency and effectiveness through hybrid learning methodologies.
Formal Verification of Neural Networks with Early Exits
ArXiv paper addressing the intersection of efficiency and safety in neural networks through formal verification techniques for models with early exit capabilities. Critical for AI safety in production systems.
RevFFN: Memory-Efficient Full-Parameter Fine-Tuning
ArXiv paper presenting RevFFN, a new approach for memory-efficient full-parameter fine-tuning of large language models. Addresses key challenge of memory constraints in LLM adaptation.
2025 LLM Year in Review: Reinforcement Learning from Verifiable Rewards
Andrej Karpathy's comprehensive review of 2025 LLM developments, highlighting RLVR (Reinforcement Learning from Verifiable Rewards) as the major new paradigm. Covers 6 paradigm shifts, including: the emergence of RLVR, Ghosts vs Animals/Jagged Intelligence, Cursor as a new LLM app layer, Claude Code as an AI that lives on your computer, and Vibe Coding as a new programming paradigm.
MicroQuickJS for Safe Sandboxing
Fabrice Bellard's MicroQuickJS is a JavaScript engine for embedded systems (10 kB RAM, 100 kB ROM). Simon Willison used Claude Code to investigate it as a safe sandboxing environment for untrusted code from LLMs. He found it well suited, with robust memory and time limits, no dangerous primitives, and a regex engine that protects against exhaustion attacks.
How AI coding agents work—and what to remember if you use them
Comprehensive technical deep-dive into how AI coding agents function under the hood. Explains the multi-agent architecture (a supervising LLM that delegates to parallel LLMs with tools), context management challenges ('context rot' and diminishing returns), context compression techniques, and the orchestrator-worker pattern. Highlights that agents typically use about 4x more tokens than chatbots, and multi-agent systems about 15x more than chatbots.
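A minimal Python sketch of the orchestrator-worker pattern and the token multipliers mentioned above. The `worker` function is a stub that calls no model, the subtask names are invented, and the 2,000-token chat baseline is an arbitrary assumption chosen only to make the arithmetic visible.

```python
from concurrent.futures import ThreadPoolExecutor

# Multipliers quoted in the summary, relative to a chat interaction.
AGENT_MULTIPLIER = 4
MULTI_AGENT_MULTIPLIER = 15


def worker(subtask: str) -> str:
    """Stand-in for a worker LLM with tools (hypothetical; no model is called)."""
    return f"result for {subtask!r}"


def orchestrate(task: str, subtasks: list[str]) -> str:
    """Supervisor delegates subtasks to parallel workers, then merges results."""
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        results = list(pool.map(worker, subtasks))  # order matches subtasks
    return f"{task}: " + "; ".join(results)


def estimated_tokens(chat_tokens: int, multiplier: int) -> int:
    """Rough token budget given a baseline chat-sized interaction."""
    return chat_tokens * multiplier


summary = orchestrate("fix flaky test", ["read logs", "inspect test", "propose patch"])
single_agent = estimated_tokens(2_000, AGENT_MULTIPLIER)
multi_agent = estimated_tokens(2_000, MULTI_AGENT_MULTIPLIER)
print(summary)
print(single_agent, multi_agent)
```

Under these assumptions a 2,000-token chat exchange corresponds to roughly 8,000 tokens for a single agent and 30,000 for a multi-agent run, which is why delegation is usually reserved for tasks that benefit from parallel exploration.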
AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts
Research paper analyzing security implications and ecosystem impacts of AI-generated code in real-world software projects. The study examines how AI coding tools are changing software development practices and identifies potential security vulnerabilities introduced by AI-generated code.
LLM Benchmark Wars of 2025: Claude Dominates Real-World Coding Tasks
Comprehensive analysis of LLM benchmarks in December 2025 reveals Claude dominates with Sonnet 4.5 and Opus 4.5 outperforming OpenAI's flagship models on real-world coding tasks. The report questions what benchmark numbers actually tell us about model capabilities, noting significant gaps between synthetic benchmarks and practical coding performance.
Gemini 3 Pro vs GPT-5.2 vs Claude Opus 4.5 vs Grok 4.1: December 2025 Showdown
Comprehensive model comparison as of December 13, 2025 evaluates top AI models across different use cases. Results show Claude Opus 4.5 Thinking 32k is best for building web apps. The comparison covers Gemini 3 Pro, GPT-5.2, Claude Opus 4.5, and Grok 4.1 across various dimensions including coding capability, reasoning depth, context window, and practical applications.
arXiv Paper: Exploring Vertical-Domain Reasoning Capabilities with LLMs
New arXiv paper extends general-purpose LLM research to vertical-domain accounting reasoning. The research analyzes relationships between LLM reasoning ability and domain-specific applications, exploring how well general LLMs can transfer reasoning capabilities to specialized domains like accounting.
arXiv Paper: Plan Reuse Mechanism for LLM-Driven Agents in AIoT Applications
New arXiv paper focuses on LLM-driven agents and AIoT applications, with specific mention of Xiaomi's Xiao Ai and OPPO's Xiaobu Assistant. The research introduces a plan reuse mechanism for improving efficiency of LLM-driven agents. Addresses practical deployment of AI agents in consumer electronics and IoT devices.
Others
What a Year for AI Coding
A developer with almost three decades of experience reflects on the dramatic evolution of AI coding in 2025. At the start of 2025, LLMs were 'very sketchy, creating as many problems as they solved.' By December 2025, the author is confident that 'the best coding LLMs are better than 95% of software developers out there.'
Your Job is to Deliver Code You Have Proven to Work
Simon Willison argues that AI-assisted developers must deliver proven code, not just generate it. The piece critiques junior engineers who submit giant untested PRs expecting code review to catch issues. Willison outlines two essential steps: manual testing (actually seeing the code work) and automated testing (now easier with LLMs).
Browser Agent Success Story
Simon Willison's first successful use of a browser agent (the Claude in Chrome extension) to solve a real problem: finding a CORS configuration in the Cloudflare dashboard. The agent took 1m45s to find the exact solution. Despite his concerns about prompt injection risks, he describes it as a very positive experience.
Cooking with Claude - LLMs for Culinary Tasks
Simon Willison describes using LLMs for cooking, evolving from basic recipes to advanced tasks. Successfully had Claude vibe-code a custom application to help with timing for complicated meal preparation. Demonstrates practical LLM use beyond coding.
2025 LLM Year in Review
Andrej Karpathy's comprehensive review of paradigm shifts in LLMs during 2025, covering: RLVR (Reinforcement Learning from Verifiable Rewards) as a new training stage; Understanding LLMs as 'ghosts' not 'animals'; Cursor and the new layer of LLM applications; Claude Code as the first convincing LLM agent that runs locally; 'Vibe coding' - AI enabling programming through natural language.
Agent Skills Open Standard
Anthropic has turned their skills mechanism into an 'open standard' living in an independent agentskills/agentskills GitHub repository. The specification is tiny but deliberately under-specified. Adoption promoted by OpenCode, Cursor, Amp, Letta, goose, GitHub, and VS Code.
GPT-5.2-Codex Release
OpenAI released GPT-5.2-Codex, optimized for agentic coding with improvements on long-horizon work through context compaction, stronger performance on large code changes, improved Windows performance, and enhanced cybersecurity capabilities. Scores 64% on Terminal-Bench 2.0.
The State of AI Coding Report 2025
Hacker News discussion on the state of AI coding in 2025. The maker of an AI code review agent used by 2,000 companies, from startups like PostHog, Brex, and Partiful to Fortune 500 and Fortune 10 companies, shared insights drawn from approximately a billion lines of code reviewed.
A Guide to Local Coding Models
Hacker News discussion comparing local coding models. Commenters note that OpenAI's Codex is priced significantly lower than Claude, and some suggest that $100-200/month is worth it for serious developers. The discussion covers the trade-offs between local and cloud-based coding models.
LLM Year in Review - Hacker News Discussion
Hacker News discussion on Andrej Karpathy's 2025 LLM Year in Review. Commenters highlight that Claude Code has catapulted their performance at least 5x, and with minimal cost for writing tests, they're able to achieve higher code quality.
Scaling LLMs to Larger Codebases
Comprehensive Hacker News discussion about LLM workflows for handling larger codebases as of December 2025. Key insights: an effective framework is Research → Plan → Clear → Execute Plan → Review & Test; Sonnet/Opus and GPTCodex can now fire off subagents during exploration; Claude Code emerges as the first convincing LLM agent that runs locally.
AWS CEO: Replacing Junior Devs with AI is 'One of the Dumbest Ideas'
Matt Garman, AWS CEO, criticizes replacing junior developers with AI in a December 2025 interview. Key points: juniors are often the most experienced with AI tools; they're the least expensive, so cost optimization arguments don't hold; and eliminating the talent pipeline breaks the future of the organization.
The Most Important AI Stories This Week
AI Explained YouTube video covering major AI developments from December 22, 2025. Topics include Google Flash, Amazon's AI moves, OpenAI fundraising developments, and Bernie Sanders' proposed moratorium on data center construction.
The Download: AI Doomers and Tech Developments
MIT Technology Review's daily newsletter covering technology developments including AI news and analysis from December 19, 2025. Part of ongoing coverage of AI's impact on technology and society.
As 2025 ends, a failed AI prediction: 'LLM hallucinations...'
Reddit discussion reflecting on failed AI predictions at the end of 2025. Community members describe being 'completely resigned to the fact that hallucinations are inherent to LLMs,' contrary to earlier predictions that the problem would be solved by 2025.
A guide to local coding models
Hacker News discussion about local coding models for development. The community explores options for running LLMs locally for coding tasks, with one commenter stating 'If you aren't using coding models you aren't ahead of the curve.'
December 2025 Guide To Popular AI Coding Agents
Comprehensive Reddit guide to AI coding agents as of December 2025. The post notes 'there are now so many options it's hard to keep track of them all' and provides an overview of various AI coding agents and their capabilities.
uv-init-demos: Python project initialization options
Simon Willison created a GitHub repository demonstrating all the different options of uv's init command, using Claude to help generate the content. The project uses a GitHub Actions script to stay up to date with uv releases.
Cooking with Claude
Simon Willison describes using LLMs for cooking, progressing from basic recipes to advanced tasks. Most notably, he used Claude to 'vibe-code' a custom application for complex meal preparation timing.
MicroQuickJS for sandboxing LLM-generated code
Willison explores Fabrice Bellard's MicroQuickJS (JavaScript engine for embedded systems) as a sandboxing environment for running untrusted code from LLMs. He used Claude Code to investigate building Python bindings, testing it as a sandbox, and compiling to WebAssembly.
First success using browser agent (Claude in Chrome) to solve a Cloudflare CORS problem
Simon Willison reports using the Claude in Chrome extension to solve a real problem: figuring out how Cloudflare Transform Rules were configured for an open CORS policy. The agent found the solution in 1m45s by navigating the Cloudflare dashboard.
Hacker News: By what percentage has AI changed your output?
Ask HN discussion requesting developers to quantify productivity changes compared to ~2 years ago before AI coding tools became prevalent. Community discussion on real-world impact of AI coding assistants.
Top Stories of 2025: Agents Write Code Faster, Cheaper
DeepLearning.ai's year-end review highlights that coding agents using the latest LLMs now routinely complete more than 80% of SWE-Bench tasks, up from 13.86% in 2024. Coding agents such as Claude Code, Google Gemini CLI, and OpenAI Codex became a competitive battleground.
Top Stories of 2025: Thinking Models Solve Bigger Problems
DeepLearning.ai reports that reasoning models became standard in 2025, with OpenAI's o1-preview outperforming GPT-4o by 43 percentage points on AIME 2024 math problems. Reasoning models reached the 62nd percentile on Codeforces, versus the 11th percentile for non-reasoning models.
uv-init-demos: Exploring uv init options
Simon Willison created a GitHub repository demonstrating all the different options of uv's init command, generated using a Claude Code prompt and scheduled to run via GitHub Actions to capture changes from future releases.
MicroQuickJS as a robust sandboxing environment for LLM-generated code
Simon Willison explores using MicroQuickJS as a sandboxing solution for executing untrusted code from LLMs. The engine has robust memory and time limits baked in, doesn't expose dangerous primitives, and has a regex engine that protects against exhaustion attacks.
Cooking with Claude: vibe-coding custom applications
Simon Willison describes using LLMs for cooking tasks, progressing from basic recipes to advanced applications. He had Claude vibe-code a custom application to help with timing for complicated meal preparation.
First success using Claude in Chrome browser agent
Simon Willison successfully used the Claude in Chrome extension to solve a real problem: finding a Cloudflare configuration for CORS headers. Despite being skeptical of browser agents due to prompt injection risks, he found the 1m45s session very effective.
A new way to extract detailed transcripts from Claude Code
Simon Willison released `claude-code-transcripts`, a Python CLI tool that converts Claude Code conversation transcripts into shareable HTML pages. The tool provides better interfaces for understanding what Claude Code did than Claude Code's own interface.
uv-init-demos: Using GitHub Actions and Claude Code for automated project demos
Simon Willison created a GitHub repository demonstrating different options of the `uv init` command for setting up Python projects. The repository was generated using Claude and runs on a schedule via GitHub Actions to capture changes from future releases of uv.
Top 18 AI Coding Assistant Tools in 2025
Comprehensive overview and comparison of AI coding assistants available to developers in 2025. Covers major tools including Claude Code, Cursor, GitHub Copilot, and others, comparing their features, capabilities, and use cases.
AI agents arrived in 2025 – here's what happened and challenges ahead
Comprehensive analysis of how AI agents moved from theory to infrastructure in 2025. Discusses the major developments in AI agent technology, real-world applications, and the challenges that lie ahead for 2026.
Best Free AI Coding Tools 2025: ChatGPT vs Gemini
Detailed comparison of free AI coding tools available in 2025, specifically comparing ChatGPT and Google Gemini for code generation tasks. Analyzes their strengths, weaknesses, and best use cases for developers looking for free AI coding assistance.
Simon Willison Releases claude-code-transcripts: Extract Detailed AI Coding Sessions
Simon Willison released claude-code-transcripts, a new Python CLI tool for converting Claude Code transcripts to detailed HTML pages. The tool provides a better interface for understanding what Claude Code has done than even Claude Code itself.
Top AI Coding Agents December 2025: Opus 4.5, Gemini 3.0 Pro, GPT 5.1 Reviewed
YouTube video from December 2025 reviews the top AI coding agents, covering Opus 4.5, Gemini 3.0 Pro, and GPT 5.1. The review discusses Claude Code's issues and improvements, providing practical comparisons of leading AI coding assistants.
Hacker News: AI is Forcing Us to Write Good Code
Hacker News discussion sparked by the observation that AI coding assistants are fundamentally changing how developers approach code quality. The conversation explores how LLMs expose bad code practices more quickly and force developers to be more explicit about requirements and design.
Hacker News: Rich Hickey on 'Thanks AI' - Reflections on LLMs in Programming
Hacker News discussion about the perspective of Rich Hickey, creator of Clojure, on AI in programming. The conversation touches on deeper philosophical questions about AI's role in software development.
Hacker News: The 70% AI Productivity Myth - Why Most Companies Aren't Seeing Gains
Hacker News discussion examines 'hard truths about AI-assisted coding' and questions commonly cited productivity metrics. The conversation explores why many companies aren't achieving promised 70% productivity gains from AI coding tools.
Hacker News: 2025 Was a Disaster for Windows 11 - AI Mandates Backfire
Hacker News discussion from December 31, 2025 describes 2025 as a disaster for Windows 11. Comments mention Microsoft mandating AI usage and having 90% of code generated by AI. The discussion highlights backlash against forced AI adoption and quality concerns.
Reddit Discussion: Karpathy's AI Coding Thread 'Hits Different' - Developer Disorientation
Reddit r/programming discussion about Andrej Karpathy's thread on AI coding tools causing developers to feel 'disoriented.' The conversation captures the profound psychological and professional impact of AI coding tools on experienced programmers.
Reddit: Are You Afraid of AI Making You Unemployable? - Rob Pike's GenAI Critique
Reddit r/artificial discussion explores fears about AI-driven unemployment, featuring Rob Pike's strong critique of generative AI. The conversation captures growing anxiety about AI's impact on programming careers.