Weekly Report
Apr 7, 2026 – Apr 13, 2026
A curated summary of the most important updates in AI from the last 7 days.
New Products
Version 0.120.0: Realtime V2 Background Agent Streaming
Codex CLI v0.120.0 (April 11, 2026) introduces Realtime V2 background agent streaming, custom TUI status lines, and code-mode tool declarations with MCP outputSchema support.
Copilot SDK Public Preview
GitHub Copilot SDK entered public preview, allowing developers to build custom integrations and extend Copilot capabilities.
scan-for-secrets: Tool to scan files before sharing
Simon Willison released scan-for-secrets 0.1, a Python tool to scan files for secrets before publishing. The tool scans for literal secrets and common encodings (backslash, JSON escaping). Built using README-driven development with Claude Code, the tool provides a CLI and Python API functions. Subsequent releases added redaction functionality, streaming results, directory scanning, and file-specific scanning capabilities.
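As a rough illustration of what scanning for a literal secret plus its escaped forms could look like, here is a minimal Python sketch. It is not the actual code or API of scan-for-secrets: the function names `secret_variants` and `find_secret` are hypothetical, and only a JSON-escaped form and a simple backslash-escaped form are covered.

```python
import json


def secret_variants(secret: str) -> set[str]:
    """Return the literal secret plus a few escaped forms (illustrative only)."""
    variants = {secret}
    # JSON-escaped form, without the surrounding double quotes.
    variants.add(json.dumps(secret)[1:-1])
    # Backslash-escaped form: backslashes and double quotes get a leading backslash.
    variants.add(secret.replace("\\", "\\\\").replace('"', '\\"'))
    return variants


def find_secret(text: str, secret: str) -> list[str]:
    """Return the variants of `secret` that occur in `text`, sorted for stable output."""
    return sorted(v for v in secret_variants(secret) if v in text)
```

Checking the escaped variants as well as the literal matters because a secret pasted into a JSON config or a quoted shell string no longer matches a plain substring search.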
Syntaqlite Playground: SQLite validation in browser
Simon Willison created a web-based playground for syntaqlite, Lalit Maganti's SQLite linting and verification tool. The playground provides a UI for formatting, parsing into an AST, validating, and tokenizing SQLite queries directly in the browser using WebAssembly/Pyodide. The tool includes example buttons showing table typos, column typos, and valid queries with diagnostic feedback.
Cleanup Claude Code Paste tool
Simon Willison created a niche web tool to clean up prompts copied from the Claude Code terminal app. The tool removes the '❯' prompt, fixes wrapped-line whitespace, and joins lines into clean text with a copy-to-clipboard button. It solves the problem of stray extra whitespace appearing when text is copied from the Claude Code terminal.
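The three cleanup steps described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the tool's own code (the tool runs in the browser): the function name `clean_paste` is hypothetical, and it assumes the whole paste is a single prompt that should be joined into one line.

```python
def clean_paste(raw: str) -> str:
    """Strip the prompt marker, trim each wrapped line, and join into one line."""
    pieces = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Drop the prompt marker if the line starts with it.
        if stripped.startswith("❯"):
            stripped = stripped[1:].strip()
        # Skip lines that are empty once trimmed.
        if stripped:
            pieces.append(stripped)
    return " ".join(pieces)
```

For example, `clean_paste("❯ fix the bug in\n  the parser and\n  add tests  ")` gives `"fix the bug in the parser and add tests"`.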
datasette-ports: Find all running Datasette instances
Simon Willison released datasette-ports, a tool to solve the problem of losing track of multiple Datasette instances running across different terminals. Running 'datasette ports' lists all running instances with their URLs, versions, databases, and plugins. The tool uses README-driven development and can be installed either as a Datasette plugin or run standalone with uvx.
GitHub platform activity surging: 275 million commits per week
Kyle Daigle, GitHub COO, reports massive platform growth. There were 1 billion commits in 2025. Now it's 275 million commits per week, on pace for 14 billion in 2026 if growth remains linear. GitHub Actions grew from 500M minutes/week in 2023 to 1B minutes/week in 2025, and now 2.1B minutes in a single week.
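The 14 billion projection can be checked with simple arithmetic. The sketch below assumes the figure comes from holding the current weekly rate constant for 52 weeks; the source does not spell out the method.

```python
WEEKS_PER_YEAR = 52
commits_per_week = 275_000_000
commits_2025 = 1_000_000_000  # 2025 total quoted above

# Flat extrapolation: the current weekly rate held for a full year.
projected_2026 = commits_per_week * WEEKS_PER_YEAR
growth_vs_2025 = projected_2026 / commits_2025

print(f"{projected_2026 / 1e9:.1f} billion commits")  # 14.3 billion commits
print(f"{growth_vs_2025:.1f}x the 2025 total")        # 14.3x the 2025 total
```

Under that assumption the projection is about 14.3 billion commits, consistent with the rounded figure of 14 billion.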
Vulnerability research transformed by AI coding agents
Thomas Ptacek analyzes how coding agents are fundamentally changing vulnerability research and exploit development. He predicts that within months, agents will find zero-days by pointing at source trees. LLMs are uniquely suited for this because they encode vast knowledge of source code correlations, know all documented bug classes, and excel at pattern-matching and constraint-solving. Exploit research provides the perfect problem for LLMs: baked-in knowledge, pattern matching, and brute force with testable success/failure outcomes.
AI-generated security reports: From slop tsunami to real vulnerability research
Linux kernel maintainer Greg Kroah-Hartman and other security experts report a dramatic shift in AI-generated security reports. What started as 'AI slop' has transformed into genuinely useful vulnerability research. The Linux kernel security list went from 2-3 reports per week to 5-10 per day, with most being correct. Daniel Stenberg (curl) notes spending hours daily reviewing AI-generated reports. Willy Tarreau (HAProxy) observes they now see duplicate reports where different people using AI tools find the same bug.
Highlights from agentic engineering conversation on Lenny's Podcast
Simon Willison was a guest on Lenny Rachitsky's podcast episode titled 'An AI state of the union: We've passed the inflection point, dark factories are coming, and automation timelines.' The conversation covered agentic engineering, coding agents, and the state of AI in 2026.
Gas Town v1.0.0
Multi-agent workspace manager for coordinating multiple AI coding agents (Claude Code, Copilot, Codex, Gemini). Enables orchestrating 20-30+ AI agents in parallel with persistent work tracking and multi-agent coordination.
SiteGround Coderick AI
Web-based AI application and website builder described as a 'vibe coding' tool. Allows users to build custom websites and web applications through AI prompts without writing code, includes hosting and SSL.
Copilot SDK Public Preview
GitHub launched Copilot SDK in public preview, enabling developers to embed Copilot agents and workflows into custom applications. The SDK provides programmatic access to Copilot's agentic capabilities for building custom integrations.
Hacker News: My LLM coding workflow going into 2026
Discussion about practical LLM coding workflows, debate about whether AI coding tools actually provide speedups, and conversations about the division of work between developers and AI.
So I tried using Claude Code to build actual software
A data engineer shares their experience with Claude Code as a 'game changer' for building pipelines, dashboards, and analytics scripts. Discusses practical usage patterns and productivity improvements in real-world development scenarios.
Agentic AI: A Simple Definition
Clear explanation of agentic AI as 'an LLM call put in a loop with a bunch of tools which enable it to do stuff in its environment.' Provides straightforward definition and context for understanding agentic AI concepts.
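That definition maps onto very little code. The sketch below is a toy version under stated assumptions: `fake_llm` is a hypothetical stand-in for a real model call, the `CALL`/`DONE` text protocol is invented for this example, and the single `add` tool exists only to show the loop.

```python
from typing import Callable


def fake_llm(history: list[str]) -> str:
    """Hypothetical stand-in for a model: request one tool call, then finish."""
    if not any(h.startswith("tool:") for h in history):
        return "CALL add 2 3"
    return "DONE " + history[-1].removeprefix("tool:")


# Tools the agent may call, keyed by name.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(int(a) + int(b)),
}


def run_agent(task: str, llm=fake_llm, max_steps: int = 5) -> str:
    """Call the model in a loop, running requested tools until it says DONE."""
    history = [f"user:{task}"]
    for _ in range(max_steps):
        reply = llm(history)
        if reply.startswith("DONE"):
            return reply[len("DONE"):].strip()
        # Parse "CALL <tool> <args...>" and feed the tool result back to the model.
        _, name, *args = reply.split()
        history.append("tool:" + TOOLS[name](*args))
    return "gave up"
```

The `max_steps` cap is a design choice for the sketch: without it, a model that never emits `DONE` would loop forever.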
I Tried Agentic Coding and I Hate It
Critical perspective on agentic coding workflows, offering a skeptical view of current agentic AI approaches for development. Provides counterpoint to enthusiasm around agentic coding tools.
Best Agentic AI Coding Tools in 2026: Compared
Comparative analysis of top agentic AI coding tools including Cursor, Windsurf, Copilot, Claude Code, and others. Discusses how these tools handle autonomous coding and provides guidance on choosing the right tool for different use cases.
Karpathy Says Developers Have 'AI Psychosis'
Andrej Karpathy discusses developers experiencing 'AI Psychosis': concerns about losing coding ability due to AI programming tools. Notable quote: 'I started to lose my ability to code.'
The 'Slopacolypse' Prediction for 2026
Karpathy predicts 2026 will be the 'Slopacolypse'. He expresses concern that AI is writing most of his code and says his ability to write code manually is atrophying.
Hacker News: What are your predictions for 2026?
HN discussion includes predictions about AI bubbles potentially popping, debate about whether LLM-coding will be worth it after accounting for downsides, and mentions that LLMs for generating photos and videos are still evolving.
Bluesky users are mastering the fine art of blaming everything on 'vibe coding'
Use of AI coding tools has become a convenient boogeyman for any tech issues, with users on Bluesky attributing various problems to 'vibe coding'.
Mustafa Suleyman: AI development won't hit a wall anytime soon...
Microsoft AI CEO discusses the continued growth of AI development and the compute explosion in a conversation about the future of artificial intelligence.
AI Is Rewiring Coders' Brains. Yours May Be Next
The CEO of GitHub says half of all code produced by users of the Copilot programming helper is now AI-generated, examining the impact on developers' cognitive processes and workflows.
Hacker News: Ask HN - What developer tool do you wish existed in 2026?
Wishlist for AI developer tools, mentions LLM tools for CI pipelines that could propose blocking tests, and ideas for improving test selection and automation.
Simon Willison: Eight Years of Wanting, Three Months of Building with AI
Lalit Maganti's deep dive into building syntaqlite (SQLite devtools) using AI after procrastinating for 8 years. Key insight: AI made them procrastinate on design decisions because refactoring felt cheap, but deferring decisions corroded clear thinking. AI's weakness is in design and architecture.
Deterministic Code Generation for LLM-Based Workflow Automation
Presents a compiled AI paradigm where LLMs generate executable code artifacts during compilation, focusing on deterministic workflow automation.
ZooClaw
A proactive team of AI specialists in one place. Acts as a single entry point to multiple AI agents with structured domain expertise, automatically routing tasks to the right agent with natural language input.
Viktor
An AI coworker that lives in Slack and automates workflows by observing team behavior. Proactively suggests automations and has context from tools and conversations, running autonomously without manual setup.
dbg
A universal CLI debugger interface that gives AI agents direct visibility into runtime state across multiple debugging protocols. Enables agents to inspect variables, set breakpoints, and analyze execution flow instead of guessing from source code.
SenWeaverCoding
A Rust-first autonomous AI agent runtime and CLI code editor built on SenAgentOS. Applies Harness Engineering to code engineering with autonomous exploration, refactoring, testing, and debugging capabilities.
Claude Code Voice Mode
Voice mode for Claude Code that allows developers to speak their prompts instead of typing them. Reached #1 Product of the Day on April 2, 2026.
CoreCoder
Minimal AI coding agent (~950 LoC Python) inspired by Claude Code. Works with any LLM and provides a clean, readable implementation of the core coding agent architecture. Think NanoGPT for coding agents.
nanocode
A lightweight AI coding assistant built in Python (~1.9k lines). Provides a minimal but functional implementation of an AI coding assistant with tool use and agentic workflows.
Baton
A desktop app for developing with AI coding agents. Run multiple agents in parallel, each in their own git-isolated workspace. Provides PR-ready code review capabilities with parallel agent execution.
ToFu
Self-hosted AI assistant with tool use, multi-agent orchestration, coding copilot and a lightweight Flask + vanilla JS stack. Provides a complete self-hosted solution for teams wanting to control their AI infrastructure.
Bugbot Learned Rules and MCP Support
Bugbot can now learn from feedback on pull requests and turn those signals into learned rules. The release also adds MCP support for additional context during code reviews and introduces the new Cursor 3 interface with an Agents Window.
Cursor 3.0 Launch - New Interface
Major interface overhaul with multi-repo layout, seamless agent handoff, agent switching options, faster and cleaner performance. Includes Design Mode for browser UI element annotation.
Hacker News: LLM coding workflow going into 2026
A developer's experience working on multi-step AI pipelines for 3D mesh generation, noting that LLMs often skip edge cases. The discussion explores the practical challenges of using LLMs for complex coding tasks and the importance of human oversight in catching edge cases that AI might miss.
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
New approach to improving general reasoning capabilities in LLMs using reinforcement learning on natural instructions. The 23-page paper with 4 figures presents a method for enhancing LLM reasoning without extensive supervised training. This could improve AI coding assistants' ability to reason about complex programming problems.
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
New benchmark for evaluating mobile AI agents with focus on interactivity, proactivity, and personalization. The benchmark addresses the unique challenges of agent behavior on mobile devices. This is relevant for evaluating AI coding assistants that run on mobile platforms or need to work across different devices.
Fireship: The unhinged world of tech in 2026
Fireship's analysis of technology trends in 2026, covering AI, coding, and developer tools. The video with 1.4M views discusses the rapid evolution of the tech landscape and emerging patterns in software development.
Hacker News: How would you learn to code in 2026?
Discussion about project-based learning approaches for coding in the current era. The community explores how learning to code has changed with AI assistance, shifting from memorizing syntax to verifying AI-generated work and understanding system architecture.
Google Gemma 4
Google's fourth-generation open-source LLM series featuring Mixture of Experts (MoE) architecture. Available in multiple sizes (2B, 4B, 26B, 31B) under Apache 2.0 license, optimized specifically for agent workflows and advanced reasoning tasks.
Claude Code Voice Mode
Voice interaction mode for Claude Code CLI, enabling developers to code using voice commands and receive spoken responses directly in the terminal environment. Represents the growing trend of voice interfaces in developer tools.
Jupid
File your taxes with Claude Code. A specialized AI agent that leverages Claude Code's capabilities to automate tax preparation workflows, demonstrating the expanding use cases for AI coding assistants.
Offsite
Build teams of humans and agents, watch them work. A workflow orchestration platform for creating hybrid teams of humans and AI agents that can collaborate on complex tasks.
Buddi
Your Claude Code companion, living in the notch. A macOS utility that provides quick access to Claude Code functionality directly from the menu bar, integrating with the Claude Code CLI workflow.
Codentis
Run intelligent workflows directly in your terminal. An AI-powered CLI tool that helps developers automate complex terminal workflows and command sequences using natural language commands.
traceAI
Open-source LLM tracing tool that speaks GenAI, not HTTP. Provides specialized tracing for generative AI workloads rather than traditional HTTP/web tracing methods. Reached #3 on Product Hunt daily leaderboard.
ZooClaw
Your proactive team of AI specialists in one place. An AI agent orchestration tool designed to coordinate multiple AI specialists and automate team workflows. Ranked #6 on Product Hunt in April 2026.
Chronicle 2.0
AI-powered memory and context management system for long-running coding projects. Maintains project context across sessions, enabling agents to remember decisions, code patterns, and project history.
April 4, 2026 OAuth Token Policy Change
Anthropic changed its policy on April 4, 2026, so that OAuth tokens from Claude Pro and Max subscriptions no longer work in third-party tools. This restricts usage of paid Claude subscriptions to official channels only, affecting tools like OpenClaw, OpenCode, and Crush.
Multi-Agent Code Review Tool Launch
Anthropic launched a dedicated multi-agent code review system for Claude Code to address the surge in pull requests driven by AI coding tools, enhancing collaborative code review capabilities.
ArXiv: 'Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents'
Introduces an approach that uses inter-rollout action agreement as a signal for adaptive compute allocation in LLM agents. The method helps balance reasoning depth with computational efficiency, preventing overthinking while maintaining performance.
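One plausible reading of the idea, sketched in Python: sample several cheap rollouts, measure how many propose the same next action, and spend extra compute only when they disagree. This is an illustration of the summary above, not the paper's algorithm; the function names and the 0.8 threshold are assumptions.

```python
from collections import Counter


def agreement(actions: list[str]) -> tuple[str, float]:
    """Return the most common action and the fraction of rollouts that chose it."""
    top_action, count = Counter(actions).most_common(1)[0]
    return top_action, count / len(actions)


def choose_action(actions: list[str], threshold: float = 0.8) -> tuple[str, bool]:
    """Return (action, escalate); escalate is True when agreement is below threshold."""
    top_action, score = agreement(actions)
    return top_action, score < threshold
```

With four rollouts that all propose `"ls"`, `choose_action` returns `("ls", False)` and the agent proceeds cheaply. If only two of four agree, it returns `("ls", True)`, which a caller could treat as a cue to reason longer or sample more rollouts.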
ArXiv: 'IndustryCode: A Benchmark for Industry Code Generation'
IndustryCode introduces a benchmark specifically designed for evaluating code generation in industry settings. The work addresses the gap between academic code generation benchmarks and real-world industrial requirements.
Hacker News: 'Eight years of wanting, three months of building with AI' - Extended Discussion
Extended Hacker News discussion (757 points) on Lalit Maganti's post about AI-assisted development, which Simon Willison linked. Commenters debate: (1) Code quality relevance in AI era - whether it matters more or less, (2) 'Vibe coding' productivity vs technical debt, (3) AI as accelerator vs engineer replacement, (4) Democratization of software development, (5) Comparison to historical technology shifts (printing press, internet), (6) Testing challenges with AI-generated code, (7) Future of software engineering profession. Strong consensus that AI is a tool requiring human oversight, with diverse opinions on long-term implications.
Amazon OpenSearch Agentic AI
Agentic features for OpenSearch Service including Investigation Agent and Agentic Memory, enabling developers to automate observability with automated PPL query generation and cross-index root-cause analysis.
App Store sees 84% surge in new apps as AI coding tools take off
Discussion about the massive growth in new app development driven by AI coding tools, with 235,800 new apps in Q1 2026. Explores the impact of 'Vibe Coding' boom on app development ecosystem.
The Future of AI Software Development
Discussion about whether LLMs will be cheaper than human developers once token subsidies are removed, exploring the true cost considerations and economic viability of AI-powered development.
GitHub platform activity surging
GitHub had 1 billion commits in 2025 and now sees 275 million per week, on pace for 14 billion in 2026 if growth remains linear. GitHub Actions grew from 500M minutes/week in 2023 to 1B minutes/week in 2025, and now runs 2.1B minutes per week.
How I write software with LLMs
Practical guide using a hierarchy of AI agents with different personas (architect, business analyst, security expert) for comprehensive AI-assisted development workflows.
I'm a junior developer, and to be honest, in 2026 AI is...
A developer's perspective on using AI tools to generate code, fix bugs, and refactor logic. Discusses AI writing 'cleaner' code and the impact on development workflows from a junior developer's experience.
Cursor 3.0 - New Interface
Cursor 3 introduces a completely new interface centered around agents, allowing parallel agent execution across repos and environments (local, worktrees, cloud, SSH). Features include a new Agents Window, Design Mode for browser UI annotation, Agent Tabs for multiple simultaneous chats, and significant performance improvements including faster large-file diff rendering.
datasette-ports 0.2 Released
Release of datasette-ports 0.2 - a tool to find all currently running Datasette instances and list their ports. No longer requires Datasette - running 'uvx datasette-ports' now works standalone. Installing as a Datasette plugin continues to provide the 'datasette ports' command.
OpenAI Alums Launch $100M Investment Fund
OpenAI alumni have been quietly investing from a new potentially $100M fund, showing continued financial activity in the AI space from former OpenAI personnel.
Iran Threatens 'Stargate' AI Data Centers
Geopolitical tensions involving AI infrastructure as Iran threatens 'Stargate' AI data centers, highlighting the strategic importance of AI computing facilities.
Eight Years of Wanting, Three Months of Building with AI - SyntaQLite
Lalit Maganti's long-form piece on agentic engineering: spent 8 years thinking about and 3 months building syntaqlite (high-fidelity devtools for SQLite with parser, formatter, and verifier). The key insight: AI excels at tedious work like 400+ grammar rules. Claude Code helped build the first prototype, but they eventually threw it away and started from scratch - AI made them procrastinate on key design decisions because refactoring felt cheap. Important lesson about AI-assisted development: great for low-level details and prototyping, but can lead to deferred architectural decisions that corrode clear thinking.
Japan Proving Experimental Physical AI Ready for Real World
In Japan, robots aren't coming for jobs - they're filling jobs nobody wants. Shows practical deployment of physical AI in real-world scenarios.
Copilot 'For Entertainment Purposes Only' Per Microsoft Terms
Microsoft's terms of service indicate Copilot is 'for entertainment purposes only', raising questions about liability and production use guarantees for AI coding assistants.
GitHub Activity Surging: 275 Million Commits Per Week
GitHub platform activity is accelerating dramatically. There were 1 billion commits in 2025. Now it's 275 million commits per week, on pace for 14 billion this year if growth remains linear. GitHub Actions grew from 500M minutes/week in 2023 to 1B minutes/week in 2025, and has reached 2.1B minutes so far this week. This points to a massive increase in development activity, likely driven by AI coding tools.
Anthropic Says Claude Code Subscribers Need Extra Payment for OpenClaw
Anthropic announced that Claude Code subscribers will need to pay extra for OpenClaw usage, introducing new pricing tiers for advanced coding features.
OpenAI Executive Shuffle: New Roles for COO Brad Lightcap
OpenAI executive shuffle includes new role for COO Brad Lightcap to lead 'special projects', along with new roles for Fidji Simo and Kate Rouch.
Anthropic Buys Biotech Startup Coefficient Bio in $400M Deal
Anthropic acquires biotech startup Coefficient Bio in a $400M deal, expanding into AI applications for biotechnology research.
AI Companies Building Huge Natural Gas Plants for Data Centers
AI companies are constructing massive natural gas power plants to support energy-intensive data centers, raising environmental and infrastructure concerns.
Anthropic Ramps Up Political Activities with New PAC
Anthropic is increasing its political engagement by forming a new Political Action Committee, joining other AI companies in political lobbying efforts.
OpenAI Acquires TBPN Business Talk Show
OpenAI acquires TBPN, the buzzy founder-led business talk show, showing expansion into media content.
Highlights from Lenny's Podcast on Agentic Engineering
Simon Willison was a guest on Lenny Rachitsky's podcast discussing 'An AI state of the union: We've passed the inflection point, dark factories are coming, and automation timelines'. Episode covers agentic engineering, coding agents, and the current state of AI development. Available on YouTube, Spotify, and Apple Podcasts.
datasette-llm 0.1a6 Released
Release of datasette-llm 0.1a6, an LLM integration plugin for Datasette that other plugins can depend on. Part of Simon's ongoing work building LLM tooling.
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?
ArXiv paper exploring what agentic capabilities truly bring to multimodal intelligence systems. Investigates the unique advantages of agentic AI architectures in handling complex multimodal tasks.
New Features
Version 1.9577.43: Quota Billing System
Implemented quota-based billing system for more flexible usage-based pricing and resource management.
Background Agents in Slack Integration
Cursor now supports launching Background Agents directly from Slack by mentioning @Cursor, enabling AI-powered workflows within team communication channels.
Week 14 Updates: Computer Use in CLI, Interactive Lessons
Claude Code Week 14 (March 30 - April 3, 2026) introduces computer use capabilities to the CLI, allowing Claude to open native apps, click through UI, test its own changes, and fix issues from the terminal. Also includes interactive lessons (/powerup), flicker-free rendering, MCP result-size overrides (up to 500K characters), and plugin executables added to PATH.
Version 1.3.38-vscode: Config Fixes and Workspace Filtering
Continue v1.3.38-vscode includes config.yaml fixes, workspace directory filtering capabilities, and support for .continue/configs directory structure.
Version 0.56.0: Prompt Caching and Report Command
Aider v0.56.0 introduces prompt caching for Sonnet via OpenRouter for improved performance, new /report command for session summaries, and --chat-language switch for multi-language support.
Claude Code v2.1.101
Latest version of Claude Code with /team-onboarding command, OS CA certificate store trust, improved brief/focus modes, visual changes, and security fixes for Bash tool permissions.
April 2026 Updates: Enterprise Features, Auto-Fix Button, PR Resuming
Major April update including Enterprise-scoped secrets management, Devin Review Auto-Fix button for one-click bug fixes, PR Resuming for working on existing PRs across sessions, Streaming Terminals for real-time output, Light Mode (Beta), and improved session management with pinning.
v2.1.101: Team Onboarding, Enterprise TLS Proxy, Ultraplan Improvements
Claude Code v2.1.101 introduces /team-onboarding command for guided team setup, enterprise TLS proxy support, OS CA certificate store trust by default, improved brief/focus modes, and dozens of critical session, permission, and rendering fixes. Can now be used without manual web setup first.
Meta Pauses Work With Mercor After Data Breach Puts AI Industry Secrets at Risk
An attacker known as TeamPCP compromised two versions of the AI API tool LiteLLM, potentially exposing AI industry secrets and causing Meta to pause work with Mercor.
Cursor 3
Unified workspace for parallel local/cloud agents and MCPs (Model Context Protocol). A major update to the Cursor AI IDE enhancing it with multi-agent capabilities and parallel execution.
Version 0.37.1 - Latest Stable Release
Dynamic sandbox expansion and worktree support for Linux and Windows, broad nightly update featuring UI polish, core fixes, and new tools.
Version 2.1.101 - Team Onboarding & Performance Improvements
Added /team-onboarding command for generating teammate ramp-up guides, OS CA certificate trust by default, interactive Google Vertex AI setup wizard, and numerous performance improvements including faster diff computation and reduced memory usage.
Main Branch - Claude 4.5/4.6 and GPT-5.3/5.4 Support
Added support for Claude 4.5/4.6 models and updated model aliases, expanded Gemini model support with 2.5 Flash and Flash-Lite, added Gemini 3 preview models, added DeepSeek Reasoner model, and added support for GPT-5.3/5.4 model variants across OpenAI, Azure, and OpenRouter.
Stitch 2.0 by Google
Google's updated AI development environment with enhanced agent capabilities and improved tool integration. Represents Google's continued investment in AI-first development experiences.
Ollama v0.19
Open-source tool for running large language models locally with enhanced features for AI development workflows. Enables developers to run and test AI models without API dependencies, featuring support for multiple open-source models.
Bugbot Learned Rules and MCP Support
Released April 8, 2026 with enhanced Bugbot capabilities including ability to self-improve in real time, MCP (Model Context Protocol) support, and improvements to Bugbot Autofix with its highest resolution rate yet. Bugbot can now learn from feedback on pull requests and convert those signals into learned rules.
Visual Studio Extensibility Improvements
March 2026 update brought major enhancements for Visual Studio users including custom agents, agent skills, and new tools for extensibility in the Visual Studio environment.
Claude Code v2.1.98 - Enhanced Security, MCP, and Vertex AI Integration
Major update featuring interactive Google Vertex AI setup wizard, Monitor tool for background script events, subprocess sandboxing with PID namespace isolation, and multiple critical security fixes including Bash tool permission bypass prevention. Also includes W3C TRACEPARENT support for OpenTelemetry tracing.
Aider v0.56.0 - Prompt Caching and Enhanced Output
Enabled prompt caching for Sonnet via OpenRouter and 8k output tokens for Sonnet via VertexAI and DeepSeek V2.5. Added new /report command to open browser with pre-populated GitHub Issue, new --chat-language switch for spoken language, and --suggest-shell-commands controls for shell command prompting. Aider wrote 56% of the code in this release.
New Technologies
Gemma 4: Byte for byte, the most capable open models
Google released four new vision-capable, Apache 2.0 licensed reasoning LLMs: 2B, 4B, and 31B models, plus a 26B-A4B Mixture-of-Experts. The models feature unprecedented intelligence-per-parameter, with Per-Layer Embeddings (PLE) for parameter efficiency. All models natively process video and images, and the E2B and E4B models also process audio. Simon tested the GGUF versions in LM Studio, with 2B, 4B, and 26B-A4B working perfectly, but the 31B model had issues. The progression in quality from 2B to 26B-A4B is notable, with the 26B model generating excellent SVG output.
Research into LLM provider HTTP APIs for new abstraction layer
Simon Willison is working on a major change to his LLM Python library. To help design a new abstraction layer for features like server-side tool execution, he had Claude Code analyze Python client libraries from Anthropic, OpenAI, Gemini, and Mistral to craft curl commands for accessing raw JSON in streaming and non-streaming modes. The scripts and captured outputs are now available in the research-llm-apis repository.
Google ADK for Java 1.0.0
Agent Development Kit for Java v1.0.0 with Google Maps grounding, Human-in-the-Loop workflows, event compaction, Agent2Agent protocol, and session management. Framework for building scalable, interoperable AI agents.
Unified Dynamic Model Fetching
Major feature introducing unified dynamic model fetching across all providers (Ollama, OpenRouter, Anthropic, Gemini, OpenAI). Auto-discovers models without manual configuration, automatic capability detection, refresh button, persistent storage to config.yaml, and added support for Gemma 4 and GPT-5.4 families.
Security Fix: URL Encoding for Model IDs
Critical security fix for URL injection vulnerability in dynamic model fetching code. Model names/IDs are now properly URL-encoded when constructing reference URLs to prevent path traversal or manipulation with malicious model names.
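The general technique behind this kind of fix is to percent-encode the untrusted ID before placing it in a URL path. The sketch below uses Python's standard `urllib.parse.quote`; the base URL and the function name `model_url` are hypothetical, and the patched project's actual code may differ.

```python
from urllib.parse import quote

BASE_URL = "https://example.com/models"  # hypothetical base URL for illustration


def model_url(model_id: str) -> str:
    """Build a reference URL with the model ID confined to one path segment."""
    # safe="" also encodes "/", so the ID cannot introduce extra path segments.
    return f"{BASE_URL}/{quote(model_id, safe='')}"
```

With this encoding, a malicious ID such as `../../admin?x=1` becomes `..%2F..%2Fadmin%3Fx%3D1` and stays inside a single path segment instead of altering the path or query string.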
Simon Willison: Google AI Edge Gallery for iPhone
Google's official app for running Gemma 4 models (E2B and E4B sizes) directly on iPhone. Works really well with E2B at 2.54GB. Features interesting 'skills' demo with tool calling against eight interactive HTML widgets.
Simon Willison: GLM-5.1 Towards Long-Horizon Tasks
Chinese AI lab Z.ai's GLM-5.1 is a 754B parameter model. Willison tested it with SVG generation and found it can generate HTML+CSS animations, though initially broken. When prompted about bugs, it correctly diagnosed and fixed CSS transform animation issues.
Google announces Gemma 4 open AI models, switches to Apache 2.0 license
Google announces new open AI models and invites developers to begin prototyping agentic workflows in the latest AI Core Developer Preview with Gemma E2B and E4B.
Claude 4: A Step Forward in Agentic Coding
Discussion about Anthropic's Claude 4 (Opus and Sonnet) achieving record-breaking 72.7% performance on SWE-bench Verified, surpassing OpenAI's latest models. Users report significant productivity gains with Claude Sonnet 4 for agentic coding tasks.
We all are living in the Sonnet 4 bubble
Discussion emphasizing that Sonnet 4 is considered 'legendary model for coding' and 'so good, maybe even too good.' Community members share positive experiences about Claude Sonnet 4's superior coding capabilities.
A guide to the best agentic tools and the best way to use them
Comprehensive guide ranking agentic coding tools with emphasis on Roocode and Cline. Discusses LLM model tiers for agentic coding, highlighting Sonnet 4.5 as 'the single best model for agentic coding' and positioning GPT in the top tier.
AI Coding Agent Dev Tools Landscape 2026
Analysis of the current state of coding agent frameworks, noting that while there are tons of coding agent frameworks, there's almost nothing for AI agents that handle infrastructure and incident response. Highlights gaps in the current tool ecosystem.
Which AI Coding Tools Do Developers Actually Use at Work?
JetBrains Research Survey reveals top tools developers actually use: Claude Code, Cursor, JetBrains AI Assistant, Junie, GitHub Copilot, OpenAI Codex, and Google's solutions. Over 500 LLM models now available across commercial APIs and open-source releases.
Simon Willison: Meta's Muse Spark Model with Interesting Tools
Simon Willison reviews Meta's new Muse Spark model (the first since Llama 4), noting that it is hosted rather than open weights, with a private API preview. The meta.ai chat interface offers interesting tools, including a code interpreter and tool use capabilities.
Simon Willison: Anthropic's Project Glasswing Restricts Claude Mythos
Anthropic didn't release their latest Claude Mythos model publicly, instead making it available only to restricted preview partners under Project Glasswing. Willison argues this restriction sounds necessary for security research.
Improving Code Generation via Small Language Model-as-a-Judge
ArXiv paper discusses how LLMs have shown remarkable capabilities in automated code generation. Focus on improving code generation quality using small language models as judges to evaluate code generation performance.
Idea First, Code Later: Disentangling Problem Solving from Code Generation
ArXiv paper explores disentangling problem-solving capabilities from code generation when evaluating LLMs for coding tasks, providing insights into how we should measure AI coding performance.
RunawayContext
A universal framework for giving AI coding assistants persistent memory and project intelligence across sessions. Provides workspace-scoped memory that survives chat sessions and enables context-aware decision making.
cheetahclaws
CheetahClaws (Nano Claude Code) is a fast, easy-to-use, Python-native personal AI assistant for any model. Inspired by OpenClaw and Claude Code, it's built to work autonomously 24/7 with minimal resource requirements.
traceAI
Open-source LLM tracing framework designed specifically for GenAI applications. Captures every LLM call, prompt, token count, retrieval step, and agent decision as structured traces.
CCX-RS
Community Claude Code eXtended: a free, open-source AI coding assistant implemented in Rust. Features 19 tools, multi-model support (Claude/OpenRouter/Ollama), and a Claude Code-style TUI.
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
New research from ACL 2026 proposing a self-auditing framework for LLM agents to ensure faithful reasoning. The approach allows agents to verify their own reasoning before committing to actions, addressing reliability and trustworthiness concerns in agentic AI systems. This is particularly relevant for AI coding assistants that need to ensure code correctness before execution.
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Novel framework for allowing skills in LLM agents to evolve collectively through an agentic evolver. The work in progress addresses the challenge of managing and improving tool-use capabilities in AI agents dynamically. This has implications for AI coding assistants that need to continuously improve their coding skills and tool usage.
PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
Technical report on proactive agents with intent-awareness and long-term memory capabilities. The framework enables agents to maintain context over extended interactions and take initiative based on inferred user intent. This is particularly relevant for AI coding assistants that need to remember project context and proactively suggest improvements.
Lightweight LLM Agent Memory with Small Language Models
ACL 2026 accepted paper proposing using small language models to create lightweight memory systems for LLM agents. The approach addresses computational efficiency while maintaining effective memory capabilities for agentic systems. This has implications for making AI coding assistants more efficient and deployable.
SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
ACL 2026 paper presenting a framework for self-evolving agents that jointly optimize policy and tool graph memory. The approach enables agents to continuously improve their tool-use strategies and adapt to new tasks. This has direct applications for AI coding assistants that need to learn and evolve their capabilities.
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI
Lex Fridman Podcast #490 featuring Nathan Lambert and Sebastian Raschka discussing the current state of AI in 2026. Topics include Large Language Models, AI and Coding, Scaling Laws, China's AI developments, AI Agents, GPUs, and Artificial General Intelligence. The 4.5-hour conversation provides comprehensive insights into the current landscape and future directions of AI technology.
Shadow APIs breaking research reproducibility crisis
A new paper (arXiv 2603.01919) audits shadow APIs: third-party services that claim to provide GPT-5/Gemini access. The findings are alarming: 187 academic papers used these services, with the most popular one having 5,966 citations. Performance diverged by up to 47%, safety behavior was completely unpredictable, and 45% of fingerprint tests failed identity verification. Many research papers might be built on fake model outputs. These services are popular due to payment barriers and regional restrictions. This undermines trust in the entire field and affects production systems that depend on specific model behavior.
Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
Research analyzing how LLMs handle conflicts of interest when ads are incorporated into AI chatbot interfaces. The study examines the implications for user trust and decision-making. This is relevant for AI coding tools that may include sponsored suggestions or recommendations.
Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search
ACL 2026 Findings paper analyzing what makes training data valuable for agentic search systems. The 15-page paper provides insights into data curation for training better search-capable agents. This is relevant for improving AI coding assistants' ability to search through and understand codebases.
Fireship: 7 new open source AI tools you need right now
Fireship presents seven new open source AI coding tools and frameworks. The video covers practical tools for building meeting bots, desktop recording apps, and other AI-powered applications. Includes partnership content with Recall.ai about rapid AI development workflows.
Meta's Muse Spark model and meta.ai chat tools
Meta announced Muse Spark, their first model release since Llama 4 almost exactly a year ago. It's hosted, not open weights, and the API is currently in private preview for select users, but the model is available to try on meta.ai (Facebook or Instagram login required). Simon Willison provides detailed analysis of the new model's capabilities and tools.
Anthropic's Project Glasswing - restricting Claude Mythos to security researchers
Anthropic didn't release their latest model, Claude Mythos, to the public. Instead, they made it available to a very restricted set of preview partners under their newly announced Project Glasswing. Simon discusses why this security-focused restricted access approach sounds necessary given the model's capabilities. The system card PDF provides details about the model's capabilities and safety considerations.
Cursor 3
A unified workspace for parallel local/cloud agents and MCP (Model Context Protocol) integrations. Complete redesign built around AI agents from the ground up with multi-repository layouts, seamless local/cloud agent handoff, and parallel agent execution. Represents a major paradigm shift to agent-first IDE architecture.
Notion MCP
Model Context Protocol integration for Notion, enabling AI agents to directly access and manipulate Notion databases and documents. Part of the growing MCP ecosystem for agent-tool integration.
ArXiv: 'ACIArena: Toward Unified Evaluation for Agent Cascading Injection'
Provides a unified evaluation framework for agent cascading injection, addressing the challenge of evaluating complex multi-agent interactions. The work aims to standardize evaluation methodologies for cascading agent systems.
ArXiv: 'SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking'
SAT introduces a stepwise adaptive thinking approach that balances reasoning accuracy and efficiency in AI systems. The method allows AI models to adapt their reasoning depth based on task complexity and available computational resources.
ArXiv: 'Beyond Isolated Tasks: A Framework for Evaluating Coding Agents on Sequential Software Evolution'
This framework evaluates coding agents on sequential software evolution tasks, moving beyond isolated code generation to assess performance on complex, multi-stage software development processes. The work addresses the need for more comprehensive evaluation of AI coding assistants.
Simon Willison: 'Vulnerability Research Is Cooked'
Thomas Ptacek's analysis of how frontier AI models are revolutionizing vulnerability research and exploit development. Argues that coding agents will drastically alter both the practice and economics of vulnerability research within months, with agents able to 'find me zero days' by leveraging baked-in knowledge, pattern matching abilities, and brute force searching.
Microsoft Agent Governance Toolkit
An open-source, multi-language governance framework for autonomous AI agents with sub-millisecond policy engine, cryptographic agent identities, runtime isolation, and compliance automation mapped to EU AI Act, HIPAA, and SOC2.
Continue v1.3.38, v1.3.37, v1.3.36, v1.3.35 (VS Code & JetBrains)
Multiple releases in late March 2026 featuring config.yaml fixes, session history filtering by workspace directory, .continue/configs support, Ollama tool support improvements, critical and high security vulnerability fixes, JetBrains stability improvements including preventing IDE freezes and sidebar freezes, and ClawRouter provider for cost-optimized model routing.
What's the Best LLM for Coding in 2026
Discussion comparing different LLMs specifically for coding tasks, evaluating which models provide the best performance for programming tasks and developer productivity.
Are LLM merge rates not getting better?
Discussion about improvements in 2025 that made models and terminal-based apps like Claude Code much better, questioning whether merge rates continue to improve as AI coding tools evolve.
Eight years of wanting, three months of building with AI
Lalit Maganti built syntaqlite (high-fidelity devtools for SQLite) after procrastinating for 8 years. Used Claude Code to overcome initial hurdle with 400+ grammar rules. Key insight: AI made procrastination on design decisions worse because refactoring felt cheap. First AI prototype worked as proof of concept but lacked coherent architecture. Second attempt with more human-in-the-loop design took longer but produced robust library. AI weakness: struggles when you don't know what you want and when tasks have no objectively checkable answer (design vs implementation).
Vulnerability Research Is Cooked
Thomas Ptacek analyzes how frontier models are transforming vulnerability research. Within months, coding agents will drastically alter exploit development economics. LLMs excel at this due to baked-in knowledge of bug classes (stale pointers, integer mishandling, type confusion), pattern matching abilities across vast codebases, and the ability to run unlimited test trials. Kernel security reports have jumped from 2-3 per week to 5-10 per day, with duplicate reports becoming common.
Google Quietly Launches Offline-First AI Dictation App on iOS
Google quietly released an AI dictation app that works offline on iOS, similar to the AI Edge Gallery app for running local models.
Vulnerability Research Is Cooked - AI Agents Finding Zero Days
Thomas Ptacek's analysis of how frontier models are drastically altering vulnerability research and exploit development. Within months, coding agents will handle most high-impact vulnerability research by pointing an agent at source code. Agents excel at this because LLMs encode: (1) supernatural amounts of correlation across vast codebases, (2) complete library of documented bug classes, (3) pattern matching and constraint solving abilities, (4) ability to search forever without boredom. Vulnerability research is 'the perfect problem for an LLM agent' - outcomes are testable success/failure trials.
Others
ChatGPT healthcare usage insights from OpenAI
Chengpeng Mou, Head of Business Finance at OpenAI, shared anonymized U.S. ChatGPT data showing significant healthcare usage: ~2M weekly messages on health insurance, ~600K weekly messages from people living in 'hospital deserts' (a 30-minute drive to the nearest hospital), and 7 out of 10 messages happening outside clinic hours.
Eight years of wanting, three months of building with AI
Lalit Maganti's deep dive into building syntaqlite, a SQLite parser, formatter, and verifier. After he procrastinated for 8 years because of the 400+ grammar rules, Claude Code helped him build the first prototype in 3 months. Key insights: AI excels at getting started quickly with concrete problems, but can lead to procrastination on key design decisions because refactoring feels cheap. The first AI-assisted prototype worked as proof-of-concept but lacked coherent architecture, requiring a second attempt with more human-in-the-loop decision making. AI struggles when tasks lack objectively checkable answers, as with design and architecture.
Eight years of wanting, three months of building with AI - syntaqlite story
Lalit Maganti's long-form writing on agentic engineering: spent 8 years thinking about and 3 months building syntaqlite, high-fidelity devtools for SQLite. Key insights: AI helped overcome procrastination on tedious work (400+ grammar rules), but AI made design decisions harder - cheap refactoring led to deferred decisions that corroded clear thinking. The second attempt involved more human-in-the-loop design decisions. The article is full of non-obvious downsides to working heavily with AI and how to overcome them. Critical insight: 'When I was working on something where I didn't even know what I wanted, AI was somewhere between unhelpful and harmful.'
Privacy and Data Usage Policy Change
Starting April 24, 2026, GitHub will use interaction data from Copilot Free, Pro, and Pro+ users to improve the service. The data collected includes inputs, code snippets, prompts, and suggestions. Users can opt out in settings under 'Privacy'.