The past week brought announcements that could change how we build web software: not just how fast we write code, but who (or what) actually does the heavy lifting. Google released Gemini 3 and an “agent-first” platform called Antigravity, while OpenAI rolled out Aardvark, an AI agent focused on finding and fixing vulnerabilities. At the same time, academic research reminded us of a hard truth: patches generated by AI can pass tests and still introduce security flaws. If you work on web projects, this matters, in both good and worrying ways.
What dropped this week (short version)
- Gemini 3: Google’s latest frontier model, with stronger reasoning and tool use. (blog.google)
- Antigravity: an agent-first IDE/platform where AI agents can act inside the editor, terminal and browser to plan, execute and verify tasks. It’s in public preview and aims to change developer workflows. (Google Antigravity)
- Aardvark: OpenAI’s agentic security researcher that scans codebases, validates exploitability and suggests patches; currently rolling out in beta. (OpenAI)
- Research warning: large-scale studies show LLM-generated patches can be functionally correct yet still introduce new vulnerabilities that tools and tests miss. That’s a real risk. (arXiv)
Why this matters for web development
- New workflows: dev as conductor, not typist
  Agents like those in Antigravity can scaffold projects, write endpoints, generate tests and even run deploys. Your role increasingly becomes: define goals, review agent outputs, set policies and ensure quality. That’s a shift from “keyboard-first” to “orchestration-first.” (Google Antigravity)
- Security automation gets smarter, and creepier
  Aardvark-style tools show that agents can behave like security researchers: they don’t just statically flag code, they try to exploit and confirm issues before suggesting fixes. That can speed up remediation, but only if it is locked into safe review processes. (OpenAI)
- Tests are necessary but not sufficient
  Academic analyses show LLMs can pass functional tests while still introducing subtle vulnerabilities. In other words, green CI checks won’t catch everything if a patch was generated by an AI. (arXiv)
- Faster prototyping, higher attack surface
  Agentic tools can accelerate MVPs and proofs of concept, but they also increase the amount of generated code that needs security vetting. Teams that adopt agents without new safety processes risk shipping insecure systems faster.
Practical checklist: how to use agentic tools safely (do this today)
- Treat AI-generated patches as draft PRs: require human review by someone who understands security. (arXiv)
- Add exploit validation to your pipeline: when a tool flags a bug and suggests a patch, run tests that cover privilege checks, auth flows and boundary conditions. Aardvark-like validation is useful, but don’t skip the human step. (OpenAI)
- Create agent guardrails: limit what agents can do automatically (no direct production deploys), require approvals for infra changes, and log all agent actions (artifacts, screenshots, commands). Antigravity’s artifact model is a good example to emulate; see the policy sketch after this list. (Google Antigravity)
- Expand your test suite beyond functionality: add security-focused tests (fuzzing, auth tests, injection scenarios) such as the example below. Don’t rely on unit tests alone. (arXiv)
- Train your team: teach prompt-writing, output auditing, and how to triage agent-produced issues. The human-in-the-loop skill is now crucial. (Google Developers Blog)
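To make the guardrails point concrete, here is a minimal sketch of an approval gate for agent actions. Everything in it is illustrative: the `AgentAction` shape, the action names and the in-memory audit log are assumptions, not the API of Antigravity or any other tool.

```typescript
// Hypothetical guardrail: every agent action must pass through this gate.
type AgentAction = {
  kind: "edit_file" | "run_command" | "deploy" | "change_infra";
  detail: string;      // e.g. a file path, shell command or target environment
  approvedBy?: string; // set once a human has signed off
};

// Log of everything the agent attempts, allowed or not.
const auditLog: Array<AgentAction & { at: string; allowed: boolean }> = [];

function gate(action: AgentAction): boolean {
  // Deploys and infra changes always require explicit human approval.
  const needsApproval = action.kind === "deploy" || action.kind === "change_infra";
  const allowed = !needsApproval || Boolean(action.approvedBy);

  auditLog.push({ ...action, at: new Date().toISOString(), allowed });
  return allowed;
}

gate({ kind: "deploy", detail: "production" });             // false: blocked and logged
gate({ kind: "edit_file", detail: "src/routes/users.ts" }); // true: allowed and logged
```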
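And here is what a security-focused test can look like alongside your functional suite: a sketch assuming Jest and Supertest, with a hypothetical Express app exported from `./app` and an illustrative `/api/users` route.

```typescript
import request from "supertest";
import app from "./app"; // hypothetical Express app under test

describe("security checks on /api/users", () => {
  it("rejects requests without a valid token", async () => {
    await request(app).get("/api/users/42").expect(401);
  });

  it("does not leak other users' data via ID tampering", async () => {
    const res = await request(app)
      .get("/api/users/43") // the token below belongs to user 42
      .set("Authorization", "Bearer token-for-user-42");
    expect([401, 403]).toContain(res.status);
  });

  it("does not accept basic injection payloads", async () => {
    const res = await request(app).get("/api/users/1%27%20OR%20%271%27=%271");
    expect(res.status).toBeGreaterThanOrEqual(400); // malformed input must not return 200
  });
});
```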
Example scenarios (concrete)
- Good use: ask an agent to scaffold a REST API with authentication and basic tests. The agent creates code and tests; devs review, tighten the auth checks (see the sketch below) and merge. Win: saves days. (Google Antigravity)
- Bad use: an agent auto-applies a “fix” in production and deploys. It passes unit tests but opens an auth bypass. Disaster. Always require staged review. (arXiv)
- Hybrid: use agentic AppSec (like Aardvark) to surface issues, but route every suggested patch through a security engineer or an automated security pipeline that runs exploit validation in an isolated environment. (OpenAI)
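What “tighten the auth checks” can mean in practice: agent-scaffolded middleware often only checks that some token is present. The reviewed version below verifies the token and enforces ownership. The route parameter and the `verifyToken` helper are hypothetical stand-ins for whatever your stack provides.

```typescript
import type { Request, Response, NextFunction } from "express";
import { verifyToken } from "./auth"; // hypothetical helper: validates a JWT, returns the user or null

// Typical agent-generated version passes if *any* token is present:
//   if (req.headers.authorization) return next();

// Reviewed version: verify the token and check resource ownership.
export function requireAuth(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace(/^Bearer /, "");
  const user = token ? verifyToken(token) : null; // null on bad signature or expiry
  if (!user) return res.status(401).end();

  // Users may only access their own records unless they are admins.
  if (req.params.userId !== user.id && !user.isAdmin) {
    return res.status(403).end();
  }
  next();
}
```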
The career angle: skills that will win in the next 12–24 months
- Prompt engineering with intent: write prompts that produce auditable, minimal and explainable code. (Google Developers Blog)
- Security auditing of AI outputs: humans who can quickly find the weak spots in AI-generated patches will be in demand. (arXiv)
- DevOps + policy design: setting CI rules, agent permissions, artifact retention and emergency kill-switches. (Google Antigravity)
Bottom line
Agentic tools like Antigravity and Aardvark are not incremental improvements; they push us into a different model of software creation. That’s exciting: faster builds, smarter security scanning, better automation. But excitement without discipline leads to risk. The teams that win will be those that adopt agentic workflows and simultaneously harden their review, testing and guardrails.