
The Practical Handbook for Building AI Native Products

Dhruv Tandon
Jan 12, 2026
15 min read

"Just adding AI" has burned a lot of founders, PMs, and enterprises adopting AI products. This post proposes a set of alternative principles that define what it means to be "AI Native".

Five Principles for AI Native Products

  1. Use AI along the Jagged Edges
  2. Reduce AI in everything else around the jagged edge
  3. Extract context progressively
  4. Target 100% workflow completion
  5. Reduce the burden of review

Background

There are a lot of year-in-review posts by really smart people in the AI space. I love reading these because they give a sense of what people believe is consensus and hint at what people disagree on, since each one is a stake in the ground. If you don't have the time, here are the arguments you should be paying attention to.

  • "AGI takeoff" scenario likely - AI progress in 2025 was more incremental than expected
  • Technology Diffusion takes time: It has been gradual like all technological revolutions before it
  • "No Free Lunch for the Foundation Labs": The business impact is that foundation model labs now NEED TO CHOOSE the capabilities they want to boost performance on. No surprising capabilities are likely out of the box.
"The main strength of pre-training is that … you don't have to think hard about what data to put into pre-training … it's very natural data … it's like the whole world as projected by people onto text, and pre-training tries to capture that … Now, once you leave that regime, you have to think about what goal you are optimizing for, what the data is, and that becomes a design choice …" - Ilya on Dwarkesh Podcast

What was remarkable to me was that there was very little disagreement at a fundamental level among the smartest people in AI, e.g., Andrej Karpathy, Ilya Sutskever, Dan Wang, and Dwarkesh Patel.

The Consensus

We are far away from a human-level general learner, mainly because AI models cannot learn continually or generalise. They can't pick up new skills easily.

Reinforcement Learning with Verifiable Rewards (RLVR) is what foundation labs are investing in to push the capability curve. If you want a simple way of understanding it: getting performance out of AI models this way is much harder than just training on internet data like GPT-3/4 did. The reasoning models (o1, R1) came out of this technique: grading model solutions against verifiable answers, incorporating a reward signal for better ones, and improving the model's ability on specific tasks.
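
To make the mechanics concrete, here is a toy sketch of the RLVR loop. Everything in it is illustrative: `fake_model_sample` stands in for a real model, and the verifier is exact-match arithmetic rather than anything a lab would actually use.

```python
import random

def fake_model_sample(problem: str) -> int:
    """Hypothetical stand-in for sampling a candidate answer from a model."""
    return random.randint(0, 10)

def verify(problem: str, answer: int) -> bool:
    """The 'verifiable' part: correctness is checked programmatically, no human grader."""
    return answer == eval(problem)  # toy verifier, e.g. problem = "3 + 4"

def rlvr_step(problem: str, num_samples: int = 8) -> list[tuple[int, float]]:
    # Sample several candidates and attach reward 1.0 only to verifiably
    # correct ones; a real pipeline would use these rewards to update weights.
    samples = [fake_model_sample(problem) for _ in range(num_samples)]
    return [(s, 1.0 if verify(problem, s) else 0.0) for s in samples]

print(rlvr_step("3 + 4"))  # e.g. [(7, 1.0), (2, 0.0), ...]
```

The design choice Ilya points at below is exactly this: someone has to decide which problems and which verifiers go into the loop.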

Jagged (AI) vs Human (Round)

If you want to understand the potential trajectory of intelligence and how to build products here, you need to know that human intelligence is ROUND whereas AI intelligence is JAGGED. The average human is a decent all-rounder across a suite of tasks like driving, Excel editing, presentation making, and stacking boxes, with the occasional prodigy who spikes in one direction.

"LLMs… display amusingly jagged performance characteristics — they are at the same time a genius polymath and a confused and cognitively challenged grade schooler…" - Karpathy in 2025 LLM Year in review

This means that agents are going to be superhuman in some ways (coding, information research, math, etc.) and incredibly poor compared to average human intelligence in others. The foundation labs are going to spend a lot of resources using RLVR to push progress along narrow directions, but that will not fill the gaps in all the deficient areas (presentation formatting or Excel editing).

What it Means for AI Products

So does this mean that the average knowledge worker can get back to work and not have to worry about AI? Should businesses not try to deploy AI Agents?

Not quite. The cost per unit of labor has changed for a certain set of tasks in every job description: coding, web research, and the like. These tasks should now be channeled through AI. Everything else is where you need to fill in the gaps. If you believe the cost of inference is going to keep falling, it will not be viable for humans to compete in these areas.

"My co-founder Simon was what we call a 10× programmer, but he rarely writes code these days. Walk by his desk and you'll see him orchestrating three or four AI coding agents at once, and they don't just type faster, they think, which together makes him a 30-40× engineer. He queues tasks before lunch or bed, letting them work while he's away. He's become a manager of infinite minds." - Ivan, Notion co-founder

Key Numbers

  • 2 years ago: AI could do tasks taking humans ~9 minutes
  • Now: AI can do tasks taking humans ~4+ hours (at 50% reliability)
  • Extrapolated: Soon AI will handle tasks taking humans weeks

Principle #1 - Use AI Along the Jagged Edges

Coding, research, web search, image generation: these all work, and your primary flow should incorporate them to bring value to the table that exceeds human ability.

Principle #2 - Reduce AI Around the Jagged Edge

To fill in for the average intelligence of these AI models, DO NOT RELY on them to complete average-IQ tasks; reduce the AI in all of this. Use reliable software engineering. Set guardrails that stop the AI from doing things it is not supposed to do, e.g., asking the user before sending an email.

AI SDRs that auto-send emails without approval have burned countless companies. The ones that work (like Instantly or Smartlead) use AI for copywriting but keep sends, sequences, and timing in deterministic logic.

AI drafts, humans approve, and software executes.
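
A minimal sketch of that split, with hypothetical `draft_email_with_llm` and `send_email` stand-ins. The only thing that matters here is that the send path is deterministic and gated on explicit approval:

```python
def draft_email_with_llm(prospect: dict) -> str:
    """Hypothetical LLM call that writes the copy (stubbed here)."""
    return f"Hi {prospect['name']}, saw your team is scaling outbound..."

def send_email(to: str, body: str) -> None:
    """Deterministic send path: ESP API calls, rate limits, and sequencing live here."""
    print(f"[sent to {to}]")

def run_outreach(prospect: dict) -> None:
    draft = draft_email_with_llm(prospect)                       # AI drafts
    print(draft)
    if input("Send this email? [y/N] ").strip().lower() == "y":  # human approves
        send_email(prospect["email"], draft)                     # software executes
    else:
        print("Draft discarded; nothing was sent.")

run_outreach({"name": "Ada", "email": "ada@example.com"})
```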

Principle #3 - Extract Context Progressively

Since you are using AI that is superhuman in certain ways, you need to give it all the context it needs to deliver the best possible results. Most of the time users are lazy, so let them start with a simple prompt, but use questions and clarifications to eke out the relevant information that affects the quality of the result.

Examples: Plan mode in Claude Code, Clarification widget in OpenAI Shopping Research
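
As a sketch of what this looks like in product code: the clarifying questions below are hardcoded for illustration, whereas a real product would have the model generate them from the user's initial prompt.

```python
# Hardcoded clarifiers for illustration; a real product would have the
# model generate these from the user's initial prompt.
CLARIFIERS = [
    "Who is the audience?",
    "What tone do you want (formal, casual)?",
    "Any constraints (length, format, deadline)?",
]

def gather_context(initial_prompt: str) -> dict:
    """Start from a lazy one-liner, then eke out the details that matter."""
    context = {"prompt": initial_prompt}
    for question in CLARIFIERS:
        answer = input(f"{question} (press enter to skip) ").strip()
        if answer:  # only keep signal the user actually gave
            context[question] = answer
    return context

context = gather_context("Write a launch announcement")
print(context)  # the enriched context feeds the model before generation starts
```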

Principle #4 - 100% Workflow Completion

"AI doesn't do it end-to-end. It does it middle-to-middle." - Balaji

AI lands in the realm of "gets you 60% of the way there," which is generally a failure. It ends up in this territory precisely because of its jagged nature: it excels at the meaty middle and fumbles the first mile and the last mile.

This is why you cannot treat AI as a reliable employee that just gets the job done. Products need to be designed to exploit the peaks and compensate for the valleys.

This is why copying code from ChatGPT created the opportunity for AI IDEs - they closed the loop. If your product is not aiming to get 100% of a workflow done, it is not going to cut it, and making the right tradeoffs to get to 100% is what builders should focus on.
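
Here is a minimal sketch of closing the loop for code generation, assuming a hypothetical `generate_patch` model call and pytest as the verifier. The point is that a deterministic test suite, not the model, decides when the workflow is done:

```python
import subprocess

def generate_patch(task: str, feedback: str | None) -> str:
    """Hypothetical model call: returns code for the task, using test output as feedback."""
    return "def hello():\n    return 'hello'\n"

def run_tests() -> tuple[bool, str]:
    """Deterministic last mile: the test suite, not the model, decides success."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout

def close_the_loop(task: str, max_attempts: int = 3) -> bool:
    feedback = None
    for _ in range(max_attempts):
        patch = generate_patch(task, feedback)  # AI handles the meaty middle
        with open("solution.py", "w") as f:     # software handles the plumbing
            f.write(patch)
        ok, feedback = run_tests()              # verify instead of trusting
        if ok:
            return True                         # 100% of the workflow done
    return False                                # out of attempts: escalate to a human
```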

Principle #5 - Reduce the Burden of Review

Users don't trust AI to get everything right, and they shouldn't. But forcing them to review everything forfeits all the gains in productivity. As model capability improves, the burden of review should shrink, so it is up to the product to define the layers of review that users can drill into. (Cursor has moved from code editing to full agents that write, execute, and test code.) The right review artifact depends on the domain:

  • Code: compilation and tests.
  • Research work: reading the entire body of work (more burdensome, hence a less popular use case).
  • Image generation: looking at the asset (a quick review is why products like Midjourney do well here).

Example: vibe-coding platforms (Replit, Lovable, etc.) make the code viewable only by clicking into a separate tab. They first let you review what the app looks like by rendering it on the front end; the code and database stay hidden in the back end.
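
A minimal sketch of that layering, with illustrative layer names and contents, ordered cheapest-to-review first:

```python
# Illustrative layer names and contents, ordered cheapest-to-review first.
REVIEW_LAYERS = [
    ("preview", "rendered app / live site"),        # seconds to review
    ("tests",   "42 passed, 0 failed"),             # trust the verifier
    ("code",    "diff across 12 files"),            # minutes to review
    ("infra",   "database schema, deploy config"),  # rarely inspected
]

def review(drill_to: str = "preview") -> None:
    # Stop at the shallowest layer the user asked for; the default review
    # is the minimum viable artifact.
    for name, artifact in REVIEW_LAYERS:
        print(f"[{name}] {artifact}")
        if name == drill_to:
            break

review()                 # default: glance at the preview only
review(drill_to="code")  # a skeptical user drills down to the diff
```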

Design heuristic: You could have predicted vibe coding in the GPT-3 era by asking one question: "What if users only reviewed the final website, not the frontend code, backend code, and infra underneath?"

Conclusion

The foundation model labs will keep pushing the jagged edges outward; that's their job. Your job is to build products that ride those edges without pretending the gaps don't exist. Draft with AI, approve with humans, execute with software. Extract context progressively, close the loop on workflows, and collapse the review burden to the minimum viable artifact.

Apply that question to your domain!

Thanks to Mayank Juneja & Adit Sanghvi for their inputs!
