How Traversal Prevents Million-Dollar Outages

Christine Hall · Apr. 30, 2026 · 7 min read

“It’s like finding a needle in a haystack with fake needles everywhere.” – Anish Agarwal, co-founder and CEO of Traversal

Website outages are painful, but in the age of AI-generated code they’re turning existential. Last year, companies including Amazon Web Services, Azure, Cloudflare and Google Cloud all announced major outages, some lasting over 15 hours.

As Traversal co-founder and CEO Anish Agarwal puts it, the oft-quoted “$2 million an hour” cost of downtime is now, unfortunately, just a starting point for large enterprises.

“The problem only gets bigger, the larger the company gets,” he said. “The $2 million might even be small if we’re talking about some of the largest. I’m certain AWS’s recent outage was an order of orders of magnitude bigger than $2 million an hour.”

The stakes aren’t just abstract numbers on a slide. Agarwal has watched outages end careers. For example, Optus CEO Kelly Bayer Rosmarin resigned in 2023 following a 14-hour network outage, and more recently, IndiGo airline CEO Pieter Elbers resigned after an outage led to thousands of flight cancellations.

“CEOs are fired when they’re no longer hitting the agreements that they have contractually obligated to hit with customers,” Agarwal said. “Once you don’t do that, it’s a security problem. You’re in breach of your contract, and that leads to massive fines and reputational damage.”

Why the old model broke

This isn’t a new problem, but the increase in AI use is pouring gasoline on an already burning fire, according to Agarwal. 

Even before generative AI, the amount of data produced by software was going up, yet the number of people who can troubleshoot well has stayed flat, Agarwal said. Why? Site reliability engineers (SREs) are scarce and budgets are capped, even as observability has become “the second largest spend, typically, for a company after cloud spend,” he said.

That means that when something breaks in a large system, the status quo can look like a hospital emergency room on a bad night.

“It spreads like an epidemic throughout your entire system,” Agarwal said. This is because each team only understands its own part of the system, so connecting the dots between all these teams with limited context is painful, he said. 

Even in a pre-AI world, a major incident could mean 50 to 60 engineers in a “war room” for hours, troubleshooting while millions of dollars are lost.

Now add AI-generated code. More organizations are under pressure “to apply AI to everything,” with one of the clearest return-on-investment areas being software development via tools like Claude or Cursor, Agarwal said.

That pressure is also causing some CIOs to regret their decisions. AI company Dataiku polled 800 CIOs and found 74% of them were under pressure to “deliver measurable business gains from AI within the next two years” or risk their jobs.

That’s leading to some harried decision-making. The same percentage also “regret at least one major AI vendor or platform decision made in the last 18 months.”

The result of all that pressure is a ton of code being written by AI. Large enterprises are also granting AI systems permissions they typically would not, so that they can see what the AI can do. One example is “dangerously skip permissions,” a mode in Claude Code that bypasses the need for user approval before the AI performs an action.

The combination of more opaque code, more permissions and less human context means things are breaking in ways not seen before.

“No one has context of the code, and the amount of code is blowing up as well,” Agarwal said. “So the outages are getting way, way worse than they used to be, which was already really bad.”

From causal ML research to AI SRE

All of this became the thesis for Agarwal’s company, Traversal, which launches AI SREs to find the root cause of a network outage before engineers need the war room. 

Agarwal didn’t arrive at this problem as a traditional SaaS founder. His research while getting a Ph.D. at MIT and as a current professor at Columbia centered on a niche but powerful area: causal machine learning. 

“These AI systems are very good at picking up minute correlations in data and not very good at picking up cause-and-effect relationships,” he said. “My research was how do you get these AI systems to learn cause-and-effect relationships from data automatically?”

That turns out to be exactly what’s missing in today’s incident responses, and what Traversal is solving. In a complex distributed system, an outage looks like “finding a needle in a haystack with fake needles everywhere,” Agarwal said. 

The hard question, according to him, is: “When you see an issue, is it a symptom of the problem? Is it just a spurious correlation because something else is wrong in the system, or is it the root cause?”
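To make the symptom-versus-root-cause distinction concrete, here is a minimal sketch in Python. It is not Traversal’s method and it is not causal machine learning. It assumes a known, acyclic service dependency map and a set of currently alerting services, both hypothetical, and simply flags any alerting service whose own dependencies are all healthy.

```python
from typing import Dict, List, Set


def root_cause_candidates(
    depends_on: Dict[str, List[str]], alerting: Set[str]
) -> List[str]:
    """Return alerting services none of whose direct dependencies are alerting.

    Naive heuristic: an alerting service with an alerting dependency is
    treated as a likely symptom, not a root cause.
    """
    candidates = []
    for service in sorted(alerting):
        upstream = depends_on.get(service, [])
        if not any(dep in alerting for dep in upstream):
            candidates.append(service)
    return candidates


# Hypothetical topology: each service maps to the services it depends on.
depends_on = {
    "checkout": ["payments", "cart"],
    "payments": ["database"],
    "cart": ["database"],
    "database": [],
    "search": [],
}

# Four services are alerting, but three of them sit downstream of the fourth.
alerting = {"checkout", "payments", "cart", "database"}

print(root_cause_candidates(depends_on, alerting))  # ['database']
```

A heuristic like this breaks down quickly in practice: dependency maps are incomplete, failures can be circular, and an unrelated alert can fire at the same moment. Those are the “fake needles” Agarwal describes.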

Agarwal teamed up with Ahmed Lone, Raaz Dwivedi and Raj Agrawal to research this, and says the light-bulb moment came when he and his co-founders connected that research to the reality of operations. They also played with early AI coding tools and saw the trajectory clearly.

“If AI is going to write all of your code, and no one’s going to understand it, we need AI to fix your code as well,” Anish Agarwal said. “That was really the key moment for us.”

He also felt that some of the most interesting work in AI was happening in companies now, and that a company “with research in its DNA,” tackling a deeply technical problem, was the right expression of that.

Ending the 2 a.m. emergency calls

Traversal describes itself as an AI SRE agent that “autonomously troubleshoots, remediates and even prevents production incidents.” To understand what that means, Agarwal paints a before-and-after picture.

Before Traversal, Agarwal saw a lot of those “war room” scenarios play out where an engineer gets paged at “ungodly times of the day,” and joins an incident war room in Slack or Zoom to figure out what went wrong. Hours go by until there’s an “aha moment” and the team finally converges on a fix. 

“It’s like this heart attack that an organization goes through every time a [critical] incident happens,” Agarwal said.

With Traversal, the workflow looks very different. For example, when there’s an incident, a ticket gets created, and Traversal automatically kicks off. By the time an engineer shows up, Traversal has come back with an answer, Agarwal said. 

Traversal returns not only an answer but also tells the engineer who is needed to verify it. So instead of 50 people, five or six are needed to verify the answer and then execute the mitigating steps Traversal proposes, Agarwal said.

“Rather than an average three hours, it becomes something like 15 minutes to get to the root cause of an incident and mitigate it,” he said.
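As a rough illustration of what that difference could mean in dollars, the sketch below applies the “$2 million an hour” figure quoted earlier to both durations. It assumes cost scales linearly with time and that the time to find and mitigate the root cause equals the billable downtime. Both are simplifications, and Agarwal himself calls the hourly figure only a starting point.

```python
def downtime_cost(hours: float, cost_per_hour: float = 2_000_000) -> float:
    """Estimate outage cost, assuming it grows linearly with duration."""
    return hours * cost_per_hour


before = downtime_cost(3.0)   # roughly three hours to root cause and mitigation
after = downtime_cost(0.25)   # roughly 15 minutes

print(f"Before: ${before:,.0f}")          # Before: $6,000,000
print(f"After:  ${after:,.0f}")           # After:  $500,000
print(f"Saved:  ${before - after:,.0f}")  # Saved:  $5,500,000
```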

For some customers, Traversal has moved beyond recommendation into action. They have trusted the company to heal their systems autonomously, without a human in the loop. Agarwal called this “self driving production,” where “Traversal finds the issue, tells you the mitigating steps, and then heals the system fully autonomously” without needing to get anyone up at 2 a.m.

Tangible ROI from AI

Over the last nine months, Agarwal has seen observability and reliability having a “ChatGPT moment,” with enterprises actively seeking AI SRE solutions to keep increasingly AI-generated code stable in production.

Agarwal emphasizes that the product is now at a point where it can deliver fast, repeatable time-to-value — often within 30 days — by significantly reducing mean time to resolution.

As a result, Traversal is in go-to-market mode, quadrupling its headcount to more than 70 people and turning on the sales engine after gaining clients including American Express and Pepsi.

The company has moved so aggressively and hired so strategically that one of Agarwal’s friends commented that Traversal has created “the Avengers of enterprise sales.”

In just a few months, Traversal has hired, among others, a vice president of worldwide sales, a vice president of field engineering and a vice president of marketing, all from blue-chip infrastructure and observability companies like AppDynamics, Cribl, SignalFx and Splunk, along with more than 10 sales executives and supporting solutions engineers.

In addition to securing more customers, Traversal’s vision extends well beyond incident response. The team is building what Agarwal calls a “production world model,” which is a rich representation of a company’s production environment analogous to the simulators used in self-driving cars. 

This world model doesn’t just power faster root-cause analysis; it can also be surfaced to AI coding tools to help them write more resilient code before it ever reaches production.

“The market for this is massive, and if you start collecting all this data and correlating across all these disparate systems, you can really rethink all of the maintenance of software, and that’s the vision of where we’re going,” Agarwal said.

Christine Hall is a freelance journalist who previously wrote about enterprise/B2B, e-commerce, and foodtech for TechCrunch, and venture capital rounds for Crunchbase News. Based in Houston, Christine previously reported for the Houston Business Journal, the Texas Medical Center’s Pulse magazine, and Community Impact Newspaper. She has an undergraduate journalism degree from Murray State University and a graduate degree from The Ohio State University.