If you run a small business, you might already feel the AI pinch: your customer support runs on ChatGPT, your marketing automation uses Claude, and you’re paying for Grok’s research capabilities and real-time updates. For the average company (or user), those subscriptions can easily hit $300 a month, especially if you’re integrating multiple tools into your workflow. That’s a serious line item for what’s supposed to be affordable technology.
What most don’t realize is that these ballooning costs have more to do with the hardware on the backend than with the software running their workflows. Every time an AI model responds, it triggers a process called inference: the act of generating output from a trained model. Unlike training, which costs a fortune but happens only once per model, inference occurs billions of times each day and scales with usage. It has become one of the largest ongoing expenses in AI, driving massive, sustained energy demand that fuels the industry’s growing power crisis.
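To make that concrete, here is a back-of-the-envelope sketch of how a monthly inference bill scales with usage. Every price and traffic figure in it is an illustrative assumption, not a quote from any provider.

```python
# Back-of-the-envelope: why inference, not training, dominates a small
# business's AI bill. All prices and volumes below are illustrative
# assumptions, not quotes from any provider.

PRICE_PER_1M_TOKENS = 10.00   # assumed blended input+output API price (USD)
TOKENS_PER_REQUEST = 1_500    # assumed prompt + response size
DAYS_PER_MONTH = 30

def monthly_inference_cost(requests_per_day: int) -> float:
    """Inference cost scales linearly with usage, every month, forever."""
    tokens_per_month = requests_per_day * TOKENS_PER_REQUEST * DAYS_PER_MONTH
    return tokens_per_month / 1_000_000 * PRICE_PER_1M_TOKENS

# A support chatbot at three traffic levels:
for volume in (100, 400, 2_000):
    print(f"{volume:>5} requests/day -> ${monthly_inference_cost(volume):,.2f}/month")
```

At 400 requests a day, that hypothetical chatbot alone runs about $180 a month, and the bill only grows with traffic. Training, by contrast, is a one-time capital cost someone else has usually already paid.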
For individuals and small business owners, this hidden cost means AI remains incredibly expensive. But that might be about to change. A new cohort of hardware startups, including Positron AI, Groq, Cerebras Systems, and SambaNova Systems, is racing to make inference radically cheaper. If they succeed, AI tools could drop from $300-a-month luxuries to accessible everyday infrastructure for freelancers, educators, retailers, and entrepreneurs.
Among these, Positron has emerged as a favorite of some of the world’s dominant neocloud providers, gaining investor attention for its unique approach.
“The early benefits of AI are coming at a very high cost – it is expensive and energy-intensive to train AI models and to deliver curated results, or inference, to end users,” said DFJ Growth co-founder Randy Glein. “Improving the cost and energy efficiency of AI inference is where the greatest market opportunity lies, and this is where Positron is focused.”
Inference Is The New Electricity Bill
In the world of AI economics, inference is like your utility bill: it grows as you grow, and it’s never just a one-time fee. Whether you’re sending AI-generated emails or running a support chatbot, inference is what keeps the lights on—and right now, that light is powered by Nvidia’s premium-priced GPUs.
“Nvidia GPUs have become the backbone of AI infrastructure today, powering nearly every major inference workload at every major cloud provider. The downside to this, beyond having one $4 trillion company own the entire inference market, is that they weren’t designed with efficiency in mind. They’re built for flexibility and optimized for training complex models that require general-purpose chips for multifaceted tasks. And yet, the majority of inference today still runs on Nvidia hardware, leaving the industry with high power usage, steep cloud bills, and limited options for smaller players, which is why Positron is building the most energy-efficient inference-first chip,” said Mitesh Agrawal, CEO of Positron.
The Race To Make AI Affordable
Those are exactly the problems Positron, Groq, Cerebras, and SambaNova are solving by building alternatives that sidestep the Nvidia tax. And while they all share a common goal, delivering inference infrastructure that slashes energy consumption, improves performance per dollar, and gives developers more control, Positron is arguably the most technically ambitious and commercially mature contender in this race.
Founded by systems engineer Thomas Sohmers and compiler expert Edward Kmett, Positron has taken a radically different path from its peers. Instead of building application-specific chips or chasing general-purpose GPUs, Positron bet on field-programmable gate arrays (FPGAs), reconfigurable chips optimized for memory efficiency, and used them to build Atlas, an inference-first system designed from the ground up for performance and energy savings.
According to the company, Atlas delivers 93 percent memory bandwidth utilization (versus about 30 percent for GPUs), uses 66 percent less energy, and offers 3.5 times better performance per dollar, all while supporting seamless deployment with no code changes. That kind of out-of-the-box compatibility makes it a practical swap for existing cloud or local systems without forcing teams to rewrite their infrastructure from scratch. These gains have landed it major enterprise deployments with Cloudflare, Crusoe, and Parasail.
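To see what those ratios imply for a serving bill, here is a rough sketch. The baseline GPU throughput and rental price are hypothetical placeholders; only the 66 percent and 3.5x figures come from Positron’s claims above.

```python
# What "3.5x performance per dollar" means in practice. The baseline GPU
# numbers below are hypothetical placeholders; only the ratios (66% less
# energy, 3.5x perf/dollar) come from the claims quoted in the article.

GPU_TOKENS_PER_SEC = 1_000     # assumed baseline throughput
GPU_COST_PER_HOUR = 4.00       # assumed baseline rental price (USD)
ENERGY_SAVINGS = 0.66          # Atlas claim: 66% less energy
PERF_PER_DOLLAR_GAIN = 3.5     # Atlas claim: 3.5x performance per dollar

# Cost to serve one million tokens on the assumed baseline GPU:
gpu_cost_per_m_tokens = GPU_COST_PER_HOUR / (GPU_TOKENS_PER_SEC * 3600 / 1e6)
atlas_cost_per_m_tokens = gpu_cost_per_m_tokens / PERF_PER_DOLLAR_GAIN

print(f"Baseline GPU: ${gpu_cost_per_m_tokens:.2f} per million tokens")
print(f"Atlas (claimed): ${atlas_cost_per_m_tokens:.2f} per million tokens")
print(f"Energy for the same workload: {(1 - ENERGY_SAVINGS) * 100:.0f}% of baseline")
```

Under those assumptions, a workload that costs about $1.11 per million tokens on the baseline GPU would cost roughly $0.32 on Atlas, while drawing about a third of the power.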
The company recently raised a $51.6 million Series A led by Valor Equity Partners, Atreides, and DFJ Growth—the very firms that bankrolled SpaceX, Tesla, X, and xAI, some of the world’s largest buyers of AI hardware.
Positron is already working on its next-generation system, Titan, built on custom “Asimov” silicon, which is expected to support models up to 16 trillion parameters with two terabytes of memory per chip—all while running on standard air-cooled racks. That could make high-throughput inference viable in a wider range of environments, from enterprise data centers to sovereign cloud infrastructure.
While others in the field are exploring niche optimizations, Positron is staking a claim to general-purpose inference acceleration—solving for cost and compatibility at scale. But it’s not alone.
Other Challengers Redefining The Stack
While Positron is focused on general-purpose inference acceleration, other challengers are tackling different bottlenecks. Groq is optimizing ultra-low-latency inference for large language models (LLMs). Its Tensor Streaming Processor (TSP) delivers consistent, repeatable latency with sub-millisecond response times, enabling a new class of AI tools that respond instantly without incurring massive cloud costs and laying the groundwork for local, responsive AI that could eventually be accessible to small businesses.
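For developers, that speed is already a few lines of code away. Below is a minimal sketch that times a single request through Groq’s hosted API, assuming the groq Python SDK and an API key in the environment; the model name is illustrative.

```python
# Times one round-trip through Groq's hosted inference API.
# Assumes: `pip install groq` and GROQ_API_KEY set in the environment.
import time
from groq import Groq

client = Groq()  # picks up GROQ_API_KEY automatically

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # illustrative; substitute any hosted model
    messages=[{"role": "user", "content": "Summarize: order shipped, arriving Friday."}],
)
elapsed = time.perf_counter() - start

# Note: wall-clock time here is dominated by network overhead; the
# sub-millisecond figure refers to on-chip processing, not the round trip.
print(response.choices[0].message.content)
print(f"Round-trip latency: {elapsed * 1000:.0f} ms")
```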
Cerebras brings an edge-native, security-first perspective. Its modular AI appliances can run powerful models entirely on-site, ideal for defense, critical infrastructure, or industries where cloud deployment isn’t an option. Cerebras makes it possible for organizations to deploy advanced AI with a small footprint, a capability previously reserved for hyperscalers.
SambaNova is taking a full-stack approach, combining hardware and software to deliver vertically optimized AI systems. Rather than asking businesses to build training pipelines and inference clusters from scratch, it offers a turnkey platform with pre-trained models, essentially packaging AI as an appliance for organizations without a dedicated machine learning (ML) team.
All of these players are on a mission to unlock high-performance inference that doesn’t require hyperscaler infrastructure or ballooning cloud costs—opening the door to entirely new economic possibilities.
Why This Matters For Your Bottom Line
If inference becomes cheaper, everything changes. A Shopify seller could train and run a private AI model locally, without relying on costly cloud infrastructure. A solopreneur could fine-tune a sales assistant on years of customer emails, then run it on a $10 chip instead of a $30,000 graphics processing unit (GPU). A tutoring platform could deploy personalized lesson-plan generators without needing a full-time infrastructure team.
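That “run it yourself” scenario is less exotic than it sounds. Here is a minimal sketch using the open-source llama-cpp-python bindings; the model file name is a placeholder for any quantized GGUF model, including one fine-tuned on your own data.

```python
# A minimal sketch of local inference with llama-cpp-python
# (`pip install llama-cpp-python`). The model path is a placeholder.
from llama_cpp import Llama

# Loads a quantized model entirely on local hardware: no cloud calls,
# no per-token bill, and customer data never leaves the machine.
llm = Llama(model_path="./sales-assistant-q4.gguf", n_ctx=2048)

reply = llm(
    "Draft a follow-up email to a customer who asked about bulk pricing:",
    max_tokens=200,
    temperature=0.7,
)
print(reply["choices"][0]["text"])
```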
This is already happening. Smaller teams are building domain-specific copilots that live inside their own companies’ firewalls. Independent consultants are running multi-agent AI workflows from their laptops. Inference costs, in other words, aren’t just a technical problem; they’re the gatekeeper that decides who gets to build with AI.
If Positron and its peers succeed, the $300-a-month AI stack could shrink to $30. It could also be replaced entirely by tools you run yourself, privately and affordably. And that changes who gets to participate in the future of AI.
Nvidia’s Grip Might Finally Be Loosening
Today, Nvidia holds a near-monopoly over AI infrastructure—and by extension, who gets to play. Its chips power the vast majority of generative AI systems worldwide, and its ecosystem (CUDA, TensorRT, etc.) makes switching difficult. The result is a pay-to-play system where cost determines access.
But that grip may not hold if companies like Positron, Groq, Cerebras, and SambaNova continue to gain traction and change the economics of AI. By lowering the cost of inference, they’re making it possible for smaller teams and individual users to run powerful models without relying on expensive cloud infrastructure.
This shift could have broad implications. Instead of paying hundreds of dollars a month for AI-powered tools, users may soon be able to run custom assistants, automations, and workflows locally—on hardware they control. For small businesses, freelancers, educators, and startups, that means more control, more customization, and a lower barrier to entry—for a fraction of today’s cost.
If inference becomes affordable, innovation stops being a privilege and starts becoming infrastructure. Because when you democratize cost, you decentralize control. The next chapter of AI won’t be written by whoever builds the biggest model, but by whoever makes it cheap enough to run—and that’s how you break a $4 trillion monopoly.
