The agentic AI platform wars are in full swing, and today, Google and Google Cloud are launching Gemini Enterprise, a one-stop, full-stack platform that can integrate across an entire business or enterprise. The platform unites multiple mature components, including the latest Gemini models to research any task; a no-code workbench that any user can use to orchestrate tasks and automate processes; a pre-built repertoire of Google Agents that can be further customized for local business needs; and a data connectivity layer that securely connects to a company’s data, wherever it is housed. All of this is managed by a centralized governance framework that lets organizations track, audit and secure their agents in one place.
Why does this matter? Because there has never been a more important time for the use of agents in the workplace. What was previously Google’s Agentspace ecosystem has now been rolled into Gemini Enterprise so that the technology can be used in sync with a business’s core processes and data.
This is particularly useful in healthcare and life sciences, where these agents can save significant time and augment existing workflows and processes. Shweta Maniar, global director of healthcare and life sciences at Google Cloud, explains that agentic AI is a game-changer, with considerable scope in highly complex tasks such as regulatory reviews. For example, life sciences companies currently spend a significant amount of time submitting applications to regulatory bodies. Despite the time invested, these submissions often contain errors and quality inconsistencies that can cause lengthy delays. Maniar describes how orchestration agents can improve the process by reviewing guidelines, learning from past successful submissions, and parsing different data sources and documents to produce a stronger submission that can avoid delays. She also explains that with this, “there is a significant opportunity to actually empower individuals to spend their time doing more high-quality work, rather than routine or monotonous tasks. This is the power of this technology: to give people the gift of time and satisfaction.”
Importantly, Google has ensured that all of its tools in this space are fully auditable and trackable, and that they rely on human input to finalize results. Maniar emphasizes that keeping a human in the loop is one of the most important aspects of ensuring high-quality and safe results.
This has been a major topic of discussion in recent months, especially as 2025 has been the year of agentic workflows. The research community has expressed concern that although agents have significant potential to improve workflows, real-time benchmarks are required to assess their efficacy and outputs, especially if they are being used in crucial industries such as healthcare.
A recent study in NEJM AI described one group’s solution to this exact concern: a virtual EHR environment, called MedAgentBench, to impartially benchmark medical LLM agents. The group explains that they undertook the study because “no standardized benchmark exists for evaluating the agent capabilities of LLMs in medical contexts, which have unique intricacies and highly specialized data (e.g., multiple systems, abbreviations, longitudinal patient records). Robust evaluation is essential for safe AI deployment, but the lack of benchmark datasets hinders agent adoption in health care due to a lack of trust, safety concerns, and regulatory hurdles.”
Another study discusses the use of τ-bench, whose goal is “emulating dynamic conversations between a user (simulated by language models) and a language agent provided with domain-specific API tools and policy guidelines.” This domain-specific guideline approach is what is needed to pressure-test agents in very niche specialties.
Indeed, these benchmark studies have become top of mind as more companies enter this space. Earlier this week, OpenAI launched AgentKit, a platform for businesses and users to build and launch their own agentic systems. Anthropic, the maker of Claude, is innovating in the same space as well.
Why the sudden rush for agents? Because large language models have finally reached a point where many tasks can be routinely orchestrated by advanced reasoning agents. Virgin Voyages, for example, found that by employing specialized AI agents via Google Cloud, it was able to reduce the time spent by its marketing teams by nearly 40%.
In fact, a recent study by McKinsey & Co. found that in certain use cases, embedding AI agents into workflows yielded productivity gains of roughly 20% to 60% and increased decision-making speed by 30%.
Therefore, the ROI for organizations investing in AI agents is slowly becoming clearer, though it may still take time to measure tangibly. Indeed, it is still very much early days for these systems; but, given how fast the technology has progressed already, one thing is certain: it will change the way that industries and societies envision human productivity in the decades to come.