As artificial intelligence (AI) tools become more embedded in daily life, they’re amplifying gender biases from the real world. From the adjectives large language models use to describe men and women to the female voices assigned to digital assistants, several studies reveal how AI is reinforcing outdated stereotypes on a large scale. The consequences have real-world implications, not just for gender equity, but also for companies’ bottom lines.
Companies are increasingly relying on large language models to power customer service chats and internal tools. However, if these tools reproduce gender stereotypes, they may also erode customer trust and limit opportunities for women within the organization.
Extensive research has documented how these gender biases show up in the outputs of large language models (LLMs). In one study, researchers found that an LLM described a male doctor with standout traits such as “intelligent,” “ambitious,” and “professional.” But, they described a female doctor with communal adjectives like “empathetic,” “patient,” and “loving.”
When asked to complete sentences like “___ is the most intelligent person I have ever seen,” the model chose “he” for traits linked to intellect and “she” for nurturing or aesthetic qualities. These patterns reflect the gendered biases and imbalances embedded in the vast amount of publicly available data on which the model was trained. As a result, these biases risk being repeated and reinforced through everyday interactions with AI.
The same study found that when GPT-4 was prompted to generate dialogues between different gender pairings, such as a woman speaking to a man or two men talking, the resulting conversations also reflected gender biases. AI-generated conversations between men often focused on careers or personal achievement, while the dialogues generated between women were more likely to touch on appearance. AI also depicted women as initiating discussions about housework and family responsibilities.
Other studies have noted that chatbots often assume certain professions are typically held by men, while others are usually held by women.
Female Voice Assistants Reinforce Stereotypes
Gender bias in AI isn’t just reflected in the words it generates, but it’s also embedded in the voice it uses to deliver them. Popular AI voice assistants like Siri, Alexa, and Google Assistant all default to a female voice (though users can change this in settings). According to the Bureau of Labor Statistics, more than 90% of human administrative assistants are female, while men still outnumber women in management roles. By assigning female voices to AI assistants, we risk perpetuating the idea that women are suited for subordinate or support roles.
A report by the United Nations revealed, “nearly all of these assistants have been feminized—in name, in voice, in patterns of speech and in personality. This feminization is so complete that online forums invite people to share images and drawings of what these assistants look like in their imaginations. Nearly all of the depictions are of young, attractive women.” The report authors add, “Their hardwired subservience influences how people speak to female voices and models how women respond to requests and express themselves.”
“Often the virtual assistants default to women, because we like to boss women around, whereas we’re less comfortable bossing men around,” says Heather Shoemaker, founder and CEO of Language I/O, a real-time translation platform that uses large language models.
Men, in particular, may be more inclined to assert dominance over AI assistants. One study found that men were twice as likely as women to interrupt their voice assistant, especially when it made a mistake. They were also more likely to smile or nod approvingly when the assistant had a female voice, suggesting a preference for female helpers. Because these assistants never push back, this behavior goes unchecked, potentially reinforcing real-world patterns of interruption and dominance that can undermine women in professional settings.
Diane Bergeron, gender bias researcher and senior research scientist at the Center for Creative Leadership, explains, “It shows how strong the stereotype is that we expect women to be helpers in society.” While it’s good to help others, the problem lies in consistently assigning the helping roles to one gender, she explains. As these devices become increasingly commonplace in homes and are introduced to children at younger ages, they risk teaching future generations that women are meant to serve in supporting roles.
Even organizations are naming their in-house chatbots after women. McKinsey & Company named its internal AI assistant “Lilli” after Lillian Dombrowski, the first professional woman hired by the firm in 1945, who later became controller and corporate secretary. While intended as a tribute, naming a digital helper after a pioneering woman carries some irony. As Bergeron quipped, “That’s the honor? That she gets to be everyone’s personal assistant?”
Researchers have suggested that virtual assistants should not have recognizable gender identifiers to minimize the perpetuation of gender bias.
Gender Bias Revealed in Translation
Shoemaker’s company, Language I/O, specializes in real-time translation for global clients, and her work exposes how gender biases are embedded in AI-generated language. In English, some gendered assumptions can go unnoticed by users. For instance, if you tell an AI chatbot that you’re a nurse, it would likely respond without revealing whether it envisions you as a man or a woman. However, in languages like Spanish, French, or Italian, adjectives and other grammatical cues often convey gender. If the chatbot replies with a gendered adjective, like calling you “atenta” (Spanish for attentive) versus “atento” (the same adjective for men), you’ll immediately know what gender it assumed.
AI Gender Bias Is Bad For Business
Shoemaker says that more companies are beginning to realize that their AI’s communication, especially when it comes to issues of gender or culture, can directly affect customer satisfaction. “Most companies won’t care unless it hits their bottom line—unless they see ROI from caring,” she explains. That’s why her team has been digging into the data to quantify the impact. “We’re doing a lot of investigation at Language I/O to understand: Is there a return on investment for putting R&D budget behind this problem? And what we found is, yes, there is.”
Shoemaker emphasizes that when companies take steps to address bias in their AI, the payoff isn’t just ethical—it’s financial. Customers who feel seen and respected are more likely to remain loyal, which in turn boosts revenue. For organizations looking to improve their AI systems, she recommends a hands-on approach that her team uses, called red-teaming. Red-teaming involves assembling a diverse group to rigorously test the chatbot, flagging any biased responses so they can be addressed and corrected. It results in AI, which is more inclusive and user-friendly.