Alan Cowen feigns a dejected expression. "My dog died this morning," he says, speaking to an AI model from startup Hume that claims to detect more than 24 distinct emotional expressions lacing a person's voice, from nostalgia to awkwardness to anxiety, and respond to them accordingly.
"I'm so sorry to hear about your loss. Losing a pet is never easy," the AI responds in the voice of Matt Forte, Hume's creative producer, tinged with sympathy and disappointment.
A former Google researcher, Cowen founded Hume in 2021 to build "emotionally intelligent" conversational AI that can interpret emotions based on how people are speaking and generate an appropriate response. Since then, more than 1,000 developers and 1,000 companies, including SoftBank and Lawyer.com, have used Hume's API to build AI-based applications that can pick up on and measure a vast range of emotional signals in human speech through aspects like the rhythm, tone and timbre of the voice, as well as sighs, "umms" and "ahhs."
"The future of AI interfaces is going to be voice-based because the voice is four times faster than typing and carries twice as much information," Cowen told Forbes. "But in order to take advantage of that you really need a conversational interface that captures more than just language."
The New York-based startup announced Wednesday that it has raised $50 million in a Series B funding round led by Swedish investment firm EQT Ventures, with Union Square Ventures and angel investors Nat Friedman and Daniel Gross participating. The influx of new funding values the startup at $219 million.
The company also announced the launch of "Hume EVI," a conversational voice API that developers can integrate into existing products or build upon to create apps that detect expressive nuances in audio and text and produce "emotionally attuned" outputs by adjusting the AI's words and tone. For instance, if the AI picks up on sadness and anxiety in the user's voice, it replies with hints of sympathy and "empathic pain" in its own verbal response.
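In practice, an integration of this kind would stream the user's audio to the API and read back emotion-annotated replies. The sketch below shows roughly what that loop might look like; the endpoint URL, message types and expression-score fields are illustrative assumptions, not Hume's documented schema.

```python
# Hypothetical sketch of an EVI-style voice loop. The endpoint, auth scheme
# and message shapes below are assumptions for illustration, not Hume's
# documented API.
import asyncio
import base64
import json

import websockets  # third-party: pip install websockets

EVI_URL = "wss://api.example.com/v0/evi/chat"  # placeholder endpoint


async def chat(audio_path: str, api_key: str) -> None:
    # Auth via query parameter to keep the example library-agnostic.
    async with websockets.connect(f"{EVI_URL}?api_key={api_key}") as ws:
        # Send the user's utterance as base64-encoded audio.
        with open(audio_path, "rb") as f:
            await ws.send(json.dumps({
                "type": "audio_input",
                "data": base64.b64encode(f.read()).decode(),
            }))
        # Read events until the assistant finishes its turn.
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "assistant_message":
                # Illustrative payload: a transcript plus per-expression
                # scores, e.g. {"sympathy": 0.71, "empathic_pain": 0.42}.
                print(event["text"], event.get("expressions"))
            elif event.get("type") == "assistant_end":
                break


if __name__ == "__main__":
    asyncio.run(chat("utterance.wav", "YOUR_API_KEY"))
```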
These empathetic responses aren't entirely new. When Forbes tested OpenAI's ChatGPT Plus with the same prompt, "My dog died this morning," it gave a nearly identical verbal answer to Hume's. But the startup aims to distinguish itself through its ability to identify underlying expressions.
To do that, Hume's in-house large language model and text-to-speech model are trained on data collected from more than a million participants across 30 countries, which includes millions of human interactions and self-reported data from participants reacting to videos and interacting with one another, Cowen said. The demographic diversity of the database helps the model learn cultural differences and be "explicitly unbiased," he said. "Our data is less than 30% Caucasian."
Hume uses its in-house model to interpret emotional tone, but for more complex content it relies on external models, including OpenAI's GPT-3.5, Anthropic's Claude 3 Haiku and Microsoft's Bing Web Search API, and generates responses within 700 milliseconds. The 33-year-old CEO said Hume's technology is built to mimic the style and cadence of human conversation: it can detect when a person interrupts it and stop talking, and it knows when it's its turn to speak. It also occasionally pauses when speaking, and will even chuckle, which is slightly disconcerting to hear coming from a computer.
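To make that division of labor concrete, here is a minimal sketch of how a tone-aware pipeline of this general shape could fold locally detected expressions into the prompt sent to an external LLM. The function names, score values and prompt format are hypothetical, not Hume's implementation.

```python
# Hypothetical two-stage pipeline: a local expression model scores the
# user's tone, then an external LLM drafts the reply with that context.
from dataclasses import dataclass


@dataclass
class Turn:
    transcript: str    # ASR output of the user's utterance
    expressions: dict  # e.g. {"sadness": 0.82, "anxiety": 0.31}


def top_expressions(turn: Turn, k: int = 3) -> list[str]:
    """Pick the strongest detected expressions to condition the reply."""
    ranked = sorted(turn.expressions.items(), key=lambda kv: -kv[1])
    return [name for name, _ in ranked[:k]]


def build_llm_prompt(turn: Turn) -> str:
    """Fold tone cues into the text prompt for the external LLM."""
    cues = ", ".join(top_expressions(turn))
    return (
        f"The user sounds: {cues}.\n"
        f"User said: {turn.transcript}\n"
        "Respond empathetically, matching the user's emotional state."
    )


turn = Turn("My dog died this morning.", {"sadness": 0.82, "anxiety": 0.31})
print(build_llm_prompt(turn))
```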
Even though Hume's technology appears more sophisticated than earlier types of emotion-detection AI, which relied more heavily on facial expressions, using any kind of AI to detect complex, multidimensional emotional expressions through voice and text is an imperfect science, and one that Hume's own AI acknowledges as among its biggest challenges. Emotional expressions are highly subjective and are influenced by a range of factors, including gender and social and cultural norms. Even if the AI is trained on diverse data, using it to interpret human expressions could give biased results, studies have shown.
When asked about the obstacles AI has to overcome to have human-like conversations, the AI said it's difficult to respond to "the nuances of emotion and context and language." "It's a complex task to interpret tone, intent and emotional cues accurately in real time."
Hume's AI isn't always accurate, either. When Forbes tested it with questions like "What should I eat for lunch?", the AI detected "boredom" and five other expressions, including "interest" and "determination."
Cowen, who has published more than 30 research papers on AI and emotion science, said he first realized the need for tools that can detect and measure human expressions in 2015, while advising Facebook on how to make changes to its recommendation algorithms that would prioritize people's well-being.
Hume's AI has been integrated into applications in industries like health and wellness, customer service and robotics, Cowen said. For instance, online attorney directory Lawyer.com is using Hume's AI to measure the quality of its customer service calls and train its agents.
In the healthcare and wellness space, the use cases are more nascent. Stephen Heisig, a research scientist at the Icahn School of Medicine, the medical school of New York-based Mount Sinai Health System, said he's using Hume's expression AI models to track mental health conditions like depression and borderline personality disorder in patients in an experimental study of deep brain stimulation, a treatment in which patients have electrodes implanted in their brains. (The study only accepts patients for whom no other treatments or therapies have worked, he said.) Hume's AI models are used to help detect how patients are feeling and whether the treatment is working on a day-to-day basis. Heisig said Hume's AI can give psychiatrists more context on emotions that may not be easy to detect.
"The patients we have in the DBS study, they do two video diaries a day. They have sessions with the psychologist and psychiatrist, and we record those, and we use Hume's models to characterize facial expression and vocal prosody," Heisig told Forbes.
Hume's models have also been integrated into Dot, a productivity chatbot that helps people plan and reflect on their day. Samantha Whitmore, cofounder of New Computer, an OpenAI-backed early-stage startup that's building the chatbot, said that Hume's AI offers "expanded context" on how a person is feeling.
"If it detects levels of stress or frustration, it might say, 'It sounds like there's a lot on your plate. Should we try to figure out how to make this seem more manageable?'" she said. "It helps meet them where they are in their state of mind."