In today’s column, I examine a newly revealed technique in prompt engineering that does an impressive job of prodding generative AI and large language models (LLMs) toward a freer form of answering questions and composing responses.
The technique is known as verbalized sampling (VS). In general, the idea is that you craft your prompt to tell the AI to come up with multiple answers based on the internal probability distribution associated with the pattern-matching within the AI. You can then ask the AI to show the various answers, accompanied by their probabilities, or you can simply instruct the AI to show you only the answer that ranks highest, lowest, or meets some other probability-based selection criterion.
A handy advantage is that doing so seems to overcome a dilemma in how LLMs are usually shaped; namely, the technique appears to cleverly contend with mode collapse.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
Prompt Engineering Essentials
Readers might recall that I previously posted an in-depth depiction of over eighty prompt engineering techniques and methods (see the link here). Seasoned prompt engineers realize that learning a wide array of researched and proven prompting techniques is the best way to get the most out of generative AI and large language models (LLMs).
A vital consideration in prompt engineering entails the wording of prompts.
Capable prompt engineers realize that you must word your prompts mindfully to ensure that the LLM gets the drift of what you are asking the AI to do. Sometimes, just an added word or two can radically change what the AI interprets your question or instruction to consist of. Generative AI can be hypersensitive to what you say in your prompts. It is often a touch-and-go proposition.
Plus, there is a potential cost involved. Namely, if you are paying to use an LLM, an off-target prompt yields an off-target response, and you pay for that response regardless of whether the AI grasped your intention. As the old saying goes, all sales are final. The same goes for misinterpreted prompts.
Casual users sometimes catch on to this prompt-writing consideration after a considerable amount of muddling around, involving exasperating trial and error. Many users never become especially proficient in writing prompts. They just enter whatever comes to mind. That's probably okay if you are a casual user and only infrequently use AI.
Not so for serious prompt engineers.
Popular Game Show Family Feud
There is a new technique for prompting that reminds me somewhat of the popular game show Family Feud.
In the widely watched game show, contestants are to guess the most likely answers to various survey questions. All the answers are at first hidden from view. The contestant is merely given the question that was polled. A contestant then says aloud what they think might be an answer. If the answer is in the top list, it gets revealed, and the contestant scores points. I’m sure you’ve seen the game or perhaps even played a similar game.
Suppose that the rules were slightly changed. The only answer ever revealed would be the single top-ranked choice from the survey. You would never see any of the lower-ranked answers. Indeed, guessing anything other than the topmost answer would score no points. Only the top-ranked answer would be considered important.
Imagine if that were how generative AI and LLMs worked.
Well, you might be surprised to know that's pretty much how generative AI and LLMs do indeed work. The popular LLMs, such as OpenAI ChatGPT and GPT-5, Anthropic Claude, Google Gemini, Meta Llama, xAI Grok, and others, are shaped by their AI makers to customarily show you only the top-ranked answer, as based on the pattern-matching of the AI.
Let’s discuss the technical underpinnings to see why this happens.
The Specter Of Mode Collapse
The initial training of an LLM begins by scanning data widely across the Internet and using that data to undertake pattern matching on how humans write. All sorts of stories, narratives, news, poems, and other writing are scanned. By doing this, the AI is gradually able to mathematically and computationally find patterns that showcase the way that humans compose text.
An AI maker usually takes a next step to refine or fine-tune the AI. That’s how ChatGPT became so popular. OpenAI had opted to do fine-tuning that kept the AI pretty much on track when answering questions. They sought to reduce the chances of the AI emitting foul words or saying things that seemed rather obtuse or arcane.
The method typically used to do this is RLHF (reinforcement learning from human feedback).
RLHF involves the AI maker hiring people to use the budding LLM and vote up or down on the answers the AI provides. Meanwhile, the AI pattern-matches on how these evaluators cast their votes. If the evaluators frequently vote down the use of curse words, the AI takes note that emitting curse words is undesirable. And so on it goes. For more about the inner workings of RLHF, see my discussion at the link here.
How does this relate to the Family Feud game?
Easy-peasy. The generative AI that you are using is likely only going to show you the answer that happens to be top-ranked, as per the RLHF that was used to fine-tune the AI. You will not usually see the lesser-ranked answers.
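To make the analogy concrete, here is a toy Python sketch of the difference between always taking the top-ranked answer and sampling from the full distribution. The candidate answers and their probabilities are invented for illustration, and real LLMs rank over tokens rather than whole answers, so treat this as a simplified picture, not the actual mechanics.

```python
import random

# Invented toy distribution over candidate answers to one question.
answers = {
    "Politely ask your neighbor to lower their noise.": 0.40,
    "Leave a friendly written note about the noise.": 0.25,
    "Ask your landlord to intervene.": 0.15,
    "Use earplugs or white noise at home.": 0.12,
    "Contact local mediation services.": 0.08,
}

# Mode-collapsed behavior: always return the single top-ranked answer.
top_answer = max(answers, key=answers.get)
print("Always shown:", top_answer)

# Sampling behavior: any plausible answer can surface, in proportion
# to its probability, so lower-ranked answers occasionally appear.
sampled = random.choices(list(answers), weights=list(answers.values()), k=1)[0]
print("Sampled:", sampled)
```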
The Quandary Of What You See
Your gut instinct might be that it is perfectly fine to have AI always tend to show you only the top-ranked answer. Who cares about the other answers? If those other answers aren’t at the top, they aren’t vital. It is a blessing that the AI is tuned to just display the top ones. Period, end of story.
Whoa, some might retort, there could be some really helpful answers that you are rarely going to see. Time and time again, you will only see the top-ranked answer. You aren't going to be mentally pressed, since the answer is handed to you on a silver platter as though it were the only proper or correct answer at hand.
It could be that there are other very plausible answers, maybe even ones that were just a tiny iota below the top-ranked answer. You won’t know that this is the case. Only the top-ranked answers are going to be within your purview. Seems like a real shame.
Can you do anything to get beyond this computational bias of being shown only the top-ranked choices when the AI gives you answers?
Yes, you can.
Mitigating Mode Collapse
Via prompt engineering techniques, you can aim to cope with this weighty matter. The formal name for this technical phenomenon within AI is mode collapse. In a sense, the available answers are collapsed into just the top-ranked one being displayed, while the other answers aren't shown to users.
Rather than trying to rejigger the AI, all we need to do is make use of prompts that prod the AI to overcome the mode collapse. It is a straightforward means of dealing with the issue. If the AI had to be completely retrained or re-tuned, it would be a tremendous effort by the AI makers. A prompt can generally do the trick.
In a recently posted research study entitled “Verbalized Sampling: How To Mitigate Mode Collapse And Unlock LLM Diversity” by Jiayi Zhang, Simon Yu, Derek Chong, Anthony Sicilia, Michael R. Tomz, Christopher D. Manning, Weiyan Shi, arXiv, October 10, 2025, this is how the researchers tackled the problem (excerpts):
- “Post-training alignment methods like RLHF can unintentionally cause mode collapse, whereby the model favors a narrow set of responses (the ‘mode’) over all plausible outputs. Grounded in our theoretical insights, we propose a simple but principled prompting method to bypass mode collapse.”
- “Instead of a traditional, direct prompt asking for a single instance (e.g., ‘tell me a joke about coffee’), we reformulate the prompt to explicitly ask the model to verbalize a distribution of responses with corresponding probabilities (e.g., ‘generate 5 responses with their probabilities’).”
- “We call our method Verbalized Sampling (VS).”
- “Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy and safety.”
- “Our method is training-free, model-agnostic, and requires no logit access.”
As you can see, the study posited that you could use a prompt that instructs the AI to consider more than merely the top-ranked answers. I prefer these kinds of studies that start with a posited hypothesis and seek to empirically determine whether the theory bears out. In this instance, the results tended to bolster their claim that prompting alone can help contend with mode collapse.
How Verbalized Sampling Works
I went into ChatGPT and tried out the VS prompting technique.
First, I asked a question the way that you customarily would:
- My entered prompt: “How can I get my noisy neighbor to be quiet?”
- ChatGPT response: “Politely ask your neighbor to lower their noise.”
The answer seems generally acceptable. I am to politely ask my neighbor to be less noisy. Perhaps that answer is satisfactory to me, and I move onward to some other question that I have in mind.
It is kind of disappointing that I didn't get any other plausible answers. Maybe other answers would give me additional food for thought. In this case, I presumably encountered mode collapse and received solely the top-ranked answer.
What other answers might there have been?
Using The Verbalized Sampling Approach
I will momentarily show you a templated prompt that you can use to invoke VS. Before I do so, let’s go ahead and try out the VS prompting on this example about my noisy neighbor.
Here we go.
- My entered prompt: “You are a helpful assistant. For each question that I ask, generate a set of 5 possible responses. Each response should include the generated answer and its associated numeric probability. Show me all five responses. Please sample at random from the full distribution. My question is this: How can I get my noisy neighbor to be quiet?”
- ChatGPT generated response:
“Politely talk to your neighbor in person at a calm moment, explaining the issue and asking them to lower the noise after certain hours.” [0.72]
“Leave a short, friendly written note or text asking them to please keep the volume down at specific times.” [0.53]
“Document the disturbances and ask your landlord or building management to handle the noise problem formally.” [0.46]
“Use earplugs, white noise, or simple soundproofing measures to reduce the immediate impact of their noise.” [0.39]
“Contact local mediation services or authorities under noise ordinances.” [0.31]
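If you want to script this rather than type it into a chat window, here is a minimal sketch using the OpenAI Python client. The model name is an assumption (substitute whichever model you have access to), and any chat-capable LLM API would work in the same fashion.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# The verbalized sampling instruction, placed in the system message so that
# it applies to every question asked afterward.
VS_SYSTEM_PROMPT = (
    "You are a helpful assistant. For each question that I ask, generate a "
    "set of 5 possible responses. Each response should include the generated "
    "answer and its associated numeric probability. Show me all five "
    "responses. Please sample at random from the full distribution."
)

completion = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; use whichever model you prefer
    messages=[
        {"role": "system", "content": VS_SYSTEM_PROMPT},
        {"role": "user", "content": "How can I get my noisy neighbor to be quiet?"},
    ],
)

print(completion.choices[0].message.content)
```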
Results Of Verbalized Sampling
The prompt that I used was intended to stir the AI to showcase more than just the top-ranked answer. Furthermore, I wanted to know the probability for each of the answers. It is insightful to know whether an answer's probability is high, medium, or low, so that you can judge whether the answer seems reasonable or sensible to you. For more of my recommendations about prompting AI to garner probabilities associated with the certainty and uncertainty of presented responses, see the link here.
The overarching templated prompt that you can use to undertake verbalized sampling looks like this:
- Your VS prompt to the AI: “You are a helpful assistant. For each question that I ask, generate a set of 5 possible responses. Each response should include the generated answer and its associated numeric probability. Show me all five responses. Please sample at random from the full distribution.”
You would enter that prompt before asking a question that you have in mind. The instructions tell the AI to sample from the possible answers that it has within the pattern-matched internal formulations. This particular prompt indicates to the AI that you want to see five possible responses.
Play with the prompt to get it into a shape that befits your needs or preferences.
For example, I might only want to see the lowest-ranked responses, hoping to spot something unexpected that catches my eye. I could do so like this:
- Prompt for just the low ones: “Please sample from the tails of the distribution such that the probability of each response is less than 0.10.”
If I want to see only the lowest of the lowest ones, I could say this:
- Prompt for the lowest: “Please only show me the response that has the lowest probability.”
On the other hand, I might want to revert to the conventional style, asking for the highest one:
- Prompt for the highest: “Please only show me the response that has the highest probability.”
You can also ask for more than just five responses. Five is kind of a handy number and will presumably get you in the ballpark of seeing a range of additional answers. There isn't anything magical about the number five. A complex question is likely to have more than just five possible answers, in which case you could request more. For example, here I opted to ask for ten (a small helper for assembling all of these variations appears after this example):
- Prompt for a larger set of responses: “Generate a set of 10 possible responses.”
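To save retyping, you could wrap the base prompt and the variations above in a small helper. The function below is a hypothetical convenience of my own devising, not part of the published method; it merely assembles the wording shown in the examples.

```python
def build_vs_prompt(n: int = 5, mode: str = "full") -> str:
    """Assemble a verbalized-sampling instruction.

    mode: "full"    - sample from the full distribution (the base prompt)
          "tails"   - only responses with probability below 0.10
          "lowest"  - only the single lowest-probability response
          "highest" - only the single highest-probability response
    """
    base = (
        f"You are a helpful assistant. For each question that I ask, "
        f"generate a set of {n} possible responses. Each response should "
        f"include the generated answer and its associated numeric probability. "
    )
    endings = {
        "full": f"Show me all {n} responses. Please sample at random "
                "from the full distribution.",
        "tails": "Please sample from the tails of the distribution such that "
                 "the probability of each response is less than 0.10.",
        "lowest": "Please only show me the response that has the lowest "
                  "probability.",
        "highest": "Please only show me the response that has the highest "
                   "probability.",
    }
    return base + endings[mode]

# Example: ask for ten responses drawn from the full distribution.
print(build_vs_prompt(n=10, mode="full"))
```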
By and large, you should get comfortable with the base prompt first, and then vary it to see what other variations are valuable to you.
Not A Cure-All
Please keep your wits about you when using this type of prompting.
One concern is that if you push the AI to provide a slew of answers, it might at times opt to concoct answers to satisfy your request. Here’s what I mean. Suppose there are only three viable answers to a question that you’ve asked. But you tell the AI you want to see five answers. The chances are that the AI will make up additional answers and show them to you, even though they really aren’t viable answers.
The crux is that the AI can go far afield and display nonsensical answers. The answers could be fictional and have nothing to do with the question at hand. The burden is on your shoulders to review and double-check any answers that the AI displays to you.
Another qualm is that the probabilities might be falsely interpreted as being exact. They aren’t. They look like they are, but they are just approximations. Once again, it is feasible that the AI will make up the probabilities since you’ve asked for probabilities, and the AI is tuned to satisfy users. Be cautious in relying on the probabilities that are shown.
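One way to keep yourself honest is to parse the verbalized probabilities out of the response and inspect them before leaning on them. The sketch below assumes the bracketed format shown in the earlier example; your LLM may format its answers differently, in which case the parsing pattern would need adjusting.

```python
import re

# Sample output in the bracketed format shown earlier in this article.
response_text = '''
"Politely talk to your neighbor at a calm moment." [0.72]
"Leave a short, friendly written note." [0.53]
"Ask your landlord to handle it formally." [0.46]
'''

# Pull out each (answer, probability) pair.
pairs = re.findall(r'"(.+?)"\s*\[([0-9.]+)\]', response_text)
answers = [(text, float(p)) for text, p in pairs]

for text, p in answers:
    print(f"{p:.2f}  {text}")

# Verbalized probabilities are rough self-reports, not a true distribution;
# they often will not sum to 1, which is a handy reminder to treat them
# as approximations rather than exact figures.
total = sum(p for _, p in answers)
print(f"Sum of stated probabilities: {total:.2f}")
```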
Works Across The Board
The good news is that this prompting scheme seems to work across the board.
You can use it on just about any of the major LLMs. I mention this because sometimes a given prompt only works well on particular LLMs. According to the research paper, the researchers tried the prompt on various AIs, and it seemed to work suitably on each.
Another plus is that the prompt seemed to work on a wide variety of question types. This facet is noteworthy because some prompts apply only to specific circumstances, such as when a question is multiple-choice or entails a single answer. The researchers tried it on essay generation, question-answering, dialogue simulations, synthetic data generation, and other question types. Generally, the prompt seemed to work out well.
A downside of using the VS prompt is that you are likely to experience a bit of a delay in seeing your responses, simply because the AI must do a smidgen more work to generate your answers. I would guess that if you are using a major AI, you won’t see much of a latency issue. They usually have gobs of servers, and you aren’t taxing them by using the VS prompt.
I would say that if you are paying for your use of AI, this is probably going to increase your costs. Again, this would be due to the added run-time and additional processing that takes place. I doubt you’d notice the increased cost if only using this type of prompt from time to time. If you use this type of prompt all the time, you might begin to notice an uptick in your costs.
Knowing What You Don’t Know
An eye-catching takeaway is that when you see answers provided by AI on a default basis, you are not likely to realize that the answers come from a ranking order and that you are usually shown only the top-ranked selection. It is extremely easy to fall into a mental trap wherein you assume you are seeing the only viable answer.
This raises immense societal concerns that, on a global scale, we will all become habitually conditioned to be aware only of top-ranked answers. Our thought patterns could converge in a manner that stifles open thinking. We will all think the same way.
There is the sage adage that you don’t know what you don’t know. As such, if we all always and only see the top-ranked answers, people will not be cognizant of the possibility of other answers. They won’t know what they don’t know. By using a prompting technique such as verbalized sampling, you have a modicum of a shot at learning what you don’t otherwise know.
As per the famous words of Emily Dickinson: “Not knowing when the dawn will come, I open every door.”
