Paper link:
How do Large Language Models (LLMs) think?
Chain-of-thought (CoT) reasoning [1, 2, 3] shows that prompting LLMs to think out loud before answering considerably improves their performance compared to direct answering without CoT.
This provides some intuition as to how LLMs reason through their tasks.
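To make the contrast concrete, here is a minimal sketch of the two prompting styles. This is not code from the paper; `query_llm` is a hypothetical stand-in for whatever model API is actually used.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call; returns a canned reply here."""
    return "(model response would appear here)"

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
)

# Direct answering: ask only for the final answer, no intermediate reasoning.
direct_prompt = f"{question}\nAnswer with just the final number."

# Chain-of-thought: ask the model to reason step by step before answering.
cot_prompt = f"{question}\nLet's think step by step, then state the final answer."

direct_answer = query_llm(direct_prompt)
cot_answer = query_llm(cot_prompt)
```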
Recent work [3, 4] suggests that in CoT reasoning, an LLM's answers can be unfaithful to its intermediate reasoning steps (simply put, the answers do not tally with the model's "workings").
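As a rough illustration (not the paper's method), one crude way to spot this kind of mismatch is to extract the conclusion stated at the end of the reasoning chain and compare it with the answer the model actually reports. The sketch below assumes, hypothetically, that the CoT transcript ends with a line like "Final answer: X".

```python
import re

def extract_final_answer(cot_text: str):
    """Pull the stated conclusion from a CoT transcript.

    Assumes (hypothetically) the reasoning ends with a line such as
    'Final answer: 5 cents'; real transcripts may need more robust parsing.
    """
    match = re.search(r"final answer:\s*(.+)", cot_text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None

def is_faithful(cot_text: str, reported_answer: str) -> bool:
    """Crude faithfulness check: does the reported answer match the CoT's own conclusion?"""
    conclusion = extract_final_answer(cot_text)
    return conclusion is not None and conclusion.lower() == reported_answer.strip().lower()

# Example of an unfaithful pair: the workings conclude one thing, the reported answer differs.
cot = "The ball costs x, the bat costs x + 1.00, so 2x + 1.00 = 1.10.\nFinal answer: 5 cents"
print(is_faithful(cot, "10 cents"))  # False: the answer does not tally with the workings
```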
Some guiding questions: