Thinking Tokens for Language Modeling
How much is 56 times 37? Language models often make mistakes in these types of difficult calculations. This is usually explained by their inability to perform complex reasoning. Since language models rely on large training sets and great memorization capability, naturally they are not equipped to run complex calculations. However, one can argue that humans also cannot perform this calculation immediately and require a considerable amount of time to construct the solution. In order to enhance the generalization capability of language models, and as a parallel to human behavior, we propose to use special 'thinking tokens' which allow the model to perform much more calculations whenever a complex problem is encountered.
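The abstract does not spell out how the thinking tokens are injected, but one plausible realization is to interleave a fixed number of special tokens after every input token and exclude those positions from the training loss, so the model gets extra forward passes of computation before it must commit to the next real token. The sketch below illustrates this idea; the `<T>` token string, the two-tokens-per-step budget, and the insert-after-every-token policy are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of one possible 'thinking tokens' scheme (not the
# authors' code): interleave special tokens into the sequence and
# mask them out of the loss so they serve as scratch computation.

THINK = "<T>"      # hypothetical special-token string
NUM_THINKING = 2   # hypothetical number of thinking tokens per step


def insert_thinking_tokens(tokens, n=NUM_THINKING, think=THINK):
    """Return a new sequence with n thinking tokens after each token."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend([think] * n)
    return out


def loss_mask(tokens, think=THINK):
    """1 for positions that count toward the training loss,
    0 for thinking-token positions (extra compute, no prediction)."""
    return [0 if tok == think else 1 for tok in tokens]


if __name__ == "__main__":
    seq = ["56", "*", "37", "="]
    augmented = insert_thinking_tokens(seq)
    print(augmented)            # ['56', '<T>', '<T>', '*', '<T>', ...]
    print(loss_mask(augmented))  # [1, 0, 0, 1, 0, ...]
```

Masking the loss at thinking-token positions is the key design choice in this sketch: the model is free to use those steps as internal scratch space rather than being forced to predict them as output.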
Further reading
- Access the paper on arXiv.org