Discovering How Language Models Choose the Next Word

Between 1964 and 1967 at MIT, Joseph Weizenbaum developed ELIZA, an early natural language processing program [1]. He was surprised that people attributed human-like emotions to it. Today, we often make a similar mistake with Large Language Models, which are more advanced but still fundamentally simple machines.

In this example, we visualise how a small pre-trained model [2] selects the next word for a sentence. The chosen word is appended to the sentence, and the extended sentence becomes the starting point for predicting the following word, until a complete statement eventually emerges. A minimal sketch of this loop follows below.
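The sketch below illustrates this word-by-word loop with the pre-trained GPT-2 model from Hugging Face [2], using greedy selection (always taking the highest-scoring token). The prompt text and the ten-token limit are illustrative choices, not taken from the original example.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Illustrative prompt; the workshop example may use a different sentence.
input_ids = tokenizer("The capital of Switzerland is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):  # extend the sentence by up to ten tokens
        logits = model(input_ids).logits           # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()           # highest-scoring next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)
        if next_id == tokenizer.eos_token_id:      # stop at end-of-text
            break

print(tokenizer.decode(input_ids[0]))
```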

To understand the model's word choices, we examine its 13 hidden layers (the embedding output plus the outputs of the twelve transformer blocks), focusing on the highest-rated tokens in each layer. These hidden layers are the building blocks of the model's final word choice in a sentence.
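One way to peek at these layers is the "logit lens" style of inspection sketched below, assuming GPT-2 small: each of the 13 hidden states is projected through the model's final layer norm and unembedding matrix to see which token each layer would favour. The exact visualisation in the original example may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

input_ids = tokenizer("The capital of Switzerland is", return_tensors="pt").input_ids

with torch.no_grad():
    # Tuple of 13 tensors: embeddings plus one per transformer block.
    hidden_states = model(input_ids).hidden_states

for layer, h in enumerate(hidden_states):
    # Apply the final layer norm, then map the last token's state
    # to vocabulary logits via the unembedding matrix.
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    top = logits.topk(3).indices
    print(f"layer {layer:2d}: {[tokenizer.decode(t.item()) for t in top]}")
```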

This analysis is a part of the "Primer in Generative AI for Business" workshop conducted by Philipp Thomann and myself [3].


[1] https://en.wikipedia.org/wiki/ELIZA

[2] https://huggingface.co/gpt2

[3] https://www.academy.d-one.ai/generative-ai
