Ensuring LLM Stability: Automated Detection of Under-trained Tokens in NLP

Language models have revolutionized the field of natural language processing (NLP) by enabling computers to understand and generate human language. These models, often referred to as large language models (LLMs), are trained on massive amounts of text data, which lets them pick up the patterns and structures of language. However, LLMs can produce errors and unpredictable outputs when they encounter tokens that are under-trained or absent in their training data. To address this issue, researchers from Cohere have published a paper that proposes automated detection of under-trained tokens in LLMs, with the aim of improving language model stability.

The Problem with Under-trained Tokens

Tokenization is a crucial process in NLP, where text is divided into smaller units, or tokens, for analysis and modeling purposes. Tokens serve as the building blocks of language models, allowing them to process and generate text effectively. However, when tokens within an LLM’s vocabulary are underrepresented or absent in the training data, the model may encounter what researchers refer to as “glitch tokens.” These glitch tokens can destabilize the model and lead to unexpected outputs or errors.

One common issue in LLMs is the misalignment between tokenizer training and model training. Tokenizers are often trained separately using different datasets, which can result in some vocabulary tokens being under-trained. These under-trained tokens can cause the model to exhibit unintended behaviors or produce nonsensical outputs when encountered in new input data. For example, the infamous “_SolidGoldMagikarp” token has been known to induce hallucinations or generate nonsensical text in language models.
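The mismatch described above can be illustrated with a minimal sketch. It assumes a toy longest-match tokenizer in place of a real subword tokenizer, a hypothetical four-entry vocabulary, and a leading underscore used as a word-boundary marker; none of these details come from the paper. The idea is only to show how a token can exist in the vocabulary yet never occur when the model's training corpus is tokenized.

```python
from collections import Counter

def greedy_tokenize(text, vocab):
    """Toy longest-match tokenizer; a stand-in for a real subword tokenizer."""
    tokens, i = [], 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # Unknown character: emit it as-is so the loop always advances.
            tokens.append(text[i])
            i += 1
    return tokens

def unseen_vocab_tokens(vocab, corpus):
    """Return vocabulary entries that never occur in the tokenized corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(greedy_tokenize(doc, vocab))
    return sorted(t for t in vocab if counts[t] == 0)

# Hypothetical setup: the tokenizer learned "_SolidGoldMagikarp" from data
# that is missing from the corpus used to train the model itself.
vocab = {"_the", "_cat", "_sat", "_SolidGoldMagikarp"}
corpus = ["_the_cat_sat", "_the_cat"]

unseen = unseen_vocab_tokens(vocab, corpus)
print(unseen)
```

A token reported here receives no gradient updates from the corpus, which is why its embedding may stay close to its initial value.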

Automated Detection of Under-trained Tokens

To address the challenge of under-trained tokens in LLMs, the researchers from Cohere propose a novel approach that utilizes the model’s embedding weights for automated and scalable detection. By analyzing the embedding matrix of an LLM, the researchers can identify tokens whose embedding weights deviate significantly from those of well-represented tokens. These deviations serve as indicators of insufficient training and can be used to detect under-trained tokens.

The detection process involves calculating the variance and distribution of embedding weights and comparing them against a normative model of adequately trained tokens. Tokens with embedding weights that deviate significantly from the normative model are identified as potentially under-trained. This automated approach provides a systematic way to pinpoint glitch tokens and improve the stability and reliability of language models.
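One way to picture this kind of check is the sketch below. It is not the paper's exact procedure: the indicator (the L2 norm of each embedding row), the robust z-score based on the median absolute deviation, and the threshold of 5 are all assumptions chosen for illustration, and the embedding matrix is synthetic rather than taken from a real model.

```python
import numpy as np

def flag_undertrained(embeddings, z_threshold=5.0):
    """Flag rows whose L2 norm is an outlier relative to the bulk of the vocabulary.

    The norm-based indicator and the threshold are illustrative assumptions.
    """
    norms = np.linalg.norm(embeddings, axis=1)
    median = np.median(norms)
    # Median absolute deviation: a spread estimate that the outliers
    # themselves do not distort much.
    mad = np.median(np.abs(norms - median))
    robust_z = 0.6745 * (norms - median) / (mad + 1e-12)
    return np.flatnonzero(np.abs(robust_z) > z_threshold)

# Synthetic embedding matrix: 1000 tokens, 64 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))

# Hypothetical under-trained tokens: rows that barely moved from a
# small-scale initialization.
planted = [3, 42, 977]
embeddings[planted] = rng.normal(scale=0.02, size=(len(planted), 64))

flagged = flag_undertrained(embeddings)
print(flagged)
```

In practice the flagged indices would be mapped back to token strings and verified by prompting the model, since an unusual embedding is only an indication of insufficient training, not proof.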

Effectiveness of the Approach

The researchers conducted experiments to evaluate the effectiveness of their automated detection method. They applied the approach to several well-known LLMs, including variations of Google’s BERT and OpenAI’s GPT series. The analysis revealed that a significant percentage of the tokenizer’s vocabulary, up to 10% in some cases, comprised under-trained tokens. These tokens were often specialized or infrequently used words that exhibited the most significant discrepancies in embedding weight patterns.

The results demonstrate the potential of the automated detection method to identify under-trained tokens and address the instability issues in LLMs. By rectifying under-trained tokens, developers can improve the accuracy and robustness of language models, making them more reliable for real-world applications.

Implications for Language Model Development

Language models play a vital role in various applications, from automated writing aids to sophisticated conversational agents. However, the presence of under-trained tokens can hinder their performance and introduce errors. The AI paper from Cohere highlights the importance of addressing this vulnerability in LLM training and presents an automated solution to mitigate the issue.

By integrating automated methods for detecting under-trained tokens into the training and development processes of language models, developers can ensure that all tokens in the model’s vocabulary are adequately prepared to handle real-world data. This advancement enhances the efficacy and reliability of language models, opening up new possibilities for more accurate and effective natural language processing tools.

Conclusion

The AI paper from Cohere introduces an innovative approach to enhance language model stability by automating the detection of under-trained tokens. This method utilizes the embedding weights of the model to identify tokens that deviate significantly from well-represented tokens, thereby identifying potential glitch tokens. The research demonstrates the effectiveness of the approach and its ability to improve the accuracy and robustness of language models. By addressing the issue of under-trained tokens, developers can enhance the performance of language models and enable more reliable and effective natural language processing applications.

The automated detection of under-trained tokens in LLMs is a significant advancement in the field of NLP. As language models continue to play a crucial role in various domains, such as content generation, virtual assistants, and chatbots, ensuring the stability and reliability of these models becomes paramount. The research conducted by Cohere paves the way for more robust training processes and more accurate natural language processing tools. By leveraging automated techniques, developers can optimize language models and unlock their full potential in understanding and generating human language.

All credit for this research goes to the researchers of this project.

Rohan Babbar

Rohan is a fourth-year Computer Science student at Delhi University, specializing in Machine Learning, Data Science, and Backend development. With hands-on experience in these domains, he has also made notable contributions as an open-source contributor.
