Simple Attention Visualizer

Description

I have created a simple attention visualizer for transformer models. It is available at this link. It can:

  1. Visualize all attention heads for a specific layer.
  2. Show the average attention for each layer.
  3. Show a single heatmap averaging all layers and heads.

The code should work with any causal LLM.
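The three modes above reduce to simple tensor reductions. A minimal sketch, assuming the attention weights are stacked into a `(num_layers, num_heads, seq_len, seq_len)` array (as you would get by stacking the `attentions` tuple returned by a Hugging Face model run with `output_attentions=True`; the synthetic data here is only for illustration):

```python
import numpy as np

# Fabricated attention weights so the example is self-contained; in practice
# these would come from a forward pass of a causal LM.
num_layers, num_heads, seq_len = 4, 8, 6
rng = np.random.default_rng(0)
raw = rng.random((num_layers, num_heads, seq_len, seq_len))
# Row-normalize so each row is a valid attention distribution.
attn = raw / raw.sum(axis=-1, keepdims=True)

# Mode 1: all heads of one specific layer -> (num_heads, seq_len, seq_len)
layer = 2
per_head = attn[layer]

# Mode 2: average over heads, per layer -> (num_layers, seq_len, seq_len)
per_layer = attn.mean(axis=1)

# Mode 3: one heatmap averaging all layers and heads -> (seq_len, seq_len)
overall = attn.mean(axis=(0, 1))
```

Each reduced array can then be rendered as a heatmap (e.g. with `matplotlib.pyplot.imshow`). Averaging row-normalized distributions keeps each row summing to 1, so the averaged heatmaps remain valid attention maps.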

More details are in the repository.

Visualization Examples



