Simple Attention Visualizer

Description

I have created a simple attention visualizer for transformer models. It is available at this link. It can:

  1. Visualize all attention heads for a specific layer.
  2. Show the average attention for each layer.
  3. Show a single heatmap averaging over all layers and heads.

The code should work with any causal LLM; a minimal sketch of the underlying idea is shown below.
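As a rough illustration (not the visualizer's exact code), the snippet below pulls attention weights from a Hugging Face causal LM by passing output_attentions=True, then averages them over heads and layers before plotting a heatmap. The model name, prompt, and plotting details are illustrative assumptions.

```python
import torch
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed example; any causal LM should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "The quick brown fox jumps over the lazy dog"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len)
attentions = torch.stack(outputs.attentions)   # (layers, 1, heads, seq, seq)
per_layer = attentions.mean(dim=2)[:, 0]       # average over heads -> (layers, seq, seq)
overall = per_layer.mean(dim=0)                # average over layers -> (seq, seq)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
plt.imshow(overall.numpy(), cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title("Mean attention over all layers and heads")
plt.colorbar()
plt.tight_layout()
plt.show()
```

Roughly, the three views listed above correspond to slicing outputs.attentions at one layer (per-head heatmaps), the per_layer averages, and the single overall heatmap.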

More details are in the repository.

Visualization Examples
