- From standard softmax attention to FlashAttention & FlashDecoding (Japanese)
Attention computation is a key component of transformer models. This blog post explains how attention computation is accelerated, progressing from naive softmax attention to FlashAttention and FlashDecoding. The post is written in Japanese.
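As a point of reference for the post's starting point, naive softmax attention can be sketched as follows (a minimal NumPy sketch, not code from the post; all names are illustrative). It materializes the full score matrix, which is exactly the memory cost FlashAttention avoids via tiling and online softmax:

```python
import numpy as np

def naive_softmax_attention(Q, K, V):
    """Baseline attention: softmax(Q K^T / sqrt(d)) V.

    Stores the entire (n_q, n_k) score matrix in memory,
    which is the O(n^2) cost that FlashAttention removes.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) score matrix
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # (n_q, d) output

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))  # 8 queries/keys, head dim 4
out = naive_softmax_attention(Q, K, V)
print(out.shape)
```

FlashAttention computes the same result without ever holding the full `(n_q, n_k)` matrix, and FlashDecoding further parallelizes the single-query decoding case across the key/value sequence.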