Abstract: The recent advancements in neural network architectures, particularly transformers, have played a crucial role in the rapid progress of Large Language Models (LLMs). LLMs are trained on many ...