# Optimizing Computation

The following resources describe techniques for optimizing the computation of AI models on AMD hardware.

- [Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm](https://rocm.blogs.amd.com/artificial-intelligence/torch_compile/README.html)

- [Developing Triton Kernels on AMD GPUs](https://rocm.blogs.amd.com/artificial-intelligence/triton/README.html)

- [Accelerating Large Language Models with Flash Attention on AMD GPUs](https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html)

- [Automatic mixed precision in PyTorch using AMD GPUs](https://rocm.blogs.amd.com/artificial-intelligence/automatic-mixed-precision/README.html)

- [Large language model inference optimizations on AMD GPUs](https://rocm.blogs.amd.com/artificial-intelligence/llm-inference-optimize/README.html)

- [Reduce Memory Footprint and Improve Performance Running LLMs on AMD Ryzen™ AI and Radeon™ Platforms](https://community.amd.com/t5/ai/reduce-memory-footprint-and-improve-performance-running-llms-on/ba-p/686157)

- [Unveiling performance insights with PyTorch Profiler on an AMD GPU](https://rocm.blogs.amd.com/artificial-intelligence/torch_profiler/README.html)

- [AMD Zen Deep Neural Network (ZenDNN)](https://www.amd.com/en/developer/zendnn.html)

- [Accelerating models on ROCm using PyTorch TunableOp](https://rocm.blogs.amd.com/artificial-intelligence/pytorch-tunableop/README.html)

- [Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD](https://rocm.blogs.amd.com/artificial-intelligence/roberta_amp/README.html)

- [Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs](https://rocm.blogs.amd.com/artificial-intelligence/bnb-8bit/README.html)

- [Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch](https://rocm.blogs.amd.com/artificial-intelligence/int8-quantization/README.html)

----------
Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.

SPDX-License-Identifier: MIT