Quick Start

This walkthrough profiles a GPU application with Metrix to get human-readable performance metrics.

Profile your application

CLI

# Profile with all metrics (GPU architecture auto-detected)
metrix ./my_app

# Time only (fast)
metrix --time-only -n 10 ./my_app

# Filter kernels by name
metrix --kernel matmul ./my_app

# Specific metrics
metrix --metrics memory.hbm_bandwidth_utilization,memory.l2_hit_rate ./my_app

Python API

from metrix import Metrix

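profiler = Metrix()

# Profile the application, replaying it 5 times
results = profiler.profile("./my_app", num_replays=5)

# Print each kernel's average duration, then the average of every collected metric
for kernel in results.kernels:
    print(f"{kernel.name}: {kernel.duration_us.avg:.2f} μs")
    for metric, stats in kernel.metrics.items():
        print(f"  {metric}: {stats.avg:.2f}")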
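Once the results are in Python, it is often handy to rank or filter kernels rather than print everything. The sketch below is one possible way to do that. It relies only on the attributes used in the loop above (`name`, `duration_us.avg`, and `metrics[...].avg`). The helper names `slowest_kernels` and `low_metric_kernels` are hypothetical and not part of Metrix, and the demo data is a hand-built stand-in, so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def slowest_kernels(kernels, top=3):
    """Return up to `top` kernels, longest average duration first."""
    return sorted(kernels, key=lambda k: k.duration_us.avg, reverse=True)[:top]


def low_metric_kernels(kernels, metric, threshold):
    """Return names of kernels whose average for `metric` is below `threshold`."""
    return [
        k.name
        for k in kernels
        if metric in k.metrics and k.metrics[metric].avg < threshold
    ]


def fake_kernel(name, avg_us, metrics):
    """Build a stand-in object shaped like the kernel entries used above (assumption)."""
    return SimpleNamespace(
        name=name,
        duration_us=SimpleNamespace(avg=avg_us),
        metrics={m: SimpleNamespace(avg=v) for m, v in metrics.items()},
    )


# Hypothetical values for illustration only, not real Metrix output.
demo = [
    fake_kernel("vector_add", 7.29, {"memory.l2_hit_rate": 26.72}),
    fake_kernel("matmul", 120.0, {"memory.l2_hit_rate": 85.0}),
]

for k in slowest_kernels(demo, top=1):
    print(f"slowest: {k.name} ({k.duration_us.avg:.2f} μs)")
print("low L2 hit rate:", low_metric_kernels(demo, "memory.l2_hit_rate", 50.0))
```

With a real run, `results.kernels` would be passed in place of `demo`, assuming it is an iterable as the loop above suggests.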

Example output

================================================================================
Metrix: all metrics (12 total)
Target: ./examples/01_vector_add/vector_add
================================================================================

────────────────────────────────────────────────────────────────────────────────
Dispatch #1: vector_add(float*, float const*, float const*, int)
────────────────────────────────────────────────────────────────────────────────
Duration: 7.29 - 7.29 μs (avg=7.29)

MEMORY BANDWIDTH:
  Total HBM Bytes Transferred                   8400896.00 bytes
  HBM Bandwidth Utilization                           1.34 percent
  HBM Read Bandwidth                                 35.47 GB/s
  HBM Write Bandwidth                                35.36 GB/s

CACHE PERFORMANCE:
  L1 Cache Hit Rate                                  66.67 percent
  L2 Cache Hit Rate                                  26.72 percent
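
As a quick sanity check on the bandwidth lines above, the read and write figures can be combined and compared against the utilization percentage. This is a sketch that assumes utilization is defined as (read + write bandwidth) divided by the device's peak HBM bandwidth. The output does not state that definition, so treat the implied peak as a rough estimate, especially since the utilization value is rounded to two decimals.

```python
# Values copied from the example output above.
read_gbps = 35.47
write_gbps = 35.36
utilization_pct = 1.34

# Assumption: utilization = (read + write) / peak HBM bandwidth.
total_gbps = read_gbps + write_gbps
implied_peak_gbps = total_gbps / (utilization_pct / 100.0)

print(f"total HBM traffic rate: {total_gbps:.2f} GB/s")
print(f"implied peak bandwidth: {implied_peak_gbps:.0f} GB/s")
```

Under that assumption the implied peak comes out at roughly 5.3 TB/s, which suggests that this small vector_add launch leaves HBM bandwidth almost entirely unused.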

Next steps

Dive deeper into profiling — see Metrix for all available metrics
Map performance to source lines — see Linex for source-level profiling
Extract and isolate a kernel — see Kerncap for standalone reproducers
Inspect GPU execution — see Nexus for HSA packet tracing
Validate optimizations — see Accordo for correctness checking
Set up MCP servers — see MCP Setup for LLM integration