Linex

Map GPU performance metrics to your source code lines.

Installation

pip install -e .

Quick start

from linex import Linex

profiler = Linex()
profiler.profile("./my_app", kernel_filter="my_kernel")

# Show hotspots
for line in profiler.source_lines[:5]:
    print(f"{line.file}:{line.line_number}")
    print(f" {line.total_cycles:,} cycles ({line.stall_percent:.1f}% stalled)")

What you get

Instruction-level metrics mapped to source lines:

Metric               Description
latency_cycles       Total GPU cycles
stall_cycles         Cycles waiting (memory, dependencies)
idle_cycles          Unused execution slots
execution_count      How many times it ran
instruction_address  Where in GPU memory

Compiling with and without -g

Build       instructions              source_lines                         file / line
With -g     Populated (ISA + cycles)  Populated (aggregated by file:line)  Real file path and line number
Without -g  Populated (ISA + cycles)  Empty                                "" and 0

  • Use -g when you want source-line mapping: ISA instructions tied to file:line, and source_lines aggregated by source line.
  • Omit -g when you only need assembly-level metrics: you still get every instruction with isa, latency_cycles, stall_cycles, etc.
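
Code that must work with either build can check whether source_lines is empty and fall back to instructions. The following is a minimal sketch that assumes only the attributes documented on this page; hottest is a hypothetical helper, not part of Linex, and it sorts explicitly rather than relying on the order Linex returns.

```python
def hottest(profiler, n=5):
    """Return up to n (label, cycles) pairs, highest cycle count first.

    Hypothetical helper, not part of Linex. Uses source_lines when the
    binary was built with -g, otherwise falls back to raw instructions.
    """
    if profiler.source_lines:
        rows = [(f"{sl.file}:{sl.line_number}", sl.total_cycles)
                for sl in profiler.source_lines]
    else:
        # Without -g there is no file:line, so label each row by ISA text.
        rows = [(inst.isa, inst.latency_cycles)
                for inst in profiler.instructions]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

Called as hottest(profiler) after profiler.profile(...), this yields file:line labels for a -g build and ISA text otherwise.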

API

Linex class

profiler = Linex(
    target_cu=0,                      # Target compute unit
    shader_engine_mask="0xFFFFFFFF",  # All shader engines
    activity=10,                      # Activity counter polling
)

Methods:

  • profile(command, kernel_filter=None) — run profiling

Properties:

  • source_lines: List[SourceLine], sorted by total_cycles
  • instructions: List[InstructionData]

SourceLine

Aggregated metrics for one source code line.

line.file # Source file path
line.line_number # Line number
line.total_cycles # Sum of all instruction cycles
line.stall_cycles # Cycles spent waiting
line.idle_cycles # Cycles slot was idle
line.execution_count # Total executions
line.instructions # List of ISA instructions
line.stall_percent # Convenience: stall_cycles / total_cycles * 100
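
The stall_percent formula is undefined when total_cycles is 0. The stand-in below shows the arithmetic with a guard for that case. It is a sketch only: the function name is hypothetical, and whether Linex itself returns 0.0 for a zero total is not stated here.

```python
def stall_percent(stall_cycles, total_cycles):
    """Share of cycles spent stalled, as a percentage.

    Mirrors the documented formula. Returning 0.0 for a zero total is an
    assumption of this sketch, not confirmed Linex behavior.
    """
    if total_cycles == 0:
        return 0.0
    return stall_cycles / total_cycles * 100
```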

InstructionData

Per-ISA-instruction metrics.

inst.isa # ISA instruction text
inst.latency_cycles # Total cycles for this instruction
inst.stall_cycles # Cycles spent waiting
inst.idle_cycles # Cycles slot was idle
inst.execution_count # How many times it ran
inst.instruction_address # Virtual address in GPU memory
inst.file # Parsed from source_location (empty without -g)
inst.line # Parsed from source_location (0 without -g)
inst.stall_percent # Convenience: stall_cycles / latency_cycles * 100
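
To illustrate how per-instruction metrics relate to SourceLine totals, the sketch below groups instructions by (file, line) and sums their cycles. Inst is a stand-in dataclass with only the fields used here, and aggregate_by_line is a hypothetical helper. Treating total_cycles as the sum of latency_cycles is an assumption based on the field descriptions above, not a statement of how Linex computes it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Inst:
    """Stand-in for InstructionData with only the fields used here."""
    isa: str
    latency_cycles: int
    stall_cycles: int
    file: str = ""   # empty when built without -g
    line: int = 0    # 0 when built without -g


def aggregate_by_line(instructions):
    """Sum cycles per (file, line), skipping instructions with no location."""
    totals = defaultdict(lambda: {"total_cycles": 0, "stall_cycles": 0})
    for inst in instructions:
        if not inst.file:  # no source location to map to
            continue
        entry = totals[(inst.file, inst.line)]
        entry["total_cycles"] += inst.latency_cycles
        entry["stall_cycles"] += inst.stall_cycles
    # Source lines with the most cycles come first.
    return sorted(totals.items(),
                  key=lambda item: item[1]["total_cycles"],
                  reverse=True)
```

Instructions without a source location are dropped, which matches the documented behavior that source_lines is empty for builds without -g.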

Examples

# Find stall-heavy lines (candidates for being memory-bound)
memory_bound = [
    l for l in profiler.source_lines
    if l.stall_percent > 50
]

# Find hotspots with high execution count
hotspots = [
    l for l in profiler.source_lines
    if l.execution_count > 10000
]

# Instruction-level analysis of the first source line
for line in profiler.source_lines[:1]:
    for inst in line.instructions:
        print(f"{inst.isa}: {inst.latency_cycles} cycles")
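
Fixed thresholds such as 10000 executions depend on the workload, so a relative view can be easier to compare across runs. The sketch below reports each source line's share of all profiled cycles. cycle_share is a hypothetical helper that relies only on the file, line_number and total_cycles attributes documented above.

```python
def cycle_share(source_lines):
    """Return (file, line_number, percent_of_total) for each source line.

    Hypothetical helper, not part of Linex. Returns an empty list when no
    cycles were recorded, which avoids dividing by zero.
    """
    grand_total = sum(sl.total_cycles for sl in source_lines)
    if grand_total == 0:
        return []
    return [
        (sl.file, sl.line_number, 100 * sl.total_cycles / grand_total)
        for sl in source_lines
    ]
```

With a profiler that has already run, cycle_share(profiler.source_lines) gives one tuple per source line, in the same order as the input.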

Requirements

  • Python >= 3.8
  • ROCm 7.0+ with rocprofv3