rocprof Metrics Terms

AMD provides a native profiler, rocprof, in its ROCm stack that lets users trace HIP/HSA API calls or custom roctx-annotated profiling ranges. To collect the desired hardware metrics, list them in an input file and pass that file to rocprof via the -i option. The supported hardware metrics (basic or derived) can be listed with the following commands¹:

rocprof --list-basic
rocprof --list-derived

To obtain specific metrics for a HIP executable, just run:

rocprof -i ./prof_metrics.txt -d ./data -t ./tmp -o output/tcc.csv ./gpu-stream-hip
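The input-file step and the command above can be sketched in Python. This is a minimal sketch, not a definitive tool. It assumes the rocprof (v1) input syntax in which each `pmc :` line names a group of counters, and that each group is collected in its own application run. The helper names `write_metrics_file` and `rocprof_command` are hypothetical. The sketch only builds the command line and does not launch rocprof.

```python
from pathlib import Path


def write_metrics_file(path, counter_groups):
    """Hypothetical helper: write a rocprof input file.

    Assumes the "pmc : COUNTER COUNTER ..." syntax, one counter group per line.
    Returns the lines that were written.
    """
    lines = ["pmc : " + " ".join(group) for group in counter_groups]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


def rocprof_command(metrics_file, app, data_dir="./data", tmp_dir="./tmp",
                    output="output/tcc.csv"):
    """Hypothetical helper: build (but do not run) the rocprof command line."""
    return ["rocprof", "-i", str(metrics_file), "-d", data_dir,
            "-t", tmp_dir, "-o", output, app]


if __name__ == "__main__":
    # TCC_EA_RDREQ_sum is the aggregated L2-to-NoC read-request counter
    # discussed in this article.
    write_metrics_file("prof_metrics.txt", [["TCC_EA_RDREQ_sum"]])
    print(" ".join(rocprof_command("./prof_metrics.txt", "./gpu-stream-hip")))
```

Building the command as a list keeps it ready for `subprocess.run` without shell quoting issues. Longer counter lists can be split across several groups if the hardware cannot collect them all in one pass.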

However, many of the listed metrics are named after AMD GPU hardware blocks and their abbreviations, which can confuse profiler users². This article records the primary terms that appear in the profiler metrics.

Terms

TCC: texture channel cache, i.e., the L2 cache (the last-level cache, LLC) in AMD GPUs
TCP: texture cache per pipe, i.e., the L0 cache in RDNA or the L1 cache in GCN/CDNA
EA: the interconnect between L2 and HBM (NoC)³
SQ: sequencer, i.e., hardware dispatcher⁴, which is responsible for issuing instructions
TA: texture address block, which computes the effective addresses of load/store instructions for later coalescing⁴
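When reading long counter lists, it can help to keep the abbreviations above in a small lookup table. The following is a minimal Python sketch. It uses a heuristic that is an assumption, not a documented rocprof rule: the token before the first underscore names the hardware block that owns the counter, and a second token that is also a known block (as EA in TCC_EA_*) names the peer the event refers to. The table holds only the five terms listed here, and `describe_counter` is a hypothetical helper.

```python
# Abbreviations from the list above; any other prefix is reported as unknown.
BLOCKS = {
    "TCC": "texture channel cache (L2 / LLC)",
    "TCP": "L0 cache (RDNA) / L1 cache (GCN/CDNA)",
    "EA": "interconnect between L2 and HBM (NoC)",
    "SQ": "sequencer (issues instructions)",
    "TA": "texture address block (computes load/store addresses)",
}


def describe_counter(name):
    """Return (block, description) for a counter name such as 'TCC_EA_RDREQ_sum'.

    Heuristic: the first underscore-separated token is treated as the owning
    block; if the second token is also a known block, it is appended as the
    peer that the event refers to.
    """
    tokens = name.split("_")
    block = tokens[0]
    desc = BLOCKS.get(block, "unknown block")
    if len(tokens) > 1 and tokens[1] in BLOCKS:
        desc += f" -> {tokens[1]}: {BLOCKS[tokens[1]]}"
    return block, desc


if __name__ == "__main__":
    for counter in ["TCC_EA_RDREQ_sum", "TCC_EA_RDREQ0"]:
        print(counter, "=>", describe_counter(counter))
```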

Basic and derived metrics

In rocprof, there are two kinds of metrics: basic and derived. Basic metrics are read directly from the hardware performance monitor counters, while derived metrics are computed from arithmetic expressions over several basic metrics. Some components are replicated across the GPU, such as LLC slices and HBM channels. For metrics related to them, rocprof can report either a separate value for a single instance or an aggregated value over all instances. For instance, one can obtain the read requests from LLC slice 0 to the off-chip NoC with TCC_EA_RDREQ0 (MI100 has 32 LLC slices in total, so rocprof lists this counter as TCC_EA_RDREQ[0-31]), or the value aggregated over all slices with TCC_EA_RDREQ_sum.
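The relation between per-slice counters, their aggregate and a derived metric can be illustrated in Python. This is a minimal sketch under stated assumptions. Per-instance values are keyed as `TCC_EA_RDREQ[i]`, following the `[0-31]` notation above. `TCC_HIT` and `TCC_MISS` are assumed basic counter names, so verify them with `rocprof --list-basic`. `l2_hit_rate` is a hypothetical derived metric. The sketch mirrors what a `_sum` metric and a derived expression compute; it is not rocprof's own implementation.

```python
import re

# Matches per-instance counter keys such as "TCC_EA_RDREQ[7]" (assumed naming).
INSTANCE = re.compile(r"^(?P<base>[A-Z0-9_]+)\[(?P<idx>\d+)\]$")


def aggregate_sum(values, base):
    """Sum every per-instance value of `base`, like a *_sum aggregated metric."""
    total = 0
    for key, value in values.items():
        m = INSTANCE.match(key)
        if m and m.group("base") == base:
            total += value
    return total


def l2_hit_rate(tcc_hit, tcc_miss):
    """Hypothetical derived metric: L2 hit ratio from two basic counters."""
    accesses = tcc_hit + tcc_miss
    return tcc_hit / accesses if accesses else 0.0


if __name__ == "__main__":
    # Toy per-slice numbers for illustration only (not measured data),
    # one entry per LLC slice for a 32-slice GPU such as MI100.
    slices = {f"TCC_EA_RDREQ[{i}]": 100 + i for i in range(32)}
    print("TCC_EA_RDREQ_sum =", aggregate_sum(slices, "TCC_EA_RDREQ"))
    print("L2 hit rate =", l2_hit_rate(tcc_hit=750, tcc_miss=250))
```

Keys without an instance index, such as an already aggregated `TCC_EA_RDREQ_sum`, do not match the pattern. They are therefore not double-counted.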

Epilog

Additionally, AMD Research maintains a project called Omnitrace that collects CPU and GPU profiles together for parallel applications.

🌿 Xuanteng's Wiki