What’s a Tensor Core (TC)? It’s an ASIC integrated into general-purpose GPUs (GPGPUs), designed to accelerate the GEMM workloads that make up a large portion of machine learning applications. However, because there are obstacles to exploiting TCs effectively in CUDA, programmers can rarely make full use of them to speed up their applications.
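To make the programming obstacle concrete, the standard way to reach Tensor Cores from CUDA C++ is the warp-level WMMA API (`nvcuda::wmma` in `<mma.h>`). The sketch below shows a minimal kernel in which one warp computes a single 16×16 tile of C = A·B; it is a simplified illustration (fixed 16×16×16 tile, leading dimension 16, one warp), not a tuned GEMM. Note the constraints that make TCs awkward to use: every WMMA call is a warp-collective, tile shapes and data types are restricted, and the layout of a `fragment` across the warp's registers is opaque.

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp computes one 16x16 output tile of C = A * B.
// A and B are half precision; accumulation is in float
// (the common mixed-precision Tensor Core mode).
// Launch with at least one full warp, e.g. <<<1, 32>>>.
__global__ void wmma_tile_16x16(const half *A, const half *B, float *C) {
    // Fragments are distributed across the 32 registers of the warp;
    // their internal layout is opaque to the programmer.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // zero the accumulator tile

    // All of these are warp-collective operations: every lane must
    // participate, or behavior is undefined.
    wmma::load_matrix_sync(a_frag, A, 16);  // 16 = leading dimension
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // D = A*B + C on TCs

    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```

A full GEMM would tile the K dimension in a loop of `load_matrix_sync`/`mma_sync` calls and assign many warps to different output tiles; several of the papers below study exactly how these fragment layouts and instruction latencies behave under the hood. (Requires a GPU with compute capability 7.0 or higher; not runnable without Tensor Core hardware.)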

Dissections & Microbenchmarks

  • [TPDS ‘23] Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors
  • [IPDPS ‘20] Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply
  • Dissecting the NVIDIA Turing T4 GPU via Microbenchmarking

TC with Intra-SM Parallelism

  • [HPCA ‘22] Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
  • [ISVLSI ‘22] Improving GPU Throughput through Parallel Execution Using Tensor Cores and CUDA Cores
  • [ICCD ‘21] Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks

GEMM / Scientific / DL Applications with TC

  • [SC ‘22] Efficient quantized sparse matrix operations on tensor cores
  • [ISCA ‘22] SIMD2: a generalized matrix instruction set for accelerating tensor computation beyond GEMM
  • [ICS ‘22] Toward accelerated stencil computation by adapting tensor core unit on GPU
  • [MICRO ‘20] Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores
  • [ICS ‘19] Accelerating reduction and scan using tensor core units

GNN with TC

  • [ATC ‘23] TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs
  • [PPoPP ‘22] QGTC: accelerating quantized graph neural networks via GPU tensor core