Papers: Cloud GPU services

Papers

[ATC ’21] INFaaS: Automated Model-less Inference Serving
[IPDPS ’22] DGSF: Disaggregated GPUs for Serverless Functions. Remote GPUs for serverless functions; CUDA API interception and remoting; virtual resource handles.
[IPDPS ’23] GPU-enabled Function-as-a-Service for Machine Learning Inference. Treats uploaded models resident in GPU device memory as cache items, and forwards each request to a GPU agent where the requested model already resides (cache locality-aware scheduling).
[ASPLOS ’22] Astraea: Towards QoS-aware and Resource-efficient Multi-stage GPU Services. Deploys multi-stage microservices across GPU clusters with minimal communication overhead.
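
Sketch: cache locality-aware scheduling (the idea noted for the IPDPS ’23 entry). This is a rough illustration, not the paper's implementation. The names `GpuAgent` and `route` are hypothetical, and the LRU eviction policy, the model-count capacity (a stand-in for device memory), and the queue-length tie-break are all assumptions made for the example.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class GpuAgent:
    """Hypothetical GPU agent; models resident in device memory act as cache items."""
    name: str
    capacity: int  # max resident models (assumed stand-in for device memory size)
    resident: OrderedDict = field(default_factory=OrderedDict)
    queued: int = 0  # outstanding requests, used as a simple load signal

    def touch(self, model: str) -> bool:
        """Record a use of `model`; return True on a cache hit."""
        hit = model in self.resident
        if hit:
            self.resident.move_to_end(model)  # mark as most recently used
        else:
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict least recently used (assumed policy)
            self.resident[model] = True  # "load" the model onto this agent
        return hit


def route(agents, model):
    """Forward a request to an agent that already holds `model`, if any.

    Falls back to the least-loaded agent on a miss. Returns (agent name, hit?).
    """
    holders = [a for a in agents if model in a.resident]
    candidates = holders or agents
    agent = min(candidates, key=lambda a: a.queued)
    hit = agent.touch(model)
    agent.queued += 1
    return agent.name, hit
```

Calling `route(agents, "resnet")` twice on a fresh pool sends both requests to the same agent: the first is a miss that loads the model, the second is a hit because the model is already resident there.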