Papers

  • [EuroSys ‘23] Pocket: ML Serving from the Edge

Code: https://github.com/GTkernel/Pocket

  • [ATC ‘21] INFaaS: Automated Model-less Inference Serving

  • [IPDPS ‘22] DGSF: Disaggregated GPUs for Serverless Functions

Combines remote GPUs with serverless functions, using CUDA API interception and remoting together with virtual resource handles.

  • [IPDPS ‘23] GPU-enabled Function-as-a-Service for Machine Learning Inference

Treats uploaded models that are resident in GPU device memory as cache items, and forwards each request to a GPU agent that already holds the requested model (cache locality-aware scheduling).

  • [ASPLOS ‘22] Astraea: Towards QoS-aware and Resource-efficient Multi-stage GPU Services

Deploys multi-stage microservices across GPU clusters with minimal communication overhead.
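
As a rough illustration of the cache locality-aware scheduling idea noted for the IPDPS ‘23 entry, the sketch below routes a request to an agent that already holds the model and falls back to the least-loaded agent on a miss. It is not the paper's algorithm: `GpuAgent`, `pick_agent`, the `queued` load metric, and the tie-breaking rule are all hypothetical names and assumptions chosen for illustration, and eviction is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class GpuAgent:
    """Hypothetical GPU agent: tracks which models sit in device memory."""
    name: str
    cached_models: set = field(default_factory=set)
    queued: int = 0  # assumed load metric: number of requests sent here


def pick_agent(agents, model):
    """Prefer agents that already cache `model`; break ties by lowest load."""
    hits = [a for a in agents if model in a.cached_models]
    candidates = hits or agents  # cache miss: consider every agent
    chosen = min(candidates, key=lambda a: a.queued)
    chosen.cached_models.add(model)  # on a miss the model gets loaded here
    chosen.queued += 1
    return chosen


agents = [
    GpuAgent("gpu0", {"resnet"}, queued=3),
    GpuAgent("gpu1", set(), queued=0),
]
first = pick_agent(agents, "resnet")   # cache hit on gpu0 despite its load
second = pick_agent(agents, "bert")    # miss: least-loaded agent loads it
third = pick_agent(agents, "bert")     # now a hit on the same agent
```

In this toy setup a cache hit always wins over a lighter-loaded agent; a real scheduler would likely weigh model load time against queueing delay.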