Monday, June 20
09:00 AM - 09:30 AM
Live in San Francisco
The deep learning models driving innovation in autonomous vehicles are becoming more ambitious by the day, but their supporting infrastructure often struggles to keep up. Because a single GPU can’t accommodate the complex neural networks of enterprise AV projects, distributed training has emerged as the solution for training DL models on large data sets. In distributed training, available storage, compute power, and effective batch size grow with each GPU added to the cluster, dramatically reducing training time.
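The core idea behind the data-parallel flavor of distributed training can be shown in a few lines: each GPU (worker) computes gradients on its own shard of the batch, and the averaged gradients equal the gradient of the full batch, so adding workers scales the effective batch size without changing the math. The minimal sketch below simulates this in plain Python with a 1-D linear model; the function names and data are illustrative, not taken from the talk or from Run:ai's platform.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for a 1-D linear model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_grad(w, xs, ys, workers):
    """Shard the batch across `workers`, compute local gradients, average.

    The averaging step plays the role of the all-reduce that a real
    distributed framework performs across GPUs after each backward pass.
    """
    shard = len(xs) // workers
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard],
                    ys[i * shard:(i + 1) * shard])
        for i in range(workers)
    ]
    return sum(grads) / workers

# Toy batch of four samples, split across two simulated workers.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                            # single-GPU gradient
sharded = data_parallel_grad(w, xs, ys, workers=2)    # data-parallel gradient
print(abs(full - sharded) < 1e-12)  # True: the two are mathematically identical
```

With equal shard sizes the average of per-shard mean gradients equals the full-batch mean gradient, which is why synchronous data-parallel training converges like single-device training with a larger batch.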
In this talk, we address a lane detection use case where Run:ai, Microsoft, and NetApp jointly built a distributed training DL solution at scale that runs in the Azure cloud. This solution enables data scientists to fully embrace the Azure cloud scaling capabilities and cost benefits for automotive use cases.
You’ll learn how Run:ai’s cloud-native compute orchestration platform, Atlas, helps enterprises dramatically reduce the time to train and productize AI models by creating a virtual pool of compute resources and automating their allocation. With dynamic, workload-aware scheduling, IT can achieve previously unattainable levels of GPU utilization and ensure business goals are met with custom prioritization rules and dashboards, while data scientists can start experiments and run hundreds of training jobs without ever touching code.
Since its inception in 2018, Run:ai has continued to break through the known limits of GPU technology, releasing multiple new capabilities in rapid succession, such as fractional GPU allocation, thin GPU provisioning, job swapping, and dynamic scheduling for NVIDIA’s Multi-Instance GPU (MIG) technology. It is the only AI infrastructure solution boasting near-100% GPU utilization for its enterprise customers. As cited in The Forrester Wave: AI Infrastructure, Q4 2021, Run:ai offers enterprises “complete flexibility in the hardware they choose to use and where they choose to run it.”