This episode explores the challenges and advances in deploying Large Language Models (LLMs) and AI agents at scale, particularly within LinkedIn's infrastructure. Against the backdrop of LinkedIn's significant investment in GPUs (a 7x increase in fleet size and a 150x increase in model training scale), the conversation highlights the rising cost of inference as a major hurdle. The discussion then turns to the complexities of applying LLMs to traditional machine learning (ML) tasks such as recommendation ranking (RecSys), questioning whether they are always the optimal choice and emphasizing the need for cost-effective, low-latency solutions. For instance, the use of LLMs in real-time feed generation is examined, weighing the benefits of improved personalization against the added computational expense. The development of Liger, an open-source GPU kernel library aimed at improving training efficiency and reducing training times, is presented as a key innovation for addressing these challenges. Finally, the episode touches on the evolving role of memory optimization and the need for more elastic, serverless architectures to maximize GPU utilization and minimize cost, concluding with a discussion of where the infrastructure needs of traditional ML and LLM-based applications converge and diverge.