43 Llm Inference Optimization

May 24, 2026

Media Summary: Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ... Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ... Speaker: Maksim Khadkevich, Sr. Software Engineering Manager, Dynamo, NVIDIA Khadkevich discusses data center scale ...

43 Llm Inference Optimization - Detailed Analysis & Overview

Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ... Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ... Speaker: Maksim Khadkevich, Sr. Software Engineering Manager, Dynamo, NVIDIA Khadkevich discusses data center scale ... Discover a simple method to calculate GPU memory requirements for large language models like Llama 70B. Learn how the ... ... training cost so why do we focus on the Don't miss out! Join us at our next KubeCon + CloudNativeCon events in Mumbai, India (18-19 June, 2026), Yokohama, Japan ...

Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI ... Did you know that 90% of ML models never make it into production? Even among the few that do, many face critical challenges ... Philip Kiely, Head of Developer Relations at Baseten, presents the “Golden Triangle” of