Media Summary: Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ... Speaker: Maksim Khadkevich, Sr. Software Engineering Manager, Dynamo, NVIDIA. Khadkevich discusses data center scale ...

43 LLM Inference Optimization - Detailed Analysis & Overview

Discover a simple method to calculate GPU memory requirements for large language models like Llama 70B. Learn how the ... ... training cost so why do we focus on the ...

Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI ... Did you know that 90% of ML models never make it into production? Even among the few that do, many face critical challenges ... Philip Kiely, Head of Developer Relations at Baseten, presents the “Golden Triangle” of ...

43 - LLM Inference Optimization
Deep Dive: Optimizing LLM inference
Optimizing LLM Inference Requests
Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou
Faster LLMs: Accelerate Inference with Speculative Decoding
Improving LLM Throughput via Data Center-Scale Inference Optimizations
How Much GPU Memory is Needed for LLM Inference?
LLM inference optimization: Architecture, KV cache and Flash attention
AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA
Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft
Understanding the LLM Inference Workload - Mark Moyou, NVIDIA
Scheduling Impacts on LLM Inference