Dmitry Vyukov discusses latency profiling, contrasting it with traditional CPU profiling and explaining why it matters when the goal is to reduce how long a task takes rather than how much CPU it consumes. He introduces a new latency profiler integrated into the Linux perf tool, which attributes samples to wall-clock time rather than CPU time to identify the bottlenecks that actually determine latency. Using a TensorFlow optimization story, Vyukov illustrates how the latency profiler revealed that Python code, previously considered negligible in CPU profiles, was a significant contributor to latency. He also explains how the profiler's parallelism histogram helps find poorly parallelized phases of a program, leading to further performance improvements. The discussion covers the tool's availability in Linux 6.15 and its potential applications in performance optimization scenarios such as machine learning and game development. Audience questions further explore practical applications and comparisons with other profiling techniques.
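The difference between the two views can be illustrated with a toy model. The sketch below is not the perf implementation; it assumes each sample records a function name and how many threads were running at that instant, and it assumes a latency view can be approximated by weighting each sample by 1 / parallelism. The sample data and the names `python_glue` and `matmul` are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (function name, threads running when the sample was taken).
# Two samples of serial Python code, sixteen samples of an 8-way parallel kernel.
SAMPLES = [("python_glue", 1)] * 2 + [("matmul", 8)] * 16


def _normalize(totals):
    """Convert raw weights into fractions that sum to 1."""
    grand_total = sum(totals.values())
    return {key: weight / grand_total for key, weight in totals.items()}


def cpu_profile(samples):
    """CPU view: every sample counts equally, regardless of parallelism."""
    totals = defaultdict(float)
    for func, _parallelism in samples:
        totals[func] += 1.0
    return _normalize(totals)


def latency_profile(samples):
    """Wall-clock view (assumed model): a sample taken while N threads run
    accounts for only 1/N of the elapsed time."""
    totals = defaultdict(float)
    for func, parallelism in samples:
        totals[func] += 1.0 / parallelism
    return _normalize(totals)


def parallelism_histogram(samples):
    """Fraction of wall-clock time spent at each parallelism level."""
    totals = defaultdict(float)
    for _func, parallelism in samples:
        totals[parallelism] += 1.0 / parallelism
    return _normalize(totals)


if __name__ == "__main__":
    print("CPU share:     ", cpu_profile(SAMPLES))
    print("Latency share: ", latency_profile(SAMPLES))
    print("Parallelism:   ", parallelism_histogram(SAMPLES))
```

In this made-up data set the serial code accounts for about 11% of CPU samples but half of the wall-clock time, which is the kind of gap a CPU-only profile hides. The histogram likewise shows half of the elapsed time spent at parallelism 1, pointing at the phase worth parallelizing.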