
The discussion centers on building and optimizing AI infrastructure, particularly within the HR tech space. It highlights the importance of balancing cost, performance, latency, throughput, and accuracy when deploying AI solutions. The guest details their transition into AI leadership, emphasizing continuous learning and adaptation. They share strategies for managing AI costs, including scheduled and dynamic scaling of GPUs based on traffic patterns. The conversation also covers techniques for reducing cold start times, such as using faster storage and baking models into container images, as well as leveraging tools like TensorRT-LLM to cut latency. The guest touches on the shift in customer attitudes toward AI, from initial skepticism to actively seeking AI integration, and the challenges of ensuring responsible AI.
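The combination of scheduled and traffic-driven GPU scaling mentioned above can be sketched roughly as follows. This is a minimal illustration, not the guest's actual system: the function name, replica counts, business-hours window, and per-replica throughput budget are all illustrative assumptions.

```python
# Hypothetical sketch of combining a time-of-day schedule with a
# traffic-driven override. All names and thresholds are assumptions.

BUSINESS_HOURS_REPLICAS = 4   # assumed baseline GPU workers during peak hours
OFF_HOURS_REPLICAS = 1        # assumed overnight baseline, scaled down to save cost
REQS_PER_REPLICA = 50         # assumed requests/min one replica can serve

def desired_gpu_replicas(hour_utc: int, reqs_per_min: float) -> int:
    """Return a target number of GPU replicas.

    Scheduled component: more replicas during an assumed business-hours window.
    Dynamic component: scale out further if observed traffic exceeds
    what the scheduled baseline can serve.
    """
    scheduled = BUSINESS_HOURS_REPLICAS if 13 <= hour_utc < 23 else OFF_HOURS_REPLICAS
    # Ceiling division: replicas needed to absorb the observed request rate.
    demand = -(-int(reqs_per_min) // REQS_PER_REPLICA) if reqs_per_min > 0 else 0
    # Keep at least one warm replica to avoid a cold start on the next request.
    return max(scheduled, demand, 1)
```

In practice the scheduled component would typically live in an autoscaler configuration (e.g. a cron-based rule) and the dynamic component in a metrics-driven policy; the point here is only that the final replica count is the maximum of the two signals.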