The podcast features William, a computer engineering undergrad, discussing the design and tape-out of a tiny TPU chip. William describes the chip's hardware architecture, focusing on the systolic array and how it was implemented within area constraints, and explains the design process using Verilog and open-source CAD tools. He addresses the challenges of limited I/O ports and throughput, and outlines a streaming pattern used to maximize efficiency. William also covers the implementation of floating-point arithmetic, the verification process, and the integration with PyTorch for machine learning inference, including quantization-aware training. The episode concludes with a discussion of potential future work and career interests.