TensorRT
feature · part of Deep Learning
TensorRT is NVIDIA's deep learning inference optimizer and runtime library, designed to deliver low latency and high throughput in production deployments. It takes models trained in frameworks such as TensorFlow or PyTorch, commonly exported through the ONNX interchange format, and builds an optimized inference engine by applying layer fusion, reduced-precision execution (FP16, INT8, and in recent releases INT4, with calibration data used to choose quantization ranges for post-training INT8), kernel auto-tuning, and memory reuse. These optimizations are hardware-specific: an engine is tuned for the NVIDIA GPU it targets in order to make full use of Tensor Cores and other architecture features. TensorRT matters most in real-time applications such as autonomous driving, video analytics, and cloud inference, where latency and throughput are the primary constraints. It serves as the bridge between trained models and efficient GPU execution, since a deployed application needs only the TensorRT runtime and a built engine rather than the full training framework.
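The precision calibration step mentioned above can be illustrated with a minimal sketch in plain Python. It assumes the simplest possible rule, a symmetric per-tensor scale taken from the largest absolute value seen in the calibration samples. This is not TensorRT's actual calibration algorithm or API, and the helper names (`calibrate_scale`, `quantize`, `dequantize`) and the sample values are hypothetical, chosen only for illustration.

```python
# Simplified illustration of INT8 calibration (assumption: max-abs rule,
# symmetric per-tensor scale). Not TensorRT's own algorithm or API.

def calibrate_scale(samples, num_bits=8):
    """Derive a scale so the largest calibration magnitude maps to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(x) for x in samples)
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(values, scale, num_bits=8):
    """Map floats to integers in [-qmax, qmax], clipping out-of-range values."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in q_values]

# Hypothetical activations collected while running calibration inputs.
calibration_activations = [-2.54, -0.5, 0.0, 0.75, 1.27]
scale = calibrate_scale(calibration_activations)

# 5.0 lies outside the calibrated range, so it saturates at qmax.
q = quantize([1.0, -2.54, 5.0], scale)
restored = dequantize(q, scale)
print(scale, q, restored)
```

Values inside the calibrated range are recovered to within half a quantization step, while values outside it saturate. That trade-off between range and resolution is why the choice of calibration data affects the accuracy of a quantized model.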