Monthly Archives: November 2023

AWS Blog In Collaboration With Nvidia – Optimizing Inference For Seq2Seq And Encoder Only Models Using Nvidia GPU And Triton Model Server

Blurb: Deep Learning Transformer models are complex in architecture and can have hundreds of millions (or even billions) of parameters, which leads to slow real time inference. Real time low latency inference of Deep Learning models is a critical requirement … Continue reading

Posted in Uncategorized | Tagged , , , , , , , , , , | Leave a comment