Category Archives: machine learning
~30% Compression Of LLM (Flan-T5-Base) With Low Rank Decomposition Of Attention Weight Matrices
Colab link to reproduce the experiment: LLM Compression Via Low Rank Decomposition.ipynb
Context
A neural network contains many dense layers that perform matrix multiplication. In Transformers, the attention module has Key, Query, Value and Output matrices (along with the …
Posted in Large Language Models, llm, machine learning
Tagged large language model, machine learning
Neural Ranking Architectures
Glimpses On Implicit/Explicit, Dense/Sparse, Gated/Non Gated, Low Rank And Many More Layered Interactions
101 Ranking Model Architecture
Neural ranking models are the most important component in a multi-stage retrieval and ranking pipeline. Whether it is e-commerce search, ads targeting, music search or …
CTR Prediction System – Online Machine Learning
The Anatomy Of Large Scale CTR* Prediction System
* With little or no modification, the proposed system design and algorithms can be used to optimize other metrics such as Cost Per Viewable Completion, Cost Per Completion, Cost Per Engagement, Cost etc …
Posted in ctr, machine learning, online ads, Uncategorized