Category Archives: Large Language Models
~30% Compression Of LLM (Flan-T5-Base) With Low Rank Decomposition Of Attention Weight Matrices
Colab Link To Reproduce Experiment: LLM Compression Via Low Rank Decomposition.ipynb
Context
A neural network contains many dense layers that perform matrix multiplication. In the case of Transformers, the Attention module has Key, Query, Value and Output matrices (along with the …
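The excerpt stops before describing the method. What follows is therefore only a minimal sketch of one common way to low-rank decompose a dense weight matrix: truncated SVD, assuming NumPy. It is not necessarily the procedure used in the linked notebook. The helper names `low_rank_factorize` and `param_counts`, the chosen rank, and the synthetic matrix are all hypothetical. Only the 768 hidden size comes from Flan-T5-Base's `d_model`.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Return (A, B) such that A @ B is the best rank-`rank` approximation of W
    in Frobenius norm (Eckart-Young), computed with a truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # scale the top-`rank` left singular vectors -> (m, rank)
    B = Vt[:rank, :]            # top-`rank` right singular vectors -> (rank, n)
    return A, B

def param_counts(m, n, rank):
    """Parameters of a dense m x n matrix vs. its two rank-`rank` factors."""
    return m * n, rank * (m + n)

rng = np.random.default_rng(0)
d, true_rank, rank = 768, 64, 192  # hypothetical sizes; 768 is Flan-T5-Base's d_model

# Synthetic "weight" that is approximately low rank plus small noise.
# Real attention weights are not guaranteed to be this compressible.
W = rng.standard_normal((d, true_rank)) @ rng.standard_normal((true_rank, d))
W += 0.01 * rng.standard_normal((d, d))

A, B = low_rank_factorize(W, rank)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

# A linear projection y = x @ W.T becomes two smaller matmuls: x -> rank -> d.
x = rng.standard_normal((4, d))
y_dense = x @ W.T
y_lowrank = (x @ B.T) @ A.T

dense, factored = param_counts(d, d, rank)
print(f"dense params: {dense}, factored params: {factored}")
print(f"relative reconstruction error: {rel_err:.4f}")
```

The factorization only saves parameters when `rank * (m + n) < m * n`, i.e. when the rank is below 384 for a square 768 × 768 matrix. How much accuracy survives depends on how quickly the real matrix's singular values decay, which this synthetic example does not measure.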
Posted in Large Language Models, llm, machine learning
Tagged large language model, machine learning