References
The precision-tuning experience for large language models described in this document draws primarily on the following references:
- GLM-130B: An Open Bilingual Pre-trained Model
- OPT: Open Pre-trained Transformer Language Models
- PaLM: Scaling Language Modeling with Pathways
- OLMo: Accelerating the Science of Language Models
- Train With Mixed Precision (NVIDIA Docs Hub)
- OLMo-7B Weights & Biases training metrics
- A Theory on Adam Instability in Large-Scale Machine Learning
- Understanding and Mitigating Hardware Failures in Deep Learning Training Systems
Parent topic: Precision Debugging