Hu and Singh 2021 - Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer
Paper for: hu21_trans_is_all_you_need
Train 7 tasks – some of them vision-and-language, some of them language-only – simultaneously with a single model. Text goes through a language encoder and images go through an image encoder; a shared decoder attends over the encoded features, and each task gets its own output head on top of the decoder.
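A minimal sketch of that layout, assuming PyTorch and toy sizes (the actual model uses a convolutional backbone plus transformer for images and a BERT-style text encoder; all names here are hypothetical):

```python
import torch
import torch.nn as nn

class UnifiedMultitaskModel(nn.Module):
    """Sketch: per-modality encoders, one shared decoder, one head per task."""

    def __init__(self, d_model=64, num_queries=8, num_tasks=7, num_classes=10):
        super().__init__()
        # Separate encoders per modality (toy single-layer versions).
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        self.image_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        # Shared decoder attends over the concatenated encoder outputs.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        # Learned query embeddings fed to the decoder (DETR-style).
        self.query = nn.Parameter(torch.randn(1, num_queries, d_model))
        # One output head per task on top of the shared decoder.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, num_classes) for _ in range(num_tasks))

    def forward(self, task_id, text=None, image=None):
        # Encode whichever modalities this task provides.
        feats = [enc(x) for enc, x in
                 ((self.text_encoder, text), (self.image_encoder, image))
                 if x is not None]
        memory = torch.cat(feats, dim=1)
        hidden = self.decoder(self.query.expand(memory.size(0), -1, -1), memory)
        return self.heads[task_id](hidden)
```

For a language-only task only `text` is passed; a vision-and-language task passes both, and the decoder sees the concatenated sequence.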
See Transformer
Cites carion20_end_to_end_objec_detec_with_trans (see Carion et al 2020 - End-to-End Object Detection with Transformers)
1 thing to look up:
- warm-up + cosine learning rate schedule for the Adam optimizer
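As a reminder of what that schedule does, here is a minimal sketch (hypothetical function, not from the paper): the learning rate ramps up linearly for a warm-up phase, then decays to zero along a cosine curve; the resulting factor multiplies the Adam base learning rate.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr):
    """Linear warm-up for `warmup_steps`, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

E.g. with `total_steps=100`, `warmup_steps=10`, `base_lr=1.0`: the rate is 0.1 at step 0, peaks at 1.0 at step 10, and approaches 0 by step 99.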
Bibliography
- [hu21_trans_is_all_you_need] Hu & Singh, Transformer Is All You Need: Multimodal Multitask Learning With a Unified Transformer, CoRR (2021).
- [carion20_end_to_end_objec_detec_with_trans] Carion, Massa, Synnaeve, Usunier, Kirillov, & Zagoruyko, End-To-End Object Detection With Transformers, CoRR (2020).