Hu and Singh 2021 - Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer
Paper for: hu21_trans_is_all_you_need
Train 7 tasks – some of them vision-and-language, some of them language-only – simultaneously with a single model. Text goes through a language encoder and images go through an image encoder; a shared decoder attends over the encoded features, and each task gets its own output head on top of the decoder.
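A minimal sketch of that layout, assuming PyTorch and toy sizes (the actual model uses a convolutional backbone plus transformer for images and a BERT-style text encoder; all names here are hypothetical):

```python
import torch
import torch.nn as nn

class UnifiedMultitaskModel(nn.Module):
    """Sketch: per-modality encoders, one shared decoder, one head per task."""

    def __init__(self, d_model=64, num_queries=8, num_tasks=7, num_classes=10):
        super().__init__()
        # Separate encoders per modality (toy single-layer versions).
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        self.image_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        # Shared decoder attends over the concatenated encoder outputs.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        # Learned query embeddings fed to the decoder (DETR-style).
        self.query = nn.Parameter(torch.randn(1, num_queries, d_model))
        # One output head per task on top of the shared decoder.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, num_classes) for _ in range(num_tasks))

    def forward(self, task_id, text=None, image=None):
        # Encode whichever modalities this task provides.
        feats = [enc(x) for enc, x in
                 ((self.text_encoder, text), (self.image_encoder, image))
                 if x is not None]
        memory = torch.cat(feats, dim=1)
        hidden = self.decoder(self.query.expand(memory.size(0), -1, -1), memory)
        return self.heads[task_id](hidden)
```

For a language-only task only `text` is passed; a vision-and-language task passes both, and the decoder sees the concatenated sequence.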
See Transformer
Cites carion20_end_to_end_objec_detec_with_trans (see Carion et al 2020 - End-to-End Object Detection with Transformers)
1 thing to look up:
- warm-up + cosine learning rate schedule for the Adam optimizer
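As a reminder of what that schedule does, here is a minimal sketch (hypothetical function, not from the paper): the learning rate ramps up linearly for a warm-up phase, then decays to zero along a cosine curve; the resulting factor multiplies the Adam base learning rate.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr):
    """Linear warm-up for `warmup_steps`, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

E.g. with `total_steps=100`, `warmup_steps=10`, `base_lr=1.0`: the rate is 0.1 at step 0, peaks at 1.0 at step 10, and approaches 0 by step 99.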
Bibliography
- [hu21_trans_is_all_you_need] Hu & Singh, Transformer Is All You Need: Multimodal Multitask Learning With a Unified Transformer, CoRR (2021).
- [carion20_end_to_end_objec_detec_with_trans] Carion, Massa, Synnaeve, Usunier, Kirillov, & Zagoruyko, End-To-End Object Detection With Transformers, CoRR (2020).