
Prasanna 2020 – When BERT Plays The Lottery, All Tickets Are Winning

Notes for prasanna-etal-2020-bert

1 pruning BERT

  • BERT can be pruned by removing either:
    • the weights with the smallest magnitude (call this magnitude pruning)
    • the self-attention heads of least importance (call this structured pruning)
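A minimal sketch of the magnitude-pruning criterion, on a single toy weight matrix rather than all of BERT (the function name and the per-matrix scope are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with smallest |value|.

    Illustrative sketch: the paper prunes BERT's weights, here we prune
    one matrix to show the selection criterion.
    """
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold = magnitude of the k-th smallest |weight|
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_pruned = magnitude_prune(W, sparsity=0.5)  # half the entries zeroed
```

Structured pruning differs only in the unit removed: instead of individual weights, whole attention heads are masked out according to an importance score.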

2 methods

  • use BERT for downstream tasks, fine-tuning a task-specific top-layer network on its embeddings
  • iteratively prune, checking at each step that performance stays close to the full model's
  • investigate whether the pruned heads/weights are invariant across:
    • random initializations of the task-specific top layer
    • different tasks
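The iterative prune-and-check loop can be sketched as follows. The `evaluate` proxy (retained weight magnitude) and the 90%-of-full-score stopping rule stand in for the paper's actual dev-set scoring after fine-tuning; all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
W_FULL = rng.normal(size=(8, 8))

def evaluate(weights):
    # Toy stand-in for task performance after fine-tuning the top layer:
    # fraction of total weight magnitude retained (hypothetical proxy).
    return np.abs(weights).sum() / np.abs(W_FULL).sum()

def iterative_prune(weights, step=0.1, tolerance=0.9):
    """Repeatedly zero the `step` fraction of smallest surviving weights,
    stopping before performance falls below `tolerance` of the full score."""
    w = weights.copy()
    full_score = evaluate(w)
    while True:
        survivors = np.flatnonzero(w)
        k = max(1, int(step * survivors.size))
        # Indices of the k smallest-magnitude surviving weights
        smallest = survivors[np.argsort(np.abs(w.flat[survivors]))[:k]]
        candidate = w.copy()
        candidate.flat[smallest] = 0.0
        if evaluate(candidate) < tolerance * full_score:
            return w  # last subnetwork still meeting the threshold
        w = candidate

W_sub = iterative_prune(W_FULL)
```

The invariance question then amounts to comparing the surviving masks (`W_sub != 0`) across random seeds and across tasks.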

3 key takeaways

  • the heads that survive structured pruning do not seem to encode much linguistic/syntactic information
  • the heads and weights that can be pruned away can be retrained to contribute about as much as those that survive, which suggests they were doing redundant work
  • this points away from a theory that BERT is composed of modules with individual jobs, and towards one in which language processing is distributed across many heads

Bibliography

  • [prasanna-etal-2020-bert] Prasanna, Rogers & Rumshisky, "When BERT Plays the Lottery, All Tickets Are Winning", in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3208-3229, Association for Computational Linguistics (2020)

Created: 2021-09-14 Tue 21:43