Maximizing Efficiency of Language Model Pre-training for Learning Representation

Junmo Kang, Suwon Shin, Jeonghwan Kim, Jae Young Jo, Sung-Hyon Myaeng

2021

In recent years, pre-trained language models have grown exponentially in both parameter count and compute time. ELECTRA is a novel approach that improves the compute efficiency of pre-trained language models based on masked language modeling (MLM), such as BERT, by addressing their sample inefficiency with the replaced token detection (RTD) task. Our work proposes an adaptive early exit strategy that maximizes the efficiency of the pre-training process by leveraging earlier-layer representations, relieving the model's subsequent layers of the need to process latent features. Moreover, by thoroughly investigating the necessity of ELECTRA's generator module, we evaluate an initial approach to the problem that shows promising compute efficiency but has not yet succeeded in maintaining the model's accuracy.
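
The RTD objective referred to in the abstract can be sketched as follows. This is a minimal, dependency-free illustration of ELECTRA-style replaced token detection, not the authors' implementation. Assumptions: the helper names (`corrupt_tokens`, `rtd_labels`, `rtd_loss`) are hypothetical, token ids are plain integers, and a uniform random sampler stands in for ELECTRA's small MLM generator (a swapped-in technique, used only to keep the example self-contained). As in ELECTRA, a position whose sampled token happens to equal the original is labeled as original, and the discriminator loss is averaged over every input position rather than only the masked ones.

```python
import math
import random
from typing import List, Sequence


def corrupt_tokens(tokens: Sequence[int], mask_positions: Sequence[int],
                   vocab_size: int, rng: random.Random) -> List[int]:
    """Replace the tokens at mask_positions with sampled ids.

    ASSUMPTION: uniform random sampling stands in for ELECTRA's generator,
    which would instead sample from its own MLM output distribution.
    """
    corrupted = list(tokens)
    for pos in mask_positions:
        corrupted[pos] = rng.randrange(vocab_size)
    return corrupted


def rtd_labels(original: Sequence[int], corrupted: Sequence[int]) -> List[int]:
    """Per-position targets: 1 = replaced, 0 = original.

    A sampled token that equals the original is labeled 0 (original).
    """
    return [int(o != c) for o, c in zip(original, corrupted)]


def rtd_loss(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean sigmoid binary cross-entropy over every input position."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    original = [5, 17, 42, 8, 99, 3]
    corrupted = corrupt_tokens(original, mask_positions=[1, 4], vocab_size=128, rng=rng)
    labels = rtd_labels(original, corrupted)
    # Hypothetical discriminator that is confidently correct at every position.
    logits = [2.0 if y else -2.0 for y in labels]
    print(labels, round(rtd_loss(logits, labels), 4))  # loss ≈ 0.1269
```

Because the discriminator receives a learning signal at every position, instead of only at the roughly 15% of positions that MLM masks, RTD extracts more training signal per example; this is the sample inefficiency of MLM that ELECTRA targets.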
