
Perplexity (lower is better) of Neural Language Models

Scores from methods that exploit test-data statistics (e.g., cache mechanisms) are excluded.

Last updated: 2018/9/28
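As a reminder of the metric itself: perplexity is the exponential of the average negative log-likelihood per token, so a lower value means the model assigns higher probability to the held-out text. A minimal sketch (function name and inputs are illustrative, not from any listed paper):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative natural-log
    probability the model assigned to each token."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Sanity check: a model that is uniform over a 10-word vocabulary
# assigns log(1/10) to every token, giving perplexity 10.
uniform = [math.log(1 / 10)] * 5
print(round(perplexity(uniform), 6))  # 10.0
```

The same quantity is often reported as `exp(cross-entropy loss)` when the loss is averaged per token in nats.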

Penn Treebank (PTB)

| Publication | Model | Parameters | Valid | Test |
|---|---|---|---|---|
| Mikolov and Zweig ’12 | RNN | - | - | 124.7 |
| Mikolov and Zweig ’12 | RNN + LDA + Kneser-Ney smoothing | - | - | 98.3 |
| Zaremba et al. ’14 | LSTM (medium) | 20M | 86.2 | 82.7 |
| Gal and Ghahramani ’16 | Variational LSTM (medium) + Word Tying | 20M | 81.8 ± 0.2 | 79.2 ± 0.1 |
| Kim et al. ’16 | CharCNN + LSTM | 19M | - | 78.9 |
| Zaremba et al. ’14 | LSTM (large) | 66M | 82.2 | 78.4 |
| Gal and Ghahramani ’16 | Variational LSTM (large) + Word Tying | 66M | 77.3 ± 0.2 | 75.0 ± 0.1 |
| Gal and Ghahramani ’16 | Variational LSTM (large) + Word Tying + MC dropout | 66M | - | 73.4 ± 0.0 |
| Zaremba et al. ’14 | LSTM (large) Ensemble | 2.5G | 71.9 | 68.7 |
| Zilly et al. ’17 | Variational RHN + Word Tying | 23M | 67.9 | 65.4 |
| Takase et al. ’17 | Variational RHN + Word Tying + IOG | 29M | 67.0 | 64.4 |
| Zoph and Le ’17 | Neural Architecture Search + Word Tying | 54M | - | 62.4 |
| Takase et al. ’17 | Variational RHN + IOG Ensemble | 326M | 64.1 | 61.4 |
| Melis et al. ’18 | LSTM with skip connections | 24M | 60.9 | 58.3 |
| Merity et al. ’18 | AWD-LSTM | 24M | 60.0 | 57.3 |
| Yang et al. ’18 | AWD-LSTM-MoS | 22M | 56.54 | 54.44 |
| Gong et al. ’18 | AWD-LSTM-MoS + FRAGE | 22M | 55.52 | 53.31 |
| Takase et al. ’18 | AWD-LSTM-DOC | 23M | 54.12 | 52.38 |
| Takase et al. ’18 | AWD-LSTM-DOC Ensemble | 114M | 48.63 | 47.17 |