# Perplexity (lower is better) of Neural Language Models

Scores of methods that exploit test-data statistics (e.g., cache mechanisms) are excluded.

Last updated: 2018/9/28
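As a quick reference for the metric reported below: perplexity is the exponential of the average negative log-likelihood the model assigns to the target tokens. A minimal sketch (the function name and toy inputs are illustrative, not from any of the cited papers):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), natural log.

    token_log_probs: log-probabilities the model assigned to each
    target token in the evaluation corpus.
    """
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Toy example: a model that assigns probability 0.25 to each of four
# target tokens has perplexity 4 (as confused as a uniform 4-way choice).
print(perplexity([math.log(0.25)] * 4))  # ≈ 4.0
```

Lower perplexity therefore means the model spreads less probability mass over wrong continuations; a perplexity of 52.38 is roughly equivalent to choosing uniformly among ~52 tokens at every step.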
## Penn Treebank (PTB)
Publication | Model | Parameters | Valid | Test |
---|---|---|---|---|
Mikolov and Zweig ‘12 | RNN | - | - | 124.7 |
Mikolov and Zweig ‘12 | RNN + LDA + Kneser-Ney smoothing | - | - | 98.3 |
Zaremba et al. ‘14 | LSTM (medium) | 20M | 86.2 | 82.7 |
Gal and Ghahramani ‘16 | Variational LSTM (medium) + Word Tying | 20M | 81.8 ± 0.2 | 79.2 ± 0.1 |
Kim et al. ‘16 | CharCNN + LSTM | 19M | - | 78.9 |
Zaremba et al. ‘14 | LSTM (large) | 66M | 82.2 | 78.4 |
Gal and Ghahramani ‘16 | Variational LSTM (large) + Word Tying | 66M | 77.3 ± 0.2 | 75.0 ± 0.1 |
Gal and Ghahramani ‘16 | Variational LSTM (large) + Word Tying + MC dropout | 66M | - | 73.4 ± 0.0 |
Zaremba et al. ‘14 | LSTM (large) Ensemble | 2.5G | 71.9 | 68.7 |
Zilly et al. ‘17 | Variational RHN + Word Tying | 23M | 67.9 | 65.4 |
Takase et al. ‘17 | Variational RHN + Word Tying + IOG | 29M | 67.0 | 64.4 |
Zoph and Le ‘17 | Neural Architecture Search + Word Tying | 54M | - | 62.4 |
Takase et al. ‘17 | Variational RHN + IOG Ensemble | 326M | 64.1 | 61.4 |
Melis et al. ‘18 | LSTM with skip connections | 24M | 60.9 | 58.3 |
Merity et al. ‘18 | AWD-LSTM | 24M | 60.0 | 57.3 |
Yang et al. ‘18 | AWD-LSTM-MoS | 22M | 56.54 | 54.44 |
Gong et al. ‘18 | AWD-LSTM-MoS + FRAGE | 22M | 55.52 | 53.31 |
Takase et al. ‘18 | AWD-LSTM-DOC | 23M | 54.12 | 52.38 |
Takase et al. ‘18 | AWD-LSTM-DOC Ensemble | 114M | 48.63 | 47.17 |