| Hyperparameter | Value |
| --- | --- |
| Batch size | 16 |
| Dropout | 0.2 |
| Adam optimizer \(\beta_1\) | 0.9 |
| Adam optimizer \(\beta_2\) | 0.98 |
| Adam optimizer \(\epsilon\) | \(10^{-9}\) |
| Learning rate annealing steps (alignment) | 1000 |
| Learning rate annealing rate (alignment) | 0.5 |
| Gradient clipping threshold | 1.0 |
| Warm-up steps (alignment) | 1000 |
| Warm-up steps (synthesis) | 4000 |
| Training steps (alignment) | 10,000 |
| Training steps (synthesis) | 100,000 |
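As a minimal sketch, the warm-up and annealing values above can be combined into a learning-rate schedule. The table does not state the base learning rate or the exact schedule shape, so the linear warm-up, the step decay every `anneal_steps`, and the `base_lr` value below are assumptions for illustration only:

```python
def learning_rate(step, base_lr=1e-3, warmup_steps=1000,
                  anneal_steps=1000, anneal_rate=0.5):
    """Assumed schedule: linear warm-up to base_lr over warmup_steps,
    then multiply by anneal_rate every anneal_steps thereafter.
    Defaults use the alignment-stage values from the table;
    base_lr is a placeholder, not taken from the table."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    n_decays = (step - warmup_steps) // anneal_steps
    return base_lr * anneal_rate ** n_decays

# Example: schedule at a few points during alignment training.
for s in (500, 1000, 2000):
    print(s, learning_rate(s))
```

The remaining table entries (Adam \(\beta_1=0.9\), \(\beta_2=0.98\), \(\epsilon=10^{-9}\), gradient clipping at 1.0) would be passed directly to the optimizer and the clipping utility of the training framework.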