From: Channel and temporal-frequency attention UNet for monaural speech enhancement
Layer name | Input size | Hyperparameters | Output size |
---|---|---|---|
conv2d-1 | \(C \times F \times L\) | (3,3),(1,1) | \(C \times F \times L\) |
conv2d-2 | \(C \times F \times L\) | (3,3),(1,1) | \(C \times F \times L\) |
avg pooling2d | \(C \times F \times L\) | - | \(C \times 1 \times 1\) |
conv2d-3 | \(C \times 1 \times 1\) | (1,1),(1,1) | \(C/4 \times 1 \times 1\) |
conv2d-4 | \(C/4 \times 1 \times 1\) | (1,1),(1,1) | \(C \times 1 \times 1\) |
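The size columns of the table can be checked with a small shape trace. The sketch below is illustrative only and rests on several assumptions not stated in the table: the two tuples in the Hyperparameters column are read as (kernel size), (stride); the 3x3 convolutions are assumed to use a padding of 1 so that \(F\) and \(L\) are preserved; \(C\) is assumed divisible by 4; and the helper names (`conv2d_shape`, `global_avg_pool_shape`, `trace_shapes`) are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels C, frequency bins F, frames L)


def conv2d_shape(in_shape: Shape, out_channels: int,
                 kernel: Tuple[int, int], stride: Tuple[int, int],
                 padding: Tuple[int, int]) -> Shape:
    """Output shape of a 2-D convolution (standard floor formula, no dilation)."""
    _, f, l = in_shape
    f_out = (f + 2 * padding[0] - kernel[0]) // stride[0] + 1
    l_out = (l + 2 * padding[1] - kernel[1]) // stride[1] + 1
    return (out_channels, f_out, l_out)


def global_avg_pool_shape(in_shape: Shape) -> Shape:
    """Averaging over the whole F x L plane leaves one value per channel."""
    c, _, _ = in_shape
    return (c, 1, 1)


def trace_shapes(c: int, f: int, l: int) -> List[Tuple[str, Shape]]:
    """Follow the table row by row and record each layer's output shape."""
    trace: List[Tuple[str, Shape]] = []
    x: Shape = (c, f, l)

    # Assumption: padding (1, 1) so the 3x3 convolutions keep F and L unchanged.
    x = conv2d_shape(x, c, (3, 3), (1, 1), (1, 1))
    trace.append(("conv2d-1", x))
    x = conv2d_shape(x, c, (3, 3), (1, 1), (1, 1))
    trace.append(("conv2d-2", x))

    x = global_avg_pool_shape(x)
    trace.append(("avg pooling2d", x))

    # 1x1 convolutions: reduce the channels to C/4, then restore them to C.
    x = conv2d_shape(x, c // 4, (1, 1), (1, 1), (0, 0))
    trace.append(("conv2d-3", x))
    x = conv2d_shape(x, c, (1, 1), (1, 1), (0, 0))
    trace.append(("conv2d-4", x))
    return trace


if __name__ == "__main__":
    # Example sizes chosen for illustration only; they are not from the table.
    for name, shape in trace_shapes(c=64, f=161, l=100):
        print(f"{name:14s} -> {shape}")
```

Under these assumptions, the first two rows leave the \(C \times F \times L\) size unchanged, the pooling row collapses it to \(C \times 1 \times 1\), and the last two rows form a channel bottleneck (\(C \to C/4 \to C\)) on the pooled descriptor.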