From: Channel and temporal-frequency attention UNet for monaural speech enhancement
Layer name | Input size | Hyperparameters (kernel size, stride) | Output size |
---|---|---|---|
avg pooling2d-1 | \(C \times F \times L\) | - | \(C \times 1 \times 1\) |
max pooling2d-1 | \(C \times F \times L\) | - | \(C \times 1 \times 1\) |
conv2d-1 | \(C \times 1 \times 1\) | (1,1),(1,1) | \(C \times 1 \times 1\) |
conv2d-2 | \(C \times 1 \times 1\) | (1,1),(1,1) | \(C \times 1 \times 1\) |
avg pooling2d-2 | \(C \times F \times L\) | - | \(1 \times F \times L\) |
max pooling2d-2 | \(C \times F \times L\) | - | \(1 \times F \times L\) |
concatenation | \(2 \times (1 \times F \times L)\) | - | \(2 \times F \times L\) |
conv2d-3 | \(2 \times F \times L\) | (7,7),(1,1) | \(1 \times F \times L\) |
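
The shapes in the table can be checked with a small shape-propagation sketch. It is only an illustration under stated assumptions, not the authors' implementation: the hyperparameter pairs are read as (kernel size), (stride); `conv2d-1` and `conv2d-2` are assumed to be applied in sequence; and `conv2d-3` is assumed to use a padding of 3 per side, since a 7×7 kernel with stride 1 only preserves \(F \times L\) with that padding. The function names `conv2d_out` and `trace_shapes` are hypothetical.

```python
def conv2d_out(shape, out_channels, kernel, stride, padding):
    """Output shape (C, F, L) of a 2-D convolution, using the standard
    formula floor((n + 2p - k) / s) + 1 along each spatial axis."""
    _, f, l = shape
    f_out = (f + 2 * padding[0] - kernel[0]) // stride[0] + 1
    l_out = (l + 2 * padding[1] - kernel[1]) // stride[1] + 1
    return (out_channels, f_out, l_out)


def trace_shapes(c, f, l):
    """Return the output shape of every layer in the table for a C x F x L input."""
    shapes = {}

    # Channel branch: global pooling over (F, L) leaves one value per channel.
    shapes["avg pooling2d-1"] = (c, 1, 1)
    shapes["max pooling2d-1"] = (c, 1, 1)
    # 1x1 convolutions, stride 1, no padding (assumed to be applied in sequence).
    shapes["conv2d-1"] = conv2d_out((c, 1, 1), c, (1, 1), (1, 1), (0, 0))
    shapes["conv2d-2"] = conv2d_out(shapes["conv2d-1"], c, (1, 1), (1, 1), (0, 0))

    # Temporal-frequency branch: pooling over the channel axis leaves one map.
    shapes["avg pooling2d-2"] = (1, f, l)
    shapes["max pooling2d-2"] = (1, f, l)
    # Stacking the two pooled maps along the channel axis.
    shapes["concatenation"] = (2, f, l)
    # 7x7 convolution, stride 1; padding 3 is an assumption that keeps F x L.
    shapes["conv2d-3"] = conv2d_out(shapes["concatenation"], 1, (7, 7), (1, 1), (3, 3))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(c=64, f=161, l=100).items():
        print(f"{name:16s} -> {shape}")
```

The example values `c=64`, `f=161`, `l=100` are arbitrary; any positive sizes give the same pattern of \(C \times 1 \times 1\) for the channel branch and \(1 \times F \times L\) for the temporal-frequency branch.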