From: Predominant audio source separation in polyphonic music
Output size (channels × height × width) | Description |
---|---|
3 × 256 × 256 | Input spectrogram |
64 × 256 × 256 | 7 × 7 Conv, 64 filters, stride 1, pad 3 |
64 × 256 × 256 | Instance normalization |
64 × 256 × 256 | ReLU |
128 × 128 × 128 | 3 × 3 Conv, 128 filters, stride 2, pad 1 |
128 × 128 × 128 | Instance normalization |
128 × 128 × 128 | ReLU |
256 × 64 × 64 | 3 × 3 Conv, 256 filters, stride 2, pad 1 |
256 × 64 × 64 | Instance normalization |
256 × 64 × 64 | ReLU |
256 × 64 × 64 | 9 consecutive ResNet blocks, 256 filters |
128 × 128 × 128 | 3 × 3 transposed Conv, 128 filters, stride 2, pad 1 |
128 × 128 × 128 | Instance normalization |
128 × 128 × 128 | ReLU |
64 × 256 × 256 | 3 × 3 transposed Conv, 64 filters, stride 2, pad 1 |
64 × 256 × 256 | Instance normalization |
64 × 256 × 256 | ReLU |
3 × 256 × 256 | 7 × 7 Conv, 3 filters, stride 1, pad 3 |
3 × 256 × 256 | Instance normalization |
3 × 256 × 256 | Tanh |
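
The sizes in the table can be checked with the standard output-size formulas for convolutions. The following is a minimal sketch, not the authors' code. It assumes that the two layers which double the spatial resolution (64 → 128 and 128 → 256) are transposed convolutions with stride 2, pad 1 and an output padding of 1. The output padding is not given in the table and is chosen here only because it reproduces the listed sizes. Instance normalization, ReLU, Tanh and the ResNet blocks are taken to preserve shape and are therefore omitted. The names `conv_out`, `deconv_out`, `LAYERS` and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a standard convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride, pad, output_pad=0):
    """Spatial output size of a transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel + output_pad


# (kind, out_channels, kernel, stride, pad, output_pad)
# output_pad = 1 on the transposed layers is an assumption, not from the table.
LAYERS = [
    ("conv", 64, 7, 1, 3, 0),
    ("conv", 128, 3, 2, 1, 0),
    ("conv", 256, 3, 2, 1, 0),
    # 9 ResNet blocks sit here; they keep the shape at 256 x 64 x 64.
    ("deconv", 128, 3, 2, 1, 1),
    ("deconv", 64, 3, 2, 1, 1),
    ("conv", 3, 7, 1, 3, 0),
]


def trace_shapes(channels=3, size=256):
    """Return the (channels, height, width) shape after each conv layer."""
    shapes = [(channels, size, size)]
    for kind, out_ch, k, s, p, op in LAYERS:
        if kind == "conv":
            size = conv_out(size, k, s, p)
        else:
            size = deconv_out(size, k, s, p, op)
        shapes.append((out_ch, size, size))
    return shapes


if __name__ == "__main__":
    for shape in trace_shapes():
        print(" x ".join(str(d) for d in shape))
```

Under these assumptions the trace starts and ends at 3 × 256 × 256, with the bottleneck at 256 × 64 × 64, which matches the sizes listed in the table.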