Table 4 Architecture of RCAM

From: Channel and temporal-frequency attention UNet for monaural speech enhancement

| Layer name | Input size | Hyperparameters | Output size |
| --- | --- | --- | --- |
| conv2d-1 | \(C \times F \times L\) | (3,3),(1,1) | \(C \times F \times L\) |
| conv2d-2 | \(C \times F \times L\) | (3,3),(1,1) | \(C \times F \times L\) |
| avg pooling2d | \(C \times F \times L\) | - | \(C \times 1 \times 1\) |
| conv2d-3 | \(C \times 1 \times 1\) | (1,1),(1,1) | \(C/4 \times 1 \times 1\) |
| conv2d-4 | \(C/4 \times 1 \times 1\) | (1,1),(1,1) | \(C \times 1 \times 1\) |
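
The output sizes listed in Table 4 can be checked with a small shape trace. The sketch below is an illustration only and rests on assumptions that the table does not state: the two hyperparameter tuples are read as (kernel size), (stride); each convolution is padded by (k - 1) / 2 so that a 3x3 kernel preserves \(F \times L\); and the average pooling is global over both axes. The function names and the example values C = 64, F = 161, L = 100 are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Standard convolution output length along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_shape(shape, out_channels, kernel, stride):
    """Shape after a 2-D convolution.

    Assumption: padding is (k - 1) // 2 on each axis, which keeps the
    spatial size unchanged for odd kernels with stride 1.
    """
    _, f, l = shape
    pad_f = (kernel[0] - 1) // 2
    pad_l = (kernel[1] - 1) // 2
    return (
        out_channels,
        conv_out(f, kernel[0], stride[0], pad_f),
        conv_out(l, kernel[1], stride[1], pad_l),
    )


def global_avg_pool_shape(shape):
    """Assumption: pooling averages over the whole F x L plane."""
    channels, _, _ = shape
    return (channels, 1, 1)


def rcam_shapes(c, f, l):
    """Trace the tensor shape through the five layers of Table 4."""
    shapes = {}
    x = (c, f, l)
    x = conv2d_shape(x, c, (3, 3), (1, 1))
    shapes["conv2d-1"] = x
    x = conv2d_shape(x, c, (3, 3), (1, 1))
    shapes["conv2d-2"] = x
    x = global_avg_pool_shape(x)
    shapes["avg pooling2d"] = x
    x = conv2d_shape(x, c // 4, (1, 1), (1, 1))  # channel reduction by 4
    shapes["conv2d-3"] = x
    x = conv2d_shape(x, c, (1, 1), (1, 1))       # channel restoration
    shapes["conv2d-4"] = x
    return shapes


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for name, shape in rcam_shapes(64, 161, 100).items():
        print(f"{name:14s} {shape}")
```

Under these assumptions the trace reproduces each row of the table: the two 3x3 convolutions keep \(C \times F \times L\), the pooling collapses it to \(C \times 1 \times 1\), and the two 1x1 convolutions reduce the channels to \(C/4\) and restore them to \(C\).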