Table 5 Architecture of CTFSC

From: Channel and temporal-frequency attention UNet for monaural speech enhancement

| Layer name | Input size | Hyperparameters | Output size |
| --- | --- | --- | --- |
| avg pooling2d-1 | \(C \times F \times L\) | - | \(C \times 1 \times 1\) |
| max pooling2d-1 | \(C \times F \times L\) | - | \(C \times 1 \times 1\) |
| conv2d-1 | \(C \times 1 \times 1\) | (1,1),(1,1) | \(C \times 1 \times 1\) |
| conv2d-2 | \(C \times 1 \times 1\) | (1,1),(1,1) | \(C \times 1 \times 1\) |
| avg pooling2d-2 | \(C \times F \times L\) | - | \(1 \times F \times L\) |
| max pooling2d-2 | \(C \times F \times L\) | - | \(1 \times F \times L\) |
| concatenation | \(2, 1 \times F \times L\) | - | \(2 \times F \times L\) |
| conv2d-3 | \(2 \times F \times L\) | (7,7),(1,1) | \(1 \times F \times L\) |
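
The sizes in the table can be checked with a little shape arithmetic. The sketch below is illustrative only and rests on assumptions the table does not state: the hyperparameter pairs are read as (kernel size),(stride); the 1x1 convolutions use no padding; and conv2d-3 uses a padding of 3 on each side, which is the value needed for a 7x7, stride-1 convolution to keep \(F \times L\) unchanged. The function names `conv2d_out` and `trace_ctfsc_shapes` are hypothetical, and the code only traces per-layer output sizes; it does not model how the layers are wired together.

```python
def conv2d_out(size, kernel, stride, padding):
    """Spatial output size of a 2-D convolution without dilation."""
    return tuple(
        (s + 2 * p - k) // st + 1
        for s, k, st, p in zip(size, kernel, stride, padding)
    )


def trace_ctfsc_shapes(C, F, L):
    """Return the output size of each layer in the table as (channels, F, L)."""
    shapes = {}

    # Pooling over the F and L axes leaves one value per channel.
    shapes["avg pooling2d-1"] = (C, 1, 1)
    shapes["max pooling2d-1"] = (C, 1, 1)

    # (1,1),(1,1) read as kernel 1x1, stride 1; zero padding is assumed.
    h, w = conv2d_out((1, 1), (1, 1), (1, 1), (0, 0))
    shapes["conv2d-1"] = (C, h, w)
    shapes["conv2d-2"] = (C, h, w)

    # Pooling over the channel axis leaves a single F x L map.
    shapes["avg pooling2d-2"] = (1, F, L)
    shapes["max pooling2d-2"] = (1, F, L)

    # Stacking the two 1 x F x L maps along the channel axis.
    shapes["concatenation"] = (2, F, L)

    # (7,7),(1,1) read as kernel 7x7, stride 1; padding 3 is an assumption.
    h, w = conv2d_out((F, L), (7, 7), (1, 1), (3, 3))
    shapes["conv2d-3"] = (1, h, w)

    return shapes


if __name__ == "__main__":
    # Example sizes chosen arbitrarily for illustration.
    for name, shape in trace_ctfsc_shapes(C=64, F=161, L=100).items():
        print(f"{name:16s} -> {shape}")
```

With any other padding for conv2d-3 the last row would no longer come out as \(1 \times F \times L\), which is why a padding of 3 is the natural reading of that row.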