Fig. 3 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 3

From: YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation

The framework of the proposed YuYin model for recommending background music for e-commerce advertisements. A WFM fuses emotion features and audio features into music features. The extracted features are then projected into a common space for multi-task learning. In that space, the video, music, and text projections \(z_v\), \(z_m\), and \(z_t\) are cross-matched pairwise to compute the NCE loss. They are also passed through a weight-shared classifier to obtain the prediction probabilities \(p_v\), \(p_m\), and \(p_t\), which are compared with the true label through a cross-entropy loss that serves as the prediction loss.
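
The two loss terms named in the caption can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes an InfoNCE-style contrastive form for the NCE loss, all three modality pairs, a linear classifier, and an equal weighting of the two terms. The names `info_nce`, `multitask_loss`, `temperature`, and `alpha` are hypothetical and do not come from the paper.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def info_nce(za, zb, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed form of the NCE loss).

    Row i of za is the positive match for row i of zb; every other
    row in the batch serves as a negative.
    """
    za = [normalize(v) for v in za]
    zb = [normalize(v) for v in zb]
    n = len(za)
    sim = [[dot(a, b) / temperature for b in zb] for a in za]
    loss_ab = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    loss_ba = -sum(
        log_softmax([sim[j][i] for j in range(n)])[i] for i in range(n)
    ) / n
    return 0.5 * (loss_ab + loss_ba)


def cross_entropy(batch, labels, weights, bias):
    """Mean cross-entropy of a linear classifier (assumed architecture)."""
    total = 0.0
    for z, y in zip(batch, labels):
        logits = [dot(w, z) + b for w, b in zip(weights, bias)]
        total -= log_softmax(logits)[y]
    return total / len(batch)


def multitask_loss(z_v, z_m, z_t, labels, weights, bias, alpha=1.0):
    """Pairwise contrastive loss plus shared-classifier prediction loss."""
    # Cross-match every pair of modalities in the common space.
    nce = info_nce(z_v, z_m) + info_nce(z_v, z_t) + info_nce(z_m, z_t)
    # The same classifier weights score all three projections.
    ce = sum(cross_entropy(z, labels, weights, bias) for z in (z_v, z_m, z_t))
    return nce + alpha * ce  # alpha is a hypothetical balancing weight
```

Passing the same `weights` and `bias` for all three projections is what makes the classifier weight-shared in this sketch: one set of parameters receives gradient signal from the video, music, and text branches.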
