Articles
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:21
Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
Speech signals are often distorted by reverberation and noise, with a widely distributed signal-to-noise ratio (SNR). To address this, our study develops robust, deep neural network (DNN)-based speech enhancem...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:20
Multi-rate modulation encoding via unsupervised learning for audio event detection
Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) in order to detect sound events in an audio recording. Effective AED techniques rely heavily...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:19
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
Spoofed speech is becoming a serious threat to society due to advances in artificial intelligence techniques. Therefore, an automated spoofing detector is needed that can be integrated into automatic sp...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:18
Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks
Most soundfield synthesis approaches deal with extensive and regular loudspeaker arrays, which are often not suitable for home audio systems, due to physical space constraints. In this article, we propose a te...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:17
An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment
Audio augmented reality (AAR), a prominent topic in the field of audio, requires understanding the listening environment of the user for rendering an authentic virtual auditory object. Reverberation time (
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:16
Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge
The vast amount of information stored in audio repositories makes it necessary to develop efficient, automatic methods to search audio content. In that direction, search on speech (SoS) has received...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:15
Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning
Analyzing songs is a problem being investigated to aid various operations on music access platforms. The first of these problems is identifying the person who sings a song. In this s...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:14
Sound field reconstruction using neural processes with dynamic kernels
Accurately representing the sound field with high spatial resolution is crucial for immersive and interactive sound field reproduction technology. In recent studies, there has been a notable emphasis on effici...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:13
Automatic classification of the physical surface in sound uroflowmetry using machine learning methods
This work constitutes the first approach for automatically classifying the surface that the voiding flow impacts in non-invasive sound uroflowmetry tests using machine learning. Often, the voiding flow impacts...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:12
Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources
Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech (TTS) models possess the capability to generate speech of excep...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:11
Vulnerability issues in Automatic Speaker Verification (ASV) systems
Claimed identities of speakers can be verified by means of automatic speaker verification (ASV) systems, also known as voice biometric systems. Focusing on security and robustness against spoofing attacks on A...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:10
Blind extraction of guitar effects through blind system inversion and neural guitar effect modeling
Audio effects are a ubiquitous tool in music production due to the interesting ways in which they can shape the sound of music. Guitar effects, the subset of all audio effects focusing on guitar signals, are ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:9
Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement
Recent advancements in deep learning-based speech enhancement models have extensively used attention mechanisms to achieve state-of-the-art methods by demonstrating their effectiveness. This paper proposes a t...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:8
Acoustical feature analysis and optimization for aesthetic recognition of Chinese traditional music
Chinese traditional music, a vital expression of Chinese cultural heritage, possesses both a profound emotional resonance and artistic allure. This study sets forth to refine and analyze the acoustical feature...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:7
Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder
Speech coding is a method to reduce the amount of data needed to represent speech signals by exploiting their statistical properties. Recently, in the speech coding process, a neural network ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:6
Correction: Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:5
Generating chord progression from melody with flexible harmonic rhythm and controllable harmonic density
Melody harmonization, which involves generating a chord progression that complements a user-provided melody, continues to pose a significant challenge. A chord progression must not only be in harmony with the ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:4
Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control
Musical instrument sound synthesis (MISS) often utilizes a text-to-speech framework because of its similarity to speech in terms of generating sounds from symbols. Moreover, a plucked string instrument, such a...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:3
Significance of relative phase features for shouted and normal speech classification
Shouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are d...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:2
Deep semantic learning for acoustic scene classification
Acoustic scene classification (ASC) is the process of identifying the acoustic environment or scene from which an audio signal is recorded. In this work, we propose an encoder-decoder-based approach to ASC, wh...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:1
Online distributed waveform-synchronization for acoustic sensor networks with dynamic topology
Acoustic sensing by multiple devices connected in a wireless acoustic sensor network (WASN) creates new opportunities for multichannel signal processing. However, the autonomy of agents in such a network still...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:55
Signal processing and machine learning for speech and audio in acoustic sensor networks
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:54
Lightweight target speaker separation network based on joint training
Target speaker separation aims to separate the speech components of the target speaker from mixed speech and remove extraneous components such as noise. In recent years, deep learning-based speech separation m...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:53
Piano score rearrangement into multiple difficulty levels via notation-to-notation approach
Musical score rearrangement is an emerging area in symbolic music processing, which aims to transform a musical score into a different style. This study focuses on the task of changing the playing difficulty o...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:52
Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model
The task of bandwidth extension addresses the generation of missing high frequencies of audio signals based on knowledge of the low-frequency part of the sound. This task applies to various problems, such as a...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:51
Effective acoustic parameters for automatic classification of performed and synthesized Guzheng music
This study focuses on exploring the acoustic differences between synthesized Guzheng pieces and real Guzheng performances, with the aim of improving the quality of synthesized Guzheng music. A dataset with con...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:50
Predominant audio source separation in polyphonic music
Predominant source separation is the separation of one or more desired predominant signals, such as voice or leading instruments, from polyphonic music. The proposed work uses time-frequency filtering on predo...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:49
A survey of technologies for automatic Dysarthric speech recognition
Speakers with dysarthria often struggle to accurately pronounce words and effectively communicate with others. Automatic speech recognition (ASR) is a powerful tool for extracting the content from speakers wit...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:48
Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling
This article presents the research work on improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling. The speech recognition system is b...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:47
Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios
Speaker embeddings, from the ECAPA-TDNN speaker verification network, were recently introduced as features for the task of clustering microphones in ad hoc arrays. Our previous work demonstrated that, in compa...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:46
W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision
Non-parallel data voice conversion (VC) has achieved considerable breakthroughs due to self-supervised pre-trained representation (SSPR) being used in recent years. Features extracted by the pre-trained model ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:45
YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation
Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors like emotion and product category should be taken into account, which mak...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:44
Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy
Snoring affects 57% of men, 40% of women, and 27% of children in the USA. Moreover, snoring is highly correlated with obstructive sleep apnoea (OSA), which is characterised by loud and frequent snoring. OSA ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:43
Transformer-based autoencoder with ID constraint for unsupervised anomalous sound detection
Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:42
Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments
In recent years, the speaker-independent, single-channel speech separation problem has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each inte...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:41
Speech emotion recognition based on Graph-LSTM neural network
Currently, Graph Neural Networks have been extended to the field of speech signal processing, as graphs offer a more compact and flexible way to represent speech sequences. However, the structures of the rel...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:40
An acoustic echo canceller optimized for hands-free speech telecommunication in large vehicle cabins
Acoustic echo cancelation (AEC) is a system identification problem that has been addressed by various techniques and most commonly by normalized least mean square (NLMS) adaptive algorithms. However, performin...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:39
Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization
In this paper, two approaches are proposed for estimating the direction of arrival (DOA) and power spectral density (PSD) of stationary point sources by using a single, rotating, directional microphone. These ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:38
Cascade algorithms for combined acoustic feedback cancelation and noise reduction
This paper presents three cascade algorithms for combined acoustic feedback cancelation (AFC) and noise reduction (NR) in speech applications. A prediction error method (PEM)-based adaptive feedback cancelatio...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:37
Learning-based robust speaker counting and separation with the aid of spatial coherence
A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whiten...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:36
Acoustic object canceller: removing a known signal from monaural recording using blind synchronization
In this paper, we propose a technique for removing a specific type of interference from a monaural recording. Nonstationary interferences are generally challenging to eliminate from such recordings. However, i...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:35
The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study
Traffic congestion can lead to negative driving emotions, significantly increasing the likelihood of traffic accidents. Reducing negative driving emotions as a means to mitigate speeding, reckless overtaking, ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:34
Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning
Speaker recognition, the process of automatically identifying a speaker based on individual characteristics in speech signals, presents significant challenges when addressing heterogeneous-domain conditions. F...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:33
Dual input neural networks for positional sound source localization
In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) a...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:32
Training audio transformers for cover song identification
In the past decades, convolutional neural networks (CNNs) have been commonly adopted in audio perception tasks, which aim to learn latent representations. However, for audio analysis, CNNs may exhibit limitati...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:31
Channel and temporal-frequency attention UNet for monaural speech enhancement
The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancemen...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:30
Microphone utility estimation in acoustic sensor networks using single-channel signal features
In multichannel signal processing with distributed sensors, choosing the optimal subset of observed sensor signals to be exploited is crucial in order to maximize algorithmic performance and reduce computation...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:29
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting
Personalized voice triggering is a key technology in voice assistants and serves as the first step for users to activate the voice assistant. Personalized voice triggering involves keyword spotting (KWS) and s...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:28
Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization
The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time from a piece of audio, while estimating its spatial location at the time of acti...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:27
Annual Journal Metrics
2022 Citation Impact
2.4 - 2-year Impact Factor
2.0 - 5-year Impact Factor
1.081 - SNIP (Source Normalized Impact per Paper)
0.458 - SJR (SCImago Journal Rank)
2023 Speed
17 days submission to first editorial decision for all manuscripts (Median)
154 days submission to accept (Median)
2023 Usage
368,607 downloads
70 Altmetric mentions
ISSN: 1687-4722 (electronic)