Effects of emotional speech on forensic voice comparison using deep speaker embeddings

Emotional conditions play a significant role in forensic voice comparison and speaker verification systems. When emotion is present in speech, the verification's performance will deteriorate. In this paper, speaker verification has been investigated and analyzed in the case of emotional speech...

Bibliographic details
Authors:	Abed Mohammed Hamzah; Sztahó Dávid
Corporate author:	Magyar számítógépes nyelvészeti konferencia (19.)
Document type:	Book part
Published:	2023
Series:	Magyar Számítógépes Nyelvészeti Konferencia 19
Keywords:	Linguistics - computer applications
Subjects:	Natural sciences; Computer and information science
Online Access:	http://acta.bibl.u-szeged.hu/78411


LEADER	01945naa a2200241 i 4500
001	acta78411
005	20230316080450.0
008	230316s2023 hu o 1\|\| eng d
020			\|a 978-963-306-912-7
040			\|a SZTE Egyetemi Kiadványok Repozitórium \|b hun
041			\|a eng
100	1		\|a Abed Mohammed Hamzah
245	1	0	\|a Effects of emotional speech on forensic voice comparison using deep speaker embeddings \|h [elektronikus dokumentum] / \|c Abed Mohammed Hamzah
260			\|c 2023
300			\|a 159-170
490	0		\|a Magyar Számítógépes Nyelvészeti Konferencia \|v 19
520	3		\|a Emotional conditions play a significant role in forensic voice comparison and speaker verification systems. When emotion is present in speech, the verification's performance will deteriorate. In this paper, speaker verification has been investigated and analyzed in the case of emotional speech using metrics evaluating the performance of forensic voice comparison using pre-trained speaker embedding models: x-vector and ECAPA-TDNN for embedded feature extraction. This study investigates whether emotional content affects the forensic voice comparison and verification performance evaluated on a Hungarian speech dataset. The speaker verification performance has been assessed using the likelihood-ratio framework using Cllr and Cllrmin and Equal Error Rate. The ECAPA-TDNN achieved higher performance than the x-vector. In the same emotion scenario, the best EERs were 2.6% and 7.7% for ECAPA-TDNN and x-vector. Both models are sensitive to the emotional content of the speech samples.
650		4	\|a Természettudományok
650		4	\|a Számítás- és információtudomány
695			\|a Nyelvészet - számítógép alkalmazása
700	0	1	\|a Sztahó Dávid \|e aut
711			\|a Magyar számítógépes nyelvészeti konferencia (19.) \|c Szeged \|d 2023. január 26-27.
856	4	0	\|u http://acta.bibl.u-szeged.hu/78411/1/msznykonf_019_159-170..pdf \|z Dokumentum-elérés
