Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm

Bibliographic details
Authors:	Tóth Krisztina; Farkas Richárd; Kocsor András
Corporate author:	Conference for PhD Students in Computer Science (5.) (2006) (Szeged)
Document type:	Article
Published:	2008
Series:	Acta Cybernetica, Vol. 18, No. 3
Keywords:	Computing, Cybernetics, Algorithm
Subject terms:	Natural sciences; Computer and information science
Online Access:	http://acta.bibl.u-szeged.hu/12830

Descriptive data
Abstract:	We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition. This pairs the speed of length-based models with the accuracy of anchor-finding methods. Cognate detection is extremely inaccurate for the Hungarian-English language pair, so we adopted a novel approach that uses Named Entity recognition to find anchors. Thanks to these well-selected anchors, the algorithm outperformed the two best sentence alignment algorithms published so far for the Hungarian-English language pair.
Extent/Physical description:	pp. 463-478
ISSN:	0324-721X
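The abstract describes combining a length-based aligner with anchors found through Named Entity recognition. The Python sketch below illustrates that general idea only. It is not the authors' algorithm, and it rests on several assumptions:

- It uses a Gale-Church style length cost, with the parameters and bead priors reported by Gale and Church (1993). The 2-2 bead is omitted.
- A crude heuristic based on numbers and capitalized tokens stands in for a real Named Entity recognizer.
- `ANCHOR_BONUS` is a hypothetical weight.
- All function names (`length_cost`, `entities`, `shared_entities`, `align`) are illustrative.

```python
import math
import re

# Gale-Church length model parameters (as reported by Gale & Church, 1993).
C = 1.0    # expected target characters per source character
S2 = 6.8   # variance of that ratio per source character
# Prior probabilities of bead types: (source sentences, target sentences).
# The 2-2 bead is omitted for brevity.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099, (2, 1): 0.089, (1, 2): 0.089}
ANCHOR_BONUS = 2.0  # hypothetical cost reduction per shared named-entity anchor


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(l1, l2, bead):
    """Negative log probability of a bead under the Gale-Church length model."""
    mean = (l1 + l2 / C) / 2.0
    delta = (l2 - l1 * C) / math.sqrt(max(mean, 1e-9) * S2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(PRIORS[bead]) - math.log(max(p, 1e-12))


def entities(sentence):
    """Hypothetical stand-in for NER: numbers and capitalized non-initial tokens."""
    tokens = re.findall(r"\w+", sentence)
    return {t.lower() for i, t in enumerate(tokens)
            if t.isdigit() or (i > 0 and t[0].isupper())}


def shared_entities(src_ents, tgt_ents):
    """Count source entities that match some target entity.

    Prefix matching is a crude way to tolerate Hungarian suffixes,
    e.g. "budapestre" vs. "budapest".
    """
    def match(a, b):
        return a == b or (min(len(a), len(b)) >= 3
                          and (a.startswith(b) or b.startswith(a)))
    return sum(1 for a in src_ents if any(match(a, b) for b in tgt_ents))


def align(src, tgt):
    """Align two sentence lists by dynamic programming.

    Returns a list of beads, each a pair (source indices, target indices).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every cell before any cell reachable from it.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s, t = src[i:ni], tgt[j:nj]
                c = length_cost(sum(map(len, s)), sum(map(len, t)), (di, dj))
                # Anchor term: shared named entities make the bead cheaper.
                s_ents = set().union(*(entities(x) for x in s))
                t_ents = set().union(*(entities(x) for x in t))
                c -= ANCHOR_BONUS * shared_entities(s_ents, t_ents)
                if cost[i][j] + c < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + c
                    back[ni][nj] = (i, j)
    # Trace back the cheapest path from the final cell.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


hu = ["Kovács János tegnap Budapestre utazott.",
      "Az ülés 1998-ban kezdődött.",
      "Ez egy rövid mondat."]
en = ["Yesterday János Kovács travelled to Budapest.",
      "The meeting started in 1998.",
      "This is a short sentence."]
print(align(hu, en))  # [([0], [0]), ([1], [1]), ([2], [2])]
```

The length term alone is cheap to compute. The anchor term lowers the cost of beads whose sentences share names or numbers, which is the role the abstract assigns to Named Entity recognition in place of cognates.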
