Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm

Bibliographic details
Authors: Tóth Krisztina
Farkas Richárd
Kocsor András
Corporate author: Conference for PhD Students in Computer Science (5.) (2006) (Szeged)
Document type: Article
Published: 2008
Series: Acta cybernetica 18 No. 3
Keywords: Computer science, Cybernetics, Algorithm
Online Access: http://acta.bibl.u-szeged.hu/12830
LEADER 01597nab a2200253 i 4500
001 acta12830
005 20220616155334.0
008 161015s2008 hu o 0|| eng d
022 |a 0324-721X 
040 |a SZTE Egyetemi Kiadványok Repozitórium  |b hun 
041 |a eng 
100 1 |a Tóth Krisztina 
245 1 0 |a Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm  |h [electronic document] /  |c  Tóth Krisztina 
260 |c 2008 
300 |a 463-478 
490 0 |a Acta cybernetica  |v 18 No. 3 
520 3 |a We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm is composed of a length-based method and an anchor-matching method that uses Named Entity recognition. It combines the speed of length-based models with the accuracy of anchor-finding methods. Because the accuracy of cognate detection for the Hungarian-English language pair is extremely low, we adopted a novel approach that includes Named Entity recognition. Owing to its well-selected anchors, the method was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair. 
650 4 |a Natural sciences 
650 4 |a Computer and information sciences 
695 |a Computer science, Cybernetics, Algorithm 
700 0 1 |a Farkas Richárd  |e aut 
700 0 1 |a Kocsor András  |e aut 
710 |a Conference for PhD Students in Computer Science (5.) (2006) (Szeged) 
856 4 0 |u http://acta.bibl.u-szeged.hu/12830/1/Toth_2008_ActaCybernetica.pdf  |z Document access
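The abstract (field 520) describes a hybrid aligner that pairs a length-based model with anchors derived from Named Entity recognition. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a Gale-Church-style length cost with Gale and Church's default parameters (c = 1, s² = 6.8, not tuned for Hungarian-English), approximate bead priors, a hypothetical `anchor_weight` reward per shared entity, and a hypothetical `crude_entities` heuristic (non-initial capitalized tokens and numbers) standing in for a real Named Entity recognizer.

```python
import math
import re
from typing import List, Optional, Set, Tuple

# Bead types (source sentences, target sentences) with approximate prior
# probabilities, loosely following Gale and Church (1993). Illustrative values.
BEAD_PRIORS = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
               (2, 1): 0.045, (1, 2): 0.045}

WORD_RE = re.compile(r"\w+")


def crude_entities(sentence: str) -> Set[str]:
    """Hypothetical stand-in for NER: non-initial capitalized tokens and numbers."""
    tokens = WORD_RE.findall(sentence)
    names = {t for t in tokens[1:] if t[0].isupper()}
    numbers = {t for t in tokens if t.isdigit()}
    return names | numbers


def length_cost(src_len: int, tgt_len: int, c: float = 1.0, s2: float = 6.8) -> float:
    """Gale-Church-style cost: -log P(delta) for the normalized length difference."""
    if src_len + tgt_len == 0:
        return 0.0
    mean = (src_len + tgt_len / c) / 2.0
    delta = (tgt_len - src_len * c) / math.sqrt(mean * s2)
    # Two-tailed probability of |delta| under a standard normal distribution.
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p, 1e-12))


def align(src: List[str], tgt: List[str],
          anchor_weight: float = 2.0) -> List[Tuple[List[int], List[int]]]:
    """Dynamic-programming alignment; shared entities lower a bead's cost."""
    n, m = len(src), len(tgt)
    src_ents = [crude_entities(s) for s in src]
    tgt_ents = [crude_entities(t) for t in tgt]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every cell is final before it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                bead_cost = length_cost(s_len, t_len) - math.log(prior)
                # Anchor term: each entity found on both sides lowers the cost.
                shared = set().union(*src_ents[i:ni]) & set().union(*tgt_ents[j:nj])
                bead_cost -= anchor_weight * len(shared)
                if cost[i][j] + bead_cost < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + bead_cost
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from (n, m) to (0, 0).
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    english = ["Peter Smith arrived in Budapest in 2006.", "It was raining.",
               "The conference was held in Szeged."]
    hungarian = ["Peter Smith 2006-ban érkezett Budapestre.", "Esett az eső.",
                 "A konferenciát Szegeden tartották."]
    # Expected here: three 1-1 beads, ([0], [0]), ([1], [1]), ([2], [2]).
    for en_ids, hu_ids in align(english, hungarian):
        print(en_ids, hu_ids)
```

Exact string matching is a deliberate simplification: because Hungarian attaches case suffixes to names, "Budapest"/"Budapestre" and "Szeged"/"Szegeden" are not matched, so in this example only "Smith" and "2006" act as anchors. A practical system would need lemmatization or fuzzy matching on the Hungarian side.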