Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm
We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition. This algorithm combines the speed of length-based models with the accura...
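| Authors: | Tóth Krisztina; Farkas Richárd; Kocsor András |
|---|---|
| Corporate author: | Conference for PhD Students in Computer Science (5.) (2006) (Szeged) |
| Document type: | Article |
| Published: | 2008 |
| Series: | Acta cybernetica 18 No. 3 |
| Keywords: | Computer science, Cybernetics, Algorithm |
| Subjects: | Natural sciences; Computer and information science |
| Online Access: | http://acta.bibl.u-szeged.hu/12830 |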
| LEADER | 01597nab a2200253 i 4500 | ||
|---|---|---|---|
| 001 | acta12830 | ||
| 005 | 20220616155334.0 | ||
| 008 | 161015s2008 hu o 0|| eng d | ||
| 022 | |a 0324-721X | ||
| 040 | |a SZTE Egyetemi Kiadványok Repozitórium |b hun | ||
| 041 | |a eng | ||
| 100 | 1 | |a Tóth Krisztina | |
| 245 | 1 | 0 | |a Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm |h [electronic document] / |c Tóth Krisztina |
| 260 | |c 2008 | ||
| 300 | |a 463-478 | ||
| 490 | 0 | |a Acta cybernetica |v 18 No. 3 | |
| 520 | 3 | |a We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition. This algorithm combines the speed of length-based models with the accuracy of anchor finding methods. The accuracy of finding cognates for the Hungarian-English language pair is extremely low, so we use a novel approach that includes Named Entity recognition. Owing to the well-selected anchors, it was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair. | |
| 650 | 4 | |a Natural sciences | |
| 650 | 4 | |a Computer and information science | |
| 695 | |a Computer science, Cybernetics, Algorithm | ||
| 700 | 0 | 1 | |a Farkas Richárd |e aut |
| 700 | 0 | 1 | |a Kocsor András |e aut |
| 710 | |a Conference for PhD Students in Computer Science (5.) (2006) (Szeged) | ||
| 856 | 4 | 0 | |u http://acta.bibl.u-szeged.hu/12830/1/Toth_2008_ActaCybernetica.pdf |z Document access |