Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm
| Authors: | |
|---|---|
| Corporate author: | |
| Document type: | Article |
| Published: | 2008 |
| Series: | Acta cybernetica 18 No. 3 |
| Keywords: | Computer science, Cybernetics, Algorithm |
| Subject terms: | |
| Online Access: | http://acta.bibl.u-szeged.hu/12830 |
| Abstract: | We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm consists of a length-based method and an anchor-matching method that uses Named Entity recognition. It combines the speed of length-based models with the accuracy of anchor-finding methods. Because the accuracy of cognate detection for the Hungarian-English language pair is extremely low, we adopted a novel approach based on Named Entity recognition. Owing to its well-selected anchors, the method was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair. |
| Extent/Physical description: | pp. 463-478 |
| ISSN: | 0324-721X |
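The abstract describes the hybrid method only at a high level. The following is a minimal sketch of the general idea under stated assumptions; it is not the authors' implementation. It uses a Gale-Church-style length-based dynamic program (a well-known length model swapped in for illustration) and lowers the cost of any bead whose source and target sides share a named entity, so that shared names act as anchors. Named entities are approximated by non-initial capitalized tokens and numbers, a crude stand-in for a real Named Entity recognizer (for Hungarian, a real one would also have to handle case suffixes, e.g. *Szegeden* vs. *Szeged*). The constants `C`, `S2`, `ANCHOR_BONUS`, the bead priors, and all function names are hypothetical.

```python
import math
import re
from typing import List, Optional, Set, Tuple

# Hypothetical parameters, NOT taken from the paper.
C = 1.0             # assumed expected ratio of target to source characters
S2 = 6.8            # assumed variance per character of the length difference
ANCHOR_BONUS = 5.0  # assumed cost reduction for a bead with a shared named entity

# Illustrative bead priors P(bead type); (2, 1) means two source sentences
# aligned with one target sentence.
BEAD_PRIOR = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005, (2, 1): 0.045, (1, 2): 0.045}

Bead = Tuple[List[int], List[int]]


def named_entities(sentence: str) -> Set[str]:
    """Crude NE stand-in (hypothetical): non-initial capitalized tokens and numbers."""
    tokens = re.findall(r"\w+", sentence)
    # The first token is skipped because every sentence starts with a capital.
    return {t for t in tokens[1:] if t[0].isupper() or t.isdigit()}


def bead_cost(src: List[str], tgt: List[str], prior: float) -> float:
    """Length-mismatch cost plus prior, lowered when both sides share an NE anchor."""
    l1 = sum(len(s) for s in src)
    l2 = sum(len(s) for s in tgt)
    mean = (l1 + l2 / C) / 2
    delta = 0.0 if mean == 0 else (l2 - l1 * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a length mismatch at least this large.
    p = math.erfc(abs(delta) / math.sqrt(2))
    cost = -math.log(max(p, 1e-300)) - math.log(prior)
    src_ne = set().union(*(named_entities(s) for s in src))
    tgt_ne = set().union(*(named_entities(s) for s in tgt))
    if src_ne & tgt_ne:
        cost -= ANCHOR_BONUS
    return cost


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Minimum-cost sequence of beads covering both sentence lists in order."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor of a cell is final first.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for (di, dj), prior in BEAD_PRIOR.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(src[i:ni], tgt[j:nj], prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    hu = ["Az előadást Kovács Péter tartotta.", "Utána vita következett."]
    en = ["The talk was given by Péter Kovács.", "A discussion followed."]
    for src_ids, tgt_ids in align(hu, en):
        print(src_ids, tgt_ids)
```

Running the script prints one line per bead: the source and target sentence indices it groups. Because the crude `named_entities` skips sentence-initial tokens, a name that opens a sentence is missed; a real Named Entity recognizer would not have this limitation.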