Transliteration Corpus for wordMint
Having worked for some time on preparing a training corpus, we now have a good-quality, sentence-aligned corpus for English-to-Hindi back-transliteration (romanized Hindi back to Devanagari). The corpus is licensed under the Creative Commons Attribution-ShareAlike 2.5 India License, so you can use, modify, and distribute it for any purpose as long as you attribute the work to the wordMint team and keep these freedoms intact by releasing derivative works under the same license.
The corpus consists of about 100 songs, each with its lyrics written in romanized Hindi alongside a parallel version in Hindi in the Devanagari script.
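This post does not describe the file layout of the download. Suppose it follows the usual convention for sentence-aligned corpora: two UTF-8 plain-text files, one sentence per line, where line i of the romanized file corresponds to line i of the Devanagari file. Under that assumption, the minimal Python sketch below shows how the pairs might be loaded and sanity-checked. The function names `align_pairs`, `load_corpus`, and `is_devanagari`, the file paths, and the use of blank lines between songs are all hypothetical. The example sentences are made up and are not taken from the corpus.

```python
import io
from typing import Iterable, List, Tuple


def is_devanagari(text: str) -> bool:
    """True if every letter in text lies in the Devanagari block (U+0900-U+097F).

    Vowel signs (matras) and other combining marks are not alphabetic in
    Python, so only base letters are checked.
    """
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all("\u0900" <= ch <= "\u097f" for ch in letters)


def align_pairs(roman: Iterable[str], deva: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair line i of the romanized side with line i of the Devanagari side.

    Assumption (hypothetical layout): blank lines on BOTH sides separate
    songs and are skipped; a blank line on only one side means the files
    are misaligned.
    """
    roman_list = [line.strip() for line in roman]
    deva_list = [line.strip() for line in deva]
    if len(roman_list) != len(deva_list):
        raise ValueError(f"line counts differ: {len(roman_list)} vs {len(deva_list)}")
    pairs = []
    for n, (r, d) in enumerate(zip(roman_list, deva_list), start=1):
        if not r and not d:
            continue  # separator line present on both sides
        if not r or not d:
            raise ValueError(f"misaligned at line {n}: {r!r} / {d!r}")
        pairs.append((r, d))
    return pairs


def load_corpus(roman_path: str, deva_path: str) -> List[Tuple[str, str]]:
    """Read two parallel UTF-8 files (hypothetical paths) into sentence pairs."""
    with open(roman_path, encoding="utf-8") as r, open(deva_path, encoding="utf-8") as d:
        return align_pairs(r, d)


if __name__ == "__main__":
    # Made-up example lines, standing in for two parallel files.
    roman = io.StringIO("main ghar ja raha hoon\n\ntum kahan ho\n")
    deva = io.StringIO("मैं घर जा रहा हूँ\n\nतुम कहाँ हो\n")
    for src, tgt in align_pairs(roman, deva):
        print(src, "->", tgt, "| Devanagari:", is_devanagari(tgt))
```

Failing loudly on a count mismatch matters here because a single dropped line silently shifts every later pair. Alignment tools such as GIZA++ expect strictly line-parallel input.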
Click on the download link below to download the complete corpus.
- Published:
- 01.05.09 / 1am
- Tags:
- corpus