‘GyanNidhi’, which stands for ‘Knowledge Resource’, is a parallel text corpus
in 11 Indian languages, a project sponsored by TDIL, DIT, MC&IT and
Government of India.
What is it?
A multilingual parallel text corpus contains the same text translated into
more than one language.
What does GyanNidhi contain?
The GyanNidhi corpus consists of text in English and 11 Indian languages
(Hindi, Punjabi, Marathi, Bengali, Oriya, Gujarati, Telugu, Tamil, Kannada,
Malayalam, Assamese). It aims to digitise 1 million pages altogether, with at
least 50,000 pages in each Indian language and in English.
Sources for the Parallel Corpus
- National Book Trust India
- Sahitya Akademi
- Navjivan Publishing House
- Publications Division
- SABDA, Pondicherry
- Pustak Mahal
Prabandhika: Corpus Manager
- Platform: Windows
- Data Encoding: XML, Unicode
- Portability of Data: data in XML format can be used on various platforms
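To illustrate why XML with Unicode makes the data portable, here is a minimal sketch that reads one parallel segment using only the Python standard library. The element names (`corpus`, `segment`, `text`), the `lang` attribute, and the sample sentences are assumptions made up for this example; the actual GyanNidhi schema is not described here and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element and attribute names are invented for
# illustration and are NOT the real GyanNidhi schema.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<corpus>
  <segment id="1">
    <text lang="en">This is a book.</text>
    <text lang="hi">यह एक किताब है।</text>
  </segment>
</corpus>
"""


def load_segments(xml_text):
    """Return {segment_id: {language_code: sentence}} from the XML string."""
    # Parse from UTF-8 bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    segments = {}
    for segment in root.findall("segment"):
        versions = {}
        for text in segment.findall("text"):
            versions[text.get("lang")] = (text.text or "").strip()
        segments[segment.get("id")] = versions
    return segments


if __name__ == "__main__":
    for seg_id, versions in load_segments(SAMPLE).items():
        for lang, sentence in versions.items():
            print(seg_id, lang, sentence)
```

Because the parser and the Unicode text handling are part of the standard library, the same file can be read unchanged on Windows, Linux, or macOS.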
Applications of GyanNidhi
- Automatic dictionary extraction
- Creation of translation memory
- Example Based Machine Translation (EBMT)
- Language research study and analysis
- Language modeling
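As a rough illustration of the translation memory application, the sketch below stores aligned sentence pairs and returns the stored translation whose source sentence is closest to a new query. The sentence pairs, the `best_match` function, and the 0.7 similarity threshold are all assumptions chosen for the example; it uses Python's standard `difflib` for fuzzy matching and does not represent how GyanNidhi or Prabandhika implement this.

```python
from difflib import SequenceMatcher

# Toy English-Hindi pairs invented for illustration (not GyanNidhi data).
MEMORY = [
    ("This is a book.", "यह एक किताब है।"),
    ("This is a house.", "यह एक घर है।"),
]


def best_match(query, memory, threshold=0.7):
    """Return (score, source, target) for the closest stored source sentence,
    or None when no stored sentence reaches the similarity threshold."""
    best = None
    for source, target in memory:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None


if __name__ == "__main__":
    print(best_match("This is a book", MEMORY))
    print(best_match("xyz", MEMORY))
```

A parallel corpus supplies exactly the aligned pairs such a memory needs; a production system would typically use a faster index and a similarity measure tuned to each language.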