Aim of the Project
  • To build bidirectional English-IL and IL-IL (IL = Indian Language) machine translation (IL-ILMT) systems for English <-> Hindi, English <-> Telugu and 9 Indian language pairs (Hindi <-> Punjabi, Telugu, Urdu, Gujarati, Kannada, Odia, Kashmiri, Sindhi and Dogri) that can be easily integrated into a Speech to Speech Machine Translation System (SSMT) pipeline.
  • To adapt the systems to the following domains and their subdomains: Governance; Educational Content in the fields of Science and Technology (Biology, Chemistry, Physics, Environmental Science, Computer Science Engineering, Electrical Engineering, Mechanical Engineering), Law, Economics, Management; Health Care (Consent Forms and Information Sheets, Awareness and Pharma); Judiciary (Case Files); Agriculture and Food Security.
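The language coverage described in the first aim can be written out as a small enumeration. The sketch below is illustrative only: it assumes that the 9 Indian language pairs are all of the form Hindi <-> X, as the parenthetical list suggests, and the names `ENGLISH_PAIRS`, `HINDI_PARTNERS`, `language_pairs`, `translation_directions` and `languages` are hypothetical helpers, not part of any project code.

```python
# Language pairs as listed in the project aim (assumed reading: the nine
# Indian language pairs all have Hindi on one side).
ENGLISH_PAIRS = [("English", "Hindi"), ("English", "Telugu")]
HINDI_PARTNERS = [
    "Punjabi", "Telugu", "Urdu", "Gujarati", "Kannada",
    "Odia", "Kashmiri", "Sindhi", "Dogri",
]


def language_pairs():
    """Return every unordered language pair covered by the project."""
    return ENGLISH_PAIRS + [("Hindi", partner) for partner in HINDI_PARTNERS]


def translation_directions():
    """Return every translation direction; each pair is bidirectional."""
    directions = []
    for first, second in language_pairs():
        directions.append((first, second))
        directions.append((second, first))
    return directions


def languages():
    """Return the sorted set of distinct languages across all pairs."""
    return sorted({lang for pair in language_pairs() for lang in pair})


if __name__ == "__main__":
    print(len(language_pairs()), "pairs")
    print(len(translation_directions()), "directions")
    print(len(languages()), "languages")
```

Under this reading, the 2 English pairs plus the 9 Hindi pairs give the 11 language pairs referred to under Corpora Development, and the same list covers 11 distinct languages.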
Corpora Development
  • Parallel Corpora Development: We will develop 800k parallel corpora in 11 language pairs in the following two domains:
    • Governance
    • Health Care
      • Consent Forms and Information Sheets
      • Awareness and Pharma
  • The corpora will be developed using the following three methods:
    • End-to-End (E2E) Translation
    • Back Translation
    • Collecting and cleaning existing corpora
  • Annotated Corpora Development
    • We will develop 198k annotated corpora (POS, Chunk, Morph, NER) across 11 languages.
  • Domain Dictionary Development
    • We will develop domain dictionaries for 11 languages.
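As an illustration of the third corpus development method (collecting and cleaning existing corpora), the sketch below shows one common way to clean sentence-aligned data. It is a minimal example under stated assumptions, not the project's actual pipeline: the function name `clean_parallel_corpus`, the `max_length_ratio` parameter and the three filters (empty segments, extreme length ratios, exact duplicates) are all hypothetical choices made for this example.

```python
def clean_parallel_corpus(pairs, max_length_ratio=3.0):
    """Return cleaned (source, target) sentence pairs, preserving order.

    Assumed cleaning steps (illustrative, not the project's documented rules):
      1. normalise whitespace,
      2. drop pairs where either side is empty,
      3. drop pairs whose word-count ratio exceeds max_length_ratio,
      4. drop exact duplicate pairs.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Collapse runs of whitespace and trim both sides.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            continue
        src_len = len(source.split())
        tgt_len = len(target.split())
        # A very lopsided pair is likely to be misaligned.
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue
        if (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


if __name__ == "__main__":
    raw = [
        ("Hello   world", "namaste duniya"),
        ("Hello world", "namaste duniya"),  # duplicate after normalisation
        ("", "khaali"),                     # empty source side
        ("one", "a b c d e f"),             # length ratio too high
    ]
    print(clean_parallel_corpus(raw))
```

The word-count ratio is a crude proxy for misalignment; a suitable threshold depends on the language pair and would need to be tuned per pair.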
Evaluation Methods and Benchmarks
  • To evaluate Machine Translation systems, the project, in addition to making use of existing metrics, will propose novel metrics, methods, and tools, wherever the current methodologies are found to be lacking.
  • Benchmark Data Development
    • In this project, we will draft guidelines (taking into account language properties and best practices) for preparing benchmark datasets for machine translation and its components.
Productization and Deployment
  • Productization will be carried out by an engineering team set up specifically to translate lab technologies into products and to develop related technical solutions. The academic institutes will develop the core technology. For field-usable systems and modules, the engineering team will engage with a large number of startups and other agencies to make the technology ready for use and to provide the required services in the field using the developed technology. The engineering team will connect research with markets, and startups with research.
Workshops/Training
  • One objective of the project is to take the systems to the field, where they will actually be used by users. A further goal is to reduce the human effort involved in translation with the support of technology. To achieve this, the community of language experts must be familiarized with the developed tools and technologies. The project therefore proposes multiple workshops for language experts (translators) and technology developers. For these workshops, guidelines, handouts and related educational content will be created and distributed.
Language Technologies Research Centre (LTRC)
1st Floor, Kohli Center on Intelligent Systems (KCIS)
International Institute of Information Technology, Hyderabad
Gachibowli, Hyderabad, Telangana - 500 032
India
  ltrcoffice1@iiit.ac.in

  +9140-6653 1581

  +9140-6653 1413