Welcome to PANACEA

Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition
of Language Resources for Human Language Technologies

A strategic challenge for Europe in today's globalized economy is to overcome language barriers through technological means. In particular, Machine Translation (MT) systems are expected to have a significant impact on the management of multilingualism in Europe, making it possible to translate the huge quantity of (written or oral) data produced and thus covering the needs of hundreds of millions of citizens.

PANACEA is addressing the most critical aspect for MT: the so-called language-resource bottleneck. Although MT technologies may consist of language-independent engines, they depend on the availability of language-dependent knowledge for their real-life implementation, i.e., they require Language Resources (LRs). In order to supply MT for every pair of European languages, every domain, and every text genre, appropriate language resources covering all these aspects must be found, processed and supplied to MT developers. These should be provided in the format and with the information demanded by their systems. At present, this is mostly done by hand. Moreover, a language resource for a given language can never be considered complete or final because of the characteristics of natural language: language changes, and new knowledge domains and new language varieties emerge. What is needed is an automatic, dynamic and adaptive system for compiling, producing and validating language resources, a system conceived as integrated machinery for the production of LRs.

The objective of PANACEA is to build a factory of language resources that automates the stages involved in the acquisition, production, updating and maintenance of the language resources required by MT systems and by other applications based on Language Technologies, and that delivers them in the time required. This automation will significantly cut cost, time and human effort. These reductions in cost and time are the only way to guarantee the continuous supply of language resources that Machine Translation and other Language Technologies will demand in a multilingual Europe.

The factory will automatically produce a large number of LRs for MT and other Language Technologies by using advanced components for the acquisition and normalization of monolingual and parallel corpora; the alignment of parallel corpora; the derivation of bilingual dictionaries from subsententially aligned corpora; and the production of information-rich monolingual lexica using corpus-based automatic methods.

In order to achieve this objective, PANACEA will work in the following areas:

  1. the creation of a platform, which will be designed as a dedicated workflow manager, for the composition of a number of processes for LR production, based on combinations of different web services.
  2. the evaluation of the platform and the LR production chain within the framework of both R&D and industrial settings.
  3. the proposal of solutions for sorting out legal and IPR issues for making the produced LRs available: clear guidelines for addressing legal issues, model agreements for providers and users, etc.

