PANACEA Annotated Italian Environment Corpus v.2

PANACEA Annotated Italian Environment Corpus Version 2 consists of Italian texts in the Environment (ENV) domain that were collected and automatically annotated within PANACEA, an EU FP7-funded project under Grant Agreement 248064. The texts are crawled web pages that were automatically identified as Italian and automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011. The automatically assigned annotations cover sentence and token segmentation, part-of-speech (POS) tags and lemmas, dependency relations, and named entities.

Size information:

  • tokens: 36 million
  • sentences: 1,431,914

  • Download location

    DISCLAIMER: “The right to use the sentences contained in this data set has been granted by their copyright holders. The data may be used for research purposes only, and no profit may be made from it. We are grateful to all sources for their kind and generous contribution. For further information on these sources, please see: Acknowledgements”

    This resource is distributed under the following licence: CC-BY-SA