
Elhuyar Fundazioa - Language Services 


R&D

Corpus tools

Aims and general description

The Science and Technology Corpus project showed how laborious it is to build a corpus by collecting texts, and that the growing international trend is to collect these texts from the Internet. CorpEus, a service that allows the web to be queried as if it were a huge Basque corpus, was developed in line with this strategy. Corpora are increasingly used in language technologies, so suitable corpora need to be obtained in the short term.

The main aim of Co3 is to develop a tool that builds comparable corpora using the Web as its source. Besides this general aim, the project has more specific intermediate aims:

  • To build specialized corpora
  • To produce Basque corpora

Existing tools for collecting corpora in languages with limited resources aim to obtain the largest corpora possible and are not designed to build specialized corpora. Moreover, the few attempts to build specialized corpora have focused on languages with a large presence on the Internet and have not addressed the difficulty of obtaining corpora of adequate size for small languages.

As noted above, Co3 builds corpora using the Internet as its source. Documents are gathered from the Internet and their suitability is assessed with a range of techniques, and the process continues until a corpus of suitable size has been obtained.
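
The gather-and-filter loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Co3's actual implementation: the suitability checks (a minimum length, exact-duplicate removal and a domain-term density threshold) and the names `build_corpus`, `term_density`, `domain_terms`, `target_words`, `min_words` and `min_density` are all hypothetical, and candidate documents are assumed to be already-downloaded plain text rather than fetched from the web.

```python
from typing import Iterable


def term_density(text: str, domain_terms: set[str]) -> float:
    """Hypothetical topicality score: fraction of whitespace tokens that are domain terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in domain_terms for tok in tokens) / len(tokens)


def build_corpus(
    candidates: Iterable[str],
    domain_terms: set[str],
    target_words: int,
    min_words: int = 5,         # assumed threshold, chosen arbitrarily
    min_density: float = 0.1,   # assumed threshold, chosen arbitrarily
) -> list[str]:
    """Accept candidate documents that pass simple (hypothetical) suitability
    filters until the corpus reaches `target_words` tokens or candidates run out."""
    corpus: list[str] = []
    seen: set[str] = set()
    size = 0
    for doc in candidates:
        if size >= target_words:
            break  # the corpus is already large enough
        n = len(doc.split())
        if n < min_words or doc in seen:
            continue  # too short, or an exact duplicate of an accepted document
        if term_density(doc, domain_terms) < min_density:
            continue  # not on-topic enough for a specialized corpus
        seen.add(doc)
        corpus.append(doc)
        size += n
    return corpus
```

For example, with `domain_terms = {"gene", "protein", "cell"}` and a 15-word target, an on-topic sentence about genes and proteins is kept, while an off-topic football report, a two-word fragment and an exact duplicate are rejected, and collection stops once the accepted documents reach 15 words. A real system would likely rely on richer signals (language identification, boilerplate removal, near-duplicate detection); the thresholds here are placeholders.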

If this process is applied to different languages but always on the same subject, the result is a set of comparable corpora. In this respect, the AzerHitz project's research on techniques for measuring the degree of comparability between corpora could be very useful.
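
To illustrate what a comparability measure can look like, the sketch below implements one simple dictionary-based score: the proportion of dictionary-covered words in each corpus whose translation appears in the other corpus. It is a hypothetical example, not the technique developed in AzerHitz; the function names `vocabulary` and `comparability`, the whitespace tokenization and the toy Basque-English dictionary format (`dict[str, set[str]]`) are assumptions.

```python
def vocabulary(corpus: list[str]) -> set[str]:
    """Set of lowercased whitespace tokens in a corpus (a crude, assumed tokenizer)."""
    return {tok for doc in corpus for tok in doc.lower().split()}


def comparability(
    corpus_a: list[str],
    corpus_b: list[str],
    dictionary: dict[str, set[str]],
) -> float:
    """Hypothetical dictionary-based score in [0, 1]: among words covered by the
    bilingual dictionary (checked in both directions), the share whose translation
    also occurs in the other corpus. Returns 0.0 if no word is covered."""
    vocab_a, vocab_b = vocabulary(corpus_a), vocabulary(corpus_b)

    # Reverse dictionary (language B -> language A) for the second direction.
    reverse: dict[str, set[str]] = {}
    for src, targets in dictionary.items():
        for tgt in targets:
            reverse.setdefault(tgt, set()).add(src)

    hits = total = 0
    for words, other, mapping in (
        (vocab_a, vocab_b, dictionary),
        (vocab_b, vocab_a, reverse),
    ):
        for w in words:
            translations = mapping.get(w)
            if not translations:
                continue  # word not covered by the dictionary; ignore it
            total += 1
            if translations & other:
                hits += 1  # at least one translation appears in the other corpus
    return hits / total if total else 0.0
```

With a toy dictionary mapping `proteina`, `zelula` and `genea` to `protein`, `cell` and `gene`, a Basque corpus on molecular biology scores 1.0 against an English text about genes, proteins and cells, and 0.0 against an English football report. Real measures typically also weight words by frequency and cope with incomplete dictionaries; this sketch ignores both.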


Copyright © 2007 Elhuyar Fundazioa

webmaster@elhuyar.com