Tools




The following tools were used, enhanced or created for this project.
  • NATools
    NATools is a set of tools for processing parallel corpora. It includes a sentence aligner, an extractor of probabilistic translation dictionaries, a word aligner and a host of other tools to study the alignment of parallel corpora.

    During the course of Per-Fide, NATools was improved to extract PTDs efficiently from large corpora, and other tools were added:

    • Lingua::PTD: tools for handling translation dictionaries;
    • Lingua::PTD::More: tools for extracting resources such as UMTS from PTDs;
    • Math::KullbackLeibler::Discrete: a module for comparing distributions in probabilistic translation dictionaries using the Kullback-Leibler divergence.
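To make the Kullback-Leibler comparison concrete, here is a minimal Python sketch of the measure applied to two discrete distributions. It is not the interface of Math::KullbackLeibler::Discrete: the function name `kl_divergence`, the `epsilon` smoothing, and the two sample translation distributions are assumptions made purely for illustration.

```python
import math


def kl_divergence(p, q, epsilon=1e-9):
    """Return D_KL(P || Q) in bits for two discrete distributions given as dicts."""
    # Use the union of outcomes so events missing from one side are still counted.
    support = set(p) | set(q)
    # Assumption: add a small epsilon to every outcome so that log(0) and
    # division by zero cannot occur, then renormalise both distributions.
    p_s = {x: p.get(x, 0.0) + epsilon for x in support}
    q_s = {x: q.get(x, 0.0) + epsilon for x in support}
    p_total = sum(p_s.values())
    q_total = sum(q_s.values())
    return sum(
        (p_s[x] / p_total) * math.log2((p_s[x] / p_total) / (q_s[x] / q_total))
        for x in support
    )


# Hypothetical translation distributions for one source word in two dictionaries.
ptd_a = {"casa": 0.7, "lar": 0.2, "moradia": 0.1}
ptd_b = {"casa": 0.5, "lar": 0.3, "habitação": 0.2}

print(kl_divergence(ptd_a, ptd_a))  # zero: the distributions are identical
print(kl_divergence(ptd_a, ptd_b))  # positive: the distributions differ
print(kl_divergence(ptd_b, ptd_a))  # differs from the line above: the measure is asymmetric
```

The smoothing step is one of several possible ways to handle translations that appear in only one dictionary; a different strategy would give different values.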
  • Open Corpus Workbench
    The IMS Open Corpus Workbench (CWB) is a collection of free tools for managing and querying large, linguistically annotated corpora (on the order of 10 million to 2 billion words).

    For Open CWB, the following tools were developed:

    • XML::TMX::CWB: a tool for importing translation memories directly into Open CWB;
    • CWB::CQP::More: a high-level interface for the Open CWB Perl modules;
    • POSIX::Open3: a module developed to allow Open CWB to be used from web pages, particularly in the Dancer framework.
  • JSpell
    Jspell is a morphological analyzer derived from the ispell spell checker (Jspell = ++ispell). It was adapted for Portuguese, although dictionaries for other languages are also available.
  • Freeling3
    FreeLing is a template library developed by Lluís Padró for the lexical and syntactic processing of several languages, including all the languages of the Per-Fide project except German.

    Over the course of Per-Fide, the following tools were created:

    • Lingua::FreeLing2: an interface to version 2 of FreeLing. It was discontinued when FreeLing3 became available.
    • Lingua::FreeLing3: a Perl interface to version 3 of FreeLing.
    • Lingua::FreeLing3::Utils: a set of features and utilities built on top of Lingua::FreeLing3.
  • XML::TMX
    A Perl library for handling translation memories. It includes tools for tokenizing and tagging corpora using the Lingua::FreeLing3 library.
    • XML::DT::Sequence: a system for processing large XML files based on item repetition.
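The idea of processing a large XML file through its repeated items can be sketched in Python with the standard library's streaming parser. This is an analogy only, not the XML::DT::Sequence interface: the helper names `iter_items` and `translation_units` and the tiny TMX sample are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under its full namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_items(source, item_tag):
    """Yield each repeated <item_tag> element as soon as it has been fully parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == item_tag:
            yield elem
            # Once the caller is done with the item, release its text and children
            # so that memory use does not grow with the number of items.
            elem.clear()


def translation_units(source):
    """Yield one {language: segment} dict per <tu> element of a TMX-like file."""
    for tu in iter_items(source, "tu"):
        yield {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}


# Hypothetical two-unit translation memory, small enough to inline.
sample = b"""<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning</seg></tuv>
      <tuv xml:lang="pt"><seg>Bom dia</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you</seg></tuv>
      <tuv xml:lang="pt"><seg>Obrigado</seg></tuv>
    </tu>
  </body>
</tmx>"""

units = list(translation_units(io.BytesIO(sample)))
for unit in units:
    print(unit)
```

Because each item is handled and then cleared before the next one is read, the same loop works for a file with millions of translation units, provided each unit fits in memory.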
  • TreeTagger
    TreeTagger is a well-known morphosyntactic tagger. It was used because FreeLing3 does not support German. Within Per-Fide, the following modules were developed:

    • Lingua::TreeTagger::Installer: a tool for automating the installation of TreeTagger as well as the language models.
    • Lingua::TreeTagger: although it was not developed as part of Per-Fide, project members have been involved in improving the tool.
  • Lingua::Identify::CLD
    A Perl interface and build system were developed for the Chrome Language Detection Library, which was created by Google.