Debian Science Project
Linguistics
Debian Science Linguistics packages

This metapackage is part of the Debian Pure Blend "Debian Science" and installs packages related to Linguistics.

The list below includes various software projects which are of some interest to the Debian Science Project. Currently, only a few of them are available as Debian packages. It is our goal, however, to include in Debian Science all software that can sensibly add to a high-quality Debian Pure Blend.

If you discover a project that looks like a good candidate for Debian Science, or if you have prepared an unofficial Debian package, please do not hesitate to send a description of that project to the Debian Science mailing list.

Official Debian packages with high relevance

Apertium
Shallow-transfer machine translation engine
Versions of package apertium
Release  Version              Architectures
sid      3.1.0-1.1            s390, alpha, amd64, armel, hppa, hurd-i386, i386, ia64, mips, mipsel, powerpc, sparc
squeeze  3.1.0-1.1            sparc, powerpc, ia64, i386, hppa, s390, armel, amd64, mipsel, mips
lenny    3.0.7+1-2~lenny2+b1  amd64
lenny    3.0.7+1-2~lenny2     armel, i386, sparc, hppa, ia64, s390, mips, arm, powerpc, alpha, mipsel
etch     1.0.3-3              ia64, arm, s390, mips, mipsel, amd64, sparc, hppa, alpha, i386
Debtags of package apertium:
field: linguistics
role: program
Popcon: 32 users (9 upd.)*
License: DFSG free

An open-source shallow-transfer machine translation engine, Apertium is initially aimed at related-language pairs.

It uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state based chunking for structural transfer.

The system is largely based upon systems already developed by the Transducens group at the Universitat d'Alacant, such as interNOSTRUM (Spanish-Catalan, http://www.internostrum.com/welcome.php) and Traductor Universia (Spanish-Portuguese, http://traductor.universia.net).

It will be possible to use Apertium to build machine translation systems for a variety of related-language pairs simply by providing the required linguistic data in the right format.

Artha
A handy off-line thesaurus based on WordNet
Versions of package artha
Release  Version  Architectures
sid      0.9.1-1  mips, armel, mipsel, i386, powerpc, amd64, s390, ia64, sparc, alpha, hppa
squeeze  0.9.1-1  sparc, amd64, armel, hppa, i386, ia64, mips, mipsel, powerpc, s390
Popcon: 13 users (21 upd.)*
License: DFSG free

Artha is a handy English thesaurus with distinctive features such as lookup on a global hotkey press, passive notifications of a selected text's definitions, and suggestions for misspelled words. Once launched, it sits in the system tray, monitoring for a preset hotkey combination. When text is selected in any window and the hotkey is pressed, it pops up with the word looked up. Users who prefer passive notifications to the application popping up can enable the notifications option.

Artha is written from scratch in pure C using GTK+, with WordNet as its database corpus. It may be used as an advanced replacement for the proprietary WordWeb in GNU/Linux environments.

Link-grammar
Carnegie Mellon University's link grammar parser for English
Maintainer: Ken Bloom
Versions of package link-grammar
Release        Version        Architectures
sid            4.3.9-2        i386, mips, mipsel, hppa, sparc, powerpc, armel, s390, amd64, ia64, alpha
squeeze        4.3.9-2        armel, mips, sparc, i386, mipsel, hppa, amd64, ia64, s390, powerpc
lenny          4.2.5-1        s390, alpha, amd64, arm, armel, hppa, i386, ia64, mips, mipsel, powerpc, sparc
sid            4.2.5-1        hurd-i386
etch           4.2.2-4etch1   mipsel, amd64, sparc, s390, hppa, mips, alpha, arm, powerpc, ia64, i386
etch-security  4.2.2-4etch1   sparc, alpha, mips, ia64, arm, i386, s390, hppa, amd64, powerpc, mipsel
Debtags of package link-grammar:
field: linguistics
interface: commandline
role: program
use: checking
works-with: dictionary
Popcon: 12 users (6 upd.)*
License: DFSG free

In Sleator, D. and Temperley, D. "Parsing English with a Link Grammar" (1991), the authors defined a new formal grammatical system called a "link grammar". A sequence of words is in the language of a link grammar if there is a way to draw "links" between words such that the local requirements of each word are satisfied, the links do not cross, and the words form a connected graph. The authors encoded English grammar into such a system, and wrote this program to parse English using this grammar.
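
The two structural conditions named above (links must not cross, and the words must form a connected graph) can be illustrated with a small Python sketch. It is not the parser shipped in this package: the function names are made up for illustration, links are assumed to be pairs of word positions, and the per-word local requirements, which depend on the grammar's dictionary, are not checked.

```python
from collections import defaultdict


def links_cross(links):
    """Return True if any two links cross when drawn as arcs above the words."""
    spans = [tuple(sorted(link)) for link in links]
    for i, (a, b) in enumerate(spans):
        for c, d in spans[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc lies
            # strictly inside the other arc.
            if a < c < b < d or c < a < d < b:
                return True
    return False


def is_connected(num_words, links):
    """Return True if the links join all word positions into a single graph."""
    if num_words == 0:
        return True
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Depth-first search starting from the first word.
    seen, stack = {0}, [0]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == num_words


def is_valid_linkage(num_words, links):
    """Check only the two structural conditions (hypothetical helper)."""
    return not links_cross(links) and is_connected(num_words, links)
```

For a five-word sentence such as "the cat chased a mouse", an illustrative linkage like `[(0, 1), (1, 2), (2, 4), (3, 4)]` passes both checks, while `[(0, 2), (1, 3)]` is rejected because the two links cross.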

link-grammar can be used for linguistic parsing for information retrieval or extraction from natural language documents. It can also be used as a grammar checker.

This package contains the user-executable binary.

Wordnet
Electronic lexical database of the English language
Versions of package wordnet
Release        Version      Architectures
sid            3.0-21       s390, hppa, ia64, mips, i386, alpha
sid            3.0-20       powerpc, sparc, mipsel
sid            3.0-18       amd64, armel
squeeze        3.0-18       ia64, mipsel, i386, hppa, sparc, powerpc, mips, armel, amd64, s390
sid            3.0-14       hurd-i386
lenny          3.0-13       s390, alpha, amd64, arm, armel, hppa, i386, ia64, mips, mipsel, powerpc, sparc
etch           2.1-4+etch2  mipsel, amd64, sparc, s390, hppa, mips, alpha, arm, powerpc, ia64, i386
etch-security  2.1-4+etch2  sparc, alpha, mips, ia64, arm, i386, s390, hppa, amd64, powerpc, mipsel
Debtags of package wordnet:
field: linguistics
interface: x11
role: program
scope: application
uitoolkit: tk
use: checking
works-with: dictionary
x11: application
Popcon: 128 users (125 upd.)*
License: DFSG free

WordNet(C) is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
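
The organization described above, with words grouped into synonym sets that are linked by relations, can be pictured with a minimal Python sketch. This is not the WordNet file format or any API of this package: the class, the relation names and the sample entries are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, so synsets that refer to each
# other through their relations never trigger a recursive comparison.
@dataclass(eq=False)
class Synset:
    """A set of synonymous words standing for one lexical concept."""
    words: frozenset
    pos: str  # part of speech: "noun", "verb", "adjective" or "adverb"
    relations: dict = field(default_factory=dict)  # relation name -> list of Synset

    def relate(self, name, other):
        """Record a named relation from this synset to another one."""
        self.relations.setdefault(name, []).append(other)


def synonyms(word, synsets):
    """Collect every other word that shares at least one synset with `word`."""
    found = set()
    for synset in synsets:
        if word in synset.words:
            found |= synset.words - {word}
    return sorted(found)


# Illustrative sample data, not read from the WordNet database.
canine = Synset(frozenset({"canine", "canid"}), "noun")
dog = Synset(frozenset({"dog", "domestic dog"}), "noun")
dog.relate("hypernym", canine)   # a dog is a kind of canine
canine.relate("hyponym", dog)    # the inverse relation
```

With this sample data, `synonyms("dog", [canine, dog])` collects the other members of the synset containing "dog", and `dog.relations["hypernym"]` leads to the more general concept.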

WordNet was developed by the Cognitive Science Laboratory (http://www.cogsci.princeton.edu/) at Princeton University under the direction of Professor George A. Miller (Principal Investigator).

WordNet is considered one of the most important resources available to researchers in computational linguistics, text analysis, and many related areas. Its design also draws on computational theories of human lexical memory.

This package contains the binary and man pages of WordNet as well as general man pages.


No known packages available

Wnsqlbuilder
SQL version of WordNet 3.0
License: GPL
Debian package not available

WordNet SQL Builder is a Java utility that generates an SQL database from the standard WordNet database as released by the WordNet Project (Princeton University).

Features

  • Support for MySQL and PostgreSQL.
  • Complete port (however, orphaned morphological forms are dropped, and so are VerbNet/XWordNet data that cannot be linked to WordNet entries).
  • Incremental build support.
  • Retains the synset index as primary key, allowing easy reference to the original WordNet database.
  • Includes support for WordNet 3.0
  • Includes support for WordNet 2.0 to 2.1, 2.1 to 3.0, 2.0 to 3.0 sense maps
  • Includes support for VerbNet 2.3
  • Includes support for XWordNet 2.0-1.1
  • Ready-to-use database (see wnsqldatabase package in download section) including:
      • WordNet 3.0
      • WordNet 2.0 to 2.1, 2.1 to 3.0, 2.0 to 3.0 sense maps
      • VerbNet 2.3
      • XWordNet 2.0-1.1
      • British National Corpus statistical data (for commonly used words)
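
The idea of keeping the synset index as primary key can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in SQLite module instead of the MySQL or PostgreSQL targets named above, and the table names, column names and sample row are hypothetical rather than the schema this tool generates.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL or PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE synsets (              -- hypothetical table name
        synsetid   INTEGER PRIMARY KEY, -- synset index kept as the key
        pos        TEXT NOT NULL,
        definition TEXT NOT NULL
    );
    CREATE TABLE senses (               -- hypothetical table name
        synsetid INTEGER NOT NULL REFERENCES synsets(synsetid),
        lemma    TEXT NOT NULL
    );
""")

# Placeholder row: the id and definition are made up for this sketch.
conn.execute(
    "INSERT INTO synsets VALUES (?, ?, ?)",
    (1001, "n", "placeholder definition"),
)
conn.executemany(
    "INSERT INTO senses VALUES (?, ?)",
    [(1001, "dog"), (1001, "domestic dog")],
)


def lemmas_for(connection, synsetid):
    """Return the lemmas attached to one synset, looked up by its key."""
    rows = connection.execute(
        "SELECT lemma FROM senses WHERE synsetid = ? ORDER BY lemma",
        (synsetid,),
    )
    return [lemma for (lemma,) in rows]
```

Because the key is the synset index itself, a row can be traced back to the corresponding entry in the original database without a separate mapping table.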

*Popularity contest results: number of people who use this package regularly (number of people who upgraded this package recently) out of 89590