System for automatic morphological analysis of Russian

 

Also: automatic morphological generation of Russian, beta version  (see below)

 

 

Go to the personal page of Grigori Sidorov.

Go to the personal page of Alexander Gelbukh.

 

This program performs lemmatization and provides grammatical information for each word form. See the detailed description below.

The system is an EXE file for Windows. A DLL is available on request.

LICENSE:

  1. You can use this program freely for academic purposes. No warranty.
  2. You should inform us that you are using the program, and
  3. You should cite the corresponding paper in any publications obtained with the help of the program (we would be grateful if you informed us of such citations).

 

Paper for citing:

A. Gelbukh, G. Sidorov. Approach to construction of automatic morphological analysis systems for inflective languages with little effort. In: Computational Linguistics and Intelligent Text Processing (CICLing-2003), Lecture Notes in Computer Science, N 2588, Springer-Verlag, 2003, pp. 215–220; www.cic.ipn.mx/~sidorov/GelbukhSidorovMorphCICLING2003.ps

Download:

Downloading means that you accept the license. Thank you.

 

NEW: Version of August 18, 2011. A bug related to line length has been fixed.

Download the system for analysis

Detailed description of the system for automatic morphological analysis of Russian:

 

The input file is any standard text file (mind the encoding).

 

Output is written either to individual files, one per input file, named with the prefix “c_”, or to a single common file for all input files.

 

The output has the format: word lemma1 [info1] lemma2 [info2] ...

 

There are three possible output formats for the grammatical information: English, Russian Short and Russian Long (the last two differ in the amount of grammatical information included).

 

Example of the English output:

·         большой большой [Adj, Full, Abl, Sg, Fem, ] большой [Adj, Full, Acc, Sg, Masc, ] большой [Adj, Full, Dat, Sg, Fem, ] большой [Adj, Full, Gen, Sg, Fem, ] большой [Adj, Full, Nom, Sg, Masc, ] большой [Adj, Full, Prep, Sg, Fem, ]

·         стол стол [Noun, Acc, Sg, Masc, Unanim, ] стол [Noun, Nom, Sg, Masc, Unanim, ]

·         был быть [Verb, Indic, Past, Sg, Masc, ]

·         у у [Interj, ] у [Noun, unchanging ] у [Preposition, ]

·         окна окно [Noun, Acc, Pl, Neutr, Unanim, ] окно [Noun, Gen, Sg, Neutr, Unanim, ] окно [Noun, Nom, Pl, Neutr, Unanim, ]

 

Example of the Russian Short output:

·         большой большой [полн.,дат.п.,ед.ч.,жен.р.,] большой [полн.,вин.п.,ед.ч.,муж.р.,] большой [полн.,род.п.,ед.ч.,жен.р.,] большой [полн.,им.п.,ед.ч.,муж.р.,] большой [полн.,пр.п.,ед.ч.,жен.р.,] большой [полн.,тв.п.,ед.ч.,жен.р.,]

·         и и [частиц.,] и [межд.,] и [неизм.,] и [союз,]

·         . . [.]

·         Долго долгий [кратк.,ед.ч.,ср.р.,] долго [нареч.,]

·         быстро быстрый [кратк.,ед.ч.,ср.р.,]

·         Стол стол [вин.п.,ед.ч.,] стол [им.п.,ед.ч.,]

·         был быть [изъяв.,прош.вр.,ед.ч.,муж.р.,]

·         у у [межд.,] у [неизм.,] у [предл.,]

·         окна окно [вин.п.,мн.ч.,] окно [род.п.,ед.ч.,] окно [им.п.,мн.ч.,]

 

Example of the Russian Long output:

·         большой большой [Прил., полн., Дат. п., ед. ч., жен. р., ] большой [Прил., полн., Вин. п., ед. ч., муж. р., ] большой [Прил., полн., Род. п., ед. ч., жен. р., ] большой [Прил., полн., Им. п., ед. ч., муж. р., ] большой [Прил., полн., Пр. п., ед. ч., жен.

·         Долго долгий [Прил., кратк., ед. ч., ср. р., ] долго [Наречие, ]

·         и и [Частица, ] и [Междометие, ] и [Союз, ] и [Сущ., неизмен ]

·         быстро быстрый [Прил., кратк., ед. ч., ср. р., ]

·         . . [.]

·         Стол стол [Сущ., Вин. п., ед. ч., муж. р., неод., ] стол [Сущ., Им. п., ед. ч., муж. р., неод., ]

·         был быть [Гл., изъяв. накл., прош. вр., ед. ч., муж. р., ]

·         у у [Междометие, ] у [Предлог, ] у [Сущ., неизмен ]
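The output format described above (word lemma1 [info1] lemma2 [info2] ...) can be read back with a few lines of code. The following is a minimal sketch in Python, not part of the distributed system. It assumes that the word, the lemmas and the bracketed info blocks are separated by whitespace, that lemmas contain no spaces, and that the leading bullet shown in the examples above is page formatting rather than part of the file. The function name parse_analysis_line is hypothetical.

```python
import re

# One analysis = a lemma (no spaces) followed by its bracketed grammar info.
ANALYSIS_RE = re.compile(r"(\S+)\s+\[([^\]]*)\]")


def parse_analysis_line(line):
    """Split 'word lemma1 [info1] lemma2 [info2] ...' into (word, analyses).

    Each analysis is a (lemma, tags) pair; tags are the comma-separated
    items inside the brackets with empty items dropped.
    Returns None for a blank line.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    word = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    analyses = []
    for lemma, info in ANALYSIS_RE.findall(rest):
        tags = [tag.strip() for tag in info.split(",") if tag.strip()]
        analyses.append((lemma, tags))
    return word, analyses


if __name__ == "__main__":
    sample = "был быть [Verb, Indic, Past, Sg, Masc, ]"
    print(parse_analysis_line(sample))
```

Because the tags are only split on commas, the same function should work for the English, Russian Short and Russian Long formats. An analysis whose closing bracket is missing (for example, in a truncated line) is skipped.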

 

 

Interface:

 

 

 

Automatic morphological generation of Russian (beta version)

 

Download:

Downloading means that you accept the license. Thank you.

 

Download the system for generation

 

Examples

 

Examples of input data encoding (see also the example file “encode.pas”):

 

NMUNS   Noun, Masculine, Unanimated, Nominative, Singular  // Masculine and Unanimated are taken from the dictionary for nouns, so these two letters are ignored

NFAGP   Noun, Feminine, Animated, Genitive, Plural  // Feminine and Animated are taken from the dictionary for nouns, so these two letters are ignored

 

AFNSM  Adjective, Full form, Nominative, Singular, Masculine

AFAP  Adjective, Full form, Accusative, Plural     // No gender in Plural

ABSM  Adjective, Brief form, Singular, Masculine

ABAP  Adjective, Brief form, Plural     // No gender in Plural

AC    Adjective, Comparative form

 

VF   Verb, Infinitive

VIP1S  Verb, Indicative, Present/Future, 1st person, Singular

VITSM  Verb, Indicative, Past, Singular, Masculine

VM2S  Verb, Imperative, 2nd person, Singular
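The codes listed above appear to be positional: the first letter gives the part of speech and the remaining letters give the features in a fixed order. The following Python sketch decodes such a code into a readable description. It is an illustration only, not the scheme implemented in “encode.pas”: the letter tables are inferred from the examples on this page (D for Dative and P for Prepositional are taken from the sample input file, N for Neuter from the code NNUNS, and 3 for 3rd person is assumed by analogy). Brief-form adjectives are left out because the two codes listed for them do not follow one obvious positional pattern. The name decode is hypothetical.

```python
# Letter tables inferred from the examples on this page (assumptions).
GENDER = {"M": "Masculine", "F": "Feminine", "N": "Neuter"}
ANIMACY = {"U": "Unanimated", "A": "Animated"}
CASE = {"N": "Nominative", "G": "Genitive", "D": "Dative",
        "A": "Accusative", "P": "Prepositional"}
NUMBER = {"S": "Singular", "P": "Plural"}
PERSON = {"1": "1st person", "2": "2nd person",
          "3": "3rd person"}  # "3" is assumed by analogy


def _read(letters, tables):
    """Decode letters position by position, one lookup table per position."""
    if len(letters) != len(tables):
        raise ValueError("unexpected number of feature letters: %r" % letters)
    result = []
    for letter, table in zip(letters, tables):
        if letter not in table:
            raise ValueError("unknown feature letter: %r" % letter)
        result.append(table[letter])
    return result


def decode(code):
    """Return the list of feature names encoded by a code such as 'NMUNS'."""
    if not code:
        raise ValueError("empty code")
    pos, rest = code[0], code[1:]
    if pos == "N":
        return ["Noun"] + _read(rest, [GENDER, ANIMACY, CASE, NUMBER])
    if pos == "A":
        if rest == "C":
            return ["Adjective", "Comparative form"]
        if rest.startswith("F"):
            feats = rest[1:]
            # Gender is given only in the singular (three letters).
            tables = [CASE, NUMBER, GENDER] if len(feats) == 3 else [CASE, NUMBER]
            return ["Adjective", "Full form"] + _read(feats, tables)
    if pos == "V":
        if rest == "F":
            return ["Verb", "Infinitive"]
        if rest.startswith("IP"):
            return (["Verb", "Indicative", "Present/Future"]
                    + _read(rest[2:], [PERSON, NUMBER]))
        if rest.startswith("IT"):
            return (["Verb", "Indicative", "Past"]
                    + _read(rest[2:], [NUMBER, GENDER]))
        if rest.startswith("M"):
            return ["Verb", "Imperative"] + _read(rest[1:], [PERSON, NUMBER])
    raise ValueError("unsupported code: %r" % code)


if __name__ == "__main__":
    for example in ["NMUNS", "AFAP", "VITSM"]:
        print(example, ", ".join(decode(example)))
```

Keeping one lookup table per position makes it easy to extend the sketch if further letters (for example, for other cases) turn out to be supported by the program.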

 

EXAMPLE INPUT FILE FORMAT

(Note that the input word can be given in any of its forms)

 

стекло NNUNS

стол NMUNS

стол NMUGS

стол NMUDS

стол NMUPS

столами NMUNS

столами NMUGS

столами NMUDS

столами NMUPS

делал VF

делаю VIP1S

делать VITSM

делаем VM2S

 

OUTPUT FILE

стекло/

стол/

стола/

столу/

столе/

стол/

стола/

столу/

столе/

быстрые/быстрых/  

делать/

делаю/

делал/

делаем/

 

There are two forms of the adjective because the accusative plural differs by animacy: быстрые is used with inanimate nouns and быстрых with animate ones.
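Judging by the examples above, each input line is a word followed by a feature code, and each output line lists the generated forms, each terminated by “/”. A minimal Python sketch for preparing such requests and reading the results follows. The helper names format_request and split_forms are hypothetical, and the sketch works on strings only, since the file encoding expected by the program is not stated here.

```python
def format_request(word, code):
    """Build one input line for the generator, e.g. 'стол NMUGS'."""
    return "%s %s" % (word, code)


def split_forms(line):
    """Return the generated forms from one output line.

    Forms are assumed to be terminated by '/', so 'быстрые/быстрых/'
    yields two forms and 'стол/' yields one.
    """
    return [form for form in line.strip().split("/") if form]


if __name__ == "__main__":
    print(format_request("стол", "NMUGS"))
    print(split_forms("быстрые/быстрых/"))
```

Dropping empty items after the split removes the artifact of the trailing “/” and of any trailing whitespace on the line.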

 
