Grigori SIDOROV
PhD, Professor and researcher
(Full Professor)
Text processing techniques and systems, automatic dictionary processing, automatic morphological analysis of different languages, automatic syntactic analysis, anaphora resolution, word sense disambiguation, corpus linguistics, parallel texts, linguistic software development.
Current projects: Linguistic tools; parallel texts; automatic analysis of explanatory dictionaries, sentiment analysis, authorship attribution, syntactic n-grams.
You can use all these programs freely for academic purposes. No warranty.
New IDEA: Syntactic n-grams = n-grams constructed by following syntactic trees = using syntax in machine learning.
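As a rough illustration of the idea, the following sketch enumerates n-grams by following head-to-dependent arcs in a dependency tree instead of taking neighbouring words in the surface order. The function name `syntactic_ngrams`, the head-index encoding, and the toy tree are assumptions made for this example only; they are not taken from the programs offered on this page.

```python
from collections import defaultdict

def syntactic_ngrams(heads, words, n):
    """Return all head-to-dependent paths of n words in a dependency tree.

    heads[i] is the index of the head of word i, or -1 for the root.
    """
    # Build the child lists from the head indices.
    children = defaultdict(list)
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)

    def paths_from(node, length):
        # All downward paths with `length` nodes that start at `node`.
        if length == 1:
            return [[node]]
        result = []
        for child in children[node]:
            for tail in paths_from(child, length - 1):
                result.append([node] + tail)
        return result

    ngrams = []
    for start in range(len(words)):
        for path in paths_from(start, n):
            ngrams.append(tuple(words[i] for i in path))
    return ngrams

# Hypothetical hand-built tree for "the dog chased a cat":
# "chased" is the root, "dog" and "cat" depend on it,
# and each determiner depends on its noun.
words = ["the", "dog", "chased", "a", "cat"]
heads = [1, 2, -1, 4, 2]

bigrams = syntactic_ngrams(heads, words, 2)
trigrams = syntactic_ngrams(heads, words, 3)
print(bigrams)
print(trigrams)
```

Note that a pair such as ("chased", "cat") appears among the bigrams although the two words are not adjacent in the sentence; this is the kind of feature that surface n-grams miss.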
Spanish Emotion Lexicon (SEL) (zip, text, full text).
SEL contains 2,036 words, each associated with a Probability Factor of Affective use (PFA) for at least one basic emotion: joy, anger, fear, sadness, surprise, or disgust. The lexicon was annotated manually by 19 annotators (scale: null, low, medium, high), and agreement thresholds were applied. See the table for example results: for the word abundancia (abundance), for instance, 50% of annotators chose “medium” and 50% chose “high”.
| Word | Null [%] | Low [%] | Medium [%] | High [%] |
|---|---|---|---|---|
| abundancia (abundance) | 0 | 0 | 50 | 50 |
| aceptable (acceptable) | 0 | 20 | 80 | 0 |
| acallar (to silence) | 50 | 40 | 10 | 0 |

A new measure is proposed for each word: the Probability Factor of Affective use (PFA). It is based on the percentages presented in the table. PFA is 1 if 100% of annotators assign the word the “high” value of association with the emotion, and 0 if 100% of annotators assign it the “null” value. Intuitively, its meaning is clear: the higher the PFA, the more probable the association of the word with the emotion. Example of SEL word list:
Palabra PFA Categoría
abundancia 0.83 Alegría
acabalar 0.396 Alegría
acallar 0.198 Alegría
acatar 0.198 Alegría
acción 0.397 Alegría
aceptable 0.594 Alegría
aceptación 0.696 Alegría
acicate 0.429 Alegría
aclamación 0.799 Alegría
aclamar 0.799 Alegría
acogedor 0.83 Alegría...
Data similar to that in the table is also available (see the full text or xlsx file).
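The exact PFA formula is not given on this page. The sketch below assumes a weighted average of the annotation percentages with level weights 0, 0.33, 0.66, and 1 for null, low, medium, and high. These weights are an inference from the three example rows in the table, not a documented definition, although they do reproduce the PFA values listed for abundancia, aceptable, and acallar. The names `LEVEL_WEIGHTS` and `pfa` are hypothetical.

```python
# Assumed weights per annotation level (inferred from the examples, not stated in the text).
LEVEL_WEIGHTS = {"null": 0.0, "low": 0.33, "medium": 0.66, "high": 1.0}

def pfa(percentages):
    """Weighted average of annotation levels; percentages must sum to 100."""
    total = sum(percentages.values())
    if abs(total - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    weighted = sum(LEVEL_WEIGHTS[level] * share for level, share in percentages.items())
    return weighted / 100

# Percentages copied from the example table above.
rows = {
    "abundancia": {"null": 0, "low": 0, "medium": 50, "high": 50},
    "aceptable": {"null": 0, "low": 20, "medium": 80, "high": 0},
    "acallar": {"null": 50, "low": 40, "medium": 10, "high": 0},
}

scores = {word: round(pfa(shares), 3) for word, shares in rows.items()}
print(scores)  # abundancia 0.83, aceptable 0.594, acallar 0.198
```

Under these assumed weights the two boundary cases described above also hold: 100% “high” gives 1 and 100% “null” gives 0.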
Paper to cite for the Spanish Emotion Lexicon (SEL):
Grigori Sidorov, Sabino Miranda-Jiménez, Francisco Viveros-Jiménez, Alexander Gelbukh, Noé Castro-Sánchez, Francisco Velásquez, Ismael Díaz-Rangel, Sergio Suárez-Guerra, Alejandro Treviño, and Juan Gordon. Empirical Study of Opinion Mining in Spanish Tweets. LNAI 7629-7630, 2012, 14 p.
English-Spanish dictionary of weighted morphological forms. Forms are weighted according to the distributions of corresponding grammar classes in corpora. Unicode. Spanish-English version is available on request. For example:
'cause porque 1.0000000
'til hasta 1.0000000
a un 0.4603677
a una 0.3662918
a unas 0.0734382
a uno 0.0031157
a unos 0.0967866
abaci ábaco 0.0561639
abaci ábacos 0.9438361
abacus ábaco 0.9890721
abacus ábacos 0.0109279
abacuses ábaco 0.0561639
abacuses ábacos 0.9438361
abandon abandonábamos 0.0024804
abandon abandonáis 0.0005694
abandon abandonáramos 0.0004860
abandon abandonáremos 0.00071134
abandon abandonásemos 0.0004860...
...abandon abandonaba 0.0779384
abandon abandonabais 0.0000805
abandon abandonaban 0.0226584...
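A minimal sketch of how such a file might be read, assuming each line holds exactly three whitespace-separated fields (English form, Spanish form, weight), as in the excerpt above. Entries whose forms contain spaces would need a different split. The function name `parse_weighted_dictionary` is hypothetical and not part of the distributed resource.

```python
from collections import defaultdict

def parse_weighted_dictionary(lines):
    """Map each English form to a list of (Spanish form, weight) pairs."""
    table = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        english, spanish, weight = parts
        table[english].append((spanish, float(weight)))
    return table

# Lines copied from the excerpt above.
sample = """\
a un 0.4603677
a una 0.3662918
a unas 0.0734382
a uno 0.0031157
a unos 0.0967866
abacus ábaco 0.9890721
abacus ábacos 0.0109279
""".splitlines()

table = parse_weighted_dictionary(sample)

# Pick the highest-weighted Spanish form for each English form.
best = {en: max(forms, key=lambda pair: pair[1])[0] for en, forms in table.items()}

# In this excerpt the weights of the Spanish forms of "a" add up to 1.
total_a = sum(weight for _, weight in table["a"])
print(best, total_a)
```

In the excerpt, the weights for a single English form sum to 1, so they can be read as a probability distribution over its Spanish forms.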
Paper to cite for the English-Spanish dictionary of weighted morphological forms:
Grigori Sidorov, Alberto Barrón-Cedeño, and Paolo Rosso. English-Spanish Large Statistical Dictionary of Inflectional Forms. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA), 2010, pp. 277-281.
Interface for the system for fast search of Maya glyphs based on their visual structural description, available as a ZIP archive or as a compressed EXE file.
Beta version. The system uses the dictionary of J. Montgomery.
EXE: Download the Glyphs.exe file and execute it; the files will be copied to the folder you choose. Then execute the file SETUP.EXE.
ZIP: Download the Glyphs.zip file and unzip the files to the folder you choose. Then execute the file SETUP.EXE.
Papers to cite for the glyph search system:
1. Obdulia Pichardo Lagunas, Grigori Sidorov. Diccionario de los glifos maya con descripción visual estructural. In: Proc. of International Conference EURALEX-2008, Barcelona, Spain, July 2008, pp. 747–751.
2. Grigori Sidorov, Obdulia Pichardo-Lagunas, and Liliana Chanona-Hernandez. Search Interface to a Mayan Glyph Database based on Visual Characteristics. LNCS 5723, 2009, pp. 222–229.
System for automatic morphological analysis of Spanish
(2000-2006) A complete wordlist (beta-version) generated with this system is available.
System for automatic morphological analysis of Russian
(1992-2000)
These are EXE files for Windows; DLLs are available on request.
These programs perform lemmatization and provide grammatical information for each word form of Spanish or Russian, respectively.
See the detailed descriptions on the corresponding pages (follow the links).
Paper to cite for the morphological analysis systems:
A. Gelbukh, G. Sidorov. Approach to construction of automatic morphological analysis systems for inflective languages with little effort. LNCS 2588, 2003, pp. 215–220.
More than 170 scientific publications. More than 300 references to my works (without self-citing), h-index 11.
You can find more information about the papers and about our laboratory on the page of Alexander Gelbukh. More information is also available about the annual International Conference on Computational Linguistics, CICLing (Springer, LNCS series), and about the Mexican International Conference on Artificial Intelligence, MICAI (Springer, LNAI series).
Main site: www.cic.ipn.mx/~sidorov
MIRROR site: www.g-sidorov.org