Computational Analysis of the Croatian Literary Language
Main researcher: MOGUŠ, MILAN (83210)
Assistants:
VONČINA, JOSIP (52994)
VEDRIŠ, MLADEN (128763)
BRATANIĆ, MARIJA (77812)
IVANETIĆ, NADA (93970)
GLOVACKI-BERNARDI, ZRINJKA (75042)
VINCE, ZLATKO (52325)
MENAC-MIHALIĆ, MIRA (77022)
CVITAŠ, MAJA (15596)
TADIĆ, MARKO (157043)
Type of research: basic
Duration from: 01/01/91. to 12/31/95.
Papers on project (total): 79
Papers on project quoted in Current Contents: 8
Institution name: Filozofski fakultet - Humanističke znanosti, Zagreb (130)
Department/Institute: Institute of Linguistics
Address: Ivana Lučića 3
City: 10000 - Zagreb, Croatia
Communication
Phone: 385 (0)1 61 20 011
Fax: 385 (0)41 51 38 34
Summary: The analysis of the Croatian literary language is based on
the investigation of texts dating from the oldest written documents up to
the present time. A list of patterns and criteria for the compilation of
representative corpora has been devised. The project is divided into three
sections. The first section concerns the linguistic processing of such
corpora, which did not exist until now; it requires special computer
programs, by means of which an alphabetical concordance of the corpus and
its lemmatization have been completed. The basis for these concordances is
the one-million-word corpus of the Croatian literary language (comprising
five basic stylistic genres: poetry, prose, drama, scientific prose,
newspapers). The "Frequency Dictionary of the Standard Croatian Language"
has been prepared for publication. The Orthographic Dictionary has been
compiled, as well as several dictionaries of individual works of older
Croatian literature. The second section of the analysis is based on special
features in building a text as a linguistic and informational entity. The
third section of the research is concentrated on the cultural determination
of the Croatian literary language.
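The concordance, lemmatization and frequency-counting steps named in the summary can be illustrated with a minimal Python sketch. It does not reproduce the project's own programs, which are not described here. The function names, the toy sentence and the hand-written lemma table are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased runs of letters; the pattern is Unicode-aware,
    # so Croatian letters such as č, ć, š, ž, đ are kept.
    return re.findall(r"[^\W\d_]+", text.lower())


def concordance(tokens, width=2):
    # Map each word form to the (left, right) contexts in which it occurs.
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        index[tok].append((left, right))
    # Sorted by code point, which only approximates Croatian
    # alphabetical order (č, ć, š, ž and the digraphs need a custom key).
    return dict(sorted(index.items()))


def frequency_list(tokens, lemmas=None):
    # Count lemmas; forms missing from the (hypothetical) table count as themselves.
    lemmas = lemmas or {}
    return Counter(lemmas.get(tok, tok) for tok in tokens)


# Toy input and lemma table, assumed for illustration only.
sample = "Kuća je velika. Kuće su velike."
toy_lemmas = {"kuće": "kuća", "velika": "velik", "velike": "velik",
              "je": "biti", "su": "biti"}

tokens = tokenize(sample)
kwic = concordance(tokens, width=1)
freq = frequency_list(tokens, toy_lemmas)
```

With this toy input, `kwic` lists every word form with its neighbouring words, and `freq` counts the lemmas "kuća", "velik" and "biti" twice each. A real frequency dictionary would need a full morphological lexicon in place of the hand-written table.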
Keywords: frequency dictionary of the one-million corpus, frequency dictionary of the corpus of Croatian poetry, prose, drama, scientific texts, newspapers, lemmatization, compilation of the Dictionary of orthography, text as linguistic and informational entity, cultural determination of the Croatian literary language
Research goals: The research on the project "Computational Analysis
of the Croatian Literary Language" is carried out in three main directions:
the compilation of the Frequency Dictionary of the Croatian Language, based
on the corpus of one million words (the so-called 'Moguš corpus'); the
compilation of the Orthographic Dictionary of the Croatian Literary
Language; the compilation of dictionaries of the literary works of older
Croatian literature (from the first written documents to the 19th century);
compiling monographs on Croatian philologists of older periods, especially
grammarians and lexicographers; working out the typology of literary
genres.
COOPERATION - PROJECTS
Name of project: Proučavanje civilizacijske terminologije središnje i istočne Europe (Study of the Civilizational Terminology of Central and Eastern Europe)
Name of institution: Austrian Academy of Sciences
City: 1000 - Vienna, Austria