Project code: 6-03-048

Computational Analysis of the Croatian Literary Language

Main researcher: MOGUŠ, MILAN (83210)

Type of research: basic
Duration from: 01/01/91 to 12/31/95

Papers on project (total): 79
Papers on project quoted in Current Contents: 8
Institution name: Filozofski fakultet - Humanističke znanosti, Zagreb (130)
Department/Institute: Institute of Linguistics
Address: Ivana Lučića 3
City: 10000 - Zagreb, Croatia
Phone: 385 (0)1 61 20 011
Fax: 385 (0)41 51 38 34

Summary: The analysis of the Croatian literary language is based on the investigation of texts ranging from the oldest written documents to the present day. A list of patterns and criteria for compiling representative corpora has been devised. The project is divided into three sections. The first section, the linguistic processing of such corpora (which did not previously exist), requires special computer programs, with which an alphabetical concordance of the corpus and its lemmatization have been completed. The basis for these concordances is the one-million-word corpus of the Croatian literary language, comprising five basic stylistic genres: poetry, prose, drama, scientific prose, and newspapers. The "Frequency dictionary of the Standard Croatian language" has been prepared for publication. The Orthographic Dictionary has been compiled, as have several dictionaries of individual works of older Croatian literature. The second section of the analysis deals with the specific features of building a text as a linguistic and informational entity. The third section of the research is concentrated on the cultural determination of the Croatian literary language.

Keywords: frequency dictionary of the one-million-word corpus, frequency dictionary of the corpus of Croatian poetry, prose, drama, scientific texts, newspapers, lemmatization, compilation of the Dictionary of orthography, text as linguistic and informational entity, cultural determination of the Croatian literary language

Research goals: The research on the project "Computational Analysis of the Croatian Literary Language" is carried out in the following main directions: compiling the Frequency Dictionary of the Croatian Language, based on a corpus of one million words (the so-called 'Moguš corpus'); compiling the Orthographic Dictionary of the Croatian Literary Language; compiling dictionaries of the literary works of older Croatian literature (from the first written documents to the 19th century); compiling monographs on Croatian philologists of older periods, especially grammarians and lexicographers; and working out a typology of literary genres.


  1. Name of project: Study of the Civilizational Terminology of Central and Eastern Europe (Proučavanje civilizacijske terminologije središnje i istočne Europe)
    Name of institution: Austrian Academy of Sciences
    City: 1000 - Vienna, Austria


  1. Name of institution: Austrian Academy of Sciences
    Type of cooperation: Joint project
    City: 1000 - Vienna, Austria

