Towards a distant and deep reading: a pilot corpus of Golden-Age Spanish poetry




Distant reading, Poetry, Golden-Age, Meter, Natural Language Processing, Corpus annotation


This paper argues that the distant reading of literary texts (panoramic analysis of a large number of texts) needs to be combined with «deep» reading (close, detailed analysis of implicit linguistic or literary aspects of the texts). To this end, it proposes the development of large annotated corpora of literary texts. Thanks to recent developments in Natural Language Processing, implicit linguistic and literary information can be annotated semi-automatically. To show the viability of this proposal, a pilot corpus of Golden-Age Spanish poetry is presented. The corpus is made up of different types of poems (sonnets, romances, eclogues, etc.) by several poets. It currently contains more than 52,000 lines annotated at the metrical and morphological levels: the metrical pattern of each line, and the lemma, part of speech and morphological information of each word. The annotation was produced automatically; 5,069 lines have been manually revised and, where necessary, corrected. This Gold Standard is the first step both towards a distant and deep literary analysis of Golden-Age Spanish poetry and towards the development of poetry-specific Natural Language Processing models.
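The two annotation layers described above (a metrical pattern per line, and lemma, part of speech and morphological information per word) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the class and field names, the `+`/`-` notation for stressed and unstressed syllables, the part-of-speech labels and the hand-written sample annotation are hypothetical and do not reproduce the corpus's actual encoding.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """Word-level layer: lemma, part of speech, morphological features."""
    form: str
    lemma: str
    pos: str          # hypothetical tag labels, not the corpus tagset
    morph: str = ""   # free-text morphological features (assumed format)


@dataclass
class Line:
    """Line-level layer: the verse text plus its metrical pattern."""
    text: str
    metrical_pattern: str            # assumed notation: '+' stressed, '-' unstressed
    tokens: List[Token] = field(default_factory=list)
    revised: bool = False            # True if the line was checked manually

    def stressed_positions(self) -> List[int]:
        """Return the 1-based syllable positions that carry stress."""
        return [i for i, mark in enumerate(self.metrical_pattern, start=1)
                if mark == "+"]


def pattern_counts(lines: List[Line]) -> Counter:
    """Distant-reading view: how often each metrical pattern occurs."""
    return Counter(line.metrical_pattern for line in lines)


# Hand-written illustrative annotation; only two tokens are shown.
sample = Line(
    text="Cuando me paro a contemplar mi estado",
    metrical_pattern="---+---+-+-",
    tokens=[
        Token(form="contemplar", lemma="contemplar", pos="VERB"),
        Token(form="estado", lemma="estado", pos="NOUN"),
    ],
    revised=True,
)

print(sample.stressed_positions())
print(pattern_counts([sample]))
```

Keeping the line-level and word-level layers in one record is what would let the same data serve both readings: counting patterns across many lines gives the panoramic view, while inspecting the tokens of a single line supports close analysis.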





How to Cite

Navarro Colorado, B. (2019) “Towards a distant and deep reading: a pilot corpus of Golden-Age Spanish poetry”, Revista de poética medieval, 33, pp. 51–76. doi: 10.37536/RPM.2019.33.0.69109.
