Lexical Complexity of Editorial Articles in the Spanish Press: A Selection of Four Newspapers

Authors

Fernando Sanz-Lázaro

DOI:

https://doi.org/10.5944/rhd.vol.6.2021.30861

Keywords:

Lexical Complexity, Lexical Diversity, Lexical Sophistication, NLP, Spanish Press

Agencies:

Sound and Meaning in Golden Age Literature (FWF Austrian Science Fund, P32563).

Abstract

This case study explores differences in lexical complexity (LC) in the Spanish quality press. The results show that lexical quality varies among newspapers and that a larger readership does not correlate with higher LC. The lexical sophistication (LS) indexes LS1 and CVS1 and the lexical diversity (LD) indexes HD-D, MAAS, and MTLD were calculated for 2741 editorial articles from Abc, El Mundo, El País, and El Periódico published online in 2019. The results revealed significant differences in both LD and LS between the newspapers, with El Mundo producing the most complex texts overall and El Periódico the least complex. Post hoc analyses showed further differences between publications, with El Periódico being the most disparate. Additionally, the comparison of HD-D, MAAS, and MTLD with measures based on the type-token ratio (TTR) suggests that the former offer advantages for samples of heterogeneous sizes.
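As a rough illustration of why sample size matters for these measures, the sketch below contrasts plain TTR with the Maas index and HD-D, using their commonly published formulas. It is not the study's pipeline: the function names, the natural-log base for Maas, the 42-token sample size for HD-D, and the toy token list are all assumptions made for this example.

```python
import math
from collections import Counter


def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def maas(tokens):
    """Maas a^2 = (log N - log V) / (log N)^2, natural logs assumed.

    Lower values indicate higher lexical diversity.
    """
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def hdd(tokens, draws=42):
    """HD-D: expected TTR of a random sample of `draws` tokens.

    For each type, the hypergeometric probability of drawing it at least
    once is computed and weighted by 1/draws; contributions are summed.
    """
    n = len(tokens)
    if n < draws:
        raise ValueError("text must contain at least `draws` tokens")
    total = 0.0
    for freq in Counter(tokens).values():
        # Probability that none of the `draws` sampled tokens is this type.
        p_absent = math.comb(n - freq, draws) / math.comb(n, draws)
        total += (1 - p_absent) / draws
    return total


if __name__ == "__main__":
    # Toy data (hypothetical): 60 distinct tokens, then the same text doubled.
    base = [f"w{i}" for i in range(60)]
    doubled = base * 2
    for name, text in (("base", base), ("doubled", doubled)):
        print(name, ttr(text), maas(text), hdd(text))
```

In this toy setup, doubling the text halves TTR even though the vocabulary is unchanged, which is the kind of length sensitivity that motivates comparing TTR-based measures with HD-D, MAAS, and MTLD on articles of different sizes.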

Author Biography

Fernando Sanz-Lázaro, University of Vienna

Fernando Sanz-Lázaro is a doctoral student in the Department of Romance Studies at the University of Vienna. His areas of interest are Spanish Golden Age literature and the digital humanities, particularly the automation of drametric analyses and the scansion of Golden Age plays.

Published

2021-11-26

How to Cite

Sanz-Lázaro, F. (2021). Lexical Complexity of Editorial Articles in the Spanish Press: A Selection of Four Newspapers. Revista de Humanidades Digitales, 6, 85–100. https://doi.org/10.5944/rhd.vol.6.2021.30861

Issue

Section

Academic Articles