A corpus-based approach to discovering semantic relationships between named entities

Authors

  • Reynier Ávila Peña Empresa de desarrollo de aplicaciones, tecnología y sistemas (Datys), Cuba
  • Celia María Pérez Marqués Universidad de Oriente, Cuba

Keywords:

corpus linguistics, discourse analysis, named entities, linguistic corpus, computational linguistics

Abstract

Introduction: This study analyzes a news text on cultural identity, drawn from a labeled linguistic corpus, in order to annotate the syntactic and semantic relationships between the named entities it contains.

Materials and methods: Based on grammatical tagging and syntactic analysis, the study presents a classification of the semantic relationships that hold between named entities and shows how they are represented in a labeled XML format. In total, 20 named entities, 13 grammatical relationships, and 36 semantic relationships were tagged.

Results: The proposed annotation proves useful for developing and evaluating new open information extraction systems for Spanish.

Discussion: Linguistic corpora, corpus linguistics, and computational linguistics are valuable resources for machine learning aimed at natural language understanding. Analyzing the syntactic and semantic relationships between named entities in a news text is key to extracting relevant information and identifying linguistic patterns.

Conclusion: The study highlights the relevance of labeled linguistic corpora and corpus linguistics both for analyzing natural language and for developing natural language processing systems that enable computers to understand and analyze human language in different contexts, which is the need that motivates this work.
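To make the XML labeling of entities and relations more concrete, the following minimal sketch shows one way such annotations could be loaded into entity-relation triples and tallied by relation kind. The abstract does not specify the article's schema, so the element names (`ne`, `rel`), the attributes (`kind`, `name`, `arg1`, `arg2`), the entity types, and the relation labels are all hypothetical; the sentence is an illustrative example, not text from the annotated news article.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence. Element names, attributes, entity types
# and relation labels are illustrative assumptions, not the article's schema.
SAMPLE = """
<doc>
  <s id="s1">La <ne id="e1" type="ORG">Casa del Caribe</ne> organiza el
    <ne id="e2" type="EVENT">Festival del Caribe</ne> en
    <ne id="e3" type="LOC">Santiago de Cuba</ne>.
    <rel kind="grammatical" name="subj_obj" arg1="e1" arg2="e2"/>
    <rel kind="semantic" name="organizes" arg1="e1" arg2="e2"/>
    <rel kind="semantic" name="held_in" arg1="e2" arg2="e3"/>
  </s>
</doc>
"""


def load_annotations(xml_text):
    """Return ({entity_id: (surface, type)}, [(kind, arg1, name, arg2), ...])."""
    root = ET.fromstring(xml_text.strip())
    # Map each entity id to its whitespace-normalized surface string and type.
    entities = {
        ne.get("id"): (" ".join(ne.text.split()), ne.get("type"))
        for ne in root.iter("ne")
    }
    relations = []
    for rel in root.iter("rel"):
        # Resolve the argument ids to the entities' surface strings.
        arg1 = entities[rel.get("arg1")][0]
        arg2 = entities[rel.get("arg2")][0]
        relations.append((rel.get("kind"), arg1, rel.get("name"), arg2))
    return entities, relations


def count_by_kind(relations):
    """Count relations per kind (here: grammatical vs. semantic)."""
    return Counter(kind for kind, _, _, _ in relations)


if __name__ == "__main__":
    entities, relations = load_annotations(SAMPLE)
    for triple in relations:
        print(triple)
    print(count_by_kind(relations))
```

Applied to a full annotated document, a per-kind count of this sort is one way totals such as the 20 entities, 13 grammatical relationships, and 36 semantic relationships could be tallied and checked.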
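Since the annotated text is presented as a resource for evaluating open information extraction systems in Spanish, a minimal scoring sketch may help illustrate that use. It assumes that both the system output and the gold annotation are reduced to (argument 1, relation phrase, argument 2) triples and compares them by exact match after casefolding and whitespace normalization. The matching criterion, the function names, and the data are assumptions for illustration, not the evaluation protocol of the article.

```python
# Hypothetical sketch: scoring open information extraction (OIE) output
# against gold triples with exact matching after light normalization.


def normalize(triple):
    """Casefold each element and collapse internal whitespace."""
    return tuple(" ".join(part.casefold().split()) for part in triple)


def evaluate(predicted, gold):
    """Return (precision, recall, F1) of predicted vs. gold triples."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in gold}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data, not taken from the article's corpus.
GOLD = [
    ("Casa del Caribe", "organiza", "Festival del Caribe"),
    ("Festival del Caribe", "se celebra en", "Santiago de Cuba"),
]
PREDICTED = [
    ("casa del Caribe", "organiza", "Festival del  Caribe"),  # matches after normalization
    ("Santiago de Cuba", "es", "ciudad"),  # not in the gold set
]

if __name__ == "__main__":
    print(evaluate(PREDICTED, GOLD))  # (0.5, 0.5, 0.5)
```

Exact matching is deliberately strict; many open IE benchmarks use more lenient, overlap-based matching, which would give partial credit to triples whose arguments are only partly correct.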

Published

2023-10-25

How to Cite

Ávila Peña, R., & Pérez Marqués, C. M. (2023). A corpus-based approach to discovering semantic relationships between named entities. Maestro Y Sociedad, 23–31. Retrieved from https://maestroysociedad.uo.edu.cu/index.php/MyS/article/view/6197

Section

Special Issue