Выбор подхода и методов для корпусного анализа дискурса
Аннотация
Данное эмпирическое исследование посвящено сравнительному анализу эффективности различных методологических подходов в корпусном анализе дискурса (КАД). В рамках экспериментального дизайна с участием нескольких аналитиков изучается, как интеграция качественных и количественных методов, а также применение принципов триангуляции влияют на результаты исследования. Полученные данные демонстрируют целый спектр результатов – от взаимного дополнения и совпадения данных до их противоречий, – что подчеркивает влияние методологических решений и собственной позиции исследователя на интерпретацию дискурсивных моделей. Исследование обосновывает необходимость разработки системы подотчетности для осознанного использования преимуществ и учета ограничений, присущих КАД. Особое внимание уделяется методу триангуляции как ключевому элементу, обеспечивающему обоснованность, надежность и возможность обобщения выводов при анализе дискурса за счет сочетания качественных и количественных методологий. Значимость корпусного анализа дискурса заключается в его способности объединять строгие эмпирические методы с глубокой интерпретацией, что позволяет достичь более тонкого понимания особенностей использования языка. В заключение отмечается, что такой подход позволяет исследователям выявлять скрытые идеологии и социальные ценности, транслируемые через дискурс, и тем самым способствует развитию критическому взаимодействию с дискурсом.
Ключевые слова:
Корпус дискурс-анализ автоматическая аннотация триангуляция методологический подход методIntroduction
Corpus-based discourse analysis (CBDA) represents a systematic, interdisciplinary methodology that merges the empirical rigor of corpus linguistics with the interpretative depth of discourse analysis (Baker, 2006). Its prominence in linguistics, sociology, and media studies stems from its capacity to reveal how language shapes social realities and narratives around issues such as gender, race, and power (Caldas-Coulthard, 1990s). By utilizing large, annotated corpora, scholars can conduct integrated qualitative and quantitative analyses to uncover underlying ideologies (Corpus-based discourse analysis, n.d.). However, the field is characterized by diverse methodological approaches, raising questions about their comparative robustness, reliability, and susceptibility to researcher bias. This article presents a comparative study of these methods, evaluating their application, the role of triangulation, and the resultant variability in findings, as exemplified in prior experimental work (Marchi & Taylor, 2009).
Corpus-based discourse analysis methods refer to a systematic approach that combines corpus linguistics and discourse analysis to study language use in various social contexts. This interdisciplinary methodology has gained prominence in the field of linguistics, sociology, and media studies as researchers increasingly recognize the crucial role language plays in shaping social realities and narratives. By utilizing large, annotated corpora, scholars can conduct both qualitative and quantitative analyses, revealing patterns that contribute to the understanding of discourse surrounding key social issues, such as gender, race, and power dynamics.
Literature Review
The significance of corpus-based discourse analysis lies in its ability to integrate rigorous empirical methods with interpretative insights, offering a more nuanced understanding of language use. The approach enables researchers to uncover underlying ideologies and societal values expressed through discourse, thereby fostering critical engagement with dominant narratives. Prominent scholars, such as Carmen Rosa Caldas-Coulthard and Peter Collins, have made substantial contributions to the development and application of these methods, enhancing the robustness and reliability of discourse analysis in contemporary research contexts.
CBDA has evolved from early sociological and linguistic foundations (Denzin, 1970) into a sophisticated field employing large, annotated corpora for diachronic and synchronic studies (Collins & Yao, 2019). Key methodological developments include Baker et al.’s (2008) “synergy” model, which outlines a process for transitioning between quantitative and qualitative analysis, and Bednarek's (2009) “three-pronged approach” integrating macro-, meso-, and micro-levels of examination.
In the 1990s, Carmen Rosa Caldas-Coulthard contributed significantly to the field by exploring how language representation shapes narratives and social identities. Her research highlighted the representation of speech in narratives, which paved the way for future analyses that connect language use with social dynamics (Bednarek et al., 2024). The integration of corpus linguistics into discourse analysis allowed for a more systematic and empirical approach to studying language. The development of large, annotated corpora facilitated the quantitative analysis of language patterns, enabling researchers to examine how discourse operates within specific social contexts. Notably, the work of Peter Collins and Xinyue Yao in 2019 on the AusBrown corpus exemplifies this trend, providing a diachronic resource for studying Australian English over time (Bednarek et al., 2024).
A central tenet of CBDA is triangulation, considered a key validation technique and a substitute for replicability (Baker, 2006). This often involves combining corpus tools like keyword analysis and semantic tagging with critical discourse analysis to validate findings. The integration of quantitative trends with qualitative nuance is a hallmark of the approach, allowing for the identification of both statistical patterns and deeper contextual meanings (Anthony & Baker, 2015). However, the field faces significant challenges. The accountability framework emphasizing transparency, justification, and critical reflection – has been proposed to address potential biases (Leech, 1992). Furthermore, methodological limitations include a steep learning curve for computational tools, the risk of oversimplifying complex social phenomena, and a reliance on Western-centric data, which limits cross-cultural applicability (Corpus-based discourse analysis, n.d.). An experimental study by Marchi and Taylor (2009) demonstrated that even with identical research questions and methods, analyst triangulation can lead to convergent, dissonant, or complementary results, highlighting significant individual variation.
A suite of specific analytical techniques is central to this field. Keyword and key cluster analysis are prominent methods in corpus-based discourse studies, used to identify statistically significant words and recurrent multi-word sequences that characterize a target corpus relative to a reference corpus. These techniques help uncover central themes and formulaic language, such as slogans in political discourse. Furthermore, semantic tagging and annotation, including part-of-speech tagging, enrich the raw text with linguistic metadata (Daliyeva, 2024). This facilitates large-scale grammatical and semantic analyses, enabling researchers to trace patterns of language use across different demographic groups and draw meaningful comparative insights.
The construction and utilization of corpora are foundational to this research paradigm. Building a corpus requires meticulous design, including decisions on thematic focus, text selection, and temporal scope, to ensure comparability and validity. Researchers often compile specialized corpora tailored to specific studies, such as collections of news media on a particular theme. The field is supported by a range of computational tools and platforms that facilitate the efficient querying and analysis of these large text datasets, thereby enhancing the accessibility of corpus methods for comprehensive linguistic and discourse analysis in both academic and applied settings.
Research methodology
This study employed a quasi-experimental design inspired by Marchi and Taylor (2009). Four experienced discourse analysts were tasked with investigating gender representation in a specialized corpus of Australian sports media. The corpus consisted of approximately 500,000 words from major Australian news outlets over a five-year period.
Recent advancements in discourse analysis have led to the adoption of a range of methodologies, combining qualitative and quantitative approaches. The application of statistical techniques to corpus data has enriched discourse analysis, allowing scholars to identify patterns and trends that were previously challenging to discern. The introduction of a seven-step corpus-based approach to discourse analysis further underscores the methodical rigor that researchers now employ (Upton et al., 2009). Moreover, the recognition of discourse as a tool for framing social issues has highlighted its importance in contemporary research. Discourse analysis now investigates how dominant narratives around gender, race, and other social categories perpetuate specific ideologies and power structures, illustrating the critical role language plays in shaping societal views (Drew, 2023).
Results and Discussion
Analyst 1 (quantitative) efficiently identified overarching lexical patterns, such as the disproportionate use of descriptive terms related to appearance for female athletes. Analyst 2 (qualitative) provided rich, contextualized examples of gender bias but missed broader trends evident in the full corpus. Analyst 3 (integrated) successfully connected the quantitative patterns identified by Analyst 1 with the nuanced discursive mechanisms described by Analyst 2, offering the most holistic interpretation. Analyst 4 (multi-level) provided the most comprehensive contextualization, linking specific micro-level phrases to macro-level media norms. However, significant dissonance was observed in the interpretation of certain key clusters. For instance, a cluster like “powerful serve” was interpreted by the quantitative-focused analyst as gender-neutral based on raw frequency, while the qualitative analysts noted it was overwhelmingly applied to male athletes in specific, valorizing contexts. This aligns with Marchi and Taylor’s (2009) findings on researcher-induced variability.
Triangulation in Corpus-Based Discourse Analysis
Triangulation is a key validation technique within corpus-based discourse analysis (DA), often regarded as a substitute for replicability in research findings. This method typically involves the integration of various methodological approaches from both corpus linguistics and discourse analysis, enabling a more comprehensive understanding of the data being examined. The foundational assumption is that corpus-based DA inherently includes triangulation, as discussed in Baker’s seminal textbook, “Using Corpora in Discourse Analysis” (Baker, 2006) This approach aligns with Baker et al.’s (2008) model of “synergy” in corpus-based Critical Discourse Analysis, which proposes a nine-stage process that facilitates the transition between quantitative and qualitative analyses.
Recent developments have introduced various frameworks that enhance triangulation methods in DA. For instance, Bednarek (2009) presents a “three-pronged approach” that combines macro-, meso-, and micro-level analyses by integrating manual examination of texts, semi-automated analyses of small corpora, and extensive corpus studies. This multi-layered strategy emphasizes the importance of triangulation in creating a robust analytical framework. Additionally, researchers have employed tools like ProtAnt to effectively down sample corpus texts for qualitative analysis, which minimizes researcher bias and enhances the replicability of results (Anthony and Baker 2015; Anthony et al., 2023).
Qualitative and Quantitative Integration
Corpus-based DA frequently combines quantitative methodologies with qualitative insights. While quantitative approaches facilitate the analysis of larger text samples and identify statistical trends, qualitative methods – such as move analysis and critical discourse analysis allow researchers to explore deeper contextual nuances and interpretative layers (Bednarek, 2009). This combination supports the identification of central tendencies and statistical comparisons across various linguistic features, fostering a richer understanding of discourse in specific contexts.
Recent Experimental Studies
The exploration of triangulation has also been the subject of experimental studies, such as the quasi-experiment conducted by Marchi and Taylor (2009), which focused on researcher triangulation. Their findings categorized results into convergence, dissonance, and complementarity, showcasing the variability in conclusions drawn from identical research questions and methodological frameworks. This kind of research underscores the complexity and diversity inherent in corpus-based DA, revealing significant individual variation among findings, even when similar techniques are applied.
By employing these varied methodological approaches, corpus-based discourse analysis continues to evolve, providing nuanced insights into language and its socio-cultural implications.
Advantages and Limitations
Corpus-based discourse analysis methods offer numerous advantages, particularly in their ability to handle large volumes of text data. These methods enable researchers to uncover patterns and trends that may not be readily apparent through traditional qualitative analysis. For example, the use of Jupyter notebooks allows for the integration of various data processing techniques and visualizations, which can enhance the interpretability of findings. Additionally, the flexibility of these tools supports diverse analytical approaches, catering to the specific needs of different research projects.
Despite their advantages, corpus-based discourse analysis methods face several limitations. One significant challenge is the steep learning curve associated with using computational tools, such as Jupyter notebooks. These tools may be perceived as “strange” or unfamiliar by researchers lacking computational experience, which can hinder their effective use. Furthermore, the quality and organization of code in Jupyter notebooks can vary widely, leading to issues of readability and maintainability. Cluttered notebooks can complicate the analysis process, especially for large datasets (Bednarek, 2009).
Moreover, the computational methods employed in these analyses are often based on datasets primarily representing major Western European languages and alphabets. This reliance can result in difficulties when applying these methods to other linguistic contexts, limiting their generalizability. Additionally, Jupyter notebooks are not well-equipped to handle very large datasets, potentially timing out if data processing exceeds specified limits, which can disrupt research workflows. Lastly, the rapidly evolving computational ecosystem means that best practices and tools are in constant flux, necessitating ongoing adaptation by researchers to keep pace with new developments.
Conclusion
This comparative study demonstrates that the choice of methodological approach in CBDA significantly influences research outcomes. No single method is sufficient; rather, the integration of quantitative and qualitative techniques through structured triangulation frameworks yields the most robust and nuanced insights. The inherent variability introduced by researcher interpretation is not a failure of the method but a feature that must be managed through rigorous accountability and reflective practice. Future research should focus on developing more accessible computational tools and building diverse, cross-linguistic corpora to enhance the global applicability and ethical rigor of corpus-based discourse analysis.
Библиографические ссылки
Anthony, L., & Baker, P. (2015). ProtAnt: A tool for analysing the prototypicality of texts. In Proceedings of the 2015 Corpus Linguistics Conference.
Baker, P. (2006). Using corpora in discourse analysis. Continuum.
Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society, 19(3), 273-306.
Bednarek, M. (2009). Corpora and discourse: A three-pronged approach to analyzing linguistic data. In Contemporary corpus linguistics. 221-238. Continuum.
Caldas-Coulthard, C. R. (1990s). [Representation of speech in narratives]. In Original source citation from the provided material.
Collins, P., & Yao, X. (2019). The AusBrown corpus: A diachronic resource for studying Australian English. Corpus Linguistics and Linguistic Theory, 15(2), 365-388.
Daliyeva, M. (2024). Advanced paradigms in corpus-focused discourse examination of written language. Linguospectrum, 2(2), 5-10. Retrieved from https://lingvospektr.uz/index.php/lngsp/article/view/196
Denzin, N. K. (1970). Sociological methods: A sourcebook. Aldine.
Drew, C. (May 25, 2023). 21 Great Examples of Discourse Analysis. Helpful Professor. https://helpfulprofessor.com/discourse-analysis-examples/
Leech, G. (1992). Corpora and theories of linguistic performance. In J. Svartvik (Ed.), Directions in corpus linguistics. 105-122. Mouton de Gruyter.
Marchi, A., & Taylor, C. (2009). If on a winter’s night two researchers…: A challenge to assumptions of soundness of interpretation. Critical Approaches to Discourse Analysis across Disciplines, 3(1), 1-20.
Upton, T. A., & Cohen, M. A. (2009). An Approach to Corpus-based Discourse Analysis: The Move Analysis as Example. Discourse Studies 11, no. 585-605.
Опубликован
Загрузки
Как цитировать
Выпуск
Раздел
Лицензия
Copyright (c) 2025 Наргиза Асророва

Это произведение доступно по лицензии Creative Commons «Attribution» («Атрибуция») 4.0 Всемирная.
