A corpus-based study of the functioning of lexical bundles (chunks) in English-language scientific discourse

Authors

  • Uzbekistan State World Languages University

Abstract

This article presents a corpus-based study of how lexical bundles function in English-language scientific discourse. The study draws on the academic sub-corpus of COCA, in which the most frequent three- and four-word bundles were identified and their functional types analyzed. The results show that lexical bundles in scientific discourse fall into three main functional types: text-oriented bundles, research-oriented bundles, and stance bundles, which express the writer’s attitude. Studying these units is important for improving students’ academic writing skills.

Keywords:

lexical bundles, scientific discourse, corpus linguistics, COCA, academic writing

Introduction

Corpus linguistics has transformed the way researchers study authentic language use by providing systematic access to large-scale, representative collections of texts. Unlike traditional approaches that rely heavily on intuition, corpus-based research allows for empirical analysis of naturally occurring language patterns, revealing linguistic features that might otherwise go unnoticed. One important phenomenon widely examined within this framework is lexical bundles, also referred to as “chunks” or recurrent sequences of words. These bundles, typically three- to five-word combinations, occur frequently in specific discourse contexts and serve as building blocks of fluent communication.

In the context of scientific discourse, lexical bundles are particularly significant. Academic writing is expected to display accuracy, clarity, and objectivity, and these qualities are often achieved through the use of recurring multiword expressions that structure arguments, report findings, and signal stance (Radjabova, 2025). For example, expressions such as “the results of the,” “as can be seen in,” or “it is important to note” help writers convey research aims, link evidence, and guide readers through complex information. Such bundles not only enhance textual cohesion but also contribute to disciplinary identity, as certain bundles are more common in scientific writing than in other registers such as journalism or fiction.

Previous research (Biber et al., 2004; Hyland, 2008; Cortes, 2004) has shown that lexical bundles in academic texts can be grouped into functional categories, most notably research-oriented, text-oriented, and stance bundles. Each category fulfills a distinct communicative role: research-oriented bundles describe methodology and findings, text-oriented bundles organize discourse, and stance bundles reflect the writer’s evaluation or degree of certainty. Understanding the distribution and function of these bundles is therefore crucial for both linguistic description and pedagogy, as they represent formulaic patterns that are central to academic literacy.

The present article aims to provide a corpus-based analysis of lexical bundles in English-language scientific discourse, with particular reference to the academic sub-corpus of the Corpus of Contemporary American English (COCA). By identifying the most frequent bundles and examining their functions, this research seeks to shed light on how these recurrent sequences contribute to the communicative effectiveness of scientific writing. Furthermore, the study considers the pedagogical implications of lexical bundles for teaching academic writing, especially in contexts where English is used as a foreign or second language.

Research Methods

This article investigates the academic sub-corpus of the Corpus of Contemporary American English (COCA) as the primary data source. COCA, developed by Mark Davies (2009), is one of the most comprehensive corpora of English, containing over one billion words across a variety of registers, including spoken, fiction, magazines, newspapers, and academic texts. For this study, the focus was placed specifically on the academic sub-corpus, which consists of journal articles, textbooks, research reports, and other forms of scientific writing from a wide range of disciplines (Radjabova, 2022). The inclusion of multiple disciplines ensures that the analysis is not restricted to a single field but instead reflects general tendencies of lexical bundle usage in academic discourse. This breadth and representativeness make COCA particularly suitable for research into the recurrent phraseological patterns of scientific writing.

The first stage of the analysis involved the identification of three- and four-word lexical bundles in the academic sub-corpus. Following Biber, Johansson, Leech, Conrad, and Finegan (1999), lexical bundles were defined as continuous sequences of words that recur at least 20 times per million words in the corpus. This frequency threshold ensures that the selected bundles represent genuinely conventionalized patterns rather than idiosyncratic or one-off sequences. To obtain these bundles, the corpus search function was employed to generate n-gram frequency lists, which were subsequently filtered to exclude random co-occurrences and contextually irrelevant strings (e.g., names, numbers, or fragments without semantic or discourse value) (Radjabova, 2021).
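The extraction step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the COCA search interface used in the study: the tokenizer and the function names `tokenize`, `ngrams`, and `frequent_bundles` are hypothetical, and only the bundle lengths (three and four words) and the threshold of 20 occurrences per million words come from the text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Yield every contiguous n-word sequence as a space-joined string."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def frequent_bundles(texts, sizes=(3, 4), min_per_million=20.0):
    """Return {bundle: (raw_count, per_million)} for n-grams meeting the threshold.

    N-grams are counted within each text, so no sequence spans a text boundary.
    """
    counts = Counter()
    total_tokens = 0
    for text in texts:
        tokens = tokenize(text)
        total_tokens += len(tokens)
        for n in sizes:
            counts.update(ngrams(tokens, n))
    if total_tokens == 0:
        return {}
    result = {}
    for bundle, raw in counts.items():
        # Normalized frequency: occurrences per one million running words.
        per_million = raw * 1_000_000 / total_tokens
        if per_million >= min_per_million:
            result[bundle] = (raw, per_million)
    return result
```

On a small sample almost every n-gram exceeds 20 per million, so the threshold only becomes meaningful on a corpus of several million words; the per-million figure is what allows comparison between samples of different sizes.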

In order to refine the dataset, several exclusion criteria were applied. For example, bundles that consisted primarily of function words without discernible discourse function were omitted unless they played a role in cohesion (e.g., “in the case of”). Similarly, bundles limited to discipline-specific terminology (e.g., “the polymerase chain reaction”) were considered only when they also served a broader discourse-organizing function. This filtering process resulted in a list of the most frequent and pedagogically relevant bundles, which formed the basis for further analysis (Giyosiddinovna, 2025).
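Part of this filtering can be approximated automatically before manual inspection. The sketch below is only an illustration of that idea: the function-word list, the 0.75 cutoff, the labels "drop", "keep", and "review", and the names `function_word_share` and `triage` are all assumptions, not the criteria actually applied in the study. It does not attempt to detect discipline-specific terminology, which requires human judgment.

```python
# Hypothetical, deliberately short list of English function words.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "by",
    "with", "and", "or", "that", "it", "is", "as",
})


def function_word_share(bundle):
    """Return the proportion of words in the bundle that are function words."""
    words = bundle.lower().split()
    if not words:
        return 0.0
    return sum(word in FUNCTION_WORDS for word in words) / len(words)


def triage(bundle, retained=frozenset({"in the case of"})):
    """Label a candidate bundle as 'drop', 'keep', or 'review'.

    'retained' holds bundles already judged to serve cohesion (assumed input).
    """
    lowered = bundle.lower().strip()
    if not lowered or any(ch.isdigit() for ch in lowered):
        return "drop"    # empty strings and strings containing numbers
    if lowered in retained:
        return "keep"    # kept despite consisting mostly of function words
    if function_word_share(lowered) >= 0.75:
        return "review"  # mostly function words: needs a manual decision
    return "keep"
```

Under these assumptions a bundle such as “the results of the” is sent to manual review rather than dropped, which reflects the fact that the final decision in the study rested on discourse function, not on word class alone.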

Once identified, the bundles were classified according to the functional taxonomy developed by Biber, Conrad, and Cortes (2004) and later elaborated by Hyland (2008). This taxonomy distinguishes three primary categories:

  • Research-oriented bundles help writers describe research processes, goals, and findings. Examples include “the purpose of the,” “the results of the,” and “as shown in figure.” Such bundles enable authors to situate their work within a research tradition and to guide readers through methodological or empirical sections.
  • Text-oriented bundles structure discourse and create cohesion across sections of a text. Typical examples are “in the context of,” “on the basis of,” and “as a result of.” They link ideas, provide transitions, and create logical flow, all of which are essential for maintaining clarity in extended scientific arguments.
  • Stance bundles reflect the writer’s perspective, evaluation, or level of certainty, as in “it is clear that,” “it may be argued that,” or “it should be noted.” In scientific writing, stance bundles allow authors to balance objectivity with cautious interpretation, marking their claims in relation to disciplinary norms of evidentiality and modesty (Giyosiddinovna, 2025).
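The three categories can be represented as a simple lookup table. The sketch below uses only the example bundles listed above; the names `BUNDLE_CATEGORIES`, `categorize`, and `category_counts` and the "unclassified" label are assumptions made for illustration. In the study itself the assignment was made manually, so a table like this could record the outcome of that judgment but does not replace it.

```python
from collections import Counter

# Example bundles taken from the three categories described in the text.
BUNDLE_CATEGORIES = {
    "the purpose of the": "research-oriented",
    "the results of the": "research-oriented",
    "as shown in figure": "research-oriented",
    "in the context of": "text-oriented",
    "on the basis of": "text-oriented",
    "as a result of": "text-oriented",
    "it is clear that": "stance",
    "it may be argued that": "stance",
    "it should be noted": "stance",
}


def categorize(bundle):
    """Return the recorded category, or 'unclassified' for unknown bundles."""
    return BUNDLE_CATEGORIES.get(bundle.lower().strip(), "unclassified")


def category_counts(bundles):
    """Count how many of the given bundles fall into each category."""
    return Counter(categorize(bundle) for bundle in bundles)
```

Bundles that come back as "unclassified" are the ones that would still need to be examined in context.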

Each identified bundle was manually examined and assigned to one of these categories based on its discourse function in context. Where ambiguity arose, concordance lines were retrieved from COCA to observe the bundle in multiple occurrences, ensuring accuracy of functional classification.

The study combined quantitative and qualitative methods. The quantitative component involved calculating the raw and normalized frequencies of each lexical bundle across the academic sub-corpus (Giyosiddinovna, 2025). Normalization per million words was essential to allow meaningful comparison across disciplines of differing corpus sizes. The top 50 bundles were selected for detailed analysis, representing the most conventionalized patterns of English scientific writing.

The qualitative component involved functional interpretation of the bundles. This was carried out through concordance analysis, where sample occurrences were examined to determine how bundles contributed to meaning-making in actual discourse contexts. For instance, “the results of the” was consistently used to introduce findings in the Results and Discussion sections of research articles, while “in the context of” was found to serve as a framing device in introductions and literature reviews. By analyzing usage patterns, the study was able to connect frequency data with discourse functions, offering a richer understanding of the role of bundles in academic texts (Giyosiddinovna, 2025).
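Concordance lines of the kind used in the qualitative stage can be illustrated with a small keyword-in-context (KWIC) routine. This is a sketch under assumptions: the study retrieved its concordances from the COCA interface, whereas the function `concordance`, its `width` parameter, and the simple tokenizer here are hypothetical stand-ins that work on any plain text.

```python
import re


def concordance(text, bundle, width=5):
    """Return (left context, match, right context) for each occurrence of bundle.

    Matching is case-insensitive and operates on word tokens, so punctuation
    between words is ignored (a simplifying assumption).
    """
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    lowered = [token.lower() for token in tokens]
    target = bundle.lower().split()
    n = len(target)
    if n == 0:
        return []
    lines = []
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            node = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, node, right))
    return lines
```

Reading the left and right contexts side by side is what allows an analyst to see, for example, whether a bundle introduces findings or frames a discussion.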

To ensure methodological rigor, several measures were taken. First, the choice of COCA ensured access to a large, balanced, and up-to-date corpus that reflects authentic academic usage. Second, the use of established frequency thresholds and functional taxonomies ensured comparability with previous studies, facilitating triangulation with existing findings in the literature. Third, manual verification of bundles through concordance lines reduced the risk of misclassification and confirmed that the observed patterns were genuinely recurrent and functionally significant (Radjabova, 2023).

In summary, the research design was based on a systematic corpus-driven approach. The COCA academic sub-corpus provided the empirical foundation, frequency analysis identified recurrent lexical bundles, functional classification clarified their roles in discourse, and concordance-based qualitative analysis offered contextual interpretation. Together, these methods enabled a comprehensive investigation of how lexical bundles function in scientific discourse and how they contribute to the construction of academic knowledge.

Results and Discussion

The analysis of the COCA academic sub-corpus revealed that lexical bundles are pervasive in scientific discourse and serve crucial roles in structuring arguments, presenting information, and guiding readers through complex texts. The findings align with earlier corpus-based studies of academic writing (Biber et al., 2004; Cortes, 2004; Hyland, 2008), confirming that scientific discourse relies heavily on a limited set of recurrent multiword expressions that perform predictable communicative functions. The most frequent bundles in the data were concentrated within three primary functional categories: research-oriented bundles, text-oriented bundles, and stance bundles. Quantitative analysis showed that research-oriented bundles accounted for the majority of the most frequent three- and four-word clusters, followed by text-oriented bundles, with stance bundles being the least frequent but still functionally important. This distribution reflects the highly informational and objective nature of scientific discourse, where the reporting of methods, results, and interpretations is prioritized over overt evaluation or subjective positioning.

Research-oriented bundles were the most frequent in the dataset. These bundles are employed to describe procedures, highlight aims, and present results (Radjabova, 2025). Examples such as “the results of the,” “the purpose of the,” and “as shown in figure” were pervasive across disciplines. Their high frequency suggests that writers rely on conventionalized formulae to present empirical findings in a concise and standardized manner. This finding corresponds with Biber, Conrad, and Cortes (2004), who argue that academic texts are characterized by the routinization of certain phrases that facilitate the reporting of research. Similarly, Hyland (2008) emphasizes that such bundles reflect disciplinary norms, as they embody “preferred ways of constructing knowledge”. For instance, “the results of the” not only signals the transition to empirical findings but also meets reader expectations about how data should be presented in research articles.

Text-oriented bundles formed the second largest category, with expressions like “in the context of,” “on the basis of,” and “with respect to the.” These bundles are instrumental in structuring discourse, guiding the reader across sections, and ensuring logical coherence. They were particularly frequent in introductions and literature reviews, where writers needed to situate their research within broader theoretical and methodological frameworks (Radjabova, 2025).

Cortes (2004) observed that text-oriented bundles are especially prevalent in student and novice writing, as learners often rely on these patterns to maintain cohesion. However, the present findings suggest that even expert writers in scientific discourse consistently employ such bundles, highlighting their centrality to academic communication. As Hyland (2008) notes, academic texts are “densely intertextual,” and text-oriented bundles provide the scaffolding that allows authors to weave together previous research, present arguments, and guide readers toward conclusions.

Although stance bundles were less frequent, they played an important role in expressing evaluation and caution. Examples such as “it is clear that,” “it should be noted,” and “it is possible that” were commonly used to introduce claims while simultaneously hedging or reinforcing their strength. Their relatively lower frequency compared to research- and text-oriented bundles is consistent with the disciplinary expectation that scientific writing should minimize overt subjectivity. Nevertheless, their presence is crucial, as stance bundles enable authors to position themselves in relation to knowledge claims, thus fulfilling the evaluative dimension of discourse (Hyland, 2005). For instance, “it is clear that” serves to strengthen the writer’s authority by presenting a claim as evident, while “it is possible that” introduces caution, signaling epistemic uncertainty. Such bundles are vital in balancing objectivity with rhetorical persuasion, ensuring that claims are neither overstated nor devoid of authorial voice.

Comparative observations

A comparison with other registers in COCA reveals that the use of lexical bundles in scientific discourse is distinct both in frequency and function. Whereas conversational registers are characterized by interpersonal and interactional bundles (e.g., “I don’t know if”), scientific discourse prioritizes informational precision and discourse organization (Biber et al., 1999). This confirms that lexical bundles are register-specific and adapt to the communicative purposes of each domain (Radjabova, 2024).

Pedagogical implications

The findings have clear implications for academic writing instruction. Since lexical bundles are integral to constructing coherent, disciplinary-appropriate texts, explicit teaching of their forms and functions can significantly enhance students’ writing competence. As Wray (2002) observes, formulaic sequences reduce processing effort for both writers and readers, making communication more fluent and accessible. Introducing learners to research-oriented bundles, for example, can help them report findings more effectively, while text-oriented bundles can improve cohesion, and stance bundles can assist them in developing a balanced academic voice. Furthermore, corpus-based tools such as COCA allow learners to observe authentic patterns of bundle use in context, encouraging data-driven discovery rather than rote memorization (Boulton, 2021). By engaging with real academic texts, students can develop a deeper awareness of how bundles function in authentic discourse, thereby fostering autonomy and critical language awareness.

Overall, the results underscore the centrality of lexical bundles in English-language scientific discourse. Research-oriented bundles dominate the discourse, reflecting the priority of empirical reporting, while text-oriented bundles maintain coherence, and stance bundles provide evaluative nuance (Radjabova, 2024). Together, these patterns demonstrate how formulaic language underpins the communicative effectiveness of academic texts, supporting both the transmission of knowledge and the construction of disciplinary identity.

Conclusion

This study confirms that lexical bundles are indispensable in scientific discourse. Research-oriented bundles support the presentation of research, text-oriented bundles enhance textual organization, and stance bundles reflect the author’s evaluative stance. Pedagogically, explicit instruction in lexical bundles can help students develop academic writing skills, enabling them to produce more coherent, precise, and authentic texts.

References

Biber, D., Conrad, S., & Cortes, V. (2004). If you look at …: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371-405. https://doi.org/10.1093/applin/25.3.371

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Pearson Education.

Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23(4), 397-423. https://doi.org/10.1016/j.esp.2003.12.001

Davies, M. (2009). The 385+ million word Corpus of Contemporary American English (COCA): Design, architecture, and linguistic insights. International Journal of Corpus Linguistics, 14(2), 159-190. https://doi.org/10.1075/ijcl.14.2.02dav

Giyosiddinovna, R. G. (2025). Linguistic aspects of automatic text alignment in parallel corpora. Western European Journal of Linguistics and Education, 3(05), 146-150.

Giyosiddinovna, R. G. (2025). The impact of gamification on vocabulary retention and student motivation. Ilm fan taraqqiyotida raqamli iqtisodiyot va zamonaviy ta’limning o‘rni hamda rivojlanish omillari, 6(1), 64-69.

Radjabova, G. G. (2021). The implementation of spoken corpora in creating teaching materials. International Journal on Integrated Education, 4(5), 349-354.

Radjabova, G. G. (2022). Methodological characteristics of corpus technologies in teaching foreign language. International Journal on Integrated Education, 5(1), 157-163. https://doi.org/10.31149/ijie.v5i1.2645

Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4-21. https://doi.org/10.1016/j.esp.2007.06.001

Radjabova, G. (2023). Corpus technologies in teaching academic writing. Foreign Languages in Uzbekistan, 1(48), 92-103.

Radjabova, G. G. (2024). Adjusting the perspective of corpus linguistics: Bridging research and the classroom. American Journal of Modern World Sciences, 1(5), 324-332.

Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press.

Radjabova, G. (2025). Authenticity as a significant feature of the corpus-based DDL approach for improving students’ writing competence. Lingvospektr, 3(1), 636-640.

Author biography

Gulnoza Radjabova,
Uzbekistan State World Languages University

PhD, Associate Professor

How to cite

Radjabova, G. (2025). A corpus-based study of the functioning of lexical bundles (chunks) in English-language scientific discourse. Lingvospektr, 9(1), 133–138. Retrieved from https://lingvospektr.uz/index.php/lngsp/article/view/1037