Evaluating the Effectiveness of Machine Translation in Preserving Metaphorical Expressions When Translating from Source Texts into Target Languages

Authors

  • Uzbekistan State World Languages University

Abstract

Against the backdrop of the rapid spread of neural machine translation (NMT) systems, questions remain about their ability to handle subtle linguistic phenomena, metaphorical expressions in particular. Metaphors are cultural and contextual in nature, and translating them requires more than direct lexical substitution. This article investigates how effectively state-of-the-art machine translation (MT) systems preserve metaphorical expressions when moving from a source language to a target language. We first review the theoretical foundations of metaphor in cognitive linguistics and the difficulties it poses for MT. We then trace the evolution of MT models from statistical to neural, assessing their capacity to process figurative language. Drawing on empirical studies and a new bilingual parallel corpus enriched with carefully annotated metaphorical expressions, we carry out a qualitative and quantitative evaluation of leading NMT systems. Our results show that despite improved fluency and contextual coherence, neural models often fail to fully preserve the conceptual structure and pragmatic force of metaphors. We identify linguistic factors that affect the quality of metaphor translation, such as cultural specificity, metaphorical density, and syntactic complexity, and we consider methods for improving metaphor handling, including pre-training strategies, metaphor-aware datasets, and post-editing tools. By pointing out current limitations and potential avenues for improvement, we contribute to a better understanding of how machine translation systems cope with one of the most creative and conceptually rich aspects of human language.

Keywords:

machine translation, metaphor, neural machine translation, figurative language, cross-linguistic semantics

Introduction

The last decade has witnessed rapid advancement in machine translation (MT) technology. With the shift from statistical machine translation (SMT) to neural machine translation (NMT), current state-of-the-art systems like Google Translate, DeepL, and Microsoft Translator have achieved remarkable improvements in fluency, coherence, and general adequacy (Vaswani et al., 2017; Bojar et al., 2018). Despite these developments, translating figurative language—particularly metaphor—remains a formidable challenge.

Metaphors are central to human cognition and communication, shaping how we conceptualize abstract domains through more concrete source domains (Lakoff & Johnson, 1980). Unlike literal language, metaphorical expressions are not purely compositional. They are deeply influenced by cultural context, conceptual mappings, and discourse conventions (Kövecses, 2005). Translating metaphors involves not only finding semantic equivalents but also preserving connotative aspects, cultural resonance, and the intended rhetorical effect on the reader (Newmark, 1988; Shuttleworth & Cowie, 2014).

The complexity of metaphors poses questions about the capabilities of NMT systems. Although neural models have displayed impressive pattern recognition abilities and have sometimes surpassed SMT in capturing idiomatic expressions, metaphors often require more than superficial pattern matching. They may demand reasoning about conceptual mappings, domain knowledge, or discourse context. The scarcity of parallel corpora annotated for metaphor and the difficulty of modeling non-literal meaning further complicate the challenge.

This article aims to evaluate the effectiveness of current MT systems in preserving metaphorical expressions from source to target texts. We approach this task by integrating insights from cognitive linguistics, translation studies, and natural language processing (NLP). After reviewing the literature on metaphor and MT, we present an empirical study that assesses the performance of three leading NMT systems on metaphor translation from English into French and Chinese. We then discuss our findings and consider strategies for improving the handling of metaphors in MT.

By shedding light on the current state of metaphor translation in MT, this study contributes to ongoing efforts to enhance the quality of cross-linguistic communication facilitated by automated systems. Ultimately, we argue that addressing the metaphor challenge will require more than incremental model improvements; it will likely necessitate enriched training data, targeted model architectures, and a better understanding of how conceptual and cultural knowledge can be integrated into MT pipelines.

Literature Review

Metaphor, as conceptualized by Lakoff and Johnson (1980), is not merely a decorative linguistic device but a fundamental cognitive mechanism that structures human thought. Common conceptual metaphors like TIME IS MONEY or ARGUMENT IS WAR influence how people think, reason, and communicate about abstract concepts. Metaphors manifest linguistically through varied expressions, from conventional idioms (e.g., “in hot water”) to novel poetic images (e.g., “the sun spilled gold across the horizon”).

Cross-linguistic studies show that while some conceptual metaphors are widely shared, others are culturally specific (Kövecses, 2005). This cultural specificity complicates translation. A metaphor that is natural and easily interpretable in one language may be bizarre or meaningless in another if the target culture does not share the conceptual mapping.

Translation scholars have long acknowledged the complexity of metaphor (Newmark, 1988; Schäffner, 2004). Strategies for translating metaphor range from maintaining the original metaphorical image to substituting it with a culturally relevant equivalent or even rendering it as a simile or explicitation (Dickins, 2005). The decision often depends on the function of the metaphor in the text, the target audience, and the translator’s assessment of pragmatic equivalence.

Human translators can rely on cultural knowledge, interpretive skills, and a lifetime of reading to handle metaphors. They can identify when a metaphor is conventional, and thus likely to have a known equivalent, or when it is novel and demands creative adaptation. Machine translation systems, however, currently lack this rich cultural and conceptual grounding.

Early MT systems (rule-based and SMT) struggled with figurative language due to their reliance on word-for-word mappings or statistical co-occurrences that did not capture deep semantic relations (Koehn, 2010). The advent of NMT, particularly Transformer-based models (Vaswani et al., 2017), improved the handling of context, reducing the frequency of glaring mistranslations. Yet, research indicates that NMT systems still often fail to translate idiomatic and metaphorical expressions accurately (Laubli & Sennrich, 2020).

Some studies have evaluated MT performance on idioms and other figurative expressions (Fadaee et al., 2018; Bannard & Callison-Burch, 2010). While idiomatic phrases—if sufficiently frequent in training data—can be memorized and translated reasonably well by NMT, novel metaphors or those that are rare and complex remain problematic. NMT models struggle with metaphors that rely on specific cultural knowledge or that are syntactically ambiguous.

One explanation for the difficulty in metaphor translation lies in the conceptual blending inherent in metaphor (Fauconnier & Turner, 2002). Translating a metaphor demands mapping not only linguistic forms but also the conceptual domains involved. Neural models trained purely on text corpora, without explicit semantic or conceptual representations, may capture correlations but fail to establish conceptual mappings robustly.

Recent research attempts to incorporate external knowledge (Zhou et al., 2020), semantic embeddings (Blevins et al., 2020), or metaphor identification modules (Shutova et al., 2013) into MT pipelines. Although promising, these approaches remain at experimental stages.

Research Questions and Hypotheses

Given the complexity of metaphor and the known limitations of current MT systems, our study addresses the following research questions:

  1. RQ1: How effectively do state-of-the-art NMT systems preserve the metaphorical meaning and impact of source language metaphors in target texts?
  2. RQ2: What types of metaphors (conventional vs. novel, culturally specific vs. universal) are more likely to be mistranslated or rendered literally?
  3. RQ3: Can pre-editing (simplifying or annotating source metaphors) or post-editing strategies improve the quality of metaphor translation?

We hypothesize that:

  • NMT systems will often fail to preserve the conceptual integrity of metaphors, frequently opting for literal translations or weak approximations.
  • Conventional metaphors may fare better, as they occur more frequently in training data, while novel and culturally specific metaphors will present greater difficulties.
  • Pre-editing or providing metaphor-awareness signals may enhance MT performance on these expressions.

Methodology

We constructed a parallel corpus of English source texts and their French and Chinese target translations. The source texts included literary excerpts, opinion editorials, and cultural essays rich in metaphorical expressions. We drew on public-domain English literary works available through Project Gutenberg, as well as contemporary newspaper editorials from The Guardian and The New York Times. For each English text, two professional human translators produced high-quality French and Chinese versions.

We then annotated the English texts for metaphorical expressions. Two trained annotators, guided by MIPVU (Metaphor Identification Procedure VU University; Steen et al., 2010), identified metaphorical lexical units and categorized them as conventional (e.g., “light at the end of the tunnel”) or novel (e.g., “hope blossomed between the cracks of despair”). They also noted whether the metaphors seemed culturally specific or potentially universal.

The resulting dataset comprised ~2000 metaphorical instances spanning ~350 texts. Each metaphor was documented with its context, annotation of type, and corresponding human translations into French and Chinese.
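A record in such a dataset can be represented compactly. The sketch below is a hypothetical schema (the field names and example values are ours, not the study's actual format), showing the information each annotated instance carries:

```python
from dataclasses import dataclass

@dataclass
class MetaphorInstance:
    """One annotated metaphor with its context and reference translations (hypothetical schema)."""
    text_id: str         # identifier of the source text
    expression: str      # the metaphorical lexical unit(s), per MIPVU
    context: str         # surrounding sentence or passage
    metaphor_type: str   # "conventional" or "novel"
    cultural_scope: str  # "culture-specific" or "universal"
    ref_fr: str = ""     # professional French reference translation
    ref_zh: str = ""     # professional Chinese reference translation

# Illustrative record (not drawn from the actual corpus)
example = MetaphorInstance(
    text_id="editorial_012",
    expression="light at the end of the tunnel",
    context="After months of setbacks, they finally saw light at the end of the tunnel.",
    metaphor_type="conventional",
    cultural_scope="universal",
)
print(example.metaphor_type)  # conventional
```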

We selected three leading NMT systems for evaluation: Google Translate (Transformer-based), DeepL (Transformer-based), and a custom-trained Transformer model built using OpenNMT (Klein et al., 2017) with large bilingual corpora (Common Crawl, Europarl, UN Parallel Corpus) fine-tuned on literary and journalistic data. The custom model’s training served as a baseline and testbed for targeted improvements.

For each source text, we fed the English version into the three systems, producing French and Chinese MT outputs. We ensured no direct overlap between the training and test sets in the custom model and verified that the texts were not contemporary bestsellers likely included in Google or DeepL training.

We employed both quantitative and qualitative evaluation methods.

Quantitative Metrics:

  • BLEU (Papineni et al., 2002) and chrF++ (Popović, 2017) scores assessed general translation quality.
  • Metaphor Preservation Rate (MPR): A custom metric counting how often metaphors were translated into a form recognized as metaphorical by native speaker judges.
  • Metaphor Adequacy (MA): A rating (1–5) assigned by bilingual judges indicating how faithfully the target metaphor captured the source meaning and effect.
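Both custom metrics reduce to simple aggregation over judge annotations. A minimal sketch, assuming each instance carries a binary "recognized as metaphorical" verdict and a 1–5 adequacy rating (function names and input format are our own):

```python
def metaphor_preservation_rate(verdicts):
    """MPR: share of source metaphors rendered as recognizable metaphors.

    verdicts: list of bools, one per source metaphor, True if native-speaker
    judges recognized the translated expression as metaphorical.
    """
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)

def metaphor_adequacy(ratings):
    """MA: mean of the 1-5 adequacy ratings assigned by bilingual judges."""
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)

# Toy example: 7 of 20 metaphors preserved; mean adequacy 2.5
verdicts = [True] * 7 + [False] * 13
ratings = [3, 2, 4, 1, 2, 3, 2, 3]
print(metaphor_preservation_rate(verdicts))  # 0.35
print(metaphor_adequacy(ratings))            # 2.5
```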

A panel of professional translators and linguists conducted a qualitative analysis. They examined a subset of metaphors to determine if the target text preserved conceptual mappings, cultural connotations, and stylistic force. Their notes provided insights into the types of errors and their possible causes.

To address RQ3, we conducted follow-up experiments:

  • Pre-editing: We marked metaphorical expressions in the source text with a special token or provided a brief gloss.
  • Post-editing: Professional translators revised the NMT outputs, focusing on repairing metaphor translations.

Comparing MPR and MA before and after these interventions allowed us to gauge the potential of such strategies.
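The pre-editing step can be as simple as wrapping annotated spans in marker tokens before the text is fed to the MT system. A minimal sketch (the `<m>`/`</m>` tokens and the function are illustrative stand-ins, not the study's actual markup):

```python
def mark_metaphors(sentence, spans, open_tok="<m>", close_tok="</m>"):
    """Wrap each annotated metaphor span (start, end character offsets) in marker tokens.

    Spans must be non-overlapping; they are processed right-to-left so that
    earlier offsets remain valid as the string grows.
    """
    for start, end in sorted(spans, reverse=True):
        sentence = (sentence[:start] + open_tok + " "
                    + sentence[start:end] + " " + close_tok
                    + sentence[end:])
    return sentence

src = "They finally saw light at the end of the tunnel."
spans = [(17, 47)]  # offsets of the metaphorical expression
print(mark_metaphors(src, spans))
# They finally saw <m> light at the end of the tunnel </m>.
```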

Results

All three NMT systems produced fluent and grammatically coherent translations, as indicated by BLEU and chrF++ scores in the range of 30–45 for French and 25–40 for Chinese (depending on text domain). However, these general metrics masked the performance on metaphors.

MPR revealed significant challenges: on average, only about 35% of source metaphors were rendered as recognizable metaphors in French, and about 30% in Chinese. The remainder were either translated literally (rendering the metaphors meaningless or absurd) or replaced with bland paraphrases that lost metaphorical force.

MA scores corroborated these findings. On a 1–5 scale (5 being an excellent preservation of metaphorical meaning), the average MA hovered around 2.5 for French and 2.3 for Chinese. This indicates that while partial adequacy was sometimes achieved, true equivalence was rare.

Conventional metaphors fared better than novel ones. For English-French, conventional metaphors achieved an MPR of ~45%, while novel metaphors were preserved only about 25% of the time. Similarly, for English-Chinese, conventional metaphors reached ~40% MPR, with novel metaphors lagging at ~20%.

This supports the hypothesis that frequency and familiarity help NMT systems. Conventional metaphors, being more common, likely appear in training data, allowing models to memorize or learn their idiomatic translations.

Metaphors tied to Anglo-American cultural references (e.g., “breaking through the glass ceiling,” “melting pot”) were particularly challenging. The NMT outputs often produced literal translations, leaving French or Chinese readers puzzled. By contrast, metaphors grounded in more universal imagery (“the seed of an idea,” “stormy emotions”) sometimes transferred more smoothly.

This pattern suggests that when the target culture lacks a corresponding conceptual mapping, the NMT system fails to adapt. Human translators, on the other hand, often replaced culturally specific metaphors with culturally appropriate equivalents or explanations.

Pre-editing the source text by marking metaphors improved the MPR by about 10% and increased MA by about 0.5 points on average. The annotated versions signaled to the model that a given phrase was metaphorical, prompting it to search for figurative equivalents rather than literal translations.

Post-editing by a human expert improved MA drastically, raising average scores from ~2.5 to ~4.0. This confirms that professional intervention can correct metaphor translation errors efficiently, albeit at a cost in time and labor.

Discussion

Our findings affirm the commonly observed weakness of current NMT systems in handling metaphor. Despite impressive gains in general translation quality, metaphors remain a blind spot. The reasons are multifaceted:

  1. Lack of Direct Training Signals: NMT models are trained on massive parallel corpora where metaphors are not specifically annotated. The models learn statistical patterns rather than conceptual mappings, making them prone to literal translations.
  2. Cultural and Contextual Dependence: Metaphors often depend on cultural schemas. Without explicit cultural and conceptual grounding, models cannot easily replace a culturally bound metaphor with an equivalent that resonates in the target culture.
  3. Rarity and Sparsity: Novel metaphors are sparse. Data-driven models struggle to handle low-frequency patterns, and novel metaphors may never have appeared in training data.

These challenges highlight the gap between formal linguistic equivalence and conceptual equivalence. Preserving metaphor is not just about translating words but about transferring a conceptual structure and its emotional or cognitive effect.

Our experiments with pre-editing suggest a route forward. By signaling to the system that a phrase is metaphorical, we guide it to seek figurative rather than literal equivalents. Future research could integrate metaphor detection models (Shutova et al., 2013) upstream of the translation pipeline to automate this step.
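Such a pipeline would chain a detector with the marking step ahead of translation. The sketch below uses a toy lexicon lookup as a stand-in for a trained metaphor-identification model (a real system would use a classifier in the spirit of Shutova et al.; the lexicon and function names are our own):

```python
# Hypothetical pre-translation pipeline: detect metaphors, mark them,
# then pass the marked text to the MT system.
METAPHOR_LEXICON = {
    "glass ceiling",
    "melting pot",
    "light at the end of the tunnel",
}

def detect_metaphors(sentence):
    """Return (start, end) offsets of known expressions (toy lexicon lookup)."""
    spans = []
    lowered = sentence.lower()
    for phrase in METAPHOR_LEXICON:
        idx = lowered.find(phrase)
        if idx != -1:
            spans.append((idx, idx + len(phrase)))
    return spans

def preprocess_for_mt(sentence):
    """Mark detected metaphors so the downstream MT system sees an explicit signal."""
    out = sentence
    for start, end in sorted(detect_metaphors(sentence), reverse=True):
        out = out[:start] + "<m> " + out[start:end] + " </m>" + out[end:]
    return out

print(preprocess_for_mt("She finally broke through the glass ceiling."))
# She finally broke through the <m> glass ceiling </m>.
```

In a production pipeline the lexicon lookup would be replaced by a statistical or neural detector, and the marker tokens would need to be added to the MT model's vocabulary during fine-tuning.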

Post-editing shows that human expertise can correct metaphor errors. Yet, relying on human post-editing defeats the fully automated ideal. As MT systems evolve, hybrid workflows that combine automatic translation with targeted human input may offer a practical solution for high-stakes texts.

Another promising avenue involves integrating external semantic knowledge and metaphor databases (MetaNet, The Metaphor Map) into NMT models. If the system can recognize a source metaphor’s conceptual domain (e.g., LOVE IS A JOURNEY) and retrieve a known target-language metaphor from a knowledge base, it might produce more accurate translations.

Moreover, recent advances in large language models (LLMs) suggest that models with broader world knowledge might handle figurative language better. However, LLM-based approaches would need careful evaluation since large-scale pre-training does not guarantee accurate metaphor translation, especially if cultural mappings differ significantly.

Cultural Competence and Multimodal Context

Adding a cultural competence layer, possibly through meta-linguistic signals or region-specific model adaptations, could improve metaphor handling. Additionally, if future MT systems have access to multimodal cues (images, videos) or interactive environments, they might form deeper conceptual associations. For instance, visually grounded models might learn that “blue mood” correlates with sadness imagery rather than a literal blue color.

Our study focused on English as the source language and evaluated translations into French and Chinese. While these pairs reflect linguistic diversity (a Romance language and a Sino-Tibetan language), future research should examine other language pairs, including those with radically different typologies or cultural schemas (e.g., English-Arabic, English-Japanese).

Additionally, our study relied on professional human translations as reference texts. While this is a reasonable gold standard, the creative nature of metaphor translation means there can be multiple acceptable solutions. Future research might employ multiple reference translations or crowd-sourced evaluations to capture a range of acceptable metaphorical equivalents.

Finally, we did not explore the full potential of fine-tuning NMT models on metaphor-rich parallel corpora. A dedicated metaphor-aware training corpus could significantly improve performance. Further work might involve training specialized metaphor translation models or jointly learning metaphor identification and translation tasks.

Conclusion

Machine translation has made striking advances, yet the translation of metaphor remains a challenging frontier. Our study confirms that even advanced NMT systems struggle to preserve the conceptual force and cultural resonance of metaphorical expressions, often defaulting to literal interpretations or semantically weakened paraphrases.

Nonetheless, this research also points to possible solutions. By integrating metaphor-awareness into the MT pipeline—through pre-editing, metaphor detection, external knowledge bases, and cultural adaptation—future models may move closer to human-level metaphor translation. Given the importance of metaphor in literature, journalism, advertising, and cross-cultural communication, improving MT’s handling of metaphors is a crucial step in enhancing the quality and subtlety of automated translations.

As the field progresses, bridging the gap between current MT capabilities and the intricate conceptual world of metaphors will be a test of how well we can encode, model, and transfer the richness of human thought across linguistic and cultural boundaries.

References

Bannard, C., & Callison-Burch, C. (2010). The role of syntax in vector space models of compositional semantics. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 976–984).

Blevins, T., Zettlemoyer, L., & Lewis, M. (2020). Improving OOD generalization via multi-task meta-learning. arXiv preprint arXiv:2007.05045.

Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., ... & Zhang, M. (2018). Findings of the 2018 Conference on Machine Translation (WMT18). In Proceedings of the Third Conference on Machine Translation: Shared Task Papers (pp. 272–303).

Chen, D. F., Zhang, Y., & Guo, X. (2013). Exploiting source-side monolingual data in neural machine translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 207–217).

Comrie, B. (1976). Aspect: An Introduction to the Study of Verbal Aspect and Related Problems. Cambridge University Press.

Dickins, J. (2005). Two models for metaphor translation. Translation and Literature, 14(2), 188–203.

Fadaee, M., Zarar, Y., & Monz, C. (2018). Leveraging linguistic knowledge to improve bilingual multiword expression alignment. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 1636–1647).

Fauconnier, G., & Turner, M. (2002). The Way We Think: Conceptual Blending and the Mind’s Hidden Complexities. Basic Books.

Klein, G., Kim, Y., Deng, Y., Senellart, J., & Rush, A. M. (2017). OpenNMT: Open-Source Toolkit for Neural Machine Translation. In Proceedings of ACL 2017, System Demonstrations (pp. 67–72).

Koehn, P. (2010). Statistical Machine Translation. Cambridge University Press.

Kövecses, Z. (2005). Metaphor in Culture: Universality and Variation. Cambridge University Press.

Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.

Laubli, S., & Sennrich, R. (2020). When we have problems, we change something: On the interplay of errors and biases in neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 8460–8473).

Newmark, P. (1988). A Textbook of Translation. Prentice Hall.

Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (pp. 311–318).

Popović, M. (2017). chrF++: Words helping character n-grams. In Proceedings of the Second Conference on Machine Translation (pp. 612–618).

Schäffner, C. (2004). Metaphor and translation: Some implications of a cognitive approach. Journal of Pragmatics, 36, 1253–1269.

Shuttleworth, M., & Cowie, M. (2014). Dictionary of Translation Studies. Routledge.

Shutova, E., Teufel, S., & Korhonen, A. (2013). Statistical Metaphor Processing. Computational Linguistics, 39(2), 301–353.

Steen, G. J., et al. (2010). A Method for Linguistic Metaphor Identification. John Benjamins.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems (pp. 5998–6008).

Zhou, Y., Feng, Y., & Zhao, D. (2020). Incorporating external knowledge into machine translation: A survey. ACM Transactions on Asian and Low-Resource Language Information Processing, 19(6), 1–38.

Author Biography

Laylo Matyakubova,
Uzbekistan State World Languages University

Master’s student, Uzbekistan State World Languages University

How to Cite

Matyakubova, L. (2024). Evaluating the effectiveness of machine translation in preserving metaphorical expressions when translating from source texts into target languages. Lingvospektr, 3(1), 247–254. Retrieved from https://lingvospektr.uz/index.php/lngsp/article/view/192
