Compiling English Learner Corpus: Lexical Richness and Peculiarities of Word Usage in Georgian Learner English
Date Issued
2024
Author(s)
Makhatadze, Marine
Advisor
Rusieshvili, Manana
Abstract
The title of our work is "Compiling English Learner Corpus: Lexical Richness and Peculiarities of Word Usage in Georgian Learner English."
To study lexical richness, word usage, and norms of linguistic behavior, we created a corpus of Georgian learners of English (GLEAN) consisting of up to nine million words. The corpus comprises written English texts of different genres produced by Georgian learners of English.
The present thesis studies how Georgian learners of English realize the language, as well as the consistent, multi-purpose (mostly lexicographic) collection and processing of linguistic data. Accordingly, the research goals of the work are:
• To classify collocations structurally using the frequency-based and the significance-oriented approaches, and to systematize a functional taxonomy of three- and four-word lexical bundles;
• Through a contrastive interlanguage analysis: a) to demonstrate the peculiarities of the use of phraseological units and collocations (adverb + adjective, verb + noun, etc.) in the ICNALE, GLEAN, and MICUSP corpora; b) to determine, based on statistical measures, which phraseological units are overused or underused in the writing of Georgian students; c) to present effective lexicographic ways of incorporating learner corpus data into the microstructure of an English-Georgian dictionary, for example by modifying usage notes and including them in dictionary entries;
• To present dictionary entries for 30 multi-word lexical bundles based on the learner corpus we created. This material can become part of a specialized academic writing (EAP) dictionary.
The empirical base comprises the written material of 65 Georgian learners of English. The data cover a varied repertoire of genres in the foreign language. The learner corpus contains 34,527 texts and 9,812,931 tokens.
In Chapter I - "Learner Corpus as a Resource for Interdisciplinary Research" - we discuss the foundations of learner corpus research and its methodology, covering both theoretical and practical aspects.
In Chapter II - "Principles of Creating a Corpus of Georgian Learners of English" - we discuss the criteria for selecting the texts included in the corpus, the genre classification, and the details of data collection, as well as the reasons and criteria for selecting the international corpora used for contrastive analysis. In the learner corpus, we identified four genre categories (essay, academic, journalistic, conversational) and their subcategories. The essay genre includes argumentative, narrative, descriptive, and free-composition essays. The academic genre includes research articles in the fields of linguistics and literature. The journalistic genre combines news chronicles, reports, and newspaper articles on political and non-political issues, and the conversational genre combines blogs and correspondence (informal and semi-formal emails).
In Chapter III - "Analysis of the corpus of Georgian learners of the English language (GLEAN)" - we carry out a structural-semantic analysis of two-component expressions, paying particular attention to booster and maximizer collocations (very, utterly, extremely, etc. + adjective). A functional discourse analysis of three- and four-component lexical phrases is also presented.
Chapter IV - "Learning Corpus of Georgian Learners of English Language (GLEAN) as a Source for Compiling Word-Articles of English-Georgian Specialized (EAP) Dictionary" - gives a macro- and microstructural description of dictionary entries for 30 multi-component lexical phrases based on the learner corpus. Each element of an entry follows, with minor modifications, the principles for representing head-phrases in a dictionary proposed by D. Sipman.
Degree Name
PhD in Philology
Degree Discipline
Philology
File(s)
Name
ინგლისური ენის სასწავლო კორპუსის შექმნა ლექსიკური სიმდიდრე და სიტყვათხმარების თავისებურებები ინგლისური ენის ქართველ შემსწავლელებში - მახათაძე, მარინე.pdf
Size
1.66 MB
Format
Adobe PDF
Checksum
(MD5):5a4ff175794319fc2f7726ba30887f49