Harnessing the Latest Corpus-based Approaches for Research

Seminar on Corpus: Promises, Challenges, and Research Directions

Date: 13 March 2017 (Monday)
Time: 10:00 am – 12:30 pm
Venue: Tai Ning Hall, 12/F, Block A, The Open University of Hong Kong
30 Good Shepherd Street, Homantin, Kowloon

Speaker: Professor Tony McEnery
Director of Research,
Economic and Social Research Council
Distinguished Professor,
Department of Linguistics and English Language,
Lancaster University

Professor Tony McEnery is Director of Research at the Economic and Social Research Council and Distinguished Professor of Linguistics and English Language at Lancaster University (United Kingdom). He has worked with scholars from a broad range of subjects, including accountancy, criminology, international relations, religious studies and sociology. He has also worked with an array of impact partners including British Telecom, the Department of Culture, Media and Sport, the Environment Agency, the Home Office, IBM and Research in Motion. He was also Dean of Arts and Social Sciences at Lancaster, and before that Director of Research at the Arts and Humanities Research Council.


What is new in corpus linguistics? How can what is new help us answer new research questions, or gain fresh insight into existing areas of investigation? In this talk, focusing on learner language, I will show how innovation in corpus linguistics is driven by two powerful forces: new data and new techniques.


New data is necessary because the research questions we have are sometimes poorly matched by the existing datasets we work with. Consider the following questions that we may, quite reasonably, ask of learner English:

•   Does cultural and linguistic background affect learner speech and, if so, how?
•   What impact may age have on learner production?
•   Is gender a linguistically important feature when exploring the speech of learners of English?
•   How does learner language production vary by task type?
•   Is learner language different when a learner is leading an interaction as opposed to being led through an interaction by a person who is proficient in the language?

The available learner corpora can answer these questions only partially, if at all. No new analytic technique will help here; we need corpora that allow us to investigate them. I will show how new data, in the form of a multi-million-word corpus being developed at Lancaster University with Trinity College London, may address these issues. By exploiting this large, orthographically transcribed corpus of learner speech, which is provided with plentiful relevant metadata, we can gain fresh insights into learner speech. I will demonstrate this by giving an overview of the latest findings from the Trinity Lancaster Corpus, reflecting on the tasks the speakers engaged in and the range of metadata available for those speakers. Drawing on that metadata both singly and in combination, the findings will show how it can help answer questions such as those outlined above.


Yet even with the right data, we may be unable to address a question, or to explore it fully, because the tools available to us do not support the kind of investigation we wish to carry out. In such cases we need new tools. In this talk I will show how one such tool, GraphColl, allows us to explore networks of meaning in learner language and so to investigate how learner language develops across levels of proficiency.


Corpus linguistics is about data and tools. To innovate and push the field forward, we need to focus on developing new tools and new datasets.


Enquiries: Ms Trace Chui (The Open University of Hong Kong)
Tel: 2768 5796
Email: tchui@ouhk.edu.hk
Co-organized by: The Open University of Hong Kong
Caritas Institute of Higher Education

This event was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/IIDS16/H01/16).
