Introduction to the British National Corpus (BNC)
Author: admin   Source: original article   Updated: 2011-11-16   Entered by: admin   Editor: admin

 

How the BNC was created

The BNC project was carried out, and is managed, by the BNC Consortium, an industrial/academic consortium led by Oxford University Press. Its other members are the dictionary publishers Addison-Wesley Longman and Larousse Kingfisher Chambers, together with academic research centres at Oxford University Computing Services (OUCS), the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University, and the British Library's Research and Innovation Centre. The project was funded by the commercial partners, the Science and Engineering Research Council (now EPSRC) and the UK government's Department of Trade and Industry under the Joint Framework for Information Technology (JFIT) programme. Additional support was provided by the British Library and the British Academy.

Creation process in brief

Creation of the corpus began with a careful planning stage in which the design principles were drawn up. These principles included the selection criteria used as the basis for collecting the texts (the selection criteria for the written and spoken parts of the corpus are described in a separate section).

Once a suitable text had been identified and permission to use it obtained, the text was converted to machine-readable form by one of the commercial partners (OUP, Longman or Chambers). The resulting text was then converted to the project's standard encoding format at OUCS, where its accuracy and internal consistency were also validated. Next, the text was passed to UCREL, where word-class tags were added automatically, and then returned to OUCS for documentation and accession into the corpus. Each stage of processing was recorded in a database maintained at OUCS.
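The hand-off sequence described above can be sketched as a simple pipeline that records every stage in a tracking table. This is a minimal illustration only, assuming an in-memory SQLite table as a stand-in for the OUCS database; the stage names, the `log` table layout and the text ID `T001` are hypothetical and do not come from the actual BNC project.

```python
import sqlite3

# Hypothetical stage names paraphrasing the workflow described in the text;
# they are not identifiers from any real BNC system.
STAGES = [
    ("convert_to_machine_readable", "commercial partner"),
    ("encode_and_validate", "OUCS"),
    ("add_word_class_tags", "UCREL"),
    ("document_and_accession", "OUCS"),
]


def open_tracking_db():
    """Create an in-memory stand-in for the tracking database kept at OUCS."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE log (text_id TEXT, step INTEGER, stage TEXT, site TEXT)")
    return db


def process_text(db, text_id, permission_granted):
    """Pass one text through every stage in order, logging each hand-off."""
    if not permission_granted:
        # Texts enter the pipeline only after permission has been obtained.
        raise ValueError(f"no permission to use {text_id}")
    for step, (stage, site) in enumerate(STAGES, start=1):
        # A real pipeline would transform the text here; this sketch only
        # records that the stage took place and where.
        db.execute("INSERT INTO log VALUES (?, ?, ?, ?)", (text_id, step, stage, site))
    db.commit()


def history(db, text_id):
    """Return the recorded (stage, site) pairs for a text, in processing order."""
    rows = db.execute(
        "SELECT stage, site FROM log WHERE text_id = ? ORDER BY step", (text_id,)
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = open_tracking_db()
    process_text(db, "T001", permission_granted=True)
    for stage, site in history(db, "T001"):
        print(f"{stage:30} {site}")
```

Keeping the log separate from the texts themselves mirrors the point made above: the database answers "where has this text been?" without anyone having to inspect the text.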

Work on building the corpus began in 1991 and was completed in 1994. The first general release of the corpus, for European researchers, was announced in February 1995.

After the first edition of the BNC was completed, a phase of tagging improvement was undertaken at Lancaster University with funding from the Engineering and Physical Sciences Research Council (Research Grant No. GR/F 99847). This tagging enhancement project was led by Geoffrey Leech, Roger Garside and Tony McEnery. For this second version of the corpus, known as the BNC World Edition, the bibliographic and contextual information in all the BNC headers was also corrected and validated. BNC World was made available for worldwide distribution in 2001.

In response to user feedback, the original SGML version of the corpus was later converted into XML. Additional markup for lemmas and simplified word-class annotation was added, the treatment of multi-word units was improved, and minor errors and inconsistencies were corrected. The BNC XML Edition was released in 2007. Two subsets of the corpus have been published separately: the BNC Sampler and the BNCBaby.
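The lemma and simplified word-class markup mentioned above can be read with a standard XML parser. The sketch below assumes the BNC XML convention of a `<w>` element carrying `c5` (CLAWS C5 tag), `hw` (headword, i.e. lemma) and `pos` (simplified word class) attributes, with `<mw>` wrapping a multi-word unit and `<c>` marking punctuation. The sample sentence and its tags are hand-written for illustration, and the helper names `words` and `multiword_units` are hypothetical; check the attribute names against the official BNC XML documentation before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modelled on BNC XML Edition markup (illustrative values).
SAMPLE = """<s n="1"><w c5="NN2" hw="tag" pos="SUBST">Tags </w><w c5="VBD" hw="be" pos="VERB">were </w>
<w c5="VVN" hw="add" pos="VERB">added </w><mw c5="PRP"><w c5="PRP" hw="in" pos="PREP">in </w>
<w c5="NN1" hw="addition" pos="SUBST">addition </w><w c5="PRP" hw="to" pos="PREP">to </w></mw>
<w c5="NN2" hw="lemma" pos="SUBST">lemmas</w><c c5="PUN">.</c></s>"""


def words(sentence_xml):
    """Yield (form, lemma, simplified POS, C5 tag) for every <w> in a sentence."""
    root = ET.fromstring(sentence_xml)
    # iter("w") walks descendants in document order, including words inside <mw>.
    for w in root.iter("w"):
        # The word text keeps its trailing space; strip it for display.
        yield (w.text.strip(), w.get("hw"), w.get("pos"), w.get("c5"))


def multiword_units(sentence_xml):
    """Return (surface string, C5 tag) for each <mw> multi-word unit."""
    root = ET.fromstring(sentence_xml)
    units = []
    for mw in root.iter("mw"):
        surface = " ".join(w.text.strip() for w in mw.iter("w"))
        units.append((surface, mw.get("c5")))
    return units


if __name__ == "__main__":
    for form, lemma, pos, c5 in words(SAMPLE):
        print(f"{form:10} {lemma:10} {pos:6} {c5}")
    print(multiword_units(SAMPLE))
```

Because the lemma (`hw`) sits on each word, counting lemmas rather than surface forms (e.g. grouping "were" under "be") needs no separate lemmatiser.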

Web address:

http://www.scottishcorpus.ac.uk/cmsw/

More corpus addresses:

/Article/201111/2702.html

 
