Welcome to our corpus linguistics page. While this is just an introductory look at the world of corpus linguistics, I hope it will give you some tools to consider using as your interests and time allow.
AntConc, developed by Laurence Anthony just down the street from us at Waseda, is a very useful set of tools. Here is his webpage, on which you'll find lots of information in addition to the various types of software that he has developed. For our purposes in this course, here is the AntConc webpage.
Lest this all seem beyond comprehension, Dr. Anthony has provided a series of tutorials available on YouTube.
AntConc 3.4.0 Tutorial 1: Getting Started
AntConc 3.4.0 Tutorial 2: Concordance Tool - Basic Features
I will readily admit that the Keyword List tool was a mystery the first time that I tried it. Thankfully, this explains it nicely.
I'll leave it up to you to search for more helpful tutorials.
Corpus @ Brigham Young University
Here you'll find a variety of corpora covering different languages, genres, and areas. As you'll see on the webpage, Mark Davies is the eminent gentleman behind this massive undertaking.
As you might have suspected, YouTube again has several very informative tutorials.
The Compleat Lexical Tutor
Another option is Tom Cobb's website called The Compleat Lexical Tutor.
Building a Corpus
At some point you might find yourself in need of a specialized corpus. Of course, you could simply compile a very long text file all by yourself, which is of no great difficulty. Should you need to convert a file from, say, PDF to plain text, one option is a free conversion website called Zamzar. A second option is to simply use the 'save as' routine and choose plain text as the output format.
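If you already have your texts as separate plain-text files, the "very long text file" can be put together with a few lines of Python. This is only a minimal sketch: the function name build_corpus and the assumptions that your files end in .txt and are saved as UTF-8 are mine, not part of any of the tools mentioned on this page.

```python
from pathlib import Path


def build_corpus(source_dir, output_file):
    """Join every .txt file in source_dir into one corpus file.

    Returns the number of files that were combined.
    Assumes the files are plain text encoded as UTF-8.
    """
    source = Path(source_dir)
    output = Path(output_file)
    texts = []
    # Sort the file names so the corpus is built in a predictable order.
    for path in sorted(source.glob("*.txt")):
        # Skip the output file itself if it lives in the same folder.
        if path.resolve() == output.resolve():
            continue
        texts.append(path.read_text(encoding="utf-8"))
    # Put a line break between texts so words do not run together.
    output.write_text("\n".join(texts), encoding="utf-8")
    return len(texts)
```

Calling build_corpus("my_texts", "my_corpus.txt") (both names are placeholders) would leave you with a single file that can be loaded into a concordancer such as AntConc.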
Statistics and Such Cruel Things
A quick and dirty statistics note here: you will encounter the term log-likelihood when assessing whether the difference in a word's frequency between your target text and a given corpus is statistically significant. As usual, the two significance levels of note in our field are p < .05 and p < .01. When pondering log-likelihood results, values in excess of 6.63 indicate statistically significant results at p < .01 and those in excess of 3.84 hit the p < .05 mark. (These are the critical values of the chi-square distribution with one degree of freedom.)
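For the curious, here is a rough sketch of how a log-likelihood value can be worked out by hand for a single word. It assumes the two-cell form of the calculation that is commonly used for corpus comparison, in which the observed frequency in each corpus is compared with the frequency expected if the word were spread evenly across both. The function names and the example numbers are my own illustration, and software such as AntConc may calculate its scores somewhat differently.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood for one word across a target text and a reference corpus.

    freq_* are the word's raw counts; size_* are total word counts.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected counts if the word were equally common in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        # A count of zero contributes nothing to the sum.
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def significance(ll):
    """Map a log-likelihood value onto the two thresholds given above."""
    if ll > 6.63:
        return "p < .01"
    if ll > 3.84:
        return "p < .05"
    return "not significant"


# Hypothetical example: a word occurs 50 times in a 10,000-word target
# text and 20 times in a 10,000-word reference corpus.
example = log_likelihood(50, 10000, 20, 10000)
print(round(example, 2), significance(example))
```

With these made-up numbers the value comes out at roughly 13.28, comfortably above the 6.63 threshold.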