Module Descriptor: School of Computer Science and Statistics
|Module Name||Advanced Computational Linguistics: Machine Learning Techniques in Machine Translation, Speech Recognition and Topic Modelling|
|Module Short Title||Advanced Computational Linguistics|
Lecture hours: 22
|Module Personnel||Dr Martin Emms|
The aim is to give a grounding in so-called unsupervised machine learning techniques, which are vital to many language-processing technologies including Machine Translation, Speech Recognition and Topic Modelling. Although they are studied here in these contexts, the techniques themselves are used much more widely, for example in data mining and machine vision.
Probability basics for collections of variables with discrete outcomes (e.g. which word, which topic): in particular, joint, marginal and conditional probabilities; the chain rule; relative frequencies as maximum likelihood estimators.
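To make these probability basics concrete, here is a minimal sketch in Python (the course itself uses C++) that estimates a joint distribution over hypothetical (topic, word) observations by relative frequency, derives the marginals and the conditional P(W | T) from it, and checks the chain rule P(T, W) = P(T) P(W | T). The toy data and the function names `joint_mle`, `marginal` and `conditional_word_given_topic` are illustrative assumptions, not course material.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy data: (topic, word) observations, invented for illustration.
observations = [
    ("sport", "goal"), ("sport", "goal"), ("sport", "match"),
    ("politics", "vote"), ("politics", "vote"), ("politics", "match"),
]

def joint_mle(pairs):
    """Relative-frequency (maximum likelihood) estimate of P(T, W)."""
    counts = Counter(pairs)
    n = len(pairs)
    return {pair: Fraction(c, n) for pair, c in counts.items()}

def marginal(joint, index):
    """Marginal of the variable at position `index`, summing out the other one."""
    result = {}
    for outcome, p in joint.items():
        v = outcome[index]
        result[v] = result.get(v, Fraction(0)) + p
    return result

def conditional_word_given_topic(joint):
    """P(W = w | T = t) = P(T = t, W = w) / P(T = t)."""
    p_topic = marginal(joint, 0)
    return {(t, w): p / p_topic[t] for (t, w), p in joint.items()}

joint = joint_mle(observations)
p_topic = marginal(joint, 0)   # P(T)
p_word = marginal(joint, 1)    # P(W)
p_w_given_t = conditional_word_given_topic(joint)

# Chain rule: P(T, W) = P(T) * P(W | T), checked exactly with Fraction arithmetic.
for (t, w), p in joint.items():
    assert p == p_topic[t] * p_w_given_t[(t, w)]

print(joint[("sport", "goal")])        # 1/3
print(p_word["match"])                 # 1/3
print(p_w_given_t[("sport", "goal")])  # 2/3
```

Using `Fraction` rather than `float` keeps the chain-rule check exact. Note that the conditional estimate reduces to the relative frequency count(t, w) / count(t), which is the maximum likelihood estimate of P(W | T) for discrete outcomes.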
|Recommended Reading List|
I will be providing notes, sometimes directing attention to particular chapters of the following books, as well as to possible online sources:
Kevin Murphy's book 'Machine Learning: A Probabilistic Perspective'
Russell and Norvig's book 'Artificial Intelligence: A Modern Approach'
Jurafsky and Martin's book 'Speech and Language Processing'
Philipp Koehn's book 'Statistical Machine Translation'
associated site: www.statmt.org/book
Manning and Schütze's book 'Foundations of Statistical Natural Language Processing'
note by Michael Collins on the IBM models: www.cs.columbia.edu/~cs4705/notes/ibm12.pdf
No prerequisites; however, to implement and experiment with the tools, students will need to be able to program in C++.
Initial exams: Examination 70%, Coursework 30%
Supplemental: 100% examination
|Academic Year of Data||2018/19|