Portal:Computational linguistics

From Wikiversity

Welcome to the Wikiversity Center for Computational Linguistics.

Schools:

Topic:Lexicography redirects to here.

Summary

The Center for Computational Linguistics is a Wikiversity content development project where participants create, organize and develop learning resources for Computational Linguistics. This goal spans the Schools of Computer Science and Linguistics. It also relates to Translation, Multilingual Studies and other topics.

Specific Goals

This content development project is concerned with learning activities for Computational linguistics. We need learning activities that will help learners:

  • Become familiar with the basic terminology of the field.
  • Get to know existing work in this field related to MediaWiki: conjugators, bots, multilingual website approaches such as WiktionaryZ, ...
  • Practice with software tools, such as the Natural Language Toolkit (NLTK) and Apertium, on their own computers.
  • Propose, discuss or even develop new applications that can be used with MediaWiki, especially to improve projects such as Wiktionary, language learning methodologies in Wikiversity or language learning books in Wikibooks.

Concepts to learn include: /concepts

Learning materials


Learning materials and learning projects are located in the main Wikiversity namespace. Simply make a link to the name of the lesson (lessons are independent pages in the main namespace) and start writing!

You should also read about the Wikiversity:Learning model. Lessons should center on learning activities for Wikiversity participants. Learning materials and learning projects can be used by multiple projects. Cooperate with other departments that use the same learning resource.

Lessons

Brainstormed list of possible lessons:

  • Lesson 1: What does Computational Linguistics mean?
  • Lesson 2: Computational Morphology
    • The conjugator based on templates: An example in Wiktionary
  • Lesson 3: The corpus (corpus linguistics)
    • What is it? What can it be used for?
  • Lesson 4: The parser
  • Lesson 5: OmegaWiki as a corpus or lexicon
  • Lesson 6: Audio interfaces and the relationship between sound and meaning
  • Lesson 7: Human/Machine interfaces and linguistics framework
  • Lesson 8: Language acquisition for youngsters and their machines
  • Lesson 9: Computational applications for foreign language learning
    • The multilingual platforms: user preference selection.
  • ...Lesson brainstorm continues...

Remember: All actual learning resources should be on pages in the main namespace (page names with no prefix).

Alternative

First course — Introduction
  • Introduction
    • Including Unix for Poets (how to mangle text)
  • Lexical analysis
    • Morphological analysis
      • Finite state automata and transducers
        • Tour of free-software packages (including at least SFST and lttoolbox)
        • Paradigms and lemma-paradigm pairs
      • Two-level morphology
      • POS tagging
        • HMMs
  • Syntactic analysis
    • Finite state grammars
  • Semantic analysis
    • Word sense disambiguation
  • Machine translation
    • Sub-fields: Direct, Transfer, Example-based, SMT
      • Practicals on creating MT systems for a given pair of languages within the RBMT/Transfer paradigm (using Apertium), and in the SMT paradigm (using GIZA++/Moses)
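As one possible starting point for the text-mangling exercises mentioned in the introduction, here is a minimal Python sketch of word-frequency counting. It is only an illustration: the function name `word_frequencies`, the sample sentence, and the choice to treat runs of ASCII letters as words are assumptions made for this example, not part of the course outline.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter of lowercase alphabetic tokens found in text.

    Tokenization here is deliberately crude (assumption): any run of
    the letters a-z counts as one word, everything else is a separator.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, used only to show the call.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# Print the most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

A real corpus would need a tokenizer suited to the language at hand; the regular expression above ignores accented letters, apostrophes and hyphens.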
Second course — Probabilistic methods

Activities

Readings

Each activity has a suggested associated background reading selection.

References

Additional helpful readings include:

Active participants

If you are an active participant, please re-sign here every six months to a year. You can see past participants here. Active participants in this learning group:

  1. --Copyleft 22:29, 22 June 2009 (UTC)
  2. -- ...

See also