VLOG - Auto-Tagging, the quest for Compounds!

In this video, I try to explain the concept of Auto-tagging: the task of automatically extracting terms from unstructured text.

Terminology mining, term extraction, term recognition, or glossary extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.

In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is essential to the language industry.

From Wikipedia - Terminology extraction

In this vlog, I talk about Auto-tagging:

  • Why we need it
  • What the challenge is
  • The different approaches: naive term frequency versus a global semantic web service
  • The key role of compounds (i.e., two-word tags)
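
To make the naive term-frequency approach and the role of compounds a bit more concrete, here is a minimal Python sketch. It is not the method shown in the video: the stopword list, the function names (tokenize, suggest_tags) and the rule that a compound is any pair of adjacent non-stopwords are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real tagger would need a fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def suggest_tags(text, top_n=5):
    """Return the top_n most frequent candidate tags (single words and compounds)."""
    tokens = tokenize(text)
    # Single-word candidates: every token that is not a stopword.
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Compound candidates (assumption): two adjacent tokens, neither a stopword.
    for first, second in zip(tokens, tokens[1:]):
        if first not in STOPWORDS and second not in STOPWORDS:
            counts[f"{first} {second}"] += 1
    return counts.most_common(top_n)

if __name__ == "__main__":
    sample = "The semantic web is growing. Semantic web services use the semantic web."
    for tag, freq in suggest_tags(sample, top_n=3):
        print(tag, freq)
```

On a text about the semantic web, the single words "semantic" and "web" each score well, but the compound "semantic web" is arguably the more useful tag, which is why counting word pairs matters. Note that this sketch ignores sentence boundaries and only counts raw frequency, so it will also pick up meaningless pairs on real data.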
