VLOG - Auto-Tagging, the quest for Compounds!

In this video, I try to explain the concept of auto-tagging, which can be described as the task of automatically extracting terms from unstructured textual data.

Terminology mining, term extraction, term recognition, or glossary extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.

In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is essential to the language industry.

From Wikipedia - Terminology extraction

In this log, I speak about Auto-tagging:

  • Why we need it
  • What the challenge is
  • The different approaches: naive term frequency versus a global semantic web service
  • The key role of compounds (i.e., two-word tags)
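
To make the naive term-frequency approach and the role of compounds concrete, here is a minimal sketch in Python. It is not the method presented in the video: the function name `count_terms`, the tiny stop-word list, and the rule that a compound is any two adjacent non-stop-words are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def count_terms(text):
    """Count single-word terms and two-word compounds in a text.

    Returns a pair of Counters: (single words, compounds).
    """
    # Lowercase the text and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())

    # Single-word candidates: every token that is not a stop-word.
    singles = Counter(t for t in tokens if t not in STOP_WORDS)

    # Compound candidates: adjacent token pairs where neither is a stop-word.
    # Sentence boundaries are ignored, which is part of what makes this naive.
    compounds = Counter(
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first not in STOP_WORDS and second not in STOP_WORDS
    )
    return singles, compounds


if __name__ == "__main__":
    sample = "The semantic web needs tags. The semantic web is growing."
    singles, compounds = count_terms(sample)
    print(singles.most_common(3))
    print(compounds.most_common(3))
```

Counting only single words would split a term like "semantic web" into two unrelated tags, which is why the sketch tracks adjacent pairs separately. Frequency alone cannot tell a meaningful compound from a chance pairing, so a real system would need more than this.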

Tag clouds - what is at stake?

Echoing the ever-increasing popularity of tag clouds, the emerging domain of auto-tagging aims to populate these attractive visual components without requiring each page to be tagged by hand. Many approaches attempt this, but most of them do not address the real underlying technical challenges. To evaluate how tag clouds can deliver their full potential, we first have to analyze what is at stake for end users and understand why a tag cloud could make a real difference in the way they access information.

In this new domain where art meets technology, visualization meets data mining, and repetitive manual effort meets automation, I found it interesting to inspect the different components and their roles, and to figure out what is new and what is old, what is solved and what is not, what is possible and what is pure fantasy…

The tumultuous history of Dialog Systems

Chinese version - 中文

The idea of a Dialog System is probably as old as the field of computer science itself. It is hard to know whether Charles Babbage already thought about it in the 1820s and 1830s, when he designed his Difference Engine and then his Analytical Engine; but it is clear that Alan Turing set the definition of the ultimate Dialog System when he described the Turing Test in his 1950 paper Computing Machinery and Intelligence.
From Wikipedia -   The “standard interpretation” of the Turing Test, in which player C, the interrogator, is tasked with trying to determine which player - A or B - is a computer and which is a human. The interrogator is limited to only using the responses to written questions in order to make the determination.

Turing predicted that machines would eventually be able to pass the test and that 30% of human judges would be fooled in a five-minute test by the year 2000. In 1990, futurist Raymond Kurzweil moved that date to 2020, and in 2005 he revised it to 2029.

This last prediction seems to me as uncertain as any of the earlier ones, but many interesting Dialog Systems have already been developed and, thankfully, the market does not need the Turing Test to be passed before it starts adopting them.

How artificial intelligence will revolutionize the customer experience and empower marketing investments

I gave the keynote at the Webcom event in Montréal on May 13, 2009.

The main goal of this speech was to explain why technologies driven by Artificial Intelligence are going to be the key success factor in providing a positive customer experience online. The main reason I advanced is that properly satisfying the full variety of customer requests is far too complex a task to be managed with any traditional approach.

You can find the main content of this speech here: the presentation slides and the video of the keynote, as well as a short interview I gave the same day (unfortunately, the videos are in French).

A positive customer experience can be hard to reach...

Leveraging a Context for Information Access

“We are getting into semantics again. If we use words, there is a very grave danger they will be misinterpreted.”
H. R. Haldeman

The semantic web is getting more and more attention, but although the W3C is trying to set up a well-defined standard, its deployment is so far as speculative as that of Web 3.0 in general. On the other side, the search-engine industry steadily improves its ability to provide more comprehensive features that go well beyond simply matching search keywords against an index of terms. While both sides aim for the same result (one by restructuring the Internet, the other by building a structure over raw data), these are not the only two possible approaches. Others try, for example, to derive a context from the search queries and the navigation behavior in order to better identify the user's intent and therefore return more relevant results.
