Intelligence is in your users, not in your data

homer-intelligence

Text mining has been around for a while now and has not delivered on its full promise. Unkept promises are more or less the rule in the great domain we call computer science, but this case is worth mentioning because of the extent of the failure. When the SQL language was invented, for instance, the idea was to make it a natural query language that anyone could use. It obviously failed to deliver on that promise, yet it became one of the most important everyday tools for the vast majority of programmers, and almost none of them could imagine working without it. With most domains related to artificial intelligence, and text mining is no exception, not only was the initial promise not delivered, but we still manage and exploit textual data pretty much the way we always did, without any habit of using text mining tools in our everyday programming. Regular expressions are still used far more commonly than any text mining function, even though they are complex in their format and limited in their scope. Text mining is still reserved for very important and very expensive projects, which pretty much never work. And when they do provide real value for a moment, they do not spread far enough to change the paradigm: if it is not structured data, it is not usable data.


Prediction series #1: Semantic Search by 2015

Being a big fan of science fiction, and especially of the great Isaac Asimov, I regularly feel, when reading his books, the urge to write novels as I used to do in my teens. However, lacking both the time and the skills required to produce anything of interest and quality for the time being, I decided to start writing prediction posts in my domain of expertise. Far from claiming that my predictions will come true in any way, I write them for the sake of the exercise and to stretch our imagination a little further than is commonly done in the news and in research studies. Also, it will be fun to check in a few years how wrong I was, both in missing the real achievements and in predicting pure fantasy.

I will start small in this first post of my prediction series, with Semantic Search (a field I claim to know quite well) and a fairly short period of time (5 years). However, one could argue that this is the most difficult way to start and to build any credibility if I happen to be totally wrong. How could anyone trust any of my other prediction posts if this one turns out to be a complete fiasco? Well, I don’t really have an answer for that, but I have to start somewhere, and if I turn out to be a very poor teller of the near future in the eyes of everyone, then so be it.

isaac_asimov_on_throne


VLOG – Auto-Tagging, the quest for Compounds!

In this video, I try to explain the concept of Auto-tagging, which can be seen as the task of automatically extracting terms from unstructured textual data.

Terminology mining, term extraction, term recognition, or glossary extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.

In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is essential to the language industry.

From Wikipedia – Terminology extraction

In this log, I speak about Auto-tagging:

  • Why we need it
  • What the challenge is
  • The different approaches: naive term frequency versus a global semantic web service
  • The key role of compounds (i.e., two-word tags)
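
To make the naive term-frequency approach and the role of compounds more concrete, here is a minimal sketch in Python. It is not the method shown in the video: the stopword list, the `auto_tag` function, and the `compound_boost` parameter are all assumptions made for illustration. It counts single words, counts adjacent word pairs as candidate compounds, and ranks recurring compounds above single words.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real one would be much larger).
STOPWORDS = frozenset({
    "the", "a", "an", "of", "to", "in", "and", "is", "for", "on",
    "with", "that", "this", "it", "as", "are", "be", "by",
})


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def auto_tag(text, top_n=5, compound_boost=2.0):
    """Return up to top_n tags ranked by a naive frequency score.

    Single words score their raw count. Two-word compounds that occur
    at least twice score count * compound_boost (a hypothetical knob),
    so a recurring compound outranks the words it is made of.
    """
    tokens = tokenize(text)
    words = Counter(t for t in tokens if t not in STOPWORDS)

    # Candidate compounds: adjacent token pairs with no stopword in either slot.
    # Sentence boundaries are ignored here, which a real tagger should not do.
    compounds = Counter(
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    )

    scores = {word: float(count) for word, count in words.items()}
    for compound, count in compounds.items():
        if count >= 2:
            scores[compound] = count * compound_boost

    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]
```

Pure term frequency has no notion of which words belong together, which is why the compound step matters: without it, "tag" and "clouds" would be suggested as two unrelated tags.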


VLOG – Search & Tag Clouds

This is my first video log. I will try to address topics more openly and frequently this way.

In this log, I speak about Search and Tag Clouds:

  • Difference between Search and Tag Clouds
  • Similarity between Search Clouds and Search Autocomplete Suggestions
  • Potential to extend their usage for navigation
  • The challenge of mixing search clouds and tag clouds together (usage versus document frequency)
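
The last point can be sketched as follows, assuming a search cloud weights terms by how often users query them and a tag cloud weights them by how many documents contain them. The two scales are not comparable, so this sketch normalizes each to the range 0 to 1 before blending. The function names and the `alpha` parameter are hypothetical, and this is only one possible way to mix the two signals.

```python
def normalize(counts):
    """Scale counts so that the largest value becomes 1.0."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {term: 0.0 for term in counts}
    return {term: count / peak for term, count in counts.items()}


def mixed_cloud(query_counts, doc_freq, alpha=0.5):
    """Blend usage (search cloud) and document frequency (tag cloud).

    alpha=1.0 gives a pure search cloud, alpha=0.0 a pure tag cloud.
    A term missing from one source contributes 0 for that source.
    """
    usage = normalize(query_counts)
    coverage = normalize(doc_freq)
    terms = set(usage) | set(coverage)
    return {
        term: alpha * usage.get(term, 0.0) + (1 - alpha) * coverage.get(term, 0.0)
        for term in terms
    }


# Made-up numbers, for illustration only.
weights = mixed_cloud(
    query_counts={"semantic search": 40, "tag cloud": 10},
    doc_freq={"tag cloud": 8, "text mining": 4},
)
```

With these made-up numbers, "tag cloud" ends up with the highest weight because it appears in both sources, even though it tops only one of them.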


Tag clouds – what is at stake?

Echoing the ever-increasing popularity of tag clouds, the emerging domain of auto-tagging aims to populate these attractive visual components without requiring each page to be tagged individually. While many approaches try to solve this problem, most of them do not address the real underlying technical challenges. But in order to evaluate how tag clouds can deliver their full potential, we have to analyze what is at stake for end users and understand why tag clouds could make a real difference in the way we access information.

In this new domain where art meets technology, visualization meets data mining, and repetitive manual effort meets automation, I felt it was interesting to inspect the different components and their roles, and to figure out what is new and what is old, what is solved and what is not, what is possible and what is pure fantasy…

wordle_tag_cloud
