Leveraging a Context for Information Access

Posted on Feb 17, 2009 by in artificial intelligence, customer experience


“We are getting into semantics again. If we use words, there is a very grave danger they will be misinterpreted.”
H. R. Haldeman

The semantic web is getting more and more attention, but although the W3C is trying to set up a well-defined standard, its deployment so far remains as speculative as that of Web 3.0 in general. Meanwhile, the search-engine industry is steadily extending its ability to provide richer features that go well beyond simply matching search keywords against an index of terms. While both sides aim at the same result (one by restructuring the Internet, the other by building a structure over raw data), these are not the only two possible approaches. Others try, for example, to derive a context from search queries and navigation behavior in order to better identify the user's intent and therefore provide more relevant results.

At its core, the semantic web comprises a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that are yet to be implemented or realized. Other elements of the semantic web are expressed in formal specifications. Some of these include Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.

Wikipedia – Semantic Web

As I discussed in past posts, context has played a critical role throughout the history of artificial intelligence, and even of computer science in general. We could summarize it in the following two statements:

  • It is very easy to build a system that behaves very intelligently in one specific, predefined context
  • It is very hard to build a system that behaves even a little intelligently in any context

Let’s illustrate this in the domain of information access for web sites:

Some web sites, even with a very large number of pages, succeed in providing a positive experience to most of their visitors. Such success rarely happens by luck; it comes from extensive knowledge of how customers intuitively understand the domain, and from hard work to package the right information properly. These are usually the sites that look fairly small at first glance, while a complete exploration reveals many pages of content throughout their sections. One of many examples is www.skype.com, which indexes tens of thousands of pages but at first seems to have only a few dozen, containing just the most useful information.

Other web sites struggle tremendously to provide comprehensive access to their content. This is of course the case for large portals containing millions of items, possibly created by the users themselves, but it also concerns many other sites that don’t necessarily have such gigantic amounts of data to manage. Most large organizations’ web sites suffer from this problem: regardless of the effort they put into restructuring and maintaining their information, they continuously receive feedback from dissatisfied visitors who spent a long time looking for information they know should be there but cannot put their hands on.

What is the reason behind such differences in user experience, and how is it connected to the context of the search?

When visitors come to the Skype web site (whether or not they already know what the product is), their entire experience and information gathering is tied to a single, predefined context: the product itself. Therefore, organizing information around this product, even a lot of it, is quite intuitive. Although there are still several ways to present the categories and the information itself, most of them will match the user’s understanding fairly well: it is software that manages multimedia communication between individuals over the Internet, so the context is well defined and never really drifts from this basis.

For a large organization, on the other hand, the multitude of services, departments, pieces of information, and target groups connected to each segment of information makes it almost impossible to settle on one specific information access structure. The organization of the content has to be continuously reinvented (in addition to keeping it accurate over time), and it systematically fails to satisfy most target user groups, who feel the structure was not designed for their needs. Indeed, the web site has to handle highly variable contexts tied to concepts that are not clearly defined even in the visitors’ own minds. Providing a comprehensive environment in such a wide contextual domain is close to impossible with current tools.

Why the semantic web might not be the magic formula:

On paper, the semantic web could provide an attractive answer for the second category of web sites, those struggling with such problems. It should make it easier to categorize all meta-information into high-level structures such as ontologies, and enable many different ways to access a single piece of information. Technologies such as Endeca’s would then provide the flexible, contextual guided-navigation filters that enable any type of navigation path.
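To make that promise concrete, here is a minimal sketch of how ontology-style meta-data lets a single item be reached through several access paths. The triples, class names, and SKUs are hypothetical, and a plain set of tuples stands in for a real RDF store; it only mimics the idea behind RDFS `subClassOf` inference, not the API of any RDF library.

```python
# Hypothetical (subject, predicate, object) triples; all names are illustrative only.
TRIPLES = {
    ("sku-101", "type", "CopyPaper"),
    ("sku-101", "color", "white"),
    ("sku-101", "brand", "AcmeOffice"),
    ("sku-202", "type", "Envelope"),
    ("sku-202", "color", "white"),
    ("CopyPaper", "subClassOf", "Paper"),
    ("Paper", "subClassOf", "OfficeSupply"),
    ("Envelope", "subClassOf", "OfficeSupply"),
}

def superclasses(cls):
    """Return cls plus every class reachable through subClassOf links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in TRIPLES if s == current and p == "subClassOf")
    return seen

def items_where(predicate, value):
    """Items with predicate=value; for 'type', inferred superclasses also match."""
    if predicate == "type":
        return {s for s, p, o in TRIPLES if p == "type" and value in superclasses(o)}
    return {s for s, p, o in TRIPLES if p == predicate and o == value}

# The same item is reachable by class, by broader class, or by another property.
print(sorted(items_where("type", "Paper")))         # ['sku-101']
print(sorted(items_where("type", "OfficeSupply")))  # ['sku-101', 'sku-202']
print(sorted(items_where("color", "white")))        # ['sku-101', 'sku-202']
```

Every one of those triples had to be written by someone, which is precisely the maintenance cost discussed next.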

However, creating and maintaining this complex semantic structure could easily become an unmanageable task if done manually. We would then be in a situation where the technology would work perfectly if only the data were well structured, which in practice turns out to be impossible. Relying on automated processes to generate this semantic structure doesn’t seem realistic for the time being either, since leveraging a global semantic resource to extract the required structure and mappings is not an easy task.

Why washing out all meta-data and recreating a new artificial structure from user behavior might not be so great either:

Although everyone enjoys the increased functionality of Google-like search engines (correcting misspellings, completing queries while typing, or refining searches with other popular searches), most of you will probably agree that this is unlikely to solve the problem of semantic search. First, the fact that many people type something doesn’t mean that it is what I am interested in, nor that Google has interesting results for it. Second, I am more interested in distinguishing the various types of information contained in the search results than in following someone else’s predefined search without knowing what I filter out by doing so. In that respect, the attempts of the newest search platforms such as Twitter Search or Cuil seem quite interesting, as they derive this information from the existing structure of the data sources instead of recreating it artificially from user behavior. However, the task is clearly far from easy, and the attempts so far by these two players only indicate that either their mission is too broad or their technology is not developed enough. Following the earlier reasoning about how context determines the difficulty of a task, we can conclude that their context is too wide, since they try to leverage the entire Internet semantically.

A more reasonable and pragmatic approach to leveraging context for information access:

The approach others have taken is to leverage whatever semantic information is available and extract as much value as possible from it, even if it is limited or ill-defined.

Take the example of an online office-supply retailer I was recently involved with on a project. In addition to having a debatable three-level data structure, with different kinds of data in the two lower levels (ranging from colors to brands depending on the parent node), their labeling was quite unclear (e.g. “paper, sticker and envelopes” led to a collection of three types of paper, two of stickers and one of envelopes).

Extracting atomic concepts from such an environment is fairly difficult, because each concept appears several times in different places and is joined with other concepts. One can sense the effort required to move from such a data structure (which I believe represents the vast majority of web sites today) to a well-defined, ontology-based structure that defines every possible property.
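To see why this is hard, the sketch below indexes a small, made-up version of such a taxonomy by the atomic concepts hidden in its labels. The category paths, the stop-word list, and the naive trailing-“s” singularization are all assumptions for illustration; a real system would need proper stemming and synonym handling.

```python
import re
from collections import defaultdict

# Hypothetical category paths modeled on the example above (top level > sub-level > leaf).
CATEGORY_PATHS = [
    ("Paper, sticker and envelopes", "Copy paper", "White"),
    ("Paper, sticker and envelopes", "Copy paper", "Recycled"),
    ("Paper, sticker and envelopes", "Stickers", "AcmeOffice"),
    ("Paper, sticker and envelopes", "Envelopes", "C5"),
    ("Printing", "Photo paper", "Glossy"),
]

STOP_WORDS = {"and", "or", "the", "of"}

def atomic_concepts(label):
    """Split a compound label into naive, normalized single-word concepts."""
    concepts = set()
    for word in re.findall(r"[a-z0-9]+", label.lower()):
        if word in STOP_WORDS:
            continue
        # Naive singularization (an assumption, not real stemming): "envelopes" -> "envelope".
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        concepts.add(word)
    return concepts

def index_concepts(paths):
    """Map each atomic concept to every category node (a path prefix) whose label mentions it."""
    index = defaultdict(set)
    for path in paths:
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            for concept in atomic_concepts(node[-1]):
                index[concept].add(node)
    return index

index = index_concepts(CATEGORY_PATHS)
for node in sorted(index["paper"]):
    print(" > ".join(node))
```

In this toy data the concept “paper” surfaces at three different nodes, one of which also stands for stickers and envelopes: exactly the kind of overlap that makes a clean ontology expensive to derive by hand.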

To provide a real answer to such problems and exploit all available categorization to its full value without requiring any manual data configuration, a good approach is to combine search and category navigation: let users search for what they are looking for (e.g. “paper”), then let the system select all the existing meta-data relevant to this query and organize it into a sequence of contextual guidance steps, where each category selection naturally prunes all the candidates that no longer fit (a concept I will often refer to as “search-guidance”).
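The search-guidance loop described above can be sketched in a few lines. This is only a minimal illustration under assumed data, not Guidyu’s implementation: the catalogue, the facet names, the substring-based keyword match, and the rule “offer a facet only if it still splits the candidates” are all hypothetical choices.

```python
from collections import Counter

# Hypothetical catalogue: each product only carries the meta-data that already exists.
PRODUCTS = [
    {"name": "A4 copy paper 80g", "category": "Paper, sticker and envelopes",
     "type": "paper", "color": "white", "brand": "AcmeOffice"},
    {"name": "A4 recycled paper", "category": "Paper, sticker and envelopes",
     "type": "paper", "color": "grey", "brand": "EcoLine"},
    {"name": "Photo paper glossy", "category": "Printing",
     "type": "paper", "color": "white", "brand": "AcmeOffice"},
    {"name": "Round stickers", "category": "Paper, sticker and envelopes",
     "type": "sticker", "color": "red", "brand": "EcoLine"},
    {"name": "C5 envelopes", "category": "Paper, sticker and envelopes",
     "type": "envelope", "color": "white", "brand": "AcmeOffice"},
]
FACETS = ("category", "type", "color", "brand")

def search(query, products):
    """Keyword step: keep products whose name or meta-data contains every query word."""
    words = query.lower().split()
    def text(product):
        return " ".join(product.values()).lower()
    return [p for p in products if all(w in text(p) for w in words)]

def guidance(candidates, facets=FACETS):
    """Guidance step: value counts for each facet that still splits the candidates."""
    steps = {}
    for facet in facets:
        counts = Counter(p[facet] for p in candidates if facet in p)
        if len(counts) > 1:  # a facet with a single value cannot narrow anything down
            steps[facet] = dict(counts.most_common())
    return steps

def refine(candidates, facet, value):
    """Apply one guidance selection, pruning every candidate that does not fit."""
    return [p for p in candidates if p.get(facet) == value]

hits = search("paper", PRODUCTS)      # also matches stickers and envelopes via the category label
print(len(hits), guidance(hits))
hits = refine(hits, "type", "paper")  # the user picks "paper" in the "type" step
print(len(hits), guidance(hits))      # "type" disappears: it no longer splits the candidates
```

The ill-defined category label still pulls in stickers and envelopes, but the existing “type” meta-data turns that noise into a useful first guidance step instead of a dead end.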

Another advantage of this approach is that it can evolve over time while retaining all the benefits of the existing features of traditional search engines:

  • Should you decide to enrich the semantic information of your data, the system can naturally leverage it as well (it works with unstructured meta-data and does even better with well-structured meta-data)
  • Should your customers use certain search terms or access certain information more often, all the usage-behavior features can still be leveraged to rank results and suggest keyword refinements when no further guidance is possible. Moreover, this analysis is tied to well-defined terms that actually appear in the meta-data rather than to any keyword people type, which guarantees that the system only recommends keywords for which it can generate proper results and guidance.
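The second point can be sketched as follows: usage data ranks refinement keywords, but only terms that exist in the meta-data vocabulary and still match the current candidates are ever suggested. The query log, the vocabulary, and the function name `suggest_refinements` are hypothetical; this is one plausible reading of the idea, not a description of any particular product.

```python
import re
from collections import Counter

def terms(text):
    """Lower-cased alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())

def suggest_refinements(query_log, vocabulary, candidate_texts, current_query, limit=3):
    """Rank refinement keywords by past usage, keeping only meta-data terms that yield results."""
    used = set(terms(current_query))
    candidate_terms = set().union(*(terms(t) for t in candidate_texts))
    usage = Counter(t for q in query_log for t in terms(q))
    suggestions = []
    for term, _count in usage.most_common():
        if term in used or term not in vocabulary:
            continue  # skip words already in the query and words the meta-data never uses
        if term not in candidate_terms:
            continue  # skip terms that would lead to an empty result page
        suggestions.append(term)
        if len(suggestions) == limit:
            break
    return suggestions

# Hypothetical data: "cheap" is popular but absent from the meta-data, so it is never suggested.
QUERY_LOG = ["paper a4", "a4 paper white", "paper cheap", "cheap paper",
             "recycled paper", "paper a4 recycled", "envelope c5"]
VOCABULARY = {"paper", "a4", "recycled", "white", "glossy", "photo", "envelope"}
CANDIDATES = ["A4 copy paper 80g white", "A4 recycled paper grey", "Photo paper glossy white"]

print(suggest_refinements(QUERY_LOG, VOCABULARY, CANDIDATES, "paper"))  # ['a4', 'recycled', 'white']
```

Filtering against the vocabulary and the live candidate set is what keeps popular but unsupported keywords (here “cheap”) or dead-end ones (here “envelope”) out of the suggestions.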

If you want to know more, I invite you to check out what we are doing at Guidyu, where we have been very active on this topic.
