Posted on Feb 6, 2008 in artificial intelligence

Context-Free Tree

What forests would look like if they were context-free…

In formal language theory, the notion of being context-free applies to a formal language or grammar (a context-free grammar, or CFG). Within AI, it is mainly used in the field of Natural Language Processing (NLP). I will discuss here how it can be considered a major AI concept, and how understanding its implications can be a helpful way to analyse the value of any AI algorithm.

First things first, what is formal language theory?

A formal language is an organized set of symbols the essential feature of which is that it can be precisely defined in terms of just the shapes and locations of those symbols. Such a language can be defined, then, without any reference to any meanings of any of its expressions; it can exist before any interpretation is assigned to it – that is, before it has any meaning. – Wikipedia

And what is NLP?

Natural Language Processing (NLP) is both a modern computational technology and a method of investigating and evaluating claims about human language itself. Some prefer the term Computational Linguistics in order to capture this latter function, but NLP is a term that links back into the history of Artificial Intelligence (AI), the general study of cognitive function by computational processes, normally with an emphasis on the role of knowledge representations, that is to say the need for representations of our knowledge of the world in order to understand human language with computers. (The Natural Language Processing Research Group of the University of Sheffield.)

And finally, what is a CFG (context-free grammar)?

In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form V → w where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (possibly empty). The term “context-free” expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it. – Wikipedia

Now, all this is a little technical, so let’s revisit each of these three with some concrete examples:

A formal language is made of an alphabet and of rules. If we take the example of the formal language of natural numbers, which represents any natural number greater than or equal to zero, then the alphabet would be the characters {0,1,2,3,4,5,6,7,8,9} and the rules would be: a number cannot start with zero unless it is zero alone, and any other combination of characters is allowed.
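These two rules can be expressed directly as a pattern and checked by a small membership test. The sketch below is purely illustrative, using a regular expression to encode the alphabet and rules described above:

```python
import re

# The language of natural numbers over the alphabet {0..9}:
# either a single "0", or a nonzero digit followed by any digits.
NATURAL = re.compile(r"0|[1-9][0-9]*")

def is_natural(s: str) -> bool:
    """Return True if s belongs to the language described above."""
    return NATURAL.fullmatch(s) is not None

print(is_natural("0"))     # True
print(is_natural("42"))    # True
print(is_natural("007"))   # False: starts with zero but is not zero alone
print(is_natural("4a"))    # False: 'a' is not in the alphabet
```

Note that because the rules refer only to the shapes and positions of the symbols, the checker needs no notion of what the string means.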

Natural language processing is nothing other than the study of texts written in everyday language. Traditionally, we speak about NLP when a grammatical analysis is done, for instance to structure the different components of a sentence. For example, transforming the sentence “bob has a cat” into the tree P – { subject:”bob”; verb:”has”; object:[determinant:”a”; noun:”cat”]} is a typical result of NLP treatment.
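To make the tree above concrete, here is a minimal sketch producing that structure for a flat subject–verb–object sentence. The tiny lexicon and the single hard-coded sentence pattern are assumptions made for illustration, not a real parsing algorithm:

```python
# Hypothetical mini-lexicon: word -> grammatical role.
LEXICON = {
    "bob": "noun", "cat": "noun",
    "has": "verb",
    "a": "determinant",
}

def parse_svo(sentence: str) -> dict:
    """Parse a 'subject verb [determinant] object' sentence into a tree."""
    words = sentence.split()
    subject, verb, rest = words[0], words[1], words[2:]
    if len(rest) == 2 and LEXICON.get(rest[0]) == "determinant":
        obj = {"determinant": rest[0], "noun": rest[1]}
    else:
        obj = {"noun": rest[0]}
    return {"subject": subject, "verb": verb, "object": obj}

print(parse_svo("bob has a cat"))
# {'subject': 'bob', 'verb': 'has', 'object': {'determinant': 'a', 'noun': 'cat'}}
```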

A context-free grammar (as opposed to a context-sensitive grammar) is a grammar which always produces the same results regardless of the current context. Let’s imagine we want a computer to analyse the family relationships between the characters of a novel. If there is more than one character named Paul, using a context-free grammar will not work, as understanding which of the Pauls is meant in the current context is necessary to make a correct analysis. It would be required to use a context-sensitive grammar and create some rules to let the context be defined properly.

Now, why is this context-free concept such a meaningful one and why should you care about it?

Let’s first consider a very important quote about NLP:

Progress on building computer systems that process natural language in any meaningful sense requires considering language as part of a larger communicative situation. Regarding language as communication requires consideration of what is said (literally), what is intended, and the relationship between the two. – Barbara Grosz, Utterance and Objective

Now, let’s ask ourselves two questions about what Barbara Grosz states here:

  • Why is this about being context-free?
    Because what she calls “a larger communicative situation” is the definition of the context itself: regardless of how efficient and intelligent the algorithm analysing the language is, the meaning of what is exchanged cannot be understood without considering other aspects. In other words, language is, by definition, not context-free.
  • Is it specific to NLP?
    Not at all. Let’s consider the field of robotics: training a robot to move a glass without breaking it or spilling its content can be quite easy, if it is always the same glass at the same spot. As soon as this changes, the ability of the robot is almost nil if it has no way to self-tune its behaviour to this change of context. In this scope, we can say that, if we assume that the changes in the environment are not significant (no dependencies on the context), the context-free algorithm will work with a very high performance. By the same logic, if only one person in the book is called Paul, using a context-free grammar with rules like “X had a brother called Y” will work fine, and could rapidly produce high-performance results.
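The “X had a brother called Y” rule can be sketched as a single context-free pattern. The toy text below is invented for illustration; it shows both why the rule works when names are unique and why it silently breaks when they are not:

```python
import re

# A context-free extraction rule: applied without regard to context.
RULE = re.compile(r"(\w+) had a brother called (\w+)")

# Invented example text; imagine the two Pauls are different characters.
text = ("Paul had a brother called John. "
        "Later in the story, Paul had a brother called Mark.")

siblings = RULE.findall(text)
print(siblings)  # [('Paul', 'John'), ('Paul', 'Mark')]

# With a single character named Paul, both facts are correct.
# With two distinct Pauls, the rule cannot tell them apart:
# resolving which Paul is meant requires context-sensitive analysis.
```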

To establish that context-free is a major concept in AI, in the sense that it defines the ability (in fact the inability) of a system to perform outside of a fixed and well-defined environment, I would propose the following statement:

Instead of considering a system to be context-free or not, as if it were a binary property, it is much more interesting to consider the context-free property as the domain the system covers with good performance in an autonomous way.

In other words, every AI system is context-free in its core scope of good performance and becomes context-sensitive outside of it.

Therefore, analysing the context-free domain is correlated with analysing how domain-specific the system is. However, the two are not the same: a system is domain-specific because of the need to configure it in a specific way due to its context-sensitive constraints.

Before going further into how this new concept can be a very powerful tool to analyse the real value of any AI algorithm, I would like to clarify an important point: context-free is also not the same as generic. A generic algorithm is one which doesn’t contain any element adapted to some environments and not others. Now, in AI, adapted doesn’t mean providing good results; therefore, generic is a dangerous word to use from an AI perspective. Context-free is much more adequate: if you connect it with a well-defined measure of performance, the domain it defines will include the system’s level of genericity, but from a performance perspective, not a conceptual one.

Let me give you an example: a system which learns to classify documents into categories by analysing the occurrences of single terms, referring to an initial sample list of classified documents, can be considered highly generic: the only domain-specific requirement is the existence of textual content in the documents. But what about its context-free limitations? These are very simple: they are restricted to a domain where analysing single words by their occurrences is both possible and meaningful. It becomes context-sensitive as soon as you might need more than just the number of times each word occurs to make the right classification decision.
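Such a term-occurrence classifier can be sketched in a few lines. The sample documents, labels, and the simple overlap scoring below are all invented for illustration; the point is only that the system sees nothing but isolated word counts:

```python
from collections import Counter

# Invented training sample: (document text, category label).
TRAIN = [
    ("the cat sat on the mat with the cat", "animals"),
    ("stocks and bonds and stocks rose", "finance"),
]

def term_profile(text: str) -> Counter:
    """Count occurrences of single terms, the only feature the system uses."""
    return Counter(text.lower().split())

# One aggregated term profile per category.
PROFILES = {label: term_profile(doc) for doc, label in TRAIN}

def classify(text: str) -> str:
    """Pick the category whose term profile overlaps the document most."""
    words = term_profile(text)
    def overlap(label: str) -> int:
        return sum(min(words[w], PROFILES[label][w]) for w in words)
    return max(PROFILES, key=overlap)

print(classify("a cat on a mat"))    # animals
print(classify("bonds fell today"))  # finance
```

Within its context-free domain (documents whose single-word counts are discriminative), this works; it fails exactly where word order, negation, or external knowledge start to matter.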

One could argue that we haven’t made any real progress with this generalisation of the context-free concept. That is completely right, but I would argue that it defines a new perspective on the way we consider the value of an AI outcome: we should not only ask whether it is generic or not, or how well it works within specific domains, but why and how it must be adapted specifically to different domains to provide good results. This is the value of doing a context-free analysis.

Now, why is such an analysis useful? Why can defining how context-free a system is give us valuable information about its core value? The answer is pretty straightforward, and comes directly from the definition of a context-free analysis:

A context-free analysis is the study of the limitations caused by the initial assumptions an AI system is based on. A system will be considered context-free throughout all the domains where these assumptions do not cause (based on theoretical reasoning and/or observed results) a lack of performance without a complex configuration requiring specific know-how or information sources.

Again, there is no claim to be scientific here; this usage of context-free therefore differs greatly from its usage in formal language theory, where it relies on a scientifically well-defined concept.

To finish this post, I would like to suggest the following key questions, which lead to such an analysis and can be of great help when assessing the value of an AI technology for a specific purpose (I encourage any company willing to purchase an AI product to check them thoroughly in order to have much higher confidence in the fit of the product to their needs):

  • What data should you provide to the system and how will they be treated?
  • Are these data always available in this exact form and content and, if not, what are the consequences of possible differences?
  • What type of resources, skills, effort and time are needed to configure the system so it can be up and running?
  • How will the system learn by itself and adapt to changes over time?
  • What additional continuous effort is required to keep the performance at the top over time?

The goal of asking these questions is not to have a checklist validation. In fact, you don’t need to ask them from a specific standpoint; you can do so in a fully generic way. The goal is to retrieve the information required to understand the underlying context-free model, because the answers will lead you to discover the true potential of the technology and its fit for different domains.

A last point: the need for human configuration is neither a good nor a bad thing, so don’t look at it that way: it is simply a piece of information to consider in order to identify whether the required skills are available and whether the effort is worthwhile. In other words, requiring two years of effort from high-level engineers, or two weeks from a common user, to set up a system properly for a specific environment is a constraint you should consider in exactly the same way you consider the price of the software itself.
