Review of Kurzweil’s “How to Create a Mind”

Kurzweil has a solid reputation as an inventor of technically advanced products with very practical uses. He is also a famed futurist and a shrewd businessman who has without a doubt learned how to capitalize on, popularize, and monetize his own and others’ ideas and visions: some brilliant, some not so much, according to skeptics.

As the New Yorker recognized, Kurzweil’s critics have not always been kind. PZ Myers, a renowned biologist, once indicated that Kurzweil is a genius… and one of the greatest hucksters of our time. Doug Hofstadter, the Pulitzer Prize-winning author of “Gödel, Escher, Bach,” said reading one of Kurzweil’s books was like mixing together good food with dog excrement: ultimately you can’t tell the good from the bad.

The astute reader will be aware of the commercialization and hyperbole but not be dissuaded by them. Rather, I suggest you read to enjoy the broad strokes and general principles behind the ideas presented, and use them as a catalyst to explore the various pieces he put together in an attempt to explain one of many possible approaches to achieving human-like artificial intelligence. That particular goal is only one of several possible paths to self-directed thinking, perhaps consciousness, and sentience in a machine. See our Introduction to Artificial Intelligence for a brief overview of the various AI perspectives.

May Kurzweil’s collection of ideas inspire your imagination.


Kurzweil subscribes to the theory that Artificial Intelligence machines will soon equal the power of human thought, with all of its complexities and richness, and perhaps even outstrip it.

This rather broadly held theory is lent credence by two major turning points: in 1997, Garry Kasparov was beaten at chess by IBM’s Deep Blue, and in 2011, Watson, an Artificial Intelligence machine also from IBM, beat Brad Rutter and Ken Jennings in their Jeopardy! matches. Kurzweil uses these two events to support the argument that the neural networks responsible for higher-level, hierarchical thinking (known as the Neocortex) actually follow simple principles that can be well replicated, and that some of the more advanced AI systems, such as Siri (the iPhone’s voice recognition software) and the aforementioned Watson, already use the same pattern recognition scheme in their installed “brain”.

Kurzweil explains that this pattern recognition scheme is naturally hierarchical: lower-level patterns that pick up minute inputs from the surroundings combine to trigger higher-level patterns, which pick out more abstract categories that must be taught. Information also moves both upwards and downwards, causing feedback between higher- and lower-order patterns. He calls this theory the Pattern Recognition Theory of the Mind (PRTM). It resembles the design of our best AI machines, and with a little tweaking, Kurzweil continues, it will make it possible by 2029 to design computers that match human thought, with such features as identity, consciousness, and free will. Such machines would eventually outstrip even human capabilities, since they don’t have our biological limitations, as will be explained later. This advance, though, will also allow us to use technology to update our neurochemistry in a merger Kurzweil calls the “singularity”.

It should be pointed out to the reader of this review that the Singularity has morphed into several definitions. As originally conceived, it simply meant the point at which machine intelligence surpasses human intelligence: machines have concepts and thoughts beyond our comprehension and develop even faster and smarter machines, further separating us from the new masterminds of the universe. See more on our treatment of that in the Human Extinction: Risks to Humanity section.


The ability to reason, analyze and prioritize enables mammals to think abstractly and to be predictive, so that we can process, manipulate and store information and then adapt to, or change, our surroundings based on what we have learned about them. This intelligence comes from the Neocortex, which evolution added to previously existing sections of the brain.


The Neocortex gives mammals like humans the ability to think hierarchically and to understand singular parts of larger groups, groups that themselves belong to much bigger groups, and so on. This helps us survive and thrive in two ways: it gives us a detailed and precise likeness of our surroundings, and it allows us to understand and adjust to those surroundings as our thoughts climb the levels of the hierarchy, becoming more abstract and complex. The lack of a Neocortex, some scientists believe, contributed to the extinction of the dinosaurs. The Neocortex differs in size and development among mammals; in humans it accounts for 80% of the weight of the brain.

Neuroscientist Henry Markram of Switzerland deduced that the Neocortex can be reduced to a single thought process, hierarchical thinking, because of its uniform structure, which he observed in a study that scanned mammalian Neocortexes in search of neural assemblies. He indicated that the Neocortex appeared to be constructed of Lego-like collections of several dozen neurons in layers, connected to similarly structured super-assemblies, which are in turn connected to a yet higher layer of neuronal collections, and so on until the highest level represents the entire brain. He is now a Director at the Blue Brain Project, which is intent on recreating the complexities of the human brain, beginning with a trial on rats.


The Pattern Recognition Theory of Mind (PRTM)

The author, borrowing from others before him, says that each layer of neural assemblies acts as a pattern recognizer that finds hierarchically organized information in the surroundings, whether auditory, linguistic or of any other kind. Neural assemblies are pre-organized and innate, but each level of the hierarchy must be taught, that is, filled in with specific information. Human higher-level thinking uses some 300 million recognizers and writes all information into different levels of neural assemblies in our brains. For example, on a human face the mouth and nose are recorded in different neural assemblies from the entire face, so that even if some facial parts are absent, a face can still be recognized, provided enough of its parts are available to trigger a recognizer and send the information to the next level up.


Before the pattern recognizers at one hierarchical level trigger a recognizer at the next level up, they prime it. The primed recognizer then sends signals back down to recognizers at the next-lowest level, priming them and preparing them to fire. For instance, if a person’s eye is detected, the recognizer for the face is primed, and it signals the recognizers representing other parts of the face to look for their features. The author considers this predictive.

Pattern recognizers communicate with positive or negative signals that encourage or inhibit firing, depending on how likely a given pattern is to be present and on whether the signals come from lower or higher conceptual levels.
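The firing and priming behavior described above can be illustrated with a minimal sketch. It is not Kurzweil’s implementation: the `Recognizer` class, the equal weights, and the `threshold` and `prime_level` values are all assumptions chosen for illustration, and inhibitory (negative) signals are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Recognizer:
    """One hypothetical pattern recognizer fed by lower-level parts."""
    name: str
    weights: dict = field(default_factory=dict)  # part name -> importance weight
    threshold: float = 0.6    # assumed fraction of total weight needed to fire
    prime_level: float = 0.25  # assumed fraction needed to start priming

    def score(self, active: set) -> float:
        """Fraction of the expected parts (by weight) currently detected."""
        total = sum(self.weights.values())
        present = sum(w for part, w in self.weights.items() if part in active)
        return present / total if total else 0.0

    def fires(self, active: set) -> bool:
        """Bottom-up recognition: enough parts are present."""
        return self.score(active) >= self.threshold

    def primed(self, active: set) -> set:
        """Top-down prediction: parts the recognizer now expects to see."""
        return set(self.weights) - active if self.score(active) >= self.prime_level else set()


face = Recognizer("face", {"eye_left": 1.0, "eye_right": 1.0, "nose": 1.0, "mouth": 1.0})

seen = {"eye_left", "eye_right", "nose"}     # the mouth is hidden
print(face.fires(seen))                      # True
print(sorted(face.primed({"eye_left"})))     # ['eye_right', 'mouth', 'nose']
```

With three of four parts visible the face recognizer still fires, and a single detected eye is enough for it to signal the remaining parts that they are expected.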

Every new sensory scenario, or change in one, is detected by the brain and saved in a new pattern recognizer. Some patterns, like the different expressions of a relative, are saved multiple times, while seldom-used ones, like a face not seen for ages, are eventually replaced to save storage space. This replacement causes memory to fade slowly, to the extent that a face seen before is no longer remembered. Pattern recognizers have a redundancy factor of about 100 to 1, depending on importance (compare a relative’s face with one seen for the first time).

This example leaves out the high levels of abstraction that we reach routinely and by many means. According to the author we might not, for example, remember the reason for laughing yet remember that we did laugh. We must also note that these signals are sent at very high speeds and that pattern recognizers fire across many faculties at any given time.

The reach and presence of the Pattern Recognition Scheme

Different mental capabilities of the Neocortex are found in multiple parts of the brain, and other parts of the Neocortex can take over the tasks assigned to a given part should that part be damaged or missing from birth (brain cells in various locations can be “taught”, or rather learn, to be multifunctional if necessary for survival). This is known as neural plasticity and has even been found in people with congenital defects.


Introducing Speech Recognition to Artificial Intelligence

As Kurzweil shows, advanced artificial intelligence machines and software programs already use the Neocortex processes described above.

When the author and other computer scientists first moved into the uncharted territory of artificial intelligence, they sought to solve problems using predefined intelligent solutions, programming these problem types and solutions into a computer to be applied to problems as they arose. Speech-to-text conversion was first tackled in this way in the 1980s: digital patterns were recorded, and the program would try to match human voice inputs against them. But since enunciation and pronunciation differ between people of different nationalities or races, or even within one person as they age, this method quickly became impracticable; too many variations would be needed in the “answer” databank. Kurzweil then tried another technique known as vector quantization, which summarizes, or reduces, human speech into 1,024 points.
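As a rough illustration of vector quantization, the sketch below replaces each feature vector with the index of its nearest point in a codebook of 1,024 entries. Only the figure 1,024 comes from the review; the four-dimensional features, the random codebook (a real one would be learned from speech data) and the function names are assumptions.

```python
import random


def nearest_index(vector, codebook):
    """Return the index of the codebook entry closest to `vector`."""
    def dist2(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


def quantize(frames, codebook):
    """Replace each feature vector with the index of its nearest codebook point."""
    return [nearest_index(frame, codebook) for frame in frames]


random.seed(0)
CODEBOOK_SIZE = 1024   # the figure quoted in the review
DIMENSIONS = 4         # hypothetical number of features per sound frame
# Placeholder codebook: random points stand in for learned cluster centers.
codebook = [[random.random() for _ in range(DIMENSIONS)]
            for _ in range(CODEBOOK_SIZE)]

frames = [[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6]]
codes = quantize(frames, codebook)
print(codes)   # two integers in range(1024), one per frame
```

Each frame of speech is thereby reduced to a single small integer, which is what makes the later statistical modeling tractable.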

He then modeled what goes on in a person’s brain while they speak and simulated it so that the computer could identify new units of speech, as well as variations in enunciation and pronunciation. For this he used a heavily mathematical technique known as the Hidden Markov Model, which could “infer a hierarchy of states with connections and probabilities.”
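To give a feel for what a Hidden Markov Model does, here is a minimal sketch of the standard Viterbi algorithm, which recovers the most probable sequence of hidden states from a sequence of observations. It is a flat (non-hierarchical) toy, not Kurzweil’s system: the two states "A" and "B", the observation codes 0 and 1, and all probabilities are made up for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for `observations`."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that makes reaching s most probable.
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical model: two hidden sound units, observed as quantized codes 0 and 1.
states = ["A", "B"]
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.8, "B": 0.2}, "B": {"A": 0.2, "B": 0.8}}
emit_p = {"A": {0: 0.9, 1: 0.1}, "B": {0: 0.1, 1: 0.9}}

path = viterbi([0, 0, 1, 1], states, start_p, trans_p, emit_p)
print(path)   # ['A', 'A', 'B', 'B']
```

The hidden states here play the role of the units of speech the recognizer must infer, while the observations play the role of the quantized sound codes it actually receives.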

With this done, he sought to set the parameters of the unknown data points and their organizational hierarchies. To do so he borrowed from biological evolution, cross-breeding multiple ‘solution organisms’ (genetic codes made up of multiple parameters) and even allowing random mutations in their parameter values. Multiple rounds of cross-breeding were conducted, in which the best resulting designs were set aside and used to set the parameters of the Hierarchical Hidden Markov Model (HHMM). This HHMM was trained with speech samples from people of different nationalities and races, each with their own accent, to learn “the likelihood that specific patterns of sound are found in each phoneme, how the phonemes influence one another, and the likely orders of phonemes.” In the end, the HHMM learned rules of its own, which were quite different from, subtler than, and above all much more useful than the hand-coded rules used previously. In short, Kurzweil and his team combined HHMMs, to simulate the cortical organization that accompanies human learning, with a genetic algorithm, to simulate the biological evolution that gave rise to a particular cortical design. Both of these are self-organizing procedures. This became the cornerstone of subsequent speech recognition work and research, and is being used in other areas of AI such as speech synthesis and natural-language understanding.
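The cross-breeding procedure can be sketched as a small genetic algorithm. This is only an illustration under stated assumptions: in the book the fitness of a candidate would be how well the resulting HHMM recognizes speech, whereas here a toy fitness function stands in, and the population size, mutation rate and function names are all hypothetical.

```python
import random


def evolve(fitness, n_params, population_size=30, generations=40,
           mutation_rate=0.1, seed=0):
    """Evolve real-valued parameter vectors; return the fittest one found."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_params)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Keep the better half as parents (the best designs are set aside).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: each parameter comes from one parent or the other.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: occasionally nudge a parameter by a small random amount.
            child = [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(params):
    """Toy stand-in for recognition accuracy: best when every parameter is 0.5."""
    return -sum((p - 0.5) ** 2 for p in params)


best = evolve(toy_fitness, n_params=3)
print(best, toy_fitness(best))
```

Because the parents survive into the next generation, the best design found never gets worse from one round to the next.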

The need for both self-organizing and pre-programmed systems

While self-organizing systems are generally more advanced than pre-programmed ones, Kurzweil says artificial intelligence machines incorporate both, especially because pre-programmed systems are much faster when handling familiar information and provide a good basis for the lower conceptual levels of the hierarchy. These two advantages enable the self-organizing system to learn much more quickly than it would on its own and to be ready for practical use much sooner. Combining both makes for the most effective AI machine. After the self-organizing system has fully learned, it’s expected that the pre-programmed system will be discontinued.

Watson: The Most Advanced Machine in AI

According to Kurzweil, Watson is an AI machine which uses an ‘expert manager’ called UIMA (Unstructured Information Management Architecture) to choose the correct sub-systems for use in different situations and then, with “intelligence”, combine the outcomes (answers) of these systems. This method allows a sub-system to contribute to a resolution even though it may not deliver an actual answer to a given problem. This multi-processing also helps to gauge and build Watson’s confidence in its answers, expressed as a probability percentage. These probability percentages were on display at the Jeopardy! matches. Kurzweil says the human brain also uses this method when statistical inference is used to resolve multiple hypotheses.
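The idea of combining several sub-systems’ outputs into one answer with a confidence figure can be sketched as follows. This is not the UIMA API or Watson’s actual scoring method: the `combine` function, the averaging rule, the sub-system names and the `buzz_threshold` are all hypothetical, and each sub-system is assumed to list a given answer at most once.

```python
from collections import defaultdict


def combine(candidates, buzz_threshold=0.5):
    """Merge (answer, confidence) pairs from several sub-systems.

    Confidences for the same answer are averaged over all sub-systems, so an
    answer backed by many sub-systems outranks one backed by a single source.
    Returns (best_answer, confidence, should_answer).
    """
    n_systems = len(candidates)
    totals = defaultdict(float)
    for system_output in candidates.values():
        for answer, confidence in system_output:
            totals[answer] += confidence
    scores = {answer: total / n_systems for answer, total in totals.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] >= buzz_threshold


# Hypothetical sub-systems, each offering partial evidence.
candidates = {
    "keyword_search": [("Toronto", 0.4), ("Chicago", 0.7)],
    "date_reasoner": [("Chicago", 0.8)],
    "geo_lookup": [("Chicago", 0.6), ("Toronto", 0.2)],
}

answer, confidence, should_answer = combine(candidates)
print(answer, round(confidence, 2), should_answer)   # Chicago 0.7 True
```

No single sub-system settles the question, yet together they produce one answer and a confidence level that can be compared against a threshold before answering.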

According to the author, Watson was designed around the complexities and richness of the Neocortex, although admittedly it’s still some way from passing as an actual human. For example, it could not ace the famed Turing test, because it was never designed to pass it or to engage in intelligent conversation; rather, it was designed to compete at Jeopardy! and answer brief, not-so-complex questions. Kurzweil, though, believes that with a little tweaking Watson will perform those tasks, considering that many AI advances occurred before the complexities of the Neocortex were well researched.

Simulating the Human Brain

Multiple attempts, with varying degrees of success, have been made to accurately simulate the human brain, ably assisted by technologies including the scanning technology used to uncover the grid-like patterns of the Neocortex’s connections. There are a number of such technologies, including the latest MRI techniques, which are noninvasive scanning technologies.

Human Connectome

The National Institutes of Health, through its Human Connectome Project, has chosen to use this technology and expects to build a complete 3-D map of the human brain, with all its connections, by 2014.

The Blue Brain Project

The Blue Brain Project, on the other hand, aims to model and “simulate the human brain, including the entire Neocortex as well as the old-brain regions such as the cerebellum, amygdala, and hippocampus,” by recording the measurements of ion channels, neurotransmitters, and enzymes that generate and regulate every neuron’s electrochemical activity. The team will be using a patch-clamp robot, another scanning technology, in an automated system able to scan neural tissue with one-micrometer accuracy while avoiding the destruction of delicate membranes. In 2005, participants simulated one neuron, and in 2011 a neural mesocircuit of 100 neocortical columns. They target 10,000 neurons and a rat brain by 2014. Their current goal for a fully simulated human brain is 2023.

Educating the simulated brain

According to Kurzweil, the simulated brain cannot achieve human-level thinking unless it has the necessary content, and he describes multiple potential methods to fulfill this requirement. The most likely, he surmises, is one that simplifies molecular models by creating functional equivalents at different levels of detail, ranging from his own functional algorithmic method to simulations that are closer to full molecular simulations. His book goes into greater detail, but he guesstimates that this could speed up the learning process 1,000-fold or more.

Technological acceleration

Kurzweil explains that his Law of Accelerating Returns (LOAR) is doubted by many because they don’t understand the difference between linear and exponential progressions: if forty linear steps equal 40 years, the same 40 steps on an exponential scale would equal a whopping trillion years. Based on the historical evidence of exponential advancement, he predicts that more complex advances are coming, merging biological and technical evolution. He confidently speculates on the possibility of a machine having human consciousness, identity and free will, claiming that any sufficiently complex physical system will inevitably develop them. He cites man’s best friend, the canine, as an example of a non-human consciousness.
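The arithmetic behind the linear-versus-exponential comparison is easy to check. The sketch below assumes that “exponential” means doubling at every step, which is the usual reading of Kurzweil’s example.

```python
steps = 40

linear = steps             # 1 + 1 + ... + 1: forty steps reach 40
exponential = 2 ** steps   # 1 -> 2 -> 4 -> ...: forty doublings

print(linear)        # 40
print(exponential)   # 1099511627776, a little over one trillion
```

Forty doublings give roughly 1.1 trillion, which is where the “trillion” in the comparison comes from.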

Consciousness, Free Will and Identity?

Concerning free will, he also argues that there’s a likelihood that we humans actually don’t have it but just feel that we do, or alternatively that, like consciousness, it is an emergent property that evolves at high levels of complexity. If either is true, then it’s likely that a machine with human-level thinking would have the same, or feel (have the perception) that it does. Kurzweil holds that identity is born of our sense of free will and experience. He extrapolates that a self-aware machine would naturally possess the same belief.

Beyond Human Intelligence

Kurzweil is also a proponent of the more advanced applications of AI. Synthetically producing a Neocortex and replacing our own biological one would enable the functioning of more than 300 million processors, perhaps even a billion. He also notes that digital neurons can be made to link up wirelessly, a big advantage over human ones, which are linked physically.

He also considers the possibility of adding bug-cleaning features to our brains, to remove or reduce instances of multiple thinking and of inconsistent, colliding ideas. A module for detailed thinking could be designed to continually run background scans for inconsistencies among all existing ideas or patterns and update their compatibility with each other. Inconsistent ideas would then be reviewed or eliminated. With this and other such implants, we would alleviate the risk of AI machines ever outstripping us in intelligence. We could then take advantage of the singularity by incorporating the exponential advances into our own biology. By doing so we could dispel some fears of losing our identity, since the change would disturb the continuity of our body cells no more than nature already does in replenishing them.


It’s only fair to say we are in a race with technology which is ever advancing.  His far future vision is the spread of our non-biological intelligence to the four corners of the universe, infusing our deliberate will directly upon its fate.  If we are able to break the speed of light barrier we could have a universal omnipresence within a few centuries. It is our destiny.

Certainly on that last conclusion this reviewer and this site agree.  Science fiction writers and far futurists have been coming to that conclusion for years as well. See our own 2003 essay on the distant future. It is in fact the only logical conclusion to an assumed eternal existence in the known universe (although we disagree with the assumed ubiquitous non-biological entity).

In any case, let us all hope the boundaries of reality continue to expand the unknown at least as fast as our ability to consume and understand it, lest we be caught in the forever loop of The End is Just the Beginning.

Further Reading

Leaders in Artificial Intelligence – Google

A. Presentation of Google

Google Inc. is an American corporation [1] founded in 1998 by Larry Page and Sergey Brin. It is headquartered in Mountain View, CA, with more than 70 offices in the USA and 40 other countries around the world (e.g., Australia, Brazil, Canada, China, France, Germany, India, Ireland, Israel, Japan, Kenya, Russia and the United Kingdom).


Fig. 1: Google headquarters in Mountain View, CA.
Fig. 2: Research at Google (Video)

Google’s main mission is to collect data from companies and private computer servers, organize it and make it accessible to everyone through what is recognized as the world’s largest search engine. This mission requires a large amount of resources and sustained research (Fig. 2), development and innovation in computer science, artificial intelligence and other scientific fields. In its approach to R&D, known as the “Hybrid Research Model”, the company blurs the line between research and development activities and maintains the right balance at all levels. That is to say, research teams stay involved in engineering activities just as their engineering counterparts bring a research dimension to their own work. Google has a strong commitment to academic research, which it supports through grants, scholarships, faculty research awards, faculty training, curriculum development and outreach programs.

B. Research and Development at Google

R&D and innovation at Google span several areas of computer science and are driven by real-world data and experience. Their goal is to create practical applications and bring a significant improvement in quality of service to Google’s millions of customers. In particular, Google’s contributions to the advancement of Artificial Intelligence are best known through advances in speech recognition, language translation, machine learning, market algorithms and computer vision. Of the more than $3 billion invested in R&D, a large share is allocated to AI. The best way to describe ongoing research at Google is through its most popular publications, applications and innovations and the people who are leading it. The following table gives a simple synopsis of research at Google in Artificial Intelligence theory and applications.

AI Field                          Research activity

Machine Learning                  Machine Perception
Machine Learning                  Machine Translation
Data Mining                       Multimedia data processing
Data Mining                       AI-enabled visual search
Natural Language Understanding    Sentence parts prediction
Natural Language Processing       Speech Recognition and Processing
Natural Language Processing       Google Now voice recognition on Android [6]
Computer Vision                   Media annotation

Table 1: Research activities in AI at Google

B-1.  AI applications and innovations at Google

By applying Machine Learning techniques to speech understanding, machine translation, and visual processing, Google researchers gather large volumes of evidence about shifting relationships among evolving interests. They then apply multiple learning algorithms to generalize from that evidence to new interests.

As its mission states, Google’s intention is to organize all types of media (image, video, sound) and make it accessible to everyone. To this end, it exposes computers to different kinds of media and makes them perceive and build explanations from these perceptions. This process is called Machine Perception and is at the core of Google’s data-driven solutions to problem solving.

Using computer vision technology, Google is very active in annotating media, measuring semantic similarity, synthesizing complex objects and browsing large collections of multimedia objects. Google also uses data mining techniques to process the multimedia contained in YouTube videos, Android, Google image search, StreetView and Google Earth. It translates raw text and audio within e-mail messages, books and Android using selected statistical translation techniques that improve over time and are independent of the natural language of the content.

Research in Natural Language Processing (NLP) at Google goes beyond the traditional boundaries of language-dependent, limited-domain, syntactic/semantic analysis to reach out to the vast amount of data on the Web in multiple human languages. On the syntactic as well as the semantic level, researchers at Google develop algorithms to predict the position that words should be assigned in a sentence and the relationships that bind them. In addition, NLP research is oriented towards multilingual linear-time parsing algorithms that are able to handle large shifts in vocabulary.


Fig. 3: Google Instant – Predicting part-of-speech tags with NLP techniques in Google search

In speech technology, Google is involved on two fronts: 1) making natural language a normal communication medium between man and machine (computers, phones); 2) making any multimedia object (text, video, sound) searchable and accessible on the Web.

B-3.  Leading figures in AI research at Google

Ray Kurzweil [3] is an author, a famous inventor and a futurist who joined Google in December 2012 as a Director of Engineering. He has published several books on health, AI, transhumanism, the technological singularity, and futurism (e.g., The Age of Spiritual Machines, The Singularity Is Near). His work at Google focuses on “new technology development” as well as machine learning and language processing. Kurzweil’s ambition is to analyze the enormous amount of information collected by Google’s tools and offer it in the form of an intelligent private assistant. He predicts that this assistant would listen to your phone conversations and read your e-mail in the background, and later anticipate your needs and serve them to you before you even ask. Another of Kurzweil’s goals is to design at Google technology that really understands the meaning of any human language.

Peter Norvig [4] started at Google in 2001 as Director of Search Quality, responsible for the core web search algorithms until 2005. Then, as Director of Research, he oversaw the machine translation team and organized the efforts of the speech understanding groups. In particular, one of his interests is a system that can help humans find answers to questions that aren’t clearly defined. He is a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. Previously, he was the head of the Computational Sciences Division at NASA Ames Research Center, where he received the NASA Exceptional Achievement Award in 2001. He has over fifty publications, mainly in Artificial Intelligence.


Sebastian Thrun [5] is a research professor at Stanford University, co-founder of Udacity and a fellow at Google. He initiated the secretive Google X lab [7], which harbours dozens of projects such as the self-driving car [9] [10], speech recognition, object extraction from video, and Google Glass [8], an augmented reality head-mounted device.




Fernando Pereira [11] is Research Director at Google. His main research interests are in machine-learnable models of language and biological sequences. He is a Fellow of the American Association for Artificial Intelligence, holds several patents in AI and has made numerous contributions to computational linguistics and logic programming. Pereira has over 100 research publications on computational linguistics, machine learning, bioinformatics, speech recognition, and logic programming.

C. Selected Google contacts

Google Inc. (headquarters)
1600 Amphitheatre Parkway
Mountain View, CA 94043
Phone: +1 650-253-0000

Google Shanghai
60F, Shanghai World F. C.
100 Century Avenue,
Shanghai 200120, China
Phone: +86-21-6133-7666

Google Paris
38 avenue de l’Opéra
75002 Paris
France
Phone: +33 (0)1 42 68 53 00

Google Moscow
7 Balchug st.
Moscow 115035
Russian Federation
Phone: +7 495-644-1400

Table 2: Google research – Some contacts around the world

D. Further readings

[1] Google. About Google Inc.  URL = . Retrieved April 16, 2013.

[2] Google. Research at Google.  URL = http . Retrieved April 16, 2013.

[3] Inventor Profile: Ray Kurzweil. Invent Now, Inc. URL = . Retrieved April 16, 2013.

[4]   URL = . Retrieved April 16, 2013.

[5] Sebastian Thrun.  Home page at Stanford. URL = . Retrieved April 16, 2013.

[6]  Google. Google Now presentation  URL =

[7] Gaudin, Sharon (2011). “Top-secret Google X lab rethinks the future”, Computerworld. Retrieved April 16, 2013.

[8] Albanesius, Chloe (4, 2012). “Google ‘Project Glass’ Replaces the Smartphone With Glasses”, PC Magazine.

[9] Markoff, John (2010). “Google Cars Drive Themselves, in Traffic”, The New York Times. Retrieved April 16, 2013.

[10] Slosson, Mary (2012). “Google gets first self-driven car license in Nevada”, Reuters. Retrieved April 16, 2013.

[11] Fernando Pereira.   URL = . Retrieved April 16, 2013.

Artificial Intelligence: Computer Vision

A. Nature of Computer Vision

A. 1.  What is computer vision?

Computer Vision (CV) is the science of teaching a computer how to identify a physical object in its surroundings (Fig. 1). Its task is to capture an image, understand it, reconstruct it internally and create a meaningful and concise description. As a scientific and engineering field, Computer Vision [1] [2] strives to apply its theories, models and techniques to the construction of practical systems. Its ultimate aim is to imitate and improve on human visual perception. To this end, it draws from several fields such as image processing (imaging), AI (pattern recognition), mathematics (statistics, optimization, geometry), solid state physics (image sensors, optics), neurobiology (biological vision) and signal processing. This article is about Computer Vision [3] and not Machine Vision (MV), which has a significantly different goal.


Fig. 1: MERTZ, an active vision head of a humanoid robot for learning in a social context at MIT.

Although MV and CV share some vocabulary, concepts and techniques, they have fundamentally different approaches and priorities (see table 1). On the one hand, Computer Vision needs to capture 2D images of objects in a scene and apply elaborate algorithms to recreate an approximate 3D image of that scene. On the other hand, Machine Vision is interested only in 2D images of objects whose salient features are extracted for discrimination purposes (identifying, recognizing, grading, sorting, counting). MV methods use hard-coded (embedded) software containing information about the scene.


                             Computer Vision              Machine Vision

Hardware/software            Computers/software           Dedicated industrial hardware
Problem solving methods      Algorithms                   In situ programming
Input data                   File                         Mechanical part
Output data                  Signal for human being       Signal to control equipment
User interface               Simple graphical interface   Elaborate interface is critical
Knowledge of human vision    Strong influence             Fair influence
Quality criteria             Computational performance    Easy, cost effective, reliable
Financial support            Secondary                    Critical

Table 1: Comparing Computer Vision and Machine Vision.

A. 2.  A short history of computer vision

Larry Roberts’s 1963 Ph.D. dissertation at MIT, “Machine Perception of Three-Dimensional Solids”, was a landmark contribution in that it laid the foundations of the field of computer vision. In his thesis, Roberts proposed the idea of extracting 3D geometrical information from related 2D views of blocks (polyhedra).

As it evolved [6], research in CV needed to tackle real-world problems in which edge detection and segmentation are focal points. David Marr proposed his bottom-up approach to scene understanding at MIT in 1972. It was a major milestone and the most influential contribution in CV.

Here is a synopsis of fifty years of computer vision 1963-2013:

  • 1960s: Image processing and pattern recognition appear in AI.
  • 1970s: Horn, Koenderink and Longuet-Higgins make milestone contributions to image processing.
  • 1980s: Mathematics, probability and control theory are applied to Vision.
  • 1990s: Vision is augmented by computer graphics and statistical learning.
  • 2000s: Advances in visual recognition and major practical applications have significant impact on Vision.

B. R&D challenges in Computer Vision

Today a number of factors prevent CV [7] from reaching its full potential. First, its interdisciplinary nature (AI, computer science, mathematics, physics, biology) and unexpected growth have made it subject to dispersion and instability. Second, CV, often confused with machine vision, lacks name recognition and an image as a field in its own right. As a result, many research initiatives feel underestimated. Last, there seems to be a disconnect between academic research and industry development. With so little ground for cooperation, their needs, achievements and perspectives are not mutually understood.

Despite steady advances in Computer Vision, R&D results have yet to match the visual capabilities of a young child. Still, with significant progress under way in Europe, Asia and America [9], there is ground for optimism about the future. The 2013 Robotics Roadmap report [10] to the U.S. Congress identified Robotics and Computer Vision as important future drivers of industry.

C.      Applications of Computer Vision

There are several applications of Computer Vision that we enjoy in our everyday lives. Some of these are movies, surveillance, face recognition & biometrics, road monitoring, autonomous driving, space exploration, remote sensing, agriculture and transportation. The following areas [5] are also well known active applications of CV.

1. Medical computer vision or medical image processing

Vision is mainly used in medicine to help with pathology, surgery and diagnosis.

2. Research:

There are several active CV research initiatives in the context of robotics: unmanned aerial vehicles (UAVs) and other drones, autonomous vehicles (ex. Mars exploration) and submersibles. One of the main applications of vision in mobile robotics is the challenging task of vehicle localization.

Computer Vision can help robots with:

  • Localization
  • Obstacle avoidance
  • Mapping (determining navigable terrain)
  • Object recognition (people and objects)
  • Learning to interact with objects

3.       Machine Vision:

Although Machine Vision (MV), an engineering discipline, and Computer Vision (CV), a scientific discipline, sometimes overlap, they have different methods and goals. Machine Vision uses automated image analysis for inspection and robot guidance in industry. CV techniques are borrowed and implemented in MV.

4.       Process control  (ex. industrial robot activity):

In manufacturing, robots equipped with vision systems are commonly used to control the quality of manufactured goods. In agriculture, for example, CV techniques are being used to classify rice according to grain size during production.

5. Events detection [8]  (ex. human/crowd surveillance or wildfire detection):

Crowd behavior is known to be complex and abstract, and crowd scenes often involve occlusions, changes in illumination and abnormal patterns. To help analyze these anomalies, many researchers use computer vision techniques in video surveillance. CV is also used in forest fire detection to improve on the human-controlled fire detection rate; it helps render a 3D model from flat images of the fire captured as it evolves over time.

6.       Topographical modeling (ex. landscape image reconstruction):

The recognition, classification and analysis of landscape elements such as buildings, roads, rivers, plantations, and railways require special skills that Computer Vision can provide. It uses shape recognition techniques to classify features and build reliable topographic data.

7. Automatic inspection in manufacturing applications:

Computer vision techniques (procedures and algorithms) have been implemented in manufacturing environments for the automatic optical inspection of complex thin-film metal patterns. These techniques can, for example, detect critical electrical defects.

8. Military applications: 
CV is heavily used in combat zones for monitoring and identifying enemy activities (see Fig. 2).

Fig. 2: SRI’s TerraSight® video processing and exploitation suite.

D.      References

[1] Linda Shapiro and G. Stockman (2001). Computer Vision. Prentice Hall. ISBN 0-13-030796-3.

[2] Tim Morris (2004). Computer Vision and Image Processing. Palgrave Macmillan. ISBN 0-333-99451-5.

[3] David A. Forsyth and Jean Ponce (2003). Computer Vision, A Modern Approach. Prentice Hall. ISBN 0-13-085198-1.

[4] Turek, Fred (June 2011). Machine Vision Fundamentals, How to Make Robots See. NASA Tech Briefs magazine 35 (6),  p. 60–62.

[5] Gérard Medioni and Sing Bing Kang (2004). Emerging Topics in Computer Vision. Prentice Hall. ISBN 0-13-101366-1.

[6] R. Fisher et al. (2005). Dictionary of Computer Vision and Image Processing. John Wiley. ISBN 0-470-01526-8.

[7] Azad, Pedram, T. Gockel, R. Dillmann (2008). Computer Vision – Principles and Practice. Elektor International Media BV. ISBN 0-905705-71-8.

[8] Çelik, Turgay (June 2008). Computer vision based fire detection in color images. Proceedings of the IEEE Conference on Soft Computing in Industrial Applications, 2008 (SMCia ’08), p. 258 – 263.

[9] David Lowe (current) The computer vision industry, URL =

[10] GaTech, CMU, Robotics Tech. Consortium (March 2013). A Roadmap for U.S. Robotics, From Internet to Robotics.

Introduction to Robotics

A. 1.  What is Robotics?

Robotics is the study of robots, which are automated machines designed to carry out dangerous or strenuous work for humans. Robotics began as a subfield of Artificial Intelligence [1] [2] and then split off to form a branch of engineering concerned with the construction, operation and use of skilful robots. Research and development in Robotics falls into several categories: industrial robots, personal and service robots, humanoids, networked robots, Robotics for biological and medical applications, and space Robotics. Examples of Robotics success stories include the Mars Exploration Rover from NASA, the underwater robot Caribou from Wayne State University, and the entertainment and home robots AIBO from Sony and ASIMO from Honda.

Most robots have three main parts: a controller (the robot’s brain), mechanical components that produce autonomous motion, and sensors that receive input from the surroundings and help the robot adapt.


 Fig. 1: Example success stories in Robotics (historical):
Sony’s AIBO in May 1999,  Honda’s ASIMO and NASA’s Mars Exploration Rover.

2.  A short history of Robotics

The word “robot” was first introduced around 1920 by Czech playwright and novelist Karel Čapek in his play “Rossum’s Universal Robots”. It comes from the Czech word “robota”, which means “servitude” or “forced labor” and has Old Church Slavonic roots. In the early 1940s, Isaac Asimov coined the word “robotics” in his science fiction short stories, and in 1942 he formulated his “Three Laws of Robotics”. Over the years, many advances by scientists and industry leaders helped the field of Robotics achieve great success and popularity [1] [2].

  • In 1898, Nikola Tesla demonstrated his first radio-controlled vessel.
  • In 1939 and 1940, the World’s Fairs showcased the first humanoid robot.
  • In 1956, Unimation presented the first commercial robot.
  • In 1961, the first industrial robot was operating.
  • In 1972, the first computer-controlled robot, the IRB6 was sold in Sweden.
  • In 1975, Unimation produced the universal manipulation arm.

B.      Methods and trends in Robotics

Modern Robotics research relies heavily on computer science and AI techniques. Therefore, many of the known issues in these technologies have transferred to interface programming of robots.  The following three styles of user interface have emerged in Robotics research [4] over the years.

Learning by example:

In the beginning, robots learned their duties by following a predefined sequence of tasks taught by a supervisor. The robot would record each step of the task precisely in an internal memory and then “play back” the same task on its own. This approach was particularly suitable for manufacturing jobs like welding and painting.
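The record-and-playback idea can be sketched in a few lines. The class `TeachAndPlaybackRobot`, its `teach` and `play_back` methods and the pose format are hypothetical names chosen for illustration, not the interface of any real robot controller.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Pose = Tuple[float, float, float]  # hypothetical (x, y, z) tool position


@dataclass
class TeachAndPlaybackRobot:
    """Toy robot that records taught poses and replays them in order."""
    position: Pose = (0.0, 0.0, 0.0)
    program: List[Pose] = field(default_factory=list)

    def teach(self, pose: Pose) -> None:
        # The supervisor guides the arm to a pose; the robot stores it.
        self.position = pose
        self.program.append(pose)

    def play_back(self) -> List[Pose]:
        # Replay every stored pose in the order it was taught.
        visited = []
        for pose in self.program:
            self.position = pose
            visited.append(self.position)
        return visited


robot = TeachAndPlaybackRobot()
for taught_pose in [(0.0, 0.0, 0.5), (0.2, 0.0, 0.5), (0.2, 0.3, 0.1)]:
    robot.teach(taught_pose)

replayed = robot.play_back()
print(replayed)
```

The robot has no notion of why the poses matter; it only repeats them, which is why this style suits fixed, repetitive jobs.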

Robot interface programming:

The proliferation of computers and high-level programming languages has opened new doors for dealing with robots, their components, interfaces and control. Nowadays, there are several robot programming languages (RPL) which help design interfaces to manipulators (mechanical parts) and effectors (end of parts) and deal with control problems. A robot programming language acts as an interface between a human and an industrial robot. These languages are generally divided into the following groups: [7]

  • Dedicated programming languages,
  • Robotics-friendly libraries for existing programming languages (ex. a C library),
  • Robotics-specific libraries for an existing language,
  • Brand new language with Robotics-specific libraries.

Three Robotics-specific programming languages are of particular interest. The first is VAL (Variable Assembly Language), a manipulator control language developed by Unimation to control its industrial robots. The second, AL, is based on force control and parallelism; it was developed at the Artificial Intelligence Laboratory at Stanford University. The third is RAIL, a high-level language based on Pascal. It is one of the best languages for controlling manipulation and vision systems.

Task-level programming languages:

With such a language, a user can directly specify, at a high level, all intermediate subgoals of the main task. This helps in planning multiple tasks without going into the intricate details of how to perform them. For example, when a robot is asked to “move a tire”, the system has to plan a path for the manipulator to achieve this goal (find a point of contact, grasp the tire, and move it) while avoiding collisions with other objects along its path or in its surroundings. Task-level programming of manipulators is still an active area of research.
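As a rough illustration of the planning step, the sketch below finds a collision-free path on a small occupancy grid using breadth-first search. The grid encoding, the function `plan_path` and the choice of breadth-first search are assumptions made for this example; real task-level planners work in continuous space and handle grasping as well.

```python
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def plan_path(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search on a grid; '#' marks an obstacle cell."""
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no collision-free path exists


workspace = [
    "....",
    ".##.",
    "....",
]
path = plan_path(workspace, start=(0, 0), goal=(2, 3))
print(path)
```

The user only states the start and the goal; the intermediate cells, and the detour around the obstacles, are worked out by the planner.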


Fig. 2: Robots in action in a vehicle manufacturing plant (photo by

C.      Research and challenges in Robotics

Like all high-calibre research initiatives, Robotics [4]  [5] has its own set of fundamental challenges and unsolved problems. Some of these general challenges identified on the international scene (see fig. 3) [8] and in America [10], may be categorized as follows:

Physical interaction with the real world: Robots need good hardware so that their arms and hands are capable of a multitude of actions besides picking and placing.

Perception outside structured 2D settings: The ability of current robots to perceive and act on 3D objects is limited and primitive.

Safety for humans: Personal robots must be safe for humans to be around. These safety concerns about human-robot interaction bring with them a number of technical challenges drawn from studies in human-computer interaction.

Grid of robots, sensors, and users: In current real-world applications, a robot carries out predefined tasks together with a human or a network of sensors in a structured setting. With the proliferation of networks and embedded structures in our environment, robots will have to learn how to deal with other participants to achieve a goal inside a network.

The following facts are from the IEEE/RSJ International Conference on Intelligent Robots and Systems, October 7-12, 2012



We can also point out challenges related to particular aspects of Robotics R&D:

Knowledge representation: Humanoid robots can help us understand how robots should represent knowledge about entities in their surroundings.

Vision and tactile coordination: The possibility of combining vision and tactile perception to get a better handle on objects is attractive in industrial environments.

Acceptance of humanoids: How should industry make humans accept humanoid robots as team mates without any particular negative impact on production?

Mobility in space robotics: This area of research is still dominated by the basic questions of robot location, robot goal, obstacles to overcome, and motion from an initial point to a desired point.

Time delay in space robotics: This is a serious challenge that affects not only space Robotics but also robots working in critical environments like the nuclear industry.

D.      References

[1] Nocks, Lisa (2007). The robot: the life story of a technology. Westport, CT: Greenwood Publishing Group.

[2] International Federation of Robotics (2012). History of Industrial Robots.
URL = .

[3] Robotics Industry News (2013). Applications and Trends.

URL = .

[4] The Robotics Institute at Carnegie Mellon University.
URL =  .

[5] Trossen Robotics Community (2013). How to build a Robot:
URL = .

[6] Society of Robotics Surgery. URL = .

[7] Robotics Business Review.

[8] World Technology Evaluation Center (2006). International Assessment of Research and Development in Robotics.

[10] GaTech, CMU, Robotics Tech. Consortium (March 2013). A Roadmap for U.S. Robotics, From Internet to Robotics.

Machine Learning

A.      Foundations of Machine Learning

A. 1.  What is machine learning?

The field of Machine Learning (ML) in Artificial Intelligence focuses on research into the logical foundations and the design of practical systems which learn from experience, adapt to new situations and improve their behavior over time. ML [1] finds inspiration in biological learning entities and also draws on many other disciplines like probability theory, computational logic, optimization, Web search, statistics, and control theory. Its best-known technique is classification: a learner accepts a training set of examples as input and produces a classifier, which in turn accepts a vector of feature values and outputs a single discrete value, the class.

ML algorithms generally combine three components: representation, evaluation and optimization:

  • Representation: a learning process is always preceded by the choice of a formal representation of the classifier. The set of classifiers that a learner can represent is called its hypothesis space.
  • Evaluation: the learning algorithm uses an objective function, which assigns scores, to distinguish good classifiers from bad ones.
  • Optimization: a method for searching among the classifiers for the highest-scoring one.

Fig. 1: Typical machine learning tag cloud
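The three components (representation, evaluation and optimization) can be illustrated with a deliberately tiny learner. In this sketch, which assumes a made-up one-feature training set, the representation is a family of threshold classifiers, the objective function is training accuracy, and the optimization is an exhaustive search; all names are illustrative.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature value, class label 0 or 1)

# Hypothetical training set: one numeric feature, binary label.
training_set: List[Example] = [
    (1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1),
]


# Representation: the hypothesis space is a set of threshold classifiers.
def make_classifier(threshold: float) -> Callable[[float], int]:
    return lambda x: 1 if x > threshold else 0


hypothesis_space = [float(t) for t in range(0, 10)]


# Evaluation: the objective function scores a classifier by training accuracy.
def accuracy(threshold: float, examples: List[Example]) -> float:
    classify = make_classifier(threshold)
    correct = sum(1 for x, label in examples if classify(x) == label)
    return correct / len(examples)


# Optimization: exhaustive search for the highest-scoring classifier.
best_threshold = max(hypothesis_space, key=lambda t: accuracy(t, training_set))
best_classifier = make_classifier(best_threshold)

print(best_threshold, accuracy(best_threshold, training_set))
print(best_classifier(2.5), best_classifier(9.0))
```

Real learners use far richer hypothesis spaces and cannot enumerate them, which is why the choice of optimization method matters so much in practice.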

A. 2.  Motivation for machine learning

Ray Kurzweil, an American author, inventor and futurist, suggested that it is the methods of intelligence not yet understood that form the mystery that attracts us. The motivation behind current research in ML fits this image perfectly. It is about transferring control from man to machine, empowering the machine so that it can program itself through selected examples and experience. Machine learning research aims to instruct machines in a way that lessens the burden of hand-programming complex information into future computers. ML methods excel in application domains that are too fuzzy (ex. perception, computer vision) for humans to design an appropriate algorithm manually.

A. 3.  Example machine learning problems

Here is a set of common classification problems where the goal is to assign objects to categories according to particular properties.

  • optical character recognition (OCR): a type of pattern recognition [3] that identifies and classifies handwritten characters (ex. what is the phone number in this scanned image?).
  • face detection: identifies faces in an image based on given features.
  • spam filtering: distinguishes legitimate e-mail from spam.
  • news labeling: sorts news items according to their subject (ex. politics, religion, leisure).
  • natural language understanding: determines and classifies the words spoken by a human in a discourse.
  • predictions: given a set of variables (ex. clinical, demographic), predicts the likelihood of an event (ex. heart attack, stock price fluctuation, prostate cancer recurrence).
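As one possible illustration of the spam filtering problem, here is a minimal naive Bayes classifier with add-one smoothing. The four training messages and the function names `train` and `classify` are invented for this sketch; a practical filter would need far more data and better tokenization.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical toy training data: (message, label).
training_messages: List[Tuple[str, str]] = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]


def train(messages):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_messages)
print(classify("cheap money", model))
print(classify("meeting tomorrow", model))
```

The same structure applies to the news labeling problem above: only the labels and the training messages change.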

B.      Background of Machine Learning

Machine Learning techniques stem originally from Computer Science and Statistics [2]. The history of ML followed the decline of neural network systems and the emergence of knowledge-intensive systems. In the beginning, a good understanding of the learning process of biological entities was important for the purpose of reproducing aspects of it in computer systems. In its early days, ML also found inspiration in research into computer-assisted tutoring systems, with which it shared many objectives and perspectives. Over the years, ML has in return inspired developments in that field, leading to Intelligent Tutoring Systems based on AI techniques.

C.      Methods of Machine Learning

There are three main families of learning algorithms in ML: supervised learning, unsupervised learning and reinforcement learning. The table below lists methods associated with each family.


Supervised Learning | Unsupervised Learning | Reinforcement Learning
Artificial neural network | Artificial neural network | -
Bayesian statistics | Association rule learning | Q-learning
Case-based reasoning | Hierarchical clustering | Learning automata
Decision trees | Partitional clustering | -
Learning automata | - | -
Instance-based learning | - | -
Regression analysis | - | -
Linear classifiers | - | -
Bayesian networks | - | -
Hidden Markov models | - | -

Fig. 2  Elements of main machine learning algorithms

The most common learning paradigms are induction, clustering, analogy, discovery, genetic algorithms and reinforcement. The success of a learning algorithm is evaluated by its predictive accuracy, the speed of the learner, the speed of the classifier and its space requirements.
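To give a flavour of reinforcement learning, the sketch below applies the Q-learning update rule to a hypothetical five-state corridor in which the agent is rewarded for reaching the rightmost state. The environment, the constants and the exhaustive sweep over state-action pairs (used here instead of random exploration so that the result is reproducible) are assumptions of this example.

```python
from typing import Tuple

N_STATES = 5              # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)        # move left or move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (arbitrary)


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Deterministic toy environment: reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


# Q-table: one value per (state, action) pair, initialised to zero.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

# Sweep over every non-terminal state-action pair instead of exploring
# with an epsilon-greedy policy, so the run is fully deterministic.
for _ in range(200):
    for s in range(N_STATES - 1):
        for a in ACTIONS:
            s2, reward, done = step(s, a)
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            # Q-learning update rule.
            q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])

# Greedy policy: the best action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Unlike the supervised setting, no example ever tells the agent which action is correct; the preference for moving right emerges from the reward alone.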

D.      Applications of Machine Learning

Machine learning has become one of the most active and rewarding areas of research [8] due to its widespread use in situations as diverse as natural language processing, speech recognition, spam detection, Web search, computer vision, medical diagnosis [4], finance (credit scoring, fraud detection, stock trading) and robotics [6] [7]. Many data-intensive scientific and industrial organizations (ex. British Petroleum, Cessna) use machine learning techniques in scientific discovery.

In some remarkable case studies (ex. breast cancer diagnosis), ML applications have made more accurate predictions (72%) than human beings (65%). Here are details about some of these applications:

Speech recognition. This technology, pioneered by IBM, made its debut in Text-To-Speech (TTS) and Speech-To-Text software used in transcription. To succeed, such a tool must be trained from the start on the age group and accent of the human subject, who reads some text aloud. It needs data (speech patterns) fed by the subject and some Bayesian inference (probability) to improve its accuracy over time. In a related effort on language, the NELL (Never Ending Language Learning) project led by Tom Mitchell (Carnegie Mellon U.) is learning how to read information from the Web.

Computer vision. Many face recognition systems using vision are developed with machine learning technology. With stunning accuracy, the US Post Office uses ML technology to automatically sort around 80% of envelopes with handwritten addresses.

Bio-surveillance. The Centers for Disease Control and Prevention (CDC) in Atlanta, GA, uses ML technology to detect and track disease outbreaks around the US. In addition, the Real-time Outbreak and Disease Surveillance (RODS) system, a public health surveillance software package, collects and analyzes disease data. It uses machine learning software to classify admissions by category of symptoms and geographical distribution.

Robot control. Supervised learning techniques are standard practice in the field of robotics. For example, they are used for detecting and repelling dust and snow, identifying vegetation and locating obstacles. In self-supervised learning, a robot generates its own training samples, effectively teaching itself to increase performance.

E.      Perspectives in Machine Learning

As industry leaders like Google (X labs), Microsoft and Yahoo [5] are actively involved in research and are making significant investments in Machine Learning, there is a strong feeling that its future is brighter than ever.

Nowadays, the most promising and exciting area of ML research is Deep Learning (DL), which deals with learning higher-level concepts at several levels of representation. DL is helping researchers make new discoveries in speech recognition and computer vision. Apple’s Siri virtual personal assistant and Google Street View are interesting illustrations of this new technique. DL is well on its way to outperforming humans in areas like pattern recognition.

F.      References

[1] Kodratoff, Yves et al. (1990). Machine Learning: An Artificial Intelligence Approach (Vol. 3).

[2] Hastie, Trevor; Tibshirani, R.; Friedman, J. (February 2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
URL = .

[3] Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer. ISBN 0-387-31073-8.

[4] Wernick, Yang et al. (July 2010). Machine Learning in Medical Imaging. IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 25-38.

[5] Yahoo Research (Machine learning group). Current research on Machine Learning Applications at Yahoo. URL =

[6] Pieter Abbeel (October 2012). Machine learning for Robotics, a video lecture,

[7] Klingspor V.; Demiris, J. (1997). Human-Robot communication and Machine Learning. Applied Artificial Intelligence Journal, Vol. 11, pp. 719-746.

[8] Andrew Ng (2013). Machine Learning: Coursera open access online lecture.