When Machine Learning Isn’t Learning
By Bill Franks, Apr 10, 2014
Terms come in and out of vogue on a regular basis. In recent years, the use of the term Machine Learning has surged. What I struggle with is that many traditional data mining and statistical functions are being folded underneath the machine learning umbrella. There is no harm in this except that I don’t think that the general community understands that, in many cases, traditional algorithms are just getting a new label with a lot of hype and buzz appeal. Simply classifying algorithms in the machine learning category doesn’t mean that the algorithms have fundamentally changed in any way.
Many startup companies, particularly in the cloud, are touting machine learning capabilities. In some cases, the algorithms are hidden behind a user interface so that users may not know what is happening under the hood. Users may believe that a new capability or algorithm that is closer to artificial intelligence is being used. However, would those same users be excited if they knew that they are buying a very early and immature version of yet another tool to create a decision tree?
Perhaps I have an outdated view, but I have always thought of machine learning as being closer to artificial intelligence than to data mining. I want a machine learning algorithm to adjust itself dynamically and learn how to apply new rules. This is distinct from an iterative algorithm like a k-means cluster analysis. It can be argued that a clustering algorithm “learns” after each pass and adjusts dynamically. However, the rules are set in advance and don’t change. Once the data, the number of clusters, and the starting centroids are fixed and the first iteration of a k-means process has begun, the final answer is set in stone even if we don’t know the answer yet. Everything that happens after starting the first iteration can be manually duplicated if desired. A k-means algorithm uses fixed rules, and the algorithm never learns to do something differently.
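To make the "set in stone" point concrete, here is a minimal sketch of k-means on one-dimensional data. The function name `kmeans`, the sample points, and the starting centroids are all illustrative assumptions, not taken from any particular tool. The loop applies the same two fixed rules on every pass, so running it twice from the same starting centroids gives the same answer.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Rule 1 (assignment): each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Rule 2 (update): each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so further passes would change nothing
        centroids = new_centroids
    return centroids


# Illustrative data: two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
first_run = kmeans(points, [1.0, 2.0])
second_run = kmeans(points, [1.0, 2.0])
print(first_run, first_run == second_run)
```

The rules never change between passes. Only the centroid positions do, and those are fully determined by the inputs.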
Like k-means clustering, many algorithms being tagged with the machine learning label today are more iterative than adaptive. I first came across the difference between artificial intelligence and a complex set of rules in high school. For a science fair project, I programmed my computer to play the game Isolation. Isolation is played on an 8 x 6 grid. Players move their piece to an open space and then punch out any space on the board. The idea is to trap your opponent on an island with no moves left before you are trapped yourself.
As I played the game, I realized that a strategy of choosing a space with a lot of options on the next two or three moves, as well as the next move, would usually beat moving to the space where the most options existed for only the next move. My computer program took advantage of this. The program identified every possible space it could move to. Then, the program determined for each of the spaces how many moves were possible on the next move beyond the current move. I believe the program examined the options on the third move as well. Whichever available space had the largest number of options across the next several moves was the one the computer would pick.
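The heuristic described above can be sketched in a few lines. This is not the original science fair program. It is a minimal illustration that assumes a piece moves one square in any direction (the movement rule is not stated here), ignores the opponent's replies and the punch-out step, and uses hypothetical names such as `open_neighbors`, `mobility`, and `best_move`.

```python
ROWS, COLS = 6, 8  # an 8 x 6 grid; which side is 8 is an assumption


def open_neighbors(pos, blocked):
    """Open squares one step from pos (adjacent-square moves are assumed)."""
    r, c = pos
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in blocked:
                result.append((nr, nc))
    return result


def mobility(pos, blocked, depth):
    """Count the options from pos, summed over the next `depth` moves."""
    if depth == 0:
        return 0
    options = open_neighbors(pos, blocked)
    return len(options) + sum(mobility(n, blocked, depth - 1) for n in options)


def best_move(pos, blocked, depth=2):
    """Pick the reachable square with the most options over later moves."""
    candidates = open_neighbors(pos, blocked)
    if not candidates:
        return None  # trapped: no legal move
    return max(candidates, key=lambda sq: mobility(sq, blocked, depth))


# From a corner of an empty board, the diagonal square keeps the most options.
print(best_move((0, 0), blocked=set()))
```

Here `blocked` is a set of punched-out or occupied squares. Every decision comes from counting open squares under a fixed rule, and nothing in the code changes based on past games.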
When I took my program to the science fair, it beat most people. That wasn’t surprising to me: many people hadn’t played the game before, and a moderately skilled player will beat a novice in most games. However, many people thought my computer was truly intelligent, especially since it even had three difficulty levels. The only difference between the difficulty levels was the probability that the computer would randomly select a space instead of picking the best space. While people perceived that there was a lot of intelligence behind the program, there really wasn’t.
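The difficulty levels amount to a single probability check. The sketch below is only an illustration: the level names and probability values are hypothetical, since the original values are not given, and `choose_move` is an assumed helper name.

```python
import random

# Hypothetical values: chance of picking a random square at each level.
RANDOM_MOVE_PROBABILITY = {"easy": 0.5, "medium": 0.2, "hard": 0.0}


def choose_move(candidates, score, level, rng=random):
    """Pick a random candidate with the level's probability, else the best."""
    if not candidates:
        return None
    if rng.random() < RANDOM_MOVE_PROBABILITY[level]:
        return rng.choice(candidates)  # deliberate "mistake"
    return max(candidates, key=score)  # best square under the fixed rule


# With a probability of 0.0, the highest-scoring candidate is always chosen.
print(choose_move([1, 5, 3], score=lambda x: x, level="hard"))
```

The "easier" opponent is not thinking any less. It simply ignores its own rule more often.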
The point is that with some simple, recurring rules I was able to create a program that could beat most people in a strategy game. However, the computer really wasn’t thinking or learning. It was simply following predetermined, iterative rules that I had provided. Arthur C. Clarke famously wrote that any sufficiently advanced technology is indistinguishable from magic. I am beginning to wonder if any sufficiently complex rules-based algorithm is indistinguishable from true artificial intelligence or adaptive machine learning.
I have no issue if the market wants to label algorithms that are based on iterative rules as machine learning. I do wonder, however, how many people are just following the hype and do not understand that what they think is an algorithm that is learning and adapting is really just a set of complex rules.
Bill Franks is an IIA Faculty Member and the Chief Analytics Officer at Teradata Corporation.
About the author
Bill Franks is Chief Analytics Officer for Teradata, where he provides insight on trends in the analytics and big data space and helps clients understand how Teradata and its analytic partners can support their efforts. His focus is to translate complex analytics into terms that business users can understand and work with organizations to implement their analytics effectively. His work has spanned many industries for companies ranging from Fortune 100 companies to small non-profits. Franks also helps determine Teradata’s strategies in the areas of analytics and big data.
Franks is the author of the book Taming The Big Data Tidal Wave (John Wiley & Sons, Inc., April 2012). In the book, he applies his two decades of experience working with clients on large-scale analytics initiatives to outline what it takes to succeed in today’s world of big data and analytics. The book made Tom Peters’ list of 2014 “Must Read” books and also the Top 10 Most Influential Translated Technology Books list from CSDN in China.
Franks’ second book The Analytics Revolution (John Wiley & Sons, Inc., September, 2014) lays out how to move beyond using analytics to find important insights in data (both big and small) and into operationalizing those insights at scale to truly impact a business.
He is a faculty member of the International Institute for Analytics, founded by leading analytics expert Tom Davenport, and an active speaker who has presented at dozens of events in recent years. His blog, Analytics Matters, addresses the transformation required to make analytics a core component of business decisions.
Franks earned a Bachelor’s degree in Applied Statistics from Virginia Tech and a Master’s degree in Applied Statistics from North Carolina State University. More information is available at www.bill-franks.com.