The Master Algorithm
How the Quest for the Ultimate Learning Machine Will Remake Our World
Formats and Prices
- ebook $13.99 $17.99 CAD
- Trade Paperback $19.99 $24.99 CAD
This item is a preorder. Your payment method will be charged immediately, and the product is expected to ship on or around September 22, 2015. This date is subject to change due to shipping delays beyond our control.
A thought-provoking and wide-ranging exploration of machine learning and the race to build computer intelligences as flexible as our own
The Machine-Learning Revolution
We live in the age of algorithms. Only a generation or two ago, mentioning the word algorithm would have drawn a blank from most people. Today, algorithms are in every nook and cranny of civilization. They are woven into the fabric of everyday life. They’re not just in your cell phone or your laptop but in your car, your house, your appliances, and your toys. Your bank is a gigantic tangle of algorithms, with humans turning the knobs here and there. Algorithms schedule flights and then fly the airplanes. Algorithms run factories, trade and route goods, cash the proceeds, and keep records. If every algorithm suddenly stopped working, it would be the end of the world as we know it.
An algorithm is a sequence of instructions telling a computer what to do. Computers are made of billions of tiny switches called transistors, and algorithms turn those switches on and off billions of times per second. The simplest algorithm is: flip a switch. The state of one transistor is one bit of information: one if the transistor is on, and zero if it’s off. One bit somewhere in your bank’s computers says whether your account is overdrawn or not. Another bit somewhere in the Social Security Administration’s computers says whether you’re alive or dead. The second simplest algorithm is: combine two bits. Claude Shannon, better known as the father of information theory, was the first to realize that what transistors are doing, as they switch on and off in response to other transistors, is reasoning. (That was his master’s thesis at MIT—the most important master’s thesis of all time.) If transistor A turns on only when transistors B and C are both on, it’s doing a tiny piece of logical reasoning. If A turns on when either B or C is on, that’s another tiny logical operation. And if A turns on whenever B is off, and vice versa, that’s a third operation. Believe it or not, every algorithm, no matter how complex, can be reduced to just these three operations: AND, OR, and NOT. Simple algorithms can be represented by diagrams, using different symbols for the AND, OR, and NOT operations. For example, if a fever can be caused by influenza or malaria, and you should take Tylenol for a fever and a headache, this can be expressed as follows:
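One way to write this example down, as a minimal sketch in Python rather than a circuit diagram (the function names are illustrative assumptions, not taken from the book):

```python
# The three basic operations, written as functions on True/False values.
def AND(b, c):
    return bool(b) and bool(c)

def OR(b, c):
    return bool(b) or bool(c)

def NOT(b):
    return not bool(b)

# A fever can be caused by influenza OR malaria.
def has_fever(influenza, malaria):
    return OR(influenza, malaria)

# Take Tylenol for a fever AND a headache.
def should_take_tylenol(influenza, malaria, headache):
    return AND(has_fever(influenza, malaria), headache)

if __name__ == "__main__":
    print(should_take_tylenol(influenza=True, malaria=False, headache=True))
    print(should_take_tylenol(influenza=False, malaria=False, headache=True))
```

The first call combines a fever (from influenza) with a headache; the second has a headache but no cause of fever, so the AND is not satisfied.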
By combining many such operations, we can carry out very elaborate chains of logical reasoning. People often think computers are all about numbers, but they’re not. Computers are all about logic. Numbers and arithmetic are made of logic, and so is everything else in a computer. Want to add two numbers? There’s a combination of transistors that does that. Want to beat the human Jeopardy! champion? There’s a combination of transistors for that too (much bigger, naturally).
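To make "numbers and arithmetic are made of logic" concrete, here is a hedged sketch of addition built only from AND, OR, and NOT. It uses the standard ripple-carry construction; the function names and the 8-bit default width are assumptions chosen for illustration, and bits are represented as the integers 0 and 1.

```python
# Bits are the integers 0 and 1.
def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def NOT(a):
    return 1 - a

def XOR(a, b):
    # "One or the other, but not both", using only AND, OR, NOT.
    return AND(OR(a, b), NOT(AND(a, b)))

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return sum_bit, carry_out

def add(x, y, width=8):
    """Add two non-negative integers bit by bit; overflow past `width` is dropped."""
    result = 0
    carry = 0
    for i in range(width):
        # The shifts below only pick out bit i; the arithmetic is done by the gates.
        a = (x >> i) & 1
        b = (y >> i) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << i
    return result

if __name__ == "__main__":
    print(add(5, 7))    # 12
    print(add(255, 1))  # 0, because the ninth bit does not fit in 8 bits
```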
It would be prohibitively expensive, though, if we had to build a new computer for every different thing we want to do. Rather, a modern computer is a vast assembly of transistors that can do many different things, depending on which transistors are activated. Michelangelo said that all he did was see the statue inside the block of marble and carve away the excess stone until the statue was revealed. Likewise, an algorithm carves away the excess transistors in the computer until the intended function is revealed, whether it’s an airliner’s autopilot or a new Pixar movie.
An algorithm is not just any set of instructions: they have to be precise and unambiguous enough to be executed by a computer. For example, a cooking recipe is not an algorithm because it doesn’t exactly specify what order to do things in or exactly what each step is. Exactly how much sugar is a spoonful? As everyone who’s ever tried a new recipe knows, following it may result in something delicious or a mess. In contrast, an algorithm always produces the same result. Even if a recipe specifies precisely half an ounce of sugar, we’re still not out of the woods because the computer doesn’t know what sugar is, or an ounce. If we wanted to program a kitchen robot to make a cake, we would have to tell it how to recognize sugar from video, how to pick up a spoon, and so on. (We’re still working on that.) The computer has to know how to execute the algorithm all the way down to turning specific transistors on and off. So a cooking recipe is very far from an algorithm.
On the other hand, the following is an algorithm for playing tic-tac-toe:
If you or your opponent has two in a row, play on the remaining square.
Otherwise, if there’s a move that creates two lines of two in a row, play that.
Otherwise, if the center square is free, play there.
Otherwise, if your opponent has played in a corner, play in the opposite corner.
Otherwise, if there’s an empty corner, play there.
Otherwise, play on any empty square.
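The six rules above can be sketched in Python roughly as follows. This is one possible reading, not the book's code: the board layout (a list of nine cells, indexed row by row), the helper names, and the choice to complete your own line before blocking the opponent's are all assumptions the rules leave open. The sketch makes no claim of its own about how well the strategy plays.

```python
EMPTY = " "
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals
CORNERS = [0, 2, 6, 8]
OPPOSITE = {0: 8, 2: 6, 6: 2, 8: 0}

def completing_squares(board, player):
    """One empty square per line in which `player` already holds the other two."""
    squares = []
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == 2 and marks.count(EMPTY) == 1:
            squares.append(line[marks.index(EMPTY)])
    return squares

def choose_move(board, me, opponent):
    empties = [i for i, mark in enumerate(board) if mark == EMPTY]
    if not empties:
        raise ValueError("board is full")
    # Rule 1: two in a row for either side (own line first: an assumption).
    for player in (me, opponent):
        squares = completing_squares(board, player)
        if squares:
            return squares[0]
    # Rule 2: a move that creates two lines of two in a row.
    for square in empties:
        trial = board[:]
        trial[square] = me
        if len(completing_squares(trial, me)) >= 2:
            return square
    # Rule 3: the center.
    if board[4] == EMPTY:
        return 4
    # Rule 4: the corner opposite one the opponent holds.
    for corner in CORNERS:
        if board[corner] == opponent and board[OPPOSITE[corner]] == EMPTY:
            return OPPOSITE[corner]
    # Rule 5: any empty corner.
    for corner in CORNERS:
        if board[corner] == EMPTY:
            return corner
    # Rule 6: any empty square.
    return empties[0]

if __name__ == "__main__":
    board = ["X", "X", EMPTY,
             EMPTY, "O", EMPTY,
             EMPTY, EMPTY, EMPTY]
    print(choose_move(board, me="O", opponent="X"))  # blocks at square 2
```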
This algorithm has the nice property that it never loses! Of course, it’s still missing many details, like how the board is represented in the computer’s memory and how this representation is changed by a move. For example, we could have two bits for each square, with the value 00 if the square is empty, which changes to 01 if it has a naught and 10 if it has a cross. But it’s precise and unambiguous enough that any competent programmer could fill in the blanks. It also helps that we don’t really have to specify an algorithm ourselves all the way down to individual transistors; we can use preexisting algorithms as building blocks, and there’s a huge number of them to choose from.
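As an illustration of the two-bits-per-square representation just described, the whole board can be packed into a single integer of 18 bits. The sketch below is an assumption about one reasonable layout (square 0 in the lowest two bits); the text does not prescribe a bit order, and the function names are hypothetical.

```python
EMPTY, NAUGHT, CROSS = 0b00, 0b01, 0b10

def get_square(board, index):
    """Read the two bits for square `index` (0 to 8)."""
    return (board >> (2 * index)) & 0b11

def set_square(board, index, value):
    """Return a new board with square `index` set to `value`."""
    mask = 0b11 << (2 * index)
    return (board & ~mask) | (value << (2 * index))

if __name__ == "__main__":
    board = 0                           # all squares empty
    board = set_square(board, 4, CROSS)  # a cross in the center
    board = set_square(board, 0, NAUGHT) # a naught in the first square
    print(format(board, "018b"))
    print(get_square(board, 4) == CROSS)
```

A move is then just a change to two bits, which is the kind of detail the description above leaves for the programmer to fill in.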
Algorithms are an exacting standard. It’s often said that you don’t really understand something until you can express it as an algorithm. (As Richard Feynman said, “What I cannot create, I do not understand.”) Equations, the bread and butter of physicists and engineers, are really just a special kind of algorithm. For example, Newton’s second law, arguably the most important equation of all time, tells you to compute the net force on an object by multiplying its mass by its acceleration. It also tells you implicitly that the acceleration is the force divided by the mass, but making that explicit is itself an algorithmic step. In any area of science, if a theory cannot be expressed as an algorithm, it’s not entirely rigorous. (Not to mention you can’t use a computer to solve it, which really limits what you can do with it.) Scientists make theories, and engineers make devices. Computer scientists make algorithms, which are both theories and devices.
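The point about Newton's second law can be shown in a few lines. In this sketch (function names and SI units are assumptions made for the example), the equation read left to right is one procedure, and the rearranged form is a second, separate procedure that has to be written out explicitly:

```python
def net_force(mass_kg, acceleration_m_s2):
    """F = m * a, read as an instruction: multiply mass by acceleration."""
    return mass_kg * acceleration_m_s2

def acceleration(force_n, mass_kg):
    """The implicit form a = F / m, made explicit as its own step."""
    if mass_kg == 0:
        raise ValueError("mass must be nonzero")
    return force_n / mass_kg

if __name__ == "__main__":
    force = net_force(2.0, 3.0)
    print(force)                      # 6.0
    print(acceleration(force, 2.0))   # 3.0
```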
Designing an algorithm is not easy. Pitfalls abound, and nothing can be taken for granted. Some of your intuitions will turn out to have been wrong, and you’ll have to find another way. On top of designing the algorithm, you have to write it down in a language computers can understand, like Java or Python (at which point it’s called a program). Then you have to debug it: find every error and fix it until the computer runs your program without screwing up. But once you have a program that does what you want, you can really go to town. Computers will do your bidding millions of times, at ultrahigh speed, without complaint. Everyone in the world can use your creation. The cost can be zero, if you so choose, or enough to make you a billionaire, if the problem you solved is important enough. A programmer—someone who creates algorithms and codes them up—is a minor god, creating universes at will. You could even say that the God of Genesis himself is a programmer: language, not manipulation, is his tool of creation. Words become worlds. Today, sitting on the couch with your laptop, you too can be a god. Imagine a universe and make it real. The laws of physics are optional.
Over time, computer scientists build on each other’s work and invent algorithms for new things. Algorithms combine with other algorithms to use the results of other algorithms, in turn producing results for still more algorithms. Every second, billions of transistors in billions of computers switch billions of times. Algorithms form a new kind of ecosystem—ever growing, comparable in richness only to life itself.
Inevitably, however, there is a serpent in this Eden. It’s called the complexity monster. Like the Hydra, the complexity monster has many heads. One of them is space complexity: the number of bits of information an algorithm needs to store in the computer’s memory. If the algorithm needs more memory than the computer can provide, it’s useless and must be discarded. Then there’s the evil sister, time complexity: how long the algorithm takes to run, that is, how many steps of using and reusing the transistors it has to go through before it produces the desired results. If it’s longer than we can wait, the algorithm is again useless. But the scariest face of the complexity monster is human complexity. When algorithms become too intricate for our poor human brains to understand, when the interactions between different parts of the algorithm are too many and too involved, errors creep in, we can’t find them and fix them, and the algorithm doesn’t do what we want. Even if we somehow make it work, it winds up being needlessly complicated for the people using it and doesn’t play well with other algorithms, storing up trouble for later.
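Time and space complexity can be seen trading off in a small example. The sketch below is an illustration chosen for this purpose, not something from the book: two ways of computing Fibonacci numbers, one that stores nothing and repeats work, and one that spends memory on a cache to save steps. The step counter is a hypothetical instrument added for the demonstration.

```python
def fib_slow(n, counter):
    """No extra memory, but the same subproblems are recomputed many times."""
    counter["calls"] += 1
    if n < 2:
        return n
    return fib_slow(n - 1, counter) + fib_slow(n - 2, counter)

def fib_fast(n, counter, cache):
    """Stores each result once in `cache`, so each subproblem is solved once."""
    counter["calls"] += 1
    if n < 2:
        return n
    if n not in cache:
        cache[n] = fib_fast(n - 1, counter, cache) + fib_fast(n - 2, counter, cache)
    return cache[n]

if __name__ == "__main__":
    slow_counter = {"calls": 0}
    fast_counter = {"calls": 0}
    cache = {}
    print(fib_slow(20, slow_counter), slow_counter["calls"])
    print(fib_fast(20, fast_counter, cache), fast_counter["calls"], len(cache))
```

Both functions return the same number; the first takes tens of thousands of steps and the second a few dozen, at the price of keeping the intermediate results in memory.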
Every computer scientist does battle with the complexity monster every day. When computer scientists lose the battle, complexity seeps into our lives. You’ve probably noticed that many a battle has been lost. Nevertheless, we continue to build our tower of algorithms, with greater and greater difficulty. Each new generation of algorithms has to be built on top of the previous ones and has to deal with their complexities in addition to its own. The tower grows taller and taller, and it covers the whole world, but it’s also increasingly fragile, like a house of cards waiting to collapse. One tiny error in an algorithm and a billion-dollar rocket explodes, or the power goes out for millions. Algorithms interact in unexpected ways, and the stock market crashes.
If programmers are minor gods, the complexity monster is the devil himself. Little by little, it’s winning the war.
There has to be a better way.
Enter the learner
Every algorithm has an input and an output: the data goes into the computer, the algorithm does what it will with it, and out comes the result. Machine learning turns this around: in goes the data and the desired result and out comes the algorithm that turns one into the other. Learning algorithms—also known as learners—are algorithms that make other algorithms. With machine learning, computers write their own programs, so we don’t have to.
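A deliberately tiny sketch of this inversion: the function below takes example inputs together with the desired outputs and returns a new function that turns one into the other. Fitting a straight line by least squares is just one simple learner picked for illustration; the names `learn_line` and `predict` are assumptions, and real learners are far more general.

```python
def learn_line(examples):
    """Given (input, desired_output) pairs, return a function fitted to them.

    Uses ordinary least squares for a straight line y = slope * x + intercept.
    """
    n = len(examples)
    if n < 2:
        raise ValueError("need at least two examples")
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    spread = sum((x - mean_x) ** 2 for x, _ in examples)
    if spread == 0:
        raise ValueError("inputs must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in examples) / spread
    intercept = mean_y - slope * mean_x

    def predict(x):
        # This inner function is the "learned program".
        return slope * x + intercept

    return predict

if __name__ == "__main__":
    # In go the data and the desired results...
    examples = [(0, 1), (1, 3), (2, 5), (3, 7)]
    # ...and out comes a program.
    program = learn_line(examples)
    print(program(10))
```

Nobody wrote "multiply by two and add one" anywhere in the code; that rule is recovered from the examples.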
Computers write their own programs. Now that’s a powerful idea, maybe even a little scary. If computers start to program themselves, how will we control them? Turns out we can control them quite well, as we’ll see. A more immediate objection is that perhaps this sounds too good to be true. Surely writing algorithms requires intelligence, creativity, problem-solving chops—things that computers just don’t have? How is machine learning distinguishable from magic? Indeed, as of today people can write many programs that computers can’t learn. But, more surprisingly, computers can learn programs that people can’t write. We know how to drive cars and decipher handwriting, but these skills are subconscious; we’re not able to explain to a computer how to do these things. If we give a learner a sufficient number of examples of each, however, it will happily figure out how to do them on its own, at which point we can turn it loose. That’s how the post office reads zip codes, and that’s why self-driving cars are on the way.
The power of machine learning is perhaps best explained by a low-tech analogy: farming. In an industrial society, goods are made in factories, which means that engineers have to figure out exactly how to assemble them from their parts, how to make those parts, and so on—all the way to raw materials. It’s a lot of work. Computers are the most complex goods ever invented, and designing them, the factories that make them, and the programs that run on them is a ton of work. But there’s another, much older way in which we can get some of the things we need: by letting nature make them. In farming, we plant the seeds, make sure they have enough water and nutrients, and reap the grown crops. Why can’t technology be more like this? It can, and that’s the promise of machine learning. Learning algorithms are the seeds, data is the soil, and the learned programs are the grown plants. The machine-learning expert is like a farmer, sowing the seeds, irrigating and fertilizing the soil, and keeping an eye on the health of the crop but otherwise staying out of the way.
Once we look at machine learning this way, two things immediately jump out. The first is that the more data we have, the more we can learn. No data? Nothing to learn. Big data? Lots to learn. That’s why machine learning has been turning up everywhere, driven by exponentially growing mountains of data. If machine learning was something you bought in the supermarket, its carton would say: “Just add data.”
The second thing is that machine learning is a sword with which to slay the complexity monster. Given enough data, a learning program that’s only a few hundred lines long can easily generate a program with millions of lines, and it can do this again and again for different problems. The reduction in complexity for the programmer is phenomenal. Of course, like the Hydra, the complexity monster sprouts new heads as soon as we cut off the old ones, but they start off smaller and take a while to grow, so we still get a big leg up.
We can think of machine learning as the inverse of programming, in the same way that the square root is the inverse of the square, or integration is the inverse of differentiation. Just as we can ask “What number squared gives 16?” or “What is the function whose derivative is x + 1?” we can ask, “What is the algorithm that produces this output?” We will soon see how to turn this insight into concrete learning algorithms.
Some learners learn knowledge, and some learn skills. “All humans are mortal” is a piece of knowledge. Riding a bicycle is a skill. In machine learning, knowledge is often in the form of statistical models, because most knowledge is statistical: all humans are mortal, but only 4 percent are Americans. Skills are often in the form of procedures: if the road curves left, turn the wheel left; if a deer jumps in front of you, slam on the brakes. (Unfortunately, as of this writing Google’s self-driving cars still confuse windblown plastic bags with deer.) Often, the procedures are quite simple, and it’s the knowledge at their core that’s complex. If you can tell which e-mails are spam, you know which ones to delete. If you can tell how good a board position in chess is, you know which move to make (the one that leads to the best position).
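The split between a simple procedure and the complex knowledge inside it can be sketched as follows. The procedure is one line: try each move and keep the one whose resulting position scores best. All the difficulty hides in the `evaluate` function, which is passed in from outside. Every name here is hypothetical, and the toy "game" in the usage example is only a stand-in for chess.

```python
def best_move(position, legal_moves, apply_move, evaluate):
    """Pick the move that leads to the best-scoring position.

    `apply_move(position, move)` returns the resulting position;
    `evaluate(position)` returns a number, higher meaning better.
    """
    return max(legal_moves, key=lambda move: evaluate(apply_move(position, move)))

if __name__ == "__main__":
    # Toy stand-in: a position is a number, a move adds to it,
    # and positions closer to 10 are considered better.
    position = 7
    moves = [1, 2, 3, 4]
    print(best_move(position, moves,
                    apply_move=lambda p, m: p + m,
                    evaluate=lambda p: -abs(p - 10)))
```

In a learned system, `evaluate` would be the part produced from data, while `best_move` stays as short as it is here.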
Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more. Each of these is used by different communities and has different associations. Some have a long half-life, some less so. In this book I use the term machine learning to refer broadly to all of them.
Machine learning is sometimes confused with artificial intelligence (or AI for short). Technically, machine learning is a subfield of AI, but it’s grown so large and successful that it now eclipses its proud parent. The goal of AI is to teach computers to do what humans currently do better, and learning is arguably the most important of those things: without it, no computer can keep up with a human for long; with it, the rest follows.
In the information-processing ecosystem, learners are the superpredators. Databases, crawlers, indexers, and so on are the herbivores, patiently munging on endless fields of data. Statistical algorithms, online analytical processing, and so on are the predators. Herbivores are necessary, since without them the others couldn’t exist, but superpredators have a more exciting life. A crawler is like a cow, the web is its worldwide meadow, each page is a blade of grass. When the crawler is done munging, a copy of the web is sitting on its hard disks. An indexer then makes a list of the pages where each word appears, much like the index at the end of a book. Databases, like elephants, are big and heavy and never forget. Among these patient beasts dart statistical and analytical algorithms, compacting and selecting, turning data into information. Learners eat up this information, digest it, and turn it into knowledge.
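The indexer's job, listing the pages where each word appears, is what is usually called an inverted index. A minimal sketch, assuming pages are given as a dictionary from URL to text and that words are simply runs of letters and digits (both assumptions made for the example):

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the sorted list of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}

if __name__ == "__main__":
    pages = {
        "a.html": "Cows eat grass",
        "b.html": "Elephants never forget grass",
    }
    index = build_index(pages)
    print(index["grass"])
    print(index.get("cows", []))
```

Looking up a word is then a single dictionary access, just as looking up a term in a book's index is a single page flip.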
Machine-learning experts (aka machine learners) are an elite priesthood even among computer scientists. Many computer scientists, particularly those of an older generation, don’t understand machine learning as well as they’d like to. This is because computer science has traditionally been all about thinking deterministically, but machine learning requires thinking statistically. If a rule for, say, labeling e-mails as spam is 99 percent accurate, that does not mean it’s buggy; it may be the best you can do and good enough to be useful. This difference in thinking is a large part of why Microsoft has had a lot more trouble catching up with Google than it did with Netscape. At the end of the day, a browser is just a standard piece of software, but a search engine requires a different mind-set.
The other reason machine learners are the über-geeks is that the world has far fewer of them than it needs, even by the already dire standards of computer science. According to tech guru Tim O’Reilly, “data scientist” is the hottest job title in Silicon Valley. The McKinsey Global Institute estimates that by 2018 the United States alone will need 140,000 to 190,000 more machine-learning experts than will be available, and 1.5 million more data-savvy managers. Machine learning’s applications have exploded too suddenly for education to keep up, and it has a reputation for being a difficult subject. Textbooks are liable to give you math indigestion. This difficulty is more apparent than real, however. All of the important ideas in machine learning can be expressed math-free. As you read this book, you may even find yourself inventing your own learning algorithms, with nary an equation in sight.
The Industrial Revolution automated manual work and the Information Revolution did the same for mental work, but machine learning automates automation itself. Without it, programmers become the bottleneck holding up progress. With it, the pace of progress picks up. If you’re a lazy and not-too-bright computer scientist, machine learning is the ideal occupation, because learning algorithms do all the work but let you take all the credit. On the other hand, learning algorithms could put us out of our jobs, which would only be poetic justice.
By taking automation to new heights, the machine-learning revolution will cause extensive economic and social changes, just as the Internet, the personal computer, the automobile, and the steam engine did in their time. One area where these changes are already apparent is business.
Why businesses embrace machine learning
Why is Google worth so much more than Yahoo? They both make their money from showing ads on the web, and they’re both top destinations. Both use auctions to sell ads and machine learning to predict how likely a user is to click on an ad (the higher the probability, the more valuable the ad). But Google’s learning algorithms are much better than Yahoo’s. This is not the only reason for the difference in their market caps, of course, but it’s a big one. Every predicted click that doesn’t happen is a wasted opportunity for the advertiser and lost revenue for the website. With Google’s annual revenue of $50 billion, every 1 percent improvement in click prediction potentially means another half billion dollars in the bank, every year, for the company. No wonder Google is a big fan of machine learning, and Yahoo and others are trying hard to catch up.
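The half-billion figure is straightforward arithmetic, under the simplifying assumption (implied by the word "potentially") that revenue scales in direct proportion to click-prediction quality. A sketch with a hypothetical function name:

```python
def extra_revenue(annual_revenue, improvement_percent):
    """Extra yearly revenue if revenue rises in proportion to the improvement."""
    return annual_revenue * improvement_percent / 100

if __name__ == "__main__":
    # $50 billion in annual revenue, 1 percent better click prediction.
    print(extra_revenue(50_000_000_000, 1))  # 500000000.0, i.e. half a billion
```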
Web advertising is just one manifestation of a much larger phenomenon. In every market, producers and consumers need to connect before a transaction can happen. In pre-Internet days, the main obstacles to this were physical. You could only buy books from your local bookstore, and your local bookstore had limited shelf space. But when you can download any book to your e-reader any time, the problem becomes the overwhelming number of choices. How do you browse the shelves of a bookstore that has millions of titles for sale? The same goes for other information goods: videos, music, news, tweets, blogs, plain old web pages. It also goes for every product and service that can be procured remotely: shoes, flowers, gadgets, hotel rooms, tutoring, investments. It even applies to people looking for a job or a date. How do you find each other? This is the defining problem of the Information Age, and machine learning is a big part of the solution.
As companies grow, they go through three phases. First, they do everything manually: the owners of a mom-and-pop store personally know their customers, and they order, display, and recommend items accordingly. This is nice, but it doesn’t scale. In the second and least happy phase, the company grows large enough that it needs to use computers. In come the programmers, consultants, and database managers, and millions of lines of code get written to automate all the functions of the company that can be automated. Many more people are served, but not as well: decisions are made based on coarse demographic categories, and computer programs are too rigid to match humans’ infinite versatility.
After a point, there just aren’t enough programmers and consultants to do all that’s needed, and the company inevitably turns to machine learning. Amazon can’t neatly encode the tastes of all its customers in a computer program, and Facebook doesn’t know how to write a program that will choose the best updates to show to each of its users. Walmart sells millions of products and has billions of choices to make every day; if the programmers at Walmart tried to write a program to make all of them, they would never be done. Instead, what these companies do is turn learning algorithms loose on the mountains of data they’ve accumulated and let them divine what customers want.
Learning algorithms are the matchmakers: they find producers and consumers for each other, cutting through the information overload. If they’re smart enough, you get the best of both worlds: the vast choice and low cost of the large scale, with the personalized touch of the small. Learners are not perfect, and the last step of the decision is usually still for humans to make, but learners intelligently reduce the choices to something a human can manage.
In retrospect, we can see that the progression from computers to the Internet to machine learning was inevitable: computers enable the Internet, which creates a flood of data and the problem of limitless choice; and machine learning uses the flood of data to help solve the limitless choice problem. The Internet by itself is not enough to move demand from “one size fits all” to the long tail of infinite variety. Netflix may have one hundred thousand DVD titles in stock, but if customers don’t know how to find the ones they like, they will default to choosing the hits. It’s only when Netflix has a learning algorithm to figure out your tastes and recommend DVDs that the long tail really takes off.
Once the inevitable happens and learning algorithms become the middlemen, power becomes concentrated in them. Google’s algorithms largely determine what information you find, Amazon’s what products you buy, and Match.com’s who you date. The last mile is still yours—choosing from among the options the algorithms present you with—but 99.9 percent of the selection was done by them. The success or failure of a company now depends on how much the learners like its products, and the success of a whole economy—whether everyone gets the best products for their needs at the best price—depends on how good the learners are.
The best way for a company to ensure that learners like its products is to run them itself. Whoever has the best algorithms and the most data wins. A new type of network effect takes hold: whoever has the most customers accumulates the most data, learns the best models, wins the most new customers, and so on in a virtuous circle (or a vicious one, if you’re the competition). Switching from Google to Bing may be easier than switching from Windows to Mac, but in practice you don’t because Google, with its head start and larger market share, knows better what you want, even if Bing’s technology is just as good. And pity a new entrant into the search business, starting with zero data against engines with over a decade of learning behind them.
You might think that after a while more data is just more of the same, but that saturation point is nowhere in sight. The long tail keeps going. If you look at the recommendations Amazon or Netflix gives you, it’s clear they’re still very crude, and Google’s search results still leave a lot to be desired. Every feature of a product, every corner of a website can potentially be improved using machine learning. Should the link at the bottom of a page be red or blue? Try them both and see which one gets the most clicks. Better still, keep the learners running and continuously adjust all aspects of the website.
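The red-or-blue experiment, and the idea of keeping the learner running, can be sketched with a simple scheme often called epsilon-greedy: most of the time show the variant with the best click rate so far, and occasionally show a random one to keep learning. This is one common technique chosen for illustration, not a description of what any particular company does; the function names, the statistics format, and the 10 percent exploration rate are all assumptions.

```python
import random

def click_rate(stats, variant):
    """Observed clicks per view for one variant; 0.0 if it was never shown."""
    clicks, views = stats[variant]
    return clicks / views if views else 0.0

def choose_variant(stats, epsilon=0.1, rng=random):
    """Usually pick the best variant so far; explore at random with probability epsilon."""
    variants = sorted(stats)
    if rng.random() < epsilon:
        return rng.choice(variants)
    return max(variants, key=lambda v: click_rate(stats, v))

def record(stats, variant, clicked):
    """Update the counts after showing `variant` to one visitor."""
    clicks, views = stats[variant]
    stats[variant] = (clicks + int(clicked), views + 1)

if __name__ == "__main__":
    # (clicks, views) observed so far for each link color.
    stats = {"red": (30, 1000), "blue": (45, 1000)}
    shown = choose_variant(stats)
    record(stats, shown, clicked=False)
    print(shown, stats)
```

With `epsilon=0` this reduces to the plain "try both and see which gets the most clicks" comparison; a positive epsilon is what keeps the adjustment continuous.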
The same dynamic happens in any market where there’s lots of choice and lots of data. The race is on, and whoever learns fastest wins. It doesn’t stop with understanding customers better: companies can apply machine learning to every aspect of their operations, provided data is available, and data is pouring in from computers, communication devices, and ever-cheaper and more ubiquitous sensors. “Data is the new oil” is a popular refrain, and as with oil, refining it is big business. IBM, as well plugged into the corporate world as anyone, has organized its growth strategy around providing analytics to companies. Businesses look at data as a strategic asset: What data do I have that my competitors don’t? How can I take advantage of it? What data do my competitors have that I don’t?
In the same way that a bank without databases can’t compete with a bank that has them, a company without machine learning can’t keep up with one that uses it. While the first company’s experts write a thousand rules to predict what its customers want, the second company’s algorithms learn billions of rules, a whole set of them for each individual customer. It’s about as fair as spears against machine guns. Machine learning is a cool new technology, but that’s not why businesses embrace it. They embrace it because they have no choice.
- "Wonderfully erudite, humorous, and easy to read."—KDNuggets
- "Pedro Domingos demystifies machine learning and shows how wondrous and exciting the future will be."—Walter Isaacson, New York Times bestselling author of Steve Jobs, The Innovators, and The Code Breaker
- "An impressive and wide-ranging work that covers everything from the history of machine learning to the latest technical advances in the field."—Daily Beast
- "Domingos writes with verve and passion."—New Scientist
- "Unlike other books that proclaim a bright future, this one actually gives you what you need to understand the changes that are coming."—Peter Norvig, Director of Research, Google and coauthor of Artificial Intelligence: A Modern Approach
- "Domingos is the perfect tour guide from whom you will learn everything you need to know about this exciting field, and a surprising amount about science and philosophy as well."—Duncan Watts, Principal Researcher, Microsoft Research, and author of Six Degrees and Everything Is Obvious *Once You Know the Answer
- "[The Master Algorithm] does a good job of examining the field's five main techniques.... The subject is meaty and the author...has a knack for introducing concepts at the right moment."—The Economist
- "Domingos is a genial and amusing guide, who sneaks us around the backstage areas of the science in order to witness the sometimes personal (and occasionally acrimonious) tenor of research on the subject in recent decades."—Times Higher Education
- "An exhilarating venture into groundbreaking computer science."—Booklist, starred review
- "[An] enthusiastic but not dumbed-down introduction to machine learning...lucid and consistently informative.... With wit, vision, and scholarship, Domingos describes how these scientists are creating programs that allow a computer to teach itself. Readers...will discover fascinating insights."
- "This book is a must have to learn machine learning without equation. It will help you get the big picture of the several learning paradigms. Finally, the provocative idea is not only intriguing, but also very well argued."—Data Mining Research
- "If you are interested in a crash course on the enigmatic field of machine learning and the challenges for AI practitioners that lie ahead, this book is a great read."—TechCast Global
- "This book is a sheer pleasure, mixed with education. I am recommending it to all my students, those who studied machine learning, those who are about to do it and those who are about to teach it."—Judea Pearl, author of The Book of Why, and professor of computer science, UCLA and winner of the A. M. Turing Award
- "Machine learning is the single most transformative technology that will shape our lives over the next fifteen years. This book is a must-read--a bold and beautifully written new framework for looking into the future."—Geoffrey Moore, author of Crossing the Chasm
- "Machine learning is a fascinating world never before glimpsed by outsiders. Pedro Domingos initiates you to the mysterious languages spoken by its five tribes, and invites you to join in his plan to unite them, creating the most powerful technology our civilization has ever seen."—Sebastian Seung, professor, Princeton, and author of Connectome
- "A delightful book by one of the leading experts in the field. If you wonder how AI will change your life, read this book."—Sebastian Thrun, research professor, Stanford, Google Fellow and Inventor of the Self-Driving Car
- "This is an incredibly important and useful book. Machine learning is already critical to your life and work, and will only become more so. Finally, Pedro Domingos has written about it in a clear and understandable fashion."—Thomas H. Davenport, Distinguished Professor, Babson College and author of Competing on Analytics and Big Data @ Work
- "Machine learning, known in commercial use as predictive analytics, is changing the world. This riveting, far-reaching, and inspiring book introduces the deep scientific concepts to even non-technical readers, and yet also satisfies experts with a fresh, profound perspective that reveals the most promising research directions. It's a rare gem indeed."—Eric Siegel, founder of Predictive Analytics World and author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
- On Sale: Sep 22, 2015
- Page Count: 352 pages
- Publisher: Basic Books