Data for the People

How to Make Our Post-Privacy Economy Work for You

Contributors

By Andreas Weigend

Formats and Prices

  1. ebook: $18.99 ($24.99 CAD)
  2. Hardcover: $30.00 ($38.00 CAD)

This item is a preorder. Your payment method will be charged immediately, and the product is expected to ship on or around January 31, 2017. This date is subject to change due to shipping delays beyond our control.

A long-time chief data scientist at Amazon shows how open data can make everyone, not just corporations, richer

Every time we Google something, Facebook someone, Uber somewhere, or even just turn on a light, we create data that businesses collect and use to make decisions about us. In many ways this has improved our lives, yet we as individuals do not benefit from this wealth of data as much as we could. Moreover, whether it is a bank evaluating our creditworthiness, an insurance company determining our risk level, or a potential employer deciding whether we get a job, it is likely that this data will be used against us rather than for us.

In Data for the People, Andreas Weigend draws on his years as a consultant for commerce, education, healthcare, travel and finance companies to outline how Big Data can work better for all of us. As of today, how much we benefit from Big Data depends on how closely the interests of big companies align with our own. Too often, outdated standards of control and privacy force us into unfair contracts with data companies, but it doesn’t have to be this way. Weigend makes a powerful argument that we need to take control of how our data is used to actually make it work for us. Only then can we the people get back more from Big Data than we give it.

Big Data is here to stay. Now is the time to find out how we can be empowered by it.

Excerpt

INTRODUCTION

The Social Data Revolution

How Can We Ensure That Data Are for the People?

Every revolution was first a thought in one man's mind; and when the same thought occurs to another man, it is the key to that era.1

RALPH WALDO EMERSON

AT 6:45 A.M., the alarm on my mobile phone wakes me up. Eager to start the day, I carry my phone to the kitchen while I scan through my email and Facebook notifications. My phone's GPS receiver and wifi register the changes in location, logging my shift a few meters north and east. As I pour myself a cup of coffee and really start to get going, the phone's accelerometer tracks how quickly I walk and the barometer registers when I'm going up the stairs. Because I have Google apps installed on my phone, Google has a record of all these data.

After breakfast, I'm ready to make my way to Stanford University. The electricity company has put in a "smart" meter, which registers the decrease in electricity use as I turn off my lights and unplug my mobile devices. When I open the garage door, the meter detects the usage signature specific to it. Thus, as I pull my car out onto the street, my electricity provider has enough data to know I'm no longer at home. When my phone's signal gets picked up by different cellular signal towers, so does my mobile phone carrier.

On the road, a camera installed on a street corner takes a photo of my license plate in case I speed through a red light. Thankfully, I'm on my best behavior today so I won't be greeted with a ticket in the mail. But as I go on my way, my license plate is photographed again and again. Some of those cameras belong to the local government, while some belong to private companies that are analyzing the data to identify patterns of mobility—which they sell to police departments, land developers, and other interested parties.

When I get to Stanford, I use the EasyPark app on my phone to pay the parking fee. The money is automatically debited from my bank account, and the university parking team is notified that I'm paid up, so both the school and my bank can see that I'm on campus starting at 9:03 a.m. When my phone stops moving at a car's pace, Google infers this is where I have parked and logs the location, so that I can look it up in case I forget later. It's also time to check my Metromile insurance app, which has been recording data about my drive from the car's on-board diagnostic system. I can see in an instant that my fuel efficiency was lower today—nineteen miles per gallon—and that I spent $2.05 on gas for my commute.

After my day at Stanford, I'm planning to meet up with a new friend back in San Francisco. We "virtually" met each other when we both commented on a post by a mutual friend on Facebook, and liked each other's take on the topic. It turned out we had more than thirty Facebook friends in common, more than enough reason to meet up.

Google Maps predicts that I'll get to my new friend's place at 7:12 p.m., and as usual the prediction is correct within a few minutes. As it happens, my friend lives above a store that sells tobacco products as well as various paraphernalia used for smoking marijuana. The GPS receiver on my smartphone doesn't differentiate between the apartment and the store, however. As far as my carrier and Google are concerned, I've ended my day with a visit to the head shop—a fact revealed to me by the ads Google shows when I check the weather forecast before going to bed.

Welcome to the social data revolution.

Give to Get

Every day, more than a billion people create and share social data like these. Social data is information about you, such as your movements, behavior, and interests, as well as information about your relationships with other people, places, products, even ideologies.2 Some of these data are shared knowingly and willingly, as when you are signed in to Google Maps and type in your destination; others less so, often without much thought, part and parcel of the convenience of using the internet and mobile devices. In some cases, it is clear that sharing data is a necessary condition for receiving services: Google can't show you the best route to take if you don't tell it where you are and where you want to go. In other cases, you might happily contribute information, as when you "like" a friend's Facebook post or endorse a colleague's work on LinkedIn simply because you want to reach out and support her in some way.

Social data can be highly accurate, pinpointing your location to within less than a meter, but social data are often sketchy, in the sense of being incomplete. For example, unless I sign in to an app that displays my smart meter's readings (for instance, to be sure that I really did turn off all the lights in my house as I make my way to the airport), the electricity company knows when I am not at home, but nothing more than that. It's a rough data point that may or may not be of much use to me. Similarly, as I was visiting my new friend in San Francisco, while my latitude and longitude were conveyed with precision, the inferences made about my activities that evening were utterly wrong. That's even sketchier, in the sense that the data appeared quite exact but were very much an interpretation. Sketchy data have a tendency to be incomplete, error-prone, and—occasionally—polluted by fraud.3

Altogether—passive and active, necessary and voluntary, precise and sketchy—the amount of social data is growing exponentially. Today, the time it takes for social data to double in quantity is eighteen months. In five years, the amount of social data will have increased by about a factor of 10, or an order of magnitude, and after ten years, it will increase by about a factor of 100. In other words, the amount of data we created over the course of the entire year 2000 is now created over the course of a day. At our current growth rate, in 2020 we'll create that amount of data in less than an hour.
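The doubling arithmetic in the paragraph above can be checked with a short sketch. It assumes smooth exponential growth with a fixed eighteen-month doubling time, as the text states; the function names are illustrative and do not come from the book.

```python
import math

# Doubling time stated in the text: eighteen months.
DOUBLING_TIME_MONTHS = 18


def growth_factor(years, doubling_time_months=DOUBLING_TIME_MONTHS):
    """Return how many times larger the data volume is after `years`,
    assuming it doubles every `doubling_time_months` months."""
    doublings = years * 12 / doubling_time_months
    return 2 ** doublings


def years_to_grow(factor, doubling_time_months=DOUBLING_TIME_MONTHS):
    """Return how many years it takes to grow by `factor`
    under the same fixed doubling time."""
    doublings = math.log2(factor)
    return doublings * doubling_time_months / 12


if __name__ == "__main__":
    for years in (5, 10):
        print(f"After {years} years: about {growth_factor(years):.1f}x")
    # A year's worth of data produced in a single day is a 365-fold increase.
    print(f"365-fold growth takes about {years_to_grow(365):.1f} years")
```

Under these assumptions, five years gives a factor of roughly 10 and ten years a factor of roughly 100, matching the figures in the text. A 365-fold increase (a year's worth of data in a day) takes about 12.8 years at this rate, so the year-2000 comparison is best read as an order-of-magnitude illustration rather than an exact calculation.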

It's essential to understand that "social data" isn't merely some trendy buzzword for social media. Many social media platforms have been designed for broadcasting. In the case of Twitter, communication is almost always moving in one direction, from a celebrity, authority, or marketer to the masses. Social data is far more democratic. You may share information about yourself, your company, your accomplishments, and your opinions through Twitter or Facebook, but your digital traces are much deeper and broader than that. Your searches on Google, your purchases on Amazon, your calls on Skype, the minute-by-minute location of your mobile phone—all these and many more sources come together to produce a unique portrait of you as an individual.

Further, social data doesn't end with you. You create and share data about the strength of your relationships with family, friends, and colleagues through your communication patterns; you create data alongside friends and strangers alike—for instance, when reviewing a product or tagging a photo on Instagram. You verify your identity when you set up an account on Airbnb, the platform for renting a room or house, using your Facebook profile in addition to a government-issued ID. Social data are becoming embedded in homes with smart thermostats, in cars with navigational systems, and in workplaces with team-based software. Such data are beginning to feature in our classrooms and doctors' offices. As mobile phones get loaded up with more sensors and apps, and new devices start tracking your behavior at home, in the mall, and on the job, you'll have less and less ability to control the data that describe your daily routine—as well as your deepest wishes. Data scientists become detectives and artists, painting iteratively clearer sketches of human behavior from our digital traces.

These digital traces are examined and distilled to uncover our preferences, reveal trends, and make predictions, including about what you might buy. During my tenure as chief scientist of Amazon, I worked with Jeff Bezos to develop the company's data strategy and customer-centric culture. We ran a series of experiments to see if customers were happier with their purchases when they were shown editor-written versus consumer-written product reviews, and whether recommendations based on traditional demographic profiling or individual clicks were more successful. We saw the power of genuine communication over manufacturer-sponsored promotions. The personalization tools we created for Amazon fundamentally changed how people decide what to purchase and became the standard in e-commerce.

Since leaving Amazon, I have taught courses on "The Social Data Revolution" to thousands of students, from undergraduates and graduate students at Stanford and the University of California–Berkeley to Chinese business students at Fudan University and China Europe International Business School in Shanghai and Tsinghua University in Beijing. I also continue to run the Social Data Lab, a group of data scientists and thought leaders that I founded in 2011. Over the past decade, in my work with corporations ranging from Alibaba and AT&T to Walmart and UnitedHealthcare, and at major airlines, financial services firms, and dating sites, I have been an advocate for sharing the decision-making power of data with customers and users—regular people like you and me.

No single person can wade through all of the data available today in an effort to make what we used to call an "informed" decision about some aspect of life. But who will have access to the tools that are necessary for leveraging data in service to our problems and needs? Will the preferences, trends, and predictions extracted from data be available to only a few powerful organizations, or will they be available for anyone to use? What price will we have to pay to secure the dividends of our social data?

As we discover the value of social data, I believe we must focus not just on access but also on actions. We face some decisions many times each day, others just once in a lifetime. Indeed, the social data we create today have a long shelf life. The way we behave today may influence the choices we face in the decades to come. Few people have the ability to observe everything they do, or to analyze how their behavior might affect them, in the short or long term. Social data analysis will allow us to better identify the possibilities and probabilities, but the final choice must be deliberate.

One thing these technologies cannot do is decide what sort of future we want—as individuals or a society. The laws in place that protect individuals in many countries from discrimination in the workplace or health care may not exist tomorrow—and in some countries, they do not exist even today. Imagine that you opt to share that you're worried about having high cholesterol with a health app or site in order to get advice about diet and exercise regimens. Could your worries be used against you in some way? What if the law made it permissible to charge you a higher rate for medical care if you refused to stop eating deep-fried food and slouching on the couch after you've been presented with a menu of your health risks and recommendations for healthier choices? What if a manager used a service to crawl the web for information about you, and then, based on what he learned, decided that your lifestyle isn't a good match for a job at his company and he won't consider your application? These are real risks.

If the sole person creating and sharing data about you was yourself, you might be able to withhold information that you thought might be risky. It would cost you a lot of convenience, but it could be possible. However, we do not live in such a world. You have no control over much of the data about you. This fact will become more palpable as social data are utilized by businesses and governments to improve effectiveness and efficiency.

Because social data are so democratic, the questions about how best to handle them touch each and every one of us. Technology is moving fast, and the companies that collect and analyze our data are primarily in the business of creating and coding information, not creating and codifying principles. Many of those questions are being considered on an ad hoc basis, if they're being considered at all. We should not leave decisions about principles that will deeply influence our future in the hands of the data companies.

We can agree to have all of these data collected, combined, aggregated, and analyzed so that we are in a better position to understand the trade-offs in decision-making. Human judgment is crucial to evaluating the trade-offs intrinsic to any important decision. Our lives should not be driven by data. They should be empowered by data.

Principles for the Post-Privacy Age

As we've come to appreciate the increasing role of data in life, there have been several efforts to safeguard citizens' interests. In the 1970s, the United States and Europe adopted broadly similar principles for the fair use of information. Individuals were told they had a right to know who collected what data about them, and how these data were being used. They could also correct data about themselves that were inaccurate.4 These protections are perversely both too strong and too weak for the world of new data sources and analytics that is being built today.

They're too strong because they assume it's possible to keep tabs on all the data collected about you. Amazon might be able to explain in accessible terms exactly how the data the company collects about you are used. It might even be able to do so in a way that helps you make better decisions. But reviewing all this information would require investing a lot of time. How many of us would take the time to trawl through all the relevant data? Would it be useful to you to see how Amazon weighs each data point, or would you prefer to get a summary?5

At the same time, these protections are too weak, because even if you could check every bit of data you have created and shared about yourself, you will not get a full picture of the data about you, which includes data created and shared by others, such as your family, friends, colleagues, and employers. The businesses you visit online, as well as most of those you visit in the physical world, also create (and sometimes share) data. That goes for strangers on the street and a number of other organizations, public and private, with which you interact. Who decides whether these data are accurate or inaccurate? Because data today come from so many perspectives, having the right to correct data about yourself doesn't reach nearly far enough. Finally, even accurate data can be used against you.

With the massive quantitative and qualitative shifts in data creation, communication, and processing, the right to know and the right to correct are clearly insufficient. Thus far, the attempts to update these guidelines have focused almost entirely on maintaining individual control and privacy.6 Unfortunately, this approach is born out of ideals and experiences that are technologically a century out of date. Standards of control and privacy also force individuals to enter an unfair contract with data companies. If you want your decision-making to be improved by data, you usually have to agree to having your data collected on the data collector's terms. Once you've done that, the data company has satisfied the legal requirement to give you individual "control," regardless of how much choice you really had or the effects on your privacy. If you want to maintain personal privacy, you can instead withhold your consent to data collection and forfeit your access to relevant data products and services, reducing the value you get from your data. Enjoy your individual control then.

Today, what we need are standards that allow us to assess the risks and rewards of sharing and combining data, and provide a means for holding companies accountable. After two decades working with data companies, I believe the principles of transparency and agency hold the most promise for protecting us from the misuse of social data while increasing the value that we are able to reap from them.

Transparency encompasses the right of individuals to know about their data: what it is, where it goes, and how it contributes to the result the user gets. Is the company observing you from the "dark" side of a one-way mirror, or does it also give you a window with a view to what it does with your data, so that you can judge whether (and when) the company's interests are aligned with your own?7 How much data about yourself do you have to share to receive a data product or service that you want? Historically, there has been a strong information asymmetry between institutions and individuals, with institutions having the advantage. Not only do institutions have more capacity to collect data about you, they can interpret your data in comparison to others' data. The balance between what you give and what you get needs to be clear to you.

Consider how transparency is designed into the shopping experience at Amazon compared to the traditional relationship between customers and retailers. When you are about to buy an item, should a retailer remind you that you already bought it, potentially losing a sale in the process? At Amazon, if you try to buy a book you've already bought from the site, you're greeted with the query "Are you sure? You bought this item already, on December 17, 2013." If you buy a track from an album of music and then decide to buy the rest of it, Amazon will "complete the purchase," automatically deducting the amount you have already spent on the track from the current price for the album. Amazon surfaces and uses data about your purchasing history in these ways because the company wants to minimize customer regret. Likewise, many airline frequent flyer programs now send you a reminder that your miles are about to expire rather than letting them quietly disappear from the company's books.

Unfortunately, transparency is far from the norm. Consider the far-too-typical experience of calling your favorite customer service center. At the start of the call, you'll inevitably receive the warning: "This call may be recorded for quality assurance purposes." You've got no choice: you must accept the company's conditions if you want to talk to a representative. Okay, but why is that recording accessible only to the business? What, really, does "quality assurance purposes" mean when only one side of the conversation is assured access to the record of what was agreed? The principle of data symmetry would also give you, the paying customer, access to the recording.

Whenever I hear that my call might be recorded, I announce to the customer service rep that I might also record the call for quality assurance purposes. Most of the time, the rep plays along. Occasionally, however, the rep hangs up. Of course, I could record the call without asking for the rep's permission—which, I should note, is against the law in some places. Then, if I don't get the service I was promised, I could appeal to a manager with my evidence in hand. If that still didn't work, I could upload the audio file in the hopes that it might go viral and the company feels pressured to fix things quickly—as Comcast did when a customer tried to cancel services but was rebuffed again and again, finally succeeding after his recording started trending on Twitter.8

You shouldn't have to break the law to level the playing field in this way. To make transparency the new default, you need more information to be public, not less.

But transparency isn't enough; you also need agency.9 Agency encompasses the right of individuals to act upon their data. How easy is it for you to identify the data company's "default" settings, and are you allowed to alter them for whatever reason you like? Are you able to act upon the data company's outputs in any way you choose, or are you gently nudged (or forcefully pushed!) toward only some options—mostly the options that are best for the company? Can you play with the parameters and explore different scenarios to show a smaller or bigger range of possibilities? Agency is an individual's power to make free choices based on the preferences and patterns detected by data companies. This includes the ability to ask data companies to provide information to you on your own terms.

On a fundamental level, agency involves giving people the ability to create data that are useful to them. Amazon wholeheartedly embraced uncensored customer reviews. It didn't matter to the company if the reviews were good or bad, five stars or one, written out of a desire to gain approval from others or to achieve a lifelong dream of becoming a book critic. What mattered was their relevance to other customers who were trying to figure out what to purchase. Reviews revealed whether a customer regretted a purchase even though she did not return the item for a refund. These data helped customers decide if a recommended product was the best choice for them. Amazon gave customers more agency.

Many marketers talk about targeting, segmentation, and conversion. I don't know about you, but I don't want to be targeted, segmented, converted, or sliced and diced. These aren't expressions of agency. We can't assume that the leaders of every company will, on their own, embrace the principles of transparency and agency. We must also go beyond these principles: we need delineated rights that help to spell out how to translate transparency and agency into tangible, hands-on tools.

If we can get data companies to agree to a set of meaningful rights and tools, it will lead to what I call "sign flips"—reversals in the traditional relationships between individuals and institutions. Amazon's decision to let customers write most of the content about products is a sign flip, and the social data revolution will provide many more similar opportunities. As individuals gain more tools to help them make better decisions for themselves, old-fashioned marketing and manipulation are becoming less effective. Gone is the day when a company could tell a powerless customer what to buy. Soon, you will get to tell the company what to make for you. In some places, you already can.

Sign flips are an important element in how physicists see the world. They are often associated with phase transitions, where a change in an external condition results in an abrupt alteration in the properties of matter—water changing from a liquid into a gas when it is heated to the boiling point. The effect on society of the increasing amount of data can be compared to the increasing amount of heat on a physical system. Under certain conditions—when data companies provide transparency and agency for users—a sign flip will take place that favors the individual over the institution; that is, it will benefit you, not the company, or the company's chief marketing officer.

We the people all have a stake in the social data revolution. And if you want to benefit from social data, you must share information about yourself. Period. The value you reap from socializing data often comes in the form of better decision-making ability, when negotiating deals, buying products and services, getting a loan, finding a job, obtaining education and health care, and improving your community. The price you pay and the risks you take in sharing data must at least be offset by the benefits you receive. Transparency about what data companies are learning and doing is essential. So, too, is your ability to have some control over data products and services. Otherwise, how could you possibly judge what you give against what you get?

Balancing the Power

Information is at the center of power. Those who have more information than others almost always stand to benefit, like the proverbial used-car salesman who pushes a lemon on an unwitting customer. As communication and processing have become cheap and ubiquitous, there's a lot more data—and a lot more risk of substantial information imbalances, since no individual can get a handle on all the data out there.

Much of the data being created and shared is about our personal lives: where we live, where we work, where we go; who we love, who we don't, and who we spend our time with; what we ate for lunch, how much we exercise, and which medicines we take; what appliances we use in our homes and which issues animate our emotions. Our lives are transparent to the data companies, which collect and analyze our data, sometimes engaging in data trafficking and too often holding data hostage for use solely on their terms. We need to have some say in how our data are changed, bartered, and sold, and set more of the terms on the use of our data. Both sides—data creator and data company—must have transparency and agency.

This will require a fundamental shift in how we think about our data and ourselves. In the first chapter, I explain several of the ways data companies analyze data, adopting the metaphor of the refining process, whereby the companies transform raw data into products and services. Then, in Chapter 2, I turn to individuals and their attributes, and how the cumulative digital traces of our lives—our searches, clicks, views, taps, and swipes—are destroying any illusion of privacy, creating new concepts of identity, and indicating honest signals of interest, whether we want them to or not. Next, in Chapter 3, I shift the focus from the individual to the connections between individuals, and how social networks reveal and reshape trust in the digital age. I then in Chapter 4 look at how our context is being recorded at finer and finer resolution, as sensors of all types—not just cameras—are networked, and the data they collect are analyzed to infer our location, emotional state, and level of attention.

With this foundation, I lay out the six rights that I feel are essential to ensuring that future data of the people and by the people will be data for the people. Two of these rights—the right to access data and the right to inspect the data companies—are committed to the cause of increasing transparency. The remaining four rights are focused on giving us more agency, through the right to amend data, the right to blur data, the right to experiment with data, and the right to port data to other companies. Applying these rights to our data and their use will have consequences for how we buy, how we pay and invest, how we work, how we live, how we learn, and how we manage public resources, as we will see in the closing chapter on turning these six rights into realities.

  • "An exhaustive and insightful look at how data is collected and used... Weigend argues persuasively that in this 'post-privacy' world, we should give our data freely, but that we should expect certain protections in return."
    --New York Times Book Review
  • "[Weigend] makes a strong case for what we need – the right to amend or blur the data that pertains to us, the freedom to experiment with it and take it with us to other sites and services, and the ability to insist that data refineries be clear about how they're using our information."
    --Wall Street Journal
  • "A hugely interesting read, packed to bursting with intriguing examples... The depth and breadth of Weigend's experience is clear in the sheer range of technologies and business models he describes. He explains critical concepts clearly and concisely, at a pace that should keep both experts and those new to the field hooked."
    --New Scientist
  • "Weigend is a bold explorer of the technological future. His compelling book maps the opportunities of a world without secrets."
    --Daniel Kahneman, author of Thinking, Fast and Slow
  • "Data for the People asks us to think seriously about the data we generate in our online world, and how we are increasingly losing control over it. These products and services that generate data are not going away. And with advances in artificial intelligence enabling computers to do traditionally human tasks in a scalable manner, this data can and will continue to be utilized across the majority of decisions by institutions. Andreas acknowledges and embraces this future, and provides a framework and a call to action to ensure that in this world, as consumers, we can use and control our data in ways that are both transparent and beneficial to us."
    --Vinod Khosla, Partner at Khosla Ventures
  • "The author maintains the intellectual complexity of his subject while remaining accessible to readers searching for the truth about the salability of their privacy, the nuances of data sharing, and the ways to cloak their digital footprints. A cautionary, cohesively delivered update on the scope and science of human quantification."
    --Kirkus Reviews
  • "Data – abundant, ubiquitous, personal – is restructuring our competing values of privacy, convenience, identity, and control. No one understands this better than Weigend, and with Data for the People, he helps the rest of us understand it as well."
    --Clay Shirky, author of Here Comes Everybody
  • "Andreas Weigend is the preeminent thinker on the economic power of social data. Data for the People is a brilliant guide for how individuals, companies and policymakers can tap data's value while retaining our human values. Thought provoking – and action-inspiring!"
    --Kenneth Cukier, Senior Editor, The Economist and coauthor of Big Data
  • "Data is the new oil – the key means of production in modern capitalism. Big data refineries such as Google, Amazon, Facebook, and OKCupid influence where we work, what we buy, who we marry, and how we vote – in ways that very few people understand, much less control. This is an excellent book about the biggest ever challenge to human privacy and autonomy. Social data expert Andreas Weigend explains the incredibly detailed data we give to these companies, how it's used to nudge our decisions, and how we can take back control so our data empower us rather than exploiting us."
    --Geoffrey Miller, associate professor of psychology at the University of New Mexico
  • "Finally a highly readable and heartfelt book about data by a leading technologist! Andreas Weigend writes with superb clarity about the most important issue of the early 21st century – the data economy and its threat to our privacy and individual rights. The narrative of his own personal journey from East Germany to becoming the Chief Scientist at Amazon.com is also compelling. Overall a major work by one of the world's leading authorities on data."
    --Andrew Keen, author of The Internet Is Not the Answer
  • "This book is a landmark in the debate on privacy and data sharing. Everyone whose data is being captured and mined – in other words, everyone – should heed Weigend's call for data literacy and support his 'Data Bill of Rights.'"
    --Pedro Domingos, author of The Master Algorithm and professor of computer science at the University of Washington

On Sale: Jan 31, 2017
Page Count: 272 pages
Publisher: Basic Books
ISBN-13: 9780465096534

    Andreas Weigend

    About the Author

    Andreas Weigend is one of the world’s foremost experts on the future of big data, social-mobile technologies, and consumer behavior. He teaches at Stanford University, the University of California, Berkeley, and Cheung Kong Graduate School of Business in China. He is the founder and director of the Social Data Lab. He lives in San Francisco, California.
