
Personal data harvested by marketers is growing so vast and far-reaching that it is threatening to unleash a new wave of digital discrimination, one that ordinary people won't even be able to see happening, Microsoft principal researcher Kate Crawford is warning.
Combining the troves of information collected by retailers, mobile carriers, Internet companies and others into massive databases creates so-called big data sets. Computers then troll the data looking for patterns that can be used to make predictions about consumer habits.
 “Some people think that big data is really quite fantastic because you're working at a mass level and therefore you can't actually conduct group-based discrimination,” Crawford said, speaking at the EmTech conference at MIT last week. “It's actually quite the opposite. Big data is not color blind, it's not gender blind and, in fact, marketers are using big data to have ever-more precise categories about you.”
 A recent study at Cambridge University looking at almost 60,000  people’s Facebook “likes” was able to predict with high degrees of  accuracy their gender, race, sexual orientation and even a tendency to  drink excessively. The model could tell a gay man from a straight man  correctly 88% of the time and predict race with 95% accuracy, for  example. Government agencies, employers or landlords could easily obtain  such data, Crawford warns.
 A lender, for example, who didn't want borrowers of a certain race  could show online offers only to people whose social network activity  fit certain parameters. Banks must report detailed statistics about  their actual lending activity to regulators, but web advertising  parameters are seemingly free of discrimination. By never putting offers  in front of unwanted groups, and thus never formally rejecting them,  those who engage in online discrimination could sidestep fair lending  and redlining laws that apply in the physical world.
 Most concern about data collection has focused on the government,  particularly after the revelations from former National Security Agency  contractor Edward Snowden. Crawford welcomed the increased skepticism  following the Snowden leaks but warns there is much potential harm from  commercial misuse of data, as well.
 “It's not that big data is effectively discriminating -- it is, we  know that it is,” says Crawford. “It's that you will never actually know  what those discriminations are.”
 Another problem can arise when collected data isn’t representative of  the entire population. For example, well-off people are more likely to  carry smartphones than the poor. Two years ago, the City of Boston  released an app called Street Bump that automatically sends reports  about potholes using data from smartphone sensors. But the city had to  be mindful that reports were more likely to come from areas with higher  phone ownership rates.
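 One simple way to account for this kind of skew is to compare reports per phone owner rather than raw report counts. The sketch below is only an illustration of that idea, not a description of how Street Bump works: the function name, the area labels and all of the numbers are hypothetical.

```python
def reports_per_phone_owner(report_counts, phone_owners):
    """Return reports divided by the number of phone owners in each area.

    Both arguments are dicts keyed by area name. All values used with this
    sketch are made-up illustrations, not real Boston data.
    """
    rates = {}
    for area, count in report_counts.items():
        owners = phone_owners.get(area, 0)
        # An area with no known phone owners cannot be compared at all.
        rates[area] = count / owners if owners else None
    return rates


# Hypothetical example: area A files more reports than area B,
# but it also has more than twice as many phone owners.
raw_reports = {"A": 90, "B": 40}
owners = {"A": 900, "B": 400}
rates = reports_per_phone_owner(raw_reports, owners)
```

 In this toy case the raw counts suggest area A has far more potholes, while the per-owner rates for the two areas are the same. An area where nobody carries a smartphone yields no rate at all, which is the gap the article describes.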
 Big data predictions and pigeon-holing can also be harmful when wrong. A decade ago, some TiVo users spent weeks trying to convince their machines to stop recording shows aimed at demographic groups they weren't in. "If TiVo Thinks You Are Gay, Here's How to Set It Straight," read one Wall Street Journal headline from 2002. Mistaken algorithms today could scare off employers, college admissions officers or others screening candidates via big data. "If I predict something about you and I'm right, that can be just as dangerous as if I predict something about you and I am wrong," Crawford says.
 Crawford also wants to temper the excitement around studying  real-time Twitter activity to guide rescue efforts during natural  disasters. A review of activity on the social network during Hurricane  Sandy last year, for example, found that the peaks of activity occurred  not in places with the most damage or need for help, like the outskirts  of Queens and Staten Island, but in areas where Twitter use was most  prevalent, like Manhattan.
 Databases are now combining a vast array of different sources –  everything from the output of mobile apps and Web searches to radio tags  on items bought at a store and phone-location trackers.
 Even data scrubbed to remove personal references can be reconnected to individuals. Cellphone carriers are selling collections of data about phone movements, for instance, with all personal details removed. But a group of researchers from MIT, the Université catholique de Louvain in Belgium and other institutions looked at one such collection and were able to pinpoint 95% of the unique users by analyzing just four GPS time and location stamps per person.
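 The idea behind that kind of finding can be sketched in a few lines of code: pick a handful of time-and-place points from one person's trail and count how many trails in the data set contain all of them. What follows is a minimal illustration under stated assumptions, not the researchers' actual method. The function name, the data layout (a set of (hour, location) pairs per user) and the toy records are all hypothetical.

```python
import random


def fraction_unique(traces, k, seed=0):
    """Estimate the share of users pinned down by k of their own points.

    traces maps a pseudonymous user id to a set of (hour, location) pairs.
    For each user with at least k points, k points are drawn at random and
    the user counts as unique if no other trace contains all of them.
    """
    rng = random.Random(seed)
    eligible = 0
    unique = 0
    for user, points in traces.items():
        if len(points) < k:
            continue  # not enough points to draw a sample from
        eligible += 1
        sample = set(rng.sample(sorted(points), k))
        # The user's own trace always matches, so one match means unique.
        matches = [u for u, pts in traces.items() if sample <= pts]
        if len(matches) == 1:
            unique += 1
    return unique / eligible if eligible else 0.0


# Hypothetical toy data: no names attached, only hours and places.
toy_traces = {
    "u1": {(8, "A"), (9, "B"), (18, "C")},
    "u2": {(8, "A"), (9, "B"), (18, "D")},
    "u3": {(8, "A"), (12, "E"), (18, "C")},
}
share = fraction_unique(toy_traces, k=3)
```

 In the toy data every user shares the 8 a.m. location, yet each full set of three points belongs to exactly one person, so all three users are singled out. Linking a singled-out trail to a name still requires outside information, such as knowing where someone lives and works.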
 Several years ago, researchers at Carnegie Mellon University were able to create a system to uncover Social Security numbers from birthday and hometown information listed on social networking sites like Facebook.
 All the studies point to a need  for additional protections and awareness, Crawford says. “We can't  afford to set up a system with no opt out and no protections for its  citizens,” she says. “Frankly, it doesn't take a science-fiction  scenario to realize what is at stake.”