What data scientists can learn about ethics and privacy from the example of OkCupid.
Ever looked for a product online and noticed the same or related product suggestions pop up on Google and Facebook? Ever wondered how your keyboard suggests the exact words or emoticons you were about to type? The key to such intuitive behavior on the part of a machine lies in data science.
Data science uses algorithms and machine learning to extract hidden patterns from raw data. In just a few short years, data has become one of the most valuable commodities in the global economy. By studying the data people generate through their online and offline activity, data science can reveal deeply personal aspects of their lives.
Industries such as finance, investment banking, airlines, and computing leverage data science to run their operations. Data science helps not only with retrospective analysis but also with prediction.
Alter-ego of data science
But like most things in life, data science has its pros and cons. Storing and managing huge volumes of personal data belonging to millions of individuals raises concerns about privacy, data protection, and data leakage. After all, critical data could be misappropriated by hackers, scammers, and identity thieves. An individual could have their identity stolen or their entire financial portfolio erased in a matter of minutes.
From dating to data leaks
Let’s look back at the OkCupid incident of a few years ago. A group of researchers publicly released a dataset covering nearly 70,000 users of the online dating site. It comprised their usernames, age, gender, location, relationship preferences, and more. It also included their responses to various profiling questions used by the site.
When asked whether the researchers had attempted to anonymize the dataset, lead researcher Emil O. W. Kirkegaard nonchalantly replied that the data was already public. He further added, “Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.” This response angered many people who care about how their personal data is used.
Data science can draw on data from every platform an individual has used. In the above scenario, what looked like a move made with benevolent intentions turned out to be a breach of privacy.
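To make the anonymization question concrete, here is a minimal Python sketch of what a first pass could look like for a record with the fields listed above. It is not what the researchers or OkCupid did. The field names, the `SECRET_KEY` value, and the helper functions are all hypothetical, and the sample record is made up for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch); in practice it
# would be generated randomly and stored separately from the dataset.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(username: str, key: bytes = SECRET_KEY) -> str:
    """Replace a username with a keyed hash so it cannot be looked up directly."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed or coarsened."""
    return {
        "user_id": pseudonymize(record["username"]),
        "age_band": age_band(record["age"]),
        "gender": record["gender"],
        # Keep only the last, coarsest part of the location (assumed "City, Country").
        "region": record["location"].split(",")[-1].strip(),
    }


# Made-up sample record, for illustration only.
sample = {
    "username": "example_user",
    "age": 27,
    "gender": "F",
    "location": "Springfield, USA",
}
result = anonymize(sample)
print(result)
```

Even a pass like this is only pseudonymization: detailed answers to profiling questions can still single a person out, so it reduces the risk of re-identification rather than removing it.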
Implications for big companies
Data collection involves ethical considerations that may or may not be followed. Leading companies like Google are expected to record and obtain only data that benefits the consumer. Research ethics involves protecting the privacy of consumers or individuals, obtaining informed consent, maintaining the confidentiality of any data collected, and minimizing any harm. When these principles are compromised, it can destroy public trust and invite expensive lawsuits.
Data science is vast and has taken root in every industry. With a large chunk of the global population hooked up to social media and e-commerce, companies have access to large volumes of intimate information. It is important that organizations develop processes to safeguard such information for the sake of ethics, customer trust, and, consequently, the business itself.
Regulating data sharing is necessary
In my opinion, there should be a regulatory framework around making information public. Data science must be used as a way to understand users, devise ways to improve services, and ultimately benefit both the organization and the end user. If information must be made public, individual consent must be obtained first. Data science should only be used to make relevant information public, that is, information that benefits the public at large.