‘Despite all the hype, machine learning is not a be-all and end-all solution. We still need social scientists if we are going to use machine learning to study social phenomena in a responsible and ethical manner.’


I came across this quote in an article in Communications of the ACM[1][2] by machine learning specialist Hanna Wallach. The article is called ‘Computational Social Science ≠ Computer Science + Social Data’.

Hanna points out that the kind of machine learning conducted by computer scientists and the kind of big data analysis needed by social scientists are very different things. Traditional machine learning applications are primarily for tasks like recognising handwriting or classifying images. She calls these ‘prediction tasks’. Prediction tasks use observed data to reason about future or yet-to-be-observed data.
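To make ‘prediction task’ concrete, here is a minimal sketch (my illustration, not from the article): a model is fitted on observed, labelled data and then used to reason about data it has never seen, in this case scikit-learn's built-in handwritten-digits dataset.

```python
# A "prediction task" in Wallach's sense: learn from observed data,
# then predict labels for yet-to-be-observed data.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 8x8 images of handwritten digits, 10 classes

# Hold out a quarter of the data to stand in for "unobserved" examples.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)             # learn from observed data
accuracy = model.score(X_test, y_test)  # evaluate on unseen data
print(f"held-out accuracy: {accuracy:.2f}")
```

Note that the model is judged purely on predictive accuracy; nothing about it needs to be interpretable, which is exactly the contrast Hanna draws with social-science modelling below.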

According to Hanna, that means that computer scientists are happy to work with ‘arbitrarily complex black boxes that require large amounts of data to train. For example GoogLeNet, a “deep” neural network, uses 22 layers with millions of parameters to classify images into 1000 distinct categories.’

Social scientists’ use of big data is different. Rather than classify, social scientists generally want to build a model of human behaviour that makes sense in the real world. In Hanna’s case, her question was whether local government officials would be more likely to comply with a request for public records if they knew that peers in other organisations had already done so. To study this question, she collected half a million volunteered public emails from U.S. local governments, creating a very large dataset. Only after creating it did she realise that the dataset could also be used for the kind of machine learning described above, creating the potential for significant privacy concerns.

Privacy is obviously important, but there is another ethical problem here: datasets like these have built-in biases that are visible neither to the machine nor to the naked eye. The most common bias is the under-representation of minorities. How can anyone tell which segments of the population the conclusions drawn from these datasets are valid for? They can’t.
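One partial remedy is to audit representation before modelling. The sketch below is hypothetical (the group labels and benchmark shares are made up, not figures from Hanna’s dataset): it compares each group’s share of a dataset against its known share of the population, flagging under-represented groups.

```python
# Hypothetical representation audit: compare each group's share of the
# dataset to its share of the wider population.
from collections import Counter

def representation_gap(records, population_shares):
    """Return dataset share minus population share for each group.

    Negative values mean the group is under-represented in the data.
    """
    counts = Counter(r["group"] for r in records)
    total = len(records)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }

# Toy dataset of 100 records in which group B is badly under-represented.
records = [{"group": "A"}] * 90 + [{"group": "B"}] * 10
population_shares = {"A": 0.60, "B": 0.40}  # assumed census benchmarks

gaps = representation_gap(records, population_shares)
print(gaps)  # group B's share falls roughly 0.30 below its population share
```

A check like this only catches biases for attributes you have benchmarks for; as the article argues, deciding which attributes matter is a social-science question, not a purely computational one.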

Her conclusion: ‘we must treat machine learning for social science very differently from the way we treat machine learning for say handwriting recognition or playing chess … we need transparency. We need to prioritise interpretability … and conduct rigorous detailed error analyses … and most importantly work with social scientists to understand the ethical implications and consequences of our modelling decisions.’

The answer to her question, incidentally, was ‘yes’.

To buy a copy of the article, contact [email address obscured by the site’s spam protection].


[1] Communications of the ACM, March 2018, Vol. 61, No. 3

[2] The ACM is the Association for Computing Machinery

Tags: Market Research, Artificial Intelligence, Machine Learning, Ethics, Big Data