The growing availability of large datasets and powerful analytical tools is creating both methodological and ethical issues for data scientists in the academy and industry alike. Qualitative methods, developed in the social sciences, offer data scientists a new toolbox to apply to these challenges.
Despite increasing awareness of its limitations, machine learning is still being used to automate a wide variety of decisions. These automated decisions sometimes have terrible results for people, who may find themselves unable to secure a job, or trying to disprove the existence of a large debt, when a machine has made the decision and there is no human to appeal to. Part of the process of building any predictive model should be rigorous engagement with the underlying data. Better attention to the datasets used to make decisions would help data scientists to fulfil their ethical duties and avoid creating tools that cause further harm to already vulnerable people. Researchers who use qualitative methods, such as social scientists, are deeply engaged with the question of how to understand data as a product of human endeavour. While some big data enthusiasts have argued that the larger the dataset, the more closely it reflects reality, critics from anthropology and related disciplines have argued that any dataset is the result of human decision-making and that each researcher comes to it anew with their own questions and assumptions. The archives are haunted by those who went before us to curate our data, and the data itself is re-animated each time a new researcher engages with it. This presentation will argue that data science, especially when it is applied to questions of human behaviour, shares as much with the social sciences as it does with the computational sciences. By embracing this complexity, a richer set of approaches to data becomes available. Rather than trying to approach data as inert and unchanging, we become sensitive to the presence of those who have gone before us to shape it and more attuned to our own expectations. By learning to see these ghostly collaborators, data scientists have the opportunity to bring new nuance and subtlety to their work and build a more ethical and responsible data science.
Fiona Tweedie loves data in (almost) all of its forms. She is particularly interested in balancing open data with ethics and privacy, and in the intersections between the humanities and sciences in data-intensive research. She came to Python by teaching text mining to postgraduate researchers, after her experiences running GovHack in Melbourne persuaded her that she wanted to learn to code. She currently works as the data policy advisor in the University of Melbourne’s Digital and Data team. Fiona holds a PhD in Ancient History and will also talk to you about the Roman conquest of Italy.