Vast quantities of data about individuals are increasingly being created by services such as mobile apps and online social networks and through methods such as DNA sequencing. These data are rich, containing a large number of fine-grained data points related to human biology, characteristics, behaviors, and relationships over time. They hold tremendous potential for scientific inquiry, as they can enable researchers to explore research questions at an unprecedented level of detail. In addition, the costs of obtaining, storing, and analyzing these types of data are low, and falling, relative to the costs of conducting traditional research studies. For these reasons, large-scale data are driving growth in fields such as computational social science and biomedical big data research. As just one example, public health researchers are supplementing traditional methods of disease outbreak detection with streams of data from social networks, chat rooms, and web search queries. Interest in big data for research is expected to keep growing as large-scale data sources multiply and the capabilities for big data analysis advance. We argue, however, that the current framework for research oversight is ill-suited to large-scale data research.