Despite the many benefits of high-performance machine learning systems for augmented analytics, their emergence over the last 10 years has fostered a “plug-and-play” analytical culture, in which high volumes of opaque data are thrown arbitrarily at an algorithm until it yields useful business intelligence. What does this mean for data audit? Let’s take a look.
Data Audit and the Black Box Problem
Because a typical machine learning workflow operates as a black box, it can be difficult to understand or account for how much “dark” data survives these processes, or how far the unacknowledged provenance or unexplored scope of the data sources could legally expose a downstream application later on.