In the area of data quality, data protection and data governance, representativeness is a key factor: the training data must be sufficiently complete to represent the intended use and avoid bias. Transparent provenance documents where the training data originates and whether it was collected responsibly. Data quality also covers timeliness and accuracy: the data must be clean, up to date and consistent. This applies not only to the raw data but also to the labels of both training and test data. Clear labelling guidelines should exist, and methods such as redundant multi‑labelling, in which several annotators label the same item independently, should be used.
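As an illustration of the redundant multi‑labelling mentioned above, the following minimal sketch aggregates several independent labels per item by majority vote and flags items with low annotator agreement for review. The function names `aggregate_labels` and `review_queue`, the sample item ids and the threshold of 0.6 are assumptions made for this example, not part of any prescribed method.

```python
from collections import Counter


def aggregate_labels(votes):
    """Return (majority_label, agreement) for one item's redundant labels.

    agreement is the share of annotators who chose the most frequent label.
    majority_label is None when the top labels are tied.
    """
    if not votes:
        raise ValueError("at least one label is required")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    # A tie exists if the runner-up has as many votes as the leader.
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    agreement = top_count / len(votes)
    return (None if tied else top_label), agreement


def review_queue(annotations, min_agreement=0.6):
    """Return ids of items that are tied or fall below min_agreement.

    min_agreement is a hypothetical threshold; a real project would
    set it in its labelling guidelines.
    """
    flagged = []
    for item_id, votes in annotations.items():
        label, agreement = aggregate_labels(votes)
        if label is None or agreement < min_agreement:
            flagged.append(item_id)
    return flagged


# Hypothetical example: three annotators labelled each image.
annotations = {
    "img_001": ["cat", "cat", "cat"],
    "img_002": ["cat", "dog", "cat"],
    "img_003": ["dog", "cat", "bird"],
}

for item_id, votes in annotations.items():
    print(item_id, aggregate_labels(votes))
print("needs review:", review_queue(annotations))
```

Items that end up in the review queue are candidates for relabelling or for a revision of the labelling guidelines, since persistent disagreement often points to an ambiguous class definition rather than to careless annotators.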
Protection of personal and proprietary data is essential. Proprietary data, meaning sensitive internal information that is not publicly accessible and may provide a competitive advantage, must be secured. The protection of individuals is equally important, particularly with regard to identifiability, which can arise, for example, in image processing. Data governance describes the extent to which a strategic framework of rules, processes, roles and policies ensures the availability, quality, integrity and security of data throughout its entire lifecycle.