Accuracy of training data and model outputs in Generative AI: CREATe Response to the Information Commissioner’s Office (ICO) Consultation

Posted on 28 May 2024 by Zihao Li

Blog | Decentralisation, automation and platforms | Policy

In January 2024, the UK Information Commissioner’s Office (ICO) launched a consultation series on the use of generative AI models and the application of data protection law. As a research centre focused on technology regulation, CREATe is undertaking research that addresses the specific questions raised in the third call of the ICO’s consultation, particularly concerning the accuracy of training data and model outputs.

Led by Zihao Li, Weiwei Yi and Jiahong Chen, our response emphasises that the accuracy of generative AI is increasingly critical as Large Language Models (LLMs) become more widely adopted. Because of potential flaws in training data and hallucination in outputs, inaccuracy can significantly harm individuals’ interests by distorting perceptions and leading to decisions based on flawed information. We critically engage with the ICO’s analysis of data protection accuracy and generative AI, arguing that although it provides a comprehensive understanding of data accuracy in generative AI, five overarching issues have been overlooked or underestimated:

  1. Merely disclosing a GenAI model’s statistical accuracy is insufficient and can give rise to an “Accuracy Paradox”: the unintended consequence that disclosure alone fosters a misleading sense of reliability. As accuracy metrics improve, users may trust AI outputs without sufficient verification, increasing the risk of accepting erroneous information.
  2. Increasing the accuracy of inputs, models, and outputs often comes at the cost of privacy, especially in the GenAI context. The risk involves not only the technical identifiability of the individuals concerned, but also societal harms such as more accurate and precise targeting for commercial purposes, social sorting, and group privacy implications.
  3. Overreliance on developers’ and deployers’ legal compliance with the accuracy principle is overoptimistic and not pragmatic; the burden could ultimately fall on users, particularly given the tendency to deploy dark patterns. In this context, GenAI developers and deployers could use such manipulative design to shift responsibility for data accuracy onto users.
  4. We argue that content moderation can serve as a tool to mitigate inaccuracy and untrustworthiness. Playing a critical role in ensuring the accuracy, reliability, and trustworthiness of GenAI, content moderation can filter out flawed or harmful content; this involves refining detection methods to identify and exclude incorrect or misleading information from training data and model outputs.
  5. Accuracy of training data does not directly translate into accuracy of outputs, especially in the context of hallucination. Even when most training data is reliable and trustworthy, the recombination of trustworthy data into new answers in a new context may yield untrustworthy results, because the trustworthiness of information depends on its context and circumstances.

At the end of our response, we also identify several measures that organisations can use to improve statistical accuracy or minimise inaccuracy. For example, zero-shot fact verification, automated error correction systems within LLMs, and dynamic real-time information injection can, to some extent, alleviate content hallucination and data relevance issues, thereby improving the accuracy of model outputs.
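To give a concrete flavour of one such measure, here is a minimal Python sketch of dynamic real-time information injection (a retrieval-augmented prompting pattern). The `search_index` and `call_llm` helpers are hypothetical placeholders introduced purely for illustration; a real deployment would substitute an actual retriever and LLM client, and nothing here reflects the ICO’s or CREATe’s implementation.

```python
"""Illustrative sketch of dynamic real-time information injection.
`search_index` and `call_llm` are hypothetical stand-ins, not real APIs."""

from dataclasses import dataclass


@dataclass
class Document:
    source: str  # provenance, so users can verify rather than over-trust
    text: str


def search_index(query: str, top_k: int = 3) -> list[Document]:
    """Hypothetical retriever: in practice, a vector store or search API."""
    corpus = [
        Document(
            "ICO consultation, Jan 2024",
            "The ICO consultation series examines how data protection "
            "law applies to generative AI, including accuracy.",
        ),
    ]
    return corpus[:top_k]


def call_llm(prompt: str) -> str:
    """Hypothetical model call: substitute any LLM client here."""
    return "Answer grounded in the cited sources (placeholder output)."


def answer_with_injection(question: str) -> str:
    """Inject freshly retrieved, attributed context into the prompt so the
    model answers from current evidence rather than stale training data."""
    docs = search_index(question)
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    prompt = (
        "Answer using ONLY the sources below; say 'unknown' if they are "
        f"insufficient.\n\nSources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)


if __name__ == "__main__":
    print(answer_with_injection("What does the ICO consultation cover?"))
```

Attaching source provenance to each injected passage also speaks to the Accuracy Paradox above: users can check specific claims against cited evidence instead of relying on an aggregate accuracy figure.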

Addressing these issues requires an interdisciplinary approach that brings together expertise from technology, law, and ethics. Ongoing research and dialogue with stakeholders, including policymakers, industry leaders, technology designers and civil society, are crucial for adapting to the rapidly evolving landscape of AI. At CREATe, we are committed to contributing to this multi-faceted effort. Please stay tuned for our upcoming research and interdisciplinary dialogue.

You can access the text of the response below.

Final_CREATe_ICO Policy Consultation Response