AI Training – Misunderstood and Misrepresented

Introduction: The Need for Clarity in AI Understanding

The landscape of AI and journalism is rapidly evolving, but it’s clouded by significant misconceptions, as highlighted in OpenAI’s recent blog post about its relationship with journalism, which specifically addresses the lawsuit filed by The New York Times. The misunderstanding centers on how AI, particularly Large Language Models (LLMs) like ChatGPT, is trained. Influenced by lawsuits and media portrayals, the general public often compares AI training to making photocopies or storing text in a database or in a file on a hard drive. However, the reality of AI training is far more complex and nuanced, involving processes like vectorization, which is fundamentally different from simple copying or storing.

The Core of the Misunderstanding: Vectorized Data vs. Photocopying

To illustrate the distinction, consider this analogy: If a business hires someone to check out three books from a library, not to photocopy them, but to count all the words and letters, analyze the writing style, and then use that analysis to write something new, this falls under ‘Fair Use’. AI training through vectorization is akin to this process. It’s about understanding patterns, styles, and structures of language, rather than simply replicating text.
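To make the "count the words, don't photocopy the pages" idea concrete, here is a minimal Python sketch of one very simple kind of vectorization, a word-count ("bag of words") vector. It is a toy illustration only: the function names and sample sentences are invented for this example, and real LLM training relies on tokenization and learned numerical representations that are far more elaborate than counting words.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents: list[str]) -> list[str]:
    """Collect every distinct word across the documents, sorted for a stable order."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    """Represent the text as word counts, one position per vocabulary entry."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical stand-ins for the "books" in the library analogy.
books = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

vocabulary = build_vocabulary(books)
vectors = [vectorize(book, vocabulary) for book in books]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each sentence becomes a list of numbers recording how often each vocabulary word appears. In this toy representation the word order is discarded, so the vector describes a pattern in the text without being a copy of the sentence itself.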

Three Real-World Analogies to Demystify AI Training

  1. Cooking from a Recipe Book: Imagine a chef who reads several recipe books, learns various cooking techniques and ingredient combinations, and then creates a new dish. The chef doesn’t photocopy the recipes but uses the acquired knowledge to innovate. Similarly, AI learns from data to generate new, original content.
  2. Music Composition: Consider a musician who listens to different music genres, absorbing rhythms, melodies, and harmonies. They then compose a unique piece of music inspired by what they’ve learned. This process is not about copying but about understanding and creatively applying musical concepts.
  3. Architectural Design: An architect studies various building designs and architectural principles from multiple sources. They then design a unique structure influenced by these concepts but not directly copying any single design. This is analogous to how AI models learn and create.

Historical Correlations: AI and Past Innovations

The current situation with AI mirrors past technological advancements. In the 90s, the rise of the Internet brought similar legal and ethical debates about information sharing and copyright. Before that, the evolution of records, tapes, VHS, and photocopies faced their own challenges. Each of these innovations initially met with resistance and misunderstanding, much like AI today. However, they eventually became integrated into society as their true nature and potential were understood.

Conclusion: Embracing AI’s True Potential

The key to moving forward is education and open dialogue. Just as society adapted to past technological changes, understanding AI’s true mechanism is crucial. AI, through its advanced training methods like vectorization, offers immense potential for innovation across various fields, including journalism. It’s time to shift the narrative from misrepresentation to a more informed and constructive understanding of AI’s capabilities and limitations.


  1. What is the main misconception about AI training? The primary misconception is that AI training is akin to making photocopies or storing text, whereas it’s more about analyzing and understanding patterns in data to create something new.
  2. How is AI training like a chef using a recipe book? AI training is similar to a chef who learns from recipe books and then creates a new dish. The chef doesn’t copy the recipes but uses the learned techniques and ingredients to innovate.
  3. Why is the analogy of music composition relevant to AI training? Like a musician who composes a unique piece after absorbing various musical styles, AI models generate original content by understanding different data patterns, not by copying them.
  4. Can you draw a parallel between AI and past technological innovations? Yes, the evolution of AI is similar to past innovations like the Internet, where initial resistance and misunderstanding eventually gave way to acceptance and integration into society.
  5. Why is it important to understand AI’s training mechanism? Understanding AI’s training mechanism is crucial for recognizing its potential for innovation and addressing legal and ethical concerns in an informed manner.
Read OpenAI’s statement on their website.