This is a report by Thomas Margoni and Bartolomeo Meletti, of an event organised by Bartolomeo Meletti, jointly Copyright Education Creative Director for CREATe and Copyright Services Delivery Manager of Learning on Screen.
On Friday 14 February 2020, Learning on Screen organised the event ‘AI and Audiovisual Archives’ to explore some of the immense opportunities that AI can offer an organisation that, in over 70 years of operation, has collected a unique archive of more than 2.2 million audiovisual works (the BoB archive).
The event – held in London and chaired by Bartolomeo Meletti in his capacity as Copyright Services Delivery Manager of Learning on Screen – reflects one of the first major steps taken by Learning on Screen to explore the possibility of making its extensive archive of TV and radio broadcasts available for research purposes.
The event programme included discussions and panel presentations from:
- Cath Sleeman – Head of Data at Nesta, on ‘How can big data improve our understanding of gender inequality in the creative industries?’
- Max Cleary – Partnerships Lead – AI, Immersive & 5G, Digital Catapult
- Thomas Margoni – Senior Lecturer in Intellectual Property and Internet Law at CREATe, on ‘AI, machine learning and the role of copyright in protecting non personal “data”’.
Learning on Screen has recently started a pilot project in collaboration with Nesta to investigate the challenges and opportunities related to using the BoB archive to train algorithms and for other research purposes. The project, led by Cath Sleeman and Raphael Leung (Nesta), will analyse a sample of the BoB archive to explore how computer vision can be used to study diversity and representation on British TV, with a focus on gender inequality.
The BoB archive is composed of copyright protected works such as films, TV programmes and sound recordings. At the event, Thomas Margoni presented some of the work developed by CREATe in the field of data, AI and Open Science (see here and here) and discussed the opportunities, as well as the legal hurdles, that the use of AI tools such as Text and Data Mining (TDM) and machine learning to extract information from copyrighted works may encounter. Particularly relevant in this field are the issues of the protection of non personal data (e.g. factual data) and the extraction of non protected information from protected works.
Given that data-driven AI needs this type of information in very large amounts in order to learn to perform specific tasks, it is essential to know whether, and under which conditions, data can be reused, including when it can be extracted from otherwise copyright protected works. Naturally, the availability of specific exceptions and limitations to copyright is a necessary complement to a proper discussion of these problems. In fact, if certain data are protected as such or as part of a copyrighted work, certain exceptions such as temporary copies, research and, more recently, TDM can be employed to train models used by AI systems. However, exceptions such as those contained in the UK CDPA or in equivalent European legislation are narrowly framed and must be narrowly interpreted, as repeatedly stressed by the courts.
For instance, the TDM exception contained in Sec. 29A of the CDPA 1988 is limited to non commercial research, whereas the TDM exceptions contained in the new EU CDSM Directive (which will not be implemented into UK law) are either limited to research purposes by research organisations (Art. 3) or, where they are generally available for any type of use, can be limited by contract (Art. 4). This approach should be compared with other jurisdictions such as Canada, the US, Singapore or Japan, just to name a few, where TDM is allowed under much more generous conditions, usually to everyone, with no (or very few) limitations as to beneficiaries or types of use. In other words, in these other legal systems the balance between the protection of investment and the promotion of innovation has been struck more favourably towards the latter. The UK should reflect on these considerations and establish whether the current legislative framework, limiting TDM to non commercial research, is the best formulation that can be achieved in the long-lasting tension between two important interests (investments v innovation), and what space is left for the rights of users to access, combine and create new knowledge.
CREATe has been developing a research theme dedicated to the issues of AI, machine learning and the role of data under the broader concept of Open Science. If you are interested in future developments in this field, please follow our research!