EU copyright law in relation to AI training models
Big data mining and machine learning require the compilation of corpora (e.g. literary works, public domain material, data) that are often “available on the internet”. The collection stage is usually followed by the processing and annotation of the collected data, depending on the type of learning (supervised/unsupervised) and the purpose of the algorithm. Copyright law has a direct impact on this process: the corpora may include works protected by copyright, and any digital copy, temporary or permanent, in whole or in part, direct or indirect, has the potential to infringe copyright (Art. 2 InfoSoc Directive). Furthermore, changes made to the collected material can amount to ‘adaptation’, and the relevant exceptions, such as those for research or text and data mining, might not sufficiently cover the activities of stakeholders in this area.
This project will analyse case studies on data scraping, natural language processing and computer vision to assess whether the current legal framework is well equipped for the development of AI applications, especially in the field of machine learning, and, if not, what kinds of measures should be developed (legal reform, policy initiatives, licences and licence compatibility tools, etc.).