This post is part of a series of evidence summaries for the 21 for 2021 project, a CREATe project within the AHRC Creative Industries Policy and Evidence Centre (PEC). The 21 for 2021 project offers a synthesis of empirical evidence catalogued on the Copyright Evidence Portal, answering 21 topical copyright questions for the 21st century. In this post, Thomas Margoni (research professor of intellectual property law at the Faculty of Law, KU Leuven) explores the need for more empirical evidence relating to computational uses of copyrighted works.
Text and data mining (TDM), or more comprehensively computational uses, refers to the use of material, often copyright protected, by computers for purposes other than its direct consumption as a work. Such uses may include a variety of data analytics approaches such as knowledge extraction, information discovery, pattern recognition (De Wolf et al, 2014) and – more recently – machine learning (see CREATe resource page on legal approaches to data). TDM’s diffusion in fields such as computer science, statistics, economic analysis and life sciences can be traced back a few decades (illustratively 150 out of 842 studies indexed in the Copyright Evidence Wiki employed this method). However, it was not until the early 2010s that the debate regarding its proper legal classification, as either a copyright relevant act or an activity alien to copyright’s remit, came to occupy the agendas of policy makers, legislators and the courts. Indicative in this sense is that all four studies on TDM or computational uses indexed in the Copyright Evidence Wiki date from 2014 or later (De Wolf and Partners (2014); Guadamuz and Cabell (2014); Handke, Guibault, and Vallbe (2015); Handke et al. (2021)).
Timeline visualisation of distribution of studies relating to ‘mining’ (noting that pre-2014 studies relate to mining as a methodology, rather than topic of research)
In the EU, the UK (at the time still a Member State) was the first to implement a TDM exception, in 2014, as a reaction to the influential Hargreaves Report of 2011. In the U.S., the “Google Books” saga identified web mining as transformative, and thus fair use, a couple of years earlier. Japan introduced an exception dedicated to “information analysis” in 2009 (De Wolf et al, 2014). A handful of other EU Member States enacted TDM exceptions into domestic law, usually on the basis of Art. 5(3)(a) InfoSoc Directive (i.e. within the framework of the teaching and scientific research exception), during approximately the same time frame (Geiger et al, 2018).
The recent Copyright in the Digital Single Market Directive (CDSM) introduced two new TDM exceptions in the EU Acquis. Their formulations appear detailed enough to reduce to a minimum the margin of discretion of Member States in their implementation, even though early indications already point towards some potentially meaningful divergences. Given the broad definition of TDM provided in the CDSM, which captures most of the steps essential to a variety of data-driven analytic and technological processes such as Artificial Intelligence (AI), it seems more accurate to refer to these types of non-consumptive uses as computational uses (Margoni 2021). For the same reason, it seems important to call the attention of the scholarly community to the regulatory approach taken by the EU – an approach that marks a disconnect with the trends identifiable at the international level. It can be said, in fact, that most EU AI development (at least the part that relies on learning from copyright protected works and other protected subject matter, such as databases covered by the Sui Generis Database Right) has been made dependent on an exception: either on a narrow exception that urges commercial actors to seek licenses – a market that evidence shows has not formed yet – or on a broader one which nevertheless can be “opted-out” of and therefore leaves the amount and type of training material to a private ordering determination. Outside the EU, although approaches follow heterogeneous patterns, they all seem to share a higher degree of pro-innovation elements. Finally, it seems likewise important to consider the empirical evidence available to date – even if limited at the moment – to formulate an initial assessment of these different approaches. This is especially so given that a number of countries (e.g. Canada) are currently considering the implementation of computational uses provisions which will have a long-lasting impact on the way in which AI and, more generally, new technologies will evolve.
Debates and recent evolutions
The debate on the implementation of dedicated computational uses (or text and data mining) exceptions can be summarised in two main positions. The first stems from the fundamental principle that copyright has never been intended to regulate the mere use of protected works (e.g., reading, listening, viewing), but only their reproduction (e.g., the making of copies of protected works) and further circulation. Accordingly, it has been proposed that computer-performed “mere” uses should follow the same principle and thus that the “right to read is the right to mine” (Guadamuz & Cabell 2014). From this point of view, TDM – like any other type of (computational) use – should simply be considered a non-copyright relevant act and therefore be freely available to all interested parties.
On the other hand, there is the view that the temporary and incidental copies that characterize most, if not all, digital uses – and therefore also most computational uses or TDM activities – are sufficient to trigger the right of reproduction. According to this view, computational uses need to be authorized either by law (a dedicated exception) or by right holders (e.g., contract, consent and/or a copyright license). This latter position seems to have found favour with the European courts, in particular the CJEU in cases such as Infopaq I and II, where it was acknowledged that the temporary and transient copies needed in data capture processes were sufficient to trigger Art. 2 InfoSoc Directive. At the same time, the Court clarified that such temporary copies are excused on the basis of Art. 5(1) InfoSoc Directive (i.e. the exception for certain acts of temporary reproduction) when the cumulative conditions of this article are fulfilled. Against this legal framework, scholars have argued that computational uses, given their strong pro-innovation, pro-competitive and pro-freedom of expression nature, deserve a well-defined regulatory framework (Geiger et al 2018; Margoni 2018; Rosati 2018; Flynn et al 2020; Ducato and Strowel 2020; Margoni and Kretschmer 2021).
Clarity and legal certainty in this area would allow public, commercial, and distributed initiatives to employ computational techniques while relying on a clear legal basis, with positive effects for all stakeholders affected (De Wolf et al 2014). In the midst of this debate, the EU Commission Impact Assessment identified four possible approaches to the regulation of computational uses (TDM is the term employed in most EU legislative and policy documents). Of the four options – ranging from industry self-regulation to an exception available to anybody but limited to scientific research purposes – option three (“Mandatory exception applicable to public interest research organisations covering text and data mining for the purposes of both non-commercial and commercial scientific research”) was identified by the EC as the most appropriate. Option three formed the foundation of what has become Art. 3 CDSM, with some important additions and modifications made during the trilogue legislative process, chiefly the provision allowing the retention of copies of works for the purpose of scientific research, including for the verification of research results (Art. 3(2)). Another important addition to the field of computational uses made during the trilogue was Art. 4, which introduced a TDM exception available to any beneficiary for any purpose, but subject to an “opt-out” clause; that is to say, right holders have the possibility to expressly reserve the use of the work in an appropriate manner. As is well known, only a handful of Member States have met the transposition deadline and therefore it is still too early to properly assess the impact of these provisions under domestic law. It seems, however, that the German and the Dutch texts omit the reference to the requirement of an “express” reservation. If confirmed, the consequences of this omission in the text of the implementing legislation will need to be assessed in the light of the broader legal framework and contractual practice of the Member State. Nonetheless, it is arguable that if a legal consequence can be attributed to the missing word, then the transposition is prima facie not in compliance with the EU mandate.
Existing evidence and research agendas
Empirical evidence in this area is limited. Handke et al (2015; 2021) show that where TDM requires the express consent of rights holders, it makes up a significantly lower share of total research output. Methodologically, these studies are based on bibliometric data and quasi-experimental research designs, and they identify strong evidence that copyright exceptions or limitations promote the adoption of research based on computational uses. The authors acknowledge that their studies are among the first not only to attempt to measure the impact of TDM exceptions, but also to empirically document an adverse effect of intellectual property on innovation under particular circumstances. Another interesting conclusion of the studies is that there seems to be a market failure in relation to the licensing of data for academic data mining. This seems a relevant element in assessing the EU framework, which has explicitly relied on the need to safeguard licensing business models as a basis for the creation of Arts. 3.
Word cloud visualisation showing the prominence of “research” and “academic” references in TDM studies
Handke et al’s findings are timely and highly relevant, especially in the light of the recent enactment and current transpositions of Arts. 3 and 4 CDSM. As introduced above, Art. 3 creates a TDM or computational uses exception to the right of reproduction for research organisations and cultural heritage institutions acting for research purposes when they have lawful access to works or other subject matter. While some legal certainty is welcome in this field, it has been argued (Margoni and Kretschmer 2021) that the conditions to benefit from this exception are so restrictive that, beyond the narrow scope of Art. 3, EU-based users (government, firms, NGOs, SMEs, journalists, etc.) will need to acquire licenses from right holders. When this is the case, the problems identified by Handke et al, in particular those connected with market failures in the data licensing market, will resurface.
On the other hand, Art. 4 creates an exception that is mandatory for Member States – but optional for right holders, who may expressly reserve such uses – and that extends beyond the limited pool of beneficiaries of Art. 3. This is another important intervention in this field, one which arguably possesses a stronger potential to enable computational uses beyond the academic and cultural sectors, thereby addressing more directly some of the issues identified in the empirical literature presented above. Some shortcomings may nonetheless be identified in Art. 4. The possibility for right holders to “opt-out” by expressly reserving this use certainly stands out. It seems likely that large commercial right holders representing vast repertoires will make good use of this “opt-out” possibility.
Other elements, common to Art. 3 and Art. 4, which may fuel the concerns expressed in the literature supporting wider exceptions relate to: the need to have lawful access to the sources; the limitation of the exceptions to the right of reproduction only; and the unclear relationship between the exceptions and technological protection and integrity measures. Regarding the specific topic of technological protection measures (TPMs) applied to copyright law in general, not specifically to computational uses, it is interesting to note that existing evidence shows the anti-competitive, anti-innovation and anti-freedom of expression effects of both TPMs (Gasser 2004; Fiesler 2020) and contracts (Tan 2014), with their ability to effectively circumvent exceptions and limitations (Favale 2011).
On the other hand, provisions that seem to be aligned with the needs of innovation and competition, and more broadly with fundamental issues of transparency and accountability, are found in the exception allowing the retention of copies of the analysed works for verification purposes and in the ban on contractual arrangements contrary to Art. 3. The former, also referred to as the “storage provision”, deserves a particular mention. It not only identifies a crucial step in the process of training AI algorithms, but probably also represents the first formal recognition at the international level of such an enhanced transparency requirement.
Future directions for research
Empirical research in this area is limited, and for that reason all the more essential. This seems especially true when the limited evidence produced to date challenges the very theoretical foundations of copyright law, i.e. that copyright protection is needed to foster further innovation in the cultural, scientific and technological fields. With the EU and other countries across the globe approaching the issue of computational uses (such as in the transposition of Arts. 3 and 4, or in the reported open consultation launched by the government of Canada), the availability of a solid body of empirical evidence will be crucial to offer clear support for one of the two main directions that this sub-field of IP law has identified: an exception limited as to uses, beneficiaries, rights, types of works accessed and types of access, on the one hand, and broad, flexible and future-proof provisions enabling computational uses and technological development, on the other.
Another area that should be followed with interest relates to the phase of transposition into domestic law of Arts. 3 and 4. As pointed out above, there seems to be a risk of divergent implementations of the reservation requirement, as certain national implementing acts appear to omit the reference to an “express” reservation. More research on this aspect and on its real-life implications will be needed in order to monitor this process and ensure that the intervention complies with its stated goals, including that of enabling cross-border uses.