New Working Paper – The Law of Data Scraping: A review of UK law on text and data mining

Posted by CREATe Team

30 March 2021 | May 13th, 2021

CREATe presents the second entry in our series of working papers released in 2021: “The Law of Data Scraping: A review of UK law on text and data mining” by Sheona Burrow, a (part time) postdoctoral research fellow at CREATe, University of Glasgow.

The digital economy is driven by data, and the quantity of data generated by individuals and businesses is now almost beyond comprehension. Techniques and tools that can make sense of ‘big data’ are being developed and explored, but the legal environment for those seeking to make sense of third-party data can be confusing at best, and prohibitive at worst. In 2018, the General Data Protection Regulation tightened up the environment in relation to personally identifiable data for individuals, creating another layer of lengthy and complex rules for data controllers and processors. This maps onto forty years of debate on how best to protect databases – whether through existing intellectual property regimes, competition law, or through special protections that sit outside these traditional frameworks. Fundamental questions are unresolved: who owns data? Can data be treated like property? How can investment in data be recognised and supported and protected from competitors? Should it be protected at all? Does data have public value?

Those looking to the law for answers will emerge disappointed. Legislation tries to address questions of ownership and investment, resulting in a piecemeal collection of protections for data in copyright law, the sui generis database right (in the UK and Europe), data protection law and competition law. Our common law and the acquis communautaire (as still applicable in the UK post-Brexit) allow for further protection in contract and confidentiality. Yet case law applying these webs of protection is limited and often fact specific. In 2014 the UK enacted a (limited) copyright exception for text and data mining (TDM) with a view to removing some of the perceived barriers to exploiting data for non-commercial research. CREATe Director Professor Martin Kretschmer and Research Professor Thomas Margoni discussed the issues with this exception in 2018. As this paper discusses, the TDM copyright exception addresses only one facet of the multi-layered protections for data, and in a very limited way. Ultimately those seeking to exploit big data still need to balance a complex range of legal considerations before engaging in data scraping or TDM, which could result (or may already have resulted) in a chilling effect on data analysis.

Where to next? In the 1980s and 1990s the discussions around the implementation of the new sui generis database right often focused on normative questions of how data should be protected. In the focus on addressing issues with the current UK intellectual property system since the publication of the Hargreaves Review in 2011, perhaps some of those normative questions around why and how we should protect data have been lost. Tinkering around the edges of fundamentally different legal protections is unlikely to give users the certainty they need to exploit data to best effect. In 2017 the European Commission put forward a case for a ‘data producer’s right’ for non-personal data, and Aplin and Bently recently put forward the case for global mandatory fair use. However, for those engaging in TDM in the UK just now, these ideas are unlikely to change the legal landscape quickly enough to answer the question ‘can I do this?’.

This research forms part of the CREATe Open Science series and was supported by the ESRC Urban Big Data Centre (UBDC) at the University of Glasgow. Through its research and national data service functions UBDC promotes the use of big data and innovative research methods to improve social, economic and environmental well-being in cities. UBDC publishes world-leading research in the social sciences and other disciplines that is distinguished by its critical engagement with debates about new forms of data and data-driven urban analytics. In addition, it works to enhance the quality and accessibility of urban big data and methods for urban analytics, supporting a wide range of applications and users.

UBDC’s researchers and information specialists acknowledge that enablers and barriers associated with access to data and adoption of data-driven methods go far beyond technical issues, demanding consideration of factors such as information governance, economic, commercial and political influences and the law. UBDC’s support of this working paper originated in conversations around the feasibility and legal legitimacy of collecting online data at scale using web scraping. Centre contributors recognised widespread uncertainty regarding the implications of intellectual property, contract and privacy legal domains. Risk exposure for those collecting data online in support of academic research work is poorly understood, discouraging innovation and, to a lay audience, seemingly undermining the potential value of legal principles such as the text and data mining copyright exception. It was hoped that this analysis could offer some clarity for social data science researchers, and enable a more informed calibration of risk appetite.

For those interested, the paper also incorporates Appendices summarising the key cases on database protection and the main literature in relation to database protection from 1981 to 2020.


Data is perceived to be a key asset in the digital economy. Many governments have been keen to promote and exploit data-driven economies. Data scraping is a widely used technique that automatically extracts information from different (often online) sources, whilst data mining is the machine reading of data to identify useful information not immediately obvious on human reading. In 2014, the UK implemented a limited exception to copyright law for text and data mining (TDM). However, copyright is only one layer of legal protection available to ‘data’, and the protection of data has been the subject of a long-running tension between property-based rights and concurrent protection for data owners in liability rules arising through competition and contract law. Maintaining an appropriate balance between protecting rightholders and users has remained problematic. This paper summarises the legal protection available in the UK for different types of data, and the (limited) interpretation of that protection by the UK courts. The analysis is situated in a review of the academic literature. Ultimately this paper will conclude that the layered protection for data is confusing for end users, and that the case law on the protection and exceptions available to those seeking to engage in TDM is limited and fact dependent.

The full paper can be downloaded here.