OpenMinTeD: Opening up Text and Data Mining

Posted by Thomas Margoni

16 October 2017 · March 18th, 2021

OpenMinTeD (Open Mining Infrastructure for Text and Data) is the H2020 e-infra project aiming to develop a registry for text and data mining services and tools. This will allow researchers, research institutions and data providers to find, use and combine resources for text and data mining (TDM) purposes, thereby enhancing the scientific playing field of the EU.

The project is run by a consortium of 16 EU partners. CREATe/University of Glasgow coordinates the legal interoperability activities (as part of working group WG3) with a team of more than 20 specialists from interdisciplinary backgrounds, under the scientific lead of Dr Thomas Margoni and coordinated by Dr Giulia Dore and other CREATe fellows.

WG3 has been investigating, on the one hand, the causes and the extent of the limits imposed on text and data mining by the law of copyright and related rights (e.g., the sui generis database right) and, on the other hand, the complex licensing framework in which the resources to be mined are set.

Among the early findings of the project is that interoperability and standardisation are crucial for the full development of text and data mining. This is especially true in the fragmented and often inconsistent EU legal framework within which EU TDM researchers and platforms operate. One of the project's recommendations for enhancing TDM, and R&D more generally, in the EU is the introduction of a general and open exception under EU IP law for TDM and similar uses. This would be a decisive factor in enabling TDM activities in the EU in the same way they are already successfully employed across the globe. Failing to do so would condemn the EU scientific and socio-economic sectors to fall behind in such a strategic field.

At the same time, for the short term, the development of licence compatibility tools appears to be the best possible solution. Among these, the Licence Compatibility Matrix (now released in beta version) will help researchers determine whether the licensing terms of given resources are compatible and therefore allow the resources to be combined and remixed. Other supporting and training materials include the Open Science fact-sheet and the Open Access FAQs, as well as webinars, slides and blog posts. OpenMinTeD promotes the adoption of its solutions through a number of public initiatives, the most recent being the Open Science Fair, the largest Open Science conference organised in the EU, held on September 6-8 in Athens, Greece.

With only eight months left, it can certainly be said that a number of important goals have been reached in the development of an open TDM infrastructure at the EU level. The role of CREATe as a leading research centre in the field of copyright, promoting interdisciplinary research and supporting novel methodological approaches to the study of copyright, has been crucial for the progress of the project. Its contribution to the legal interoperability working group has proven essential to the development of the compatibility tools, thanks to the valuable support of CREATe researchers, including Dr. Kristofer Erickson, who also serves as an external expert for the project.

The next step is to develop an interactive tool based on the current Matrix, which aims to help researchers determine, with even more ease and automation, the compatibility of the resources they wish to combine. The idea of such an automated tool has been much appreciated by the reviewers of the project and praised by the research communities engaged in TDM activities. This step is an experiment and poses some challenges, which will require the involvement and coordination of programming and design experts working together with the legal team. This is another example of the importance of CREATe's scientific and methodological contribution to the project.
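To make the idea of an automated compatibility check more concrete, here is a minimal sketch of how a pairwise matrix lookup might work. It is not the OpenMinTeD Licence Compatibility Matrix or the planned tool: the licence labels (`LIC-A`, `LIC-B`, `LIC-C`), the matrix entries and the function names are all hypothetical placeholders, and the values carry no legal meaning.

```python
from itertools import combinations

# Hypothetical pairwise matrix: each unordered pair of licence labels maps to
# True (may be combined) or False (may not). Placeholder values only; these
# are NOT entries from the OpenMinTeD Matrix.
MATRIX = {
    frozenset({"LIC-A", "LIC-B"}): True,
    frozenset({"LIC-A", "LIC-C"}): True,
    frozenset({"LIC-B", "LIC-C"}): False,
}


def pair_compatible(first, second):
    """Return True/False from the matrix, or None if the pair is not listed."""
    # frozenset makes the lookup independent of argument order.
    return MATRIX.get(frozenset({first, second}))


def incompatible_pairs(licences):
    """Return the pairs of distinct licences not known to be compatible.

    A pair missing from the matrix is reported too, so that an unknown
    combination is flagged for review rather than silently accepted.
    """
    problems = []
    for first, second in combinations(sorted(set(licences)), 2):
        if pair_compatible(first, second) is not True:
            problems.append((first, second))
    return problems
```

Calling `incompatible_pairs` with the licence labels of the resources a researcher wishes to combine returns an empty list when every pair is marked compatible. Treating unlisted pairs as problems is a deliberately cautious design choice for this sketch; a real tool would need the legal team to decide how unknown combinations are handled.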

Additional information on the project's status, current developments and results is available on its official website, while other outputs and events related to the project are reported on the CREATe Blog and the OpenAIRE website.