From a copyright perspective, it may be fair to say the UK presently finds itself at a competitive disadvantage in the field of text and data mining (TDM) when compared with some other jurisdictions. Notwithstanding recent attempts to usher in a very broad copyright exception for TDM (a move that would be welcomed by AI developers), as matters stand the UK has only a very narrowly circumscribed copyright exception in this area. That exception is in section 29A of the Copyright, Designs and Patents Act 1988. There are three key ‘narrowing’ elements to this exception.
Under section 29A, the would-be text and data miner is required to have lawful access to the copyright work they wish to interrogate. According to guidance issued by the UK IPO, lawful access “covers where researchers have the legal right to access a copyright work to read it; examples could include paying for a subscription to a journal or database or material published under open licences including Creative Commons and Open Government Licences.”
This is not dissimilar to the concept of lawful access for the TDM exceptions under Articles 3 and 4 of the DSM Directive. The requirement of lawful access means that someone who proposes to mine text and data cannot simply and indiscriminately scrape any data they wish to interrogate. This is a significant limitation for the developer of an AI model who wishes to train it utilising vast quantities of data. In theory, they are required to obtain permission from each individual copyright owner or check the terms on which each dataset is made available, to ensure that it is lawfully accessible. Practically, that represents a significant challenge.
Subject to the requirement of lawful access, the copy of the copyright work can be used for computational analysis of anything recorded in that work for the sole purpose of research for a non-commercial purpose. This gives rise to two further questions: firstly, what is “research” and, secondly, precisely what is a “non-commercial” purpose?
In the context of Article 3 of the DSM Directive, the TDM exception covers “scientific research” (i.e., natural and human sciences) undertaken by “research organisations” (e.g., universities, libraries, research institutes) and “cultural heritage institutions” (e.g., publicly accessible libraries or museums, film archive heritage institutions).
There is, however, no statutory definition of “research” for section 29A. According to the UK IPO guidance, section 29A is not limited to students at school, university or college but extends to private study more generally, although only if the person is “genuinely studying (like you would if you were studying for a college course)”. This is not particularly illuminating, but it suggests that research involves pursuing an investigation in order to obtain understanding, knowledge and information or data. Ostensibly, and in contrast to Article 3, research is not confined to scientific research.
So when is a “purpose” non-commercial? Recital 42 to the InfoSoc Directive suggests that the non-commercial nature of an activity should be determined by that activity as such. In other words, it is the activity itself that should be determinative of the issue rather than, for example, the nature of the establishment undertaking the activity (such as a commercial establishment). So, in theory, a commercial establishment can undertake TDM research for a non-commercial purpose. That is not to say that the commercial nature of the establishment is an irrelevant consideration. It is just not a determinative one. It is probably fair to say in practice, in most (but not all) cases, where a commercial establishment undertakes TDM research it is likely to be for a commercial purpose rather than a non-commercial purpose.
The expression ‘commercial’ can be read as including both direct and indirect economic and commercial advantages. This broad approach to the meaning of commercial is also reflected in the open source licensing community. For example, in the ‘Attribution – NonCommercial 4.0 International’ licence, the expression non-commercial is defined as “not primarily intended for or directed towards commercial advantage or monetary compensation”. So what exactly are these advantages? I have endeavoured to give some examples of what I think they could be in the table below (I am positive there are better examples):
Another consideration to ponder is that, while the research in question has to be undertaken solely for a non-commercial purpose, there is no restriction in relation to the outputs produced by TDM. In other words, there is no restriction on commercialising those outputs. Having said that, if there is an intention to commercialise the outputs when the TDM is undertaken, it seems likely that the purpose of the associated research will not be non-commercial. This effectively means that there is a ‘grey area’. If you can demonstrate that when the TDM was executed, it was executed solely for a non-commercial purpose and without any intention to commercialise the outputs, it may be open to you to rely upon the exception in section 29A, notwithstanding that the outputs are later commercialised. How you demonstrate this is anyone’s guess. I anticipate that you would require fairly compelling evidence of your intended purpose to persuade a judge it was a non-commercial purpose.
TDM is nothing new. It has, however, started to attract significant attention with the exponential growth in generative AI. Legal advisers (myself included) are having to consider copyright exceptions afresh, both those for TDM specifically and others, such as the exception applicable to temporary copies. There is a slew of litigation in the US against AI developers, which looks set to test the applicability of the copyright fair use doctrine to training data there. Getty Images has pursued Stability AI, the developer of Stable Diffusion, in the UK for copyright infringement and, assuming the case does not settle, we may well see UK copyright exceptions under the microscope too.
Against this backdrop, the UK IPO is consulting with stakeholders following its recent failed attempt to introduce a permissive copyright exception for TDM. Anecdotally, those discussions are focussed on copyright licensing for inputs, but little progress is being made. A voluntary ‘code of practice’ is anticipated this summer and it seems that the IPO is of the view that industry (i.e. the AI and creative sectors) should be able to resolve this conundrum between themselves, notwithstanding their own inability to reconcile these interests.
The concept of lawful access is explained in Recital 14 (for Article 3) and Recital 18 (for Article 4).
ibid. 1.
See Michel M Walter & Silke von Lewinski, ‘European Copyright Law: A Commentary’, at paragraph 11.5.50.