Text Mining and Copyright

The electronic analysis of large numbers of copyright works allows researchers to discover patterns, trends and other useful information that cannot be detected through ordinary ‘human’ reading. This process, known as ‘text and data mining’, may reveal knowledge that is latent in the works being mined but has not yet been explicitly formulated. For example, processing the data contained in a large collection of scientific papers in a particular medical field could suggest a possible association between a gene and a disease, or between a drug and an adverse event, without this connection being explicitly identified or mentioned in any of the papers.
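The gene and disease example above can be sketched in a few lines of Python. This is only an illustrative toy, not a description of any real mining tool: the papers, the term lists (`GENES`, `DISEASES`, `INTERMEDIATES`) and the function names are all hypothetical, and real systems rely on far richer language processing. The sketch flags a gene and a disease that never appear in the same paper but are each mentioned alongside the same intermediate term.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus: each string stands in for one paper.
PAPERS = [
    "gene_a regulates protein_p in liver cells",
    "elevated protein_p is observed in disease_x patients",
    "gene_b interacts with protein_q",
]
# Hypothetical vocabularies; a real project would use curated term lists.
GENES = {"gene_a", "gene_b"}
DISEASES = {"disease_x"}
INTERMEDIATES = {"protein_p", "protein_q"}


def terms(text):
    """Return the set of lower-cased word tokens in a text."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def indirect_links(papers, genes, diseases, intermediates):
    """Map (gene, disease) pairs that are never mentioned together in one
    paper to the intermediate terms that each of them co-occurs with."""
    neighbours = defaultdict(set)
    co_mentioned = set()
    for paper in papers:
        found = terms(paper)
        # Remember pairs that some paper already mentions together.
        co_mentioned.update(
            (g, d) for g in genes & found for d in diseases & found
        )
        # Record which intermediate terms appear next to each gene or disease.
        for term in (genes | diseases) & found:
            neighbours[term] |= intermediates & found
    links = {}
    for g in sorted(genes):
        for d in sorted(diseases):
            shared = neighbours[g] & neighbours[d]
            if shared and (g, d) not in co_mentioned:
                links[(g, d)] = shared
    return links


print(indirect_links(PAPERS, GENES, DISEASES, INTERMEDIATES))
```

With this toy corpus the only suggested link is between `gene_a` and `disease_x`, via `protein_p`, even though no single paper mentions both.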

Scientific publishers offer various text mining functionalities which have been developed in collaboration with researchers and private companies. For instance, Reed Elsevier makes available to subscribers an application developed with NextBio (a company specialising in biomedical text mining) that enables readers to conduct ‘deep searches’ and to make automatic connections between data contained in scientific articles and additional information about genes, diseases, and so on.

Google Books, one of the largest existing collections of digitised books, offers a ‘text mining experience’ to all users through Ngram Viewer, a graphic tool created in collaboration with researchers from Harvard University. The tool tracks the frequency of particular words or phrases across more than five million digitised books published between 1800 and 2000. However, access to the whole corpus of Google Books to carry out more sophisticated text mining research is restricted, and can only be obtained upon request.
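The kind of frequency tracking described above can be illustrated with a minimal Python sketch. It does not reproduce how Ngram Viewer is actually implemented; the three 'books' in `BOOKS` and the function names are hypothetical, and the sketch simply computes, for each publication year, the share of all word sequences of the same length that match a given phrase.

```python
from collections import Counter

# Hypothetical toy corpus: (publication year, text) pairs.
BOOKS = [
    (1850, "the steam engine changed the railway"),
    (1850, "a railway journey by steam"),
    (1950, "the jet engine changed air travel"),
]


def ngrams(words, n):
    """Return every run of n consecutive words, joined by spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(books, phrase):
    """Return {year: share of that year's n-grams equal to the phrase}."""
    target = phrase.lower()
    n = len(target.split())
    hits = Counter()
    totals = Counter()
    for year, text in books:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


print(relative_frequency(BOOKS, "steam"))
print(relative_frequency(BOOKS, "steam engine"))
```

Plotting these per-year values against time gives the kind of trend line that such a tool displays.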

Technologies based on the electronic analysis of large amounts of works are still in their infancy, and the possibilities they might open up in the future are largely unpredictable.

In the UK, copyright law provides an exception that allows researchers to make copies of works ‘for text and data analysis’. This means that where a user has lawful access to a work they can make a copy of it for the purpose of carrying out a computational analysis of anything recorded in the work. The exception applies under the following conditions:

1) The computational analysis must be for the purpose of non-commercial research.
2) The copy must be accompanied by a sufficient acknowledgement (unless this is practically impossible).

The provision further specifies that copyright is infringed if the copy made is transferred to another person, or if it is used for purposes other than those permitted by the exception (although the researcher could ask the owner for permission to do either of these things). Also, copies made for text and data analysis cannot be sold or let for hire.

Importantly, the provision states that the activities covered by the exception cannot be ruled out by contract. Contractual terms which purport to restrict or prevent the doing of the acts permitted under the exception are unenforceable.

Although text and data analysis is mainly concerned with mining literary works, the exception covers all categories of copyright works, and a parallel exception applies to recordings of performances.

References:

http://www.legislation.gov.uk/uksi/2014/1372/regulation/3/made

http://www.jisc.ac.uk/publications/reports/2012/value-and-benefits-of-text-mining.aspx

http://www.nactem.ac.uk/
