The Publishers Association (PA) is the leading trade organisation serving book, journal, audio and electronic publishers in the UK.

Content Mining Free For All Would Be Bad For All

As the debate around fracking for shale gas in Blackpool reminds us, the consequences of badly managed mining can be severe. OK, so the data and text mining of publishers’ content may not lead to actual earthquakes, but the implications of not managing access would be damaging nonetheless. The Hargreaves proposal to introduce an exception to copyright to allow content mining would create risks for the economy, whilst not even solving the problems it seeks to address.

Calculating the perceived benefits and costs of any policy proposal is a routine task for Ministers. Normally, contentious issues can be resolved by asking academics to act as objective arbiters. However, that is hard to do in this case, because the academy is itself a key protagonist in the debate. In coming to its recommendation on content mining, the Review drew heavily on the views of various strands of academia, most of which claimed that their vital research was being hampered by the lack of such an exception. The process of requesting licences from publishers was too time-consuming, it was claimed, and so an exception would make life easier.

We should all be automatically in favour of anything which makes life easier – provided of course it doesn’t make life more difficult elsewhere.  But that is exactly what a blanket exception would do.  If publishers lost the ability to manage access to allow content mining, three things would happen.  First, the platforms would collapse under the technological weight of crawler-bots.  Some technical specialists liken the effects to a denial-of-service attack; others say it would be analogous to a broadband connection being diminished by competing use.  Those who are already working in partnership on data mining routinely ask searchers to “throttle back” at certain times to prevent such overloads from occurring.  Such requests would be impossible to make if no-one had to ask permission in the first place. 
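To make the “throttle back” idea concrete, here is a minimal sketch in Python of how a cooperating miner might space out its requests and slow down when a platform asks. It is an illustration only, not a description of any publisher’s actual system: the names `Throttle`, `schedule` and `throttle_back` are hypothetical, and times are plain numbers of seconds rather than real clock readings.

```python
from dataclasses import dataclass


@dataclass
class Throttle:
    """Hypothetical throttle: at most one request every `min_interval` seconds."""

    min_interval: float
    next_allowed: float = 0.0  # earliest time the next request may be sent

    def wait_time(self, now: float) -> float:
        """How long a request arriving at `now` must wait before being sent."""
        return max(0.0, self.next_allowed - now)

    def record_request(self, sent_at: float) -> None:
        """Note that a request went out at `sent_at` and push back the next slot."""
        self.next_allowed = sent_at + self.min_interval

    def throttle_back(self, factor: float) -> None:
        """Slow down on request, e.g. factor=2.0 doubles the gap between requests."""
        self.min_interval *= factor


def schedule(throttle: Throttle, arrival_times: list[float]) -> list[float]:
    """Return the times at which each request would actually be sent."""
    sent = []
    for arrived in arrival_times:
        send_at = arrived + throttle.wait_time(arrived)
        throttle.record_request(send_at)
        sent.append(send_at)
    return sent
```

The point of the sketch is that the slowdown depends on the miner choosing to call something like `throttle_back` when asked; nothing in it can be imposed on a crawler whose operator the platform has never been in contact with.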

Then there is the commercial risk. It is all very well allowing a researcher to access and copy content to mine if they are, indeed, a researcher. But what if they are not? What if their intention is to copy the work for a directly competing use, or to copy the work and then infringe the copyright in it? Sure, they will still be breaking the law, but how do you chase after someone if you don’t know who, or where, they are? The current system of managed access allows the bona fides of miners to be checked out. An exception would make such checks impossible.

Which leads to the third risk. Britain would be placing itself at a competitive disadvantage in the European & global marketplace if it were the only country to provide such an exception (oh, except Japan and some Nordic countries). Why run the risk of publishing in the UK, which opens its data up to any Tom, Dick & Harry, not to mention the attendant technical and commercial risks, if there are other countries which take a more responsible attitude?

It is not quite right to characterise the debate as being sharply between publishers and academics. Some institutions – like the University of Manchester’s National Centre for Text Mining – are working collaboratively with publishers to help develop the tools and technology to improve content mining. They see that a partnership can work through some of the complexities around licensing. Publishers support content mining and are working to facilitate and improve it. A recent pan-European study found that 90% of publisher respondents always grant permission or licences. Requests are refused in only 12% of cases. Publishers are not acting like aggressive bouncers, denying access to works and data. Rather, they are like good maître d’s, helping people to get the most out of the available service and ensuring that the engagement is managed and not chaotic.

The main problem with content mining would appear to be the complexity of the different technologies involved and the fact that not all content can be mined from the same place. An exception would do next to nothing to resolve these issues, and whilst it would clearly sort out the problem of complex licensing, it seems a bit extreme to cut the Gordian knot in this fashion. Where the problem is one of complexity we should look to simplify and clarify, not eradicate. Content mining is a massively exciting technology and will be a true game-changer in the field of academic research. It will only realise its full potential if it is developed with sensitivity to the surrounding environment.

 


Written on Wednesday, 09 November 2011 10:14 by Richard Mollet
