The European Commission has adopted the Attribution 4.0 International (CC BY 4.0) standard license to make the information it publishes online reusable by the public. But why do we need licenses and how do we use them correctly? What impact could the application of a standard license have on language data sharing? The ELRC consortium has asked Dr. iur. Pawel Kamocki to help us understand the underlying implications.
ELRC: Pawel, thank you for agreeing to this interview. Before we ask you about the CC BY 4.0 license itself, can you explain why we need a license when we want to reuse information that has been made openly available by the European Commission or other entities online?
Dr. iur. Pawel Kamocki: While raw information itself should be, and theoretically is, free, its expression, or the data in which it is embodied, can be protected by intellectual property rights. These rights are in fact similar to traditional property: if you own a physical object, you can prevent others from using it, and sue anyone who does so without your permission. Intellectual property is the same, but - and this is what makes it both complicated and fascinating - it’s about immaterial goods: things of value that exist independently from their material support.
So, as I said, in principle, with a few exceptions, the use of IPR-protected content requires permission from the rightholder. This permission is granted in a document called ‘a license’. Licentia means ‘permission’ in Latin.
ELRC: To whom can such a license be granted?
Dr. iur. Pawel Kamocki: Usually, such a license is granted to a specific person or entity. However, a license can also be granted to the general public, i.e. everybody who has access to the content. Licenses of the latter type are called ‘public licenses’ (although I really like the German term Jedermann-Lizenz). They were first developed with software in mind (we have all heard of the GPL, General Public License). At the very beginning of this century, the Creative Commons Foundation developed a series of public licenses for creative works, called… Creative Commons and maybe more commonly known as CC licenses.
The latest version of these licenses, the 4.0 version, covers not only copyright, but also related rights, such as the sui generis database right, which makes them a great tool for licensing of digital datasets in the European Union.
ELRC: Could you explain (in one short sentence) what the sui generis database right entails?
Dr. iur. Pawel Kamocki: It’s an intellectual property right similar to copyright that was created by the 1996 Database Directive to protect investment in producing a database. I explain it in more detail in an article published in one of the recent ELRC newsletters (http://lr-coordination.eu/node/969).
ELRC: Coming back to CC licenses: what is their unique benefit?
Dr. iur. Pawel Kamocki: The idea behind Creative Commons licenses is simple: to grant everyone permission to use the work and thereby shift from the traditional ‘all rights reserved’ logic to ‘some rights reserved’. It is important to keep in mind that CC-licensed content is still under copyright, but a broad permission to use it (at least in a certain manner) is granted up front. The use of the work, however, is still subject to some conditions, the violation of which amounts to copyright infringement.
ELRC: Can you explain the main characteristics of the “Attribution 4.0 International” license (CC BY 4.0)?
Dr. iur. Pawel Kamocki: BY or ‘attribution’ is the fundamental condition of all CC licenses. It is commonly believed that all that it requires is to mention the source. However, the attribution requirement under CC BY 4.0 goes further than this.
In short, the CC BY 4.0 license allows everyone to re-use, share and modify the licensed content, provided that:
- the creator of the work is identified;
- any other person or entity designated by the rightholder to receive attribution is identified (e.g. the funder);
- the copyright notice (if present) is retained;
- the CC BY 4.0 license is referred to, preferably with a URL;
- a URL to the original work is retained, if practicable;
- any modification of the content is indicated.
So, a proper attribution notice should look at least like this:
This work was created by P. Kamocki and is available under a CC BY 4.0 license.
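As an illustration of the conditions listed above, the following is a minimal Python sketch of how such a notice could be assembled from the relevant facts about a work. It is not an official Creative Commons tool and not legal advice: the `Work` record, its field names and the `attribution_notice` function are hypothetical names chosen for this example, and the wording of the optional parts is only one possible phrasing.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Canonical URL of the CC BY 4.0 license deed.
CC_BY_4_URL = "https://creativecommons.org/licenses/by/4.0/"


@dataclass
class Work:
    """Hypothetical record of the facts an attribution notice has to carry."""
    creator: str
    also_credit: List[str] = field(default_factory=list)  # e.g. the funder
    copyright_notice: Optional[str] = None                # retained if present
    source_url: Optional[str] = None                      # retained if practicable
    modified: bool = False                                # flag any modification


def attribution_notice(work: Work) -> str:
    """Build a one-line notice covering each condition that applies."""
    first = f"This work was created by {work.creator}"
    if work.also_credit:
        first += " (with attribution to " + ", ".join(work.also_credit) + ")"
    first += f" and is available under a CC BY 4.0 license ({CC_BY_4_URL})."

    parts = [first]
    if work.copyright_notice:
        parts.append(work.copyright_notice)
    if work.source_url:
        parts.append(f"Original work: {work.source_url}")
    if work.modified:
        parts.append("This version has been modified from the original.")
    return " ".join(parts)


if __name__ == "__main__":
    print(attribution_notice(Work(creator="P. Kamocki", modified=True)))
```

Keeping each condition as a separate optional field makes it easy to see which parts of the notice are always required (creator, license reference) and which depend on the work at hand.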
ELRC: That does not sound like a very permissive license after all. Is the CC BY 4.0 an open license?
Dr. iur. Pawel Kamocki: CC BY 4.0 is by all accounts an open license, as it meets the criteria set forth in the Open Definition. Actually, only two CC licenses, CC BY and CC BY-SA, are open licenses.
It should also be noted that no additional conditions or restrictions can be imposed on CC-licensed content. Therefore, anyone who shares CC-licensed content saying that it can only be used by institution x actually violates the license and infringes on the rightholder’s copyright. That said, modified versions of CC BY 4.0-licensed content can be shared under any conditions, including as ‘all rights reserved’. Only the SA (share-alike) requirement, for example in the CC BY-SA license, creates an obligation to share modified content under the same license.
ELRC: What are the main changes to the previous (not international) version of this license and what are potential weaknesses of the license?
Dr. iur. Pawel Kamocki: Most importantly, the previous versions did not cover the sui generis database right, which meant that they did not provide for an appropriate level of legal security in the EU. In fact, a bona fide user of a dataset licensed under a CC BY 3.0 license could theoretically still be sued for infringement of the sui generis database right.
Secondly, previous versions of CC licenses had many national versions, called “ported versions”. These versions were not only translated, but also adapted to local law. Oftentimes, some rather far-reaching choices were made in the adaptation process. And since in legal matters every word counts, we ended up with rather substantial differences between for example Dutch and Belgian versions of the same license.
Now, imagine you want to use a dataset licensed under a Dutch version of the license. You know it may not be identical to the German version, so you should probably make an effort to read it first. But what if you don’t speak Dutch…? As you see, the porting process had some adverse consequences, and this is why porting for CC 4.0 licenses is not authorised.
ELRC: The Commission has adopted the CC BY 4.0 license as a new standard license for the reuse of Commission documents. Is this concept easily transferrable to the reuse of Public Sector Information made available by national public administrations as well?
Dr. iur. Pawel Kamocki: There is no doubt that CC BY 4.0 is currently the best tool for those who want to make digital datasets openly available. So, the Commission made a wise choice.
However, this choice is not binding on the Member States, some of which have their own, long-standing traditions when it comes to making Public Sector Information available for re-use.
Some countries, such as Poland or Germany, dedicate a fair share of Public Sector Information to the Public Domain. Public Domain material is by definition not protected by intellectual property rights and therefore cannot be licensed at all; it can be freely used by anyone for any purpose.
Many other countries have their own licenses, more or less inspired by or compatible with CC BY. I can think of France, Norway, and above all the UK. In the UK, public sector information is protected by copyright belonging to the Crown (so-called Crown copyright). It is then made available under a license called the Open Government Licence.
Ireland, for example, endorses the use of CC BY 4.0 for Public Sector Information (Circular 12/2016 of the Department of Public Expenditure and Reform), whereas France, probably for fear of ‘americanisation’, actually… prohibits the use of CC licenses (only Licence ouverte and Open Database License are allowed by art. D323-2-1 of the Code des relations entre le public et l'administration).
Undoubtedly, some more harmonisation at the EU level would be welcome, and the official endorsement of CC BY 4.0 by the Commission is a step in the right direction.
ELRC: In what way can making Public Sector Information available under the CC BY 4.0 license, for example in Open Data Portals, help to make language data sharing easier between national institutions but also across borders?
Dr. iur. Pawel Kamocki: As mentioned above, CC 4.0 licenses cover not only copyright, but also the sui generis database right, which makes them a perfect tool for sharing digital datasets such as language resources (LR).
Unlike their previous versions, CC 4.0 licenses will not have ‘ported’ versions. This indeed makes them perfect for international use.
The use of popular, internationally recognized tools can considerably reduce transaction costs related to the sharing of LR both at the national and the international level.
ELRC: You made this rather complex issue much clearer. Thank you very much, Pawel!
Pawel Kamocki was trained in both law and corpus linguistics; he holds a Dr. iur. degree from the University of Münster, as well as a docteur en droit degree from Sorbonne Paris Cité. Pawel works for ELDA as a Legal Issues Expert.
The interview was conducted by Lilli Smal (DFKI) who is part of the ELRC consortium.
 Cf. European Commission: https://ec.europa.eu/jrc/en/news/commission-makes-it-even-easier-citizens-reuse-all-information-it-publishes-online, last accessed: 12 June 2019.