© Copyright 2012–2020
Anthony D. Dutoi
All rights reserved.



Disclaimer: I am not a lawyer, and this is not legal advice. This is meant as a practical orientation. My hope is that this gives an easily digested way of thinking about open licensing for academics. It is the reader's obligation to verify that all of this is technically correct before acting on any of it. For the GNU General Public License (GPL), there is an excellent FAQ on the GNU website (it is long, but easy to read).

Why Open Licensing? (the agnostic's case)

You have probably heard noble arguments in favor of the open software movement. These often leave the lukewarm pragmatist little motivation to join. Yet so many of us do it anyway, often with an "everybody is doing it" or "it seems like a good idea" attitude. Here is an articulation of what is often left unsaid about why so many people are driven to open licensing, even if they are not principled open-source enthusiasts.

Open licensing facilitates collaboration by breaking down barriers of mutual mistrust, which arise naturally among parties that have no prior rapport, such as collaborating (and sometimes simultaneously competing) academic groups. Among programmers who use and abide by the GPL (for example) to facilitate code exchange, it basically assures everyone that no one can make money off of anyone else's code in a way that locks the original copyright holder out of doing the same. This is the minimum that most of us want: we do not want to feel like suckers later for having given something away for free. As a (somewhat) academic point, you are not forbidden from making money off of GPLed software, even if you do not own it (but the GPL is perhaps not the best choice for your code if money is the motivation).

The operative question, then, is this: do I want to be part of the open-source community for a given project? Programmers should be able to work as programmers for money, and no one should begrudge anyone the money they make writing software (assuming they play ethically and legally). There is a respectable case to be made for selling your latest and greatest code as a black box. But there is a downside to this for most of us. We have day jobs, which do not leave enough time to find investors, retain lawyers and marketers, act as customer service reps, etc. Is your code really that brilliant, or is it more like a single optimized cog in the grand flow of gigahertz signals? To mix metaphors, if you are a single cork bobbing in the ocean, is it not better to find something bigger to latch onto, if you want to go anywhere? Now, enter a community of brilliant professional coders whose side projects (and sometimes main projects) are available to build on and link against (even borrow pieces of), if you are willing to let them do the same.

So that is the pragmatic decision tree, with two branches: go all-in for the glory and the riches, and take a bigger risk of getting nowhere, or forfeit the money and hope that you can at least still get a little glory out of it if your stuff works, which is now more likely.

Overview of the Rules

So, if you are convinced, here are some of the rules, demystified. I am not saying there are no other great explanations out there (e.g., the FAQ mentioned in the disclaimer); I am just saying I like mine better, for now. It is a matter of concrete context. I am explaining how I understand the license in the context of people like us, who do not work with discrete releases to the general public. Down here in the academic trenches, where the GPL is more vital than ever (for the reasons above), we don't work that way, and we can't. We share half-finished versions of our work with collaborators, whom we may or may not trust. For all intents and legal purposes, this makes them (part of) the public. Even though no effort has been made to post fully functional code for the whole human race to use (well, those with computers and internet), the work has thus been distributed/released/conveyed, and we can make use of the same license protections as more well-oiled open-source projects do when they release to the public at large.

Theory

The most important thing to realize is that every time your code is downloaded by a member of the public, this constitutes a conveyance. As I understand it, at least according to the GPLv3, this also applies to changes pulled from a repository into a previously obtained full copy. "Conveyance" is GPL-ese for what you might think of as a release, regardless of the lack of any version numbers, etc. So it is important that every such download/update/release (we will use "release" from now on) is properly licensed. Practically, this means always keeping the copyright and license notices in your repository up to date. This can be done by habit as you go, like so many other things (e.g., code comments). Assuming the work is a coherent whole, and not a bunch of disparate, unrelated files, you want to license it as a whole. The idea that a license applies to a whole package is legally important for the GPL, though we will not get into details here.
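If you want a safety net under that habit, a short script can flag files that are missing notices before anything gets pushed. Below is a minimal Python sketch, not an existing tool. It assumes (hypothetically) that your sources are `.py` files with `#` comment headers, and it counts a file as covered if one of its first 40 lines is a comment containing `Copyright` and the phrase "GNU General Public License" appears in those same lines. The names `missing_notices` and `audit`, and the 40-line cutoff, are my own inventions for illustration.

```python
# Minimal sketch (hypothetical helpers, not part of any existing tool):
# flag source files whose header lacks a copyright or a license notice.
import re
from pathlib import Path

# Assumption: "#" comment headers, as in Python or shell sources.
COPYRIGHT_RE = re.compile(r"^\s*#.*\bCopyright\b", re.MULTILINE)
LICENSE_PHRASE = "GNU General Public License"
HEADER_LINES = 40  # only the top of each file is inspected


def missing_notices(text: str) -> list[str]:
    """Return which notices ("copyright", "license") are absent from the header."""
    head = "\n".join(text.splitlines()[:HEADER_LINES])
    missing = []
    if not COPYRIGHT_RE.search(head):
        missing.append("copyright")
    if LICENSE_PHRASE not in head:
        missing.append("license")
    return missing


def audit(root: str, pattern: str = "*.py") -> dict[str, list[str]]:
    """Map every matching file under `root` that lacks a notice to what is missing."""
    report = {}
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        gaps = missing_notices(text)
        if gaps:
            report[str(path)] = gaps
    return report
```

Calling `audit(".")` from a Git pre-commit hook, and refusing the commit when the returned dictionary is non-empty, would turn the habit into a check. It only tests that the notices exist, not that they are correct; whether the names and years in them are right is still on you.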

The next most important thing to clear up is that copyright notices and license notices are completely separate things. This is basic legal stuff, regardless of which license is in question. As a preface, for a given piece of code, the copyright holder(s) and the author(s) can be different. The copyright holder is the owner. Copyright can be reassigned, bought, sold, seized, etc., and, as with other forms of ownership, sometimes people are mistaken about who owns what. I have to leave you on your own to sort out whether (or when) you own the code that you write. From now on, I will refer not to authors, but to code owners. Just as with other ownership, the owner has the final say about what can be done with any code.

The owner of a piece of code can license it to someone else for use, and that word "use" is where things get legally hairy. Therefore, a license defines exactly what this means; that is, it defines what the licensee can (or cannot) do with the code. An important point here is that nothing about ownership changes upon licensing. The owner of a piece of code retains a special relationship with that code and can do more with it than any licensee (unless the license is so permissive as to be pointless). The GPL, for example, does not place any restrictions at all on what you can do with your own code. You can even take payment for allowing it to be incorporated into closed-source code, which is something that your GPL licensees are not allowed to do (but you cannot revoke their GPL license; more on this below). Exclusive-use licenses, by contrast, can restrict your rights as the code owner, in that you agree not to share your code with further parties.

Finally, let us review the most fundamental aspects of how the GPL (and other open licenses) bring about their desired effects. There are two senses in which open code is open. The sharing is open (it cannot be restricted to a closed group), and the source code of any such shared code is open (it cannot be hidden). Firstly, a licensee of GPLed code has the right to share that code, or any modification of it, with anyone. But, secondly, the so-called "viral" clause of the GPL places restrictions on that sharing, which ensure the propagation of both kinds of openness. Your release of anything that includes GPLed code from elsewhere must include the source code of the whole, both your part and the parts you got from elsewhere. Your release must also be under the same GPL version as the GPLed code that you have made use of (or possibly a later GPL version, if allowed; things can get difficult or impossible if there are multiple sources and multiple licenses involved). This then confers the same sharing rights and open-source obligations on all licensees of your modification/combination. There you have the checkmate: if you distribute derivatives of GPLed code, they must be open-source and re-sharable, so any modifications you make can get back to the original owner. If you try to make money selling it (and you can try), the owner of the code that you built upon can get your changes and sell the exact same thing (which is fine, except that it is now all open anyway).

In a sense, including GPLed code from elsewhere infects your code with openness, if you release it. An important nuance is that simply modifying GPLed code does not obligate you to do anything further. For example, the owner of the original code cannot demand access to your private modifications. But, if you want to make your modifications public in any way, then you must release them openly. There is legal wiggle room (whose exact borders I will not attempt to delineate) for co-workers secretly sharing modifications of GPLed code that they do not own, perhaps half-baked attempts at extensions that are intended for release upon completion. This works because the basis of that sharing is trust, as opposed to a license agreement between unaffiliated parties. This then automatically precludes selling the product as closed (in either sense) under such pretenses, because a sale would not be to co-workers.

Practice

   GPLing your code

Every file that you, the owner(s), will release should contain a copyright notice and a license notice. The copyright notices establish ownership claims, perhaps by multiple collaborating owners. (Similar to erecting a fence or planting a flag, these are not technically necessary, nor do they themselves legally establish ownership, but if you do own something, it is foolish not to at least mark your territory.) The license notice then informs the person who reads it (no matter how they got it) of the conditions for use. The validity of the license notice is contingent on the correctness of the copyright notice. (You have to actually be the legal owner to do the licensing, or else the license is void.) For the GPL, the license notice states that the full license agreement itself should have been provided to the licensee separately. It does not say where in your package to find it, but the presumption is that you will inform the licensee (or make it somehow obvious, usually with a file named COPYING in the project root directory, which is then also referenced by a file named README). Regardless, you need to include it. How can you expect someone to obey given terms if you do not tell them what they are? There is an authoritative guide on how to GPL your code on the GNU website.
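For concreteness, here is a minimal Python sketch that renders the per-file notice recommended in the "How to Apply These Terms to Your New Programs" section at the end of the GPLv3, as `#` comment lines for the top of a file. The function name `gpl3_header` and its parameters are hypothetical; `description` fills the template's slot for a one-line name and description of the program. The "(at your option) any later version" wording is the "or later" choice, which an owner may omit to license under GPLv3 only. The notice does not replace the full license text, which still has to accompany the code.

```python
# Minimal sketch (hypothetical helper): render the GPLv3's recommended
# per-file copyright + license notice as "#" comment lines.
def gpl3_header(description: str, years: str, owner: str, comment: str = "# ") -> str:
    """Return a commented copyright + GPLv3 license notice for one file.

    `description` fills the template's one-line program name/description;
    `years` and `owner` fill the copyright line.
    """
    notice = f"""{description}
Copyright (C) {years}  {owner}

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>."""
    # Prefix every line with the comment marker; drop trailing spaces on blank lines.
    return "\n".join((comment + line).rstrip() for line in notice.splitlines()) + "\n"
```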

   Modifying a GPLed project

Let's start with the simplest situation. Assume you have obtained a release of some GPLed code from someone else. Say you will keep your copy on your computer(s) only, modified to your liking or merged with your code. In theory, you could even delete all the copyright/license notices (it's a horrible idea which risks causing you to make legally consequential mistakes, but it emphasizes a point). More conventionally, you can produce whatever data you like with this code, and you own the data (the only exceptions being if the program dumped fixed-content copyrighted material into the files, but then you could delete that and own the whole file). You can publish the data and, legally, you do not even have to give credit to the package (though you will make and keep more friends if you do). You never have to let anyone else even see the modified/merged code. This is all fine, as the license does not demand that you share your changes, not even with the owners that licensed the code to you.

   Releasing modifications to GPLed projects

Before we get to contributing to your collaborators' code, it is instructive to consider what you would have to do to release your modified versions to yet a third party. First, let us address an important ambiguity head-on. There is no clear line between distributing a lightly modified version of an original (or a combination of GPLed originals), at one extreme, and distributing code that is almost wholly your own, with a few borrowed GPLed files, at the other. Therefore, one mechanism needs to be sufficient to cover all cases, and the latter case is simpler (and more drastic from the point of view of intellectual property).

Let us first assume that you do not modify any of the few files that you will borrow for your project. You leave the copyright notices and license notices in those files alone; this is not just GPL stuff, it is also copyright law. With respect to the GPL license under which you obtained others' code, you do not get to relicense the code, as it does not belong to you. You might be releasing your code under GPLv3 while the code you borrowed was released under GPLv2 (with a provision to allow inclusion in releases under later GPLs). You cannot change their license to GPLv3 because that is not your decision to make. The best policy is simply to be explicit about where code came from and what license it carries. From here on, you are getting my opinion on good practice (and there can be legal ramifications to the practice you choose, which only a judge can sort out). Try to collect the files from each borrowed project together and put them in separate directories, with README files written by you that explain where the borrowed files came from; point to their original README and COPYING files for details on the license granted to you (and all future recipients) by the owners of the borrowed code. If you want to be a good actor, acknowledge the authors of the package directly in these local README files (in addition to preserving the copyright lines in their files). Even better, acknowledge their contributions in the top-level README file of your whole project as well (stating the contributions and their locations). To relieve you of some stress, it is my understanding that the overall code needs no copyright notice at all; these are file by file, so you will not need to put any lines at the top level that appear to give others copyright claim to anything but their files (copyright claims must not be ambiguous, or they are useless).
If collecting things together like this is aesthetically unappealing for your code structure, put wrappers around the borrowed functions where they logically belong in your code.
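As an illustration of such a locally written README, here is a minimal Python sketch that generates one. Everything in it is an assumption of mine: the function name `borrowed_readme`, its parameters, the wording, and the convention of keeping the borrowed project's own README and COPYING in the same directory as `README.original` and `COPYING`. The point is only the content: what was borrowed, from whom, under which license, and where the full terms live.

```python
# Minimal sketch (hypothetical helper and file names): text for a README you
# write yourself and place in a directory of borrowed, unmodified GPLed files.
def borrowed_readme(project: str, origin: str, authors: str,
                    license_name: str, files: list[str]) -> str:
    """Explain where borrowed files came from and where their terms live."""
    listing = "\n".join(f"    {name}" for name in sorted(files))
    return (
        f"The following files were taken, unmodified, from {project}\n"
        f"({origin}), by {authors}:\n\n"
        f"{listing}\n\n"
        f"They are not owned by the authors of this project. Their copyright\n"
        f"notices are preserved at the top of each file, and they are\n"
        f"distributed under {license_name}. See README.original and COPYING\n"
        f"in this directory (copied from {project}) for the terms granted to\n"
        f"you and to all further recipients.\n"
    )
```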

You can extrapolate from the above that, at the other extreme of just modifying someone else's code, it is also best to keep your code in separate files, to the extent possible. Of course, that cannot be completely possible, or else you would not be modifying their code at all, but let us first pause to consider the other end of the process, the release. This is now your release, just as if you had only borrowed a few files for something that was mostly under your ownership. Only, this time, if you want to be a decent human being, you will start off the top-level README with something like "This is a modified version of ..., originally introduced by ..., with the following added capabilities." You can even use the same package name, though this may or may not be appropriate. In some way, shape or form (which is not clearly specified), it needs to be clear that this is a derivative work. That is part of the GPL conditions, and it protects the reputation of those brilliant people whose code you have (possibly sloppily) hacked.

Now how do you handle files that you don't own but must modify? You know this can get arbitrarily messy, right? The idea is to keep it as (legally) clean as possible. In the copyright line(s), all information that was there when you got the file should remain unchanged. There are several conventions for claiming copyright to your contributions (in the same line or in another), but what you need to add is your (or the owner's) name associated with the years in which changes were made. You need to include this information even if you do not want to claim copyright on the changes (then just state that you modified the file). Then it is good to mark off your modifications. You might be tempted not to mark them if you will not claim copyright, but that is not fair to the original author. Marking changes is where things get crazy, though. What license should you reference? Clearly the license under which you will distribute, right? But what if the source is under another license, and you can't modify that? The best strategy is to change as little as possible, mark off your changes, and state that they are in the public domain. Now you don't have to worry about licensing anything; GPLed code can include as much public domain code as it wants. If you need to do something clever that you want to claim copyright on, try to wrap it up in a function and put it in a different file; this will likely lead to a better (more flexible) code structure as well. Failing that, move only what you need to heavily modify to a new file and be super explicit about what used to be there; perhaps copy in the entire original code and comment it out, followed by your code, and state (at the top of the file) the original relevant copyright and license for the parts that belong to the original. As long as you are operating in a good-faith attempt to leave others' claims intact, it is likely that no lawyer will ever have to tease the code back apart.
If even that doesn't work, try making friends with the original author and see if you can work something out (e.g., an agreement to joint ownership of the file with no internal demarcations, to be released under your chosen GPL version). Beyond that, find a lawyer who is also a programmer, because I can't help you.
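The bookkeeping part of this (leaving the original copyright lines untouched and adding a dated line for your own changes, either claiming copyright or stating that they are in the public domain) is mechanical enough to sketch. The Python below is a minimal sketch under stated assumptions: the file uses `#` comments, its notices sit in the first 40 lines, and the wordings "Modifications Copyright (C)" and "Modified by ... public domain" are one convention I picked for illustration, not language required by the GPL. The helper name `add_modification_notice` is hypothetical. It does nothing about marking changes in the body of the file, which still has to be done by hand.

```python
# Minimal sketch (hypothetical helper): record your changes in a file you do
# not own, leaving the content of every original line unchanged. The wording
# of the inserted line is an assumed convention, not a form the GPL requires.
import re

# Existing "Copyright" or "Modified by" lines in a "#"-commented header.
NOTICE_LINE = re.compile(r"^\s*#.*\b(Copyright|Modified by)\b")
HEADER_LINES = 40  # assumption: notices live near the top of the file


def add_modification_notice(text: str, years: str, name: str,
                            claim_copyright: bool = True) -> str:
    """Insert one notice line directly after the last existing notice line."""
    lines = text.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines[:HEADER_LINES]) if NOTICE_LINE.match(line)]
    if not hits:
        raise ValueError("no existing copyright line found; do not invent one")
    last = hits[-1]
    if not lines[last].endswith("\n"):
        lines[last] += "\n"  # only happens if the notice is the file's final line
    if claim_copyright:
        new = f"# Modifications Copyright (C) {years}  {name}\n"
    else:
        new = f"# Modified by {name} in {years}; these modifications are in the public domain\n"
    return "".join(lines[: last + 1] + [new] + lines[last + 1:])
```

Because each new line goes after the last existing notice, repeated rounds of changes (yours, then a later contributor's) stack up in chronological order beneath the original owner's line.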

   Contributing to GPLed projects

Now, with those cases handled, it is easier to see the generalization when you want to contribute to someone else's project. Now the third party you release to is simply the original owner. If you clone the repository of someone with whom you clearly have some working relationship, then when you push back your changes, you are releasing your modified project (in its entirety) back to them. With your copyright and license notices in place, your intellectual property is every bit as protected from their lack of scruples as theirs was from yours. As I read it, the requirement that you make clear that your copy of their code is a modified copy (and vice versa) is satisfied by the distribution mechanism. I'm assuming that some orderly version control software is keeping track of versions for you. If not, then you will have to keep track of all of this manually anyway.

Although you and your collaborator are treating each other as members of the public for purposes of exchanging code, you will clearly have a working relationship with more supplemental agreements than what is found in the GPL alone. There are limits on what those agreements can be (neither of you can make the other promise not to distribute GPLed code further), but you can probably agree on shared authorship of a few files, to avoid having to fence things off to the point where no one can move. For example, if you and your collaborators are the sole contributors to a couple of files, you could even just put them in the public domain, since these are probably just interface files that are useless on their own. In short, you have to be able to figure this out yourselves, or you have no business collaborating. What is important here, though, is that you can both expose to each other your most sensitive and creative secrets from your pre-existing domains (from before you got together), without having to worry that the dish will run away with the spoon and eat the pudding without you. Clearly also, if you are true co-workers operating with a deeply based or mutually vested trust, or under a non-disclosure agreement, none of this is relevant, since you are not using the GPL as a legal agreement between you (though you may be using it between yourselves and the rest of the world).

   Changing your mind (sort of)

An important part of the foregoing is that all copyright owners retain ownership of their code, regardless of how it has been shared. Contributing your code to an open-source project, or open-sourcing your code, does not place any obligations on you with respect to what you can or cannot do with your code in the future. As an extreme example, the owner of any code included in a GPLed release can even relicense it (and wholly owned modifications) as closed source to another party. Open-source proponents frown on this, but that is the law (they can't legally get openness and also apply such restrictions). Of course, modifications by others would have to be stripped out, since you cannot relicense their code. That said, you cannot revoke the rights you granted to others when you GPLed your code; once it is out there, it is out there.