Thoughts: Semantic versioning applications

A summary of Semantic versioning (see http://semver.org/ for more):

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.

Above are the official definitions of the 3 integers in a semantic version number. Apart from a brief mention in the next paragraph, I'm not going to talk about pre-release or build metadata labels. This document also assumes that you are versioning your releases rather than releasing your versions. That isn't better, but it makes this topic easier to discuss because it avoids release issues.

A snafu with patch is that people often use it for an additional meaning beyond the definition. The definition only names bug fixes, but how do you track smaller changes? If I make a change that is not an incompatible API change, is not new functionality (from the client's perspective), and does not fix incorrect behavior, then Semantic versioning can't track this change. Examples include removing dead code, updating the legal information, rewording text visible to clients, and updating versions of dependencies (only if invisible to clients). Pre-release isn't appropriate since the change could be released, and build metadata can't be used since build metadata has no precedence. This means that tracking such a change requires either adding another integer to the version or changing the definition of an existing integer. The common practice is to rename "patch" to "increment", changing the definition to "any change that isn't major or minor". For the rest of this document I will use increment with this definition.
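As a rough sketch of the three change kinds with "increment" in place of "patch": the names `Version` and `bump` below are made up for illustration and are not part of the specification. Resetting the lower numbers to 0 on a major or minor bump follows the rule at semver.org.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.INCREMENT version (hypothetical helper type)."""
    major: int
    minor: int
    increment: int  # called "patch" in the official specification

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.increment}"


def bump(version: Version, change: str) -> Version:
    """Return the next version for a change of kind 'major', 'minor', or 'increment'."""
    if change == "major":
        # Incompatible API change: lower numbers reset to 0.
        return Version(version.major + 1, 0, 0)
    if change == "minor":
        # Backwards-compatible new functionality: increment resets to 0.
        return Version(version.major, version.minor + 1, 0)
    if change == "increment":
        # Any change that isn't major or minor (bug fix, dead code removal, ...).
        return Version(version.major, version.minor, version.increment + 1)
    raise ValueError(f"unknown change kind: {change!r}")
```

For example, `bump(Version(1, 2, 3), "minor")` gives `Version(1, 3, 0)`.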

Semantic versioning does not specify what the version number can be used for; it only provides a format definition with meaning and an order of precedence. So the whole point of this document is to talk about what semantic versioning can (or can't) be used for and what "incompatible API" and "add functionality" mean within a given context. There are many semantic versioning edge cases, and while some may be covered, the purpose of this document is to focus on what semantic versioning can be applied to. Additionally, an API is generally only defined for expected behavior, and therefore what happens when errors occur is not part of the API. Generally speaking, an incompatible change means that what a client was doing is no longer supported, and adding functionality means that a client can now do something that he previously could not (by these definitions, changing a required field to be optional is a minor version).

Starting with the easiest I'll talk about the happy path that semantic versioning was likely made for: RESTful services. Specifically an entire RESTful service as a single black box with a defined API that clients use to make requests and receive responses. An incompatible API change occurs when the service makes a change such that a client is required to change (either request creation or response parsing) in order to continue getting the same behavior as before the service change. Adding functionality means that there is a new type of request, new optional fields in a request, or new fields in a response. An increment version would include removing dead code (since that isn't part of the API).

If the implementation of an API has a version number separate from the API's, and the API documentation does not use the same version as the API, then the API can't make use of the increment version. Therefore it seems that semantic versioning was not intended to track an abstract API but rather an implementation that has an API (e.g. patch calls out "bug fix", which isn't possible for an abstract API). Indeed the description of semantic versioning talks about packaged code (which would include implementation). In cases where the increment is lacking, a placeholder 0 can be used so that the version can qualify as semantic (as long as major and minor meet the requirements). In cases where the API documentation is tied to the API version number, updating documentation (such as rewording or correcting spelling mistakes) is considered an increment. In cases where the API implementation is tied to the API version number, things like refactoring or removing dead code are an increment.

Similar but more granular is semantic versioning for a single RESTful endpoint (request/response). The URI, HTTP method, and some of the headers (specifically Accept) are used to define a single endpoint. Examples of an incompatible API change include changing the URI or the request body. Now the interesting thing is: if I have a RESTful service that supports multiple types of requests, each with a semantic version, then what is the version of the service as a whole? Depending on the setup it is possible that the entire service doesn't need a version; however, let's suppose it does require one because the service as a whole is a maven artifact. An incompatible API change for the service as a whole is the same whether the individual requests have semantic versioning or not. Whenever any endpoint increments a version the whole service does as well. If multiple endpoints change then the service increments a single number of the highest type used (major, minor, or increment). So if the Alice endpoint increases 2 major versions and the Bob endpoint increases a minor version within a single release of the service, then the service increases its major version once. Note that the service version generally won't match any endpoint version. Removing an endpoint is an incompatible API change (major version) and adding an endpoint is new functionality (minor version) for the service version.
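The Alice and Bob case can be sketched in a few lines. This is only an illustration of the "highest type, bumped once" rule described above; the function name `bump_service` and the starting version 3.4.5 are made up.

```python
from typing import List, Tuple

# Higher rank wins when several endpoints change in one release.
RANK = {"increment": 0, "minor": 1, "major": 2}


def bump_service(version: Tuple[int, int, int],
                 endpoint_changes: List[str]) -> Tuple[int, int, int]:
    """Bump the service version once, by the highest kind of endpoint change."""
    if not endpoint_changes:
        return version  # nothing changed, so the service version stays put
    highest = max(endpoint_changes, key=lambda kind: RANK[kind])
    major, minor, increment = version
    if highest == "major":
        return (major + 1, 0, 0)
    if highest == "minor":
        return (major, minor + 1, 0)
    return (major, minor, increment + 1)


# Alice goes up 2 major versions and Bob goes up 1 minor version in one release.
print(bump_service((3, 4, 5), ["major", "major", "minor"]))  # (4, 0, 0)
```

The two major changes on Alice still move the service by only one major version, which is why the service version generally won't match any endpoint version.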

What if a service supports multiple versions of a single RESTful endpoint? The endpoint version is handled normally and is agnostic to whether or not other versions of itself will exist. Removing support for a version of an endpoint is an incompatible API change (major version) and adding support for a version of an endpoint is new functionality (minor version) to the service version. If all of the endpoints are grouped into a single semantic version and a service supports multiple versions of that then the exact same principles apply (since the API of a service is the union of all supported versions of all endpoints). If any version that the service supports is increased then the version of the service is increased in the same way.
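One way to picture this is to compare the set of endpoint versions a service supports before and after a release. The sketch below is an assumption-laden illustration: it tracks only endpoint names and major versions, and the function name `service_change` is hypothetical.

```python
from typing import Optional, Set, Tuple

# An entry is (endpoint name, supported major version), e.g. ("alice", 2).
Supported = Set[Tuple[str, int]]


def service_change(before: Supported, after: Supported) -> Optional[str]:
    """Classify a release by which endpoint versions were dropped or added."""
    if before - after:
        return "major"  # something clients could use is no longer supported
    if after - before:
        return "minor"  # only additions: new functionality
    return None         # the supported set is unchanged


old = {("alice", 1), ("alice", 2), ("bob", 1)}
print(service_change(old, old | {("alice", 3)}))  # minor
print(service_change(old, old - {("alice", 1)}))  # major
```

Dropping one version while adding another in the same release still counts as major, since the removal is what breaks existing clients.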

What about a server storing documents client side? The service mandates the file format of the document so it is considered the same as any endpoint. For example, if a client is able to download his session and later upload it to regain the state he previously had, then the upload is just like any other endpoint with the session file being the body. The same principle can be applied to saving files to databases or file systems if anything else has access to them. If your service has exclusive access to these files or database tables then they could be considered an implementation detail and any change would be an increment version. This is true whether the document is JSON or a docx. In order to version the document itself rather than just how it is used, continue reading.

What about a software library? All exposed code is part of the API (eg public classes). All private code and dependencies are not part of the API. Semantic versioning is used as normal.

What about a maven artifact that pulls in an API artifact? The main artifact is either a library or a service. Either way the service has a version number based on the API but separate from it. It does not matter whether the API artifact uses the same version as the version of the API itself. The API artifact is considered part of the service's API rather than an implementation detail like other dependencies are because it is exposed to be used.

What about something like a web browser? If the program only supports HTTP 1.0 then adding support for HTTP 1.1 is new functionality (minor version) and removing support for HTTP 1.0 is an incompatible change (major version). Likewise for HTML versions, CSS versions, etc. This is also true for browser plugins, so the plugin API should have its own version for simplicity. This will lead to a browser having tons of semantic versions that it uses and a single overall semantic version, with the overall semantic version not being very useful.

What about programs that support multiple operating systems? Nope, semantic versioning breaks down and can't be used as-is. There would be a core program with semantic versioning, an OS with semantic versioning, and a bridge that connects them which can't have semantic versioning. The core program's version covers the API for talking to the bridge, and the OS version covers the API that the bridge talks to. The problem is that there needs to be a bridge for each OS name and major version. So if the OS is Alice OS 10.1.2, the bridge can require at least Alice OS 10.0.0 and combine that with the core program's semantic version, so that the bridge version would be "Alice OS 10 bridge version 1.2.3" at a minimum. It would be a good idea to include the entire OS version to allow supporting more combinations. Of course it should also tell the user what the core version is. This setup allows the core program to be upgraded on each supported OS while maintaining unique numbers for the bridge. The same system applies to any adapter or facade, such as a kernel which allows a shell to talk to the hardware: if there is more than 1 adapter then the adapter's name or version needs to state which thing, and which version of it, the adapter is adapting to.

What if the user can change the dependency versions as he desires? For example if an OS has semantic versioning when a user upgrades the version of a downloaded program then the OS version doesn't change because the OS version is the version of the API that programs use to talk to the OS.

What about programs like Microsoft Word where the user's document (in a standardized format) is also the primary purpose of the program? The file is treated like any other API in that the file contents as a whole have a version number (whether stored in the file or not). If the program supports that version of the document then the program edits the file without issue; otherwise the program tells the user that it can't open the file. Any file that can be edited can be saved in that same format or converted to a different file type. Since the file is standardized, complications occur when the same file is used by programs that support different versions of the standard format or if the file contains something that isn't part of the file version (see next topics).

How can a program account for a request for something that may or may not be supported? For example, HTML 5 allows image tags to point to a .png or a .jpg, but a browser might not support both image formats because the image format is not part of the HTML version, nor can there be any kind of image format version (although .png itself could have versions). The list of supported image formats is part of the browser version but is not part of the HTML version. The browser version also includes the list of which .png versions are supported etc.

If multiple programs edit the same file and each program supports different versions of the standard format that the file is in, what happens? If the standard format is the same major version then the file uses the lowest version that supports the file contents. If the standard format major version number is different then it is treated as though it was an unrelated standardized format.
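Within a single major version, "the lowest version that supports the file contents" can be sketched as the highest minor version needed by any feature the file actually uses. The feature table below is entirely hypothetical and exists only to make the idea concrete.

```python
from typing import Dict, Iterable

# Hypothetical format: the minor version (within one major version) in which
# each feature was added to the standard.
FEATURE_INTRODUCED_IN: Dict[str, int] = {
    "text": 0,
    "tables": 1,
    "footnotes": 3,
}


def lowest_minor_for(features_used: Iterable[str]) -> int:
    """Lowest minor version of the format that can represent the file contents."""
    return max((FEATURE_INTRODUCED_IN[name] for name in features_used), default=0)


print(lowest_minor_for(["text", "tables"]))     # 1
print(lowest_minor_for(["text", "footnotes"]))  # 3
```

A program that only supports minor version 1 can then keep editing the first file even after a newer program has touched it, as long as nothing newer than tables was added.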

How much of the version number should be in the client's request or stored document? If it doesn't contain the major version number then the best the program can do is attempt the operation and if it fails tell the client "bad request/corrupted file or wrong major version number. Not sure which.". Additionally without a major version number it would be difficult for a service/program to support multiple major versions. If the client has a major and minor version number that is ahead of the service/program then the program can perform the function and return the result with a warning "You asked for X but I used Y. Therefore this is a partial response and is missing functionality that you might require.". This is likewise true for increment version with the warning "My response might contain a bug". If the client doesn't care about a feature or isn't tracking increment versions then he can simply send a 0 for those numbers. There are 2 reasons for a client to send non-0 version numbers: one reason is that if there are multiple versions of the service running (such as during an upgrade) then the request routing can make sure that the clients expecting the newest version go to the newest service etc. The other reason is so that the client gets a more specific warning or error message in the case of mismatch.
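The warnings described above could be produced along these lines. This is a sketch under assumptions: the function name `check_request` is made up, and it assumes a service that supports a single major version and so rejects any other major version outright.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]  # (major, minor, increment)


def check_request(requested: Version, served: Version) -> Optional[str]:
    """Return None if the service fully covers the request, otherwise a warning.

    Raises ValueError on a major version mismatch (assumption: the service
    supports exactly one major version).
    """
    if requested[0] != served[0]:
        raise ValueError(
            f"wrong major version: asked for {requested[0]}, "
            f"this service supports {served[0]}"
        )
    if requested[1] > served[1]:
        # The client expects features the service may not have yet.
        return (
            f"You asked for {requested[0]}.{requested[1]} but I used "
            f"{served[0]}.{served[1]}. This is a partial response and is "
            "missing functionality that you might require."
        )
    if requested[1] == served[1] and requested[2] > served[2]:
        # Same feature set, but the client expects a later fix.
        return "My response might contain a bug."
    return None


print(check_request((2, 0, 0), (2, 3, 1)))  # None: client sent 0s, no warning
print(check_request((2, 3, 4), (2, 3, 1)))  # My response might contain a bug.
```

A client that sends 0 for minor and increment never triggers a warning, which matches the "doesn't care" case in the text.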

Does an increment version exist for stored documents? No. It could trigger a warning if the parser doesn't support that version; however, it doesn't make sense for the document to say "if you don't have this version then your parser has a bug". Unlike an API increment version, there aren't any internal implementation details, so I can't name any way for the format to have an increment version. For example, the specification for YAML had things added to it in a backwards-compatible way, so major and minor numbers make sense but an increment number doesn't.

Can a computer language have semantic versioning? The language syntax is the same as any other file so no: only major and minor. The compiler, IDE, etc can have full semantic versioning.

Can a human language have semantic versioning? Not quite (even for synthetic languages). Incompatibility could be defined as "does this retain the same meaning" which would mean that any word or grammar that is dropped or changed would be incompatible. New words and new definitions to existing words could be added as "new functionality" but an increment number doesn't make sense. Grammar rules can't really be changed since that likely invalidates previously legal statements. For natural languages versioning of any kind is impossible due to the chaotic definition. A synthetic language could use versioning but would just use major and minor.

Can hardware have semantic versioning? No but I don't know enough about hardware. If switching from 1 Microsoft mouse to a newer one can use the same drivers etc then it wasn't a major version change. But I don't think that's possible: everything has unique drivers. While "new functionality" is subjective there isn't any type of increment version number that would make sense. For devices that don't connect to others like a non-smart digital watch then there's nothing to be incompatible with (watch battery type?) and the versioning breaks down. What about a standard port like USB 3.0? If the name was instead 1.3 then it would be nearly semantic but there's still no increment version so it isn't semantic.

Can a game like Dungeons and Dragons use semantic versioning? Not quite. Expansion books could be considered added functionality but there's no order and any combination of them can be used, so the version of the core rules can't use them. Considering only the core rules, a player character sheet could be used to test incompatibility: obviously a breaking change occurs when the decisions made on a character sheet are no longer legal. But what about game play? Generally if any of the rules for a game change then it isn't the same game anymore, which is especially important for games like D&D where there's plenty of time for the game to change between sessions even though the same character sheet is used again (albeit altered). But that criterion is very strict, and while it is possible to make an additive change, most changes would be breaking, which would make semantic versioning useless. A more useful numbering would be a major version for "very serious change where you likely can't use the same character sheet at all" and a minor version for "every other change (might require character sheet changes)", which is what D&D did with 3.5 edition (granted it should have been 3.1) but isn't semantic.

Can a printed novel like Lord of the Rings use semantic versioning? No. The content of the book is informational and that information can't change without it being different information and therefore everything would be an incompatible change. Realistically books are allowed to reword small things between editions of a book which sounds like justification for a major version number rather than being considered a whole new book. Spelling mistakes could be considered an increment version number but "adding functionality" has no meaning to books. This is why books use a single number which is edition along with descriptive text like "hard back", "pictures are printed in color", or "includes a map".

Semantic versioning also can't be used for things like blueprints for a book shelf, cooking recipes, telescopes, vehicle safety standards, or governmental laws.

To summarize, semantic versioning can be used for all software except adapters and can't be used for things that aren't software (not even software specifications or abstract APIs).


Sunday, October 1, 2017
