Actually I did have to face palm particularly hard in the PGN spec section 3.2.3 Speed of processing which states that programs can "easily scan PGN data without having to have a full chess engine or other complex parsing routines" aaaarg it still hurts to look at. How could it even say such a thing? Was this part written before the writer knew what SAN was and this section wasn't updated afterward? Parsing (and writing) SAN ALWAYS requires a complex parsing routine in addition to a full chess engine (AI, GUI, etc not required). To extract just the coordinates from SAN requires this regex:
/^[QBNRK]?([A-H]?)([1-8]?)[x-]([A-H][1-8])(?:=[QBNR])?[+#]?$/i
which is far more complex parsing than Long Algebraic Notation (LAN) which would be (in javascript):
move(text.substr(1, 2), text.substr(4, 2))
assuming the pawn was padded to have a symbol (MCN is even more simple).
PGN also violates a couple of its goals. Mainly 2.2.1 because SAN is designed to be unnecessarily complex. There's also a problem with goal 2.2.3 because it doesn't allow any move text format except SAN (actually there are 2 others: FEN and PGC but the support is lacking as explained later).
There's also a list of things to nitpick. Section 6 defines a completely useless token "%" which does exactly the same thing as an already existing token ";" (semicolon) which is for a single line comment. There was no reason to reserve <> (in 7) since stray character are invalid anyway. Illegal moves are not allowed by SAN but PGN itself shouldn't care. (Petty paragraphs like this are not the reason this blog post is 5ish pages in length excluding formal definitions.)
VGN: Variable Game Notation
Based on PGN, this is a game notation specification which is different than move text notations. Although PGN is almost usable there is 1 really important difference that is required: the ability to use move text notations that aren't SAN.This is done by the use of a new tag named MoveFormat eg: [MoveFormat "FCN:Friendly Coordinate Notation"]. Note that it is called MoveFormat even though FEN and others describe the board instead of the moves. The first value is what the program reads to determine which format the move text will be in, the second value is the human readable name and is not used by the parser (there are only 2 values). Because the second value is ignored by the parser it is optional. If the tag doesn't exist then the default is SAN for the sake of compatibility with PGN. In the event that multiple notations use the same acronym then number them such as: SAN, SAN2, SAN3 a parser should make public which formats are supported and how the duplicates are mapped. Multiple versions of the same format should be treated in the same way: numbers should be used (if multiple versions are supported) and make public how they map.
If the move format is a binary format (instead of plain text) then the MoveFormat tag must be the last tag and the encoding will start immediately after the closing square brace, no plain text comments of either kind are allowed after it and there is also no end line. Some example binary formats are BCCF, BCFEN, and PGC. After a binary format is terminated the parsing goes back to plain text, if there is another game to follow it is a good idea to use some end lines to separate the games so that a human can more easily find the plain text games in a file (although it would be better to segregate the games into 2 files for binary and plain text). Each binary format specifies its own game termination marker.
Binary move text sections are required to have at least 1 byte for the game termination marker (just like plain text). A binary move section is also required to be divisible by bytes for the sake of sanity. If a binary format is not normally divisible by bytes then it must contain 1 to 7 padding bits after the game termination marker. VGN parsers will send data to the binary parser byte by byte (which is one reason why it needs to be as a whole divisible by byte) so it would be a good idea for the binary format to also be divisible by bytes.
In PGN the SetUp tag always requires the FEN tag so a parser can ignore the SetUp and detect FEN because SetUp was 100% redundant/useless. For VGN the SetUp tag contains the board string directly. The first value of SetUp is the move format abbreviation and the second value is the board text. If the first value is omitted SFEN is used (for brevity and FEN compatibility). The board string may contain end game markers (the move text kind such as +# not the game terminators such as *) in which case the move text section will be empty (with only a game terminator). Note that even without this tag it is possible to have a game without any moves.
Minor things. % is no longer a symbol use semicolon instead (a semicolon is also valid PGN). Each format has its own parser but in all cases stray text is considered invalid (this includes < and >). Since the Result tag wasn't being used by parsers it should contain human readable information (such as "black ran out of time", "white surrendered", "Stalemate", or "White wins") instead of the game termination symbols (however the game termination symbols located at the end of the move text section remain the same and are still required). The Date tag is the start date. There is also a tag for EndDate, if omitted it is assumed to be the same as the start date. VGN does not disallow illegal moves, it is the move text format to decide that, and they should all allow it (SAN has a limitation of not allowing illegal moves thus making some games impossible to record in SAN).
About PGN move text formats. Earlier I said that PGN had partial support for notations that are not SAN. What PGN does is change the extension type of the file. ".pgn" is for normal games (that use SAN) ".fen" is for FEN and ".pgc" for PGC. While it is a good idea to separate binary from plain text it isn't necessary to require it. But more importantly using the extension type is bad design since not all of the information a parser needs is included in the file's contents. This is especially problematic for tools that allow text to be pasted since there will be no file extension at all. Additionally with many types of supported move text formats there would be many extension types all tied to the same program (your Operating System will require a lot of file type associations). Tags already have information about the game, the format should be located in a tag. Therefore instead of all of this VGN has only a single extension type which is ".vgn" and has the move text format included in a tag.
Which brings up a good point: VGN should indicate itself within the file contents. The tag is named GameFormat with the value defaulted to "PGN" for compatibility. It has 2 values just like the MoveFormat tag: [GameFormat "VGN:Variable Game Notation"] with the second being human readable and parser ignored. Note that this tag is required for each game (not once per file but once per game). If the file contains at least 1 VGN then the extension should be ".vgn"to indicate that a PGN parser can't read it but a VGN parser can. Note that having both FEN and SAN in the same file requires VGN since PGN doesn't have the MoveFormat tag.
VGN is very similar to PGN and is mostly backwards compatible. So you might ask "Why specify a whole new format? Why not simply call this PGN version 1.1?" because I don't own PGN. If I was to call this PGN 1.1 then later the owners of PGN may come out with an official 1.1 then my format would be incorrectly labeled.
More random stuff. White's move number is still optional and should have a single period; black's move number (if present) should be the whole move number followed by ".5" this is shorter than 3 periods and intuitively matches the half moves. Since I'm actually writing a program to parse VGN I've simplified a few things from PGN: comments can only appear before or between tags, before white's move number, or after black's NAG (PGN was unclear about where comments could exist since they weren't defined in the incomplete formal definition section). NAG can only appear after the move.
I ought to rewrite the Numeric Annotation Glyphs (NAG) entirely. NAG should be composite (taking multiple arguments) so that the NAG list doesn't need every possible commentary but only some formats. For example PGN NAG 26 is "White has a moderate space advantage" it should be more like $5.1.2.1.1 where 5 is the pattern "{0} has a {1} {2} {3}", 1 means white, 2 means moderate, 1 means space, 1 means advantage. Another example: $5.0.3.2.0 means "Black has a great piece disadvantage" (which should really be reworded as White's advantage). The text parsing would be pretty much the same, the look up table would be more complicated (but not unbearably so), the NAG list can be shortened and simplified since parameters can be reused, and it would be easier to translate into other human languages. Composite NAG also allows things that can't be otherwise done such as $1.1.300.20 to mean "White has 300 seconds and 20 milliseconds of time left". But I haven't finalized all the details or thought about how to improve or simplify it. Composite NAG is just an idea, for now VGN (1.0) will use PGN's NAG as is.
There should also be NAG to represent "only legal move" (7 and 8 are not this). There should also be NAGs for "ended turn in check" (which is legal in timed games) and "illegal move".
For the sake of versatility VGN can also support chess variants and fairy chess with the tags named "Rules" and "Pieces". For the Rules tag take every rule of normal western chess and number them (the normal rules should occupy the lowest numbers) then make a numbered list for every other rule supported. The Rules tag's values would be a list of which rules are used and there should be some short hand for starting with the normal rules. A number base above 10 could be used for compression since the value won't be human usable anyway. For the Pieces tag the normal pieces should use their normal symbol and there should be a short hand for starting with the normal pieces. The other supported piece types need to be labeled in some way. I'm not sure how pawn promotion would work but I'm thinking in the Pieces tag the symbol should start with an "=" (equal sign) to show that pawns can promote to it. Also note that without the SetUp tag the board will be the normal starting one and will not contain any special pieces. I'm a bit vague in this paragraph because my parser doesn't support it so anyone who wants to be able to do so will need to work out the details.
VGN Formal Definition 1
(* Using Extended Backus–Naur Form. And yes I realize the blog isn't wide enough for this. So paste it somewhere else. *) (* Note that only required white space is included, there are many places with optional whiteSpace *) (* Special characters in ?? were borrowed from Augmented Backus–Naur Form *) vgn_file = {vgn_game}; vgn_game = tag_section, (move_plain_text_section | move_binary_section); (* Unfortunately the binary section will cause false positives for validation since invalid plain text will match as binary. *) comment = ('{', {? VCHAR ? | ? WSP ? | ? EOL ?}, '}') | (';', {? VCHAR ? | ? WSP ?}); (* block comments can have '{' but not '}' *) (* EOL is any kind of end line *) tag_section = {{comment}, tag_pair, {comment}}; (* The tag section for a binary move format can't end with a comment or whitespace. *) tag_pair = '[', identifier, ? WSP ?, tag_value, ']'; tag_value = '"', {? VCHAR ? | ? WSP ?}, '"'; (* the value can't have '"' unless preceded by '\' *) game_termination = '1-0' | '0-1' | '1/2-1/2' | '*'; numeric_annotation_glyph = '$', digit, {digit}, ? WSP ?; white_move_number = digit, {digit}, '.', ? WSP ?; (* Can't be '0.' and must be in order *) black_move_number = digit, {digit}, '.5', ? WSP ?; (* Can't be '0.5' and must be in order *) (* move = ???; This syntax depends on the move text format *) move_element = [white_move_number], move, ? WSP ?, {numeric_annotation_glyph}, [black_move_number], move, ? WSP ?, {numeric_annotation_glyph}, {comment}; last_move_element = [white_move_number], move, ? WSP ?, {numeric_annotation_glyph}, [[black_move_number], move, ? WSP ?, {numeric_annotation_glyph}], {comment}; recursive_element = move_element | recursive_annotation_variation | comment; recursive_annotation_variation = '(', {recursive_element}, [last_move_element, {recursive_annotation_variation}, [game_termination]], {comment}, ')'; move_plain_text_section = {comment}, [ {move_element | recursive_annotation_variation}, last_move_element, {recursive_annotation_variation} ], game_termination; (* Note that comments that appear after game-termination are considered part of the next game (even if there isn't one). *) move_binary_section = ? OCTET ?, {? OCTET ?}; (* OCTET is any byte *)
VGN Formal Definition 2
//in JavaScript. Paste into a syntax highlighter for best results. var comment = '(?:\\s*(?:\\{[^}]*\\}|;.*)\\s*)'; var tag_value = '"(?:\\\\"|[^"\\r\\n])*"'; //tag_value can't contain end lines var tag_pair = '\\[\\w+\\s+' + tag_value + '\\s*\\]'; var tag_section = '(?:\\s*' + comment + '*' + tag_pair + comment + '*\\s*)*'; //The tag section for a binary move format can't end with a comment or whitespace. var game_termination = '(?:1-0|0-1|1/2-1/2|\\*)'; var numeric_annotation_glyph = '(?:$\\d+\\s)'; var white_move_number = '(?:[1-9]\\d*\\.\\s)'; var black_move_number = '(?:[1-9]\\d*\\.5\\s)'; //var move is regex based on the move text format var move_element = '(?:' + white_move_number + '?\\s*' + move.source + '\\s+' + numeric_annotation_glyph + '*\\s*' + black_move_number + '?\\s*' + move.source + '\\s+' + numeric_annotation_glyph + '*' + comment + '*)'; var last_move_element = '(?:' + white_move_number + '?\\s*' + move.source + '\\s+' + numeric_annotation_glyph + '*\\s*' + '(?:' + black_move_number + '?\\s*' + move.source + '\\s+' + numeric_annotation_glyph + '*)?' + comment + '*)'; var recursive_annotation_variation = '(?:\\s+|[()]|' + comment + '|' + last_move_element + '|' + game_termination + ')'; //note that rav can't be perfectly validated by regex because the rav can't contain itself var move_plain_text_section = comment + '*(?:' + '(?:\\s*' + move_element + '\\s*|' + recursive_annotation_variation + ')*' + '\\s*' + last_move_element + '\\s*' + recursive_annotation_variation + '*)?\\s*' + game_termination; //Note that comments that appear after game-termination are considered part of the next game (even if there isn't one). var move_binary_section = '[\\s\\S]*'; //Unfortunately the binary regex will cause vgn_file.test() to always return true var vgn_game = tag_section + '(?:' + move_plain_text_section + '|' + move_binary_section + ')'; var vgn_file = new RegExp('^(?:' + vgn_game + ')*$', 'i');
No comments:
Post a Comment