Sunday, May 10, 2015

Chess Notation Updates (1.1)

Reading the specification for Portable Game Notation (PGN; see the previous blog post for more information) got me thinking about some things I hadn't considered when I originally wrote my chess notations. So now I'm updating them (the new versions of the notations are 1.1).

FCN and SFEN already have the symbol + for check and # for "the game is over", but now I'm adding a third symbol which is either +, #, or absent. The first 2 symbols keep their meaning. For the third symbol, + means a winner was determined but not because of a checkmate, and # means the game is a draw but not because of a stalemate. The new symbol allows +# to unambiguously mean checkmate. Examples of non-checkmate wins are disqualification, surrender, or running out of time. Examples of non-stalemate draws are agreement, an unwinnable game (eg insufficient material), 3 fold repetition, or the 50 move rule. Most games would not need to change their notation because all of these are rare outside of tournaments. The third symbol being either + or # aids memory: +# is the most common win and #+ is an uncommon one, # is stalemate and ## is another type of draw. The second symbol must be # for a third symbol to exist. So Ra1-a8+#+ means that a check occurred, the game is over, and a winner was determined without checkmate. This makes a total of 6 end game combinations (#, +#, #+, ##, +#+, +##) and 2 non-end game combinations (nothing or +), instead of 2 and 2. An unfinished game should not use ## since that would mean the game is over. Instead, in VGN use "*" (for non VGN just have no more moves).
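As a rough illustration, the trailing + and # symbols could be decoded like this. It is only a sketch of the rules in the paragraph above: the function name `decode_suffix` and the result strings are made up for this example and are not part of FCN or SFEN.

```python
import re

# Optional "+", then optional "#", then an optional third symbol
# ("+" or "#") that is only allowed after "#".
SUFFIX_RE = re.compile(r"(\+?)(?:(#)([+#]?))?")


def decode_suffix(move: str) -> dict:
    """Decode the trailing +/# symbols of a version 1.1 move string."""
    # Everything after the last character that is not "+" or "#".
    suffix = move[len(move.rstrip("+#")):]
    m = SUFFIX_RE.fullmatch(suffix)
    if m is None:
        raise ValueError(f"invalid termination suffix: {suffix!r}")

    check = m.group(1) == "+"
    game_over = m.group(2) == "#"
    third = m.group(3) or ""

    if not game_over:
        result = None                          # game continues
    elif third == "+":
        result = "win without checkmate"       # #+ or +#+
    elif third == "#":
        result = "draw other than stalemate"   # ## or +##
    elif check:
        result = "checkmate"                   # +#
    else:
        result = "stalemate"                   # #
    return {"check": check, "game_over": game_over, "result": result}


print(decode_suffix("Ra1-a8+#+"))
```

Something like "Ra1-a8++" raises a ValueError here, because a third symbol is only allowed after #.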

The BCFEN game termination marker (something used for VGN etc) is 0x88 (previously unused). Only 0x8 would be necessary, but it needs to fill a whole byte (even though it isn't divisible by board size). PGC uses 0x06, assuming you are at the top level in the game text section (ie not inside of a string, move number, etc). For BCCF the last bit (previously unused) will be 1 if the game is over and 0 if it isn't. Therefore the game termination symbol is a single bit and doesn't change the length of the input at all.

I'll make another change to BCCF. Previously promotion bits of 00 meant either promoted to a rook or no promotion. Thus, to determine if promotion was occurring, the parser needed to check if the destination was rank 1 or 8 (depending on color) and if the piece moving was a pawn. Now the penultimate bit (the bit right after the promotion bits) is 1 if a promotion occurred and 0 if it didn't. This makes parsing easier. So now all of the bits of BCCF are used.
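Here is a minimal sketch of how a parser might read the two new BCCF flag bits. It assumes that "last bit" means the least significant bit of the move value, with the 2 promotion bits sitting directly above the promotion flag. The names (`read_bccf_flags` and the mask constants) are hypothetical, and only the 00 = rook mapping is taken from this post.

```python
# Assumed layout of the low 4 bits of a BCCF move (an assumption, see above):
#   bits 3-2: promotion bits, bit 1: promotion flag, bit 0: game over flag
GAME_OVER_BIT = 0b0001
PROMOTION_FLAG_BIT = 0b0010
PROMOTION_PIECE_MASK = 0b1100


def read_bccf_flags(move_word: int) -> dict:
    """Read the version 1.1 flag bits without looking at rank or piece."""
    promoted = bool(move_word & PROMOTION_FLAG_BIT)
    piece_bits = (move_word & PROMOTION_PIECE_MASK) >> 2
    return {
        "game_over": bool(move_word & GAME_OVER_BIT),
        "promoted": promoted,
        # Only meaningful when promoted is True. 0b00 is a rook; the other
        # values come from the BCCF piece table, which is not repeated here.
        "promotion_bits": piece_bits if promoted else None,
    }


print(read_bccf_flags(0b0011))  # promotion to a rook that also ends the game
```

The point of the extra flag is that the parser no longer has to check the destination rank or the moving piece to know whether 00 means "rook" or "nothing".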

A note on size comparison. The PGN specification explained SAN better, and I was a bit off in my initial "Chess Notation" blog post. SAN's maximum length is 7 (eg Qa6xb7# and fxg1=Q+) with an average of slightly more than 3. PGC was poorly explained, but it sounded like the moves are variable length (thus slower to parse), it includes NAG and other unnecessary things (thus larger), each move is the index of that move among every possible SAN move sorted alphabetically (thus much more difficult than SAN), and each move takes either 2 or 3 bytes (thus is on average larger than BCCF). This all means PGC is inferior in pretty much every way to BCCF. If you want a compressed game, use BCCF and store any additional data elsewhere. Or just use VGN with MCN and zip the file or something, but there's no need to go through the pain of PGC.

For SFEN I decided that no one cares about en passant and therefore it is not required even if a double move occurs. If you would like to use "-" for en passant (to mark that there isn't one) then you must also have the castling information so that it is not ambiguous. The castling information is now also optional. Thus the new regex for SFEN looks like this:
/^(?:[KQBNRPkqbnrp1-8]{1,8}\/){7}[KQBNRPkqbnrp1-8]{1,8}(?: [WBwb])?(?: (?:-|K?Q?k?q?)(?: -| [a-hA-H][1-8])?)?(?: \+?(?:#[+#]?)?)?$/
and again the empty string is not valid castling information and it should not have trailing spaces. FCN's regex is now:
/^(?:P[A-H][1-8]-[A-H][1-8](?:EN|(?:X[QBNRP])?(?:=[QBNR])?)|[KQBNR][A-H][1-8]-[A-H][1-8](?:X[QBNRP])?|[KQ]C)\+?(?:#[+#]?)?$/i
the castling was moved to the end because otherwise Qc1-a1 would match as a queen's castle. FCN has a minimum length of 2 (eg KC) with a maximum of 13 (eg Pa7-B8xR=q+#+) with the average being more than 6.
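Both regexes can be tried out with a short script. The patterns below are copied from this post (only split across lines); the helper names `is_sfen` and `is_fcn` are made up for the example. Note that the SFEN regex by itself accepts an empty castling field and a trailing space, so this sketch checks the "no empty fields" rule separately.

```python
import re

SFEN_RE = re.compile(
    r"^(?:[KQBNRPkqbnrp1-8]{1,8}\/){7}[KQBNRPkqbnrp1-8]{1,8}"
    r"(?: [WBwb])?"
    r"(?: (?:-|K?Q?k?q?)(?: -| [a-hA-H][1-8])?)?"
    r"(?: \+?(?:#[+#]?)?)?$"
)

FCN_RE = re.compile(
    r"^(?:P[A-H][1-8]-[A-H][1-8](?:EN|(?:X[QBNRP])?(?:=[QBNR])?)"
    r"|[KQBNR][A-H][1-8]-[A-H][1-8](?:X[QBNRP])?"
    r"|[KQ]C)\+?(?:#[+#]?)?$",
    re.IGNORECASE,  # the /i flag on the original regex
)


def is_fcn(move: str) -> bool:
    return FCN_RE.fullmatch(move) is not None


def is_sfen(text: str) -> bool:
    # Splitting on single spaces exposes empty fields, which covers both
    # a trailing space and an empty castling field.
    return SFEN_RE.fullmatch(text) is not None and "" not in text.split(" ")


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
for candidate in (START, START + " w KQkq -", START + " "):
    print(repr(candidate), is_sfen(candidate))
for candidate in ("KC", "Qc1-a1", "Ra1-a8+#+", "Pa7-B8xR=q+#+", "Ra1-a8++"):
    print(candidate, len(candidate), is_fcn(candidate))
```

The FCN examples from this post (KC at length 2 and Pa7-B8xR=q+#+ at length 13) both match, and Qc1-a1 matches as an ordinary queen move.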

For the record, of my notations I just made changes to BCCF, FCN, SFEN, and BCFEN (they are all now version 1.1). My only notation that didn't change was MCN because there's nothing to change. It does exactly what it is intended to do perfectly: express a chess move with a notation as simple as possible, as short as possible, and in a human friendly way (in that order). (I also didn't change VGN because I just made it.)
