Advent of Code 2020: Day 19 reflection
Advent of Code 2021 starts tonight! But before then, I have one more reflection from last year’s challenge to share.
This one is about the day 19 puzzle, which, to be perfectly honest, I did not particularly enjoy.
From the puzzle description:
Dear reader, the "a"
and "b"
from the example were not just an example.
In my puzzle input file, there were in fact only "a"
and "b"
character
rules, followed by an unwieldy number of “messages” that looked like lots of
ababbbab
and aabaaaba
and baaabbaa
.
I thought this was a bit too much.
My original approach
In my first attempt at this puzzle, I thought, “I know, I’ll use regular expressions.”
The input rules format seemed half way to regular expression syntax anyway,
so I thought, why not translate rule 0
into one giant regular expression and
run all the messages through it?
So the example rule set would turn into something like this:
rule 1: /a/
rule 3: /b/
rule 2: /(ab|ba)/
rule 0: /a(ab|ba)/
This strategy worked just fine for part 1 of the puzzle, but looking back, I will concede that it was unnecessarily complicated.
(It would probably have been better even to build a set of all possible valid messages, and use that to query whether each message in the input file was present in the set of valid messages.)
But when it came to part 2…
The new rule 8
can still be expressed as a regular expression: it’s just “1
or more” copies of rule 42
(we can use the regex +
symbol for this).
It’s rule 11
that throws a monkey wrench into the regular expression
strategy. In words, we could describe this rule as “one or more matches of rule
42
, followed by an equal number of matches of rule 31
”. Sadly this “equal
number of matches” can’t be expressed as a regular expression.
In fact, this is literally the first non-example of a regular language in the wikipedia article:
I still hacked together a solution out of my existing work for part 1, but it relied on a couple important details:
- Neither rule
42
nor rule31
(or any rules referenced by them) contained any reference to rule8
or11
. - As it turned out, all of the possible matches for both rule
42
and rule31
had the same length (8 characters).
The other crucial detail is that the actual rule 0
(at least in my puzzle
input) was this: 0: 8 11
.
Combining rules 0
, 8
, and 11
then, essentially what I was looking for was
“two or more matches of rule 42
, followed by some smaller (but non-zero)
number of matches of rule 31
”.
Putting all this together, it meant I could examine a message in chunks
of 8 characters. If a message wasn’t a multiple of 8 characters, I could
discard it right away; otherwise, I could see how many chunks (starting from
the beginning) would match rule 42
, and how many chunks (now starting from
the end) would match rule 31
. Let’s say the whole message was \( n \)
chunks in length. If we had \( x \) chunks matching rule 42
, and \( y \)
chunks matching rule 31
, then as long as there was some \( k \) that could
satisfy all of
this would indicate a valid match.
Like I said, I eventually got this to work, after a fair bit of trial and error.
But this all left me wondering, what’s the “right way” to do something like this, in the general case? Is there some existing parsing library that’s already solved this problem for us?
GLL parsing to the rescue
I recently came across the packrattle
parsing library, which can do just this! (I’m sure there are others that
would work too, but this was one in a language I felt reasonably confident
that I could use.)
The underlying parsing algorithm is fairly magical to me, but the library is simple to use.
The core of my packrattle
-based solution looks like this:
const packrattle = require('packrattle');
function parseRule(p, line) {
var [id, rule] = line.split(': ');
id = parseInt(id);
if (rule[0] === '"') {
p[id] = packrattle.string(rule[1]);
} else {
p[id] = packrattle.alt(...rule.split(' | ').map(
(e) => packrattle.seq(...e.split(' ').map(
(i) => {
i = parseInt(i);
return () => p[i];
}))));
}
}
This function can be used to build up an object p
of rule parsers (keyed
by the rule ID number), using these packrattle
functions:
string()
matches a fixed input string (in our case, either"a"
or"b"
)alt()
matches any one of a list of alternativesseq()
matches a list of things in sequence
The rest of the solution reads from standard input, parsing rules until it
reaches a blank line, and then switches to feeding input lines to the final
p[0]
parser, keeping a running count of valid matches.
This certainly works, but can we go a little meta? Why are we messing around
with split()
to parse the input rules? Why not construct a parser that
itself returns parsers?
Behold:
const ruleParser = (function() {
const ruleId = packrattle.regex(/\d+/).map(m => parseInt(m[0]));
const character = packrattle.regex(/"(.)"/).map(m => m[1]);
const ruleSeq = packrattle.repeatSeparated(ruleId, ' ');
const ruleAlts = packrattle.repeatSeparated(ruleSeq, ' | ');
const ruleDelim = packrattle.string(': ').drop();
return packrattle.seq(ruleId, ruleDelim, packrattle.alt(character, ruleAlts));
})();
function parseRule(p, line) {
var [id, rule] = ruleParser.run(line);
if (typeof rule === 'string') {
p[id] = packrattle.string(rule);
} else {
p[id] = packrattle.alt(...rule.map(
ids => packrattle.seq(...ids.map(
id => () => p[id]))));
}
}
Is this any more readable that the last version? Not really, no.
But this was a fun exercise in using a parsing library!