Whiz had a great post analyzing letter frequencies and pulling out some awesome seed words to boost your Quordle scores. You’ll get no such help from me. Not only am I not qualified to give such advice (considering I flirt with the Tundra line most days), but I also choose different seed words every day as they pop into my head.
No, this article will not be useful for you. It won’t be insightful. It’s a glorified science fair project, and the presentation quality will be on par with that.
I built a Python script that I affectionately call “QuordleBot”. It takes in a solution and the wordlist Whiz used, then plays Quordle with the same information you and I have. It’s a test of strategy and technique. Yes, it can do things that we can’t (like count how many word possibilities there are for each quadrant), but it’s also limited in ways we aren’t.
[Insert Hype’s NERDS sign here]
QuordleBot outputs some interesting information after each guess. Under the hood, it keeps separate wordlists for each quadrant and synthesizes them into a master wordlist. After each guess, it displays how many words were removed from each quadrant’s wordlist and how many are left. You can start to see interesting patterns as you run QuordleBot a few times. The first seed word is very important, for example.
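For the curious, that bookkeeping boils down to something like this. The names and toy lists here are mine, not QB’s actual internals:

```python
def master_wordlist(quadrants):
    """Combine the per-quadrant candidate lists into one deduplicated master list."""
    return sorted({w for words in quadrants.values() for w in words})

def report_after_guess(before, after):
    """Report how many words each quadrant's list lost and how many remain."""
    return [
        f"{q}: removed {len(before[q]) - len(after[q])}, {len(after[q])} left"
        for q in before
    ]

# Toy lists standing in for real quadrant wordlists
before = {"tl": ["crane", "crate", "trace"], "tr": ["slump", "plumb"]}
after = {"tl": ["crate"], "tr": ["slump", "plumb"]}

print(master_wordlist(before))  # -> ['crane', 'crate', 'plumb', 'slump', 'trace']
for line in report_after_guess(before, after):
    print(line)  # e.g. "tl: removed 2, 1 left"
```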
Here’s a representative run from QuordleBot for the May 24th Quordle:
One thing to note about this is that the results are non-deterministic. It doesn’t get the same score every time it is run. This is mostly due to the random first seedword. From there the choices are dictated mathematically. Oddly enough, my successive runs for this post landed on the same exact score from different seed words. I ran it again to show that the scores can be different.
Anyway, all of this is great, but the interesting part is where I get to play with the guessing algorithm. Version 0.1 was just to test out the infrastructure (making sure it would display correctly and the like). It was a pure random guess from the 5100-some words in the wordlist. I have seen a few quadruple chumps, which makes sense because you’re picking 9 times from a hat filled with 5100 words and hoping to land on all 4 answers. I haven’t sat in a stats class in over a decade, but I know that the likelihood of not chumping is somewhere between pigs flying and Swiss not narrowing his gaze.
Clearly, some improvements were needed, and the obvious ones came first. Version 1.0 was still random word selection, but QuordleBot kept track of independent word lists for each quadrant and removed words when they were no longer viable. It then combined those lists into a single master wordlist and randomly chose from that list. This was a massive improvement. QB was no longer a guaranteed 2 chumper, but it was still a pretty bad quordle player. The seedword guesses were hot garbage, often including only a single vowel and letter duplicates. Single and double chumps were routine, and, more alarmingly, there were situations where QB knew the solution but wasn’t able to solve it. If, for example, bottom left had one word left in its wordlist, QB dutifully added that one word to the master wordlist and randomly selected a word from that list.
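That pruning step looks roughly like this. Fair warning: this is a simplified sketch that ignores some duplicate-letter edge cases a real checker has to handle (“g” = green, “y” = yellow, “x” = gray):

```python
def consistent(word, guess, colors):
    """Could `word` still be the answer, given this guess and its colors?
    Simplified: does not fully handle repeated letters in the guess."""
    for i, (g, c) in enumerate(zip(guess, colors)):
        if c == "g" and word[i] != g:
            return False  # green demands an exact match at this position
        if c == "y" and (word[i] == g or g not in word):
            return False  # yellow: letter is in the word, but not here
        if c == "x" and g in word:
            return False  # gray: letter shouldn't appear at all
    return True

def prune(wordlist, guess, colors):
    """Drop every word that is no longer viable after this guess."""
    return [w for w in wordlist if consistent(w, guess, colors)]

print(prune(["crabs", "crams", "trace"], "crane", "gggxx"))  # -> ['crabs', 'crams']
```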
Version 1.1 improved the seedword problem. I had QB generate a seedword list of all words with certain characteristics. As of now, those characteristics are (1) no duplicate letters; (2) at least 2 vowels; and (3) vowels plus common consonants (tslnhrcp) must make up at least 4 letters of the word. This is probably a bit overbroad as it covers 40% of the total wordlist (you can see that there are 2160 seedwords according to the screenshot above), but I haven’t bothered optimizing the seedword list yet. Anyway, in version 1.1, QB randomly chose from the seedword list for guesses 1 and 2 before expanding to the whole wordlist for guess 3.
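The filter itself is simple. Here’s a sketch of the three rules above (the rules are from the post; the helper names are mine):

```python
VOWELS = set("aeiou")
COMMON = VOWELS | set("tslnhrcp")  # vowels plus the common consonants

def is_seedword(word):
    if len(set(word)) != len(word):                   # (1) no duplicate letters
        return False
    if sum(ch in VOWELS for ch in word) < 2:          # (2) at least 2 vowels
        return False
    return sum(ch in COMMON for ch in word) >= 4      # (3) 4+ common letters

wordlist = ["arise", "crane", "fuzzy", "ghost", "lymph"]
print([w for w in wordlist if is_seedword(w)])  # -> ['arise', 'crane']
```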
Version 1.2 tackled the “unable to solve” issue. I established a threshold (that can be tuned) where QB can override the random selection from the master list (or seedlist) to instead randomly select from a certain quadrant’s list. You’ll see that behavior in most of the guesses above where it says “chosen from tr [top-right] wordlist” or similar. The threshold can actually be tuned separately for a few different conditions. For guess 2, the threshold is much lower than for guesses 3+ so that we only override the default behavior if there’s a high likelihood of getting a word in 2. At this point, QB was an average quordler. Most scores were at or slightly above the Tundra line, but the teens were popping up on occasion and chumping had become blessedly uncommon. However, I did notice a glaring issue. When I’m using seed words in my first two guesses, I make an attempt to have them be as different as possible. QB wasn’t doing that. It was just randomly pulling the next word from the seedlist.
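The override is roughly this. The threshold numbers here are made up for illustration, not QB’s actual tuned values:

```python
import random

THRESHOLDS = {2: 1}      # guess number -> max quadrant size that triggers an override
DEFAULT_THRESHOLD = 3    # looser threshold for guesses 3+

def choose_guess(guess_num, quadrants, default_pool):
    """Override the default random pick if some quadrant's list is small enough."""
    limit = THRESHOLDS.get(guess_num, DEFAULT_THRESHOLD)
    for name, words in quadrants.items():
        if 0 < len(words) <= limit:
            return random.choice(words), f"chosen from {name} wordlist"
    return random.choice(default_pool), "chosen from default pool"

print(choose_guess(2, {"tl": ["crate", "trace"], "tr": ["slump"]}, ["arise", "louts"]))
# -> ('slump', 'chosen from tr wordlist')
```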
Version 1.3 experimented with generating a smaller seedlist after the first guess containing only words that had no overlap with the first guess. This was not successful, and I backed it off to a smaller seedlist containing only words that had no vowel overlap with the first guess. It took me a while to figure out why I was having trouble with this code: quite often, the first guess eliminates so many words from the seedlist that there are none left that don’t overlap with the first guess. The solution was simple: cascade from the no-duplicate-letters seedlist to the no-duplicate-vowels seedlist to the full seedlist to the full wordlist. If one is empty, move on to the next. You can see in guess 2 where QB outputs how big those first two lists are. They’re the same size in this example, but that’s only sometimes the case. At this point QB was a fairly decent quordler, regularly scoring in the upper teens and low 20s. Now it was time for the big experiment…
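The cascade amounts to “take the first non-empty list.” A sketch, with names of my own invention:

```python
def no_overlap(seedlist, first_guess, letters=None):
    """Seedwords sharing none of the given letters with the first guess.
    With letters=None, ban every letter of the first guess; pass "aeiou"
    to ban only the vowels it used."""
    banned = set(first_guess) if letters is None else set(first_guess) & set(letters)
    return [w for w in seedlist if not (set(w) & banned)]

def pick_pool(no_letter_overlap, no_vowel_overlap, seedlist, wordlist):
    """Return the first non-empty pool, strictest first."""
    for pool in (no_letter_overlap, no_vowel_overlap, seedlist, wordlist):
        if pool:
            return pool
    return []

seeds = ["louts", "crane", "moist"]
print(no_overlap(seeds, "crane"))           # -> ['louts', 'moist']
print(no_overlap(seeds, "crane", "aeiou"))  # -> ['louts', 'moist']
print(pick_pool([], [], ["arise"], ["arise", "fuzzy"]))  # -> ['arise']
```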
Version 2.0 is the current version and it experiments with what I call “impact”. Impact is a measure of how many words a potential guess eliminates from the wordlist. Ah, but trashy, you idiot… unless you were to know the solution ahead of time, you’d have no clue how many words are eliminated by that guess. You must be cheating with this impact thing!
Well, no. Err, at least, not exactly. QB does something humans couldn’t possibly do as it would take us ages. Here’s how QB measures the impact of every single word in the wordlist:
(1) QB iterates through every possible (and many impossible) ordering of green, yellow, and gray squares for each potential guess word.
(2) For each ordering of colors, QB scans the entire wordlist and counts how many words would be eliminated if that was the result of the guess.
(3) The number of words eliminated is summed across all of the orderings and is used as the impact score.
(4) QB selects the word with the highest impact score to play as the guess.
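The four steps above can be sketched like this. It reuses a simplified consistency checker (it skips duplicate-letter subtleties), so treat it as a sketch of the idea rather than QB’s actual code:

```python
from itertools import product

def consistent(word, guess, colors):
    """Simplified viability check for one color pattern ('g'/'y'/'x')."""
    for i, (g, c) in enumerate(zip(guess, colors)):
        if c == "g" and word[i] != g:
            return False
        if c == "y" and (word[i] == g or g not in word):
            return False
        if c == "x" and g in word:
            return False
    return True

def impact(guess, wordlist):
    """Total eliminations summed over all 3^5 color patterns."""
    total = 0
    for colors in product("gyx", repeat=5):                      # (1) every ordering
        total += sum(not consistent(w, guess, colors)            # (2) count eliminations
                     for w in wordlist)
    return total                                                 # (3) summed impact

def best_guess(candidates, wordlist):
    return max(candidates, key=lambda g: impact(g, wordlist))    # (4) highest impact wins
```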
This process has replaced random searching in every single decision branch except one: guess 1 is still random, due to performance limitations. Even guess 2 is rather slow (on the order of a minute) when the first guess wasn’t particularly good. If, for example, there were 200 non-duplicate seedwords left after guess 1 and 1000 total words left in the wordlist, impact-scoring all of the non-duplicate seedwords means running the inner loop 200 (non-dup seeds) × 3^5 (color orderings) × 5 (letters compared) × 1000 (words compared against) = 243,000,000 times, which takes a bit of time. By the time you get to guess 3, the lists are small enough that it usually takes fractions of a second to calculate the impact scores (of course, right after writing this, I did a run for fun and it hung on guess 3 for reasons not worth explaining).
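For what it’s worth, the back-of-envelope arithmetic checks out:

```python
# Inner-loop count for the worst-case guess 2 scenario described above
seeds, patterns, letters, words = 200, 3 ** 5, 5, 1000
print(seeds * patterns * letters * words)  # -> 243000000
```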
The impact score method has reduced the number of runs where QB struggles to sneak in the last word on guess 9 or ends up chumping, but I’m not sure that it helps with getting an earlier start on solving. We humans tend to be able to rattle off 3 or 4 in a row as the letters start to fall into place. QB gets stuck guessing weird, esoteric words that are probably not in the quordle list, and the result is a whole lot of 3 -> 5 -> 8 -> 9 scores.
Next steps for me are to take this in a few directions. First, I want to optimize the algorithm for calculating impact by removing the impossible (e.g. 4 greens and a yellow) and irrelevant (e.g. 5 greens) cases from the calculations. Second, I want to introduce some randomness back into the process by having QB do a weighted selection between the top X guess candidates by impact. Finally, I want to find a way to introduce the commonness of the word into the equation. My intuition has been that the more common word is the right answer most of the time, and I want to give QB that same intuition.
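Neither of the first two tweaks exists in QB yet, but here’s my rough picture of how they might look:

```python
import random
from itertools import product

def useful_patterns():
    """All 3^5 color patterns minus the impossible 4-greens-plus-a-yellow
    cases and the irrelevant all-green case."""
    for colors in product("gyx", repeat=5):
        if colors.count("g") == 5:                    # already solved: irrelevant
            continue
        if colors.count("g") == 4 and "y" in colors:  # can't happen: impossible
            continue
        yield colors

def weighted_pick(scored, top_x=3):
    """Weighted random choice among the top_x candidates by impact score."""
    best = sorted(scored, key=lambda t: t[1], reverse=True)[:top_x]
    words, weights = zip(*best)
    return random.choices(words, weights=weights, k=1)[0]

print(len(list(useful_patterns())))  # -> 237 (243 minus 1 all-green minus 5 impossible)
```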
If any of you [insert Hype’s NERDS sign here, too] want to tinker with the code, you can access it here:
Feel free to fork it or request edit permissions and add a branch to the project.