Wordle Solver

About

Wordle is a word puzzle game where a user attempts to guess a certain word based on hints from previous guesses. The official game was purchased by The New York Times and can be found here. According to the game page, the rules are as follows:

Green letter - Letter is in the word and in the correct spot
Yellow letter - Letter is in the word but wrong spot
Gray letter - Letter is not in the word in any spot

The program was written in Python with the help of the Pandas library, which allows for quick and efficient data manipulation using DataFrames. The videos demonstrate the program solving a variety of words with lengths from 4 to 11. Although the initial version of the program (version 1) was useful for determining the list of possible answers left, randomly choosing from that list was not going to be consistently efficient. As a result, a final version was written in which the program identifies the letters still to be eliminated and uses set operations to find a word that contains as many of those letters as possible. The Wordle game site used in the demonstration can be found here.

The Google Colab page can be found here, and the source code and word files here.

The Unexpected Edge Case

It was not until I had finished writing version 0 of the program and started testing it that I realized there was an edge case. The gray letter's condition, "Letter is not in the word in any spot", can actually be misleading because the green and yellow conditions take precedence over it.

For example, when the answer is TRAIT and you guess NAVAL, the first A will be yellow and the second A will be gray. The presence of the second, gray A does not, however, mean that the letter A is absent from the answer.

Similarly, if you guess DRAMA, the first A will be green and the second A will be gray. Therefore, the presence of a gray letter means that it does not appear in the answer if and only if that letter does not appear green or yellow elsewhere in the guessed word.
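The precedence described above can be sketched in a few lines of Python. This is a minimal illustration and not the program's actual code: the function name score_guess and the G / Y / - encoding are assumptions made for this example, and yellows are assumed to be handed out left to right, as on the official site.

```python
from collections import Counter

def score_guess(guess: str, answer: str) -> str:
    """Return one character per letter: G (green), Y (yellow), or - (gray)."""
    guess, answer = guess.upper(), answer.upper()
    result = ["-"] * len(guess)
    # Letters of the answer that are not already claimed by a green.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    # First pass: greens take precedence over everything else.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    # Second pass: yellows are assigned left to right while copies remain.
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

print(score_guess("NAVAL", "TRAIT"))  # -Y---  (first A yellow, second A gray)
print(score_guess("DRAMA", "TRAIT"))  # -GG--  (first A green, second A gray)
```

Both guesses contain a gray A even though TRAIT contains an A, which is exactly the edge case.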

It may seem like the only useful information we can obtain from a gray letter that is present in a guess where it also appears as green or yellow is that the letter does not appear in that gray spot in the final answer. However, there is actually some valuable information we can gain from this edge case.

Exploiting the Edge Case

In the previous example, we saw that the first A in DRAMA was green while the second A was gray. Apart from knowing that the fifth letter in the final answer is not A, we can also deduce that the final answer will only have one A in it due to the presence of the gray A at the end.

In the absence of a gray copy of a letter, the most we can assume is that the final answer contains at least as many copies of that letter as appeared green or yellow. For example, when we guess TARTS and both Ts come back green or yellow, we know for sure that the final answer has at least two Ts, but we do not know the exact number. The same holds when we guess BUTTS.

In the presence of a gray copy, we can pin down the exact number of times that letter appears in the final answer. In addition to telling us where the letter is not, a gray copy also serves as an upper limit on the number of times the letter can appear in the answer. For example, because the third T is gray in the guess TUTTI while the first two are green or yellow, we can conclude that the final answer has exactly two Ts.
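The lower and upper limits can be read straight off the feedback. The sketch below is illustrative only: letter_count_bounds and the G / Y / - feedback strings are hypothetical names and encodings, and the feedback values in the usage lines are made up to match the TUTTI and TARTS cases.

```python
from collections import Counter

def letter_count_bounds(guess: str, feedback: str) -> dict:
    """Map each guessed letter to (minimum, maximum) copies in the answer.

    feedback uses G, Y, and - for green, yellow, and gray.
    A maximum of None means no upper limit is known yet.
    """
    guess = guess.upper()
    hits = Counter()   # green + yellow occurrences per letter
    has_gray = set()   # letters with at least one gray occurrence
    for letter, colour in zip(guess, feedback):
        if colour in ("G", "Y"):
            hits[letter] += 1
        else:
            has_gray.add(letter)
    bounds = {}
    for letter in set(guess):
        low = hits[letter]
        high = low if letter in has_gray else None
        bounds[letter] = (low, high)
    return bounds

# Third T gray: exactly two Ts.
print(letter_count_bounds("TUTTI", "G-G--")["T"])  # (2, 2)
# No gray T: at least two Ts, upper limit unknown.
print(letter_count_bounds("TARTS", "Y--Y-")["T"])  # (2, None)
```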

Determining the conditions

The conditions are then evaluated as follows:

1. All the Green letters are looked at and condition 1 is applied.

Condition 1: The word must contain that letter in that spot

2. All the Yellow letters are looked at and condition 2 is applied.

Condition 2: The word must contain that letter and the letter does not appear in that spot

3. All the Gray letters are looked at and either condition 3A or condition 3B is applied.

If that letter did not appear Yellow or Green in the feedback, condition 3A is applied.

Condition 3A: The word does not contain that letter.

Otherwise, if that letter appeared Yellow or Green in the feedback, condition 3B is applied.

Condition 3B: The word does not contain that letter in that spot.

4. Iterate through the set of green and yellow letters and either condition 4A or condition 4B is applied.
*A set is used because each unique letter only needs to be checked once.

If that letter did not appear gray anywhere in the feedback, condition 4A is applied.

Condition 4A: The number of times that letter appears in the answer is at least the number of times it appeared Green + Yellow.

If that letter appeared gray anywhere in the feedback, condition 4B is applied.

Condition 4B: The number of times that letter appears in the answer is exactly the number of times it appeared Green + Yellow.

5. Logical AND all the accumulated conditions together
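The five steps can be sketched as a single predicate. The original program builds these conditions with Pandas; the version below uses plain Python so that it runs without dependencies, and the function name fits, the G / Y / - feedback encoding, and the sample word list are all assumptions made for illustration.

```python
from collections import Counter

def fits(word: str, guess: str, feedback: str) -> bool:
    """True if word satisfies conditions 1 to 4B for one guess and its feedback."""
    word, guess = word.upper(), guess.upper()
    # Green + yellow occurrences per letter, and letters that appeared gray.
    hits = Counter(g for g, c in zip(guess, feedback) if c in ("G", "Y"))
    grays = {g for g, c in zip(guess, feedback) if c == "-"}
    checks = []
    for i, (g, c) in enumerate(zip(guess, feedback)):
        if c == "G":
            checks.append(word[i] == g)                # condition 1
        elif c == "Y":
            checks.append(g in word and word[i] != g)  # condition 2
        elif hits[g] == 0:
            checks.append(g not in word)               # condition 3A
        else:
            checks.append(word[i] != g)                # condition 3B
    for letter, n in hits.items():
        if letter in grays:
            checks.append(word.count(letter) == n)     # condition 4B
        else:
            checks.append(word.count(letter) >= n)     # condition 4A
    return all(checks)                                 # step 5: logical AND

candidates = ["TRAIT", "TRACT", "DRAMA", "GRAIN", "KRAAL"]  # made-up sample list
print([w for w in candidates if fits(w, "DRAMA", "-GG--")])
# ['TRAIT', 'TRACT', 'GRAIN']
```

KRAAL passes conditions 1 to 3B but is rejected by condition 4B because it contains two As, which shows why the letter-count step is worth having.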

How well the Program does so far

There are about 2300 total answers in the New York Times Wordle game and nearly 13000 valid words that can be entered as guesses. The program went through each of the 2300 answers, starting each time with a guess selected at random from the list of 13000 valid words. It was assumed that the answer could be any of the 13000 valid guesses, to see how well the program does without overfitting to the answer list. In addition, we may not always know the list of answers if we run this on a site other than the New York Times.

The conditions were then evaluated based on the feedback color codes. The program continued to randomly select from the list of possible answers that fit the previously accumulated conditions. This process was repeated until it arrived at the correct answer.
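The loop described above can be sketched as follows. This is a simplified stand-in rather than the author's code: solve_randomly and score_guess are hypothetical names, the word list is a toy sample, and the filter keeps words that would have produced the same feedback, which is assumed here to be equivalent to applying the accumulated conditions. The answer is assumed to be in the word list.

```python
import random
from collections import Counter

def score_guess(guess, answer):
    """Feedback string: G (green), Y (yellow), - (gray)."""
    result = ["-"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def solve_randomly(answer, words, rng=random):
    """Guess at random among the words consistent with all feedback so far."""
    possible = list(words)
    history = []
    while True:
        guess = rng.choice(possible)
        history.append(guess)
        if guess == answer:
            return history
        pattern = score_guess(guess, answer)
        # Keep only the words that would have produced the same feedback.
        possible = [w for w in possible if score_guess(guess, w) == pattern]

words = ["maker", "taker", "baker", "faker", "mater", "poker"]  # toy list
history = solve_randomly("poker", words, random.Random(0))
print(len(history), history)
```

Averaging len(history) over every answer gives the kind of figure reported below.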

The average number of guesses required to arrive at the correct answer was about 4.888.

A demonstration video and additional detailed information including the word guess history, chances of winning and number of possible answers on each guess round and bits of information obtained can be found here.

Determining the Best Starter Word

First of all, we have to define what we mean by the best starter word. In this case, the best starter word is the word that, on average, gives us the most information or leaves us with the fewest possible words to choose from. To illustrate this, the left column in the matching diagram holds the list of valid words that can be used as guesses. From Aahed to Zymic, the Wordle game accepts 12953 words as valid guesses. The right column holds the list of Wordle answers used by the New York Times, which amounts to about 2310 words. Each of the 12953 possible guesses is tried as a guess against each of the 2310 answers, and the average amount of information and the average number of remaining possibilities are calculated for each of the 12953 words.

For example, when Aahed is used as a guess and the answer is Aback, we get aahed with the first a green, the second a yellow, and h, e, d gray. The number of words that match the criteria based on the color feedback is 139, so we went from 12539 possible guesses down to 139. This gives us about 6.495 bits of information, since the number of possible words was halved about 6.495 times.

If we use Aahed when the answer is Abase, we get aahed with the first a green, the second a yellow, e yellow, and h, d gray. The number of words that match the criteria based on the color feedback is 38. This gives us about 8.366 bits of information.

If we use Aahed when the answer is Zesty, we get aahed with e yellow and every other letter gray. The number of words that match the criteria based on the color feedback is 1634. This gives us about 2.940 bits of information.

If we use Aahed when the answer is Zonal, we get aahed with the first a yellow and every other letter gray. The number of words that match the criteria based on the color feedback is 1126. This gives us about 3.477 bits of information.
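The bit counts in these examples are the base-2 logarithm of the ratio between the number of candidates before and after the guess. A short check, using only the counts quoted above (the helper name bits_gained is made up for this example):

```python
from math import log2

def bits_gained(before: int, after: int) -> float:
    """Number of times the candidate list was halved going from before to after."""
    return log2(before / after)

# Remaining-word counts quoted in the examples, starting from 12539 candidates.
examples = [("aback", 139), ("abase", 38), ("zesty", 1634), ("zonal", 1126)]
for answer, remaining in examples:
    print(answer, remaining, round(bits_gained(12539, remaining), 3))
```

The fewer words that survive a guess, the more bits that guess was worth.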

Aahed is tried against all 2310 answers. The average bits of information and the average number of words left that match the criteria are then computed to evaluate how good it is. For Aahed, the average was 4.468 bits of information and about 1048.32 words left. The same evaluation is carried out for all 12953 words. With nearly 30 million iterations, this program took a very long time to run, but the results seem to be worth it in the end. While the average number of bits and the average number of possible words left yielded different results (which is not surprising given that the logarithm is non-linear), it was quite clear that Soare was one of the best openers, as it yielded an average of 6.245 bits of information and left an average of 273.92 words to choose from. A summary of the results can be found below. The code and full results can be found here.

I also learned a lot about information theory from Grant Sanderson's video (3Blue1Brown). If you want to learn more about how this works, be sure to check out his video. Because our program does not overfit to the Wordle answer list, the results for the best starter words may differ slightly.
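The evaluation of a single starter word can be sketched as below. This is an assumption-laden illustration, not the code behind the reported numbers: evaluate_starter and score_guess are hypothetical names, the word lists are toy samples, and instead of applying the conditions one by one it buckets the valid words by the feedback they would produce, which is assumed to give the same counts. Every answer is assumed to be in the valid-word list.

```python
from collections import Counter
from math import log2

def score_guess(guess, answer):
    """Feedback string: G (green), Y (yellow), - (gray)."""
    result = ["-"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def evaluate_starter(guess, answers, valid_words):
    """Average bits gained and average words left when guess is played first."""
    # Count how many valid words share each possible feedback pattern.
    buckets = Counter(score_guess(guess, w) for w in valid_words)
    total_bits = 0.0
    total_left = 0
    for answer in answers:
        left = buckets[score_guess(guess, answer)]
        total_bits += log2(len(valid_words) / left)
        total_left += left
    return total_bits / len(answers), total_left / len(answers)

valid_words = ["maker", "taker", "baker", "poker"]  # toy guess list
answers = ["maker", "poker"]                        # toy answer list
print(evaluate_starter("taker", answers, valid_words))  # (1.5, 1.5)
```

Running this for every valid word and sorting by either average gives a ranking of openers.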

Making the most out of each guess

Perhaps we can gain some insight into how to reduce the number of guesses by observing the guess history for words that take an especially large number of guesses.

For example, we might end up with a list of possibilities like this:

maker
taker
baker
faker
mater
poker

Rather than randomly select from the list of possible answers, it might be a better idea to guess a word like ambit. This method will be more efficient in shrinking the list of possible answers.

If we get a yellow a, then we can eliminate poker. Otherwise, if we get a gray a, we can conclude the answer is poker.

If we get a yellow m, then we know the answer is either maker or mater. Otherwise, if we get a gray m, we can eliminate maker and mater.

If we get a yellow b, we know the answer is baker. Otherwise, if we get a gray b, we can eliminate baker.

If we get a yellow t, we know the answer is either taker or mater. Otherwise, if we get a gray t, we can eliminate taker and mater.

With this method implemented and the word tares used as the opening guess each time, the average number of guesses required to arrive at the correct answer was about 4.502.
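One way to pick a word like ambit with set operations is sketched below. It is a guess at the general idea rather than the program's actual logic: best_eliminator is a hypothetical name, the "undecided" letters are assumed to be those that appear in some remaining candidates but not in all of them, and the vocabulary is a small stand-in for the full guess list.

```python
def best_eliminator(candidates, vocabulary):
    """Pick the vocabulary word that covers the most still-undecided letters."""
    letter_sets = [set(word) for word in candidates]
    shared = set.intersection(*letter_sets)       # letters every candidate has
    undecided = set.union(*letter_sets) - shared  # letters that tell candidates apart
    # max returns the first best word, so vocabulary order breaks ties.
    return max(vocabulary, key=lambda word: len(set(word) & undecided))

candidates = ["maker", "taker", "baker", "faker", "mater", "poker"]
vocabulary = candidates + ["ambit", "soare"]  # stand-in for the full guess list
print(best_eliminator(candidates, vocabulary))  # ambit
```

Here e and r are shared by every candidate, so they carry no information; ambit tests four of the undecided letters (a, m, b, t), while each candidate on the list tests only three.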

A demonstration video and additional detailed information including the word guess history, chances of winning and number of possible answers on each guess round and bits of information obtained can be found here.

If the program knew the word bank

Up to this point the program had assumed that the answers could be any of the 13000 words. Using the most recent elimination method, if the program knew that the answers could only be from the list of about 2310 words, the average number of guesses decreases to 3.899 with tares used as the starting guess each time.

Perhaps we could achieve an even lower average if we tried every single possibility and found, on each attempt, the single word that would leave the absolute fewest possible answers. However, since this program is written to work for words of any length and we may not always know the answer list, this will be a challenge.

The source code and DataFrames containing the recorded info can be found here.