How do I search for a character in any order (12 letters of which 6 must be a word) with PHP?

I've been thinking about this all day and can't seem to figure out the memory efficiency and the fast way. The problem is this:

for example i have the following letters: efjlnrrttuwx (12 letters)

I'm looking for this word TURTLE (6 letters)

How to find all possible words in a full range (12 words) using php? (Or with python if it could be much easier?)

Things I've tried:

  • Using permutations: I made all strings possible using a permutation algorithm by putting them in an array (only 6 characters long) and do in_array to check if it is one of the words in my array with valid words (in this case, containing a TURTLE. but sometimes two or three words). This computation costs a lot of memory and time, especially with 6+ characters to get the permutations.

  • creating a regex (I'm bad at this). I wanted to create a regex to check if 6 of 12 (input) characters are in a word from a "valid array". the problem is that we don't know which letter out of 12 will be the original position and the position of the other words.

An example of this would be: http://drawsomethingwords.net/

I hope you can help me with this problem as I would really like to fix it. Thanks for all your time :)

+3


source to share


4 answers


I am having similar problems when writing a crossword editor (eg find all words of length 5 with "B" in the second position). It basically boils down to:

  • Process the word list and order words by length (i.e. a list of all words of length 2, length 3, length 4, etc.). The reason is that you often know the length of the word (s) you want to search for. If you want to search for words of unknown length, you can repeat the search again for a different list of words.
  • Insert each separate word list into a tertiary search tree , which greatly speeds up word searches. Each node in the tree contains a character, and you can go down to the tree to search for words. There are also specialized data structures such as trie , but I haven't looked into it (yet).

Now for your problem, you can use the search tree to write the search function like

function findWords($tree, $letters) {
   // ...
}

      



where tree

is a search tree containing words of the length you want to find and letters

is a list of valid characters. Your example letters

would be a string efjlnrrttuwx

.

The search tree allows you to search for words, one character at a time, and you can keep track of the characters you come across. As long as these characters are in the list of valid letters, you continue searching. When you came across a leaf node in the search tree, you found an existing word that you can add to the result. If you come across a character that is not in letters

(or has already been used), you can skip that word and continue searching elsewhere in the search tree.

My Palabra crossword editor contains the implementation of the above steps (some are done in Python, but mostly in C). It works fast enough for the Ubuntu default wordlist of roughly 70k words.

+2


source


There are probably better ways, but this is just from the head:

I am assuming that you have a database of words (like a dictionary). Add az fields to your database table. Write a script that sums the count of each letter in a word and writes them to az fields as an integer. I.E. for a cylinder, the table will look like this:

id    name       a    b  ...  l  ...  n  ...  o
1     balloon    1    1       2  ...  1  ...  2

      



Then, when the user enters a word, you calculate how many of each character are in that word and match that using the database.

// User enters 'zqlamonrlob'
// You count the letters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 1 0 0 0 0 0 0 0 0 0 2 1 1 2 0 1 1 0 0 0 0 0 0 0 1

// Query the database
$sql = "SELECT `name` FROM `my_table` WHERE `a` <= {$count['a'] AND `b` <= {$count['b'] ...}";

      

This will give you a list of words that use some or all of the letters entered by the user.

+1


source


Here's a regex, just to show it, can (but doesn't have to) be done:

preg_match('/^(?:t()|u()|r()|t()|l()|e()|.)+$\1\2\3\4\5\6/i', 'efjlnrrttuwx')

      

corresponds.

How it works? Empty parentheses for parentheses always match if the previous letter matches. The backlinks at the end of the regex make sure that each of the characters participated in the match. Hence,

preg_match('/^(?:t()|u()|r()|t()|l()|e()|.)+$\1\2\3\4\5\6/i', 'efjlnrrtuwx')

      

(correctly) won't match because there is only one in the string t

, but the regex needs two different t

s.

The problem is that, of course, the regex engine has to check many permutations to arrive at this conclusion. While a successful match can be quite fast (175 steps for the regex engine in the first case), a failed match can be expensive (3816 steps for the second case).

+1


source


I think you need to approach this problem in the opposite direction.

Scroll through the list of words, checking words with the specified number of characters to see if the words are in the specified character set.

0


source







All Articles