How to search for a string and similar words in a text?
I need to search for the word "age" and similar words in a text file.
I have the following sentences:
- 18 years
- man aged 51
- the age of a person is between 25 and 50 years old.
- from 5 to 75 years old (with a dot)
- from 5 to 75 years old (with a comma)
- agent name - xyz (agent contains age).
String.contains returns true in every case. I need the first five cases to match and the last one to return false.
I could solve this by writing code that checks against a bunch of strings such as "age", "age.", "Age", and so on.
Is there a better way to solve this problem?
A naive (and expensive) solution would be as follows:
- tokenize each line (for example, splitting on " " or on any non-alphanumeric character, which also removes punctuation)
- calculate the edit distance from each word to the word age
- if the current word has a small edit distance (for example, below 2), return the line
The edit distance between two strings is the number of edits (insertions, deletions, and substitutions) required to turn one string into the other. You can find an implementation of edit distance in the simmetrics library, or perhaps elsewhere.
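The steps above can be sketched in plain Java as follows. This is only a minimal illustration, not the simmetrics API: the class name `FuzzyAgeSearch` and the methods `editDistance` and `lineMatches` are hypothetical names chosen for this example, and the threshold of 1 is an assumption taken from "below 2".

```java
import java.util.Locale;

class FuzzyAgeSearch {

    // Levenshtein distance via dynamic programming, keeping only two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if any token of the line is within maxDistance edits of target.
    static boolean lineMatches(String line, String target, int maxDistance) {
        // Splitting on non-alphanumeric runs also strips punctuation.
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty() && editDistance(token, target) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] lines = {
            "man aged 51",
            "the age of a person is between 25 and 50 years old.",
            "agent name - xyz"
        };
        for (String line : lines) {
            System.out.println(lineMatches(line, "age", 1) + " <- " + line);
        }
    }
}
```

With a threshold of 1, "aged" (distance 1) is accepted and "agent" (distance 2) is rejected. Unrelated short words such as "are" or "ago" are also at distance 1 from "age", so a fuzzy match like this will produce some false positives.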
Another option might be to stem the words in step 2 and then use contains with the stem of the word age (also expensive).
If you already know all of the acceptable answers (or at least a sample of them), go with Avinash Raj's answer.
What you need is called a regular expression (or regex).
Here is a detailed definition of regex and its usage in Java, where you can apply one through the matches(String regex) method of the String class.
For your example, it could (roughly) be: myString.matches(".*age.? .*").
Mind the escaping of special characters in Java. You can try your regular expressions here. I didn't do that for the example above, but you can try :)
More details:
- .*: the sentence can start with anything
- age: the sentence must contain "age"
- .?: "age" can be followed by zero or one character
- a space: then comes a space
- .*: then anything again (matches has to cover the whole string, hence the .* on both ends)
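A possible variation on this idea, not the exact expression from the answer: anchor "age" at a word boundary so that words such as "manage" or "agent" are rejected, and use find() so no surrounding .* is needed. The class name `AgeRegex`, the method `mentionsAge`, and the choice to allow exactly one optional trailing letter are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

class AgeRegex {

    // \b      word boundary, so "age" must start a word
    // [a-z]?  at most one extra letter ("aged", "ages")
    // \b      the word must end there, which rules out "agent"
    private static final Pattern AGE =
            Pattern.compile("\\bage[a-z]?\\b", Pattern.CASE_INSENSITIVE);

    // find() looks for the pattern anywhere in the line.
    static boolean mentionsAge(String line) {
        return AGE.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "man aged 51",
            "the age of a person is between 25 and 50 years old.",
            "agent name - xyz"
        };
        for (String line : lines) {
            System.out.println(mentionsAge(line) + " <- " + line);
        }
    }
}
```

Note the doubled backslashes: inside a Java string literal, \b has to be written as \\b to reach the regex engine. Punctuation right after the word ("age." or "age,") still matches, because a word boundary sits between the last letter and the punctuation mark.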
Hope it helped.