Splitting a string into all possible 4-letter sequential phrases

What I am trying to do is basically this:

  • Read the file;
  • Remove all punctuation marks and convert all letters to lowercase;
  • Convert words to 4-letter phrases (if a word is shorter than 4 characters, take it as a whole).

Example:

Input: Hello, my identification is Mister Dude.

Result: hell, ello, my, iden, dent, enti, ntif, tifi, ific, fica, icat, cati, atio, tion, is, mist, iste, ster, dude.

It would be nice if I could get each 4-letter phrase as a separate value in an array.

This is all I have managed so far:

public String[] OpenFile() throws IOException {
    FileReader fr = new FileReader(path);
    BufferedReader textReader = new BufferedReader(fr);
    int numberOfLines = readLines();
    String[] textData = new String[numberOfLines];
    int i;

    for (i = 0; i < numberOfLines; i++) {
        textData[i] = textReader.readLine();
        textData[i] = textData[i].replaceAll("[^A-Za-ząčęėįšųūž]+", " ").toLowerCase();
    }
    textReader.close();

    return textData;
}

      

textData[i] is each line of text that I need to split. I have tried several approaches, such as .toCharArray and 2D arrays, but I cannot work out the letter-splitting part. How can I complete step 3?

+3




4 answers


Tested on ideone.com:



public static void main (String[] args) {
    String text = "Hello, my identification is Mister Dude.";
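    // Strip everything except word characters and spaces, then lowercase and split.
    // (Parentheses inside a character class are literal, so they are not needed here.)
    String[] words = text.replaceAll("[^\\w ]+", "").toLowerCase().split(" ");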
    for (String word : words) {
        if (word.length() <= 4) {
            System.out.println(word);
        } 
        else {
            for (int i = 0; i <= word.length() - 4; i++) {
                System.out.println(word.substring(i, i + 4));
            }
        }
    }
}
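
One caveat for the asker's data: in Java, `\w` matches only ASCII letters, digits and underscore by default, so letters such as ą, č or ž would be stripped by the pattern above. A minimal sketch of a Unicode-aware variant follows, assuming `\p{L}` (any Unicode letter) is an acceptable definition of "letter" here. The class name `UnicodeSequences` and the method `sequences` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class UnicodeSequences {
    // Hypothetical helper: cleans one line and returns its 4-letter sequences.
    static List<String> sequences(String text) {
        // Keep Unicode letters (\p{L}) and whitespace, drop everything else.
        String cleaned = text.replaceAll("[^\\p{L}\\s]+", "")
                             .toLowerCase(Locale.ROOT)
                             .trim();
        List<String> result = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return result; // nothing but punctuation or whitespace
        }
        for (String word : cleaned.split("\\s+")) {
            if (word.length() <= 4) {
                result.add(word); // short words are kept whole
            } else {
                // Slide a 4-character window over the word.
                for (int i = 0; i <= word.length() - 4; i++) {
                    result.add(word.substring(i, i + 4));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A Lithuanian sample, written with Unicode escapes so that the
        // encoding of the source file does not matter.
        System.out.println(sequences("A\u010Di\u016B, gra\u017Eus \u017Eodis!"));
    }
}
```

Passing `Locale.ROOT` to `toLowerCase` is a deliberate choice in this sketch: it keeps the lowercasing independent of the machine's default locale.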

      

+2




Basically, for each word, you need to iterate over the possible start positions of a four-letter sequence:



// Requires: import java.util.LinkedList; import java.util.List;
public static List<String> sequences (String line) {
    List<String> result = new LinkedList<>();
    String[] words = line.split(" ");
    for (String word : words) {
        if (word.length() <= 4) {
            result.add(word);
        } else {
            for (int i = 0; i <= word.length() - 4; ++i) {
                result.add(word.substring(i, i + 4));
            }
        }
    }

    return result;
}

      

+3




An example building on your code:

    List<String> result = new ArrayList<String>();
    for (int i = 0; i < textData.length; i++) {
        String[] currLine = textData[i].split("\\s+");
        for (String word : currLine) {
            if (word.length() > 4) {
                // The bound must come from the word itself, not from the number of words in the line.
                for (int j = 0; j <= word.length() - 4; j++) {
                    result.add(word.substring(j, j + 4));
                }
            } else {
                result.add(word);
            }
        }
    }

      

I have not tested it, so please check and let me know if it works.
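
One way to check it is to wrap the loop in a small method and run it on the sample sentence. This is a minimal sketch, assuming the input lines are already lowercased and stripped of punctuation (as `OpenFile()` in the question does) and that the inner loop is bounded by `word.length() - 4`, inclusive. The class name `SequenceCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceCheck {
    // Hypothetical wrapper around the loop so it can be run on sample lines.
    static List<String> sequences(String[] textData) {
        List<String> result = new ArrayList<>();
        for (String line : textData) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line splits into one empty string
                }
                if (word.length() > 4) {
                    // The last valid start index is word.length() - 4.
                    for (int j = 0; j <= word.length() - 4; j++) {
                        result.add(word.substring(j, j + 4));
                    }
                } else {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] textData = { "hello my identification is mister dude" };
        System.out.println(sequences(textData));
    }
}
```

For the sample line this should print the same 19 values listed in the question, from `hell` through `dude`.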

+1




First you need to split your text on spaces and punctuation marks. Notice the split on line 3, which breaks on any combination of spaces and punctuation marks.

In my example, I had

    String text = "Hello, my identification is Mister Dude.";

    String[] textArray = text.split("\\W+");
    List<String> result = new ArrayList<>();
    for (String word : textArray) {
        result.addAll(Arrays.asList(split(word.toLowerCase(), 4)));
    }

      

and then the method

private static String[] split(String word, int letters) {
    if (word == null || word.length() == 0) {
        return new String[] {};
    } else if (word.length() <= letters) {
        return new String[] { word };
    } else {
        int quantity = (word.length() - letters) + 1;
        String[] val = new String[quantity];
        int a = 0;
        while (a + letters <= word.length()) {
            val[a] = word.substring(a, a + letters);
            a++;
        }
        return val;
    }
}

      

This outputs the following:

[hell, ello, my, iden, dent, enti, ntif, tifi, ific, fica, icat, cati, atio, tion, is, mist, iste, ster, dude]
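
Since the question asks for each phrase as a separate value in an array, the list can be converted once it is complete. A minimal sketch, using a hard-coded list as a stand-in for `result`; the class name `ListToArray` and the method `toArray` are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

public class ListToArray {
    // Copies the collected phrases into a String[] of matching size.
    static String[] toArray(List<String> result) {
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the list built by the code above.
        List<String> result = Arrays.asList("hell", "ello", "my");
        String[] phrases = toArray(result);
        System.out.println(phrases.length + " " + phrases[0]); // prints "3 hell"
    }
}
```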

      

+1

