Calculate text value from dictionary of words in Java 8

I am having a problem converting my algorithm to Java 8 representation.

I have an ArrayList of articles:

ArrayList<Article> listArticles = new ArrayList<>();

      

The Article class is defined like this:

public class Article {
    private String titleArticle;
    private String abstractArticle;
    private String textArticle;
    private Long value;
}

      

On the other hand, I have a map of words, each of which maps to a value:

HashMap<String, Long> dictionary = new HashMap<>();

      

I want to compute the value of each article. The value is calculated from the words in the title, abstract and text, all taken together.

In Java 7 I would do something like this (I hope I am not wrong here)

for(Article article : dataArticles){
    long valueArticle = 0;

    for(Map.Entry<String, Long> word : dataDictionary.entrySet()){

         //looping through the words in the title
         for(String text : article.getTitle().split(" ")){
            if(text.equalsIgnoreCase(word.getKey())){
                valueArticle += word.getValue();
            }
         }
         //looping through the words in the abstract
         for(String text : article.getAbstractText().split(" ")){
            if(text.equalsIgnoreCase(word.getKey())){
                valueArticle += word.getValue();
            }
         }
         //looping through the words in the text
         for(String text : article.getText().split(" ")){
            if(text.equalsIgnoreCase(word.getKey())){
                valueArticle += word.getValue();
            }
         }
    }

    article.setValue(valueArticle);
}

      

How can I calculate the value of each article inside the array, saving time?
I was thinking about using a lambda, but it might be a bad approach.
I am new to Java 8 and am trying to learn it.

After some development

I am still looking at how to use streams with my ArrayList. In the meantime, I also wanted to sort the list from the highest article value down to the lowest. I imagined it would be something like this:

Comparator<Article> byArticleValue = (a1, a2) ->
Long.compare(a1.getValue(), a2.getValue());
dataArticles.stream()
        .sorted(byArticleValue);

      

But my list looks unsorted. What am I doing wrong in this case?

+3




4 answers


If your dictionary keys are not lowercase, you should create a lowercase version of the dictionary and reuse it:

/**
 * Create a copy of the dictionary with all keys in lower case.
 * Values of keys that differ only by case are summed.
 * @param dictionary the original dictionary of words to their value
 * @return a dictionary of lowercase words to their value
 */
static Map<String, Double> convert(Map<String, Double> dictionary) 
{
  return
      dictionary.entrySet().stream()
      .collect(Collectors.toMap(e -> e.getKey().toLowerCase(), 
               Map.Entry::getValue, 
               (p, q) -> p + q));
}

      

Then, for each article, you can quickly compute the value using a stream pipeline:



/**
 * Compute the value of an article.
 * @param lc a dictionary of lowercase words to their value
 * @param article the article to be evaluated
 */
static double evaluate(Map<String, Double> lc, Article article)
{
  return
      Stream.of(article.getTitle(), article.getAbstractText(), article.getText())
      .flatMap(s -> Arrays.stream(s.toLowerCase().split(" ")))
      .mapToDouble(k -> lc.getOrDefault(k, 0D))
      .sum();
}

      

For more flexibility in folding words, you can use a Collator and index the dictionary with CollationKey instances rather than lowercase words. A similar refinement could be made to tokenize the text rather than just splitting on spaces.
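As one possible reading of the tokenizing remark above, here is a minimal sketch that splits on any run of characters that are neither letters nor digits, so punctuation, tabs and repeated spaces do not produce unmatched tokens. The class name `TokenizedEvaluator`, the regex, and the `evaluate(Map, String...)` signature (plain strings instead of an `Article`) are assumptions made for illustration and are not part of the original answer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class TokenizedEvaluator {
    // Assumption: a token is a maximal run of Unicode letters or digits.
    private static final Pattern SEPARATOR = Pattern.compile("[^\\p{L}\\p{N}]+");

    /** Sums the dictionary values of all tokens found in the given texts. */
    static double evaluate(Map<String, Double> lc, String... texts) {
        return Stream.of(texts)
                .flatMap(s -> SEPARATOR.splitAsStream(s.toLowerCase(Locale.ROOT)))
                // a text that starts with a separator yields one empty token; drop it
                .filter(token -> !token.isEmpty())
                .mapToDouble(token -> lc.getOrDefault(token, 0D))
                .sum();
    }

    public static void main(String[] args) {
        Map<String, Double> lc = new HashMap<>();
        lc.put("java", 2.0);
        lc.put("streams", 3.0);
        // "Java," and "streams!" now match despite the punctuation
        System.out.println(evaluate(lc, "Java, streams!", "  java\tJAVA"));
    }
}
```

Whether digits or apostrophes should count as part of a word depends on the dictionary, so the pattern would need adjusting to the actual data.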

+1




A hash map can do lookups very quickly. If you change your code a little so that it looks each word up in the map instead of looping over every dictionary entry, you can get huge runtime savings.

long getValueOfText(String text) {
    long value = 0;
    for(String word : text.split(" ")) {
        Long v = dataDictionary.get(word);
        if (v != null) {
            value += v;
        }
    }
    return value;
}

      

This call to get is almost free. No matter how many words you store in your map, it will take roughly constant time to look one up.



EDIT: It looks slightly better as a Java 8 stream:

long getValueOfText(String text) {
    return Arrays.stream(text.split(" "))
                .map(word -> dataDictionary.get(word))
                .filter(v -> v != null)
                // start from 0 so that a text without any known word yields 0 instead of throwing
                .reduce(0L, Long::sum);
}

      

+2




The Java 8 way is using streams.

You can read about them here: http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html and part 2: http://www.oracle.com/technetwork/articles/java/architect-streams-pt2-2227132.html

Here's some sample code:

public static Map<String, Integer> wordCount(Stream<String> stream) {
    return stream
      .flatMap(s -> Stream.of(s.split("\\s+")))
      .collect(Collectors
        .toMap(s -> s, s -> 1, Integer::sum)); 
}

      

Instead of iterating over the elements yourself, you can process the data with a stream and use its methods to sort and organize it. In the above example, flatMap splits the lines of text into words, and collect gathers them into a Map<String, Integer>, with the key being a word and the value being its count.
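To make that description concrete, the following sketch calls a `wordCount` method of the same shape on two sample lines and then weights the counts by a dictionary like the one in the question. The class name `WordCountDemo`, the helper `weightedValue`, the empty-word filter and the sample data are assumptions for illustration only, not part of the original answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordCountDemo {
    /** Counts how often each whitespace-separated word occurs in the lines. */
    static Map<String, Integer> wordCount(Stream<String> lines) {
        return lines
                .flatMap(line -> Stream.of(line.split("\\s+")))
                // a line with leading whitespace yields one empty word; drop it
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));
    }

    /** Hypothetical helper: multiplies each count by the word's dictionary value (0 if absent). */
    static long weightedValue(Map<String, Integer> counts, Map<String, Long> dictionary) {
        return counts.entrySet().stream()
                .mapToLong(e -> e.getValue() * dictionary.getOrDefault(e.getKey(), 0L))
                .sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(Stream.of("to be or", "not to be"));

        Map<String, Long> dictionary = new HashMap<>();
        dictionary.put("to", 5L);
        dictionary.put("not", 1L);

        System.out.println(counts);
        System.out.println(weightedValue(counts, dictionary));
    }
}
```

Counting first and weighting afterwards gives the same total as summing the dictionary value of every single word; it only pays off if the counts are needed for something else as well.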

0




The Java 8 Stream API is the way to go. It can make your code more concise and makes multithreading easy to enable.

I rewrote your code into this compilable example:

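import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.stream.Stream;

public class Snippet {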

    static ArrayList<Article> listArticles = new ArrayList<>();
    static HashMap<String, Long> dictionary = new HashMap<>();

    private static void calculateWordValueSums(ArrayList<Article> listArticles) {

        // turn your list of articles into a stream
        listArticles.stream()

        // allow multi-threading (remove this line if you expect to have few articles)
        .parallel()

        // make calculation per article
        .forEach(article -> {

            // set the "value" field in the article as the result
            article.value =

                    // combine title, abstract and text, since they are counting all together
                    Stream.of(article.titleArticle, article.abstractArticle, article.textArticle)

                    // split every text into words (consider "\\s+" to also allow tabs and repeated spaces as separators)
                    .flatMap(text -> Arrays.stream(text.split(" ")))

                    // allow multi-threading (remove this line if you expect to have few words per article)
                    .parallel()

                    // convert words into their dictionary value (0 for words not in the dictionary)
                    .mapToLong(word -> dictionary.getOrDefault(word, 0L))

                    // sum all Longs
                    .sum();

            System.out.println(article.value);
        });
    }

    public static void main(String[] args) {

        Article a = new Article();
        a.titleArticle = "a b c";
        a.abstractArticle = "d e";
        a.textArticle = "f g h";
        listArticles.add(a);

        dictionary.put("a", 1L);
        dictionary.put("b", 1L);
        dictionary.put("c", 1L);
        dictionary.put("d", 1L);
        dictionary.put("e", 1L);
        dictionary.put("f", 1L);
        dictionary.put("g", 1L);
        dictionary.put("h", 1L);

        calculateWordValueSums(listArticles);
    }
}

class Article {
    String titleArticle;
    String abstractArticle;
    String textArticle;
    long value;
}

      

However, you should reconsider your Article class. The field value will be null until the calculation is done. Consider a class Article that contains only the inputs for the calculation and a class ArticleWithResultValue that holds a reference to the article and the resulting value. That way the compiler helps you see whether the calculation has already been done.
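A minimal sketch of that split could look as follows. It assumes a simplified `Article` with a single `text` field standing in for title, abstract and text, and a hypothetical immutable `ArticleWithResultValue`; the outer class `ScoredArticlesDemo` and the method `score` are names invented for this example. Because `sorted` returns a new stream and does not reorder the source list, the sorted results are collected into a new list.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ScoredArticlesDemo {
    /** Input only: there is no value field that could be left unset. */
    static final class Article {
        final String text; // simplified stand-in for title, abstract and text
        Article(String text) { this.text = text; }
    }

    /** Result: can only be constructed once the value has been computed. */
    static final class ArticleWithResultValue {
        final Article article;
        final long value;
        ArticleWithResultValue(Article article, long value) {
            this.article = article;
            this.value = value;
        }
    }

    /** Scores every article and returns the results, highest value first. */
    static List<ArticleWithResultValue> score(List<Article> articles, Map<String, Long> dictionary) {
        return articles.stream()
                .map(a -> new ArticleWithResultValue(a,
                        Arrays.stream(a.text.split("\\s+"))
                              .mapToLong(w -> dictionary.getOrDefault(w, 0L))
                              .sum()))
                .sorted(Comparator.comparingLong((ArticleWithResultValue r) -> r.value).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> dictionary = new HashMap<>();
        dictionary.put("a", 1L);
        dictionary.put("b", 1L);
        dictionary.put("c", 2L);

        List<Article> articles = Arrays.asList(new Article("a b"), new Article("c c c"), new Article("x"));
        for (ArticleWithResultValue r : score(articles, dictionary)) {
            System.out.println(r.article.text + " -> " + r.value);
        }
    }
}
```

Code that needs the value has to ask for an `ArticleWithResultValue`, so it cannot accidentally receive an article that has not been scored yet.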

-1








