How to get the best Big O time complexity in an exercise on page sequences in java

The problem is this:

The entries are written in chronological order to a file with one entry per line. The format of each record is:

[timestamp] [space] [user id] [space] [page type-id] \ n

Your task is to identify the 10 most common three page sequences for all users in the log set.

For example, here's an example of a log:

1248977297 BBBB Search
1248977302 AAAA Browse
1248977308 BBBB Search
1248977310 AAAA Browse
1248977317 BBBB Search
1248977325 AAAA Search
1248977332 AAAA Search
1248977332 BBBB Search
1248977339 BBBB Checkout
1248977348 AAAA Search
1248977352 BBBB Browse
1248977367 AAAA Search



The first three-page sequence made by user AAAA is "Browse->Browse->Search"
The second three-page-sequence made by user AAAA is "Browse->Search->Search" 
The third three-page-sequence made by user AAAA is "Search->Search->Search"
The fourth three-page-sequence made by user AAAA is "Search->Search->Search"

      

The output of the program with the example data should be:

Search -> Search -> Search = 4
Browse -> Browse -> Search = 1
Search -> Search -> Checkout = 1
Browse -> Search -> Search = 1
Search -> Checkout -> Browse = 1

      

The output should contain the first 10 three-page sequences (in order) and the number of occurrences for each one.

The best algorithm that comes to my mind is O (n ^ 2), but I find answers that say it can be done in O (N + N * lg (N)), how can I archive this complexity ?, say enumeration in O (N) and sorting in O (N lg (N))

/* Solution
 * Runtime complexity: O(n^2).
 * Spatial complexity: O(n).
 */
import java.io.*;
import java.util.*;

public class Solution {

    public static void main(String args[]) throws IOException {
        /*
         * Reads the input from a txt file.
         */
        String file = "C:\\Users\\Public\\Documents\\txt\\files";
        BufferedReader f = new BufferedReader(new FileReader(file + ".txt"));
        String line = "";

        /*
         * @map data structure to store all the users with their page ids.
         */
        Map<Integer, List<String>> map = new HashMap<Integer, List<String>>();

        /*
         *Read the txt or log file and store in the @map the user<Integer> and in a list<String> all the page sequences that he visited.
         */
        while ((line = f.readLine()) != null && line.trim().length() != 0) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreElements()) {
                String timeStamp = tokens.nextToken();
                int userId = Integer.parseInt(tokens.nextToken());
                String pageType = tokens.nextToken();

                List<String> values = map.get(userId);
                if (values == null) {
                    values = new ArrayList<String>();
                    map.put(userId, values);
                }
                values.add(pageType);
            }
        }
        /*
         * Create the sequences by user.
         */
        List<String> listSequences = generateSequencesByUser(map);

        /*
         * Count the frequency of each sequence.
         */
        Map<String, Integer> mapFrequency = countFrequencySequences(listSequences);

        /*
         * Sort the map by values.
         */
        Map<String, Integer> sortedMap = Solution.sortByValue(mapFrequency);

        /*
         * Print the Top 10 of sequences.
         */
        printTop10(sortedMap);
    }
    /*
     * Method to create sequences by user.
     */
    public static List<String> generateSequencesByUser(Map<Integer, List<String>> map) {
        List<String> list = new ArrayList<String>();
        for (Map.Entry<Integer, List<String>> entry : map.entrySet()) {
            int key = entry.getKey();
            for (int i = 2; i < entry.getValue().size(); i++) {
                String seq = entry.getValue().get(i - 2) + "->" + entry.getValue().get(i - 1) + "->" + entry.getValue().get(i);
                list.add(seq);
            }
        }
        return list;
    }

    /*
     * Method the frequency of each sequence and stored in a map.
     */
    public static Map<String, Integer> countFrequencySequences(List<String> listSequences) {
        Map<String, Integer> mapFrequency = new HashMap<String, Integer>();

        for (String temp : listSequences) {
            Integer counter = mapFrequency.get(temp);
            if (counter == null) {
                counter = 1;
                mapFrequency.put(temp, counter);
            } else {
                mapFrequency.put(temp, counter + 1);
            }
        }
        return mapFrequency;
    }

    /*
     * Method to print the top 10 of sequences.
     */
    public static void printTop10(Map<String, Integer> map) {
        int count = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            count++;
            if (count > 10) {
                break;
            } else {
                System.out.println(entry.getKey() + " = " + entry.getValue());
            }
        }
    }

    /*
     * Order the map by values.
     */
    public static Map<String, Integer> sortByValue(Map<String, Integer> map) {
        List list = new LinkedList(map.entrySet());
        Collections.sort(list, new Comparator() {
            public int compare(Object o1, Object o2) {
                return ((Comparable) ((Map.Entry) (o2)).getValue()).compareTo(((Map.Entry) (o1)).getValue());
            }
        });

        Map result = new LinkedHashMap();
        for (Iterator it = list.iterator(); it.hasNext();) {
            Map.Entry entry = (Map.Entry) it.next();
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }


}

      

+3


source to share


1 answer


You can accomplish the task in O (N LogN) or better by dividing the problem into three simpler tasks:

  • Order a list by timestamp,
  • Performing counts for each 3-page sequence and
  • Selection of ten items by quantity.

The first task is standard sorting. Suppose it is O (N LogN) for now * .

The second task is easily accomplished with a pair of hash maps:

  • For each user, save a three-element list of their last three pages in the first hashmap. Each time a new user action is detected, move the pages in the list by one.
  • If the list from the above step has three entries, make a three-part key for them and increase the number in the second hash map.

Each step above is an O (1) operation for a log entry, so O (N) time for this task



The third task of selecting the top ten score records can be accomplished by extracting the counter pairs and sorting them by score. In the worst case, when all page clicks are unique, you will get 3N records to sort, so this task will again be O (N LogN) * .

Once you know the algorithm, the implementation should be straightforward, because Java provides all the building blocks for each task of the algorithm.

* You can reduce the time to O (N) by making two observations:

  • The first task uses ten-digit numbers for timestamps, so you can use a non-comparative linear time algorithm like Radix sort to achieve linear time and
  • The selection of the first ten elements can be achieved using the linear time selection algorithm <
  • ...

This implementation will require significantly more work because Java does not provide ready-made building blocks for it.

+2


source







All Articles