Improving the time complexity of DFS using recursion so that each node only works on its descendants
Problem
There is a perfectly balanced m-ary tree that is n levels deep. Each inner node has exactly m child nodes. The root is said to be at depth 0, and the leaf nodes are said to be at level n, so there are exactly n ancestors of each leaf node. Therefore, the total number of nodes in the tree is:
T = 1 + m^2 + ... + m^n = (m^(n+1) - 1) / (m - 1)
Here's an example with m = 3 and n = 2.
a (depth 0)
_________|________
| | |
b c d (depth 1)
___|___ ___|___ ___|___
| | | | | | | | |
e f g h i j k l m (depth 2)
I am writing a first depth-first search function to traverse the entire tree at the deepest node first and leftmost node in the first order and insert the value of each node into the output list.
I wrote this function in two different ways and I want to compare the time complexity of both functions.
Although this question is language agnostic, I am using the Python code below to show my features, because the Python code looks almost like pseudocode.
Decision
First function dfs1
. It takes a root node argument as node
well as an empty output list as an argument output
. The function descends into the tree recursively, visits each node, and adds the node's value to the output list.
def dfs1(node, output):
"""Visit each node (DFS) and place its value in output list."""
output.append(node.value)
for child_node in node.children:
dfs1(child_node, output)
Second function dfs2
. It takes a root node as an node
argument, but does not take a list argument. The function descends into the tree recursively. At each level of recursion, when visiting each node, it creates a list with the value of the current node and all its descendants and returns that list to the caller.
def dfs2(node):
"""Visit nodes (DFS) and return list of values of visited nodes."""
output = [node.value]
for child_node in node.children:
for s in dfs2(child_node):
output.append(s)
return output
Analysis
There are two variables that determine the size of the problem.
- m is the number of child nodes, each of which has a child node.
- n is the number of ancestors, each of which has a leaf node (tree height).
At dfs1
time O (1) is carried out during a visit to each node, so the total time required for visiting all the nodes,
O(1 + m + m^2 + ... + m^n).
I am not worried about simplifying this expression further.
The dfs2
time taken to visit each node is directly proportional to all leaf nodes accessible from that node. In other words, the time spent visiting a node at depth d is O (m ^ (n - d)). Thus, the total time taken to visit all nodes is
1 * O(m^n) + m * O(m^(n - 1)) + m^2 * O(m^(n - 2)) + ... + m^n * O(1)
= (n + 1) * O(m^n)
Question
Is it possible to write dfs2
in such a way that its time complexity
O(1 + m + m^2 + ... + m^n)
without changing the essence of the algorithm, i.e. each node only creates an output list for itself and all its descendants, and shouldn't worry about the list that its ancestors might have?
Complete working code for reference
Here is the complete Python code that demonstrates the above features.
class Node:
def __init__(self, value):
"""Initialize current node with a value."""
self.value = value
self.children = []
def add(self, node):
"""Add a new node as a child to current node."""
self.children.append(node)
def make_tree():
"""Create a perfectly balanced m-ary tree with depth n.
(m = 3 and n = 2)
1 (depth 0)
_________|________
| | |
2 3 4 (depth 1)
___|___ ___|___ ___|___
| | | | | | | | |
5 6 7 8 9 10 11 12 13 (depth 2)
"""
# Create the nodes
a = Node( 1);
b = Node( 2); c = Node( 3); d = Node( 4)
e = Node( 5); f = Node( 6); g = Node( 7);
h = Node( 8); i = Node( 9); j = Node(10);
k = Node(11); l = Node(12); m = Node(13)
# Create the tree out of the nodes
a.add(b); a.add(c); a.add(d)
b.add(e); b.add(f); b.add(g)
c.add(h); c.add(i); c.add(j)
d.add(k); d.add(l); d.add(m)
# Return the root node
return a
def dfs1(node, output):
"""Visit each node (DFS) and place its value in output list."""
output.append(node.value)
for child_node in node.children:
dfs1(child_node, output)
def dfs2(node):
"""Visit nodes (DFS) and return list of values of visited nodes."""
output = [node.value]
for child_node in node.children:
for s in dfs2(child_node):
output.append(s)
return output
a = make_tree()
output = []
dfs1(a, output)
print(output)
output = dfs2(a)
print(output)
Both functions dfs1
and dfs2
give the same result.
['a', 'b', 'e', 'f', 'g', 'c', 'h', 'i', 'j', 'd', 'k', 'l', 'm'] ['a', 'b', 'e', 'f', 'g', 'c', 'h', 'i', 'j', 'd', 'k', 'l', 'm']
source to share
If dfs1 is passed by reference in the output list, then the complexity of ds1 is O (shared nodes).
Whereas, in the output list, dfs2 is returned and appended to the parent's output list, thus taking O (list size) for each return. Hence the overall complexity increases. You can avoid this overhead if both your adding and returning the output list take constant time.
This can be done if your result list is a "double terminated linked list". Hence, you can return a reference to the output list, and instead of adding, you can concatenate two related linked lists (which is O (1)).
source to share