Time complexity of built-in SQL functions like sum, count, avg
What is the time complexity of a function like count, sum, avg or any of the built-in "math" functions in MySQL, SQL Server, Oracle and others?
You might think that calling sum(myColumn) would be linear. But count(1) is not. How come, and what is the real time complexity?
In a perfect world, I would like sum, avg and count to be O(1). But we don't live in one of those, do we?
In SQL, the mathematical complexity of the aggregate functions is completely irrelevant. The only thing that really matters is the complexity of the data access: which access path is chosen (table scan, index range scan, index seek, etc.) and how many pages are read. There may be slight differences between the aggregates, but they all work pretty much the same way (keep a running state and compute the rolling aggregation for each input value), and no aggregate looks at its input twice, so they are all O(n) in their internal implementation, where "n" is the number of records fed to the aggregate (not necessarily the number of records in the table!).
Some aggregates have internal shortcuts; for example, COUNT(*) can return the count from table metadata on some systems, when possible.
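As a minimal sketch of this (MySQL syntax, against a hypothetical orders table with an index on customer_id; the exact plans vary by engine and version), EXPLAIN makes the access path, and thus "n", visible:

    -- Full table scan: the aggregate sees every row, n = all rows in the table.
    EXPLAIN SELECT SUM(amount) FROM orders;

    -- Index range scan: the aggregate only sees the matching rows,
    -- n = rows for customer 42, however large the table is.
    EXPLAIN SELECT SUM(amount) FROM orders WHERE customer_id = 42;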
- In MySQL with MyISAM, COUNT(*) without GROUP BY is O(1) (constant): the row count is stored in the table metadata.
- On all systems, MAX and MIN on indexed expressions without GROUP BY are O(log(n)) (logarithmic): they are retrieved with a single index lookup.
- Aggregate functions are O(n) (linear) when used without GROUP BY, or when GROUP BY uses HASH.
- Aggregate functions are O(n log(n)) when GROUP BY uses SORT.
All the values must be fetched, computed and accumulated in state variables (which may be stored in a hash table). In addition, when SORT is used, the rows must be sorted first.
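A few examples of the cases above (MySQL syntax, hypothetical tables; whether each shortcut actually fires depends on the storage engine and the available indexes):

    -- MyISAM only: the row count is kept in table metadata, so this is O(1);
    -- EXPLAIN typically reports "Select tables optimized away".
    SELECT COUNT(*) FROM myisam_table;

    -- With an index on amount, MIN/MAX only probe the ends of the index: O(log(n)).
    SELECT MIN(amount), MAX(amount) FROM orders;

    -- One pass over the matching rows: O(n).
    SELECT SUM(amount) FROM orders;

    -- Grouped: O(n) if the planner hashes the groups, O(n log(n)) if it sorts first.
    SELECT customer_id, AVG(amount) FROM orders GROUP BY customer_id;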
source to share
Note: this is an assumption based on my understanding of how SQL query planners work, and it may not be entirely accurate.
I believe that all the aggregate functions, or at least the "math" ones you named above, should be O(n). The query will be executed roughly like this:
- Get the rows matching the join predicates and filter predicates (i.e. the WHERE clause)
- Create row groups according to the GROUP BY clause. A single row group is created for queries without GROUP BY
- For each row group, apply the aggregate function to the rows in the group. For things like SUM, AVG, MIN, MAX, as well as non-numeric functions like CONCAT, there are simple O(n) algorithms, and I suspect those are what is used. Create one row in the output set for each row group created in step 2
- If a HAVING predicate is present, filter the output rows with that predicate
Note, however, that although the aggregate functions are O(n), the operation as a whole may not be. If you write a query that Cartesian-joins a table to itself, you are looking at a minimum of O(n*n) just to create the initial rowset (step 1). Sorting to create the row groups (step 2) can be O(n log n), and it can require disk storage for the sort operation (as opposed to sorting in RAM), so your query may still perform badly if you are manipulating many rows.
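For example (a hypothetical self cross join, not something you would normally write): SUM itself is still linear in its input, but its input is already quadratic in the table size:

    -- Step 1 produces n*n rows before SUM ever runs, so the query is O(n*n) overall.
    SELECT SUM(a.amount * b.amount)
    FROM orders a
    CROSS JOIN orders b;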
For large queries, such as in data warehouses, the underlying database can parallelize the task so that multiple processors work on it. Hence there will be cut-off points where the behavior is not quite linear, as the cost of coordinating the parallel workers is traded off against the benefit of using multiple processors.
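As a sketch (Oracle syntax, hypothetical sales table; the hint only requests parallelism, and the planner still decides whether splitting the scan across workers pays off):

    -- Ask for up to four parallel workers for the scan feeding SUM.
    SELECT /*+ PARALLEL(4) */ SUM(amount) FROM sales;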