Using SELECT ... GROUP BY ... HAVING in SQLite

Question

Using SELECT ... GROUP BY ... HAVING in SQLite

I'm working on Exercise 17 in Teach Yourself SQL GalaXQL (SQLite based). I have three tables:

Stars

that contains starid

;
Planets

that contains planetid

and starid

;
Moons

that contains moonid

and planetid

.

I want to return starid

associated with the largest number of planets and moons combined.

I have a query that will return plugins starid

, planetid

and total

+ moons.

How do I modify this query so that it returns the only one starid

that matches max(total)

, and not a table? This is my request:

select
    stars.starid as sid,
    planets.planetid as pid,
    (count(moons.moonid)+count(planets.planetid)) as total
from stars, planets, moons
where planets.planetid=moons.planetid and stars.starid=planets.starid
group by stars.starid

+3

sql sqlite

ant bait 19 Aug 14 at 23:14

source to share

2 answers

Perhaps something like this you need?

select
stars.starid as sid, 
(count(distinct moons.moonid)+count(distinct planets.planetid)) as total 
from stars
left join planets on stars.starid=planets.starid 
left join moons on planets.planetid=moons.planetid 
group by stars.starid
order by 2 desc
limit 1

Sample SQL script

+1

jpw 19 Aug 14 at 23:40

source to share

Air · Accepted Answer · 2014-08-20T18:48:22+0000

Let me visualize the system that can be represented by this database structure and see if we can translate your question into working SQL.

I drew a galaxy for you:

badly drawn planetary systems

To distinguish stars and planets from moons, I used capital roman numerals for values starid

and lowercase roman numerals for values moonid

. And since everyone knows that astronomers have nothing to do on these long nights at the observatory but drink, I put an inexplicable gap in the middle of your values planetid

. Spaces like these will occur when using so-called "surrogate" identifiers, since their meanings have no meaning; they are just unique identifiers for strings.

If you want to follow along, here's a galaxy naively loaded into the SQL Fiddle (if you get a popup about switching to WebSQL, click "cancel" to stick with SQL.js).

Let's see what you wanted again?

I want to return starid

, associated with the largest number of planets and moons combined

Tall. Paraphrased question: Which star is associated with the most orbiting bodies?

Star (I) has 1 planet with 3 moons;
Star (II) has 1 planet with 1 moon and 1 planet with 2 moons;
A star (III) has 1 planet with 1 moon and 2 planets without moons.

All we do is count the various objects associated with each star. The winner is a star (II) with a total of 5 circular bodies. So, the final result we expect from a working request is:

| starid |
|--------|
| 2      |

I deliberately attracted this amazing galaxy so that the "victorious" star does not have most of the planets and is not associated with the planet with the most moons. If these astronomers weren't on all three sheets of the wind, I could have gotten an extra moon from planet (1), so our victorious star isn't tied to most moons. It will be convenient for us in this demo if the star (II) only answers the question we are asking, and not any other questions with potentially similar queries, in order to reduce our chances of getting the right answer on the wrong query.

Lost translation

The first thing I want to do is introduce you to the explicit syntax JOIN

. This will be your closest friend. You will always be JOIN

your tables, no matter what the stupid tutorial says. Trust me with much more stupid advice (and maybe read Explicit vs Implicit SQL Join ).

The explicit syntax JOIN

shows how we want our tables to link to each other and reserve a clause WHERE

for the sole purpose of filtering rows from the result set. There are several different types , but where do we start with the plain old one INNER JOIN

. This is essentially your original query, and it means that all you want to see in your result set is data that overlaps in all three tables. Check out the skeleton of your original request:

SELECT ... FROM stars, planets, moons
WHERE planets.planetid = moons.planetid 
    AND planets.starid = stars.starid;

Given these conditions, what happens to an orphaned planet somewhere in space that is not associated with a star (i.e. starid

equal to NULL

)? Since the lost planet does not overlap with the table stars

, INNER JOIN

will not include it in the result set.

In SQL, any equality or inequality comparison with NULL

gives the result NULL

-even NULL = NULL

is wrong! Now your request has a problem because the other condition is planets.planetid = moons.planetid

. If there is a planet for which there is no corresponding moon that turns into planets.planetid = NULL

, and the planet will not appear in your query. This is not good! Lonely planets must be counted!

Limitations `OUTER`

Luckily for you JOIN

: OUTER JOIN

which ensures that at least one of the tables always appears in our result set. They come in flavors LEFT

and RIGHT

, to indicate which table receives special treatment, relative to the position of the keyword JOIN

. What does SQLite support have in common? confirms that keywords INNER

and OUTER

are optional, so we can use LEFT JOIN

, noting that:

stars

and planets

are linked by common starid

;
planets

and moons

are linked by common planetid

;
stars

and are moons

indirectly linked by the above two links;
we always want to count all planets and all moons.

SELECT
    *
FROM
    stars
        LEFT JOIN
    planets ON stars.starid = planets.starid
        LEFT JOIN
    moons ON planets.planetid = moons.planetid;

Note that instead of big bag tables and offerings WHERE

, you now have one offer ON

for each JOIN

. When you find yourself working with a large number of tables, it will be much easier to read; and since it is standard syntax, it is relatively portable between SQL databases.

Lost space

Our new query basically grabs everything in our database. But does this match everything in our galaxy? In fact, there is a redundancy because two of our identity field ( starid

and planetid

) there are a few tables. This is just one of many reasons to avoid the SELECT *

catch-all syntax in real-world use cases. We only really need three ID fields, and I'm going to do two more tricks while we're at it:

Aliases! You can give tables more convenient names using syntax table_name AS alias

. This can be very handy when you have to refer to many different columns in a multi-table query, and you don't want to type the fully qualified table names every time.
Grab starid

from spreadsheet planets

and fully open stars

from JOIN

! Having stars LEFT JOIN planets ON stars.starid = planets.starid

means that the field starid

will be the same, no matter which table we get it from - as long as the star has any planets. If we were counting stars, we need this table, but we are counting planets and moons; moons by definition of planets orbits, so a star without planets also has no satellites and can be ignored. (This is a guess, check your data to see if it's warranted! Your astronomers may be drunker than usual!)

SELECT
    p.starid,    -- This could be S.starid, if we kept using `stars`
    p.planetid,
    m.moonid
FROM
    planets AS p
        LEFT JOIN
    moons AS m ON p.planetid = m.planetid;

Result:

| starid | planetid | moonid |
|--------|----------|--------|
|      1 |        1 |      1 |
|      1 |        1 |      2 |
|      1 |        1 |      3 |
|      2 |        2 |      6 |
|      2 |        3 |      4 |
|      2 |        3 |      5 |
|      3 |        7 |        |
|      3 |        8 |      7 |
|      3 |        9 |        |

Mathematical!

/

Now our task is to decide which star is the winner, and for that we need to do a simple calculation. Let the moons count first; since they have no "children" and only one "parent", it is easy to aggregate them:

SELECT
    p.starid,
    p.planetid,
    COUNT(m.moonid) AS moon_count
FROM
    planets AS p
        LEFT JOIN
    moons AS m ON p.planetid = m.planetid
GROUP BY p.starid, p.planetid;

Result:

| starid | planetid | moon_count |
|--------|----------|------------|
|      1 |        1 |          3 |
|      2 |        2 |          1 |
|      2 |        3 |          2 |
|      3 |        7 |          0 |
|      3 |        8 |          1 |
|      3 |        9 |          0 |

(Note: Usually we like to use COUNT(*)

because it just prints and reads, but that might get us in trouble! Since our two lines are meaningful NULL

for moonid

, we should use COUNT(moonid)

to avoid counting moons that don't exist .)

So far, so well, I see six planets, we know which star each belongs to, and the correct number of moons is shown for each planet. The next step is counting the planets. You might think it requires a subquery to also add a column moon_count

for each planet, but it's actually easier than that; if we are a GROUP BY

star, ours will moon_count

switch from counting "moon per planet, per star" to "moons per star", which is just fine:

SELECT
    p.starid,
    COUNT(p.planetid) AS planet_count,
    COUNT(m.moonid) AS moon_count
FROM
    planets AS p
        LEFT JOIN
    moons AS m ON p.planetid = m.planetid
GROUP BY p.starid;

Result:

| starid | planet_count | moon_count |
|--------|--------------|------------|
|      1 |            3 |          3 |
|      2 |            3 |          3 |
|      3 |            3 |          1 |

Now we are faced with a problem. moon_count

correct, but you should see right away what planet_count

is wrong. Why is this? Look back at the result of the non-group query and notice that there are nine rows with three rows for each starid

, and each row has a non-zero value for planetid

. This is what we asked the database to count with this query when we really wanted to ask how many different planets are there? Planet (1) appears three times with a star (I), but it is the same planet every time. The fix is to insert the keyword DISTINCT

inside the function call COUNT()

. At the same time, we can add two columns together:

SELECT
    p.starid,
    COUNT(DISTINCT p.planetid)+ COUNT(m.moonid) AS total_bodies
FROM
    planets AS p
        LEFT JOIN
    moons AS m ON p.planetid = m.planetid
GROUP BY p.starid;

Result:

| starid | total_bodies |
|--------|--------------|
|      1 |            4 |
|      2 |            5 |
|      3 |            4 |

And the winner ...

By counting the orbital bodies around each star in the drawing, we can see that the column is total_bodies

correct. But you didn't ask for all this information; you just want to know who won. Well, there are many ways to get there, and depending on the size and composition of your galaxy (database), some may be more efficient than others. One approach is to express ORDER BY

total_bodies

, so that the "winner" appears at the top LIMIT 1

so we don't see the losers and only select the column starid

( see it in the SQL Fiddle ).

The problem with this approach is that it hides connections. What if we gave the lost stars in our galaxy every additional planet or moon? We now have a triple tie - every winner! But who appears first when we ORDER BY

mean always the same? In the SQL standard, it is undefined; there is no one to come out at the top. You can run the same query twice on the same data and get two different results!

For this reason, you may prefer which stars have the most orbiting bodies, rather than indicating in your question that you know there is only one meaning. This is a more typical set-based approach, and it would be a good idea to get used to the basics thinking when working with relational databases. Until you run your query, you don't know the size of the result set; if you assume that there is no draw in the first place, you must somehow justify that assumption. (Since astronomers regularly find new moons and planets, I would have a hard time justifying this!)

The way I prefer to write this query is something called Common Table Expression (CTE). They are supported in recent versions of SQLite and many other databases , but probably not in the SQLite engine used by GalaXQL as it did not have CTEs added to SQLite earlier this year. CTEs allow a subquery to be referenced multiple times using an alias, instead of having to write it out in full each time. A CTE solution might look like this:

WITH body_counts AS
    (SELECT
        p.starid,
        COUNT(DISTINCT p.planetid) + COUNT(m.moonid) AS total_bodies
    FROM
        planets AS p
            LEFT JOIN
        moons AS m ON p.planetid = m.planetid
    GROUP BY p.starid)
SELECT
    starid
FROM
    body_counts
WHERE
    total_bodies = (SELECT MAX(total_bodies) FROM body_counts);

Result:

| STARID |
|--------|
|      2 |

You can test this query in action on the SQLFiddle, but you need to change the database type (dropdown on the top row) to something other than SQLite, but the SQLFiddle implementation doesn't seem to support CTEs yet. Here's a link to a query running in SQL Server 2008 , to confirm that this query can display more than one row in case of a link, try changing MAX()

the last row to MIN()

.

Only for you

Doing this without a CTE is ugly, but it can be done if the table size is manageable. Looking at the request above, our CTE will be an alias like body_counts

, and we refer to it twice - in a sentence FROM

and in a sentence WHERE

. We can replace both of these references with the statement we used to define body_counts

(deleting the id column once in the second subquery where it was not used):

SELECT
    starid
FROM
    (SELECT
        p.starid,
        COUNT(DISTINCT p.planetid) + COUNT(m.moonid) AS total_bodies
    FROM
        planets AS p
            LEFT JOIN
        moons AS m ON p.planetid = m.planetid
    GROUP BY p.starid)
WHERE
    total_bodies = (SELECT MAX(total_bodies) FROM 
        (SELECT
            COUNT(DISTINCT p.planetid)+ COUNT(m.moonid) AS total_bodies
        FROM
            planets AS p
                LEFT JOIN
            moons AS m ON p.planetid = m.planetid
        GROUP BY p.starid)
    );

This is a friendly approach that should work for you in GalaXQL. Check out how it works here on SQLFiddle .

Now that you've seen both, isn't it easier to understand the CTE version? MySQL, which also does not support CTEs, additionally requests aliases for our subqueries. Fortunately, SQLite doesn't, because in this case it's just an extra argument to add to an already overly complex query.

Ok, that was fun - sorry, what did you ask?;)

(PS, if you're wondering what's going on with planet nine: giant space potato chips have very eccentric orbits.)

Using SELECT ... GROUP BY ... HAVING in SQLite

I drew a galaxy for you:

Let's see what you wanted again?

Lost translation

Limitations OUTER (adsbygoogle = window.adsbygoogle || []).push({});

Lost space

Mathematical!

/

And the winner ...

Only for you

More articles:

Limitations `OUTER`