Groupwise MAX () in subquery
I am trying to calculate the returned maximum value in a table as well as other values in that table. However, the table I am doing this in is not a "real" table, it is generated by a subquery. This is giving me problems since I don't think I can join it twice without wrapping the whole subquery.
I currently have a SQL Server solution using ROW_NUMBER() OVER (PARTITION BY providerId ORDER BY partnershipSetScore DESC) rnk
, but I am looking for an agnostic version of the DBMS if possible, since the unit tests for the project are done in Sqlite DB, which does not have this feature.
Here is the schema and my specific SQL Server query if helpful:
Course:
- int id
- varchar name
- int schoolId
School:
- int id
- varchar name
Partnership:
- int id
- varcharName partnership
SchoolPartnership:
- int id
- int schoolId
- int partnersId
Here's the request:
SELECT
schoolId,
partnershipId AS bestPartnershipSetId,
partnershipScore AS bestPartnershipScore
FROM
(
SELECT
pp.schoolId,
partnershipScores.partnershipId,
partnershipScores.partnershipScore,
ROW_NUMBER() OVER (PARTITION BY schoolId ORDER BY partnershipScore DESC) rnk
FROM schoolPartnership pp
INNER JOIN (
SELECT
pp.partnershipId,
(
(CASE WHEN SUM(CASE WHEN c.name LIKE '%French%' THEN 1 ELSE 0 END) > 0 THEN 1 ELSE 0 END)
+ (CASE WHEN SUM(CASE WHEN c.name LIKE '%History%' THEN 1 ELSE 0 END) > 0 THEN 1 ELSE 0 END)
) AS partnershipScore
FROM schoolPartnership pp
INNER JOIN course c ON c.schoolId = pp.schoolId
GROUP BY partnershipId
) AS partnershipScores ON partnershipScores.partnershipId = pp.partnershipId
) AS schoolPartnershipScores
WHERE rnk = 1
If you need more information on what I am trying to achieve, see Custom Sorting Algorithm for Large Amounts of Data : This query will be a subquery of a larger query that sorts schools by the most suitable partnership.
source to share
I was unable to find a solution (other than duplicating a subquery, which I was trying to avoid), so I just defined the MAX lines for each partnership in PHP and threw out any other lines. Not a perfect solution, but since I needed a cross-platform approach, there weren't many other options available to me.
source to share
Perhaps when you talk about joining a subquery twice, you had this technique in your mind:
SELECT a.*
FROM atable a
INNER JOIN (
SELECT
col1,
MAX(col2) AS max_col2
FROM atable
GROUP BY col1
) m
ON a.col1 = m.col1 AND a.col2 = m.max_col2
;
And it would be fine to use as a DBMS agnostic method (at least one that works in both SQL Server and SQLite) to get the job done if it was about one table.
Instead, you have a subquery. However, I see no other way to accomplish what you are asking for. Therefore, in this situation, I see two options for you (one of which may not be applicable in your particular case, but still an option in general):
-
Do what you are trying to avoid, i.e. duplicate the subquery specifically to find the aggregated values for each group, then join it to the same subquery as described above.
-
Save the subquery results temporarily, then apply the above technique to the temporary result set.
The first option is not very attractive, especially since there is a hope that the second might work.
One problem with the second option is that temporary datasets are implemented differently in SQL Server and SQLite. SQLite uses the operator for this CREATE TEMPORARY TABLE
. SQL Server does not support the keyword TEMPORARY
in the context of a statement CREATE TABLE
and instead uses the special character ( #
) at the beginning of a table name to indicate that the table is actually temporary.
So, the only workaround I see is to use a regular table as temporary storage. You can create it once, and then delete its content each time you run your query, just before inserting the temporary result set:
DELETE FROM TempTable;
INSERT INTO TempTable (
schoolId,
bestPartnershipSetId,
bestPartnershipScore
)
SELECT
pp.schoolId,
partnershipScores.partnershipId,
partnershipScores.partnershipScore,
FROM
...
;
SELECT ...
FROM TempTable
...
;
Or, you can create and discard it every time you run a request:
CREATE TABLE TempTable (
...
);
INSERT INTO TempTable (...)
SELECT ...
FROM ...
;
SELECT ...
FROM TempTable
...
;
DROP TABLE TempTable;
Note that using a regular table as temporary storage like this is not concurrency friendly in SQL Server. If that can create a problem, you may have to drop this option and end up with the first one. (But this is probably the cost you have to pay when you want a platform independent solution, especially when platforms are different from SQL Server and SQLite.)
source to share
This is the structure you want:
with t as (<subquery goes here>)
select t.*,
max(col) over () as MaxVal
from t
It's a bit tricky to figure out how it fits into your query because I can't tell what the underlying subquery is.
As for joining a subquery more than once, you can do so using what SQL Server calls "common table expressions" - the clause with
above. Most other sane databases support this (MySQL and MS Access start out with two notable exceptions).
source to share
The most aggregated SQL way would be to use "NON EXISTS":
SELECT * FROM schoolPartnership t1
WHERE NOT EXISTS
(SELECT * FROM schoolPartnership t2
WHERE t1.schoolId = t2.schoolId
AND t1.partnershipScore < t2.partnershipScore)
This will give you lines from schoolPartnership with max partnerScore for each school.
source to share