Groupwise MAX () in subquery

I am trying to calculate the returned maximum value in a table as well as other values ​​in that table. However, the table I am doing this in is not a "real" table, it is generated by a subquery. This is giving me problems since I don't think I can join it twice without wrapping the whole subquery.

I currently have a SQL Server solution using ROW_NUMBER() OVER (PARTITION BY providerId ORDER BY partnershipSetScore DESC) rnk

, but I am looking for an agnostic version of the DBMS if possible, since the unit tests for the project are done in Sqlite DB, which does not have this feature.

Here is the schema and my specific SQL Server query if helpful:

Course:

  • int id
  • varchar name
  • int schoolId

School:

  • int id
  • varchar name

Partnership:

  • int id
  • varcharName partnership

SchoolPartnership:

  • int id
  • int schoolId
  • int partnersId

Here's the request:

SELECT
    schoolId,
    partnershipId AS bestPartnershipSetId,
    partnershipScore AS bestPartnershipScore
FROM
(
    SELECT
        pp.schoolId,
        partnershipScores.partnershipId,
        partnershipScores.partnershipScore,
        ROW_NUMBER() OVER (PARTITION BY schoolId ORDER BY partnershipScore DESC) rnk
    FROM schoolPartnership pp
    INNER JOIN (
        SELECT
            pp.partnershipId,
            (
                (CASE WHEN SUM(CASE WHEN c.name LIKE '%French%' THEN 1 ELSE 0 END) > 0 THEN 1 ELSE 0 END)
                + (CASE WHEN SUM(CASE WHEN c.name LIKE '%History%' THEN 1 ELSE 0 END) > 0 THEN 1 ELSE 0 END)
            ) AS partnershipScore
        FROM schoolPartnership pp
        INNER JOIN course c ON c.schoolId = pp.schoolId
        GROUP BY partnershipId
    ) AS partnershipScores ON partnershipScores.partnershipId = pp.partnershipId
) AS schoolPartnershipScores
WHERE rnk = 1

      

If you need more information on what I am trying to achieve, see Custom Sorting Algorithm for Large Amounts of Data : This query will be a subquery of a larger query that sorts schools by the most suitable partnership.

+3


source to share


4 answers


I was unable to find a solution (other than duplicating a subquery, which I was trying to avoid), so I just defined the MAX lines for each partnership in PHP and threw out any other lines. Not a perfect solution, but since I needed a cross-platform approach, there weren't many other options available to me.



0


source


Perhaps when you talk about joining a subquery twice, you had this technique in your mind:

SELECT a.*
FROM atable a
INNER JOIN (
  SELECT
    col1,
    MAX(col2) AS max_col2
  FROM atable
  GROUP BY col1
) m
ON a.col1 = m.col1 AND a.col2 = m.max_col2
;

      

And it would be fine to use as a DBMS agnostic method (at least one that works in both SQL Server and SQLite) to get the job done if it was about one table.

Instead, you have a subquery. However, I see no other way to accomplish what you are asking for. Therefore, in this situation, I see two options for you (one of which may not be applicable in your particular case, but still an option in general):

  • Do what you are trying to avoid, i.e. duplicate the subquery specifically to find the aggregated values ​​for each group, then join it to the same subquery as described above.

  • Save the subquery results temporarily, then apply the above technique to the temporary result set.

The first option is not very attractive, especially since there is a hope that the second might work.



One problem with the second option is that temporary datasets are implemented differently in SQL Server and SQLite. SQLite uses the operator for this CREATE TEMPORARY TABLE

. SQL Server does not support the keyword TEMPORARY

in the context of a statement CREATE TABLE

and instead uses the special character ( #

) at the beginning of a table name to indicate that the table is actually temporary.

So, the only workaround I see is to use a regular table as temporary storage. You can create it once, and then delete its content each time you run your query, just before inserting the temporary result set:

DELETE FROM TempTable;
INSERT INTO TempTable (
  schoolId,
  bestPartnershipSetId,
  bestPartnershipScore
)
SELECT
  pp.schoolId,
  partnershipScores.partnershipId,
  partnershipScores.partnershipScore,
FROM
  ...
;
SELECT ...
FROM TempTable
...
;

      

Or, you can create and discard it every time you run a request:

CREATE TABLE TempTable (
  ...
);
INSERT INTO TempTable (...)
SELECT ...
FROM ...
;
SELECT ...
FROM TempTable
...
;
DROP TABLE TempTable;

      

Note that using a regular table as temporary storage like this is not concurrency friendly in SQL Server. If that can create a problem, you may have to drop this option and end up with the first one. (But this is probably the cost you have to pay when you want a platform independent solution, especially when platforms are different from SQL Server and SQLite.)

+1


source


This is the structure you want:

with t as (<subquery goes here>)
select t.*,
       max(col) over () as MaxVal
from t

      

It's a bit tricky to figure out how it fits into your query because I can't tell what the underlying subquery is.

As for joining a subquery more than once, you can do so using what SQL Server calls "common table expressions" - the clause with

above. Most other sane databases support this (MySQL and MS Access start out with two notable exceptions).

0


source


The most aggregated SQL way would be to use "NON EXISTS":

SELECT * FROM schoolPartnership t1
WHERE NOT EXISTS 
       (SELECT * FROM schoolPartnership t2 
        WHERE t1.schoolId = t2.schoolId 
              AND t1.partnershipScore < t2.partnershipScore)

      

This will give you lines from schoolPartnership with max partnerScore for each school.

0


source







All Articles