How do I use a LEFT OUTER JOIN to identify a supplier's missing products?
This request was interesting, but I came to a place where I need help.
I have multiple tables and the final question is:
- How many parts are "missing", suppliers?
and / or
- How many common parts are βmissingβ, suppliers and categories?
None: not used by the supplier (see request 1).
Please note that the parts are not specific to the product or vendor because both can change with the seasons and often the parts inspire what the product will actually be.
Basically, what part that every supplier should be aware of is the question we are trying to answer at a high level to determine which suppliers have the most missing parts that categories are missing those parts?
Now I have my first request where I need to work perfectly. What it does is tell me the missing parts by category when I list a specific supplier.
Here is a SQLfiddle to create a script for the database and a working query:
Request 1:
http://sqlfiddle.com/#!9/088e7/1
And the request:
SELECT
c.name AS category,
COUNT(pt.id) AS parts,
COUNT(CASE WHEN in_stock IS NULL THEN pt.id END) AS missing_parts
FROM
season AS s
LEFT OUTER JOIN
(
SELECT
s.id AS season_id,
s.type season_type,
max(i.in_stock) AS in_stock
FROM
inventory AS i
JOIN season AS s ON i.season_id = s.id
JOIN product AS p ON i.product_id = p.id
JOIN vendor AS v ON p.vendor_id = v.id
JOIN part AS pt ON s.part_id = pt.id
WHERE
v.id = 2
AND
s.type = 'Type A'
GROUP BY
1,2) AS seas ON seas.season_id = s.id AND seas.season_type = s.type
JOIN part AS pt ON pt.id = s.part_id
JOIN part_data AS pd ON pt.id = pd.part_id
JOIN category AS c ON pt.category_id = c.id
WHERE
s.type = 'Type A'
GROUP BY
1;
The above works like a charm and here are the results:
| name | parts | missing_parts |
|-----------|-------|---------------|
| category3 | 3 | 2 |
| category4 | 2 | 0 |
| category5 | 2 | 2 |
| category6 | 3 | 3 |
My problem is that I am trying to perform a similar query using a provider instead of a category while removing the provider filter. In the following SQL script, you can see that since the parts are actually missing, they certainly cannot be attributed to the vendor when queried like I did.
http://sqlfiddle.com/#!9/088e7/2
And their request is 2:
SELECT
seas.vendor AS vendor,
COUNT(pt.id) AS parts,
COUNT(CASE WHEN in_stock IS NULL THEN pt.id END) AS missing_parts
FROM
season AS s
LEFT OUTER JOIN
(SELECT
s.id AS season_id,
v.name AS vendor,
s.type season_type,
max(i.in_stock) AS in_stock
FROM
inventory AS i
JOIN season AS s ON i.season_id = s.id
JOIN product AS p ON i.product_id = p.id
JOIN vendor AS v ON p.vendor_id = v.id
JOIN part AS pt ON s.part_id = pt.id
WHERE
s.type = 'Type A'
GROUP BY
1,2 ) AS seas ON seas.season_id = s.id AND seas.season_type = s.type
JOIN part AS pt ON pt.id = s.part_id
JOIN part_data AS pd ON pt.id = pd.part_id
JOIN category AS c ON pt.category_id = c.id
AND
s.type = 'Type A'
GROUP BY
1;
Query 2 results:
| vendor | parts | missing_parts |
|----------|-------|---------------|
| (null) | 4 | 4 |
| Vendor 1 | 2 | 0 |
| Vendor 2 | 3 | 0 |
| Vendor 3 | 2 | 0 |
| Vendor 4 | 2 | 0 |
| Vendor 5 | 2 | 0 |
Note the zero value which makes sense as these are the "missing" parts I'm looking for that cannot be attributed to the supplier.
What am I wondering, anyway, if a counter of missing parts has been added to the extra column?
The column of missing parts in the desired output is difficult to get because again and in this question I don't know ... even with this tiny amount of data. Please note that the missing parts have no suppliers, but here is my best shot.
| vendor | parts | missing_parts |
|----------|-------|---------------|
| Vendor 1 | 2 | 1 |
| Vendor 2 | 3 | 1 |
| Vendor 3 | 2 | 3 |
| Vendor 4 | 2 | 0 |
| Vendor 5 | 2 | 2 |
In a perfect world, I could add a category as well:
| category | vendor | parts | missing_parts |
|------------|----------|-------|---------------|
| category 1 | Vendor 1 | 2 | 1 |
| category 1 | Vendor 2 | 3 | 1 |
| category 1 | Vendor 3 | 2 | 3 |
| category 1 | Vendor 4 | 2 | 0 |
| category 1 | Vendor 5 | 2 | 2 |
| category 2 | Vendor 1 | 1 | 1 |
| category 2 | Vendor 2 | 1 | 1 |
| category 2 | Vendor 3 | 0 | 3 |
| category 2 | Vendor 4 | 2 | 0 |
| category 2 | Vendor 5 | 0 | 2 |
source to share
IF I understand what you are looking for, I will first start with what you are ultimately looking for ..
List of individual parts and categories. THEN you are looking for who is missing what. To do this, it's basically Cartesian of each vendor versus this "master list of parts / categories" and who / doesn't have it.
SELECT DISTINCT
pt.id,
pt.category_id
from
part pt
Now let's look at the second part. What are all the possible parts and categories that a particular VENDOR has.
SELECT DISTINCT
pt.id,
pt.category_id,
p.vendor_id
FROM
season s
JOIN inventory i
ON s.id = i.season_id
JOIN product p
ON i.product_id = p.id
JOIN part pt
ON s.part_id = pt.id
In the tables above, I don't need categories or actual salesperson tables as I only cared about the qualifying IDs of who has what. First, all the possible part IDs and a category ID, but in the second, we also get the VENDOR ID that has it.
Now link the pieces together starting with the provider connected to the category without any "ON" clause. The join is needed to "v.id" as the lower join in the syntax, this will give me the Cartesian of each vendor applied / tested for each category. Then the category table was joined to all the individual parts and finally LEFT-JOINED to query the individual parts PER VENDOR
Finally, add your aggregates and group. Because of the left join, if VndParts.ID is there, then the entry exists, thus the number of Vendor Parts FOUND nodes increases. If the vendor part id is NULL then it is missing (hence my sum / when case) to count the missing parts.
SELECT
v.name Vendor,
c.name category,
count( PQParts.ID ) TotalAvailableParts,
count( VndParts.ID ) VendorParts,
sum( case when VndParts.ID IS NULL then 1 else 0 end ) MissingParts
from
vendor v JOIN
category c
JOIN
( SELECT DISTINCT
pt.id,
pt.category_id
from
part pt ) PQParts
ON c.id = PQParts.category_id
LEFT JOIN
( SELECT DISTINCT
pt.id,
pt.category_id,
p.vendor_id
FROM
season s
JOIN inventory i
ON s.id = i.season_id
JOIN product p
ON i.product_id = p.id
JOIN part pt
ON s.part_id = pt.id ) VndParts
ON v.id = VndParts.vendor_id
AND PQParts.ID = VndParts.ID
AND PQParts.Category_ID = VndParts.Category_ID
group by
v.name,
c.name
Applies against your database SQL-Fiddle sample
Now, even if you created sample data of categories 1-6, all your PARTS are only defined by categories 3-6, as in my sample data. I cannot get data that does not exist in the example request
SELECT
*
from
category c
JOIN
( SELECT DISTINCT
pt.id,
pt.category_id
from
part pt ) PQParts
ON c.id = PQParts.category_id
If such actual DID data exists, then those missing parts of other categories will also be displayed.
Now for a final note. You also searched for a specific SEASON. I would just add a WHERE clause to place this in the VndParts query. Then change your PQParts query to include seasonal join, for example
SELECT DISTINCT
pt.id,
pt.category_id
from
part pt
Now let's look at the second part. What are all the possible parts and categories that a particular VENDOR has.
SELECT DISTINCT
pt.id,
pt.category_id
FROM
season s
JOIN part pt
ON s.part_id = pt.id
WHERE
s.type = 'Type A'
To further restrict a specific vendor, add a vendor clause, this is easy enough because it is the basis for vendor "v" by external criteria, and the vendor references a second LEFT-JOIN that also has an available vendor alias for filtering.
source to share
From your description, it looks like you want to calculate how many parts in each category each supplier could list as a product, but does not have. It's basically the difference between the number of parts that can be listed for each category and how many are actually listed. So you can count the possible and left join with counting the actual.
Based on the sqlfiddle, the code below also assumes that you want to focus on one type of season and that only the parts (with sales?) Listed in partdata are relevant.
select c.name as category
, v.name as vendor
, cpartcount.parts
, cpartcount.parts-coalesce(cvpartcount.parts,0) as missingparts
from vendor v
cross join
(
select pt.category_id, count(pt.id) as parts
from part pt
where pt.id in
(
select s.part_id
from season s
where s.type='Type A'
)
and pt.id in
(
select pd.part_id
from part_data pd
)
group by pt.category_id
) cpartcount
join category c
on cpartcount.category_id=c.id
left join
(
select pt.category_id, v.id as vendor_id, count(pt.id) as parts
from part pt,vendor v
where (v.id,pt.id) IN
(
select p.vendor_id, s.part_id
from product p
join inventory i
on p.id=i.product_id
join season s
on i.season_id = s.id
join part_data pd
on s.part_id=pd.part_id
where s.type='Type A'
)
group by pt.category_id,v.id
) as cvpartcount
on cpartcount.category_id=cvpartcount.category_id
and v.id=cvpartcount.vendor_id
source to share
The problem is that the 2nd query has GROUP BY
in the field from subquery ( vendor
) which is appended to LEFT JOIN
, so it will create an output row for each of the suppliers (including NULL
for rows from the season that do not match the subquery).
More specifically - yours is count
included
COUNT(CASE WHEN in_stock IS NULL THEN pt.id END) AS missing_parts
(I'd rather write SUM(in_stock IS NULL)
)
but since in_stock
- the result of the aggregation for each vendor
- you will never have a value NULL
. ( check sub-query results )
I think you should clarify the purpose of your queries. For example, the first one is returned -
For each category, the number of parts that it has in a given season and the number of seasons that this category is not available (not the number of parts missing, since there is no category connection with a subquery).
source to share