SAS SQL Hierarchical Query
I am trying to write a sql query (proc sql) to get data from an example table:
order_id base_order_id customer_id ================================================== ======== 1 null 1 // only one transaction -------------------------------------------------- ----------------------------- 2 null 1 // order_start 3 2 1 4 3 1 5 4 1 6 5 1 7 6 1 // order_end -------------------------------------------------- -----------------------------
in the following way:
order_id last_order_id customer_id 1 null 1 2 7 1
Let me put it this way. Order_id 2 has 6 subprocesses. We can assume that the total order for this one customer consists of order_id 2 to 7. Order start = order_id 2 when the entire order ends at order_id 7.
I am getting started in sas sql. I tried to join the same table via a left join, "having", but nothing worked. Is there a way to get the query result like in table 2?
Thanks in advance.
EDIT2. The SQL I wrote that brings the closest result.
SELECT t1.order_id, t1.base_order_id as last_order_id, t1.customer_id
FROM table1 t1
GROUP BY t1.order_id
HAVING (t1.order_id = max(t1.base_order_id)
or t1.base_order_id IS NULL)
source to share
So, as Gordon says in the comments, I cannot think of how to do this in PROC SQL.
However, the "SAS" way to do this would be a related component analysis. PROC OPTNET in SAS / OR does just that.
data have;
input order_id base_order_id customer_id;
datalines;
1 . 1
2 . 1
3 2 1
4 3 1
5 4 1
6 5 1
7 6 1
;
run;
/*Connect the first order to itself*/
data have;
set have;
if base_order_id = . then base_order_id = order_id;
run;
/*Use SAS/OR and connected components*/
proc optnet
data_links = have(rename=(order_id = to base_order_id = from))
out_nodes = out;
concomp;
run;
/*Summarize and add customer id*/
proc sql noprint;
create table want as
select a.order_id,
a.last_order_id,
b.customer_id
from (
select min(node) as order_id,
max(node) as last_order_id
from out
group by concomp
) as a
left join
test as b
on a.order_id = b.order_id;
quit;
This returns what you are looking for in the WANT dataset.
source to share
The only way I know of requires you to add 2 new columns to the data you are querying. A good explanation for this can be found here:
http://www.sitepoint.com/hierarchical-data-database-2/
I don't have time to rewrite this and pipe the whole thing into an SO answer. I will modify this answer later with some code that will add 2 new columns to your example dataset. IMO, this is the hardest part.
Some nice things about this approach:
- It is not recursive - it allows any SQL query to traverse the entire hierarchy in one pass.
- It supports indexing, so if you have a lot of data your queries will run faster.
- Quite easy to understand / query. Simple queries can return powerful results. I have used this approach in a complex supply chain to identify future bottlenecks with a simple simple SQL query.
source to share