SAS SQL Hierarchical Query

I am trying to write a sql query (proc sql) to get data from an example table:

order_id base_order_id customer_id
================================================== ========
1 null 1 // only one transaction
-------------------------------------------------- -----------------------------
2 null 1 // order_start
3 2 1
4 3 1             
5 4 1
6 5 1
7 6 1 // order_end
-------------------------------------------------- -----------------------------

in the following way:

order_id last_order_id customer_id 
1 null 1                   
2 7 1                       

Let me put it this way. Order_id 2 has 6 subprocesses. We can assume that the total order for this one customer consists of order_id 2 to 7. Order start = order_id 2 when the entire order ends at order_id 7.

I am getting started in sas sql. I tried to join the same table via a left join, "having", but nothing worked. Is there a way to get the query result like in table 2?

Thanks in advance.

EDIT2. The SQL I wrote that brings the closest result.

SELECT t1.order_id, t1.base_order_id as last_order_id, t1.customer_id

FROM table1 t1

GROUP BY t1.order_id
     HAVING (t1.order_id = max(t1.base_order_id) 
     or t1.base_order_id IS NULL)

      

+3


source to share


2 answers


So, as Gordon says in the comments, I cannot think of how to do this in PROC SQL.

However, the "SAS" way to do this would be a related component analysis. PROC OPTNET in SAS / OR does just that.



data have;
input order_id base_order_id customer_id;
datalines;
1 . 1
2 . 1
3 2 1
4 3 1
5 4 1
6 5 1
7 6 1
;
run;

/*Connect the first order to itself*/
data have;
set have;
if base_order_id = . then base_order_id = order_id;
run;

/*Use SAS/OR and connected components*/
proc optnet
    data_links = have(rename=(order_id = to base_order_id = from))
    out_nodes = out;
    concomp;
run;

/*Summarize and add customer id*/
proc sql noprint;
create table want as 
select a.order_id,
       a.last_order_id,
       b.customer_id
    from (
        select min(node) as order_id,
               max(node) as last_order_id
            from out
            group by concomp
    ) as a
      left join
    test as b
      on a.order_id = b.order_id;
quit;

      

This returns what you are looking for in the WANT dataset.

+2


source


The only way I know of requires you to add 2 new columns to the data you are querying. A good explanation for this can be found here:

http://www.sitepoint.com/hierarchical-data-database-2/

I don't have time to rewrite this and pipe the whole thing into an SO answer. I will modify this answer later with some code that will add 2 new columns to your example dataset. IMO, this is the hardest part.



Some nice things about this approach:

  • It is not recursive - it allows any SQL query to traverse the entire hierarchy in one pass.
  • It supports indexing, so if you have a lot of data your queries will run faster.
  • Quite easy to understand / query. Simple queries can return powerful results. I have used this approach in a complex supply chain to identify future bottlenecks with a simple simple SQL query.
0


source







All Articles