Hash join equivalent to PROC SQL between
I usually use PROC SQL when I join a table that also has a date condition (i.e. target_date is between start_date and end_date).
I was able to translate this to a hash join for the inner join case:
data hash_join;
  if _n_ = 1 then do;
    declare hash add1(dataset:'table_2', multidata:'Y');
    add1.defineKey('key_1');
    add1.defineData('start_date','end_date','value_1');
    add1.defineDone();
  end;
  format
    start_date date9.
    end_date date9.
    value_1 10.5
  ;
  set table_1 (keep=key_1 target_date);
  if add1.find() = 0 then do until (add1.find_next());
    if start_date le target_date le end_date then output;
  end;
run;
This is the same as:
proc sql;
  create table sql_join as select
    b.start_date,
    b.end_date,
    b.value_1,
    a.key_1,
    a.target_date
  from table_1 a
  inner join table_2 b
    on a.key_1 = b.key_1 and
       a.target_date between b.start_date and b.end_date
  ;
quit;
I'm having trouble figuring out what the equivalent would be for a left join. If a key has no match at all, I still want to output the row, which is simple:
if add1.find() ne 0 then output;
And if it joins and the date is in between, it looks simple:
if add1.find() = 0 then do until (add1.find_next());
  if start_date le target_date le end_date then output;
end;
But how do I output the remaining records from table_1, where the key joins but target_date does not fall between start_date and end_date? For example, say table_2 holds the start_date and end_date of a sale, and the sale for key_1 = 'Clothes' did not start until Feb 1st. If table_1 has a "Clothes" row dated Jan 1st, the key will join, but I want the table_2 columns to come out blank. Any ideas on how to do this?
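To make the scenario concrete, here is a minimal sketch of sample data. The values, the year 2020, and the extra 'Shoes' key are invented for illustration only; just the table and column names come from the code above.

```sas
/* Hypothetical sample data, for illustration only */
data table_1;
  length key_1 $10;
  format target_date date9.;
  key_1 = 'Clothes'; target_date = '01JAN2020'd; output; /* key matches, date before the sale */
  key_1 = 'Clothes'; target_date = '15FEB2020'd; output; /* key matches, date inside the sale */
  key_1 = 'Shoes';   target_date = '15FEB2020'd; output; /* no matching key in table_2 */
run;

data table_2;
  length key_1 $10;
  format start_date end_date date9.;
  /* one sale period for Clothes; value_1 is an arbitrary number */
  key_1 = 'Clothes'; start_date = '01FEB2020'd; end_date = '28FEB2020'd; value_1 = 0.25; output;
run;
```

A left join on these tables should return three rows, with start_date, end_date and value_1 populated only for the Feb 15th 'Clothes' row.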
Any help would be greatly appreciated!
You just need to keep track of whether you found a match or not. The hash lookup only matches on key_1; the "between" test is your own code, so the return code from find() cannot tell you whether any record satisfied the date range. You have to track that yourself.
See this example. Here I modify SASHELP.CLASS to look like your input tables, then add some logic to see if anything is found.
data table_1;
  set sashelp.class;
  rename age=target_date name=key_1;
  drop height weight;
run;
data table_2;
  set sashelp.class;
  do _i = 1 to mod(_n_,3);
    start_date = age-3+_i;
    end_date = age+1-_i;
    if start_date le end_date then output;
  end;
  rename name=key_1 height=value_1;
  keep height weight start_date age end_date name;
run;
data hash_join;
  if _n_ = 1 then do;
    declare hash add1(dataset:'table_2', multidata:'Y');
    add1.defineKey('key_1');
    add1.defineData('start_date','end_date','value_1');
    add1.defineDone();
  end;
  format
    start_date date9.
    end_date date9.
    value_1 10.5
  ;
  set table_1 (keep=key_1 target_date);
  * found is not retained, so it is reset to missing for every row of table_1;
  if add1.find() = 0 then do until (add1.find_next());
    if start_date le target_date le end_date then do;
      found=1;
      output;
    end;
  end;
  * clear every hash data variable so an unmatched row is output with blanks;
  call missing(start_date, end_date, value_1);
  if not (found) then output;
run;
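For comparison, the PROC SQL left join that this data step is meant to reproduce could be sketched as follows. The output name sql_left_join is made up for illustration; everything else reuses the tables and columns from the question. Keeping the date condition in the ON clause (rather than in a WHERE clause) is what lets rows from table_1 without an in-range match survive with missing table_2 columns.

```sas
proc sql;
  /* sql_left_join is a hypothetical output name */
  create table sql_left_join as
  select a.key_1,
         a.target_date,
         b.start_date,
         b.end_date,
         b.value_1
  from table_1 a
  left join table_2 b
    on a.key_1 = b.key_1
   and a.target_date between b.start_date and b.end_date
  ;
quit;
```

Apart from the extra found variable in hash_join and the row order, the two outputs should contain the same rows, so comparing row counts is a quick sanity check.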
I think you just need to track whether a row has a matching key but no record in range:
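if add1.find() ^= 0 then output;
else do;
  found = 0;
  do until (add1.find_next());
    if start_date le target_date le end_date then do;
      output;
      found = 1;
    end;
  end;
  if ^found then do;
    * key matched but no range did, so blank the hash data before output;
    call missing(start_date, end_date, value_1);
    output;
  end;
end;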
I have no data to test with, so this is untested. Let me know if it doesn't work.