Limiting the number of rows in subqueries with Teradata
I'm new to Teradata and I've run into a problem that I didn't have with the database I was using previously. Basically, I'm trying to limit the number of rows returned by a subquery inside a WHERE clause. I never had a problem with this before, because I could use ROWNUM.
My previous query looked something like this:
SELECT * FROM myTable
WHERE field1 = 'foo' AND field2 in(
SELECT field2 FROM anotherTable
WHERE field3 = 'bar' AND ROWNUM<100);
Since I cannot use ROWNUM in TD, I was looking for equivalent functions, or at least functions that could get me where I wanted, even if they were not exactly equivalent. I found and tried: ROW_NUMBER, TOP and SAMPLE.
I tried ROW_NUMBER(), but Teradata does not allow analytic functions in WHERE clauses. I tried TOP N, but it is not supported in a subquery. I tried SAMPLE N, but it is not supported in subqueries either.
So I must admit that I'm a bit stuck right now. Is there a way to limit the number of rows returned by a subquery in Teradata that stays close to what I was doing before? And if there isn't, how could the query be restructured to work with Teradata?
Thanks!
The restriction on SAMPLE or TOP in a subquery probably exists because the subquery might be correlated.
But there are two workarounds.
Place the SAMPLE or TOP in a derived table inside the subquery (so it can no longer be correlated):
SELECT * FROM myTable
WHERE field1 = 'foo'
AND field2 IN (
SELECT * FROM
( SELECT field2 FROM anotherTable
WHERE field3 = 'bar' SAMPLE 100 -- or SELECT TOP 100 field2 instead of SAMPLE 100
) AS dt
);
Or rewrite it as a join to a derived table:
SELECT * FROM myTable
JOIN ( SELECT DISTINCT field2 FROM anotherTable -- can't use TOP if you need DISTINCT
WHERE field3 = 'bar' SAMPLE 100
) AS dt
ON myTable.field2 = dt.field2
WHERE field1 = 'foo';
TOP without ORDER BY is very similar to ROWNUM. It's not random at all, but running it a second time may still return a different set of results.
SAMPLE is really random, returning a different result each time.
ROW_NUMBER can also be used if you filter on it with QUALIFY instead of WHERE, but OLAP functions always require an ORDER BY, so this adds a lot more overhead.
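For completeness, the QUALIFY variant might look roughly like the sketch below. It reuses the table and column names from the question. Ordering by field2 is only an assumption to satisfy the required ORDER BY, so pick whatever column makes sense for your data. The limit of 99 mirrors ROWNUM<100.

```sql
SELECT * FROM myTable
WHERE field1 = 'foo'
AND field2 IN (
    SELECT field2 FROM anotherTable
    WHERE field3 = 'bar'
    -- QUALIFY filters on the OLAP result after WHERE has been applied;
    -- ORDER BY field2 is an arbitrary choice made for this sketch
    QUALIFY ROW_NUMBER() OVER (ORDER BY field2) <= 99
);
```

Unlike TOP without ORDER BY, this has to sort the qualifying rows first, which is where the extra overhead comes from.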