Removing Keywords from a String in Oracle PL/SQL
I need to remove certain keywords from an input line and return the new line. The keywords are stored in another table, for example MR, MRS, DR, PVT, PRIVATE, CO, COMPANY, LTD, LIMITED, etc. There are two kinds of keywords: LEADING (MR, MRS, DR) and TRAILING (PVT, PRIVATE, CO, COMPANY, LTD, LIMITED, etc.). A LEADING keyword must be removed only from the beginning of the line, and a TRAILING keyword only from the end. For example, MR Jones MRS COMPANY should return JONES MRS, but MR MRS Jones PVT COMPANY should return JONES: in the first iteration MR and COMPANY are removed, leaving MRS JONES PVT, and the second iteration leaves JONES. Similarly, MR MRS Doe PVT COMPANY LTD should finally return DOE.
I need to do this in PL/SQL. I wrote the following code, but it does not remove all the keywords when there are several of them at the beginning or at the end. The reason is that the cursor visits each keyword only once: if a keyword is not yet at the end (or the beginning) when the loop reaches it, the loop has already moved on by the time it gets there, and that keyword is never tried again. Note that there can be any number of keywords at the end or at the beginning:
CREATE OR REPLACE FUNCTION replace_keyword (p_in_name IN VARCHAR2)
   RETURN VARCHAR2
IS
   l_name   VARCHAR2 (4000);

   CURSOR c
   IS
      SELECT *
        FROM RSRV_KEY_LKUPS
       WHERE ACTIVE = 'Y';
BEGIN
   l_name := TRIM (p_in_name);

   --Now inside the function we’ll loop through this cursor something like below and replace the value in the input name:
   FOR rec IN c
   LOOP
      IF     UPPER (rec.POSITION) = 'LEADING'
         AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) > 0
      THEN                                     --Rule 3:remove leading name
         DBMS_OUTPUT.PUT_LINE ('Value >>' || rec.KEY_WORD);
         l_name := LTRIM (UPPER (l_name), rec.KEY_WORD || ' ');
      ELSIF     UPPER (rec.POSITION) = 'TRAILING'
            AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) > 0
      THEN                                     --Rule 4:remove trailing name
         DBMS_OUTPUT.PUT_LINE ('Value >>' || rec.KEY_WORD);
         l_name := RTRIM (UPPER (l_name), ' ' || rec.KEY_WORD);
      END IF;

      l_name := l_name;
   END LOOP;

   l_name := REGEXP_REPLACE (l_name, '[[:space:]]{2,}', ' '); --Remove multiple spaces in a word and replace with single blank space
   l_name := TRIM (l_name); --Remove the leading and trailing blank spaces
   RETURN l_name;
EXCEPTION
   WHEN OTHERS
   THEN
      raise_application_error (
         -20001,
         'An error was encountered - ' || SQLCODE || ' -ERROR- ' || SQLERRM);
END;
/
Thanks a lot for any help.
EDIT
Input example 1
MR MRS Jones PVT COMPANY
Output
JONES
Input example 2
MR MRS Doe PVT COMPANY LTD
Output
DOE
I think it can be done with a single query (which can be wrapped in a PL/SQL function if you really need one):
with inpt as (select 'MR Jones MRS COMPANY' text from dual)
select listagg(t1.word, ' ') within group (order by ord) new_text
from (
  select w.*, words.*,
         -- l stays 0 while only LEADING keywords have been seen from the start
         sum(case when nvl(POSITION, 'TRAILING') = 'TRAILING' then 1 else 0 end)
           over (order by ord rows between unbounded preceding and current row) l,
         -- t stays 0 while only TRAILING keywords have been seen from the end
         sum(case when nvl(POSITION, 'LEADING') = 'LEADING' then 1 else 0 end)
           over (order by ord desc rows between unbounded preceding and current row) t
  from (select regexp_substr(inpt.text, '[^ ]+', 1, level) word, level ord
        from inpt
        connect by level <= regexp_count(inpt.text, ' ') + 1) words
  left outer join RSRV_KEY_LKUPS w on w.KEY_WORD = words.word
) t1
where t1.t > 0 and t1.l > 0;
Edit: Explanation:
The 'with' clause is only there to present your input string as a column (optional).
The inner select, aliased as "words", is a well-known technique for splitting words into rows (note that I kept the word order in the column ord).
Now we can left outer join the words of the input line with the keywords in your table "RSRV_KEY_LKUPS". For each word in the input this tells us whether it is LEADING, TRAILING or null (if it is not a keyword).
So far we have (for input "MR Jones MRS COMPANY"):
KEY_WORD  POSITION  WORD     ORD
--------  --------  -------  ---
MR        LEADING   MR       1
(null)    (null)    Jones    2
MRS       LEADING   MRS      3
COMPANY   TRAILING  COMPANY  4
Now comes the tricky part (maybe there is a better way): we need to find out which words to remove. These are all the LEADING words before the first "change", i.e. until we meet a null or a TRAILING (going top down), and all the TRAILING words before the first "change", i.e. until we meet a null or a LEADING (going bottom up). So I used a well-known cumulative sum technique: as long as the running sum is still zero, the row has to be removed (as soon as we reach the "change", the sum becomes positive).
To finish, all we have to do is join the remaining rows back into a single line; since 11gR2 we can use LISTAGG for exactly this.
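To see the two running sums at work, the following sketch prints l and t for every word of the sample input instead of aggregating them. The kw subquery is an assumption made only so the query runs on its own: it inlines three sample keyword rows in place of RSRV_KEY_LKUPS.

```sql
-- Sample keyword rows are inlined here as an assumption; in the real
-- schema they would come from RSRV_KEY_LKUPS.
with inpt as (select 'MR Jones MRS COMPANY' text from dual),
kw as (
  select 'MR' key_word, 'LEADING' position from dual union all
  select 'MRS', 'LEADING' from dual union all
  select 'COMPANY', 'TRAILING' from dual
),
words as (
  -- split the input on blanks, one row per word, keeping the word order
  select regexp_substr(inpt.text, '[^ ]+', 1, level) word, level ord
  from inpt
  connect by level <= regexp_count(inpt.text, ' ') + 1
)
select words.ord, words.word, kw.position,
       -- counts non-LEADING words seen so far, reading from the start
       sum(case when nvl(kw.position, 'TRAILING') = 'TRAILING' then 1 else 0 end)
         over (order by words.ord rows between unbounded preceding and current row) l,
       -- counts non-TRAILING words seen so far, reading from the end
       sum(case when nvl(kw.position, 'LEADING') = 'LEADING' then 1 else 0 end)
         over (order by words.ord desc rows between unbounded preceding and current row) t
from words
left outer join kw on kw.key_word = words.word
order by words.ord;
```

For this input, MR gets l = 0 and COMPANY gets t = 0, so both are filtered out by the condition t > 0 and l > 0, while Jones (l = 1, t = 2) and MRS (l = 1, t = 1) survive. Note that the join compares words case-sensitively, so the keywords in the table and the input would need the same case.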
If you want to be sure that a leading keyword is matched only at the very beginning, remove it only when INSTR returns 1:
Replace
IF UPPER (rec.POSITION) = 'LEADING'
AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) > 0
with
IF UPPER (rec.POSITION) = 'LEADING'
AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) = 1
and replace
ELSIF UPPER (rec.POSITION) = 'TRAILING'
AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) > 0
with
ELSIF UPPER (rec.POSITION) = 'TRAILING'
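AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) = (LENGTH (l_name) - LENGTH (rec.KEY_WORD))
(The search string includes the leading blank, so a match at the very end of the name starts at LENGTH (l_name) - LENGTH (rec.KEY_WORD).)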
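As a quick check of these position tests, here is a small anonymous block with literal values standing in for l_name and rec.KEY_WORD (the sample name is made up for illustration). Because the trailing search string starts with a blank, a keyword sitting at the very end is found at LENGTH (l_name) - LENGTH (keyword).

```sql
-- Illustration only: run in SQL*Plus or SQLcl with SET SERVEROUTPUT ON.
DECLARE
   l_name   VARCHAR2 (100) := 'DR JONES CO LTD';          -- 15 characters
BEGIN
   -- Leading test: 'DR ' starts at position 1, so DR is a leading match.
   DBMS_OUTPUT.PUT_LINE ('DR  found at ' || INSTR (l_name, 'DR ', 1));       -- 1

   -- Trailing test for CO: ' CO' is found at 9, but a match at the
   -- end would have to start at 15 - 2 = 13, so CO is not trailing yet.
   DBMS_OUTPUT.PUT_LINE ('CO  found at ' || INSTR (l_name, ' CO', -1));      -- 9
   DBMS_OUTPUT.PUT_LINE ('CO  expected ' || (LENGTH (l_name) - LENGTH ('CO')));   -- 13

   -- Trailing test for LTD: ' LTD' is found at 12 = 15 - 3, so LTD is trailing.
   DBMS_OUTPUT.PUT_LINE ('LTD found at ' || INSTR (l_name, ' LTD', -1));     -- 12
   DBMS_OUTPUT.PUT_LINE ('LTD expected ' || (LENGTH (l_name) - LENGTH ('LTD')));  -- 12
END;
/
```

A plain "> 0" test would treat CO as removable here even though LTD still follows it, which is why the comparison against an exact position matters.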
For the multi-keyword problem, you have to wrap the FOR loop in an outer loop that repeats until a full pass over the cursor finds no keyword:
keyword_found BOOLEAN;
...
LOOP
   keyword_found := FALSE;
   FOR rec IN c
   LOOP
      -- inside each branch that finds and removes a keyword:
      keyword_found := TRUE;
   END LOOP;
   EXIT WHEN NOT keyword_found;
END LOOP;
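Putting the exact-position tests and the outer loop together, a minimal sketch of the whole function could look like the one below. It is one possible reading of this answer, not a tested drop-in replacement. Two choices are mine rather than the original code's: the keyword is cut off with SUBSTR instead of LTRIM/RTRIM (those functions treat their second argument as a set of characters, so LTRIM ('MRS JONES', 'MR ') would also eat the S), and the WHEN OTHERS handler is left out so errors simply propagate. The variable l_key is a helper introduced here.

```sql
CREATE OR REPLACE FUNCTION replace_keyword (p_in_name IN VARCHAR2)
   RETURN VARCHAR2
IS
   l_name          VARCHAR2 (4000);
   l_key           VARCHAR2 (4000);   -- helper: current keyword in upper case
   keyword_found   BOOLEAN;

   CURSOR c
   IS
      SELECT key_word, position
        FROM RSRV_KEY_LKUPS
       WHERE ACTIVE = 'Y';
BEGIN
   -- Normalise once: upper-case, collapse whitespace runs to one blank, trim.
   l_name := TRIM (REGEXP_REPLACE (UPPER (p_in_name), '[[:space:]]+', ' '));

   LOOP
      keyword_found := FALSE;

      FOR rec IN c
      LOOP
         l_key := UPPER (rec.key_word);

         IF     UPPER (rec.position) = 'LEADING'
            AND INSTR (l_name, l_key || ' ') = 1
         THEN
            -- Drop the keyword and the blank that follows it.
            l_name := SUBSTR (l_name, LENGTH (l_key) + 2);
            keyword_found := TRUE;
         ELSIF     UPPER (rec.position) = 'TRAILING'
               AND INSTR (l_name, ' ' || l_key, -1) =
                      LENGTH (l_name) - LENGTH (l_key)
         THEN
            -- Keep everything before the blank that precedes the keyword.
            l_name := SUBSTR (l_name, 1, LENGTH (l_name) - LENGTH (l_key) - 1);
            keyword_found := TRUE;
         END IF;
      END LOOP;

      -- Stop once a complete pass over the keywords removed nothing.
      EXIT WHEN NOT keyword_found;
   END LOOP;

   RETURN l_name;
END;
/
```

Assuming the keyword table holds the keywords and positions listed in the question, SELECT replace_keyword ('MR MRS Doe PVT COMPANY LTD') FROM dual should return DOE, and 'MR Jones MRS COMPANY' should return JONES MRS, because MRS is a LEADING keyword and is never at the beginning of the name.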