Grouping comma separated value for shared data
I have a table with col1 id int, col2 as varchar (comma separated value) and column 3 to assign to a group. The table looks like
col1 col2 group .............................. 1 2,3,4 2 5,6 3 1,2,5 4 7,8 5 11,3 6 22,8
This is just a sample of real data, now I need to assign group no to them in such a way that the result looks like
col1 col2 group .............................. 1 2,3,4 1 2 5,6 1 3 1,2,5 1 4 7,8 2 5 11,3 1 6 22,8 2
The logic behind assigning group no is that every similar comma-separated string value in col2 must be the same group, not the same as each where in col2 where "2" is, it must be the same group, but the complexity is in that 2, 3,4 together, so all three int values, if found anywhere in col2, will be assigned to the same group. the main part is 2,3,4 and 1,2,5, since there are 2 in col2, so all int 1,2,3,4,5 should assign the same group to no. Trial store procedure with match against col2 but not getting the desired result
Most imp (I cannot use normalization because I cannot afford to create a new table from my original table that has millions of records), even normalization in my context is not useful.
Achieved so far ...... I have set the auto increment of the group column and wrote the following procedure: -
BEGIN declare cil1_new,col2_new,group_new int; declare done tinyint default 0; declare group_new varchar(100); declare cur1 cursor for select col1,col2,`group` from company ; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=1; open cur1; REPEAT fetch cur1 into col1_new,col2_new,group_new; update company set group=group_new where match(col2) against(concat("'",col2_new,"'")); until done end repeat; close cur1; select * from company; END
This procedure works, has no syntex error, but I see that I don't understand the result.
It can be done, but I'm not sure how long it will take for your large table. I am assuming that you are allowed to create new tables that store all groups as well as numbers in the group column.
Also, it cannot be done on a live table. It is impossible to write it down without limiting my design. Think about what happens if you add a new row with values 7 and "6,7" that connects to groups 1 and 2, and all work has to be discarded.
This process must be re-run every time there are additions to the table. If that's not acceptable, run it once and then replace it with triggers that support values and combine groups when needed.
Here's the procedure. It might benefit from some modulation, but it works. I took Jay Pipes split_string and included it.
DDL first and some test data
CREATE TABLE `company` ( `col1` int(11) DEFAULT NULL, `col2` varchar(100) DEFAULT NULL, `grp` int(11) DEFAULT NULL ); CREATE TABLE `groups` ( `number` int(11) NOT NULL DEFAULT '0', `grp` int(11) NOT NULL DEFAULT '0', `processed` tinyint(1) DEFAULT NULL, PRIMARY KEY (`number`,`grp`), KEY `grp` (`grp`) ); insert into company (col1, col2) values (1,'2,3,4'), (2,'5,6'), (3,'1,2,5'), (4,'7,8'), (5,'11,3'), (6,'22,8');
And now the procedure
use test; drop procedure if exists group_it; delimiter // create procedure group_it () begin declare current_group int default 0; declare ids varchar(100); -- clear out all data from before update company set grp = null; truncate groups; main: loop -- take one unmapped (new group) set ids := null; select col2 into ids from company where grp is null limit 1; if ids is null then leave main; end if; set current_group := current_group + 1; -- put each value into groups table and mark as unprocessed call split_string(ids, ','); insert into groups select value, current_group, false from SplitValues; -- while unprocessed value in groups begin declare unprocessed int; unprocessed: loop set unprocessed = null; select number into unprocessed from groups where not processed limit 1; if unprocessed is null then leave unprocessed; end if; begin -- find all rows in company that matches this group declare row_id int; declare ids2 varchar(100); declare cur2_done boolean; declare cur2 cursor for select col1, col2 from company where col2 regexp concat('^', unprocessed, '$') or col2 regexp concat('^', unprocessed, ',') or col2 regexp concat(',', unprocessed, '$') or col2 regexp concat(',', unprocessed, ','); declare continue handler for not found set cur2_done := true; open cur2; numbers: loop set cur2_done := false; fetch cur2 into row_id, ids2; if cur2_done then close cur2; leave numbers; end if; update company set grp = current_group where col1 = row_id; -- add all new values to groups marked as unprocessed call split_string(ids2, ','); insert ignore into groups select value, current_group, false from SplitValues; end loop numbers; update groups set processed = true where number = unprocessed; end; end loop unprocessed; end; end loop main; end// delimiter ;
This is Jay Pipes split_string
DELIMITER // DROP PROCEDURE IF EXISTS split_string // CREATE PROCEDURE split_string ( IN input TEXT IN `delimiter` VARCHAR(10) ) SQL SECURITY INVOKER COMMENT a supplied string using using the given delimiter, placing values in a temporary table' BEGIN DECLARE cur_position INT DEFAULT 1 ; DECLARE remainder TEXT; DECLARE cur_string VARCHAR(1000); DECLARE delimiter_length TINYINT UNSIGNED; DROP TEMPORARY TABLE IF EXISTS SplitValues; CREATE TEMPORARY TABLE SplitValues ( value VARCHAR(1000) NOT NULL PRIMARY KEY ENGINE=MyISAM; SET remainder = input; SET delimiter_length = CHAR_LENGTH(delimiter); WHILE CHAR_LENGTH(remainder) > 0 AND cur_position > 0 DO SET cur_position = INSTR(remainder, `delimiter`); IF cur_position = 0 THEN SET cur_string = remainder; ELSE SET cur_string = LEFT(remainder, cur_position - 1); END IF; IF TRIM(cur_string) != '' THEN INSERT INTO SplitValues VALUES (cur_string); END IF; SET remainder = SUBSTRING(remainder, cur_position + delimiter_length); END WHILE; END // DELIMITER ;
source to share