Grouping comma separated value for shared data

I have a table with col1 id int, col2 as varchar (comma separated value) and column 3 to assign to a group. The table looks like

  col1          col2     group
..............................
       1        2,3,4       
       2        5,6        
       3        1,2,5
       4        7,8
       5        11,3
       6        22,8

      

This is just a sample of real data, now I need to assign group no to them in such a way that the result looks like

  col1          col2       group
..............................
       1        2,3,4       1
       2        5,6         1
       3        1,2,5       1
       4        7,8         2
       5        11,3        1
       6        22,8        2

      

The logic behind assigning group no is that every similar comma-separated string value in col2 must be the same group, not the same as each where in col2 where "2" is, it must be the same group, but the complexity is in that 2, 3,4 together, so all three int values, if found anywhere in col2, will be assigned to the same group. the main part is 2,3,4 and 1,2,5, since there are 2 in col2, so all int 1,2,3,4,5 should assign the same group to no. Trial store procedure with match against col2 but not getting the desired result

Most imp (I cannot use normalization because I cannot afford to create a new table from my original table that has millions of records), even normalization in my context is not useful.


Achieved so far ...... I have set the auto increment of the group column and wrote the following procedure: -

BEGIN
  declare cil1_new,col2_new,group_new int;
  declare done tinyint default 0;
  declare group_new varchar(100);
  declare cur1 cursor for select col1,col2,`group` from company ; 
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=1;
  open cur1;
  REPEAT
  fetch cur1 into col1_new,col2_new,group_new;
  update company set group=group_new where
  match(col2) against(concat("'",col2_new,"'"));
  until  done end repeat;
  close cur1;
  select * from company;
END

      

This procedure works, has no syntex error, but I see that I don't understand the result.

+3


source to share


1 answer


It can be done, but I'm not sure how long it will take for your large table. I am assuming that you are allowed to create new tables that store all groups as well as numbers in the group column.

Also, it cannot be done on a live table. It is impossible to write it down without limiting my design. Think about what happens if you add a new row with values ​​7 and "6,7" that connects to groups 1 and 2, and all work has to be discarded.

This process must be re-run every time there are additions to the table. If that's not acceptable, run it once and then replace it with triggers that support values ​​and combine groups when needed.

Here's the procedure. It might benefit from some modulation, but it works. I took Jay Pipes split_string and included it.

DDL first and some test data



CREATE TABLE `company` (
  `col1` int(11) DEFAULT NULL,
  `col2` varchar(100) DEFAULT NULL,
  `grp` int(11) DEFAULT NULL
);

CREATE TABLE `groups` (
  `number` int(11) NOT NULL DEFAULT '0',
  `grp` int(11) NOT NULL DEFAULT '0',
  `processed` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`number`,`grp`),
  KEY `grp` (`grp`)
);

insert into company (col1, col2) values 
(1,'2,3,4'),       
(2,'5,6'),        
(3,'1,2,5'),
(4,'7,8'),
(5,'11,3'),
(6,'22,8');

      

And now the procedure

use test;

drop procedure if exists group_it;
delimiter //

create procedure group_it ()
begin                        
  declare current_group int default 0;
  declare ids varchar(100);

  -- clear out all data from before
  update company set grp = null;
  truncate groups;

  main: loop                                
    -- take one unmapped (new group)
    set ids := null;
    select col2 into ids from company where grp is null limit 1;
    if ids is null then
      leave main;
    end if;
    set current_group := current_group + 1;

    --  put each value into groups table and mark as unprocessed
    call split_string(ids, ',');
    insert into groups select value, current_group, false from SplitValues;

    -- while unprocessed value in groups
    begin
      declare unprocessed int;

      unprocessed: loop
        set unprocessed = null;
        select number
          into unprocessed
          from groups
         where not processed
         limit 1;

        if unprocessed is null then
          leave unprocessed;
        end if;

        begin
          -- find all rows in company that matches this group
          declare row_id int;
          declare ids2 varchar(100);

          declare cur2_done boolean;
          declare cur2 cursor for
            select col1, col2 
              from company
             where col2 regexp concat('^', unprocessed, '$')
                or col2 regexp concat('^', unprocessed, ',')
                or col2 regexp concat(',', unprocessed, '$')
                or col2 regexp concat(',', unprocessed, ',');

          declare continue handler for not found set cur2_done := true;

          open cur2;    
          numbers: loop
            set cur2_done := false;
            fetch cur2 into row_id, ids2; 
            if cur2_done then
                close cur2;
                leave numbers;
            end if;

            update company set grp = current_group where col1 = row_id;
            --  add all new values to groups marked as unprocessed
            call split_string(ids2, ',');   
            insert ignore into groups select value, current_group, false from SplitValues;
          end loop numbers;
          update groups set processed = true where number = unprocessed;
        end;
      end loop unprocessed;
    end;
  end loop main;
end//

delimiter ;         

      

This is Jay Pipes split_string

DELIMITER //

DROP PROCEDURE IF EXISTS split_string //
CREATE PROCEDURE split_string (
IN input TEXT
, IN `delimiter` VARCHAR(10)
)
SQL SECURITY INVOKER
COMMENT
'Splits a supplied string using using the given delimiter,
placing values in a temporary table'
BEGIN
DECLARE cur_position INT DEFAULT 1 ;
DECLARE remainder TEXT;
DECLARE cur_string VARCHAR(1000);
DECLARE delimiter_length TINYINT UNSIGNED;

DROP TEMPORARY TABLE IF EXISTS SplitValues;
CREATE TEMPORARY TABLE SplitValues (
value VARCHAR(1000) NOT NULL PRIMARY KEY
) ENGINE=MyISAM;

SET remainder = input;
SET delimiter_length = CHAR_LENGTH(delimiter);

WHILE CHAR_LENGTH(remainder) > 0 AND cur_position > 0 DO
SET cur_position = INSTR(remainder, `delimiter`);
IF cur_position = 0 THEN
SET cur_string = remainder;
ELSE
SET cur_string = LEFT(remainder, cur_position - 1);
END IF;
IF TRIM(cur_string) != '' THEN
INSERT INTO SplitValues VALUES (cur_string);
END IF;
SET remainder = SUBSTRING(remainder, cur_position + delimiter_length);
END WHILE;

END //

DELIMITER ;

      

0


source







All Articles