Can I configure MySQL to auto-partition?

I want to partition a very large table. As the business grows, partitioning by date isn't really a good fit, because the partitions get bigger and bigger every year. I would love a partition for every 10 million rows.

The MySQL manual shows this simple example:

CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);


But this means that every store_id of 16 or more ends up in that last partition. Is there a way to automatically create a new partition for each interval (10 million rows in my case) so I don't need to alter the live database? I am running MySQL 5.5.

Thanks!

EDIT: here is my actual table

CREATE TABLE `my_table` (
`row_id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(50) DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
`unit_num` int(3) DEFAULT NULL,
`string` int(3) DEFAULT NULL,
`voltage` float(6,4) DEFAULT NULL,
`impedance` float(6,4) DEFAULT NULL,
`amb` float(6,2) DEFAULT NULL,
`ripple_v` float(8,6) DEFAULT NULL,
 PRIMARY KEY (`row_id`),
 UNIQUE KEY `timestamp` (`timestamp`,`filename`,`string`,`unit_num`),
 KEY `index1` (`filename`),
 KEY `index2` (`timestamp`),
 KEY `index3` (`timestamp`,`filename`,`string`),
 KEY `index4` (`filename`,`unit_num`)
 ) ENGINE=MyISAM AUTO_INCREMENT=690892041 DEFAULT CHARSET=latin1


and an example query for a graph ...

SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') AS mytime, voltage, impedance, amb, ripple_v, unit_num
FROM my_table
WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY)
  AND filename = 'dlrphx10s320upsab3' AND unit_num = '5' AND string = '2'
ORDER BY timestamp ASC;


Here is the EXPLAIN output for that query ...

mysql> explain SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table      | type | possible_keys           | key    | key_len | ref         | rows  | Extra                                              |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
|  1 | SIMPLE      | unit_tarma | ref  | timestamp,index3,index4 | index4 | 58      | const,const | 13440 | Using index condition; Using where; Using filesort | 
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+




2 answers


First, I have to ask: what benefit does partitioning give you? Is there some query that runs faster because of it?

There is no automatic partitioning.

Instead, you should have a job that runs every day and counts the number of rows in the "last active" partition to see whether it is nearing 10 million. If so, add another partition.

I recommend keeping the "last" partition (the one with MAXVALUE) empty. That way you can use REORGANIZE PARTITION to split it into two empty partitions with essentially zero overhead. I recommend that instead of ADD PARTITION, because you might slip up and let something land in the last partition.
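A minimal sketch of that nightly check and split, assuming the table is partitioned by RANGE on an integer id and the empty last partition is named pmax (partition names, the 710M boundary, and the 'p69' name are all hypothetical):

```sql
-- How full is the "last active" partition? (TABLE_ROWS is an estimate.)
SELECT TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_NAME = 'my_table' AND PARTITION_NAME = 'p69';

-- If it is nearing 10M, split the empty MAXVALUE partition in two;
-- this is nearly free because pmax holds no rows:
ALTER TABLE my_table
REORGANIZE PARTITION pmax INTO (
    PARTITION p70  VALUES LESS THAN (710000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
```

The daily job would run these two steps from cron or a MySQL EVENT.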

It is not clear what would drive the 10M. Are there multiple rows per store_id? And do new rows keep arriving for every store? If so, partitioning on store_id is questionable, since all the partitions would keep growing.



OK, so store_id was just a lame example from the reference manual. Please provide SHOW CREATE TABLE so that we can be specific instead of waving our hands. There are too many possible ways to take this question.

What is the activity like?

If you mostly hit the "last" partition(s), then an uneven distribution may be warranted: periodically add a new partition and combine an adjacent pair of old partitions. (I did this successfully on one system.)
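Combining an adjacent pair of old partitions is a single statement; a sketch with hypothetical partition names and boundary:

```sql
-- Merge two neighboring old partitions into one
-- (p0, p1, and the 20M boundary are hypothetical):
ALTER TABLE my_table
REORGANIZE PARTITION p0, p1 INTO (
    PARTITION p0_1 VALUES LESS THAN (20000000)
);
```

Note that merging partitions copies their rows, so it costs I/O proportional to the rows involved; that is why it is done only on old, smaller partitions.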

If you are going to purge "old" data, then you clearly need to use PARTITION BY RANGE(TO_DAYS(...)), and then DROP PARTITION plus REORGANIZE PARTITION.
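A sketch of that purge pattern, assuming monthly partitions on a datetime column (partition names and dates are hypothetical; also note that MySQL requires the partitioning column to appear in every unique key, so the keys on a table like the one above would need reworking first):

```sql
-- Drop a whole month of old data almost instantly:
ALTER TABLE my_table DROP PARTITION p201504;

-- Keep the MAXVALUE partition empty by splitting it ahead of time:
ALTER TABLE my_table
REORGANIZE PARTITION pmax INTO (
    PARTITION p201508 VALUES LESS THAN (TO_DAYS('2015-09-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```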

And there are many other scenarios. But I know of only 4 cases where partitioning provides any performance benefit. See my blog.



(This answer focuses on schema and SELECT.)

Since you are expecting millions of rows, I want to point out some improvements to the schema first.

  • FLOAT(m,n) is usually the "wrong" thing, because it leads to two roundings. Either use plain FLOAT (which seems "right" for metrics like voltage) or use DECIMAL(m,n). FLOAT takes 4 bytes; in the cases above, DECIMAL would take 3 or 4 bytes.

  • If you have both INDEX(a) and INDEX(a,b), the former is unnecessary, since the latter can cover for it. You have 3 unnecessary KEYs. They slow down INSERTs.

  • INT(3) - did you mean "a 3-digit number"? If so, consider TINYINT UNSIGNED (values 0..255), which takes 1 byte instead of 4 for INT. That will save many MB of disk space, hence speed. (See also SMALLINT, etc., and SIGNED vs UNSIGNED.)

  • If filename has a lot of duplication, you could "normalize" it. That would save many MB.

  • Use NOT NULL unless you need NULL.

  • AUTO_INCREMENT=690892041 implies that you are 1/3 of the way to disaster with id, which will top out near 2 billion. Do you use id for anything? Getting rid of the column would avoid the problem; then change the UNIQUE KEY to the PRIMARY KEY. (If you do need id, let's talk further.)

  • ENGINE=MyISAM - switching (to InnoDB) has several consequences, both favorable and unfavorable. The table would become 2-3 times as big. The "right" choice of PRIMARY KEY would greatly speed up this SELECT. (And might or might not slow down other SELECTs.)
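The column and index changes above could be sketched as a single ALTER. This is only a sketch: the TINYINT precision choice is an assumption about your data, and it should be verified against a copy of the table first:

```sql
ALTER TABLE my_table
    MODIFY `unit_num` TINYINT UNSIGNED NULL,  -- was int(3); assumes values fit 0..255
    MODIFY `string`   TINYINT UNSIGNED NULL,  -- was int(3); same assumption
    DROP KEY `index1`,  -- (filename) is a prefix of index4
    DROP KEY `index2`,  -- (timestamp) is a prefix of the UNIQUE key
    DROP KEY `index3`,  -- also a prefix of the UNIQUE key
    ENGINE = InnoDB;    -- optional: see the MyISAM bullet above
```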

Note on the SELECT: since string and unit_num are constants in the query, the last two fields of ORDER BY timestamp asc, string asc, unit_num asc would be unnecessary. If they are there for reasons not obvious from the SELECT, my advice may be incomplete.

Your filter



WHERE filename = 'foobar'
  AND unit_num='40'
  AND string='2' 
  AND timestamp >= ...


is optimally handled by INDEX(filename, unit_num, string, timestamp). The order of the columns is not important, except that timestamp must be last. By reordering the current UNIQUE key, you get the optimal index. (Meanwhile, none of the existing indexes is very good for this SELECT.) Making it the PRIMARY KEY of an InnoDB table would make it even faster.
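A sketch of that reordering (the new index name is hypothetical; test on a copy of the table first, since rebuilding a UNIQUE key on 690M rows is a long operation):

```sql
ALTER TABLE my_table
    DROP KEY `timestamp`,  -- the old UNIQUE (timestamp, filename, string, unit_num)
    ADD UNIQUE KEY `file_unit_string_ts`
        (`filename`, `unit_num`, `string`, `timestamp`);
```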

Partitioning? No benefit. Not for performance; not for anything else you mentioned. The common use for partitioning is purging "old" data. If you intend to do that, let's talk further.

With huge tables, it is best to look at all the important SELECTs at the same time, so that we don't speed one up while slowing others down. It may even turn out that partitioning helps with such a tradeoff.







