Can I configure Mysql to auto-split?
I want to split a very large table. As the business grows, splitting by date isn't really that good because the sections get bigger and bigger every year. I would love the section for every 10 million entries.
The Mysql manual shows this simple example:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
But this means that everything greater than 16 and less than MAXVALUE goes to the last section. Is there a way to automatically generate a new section for each interval (10 million records in my case) so I don't need to change the active database? I am running Mysql 5.5
Thank!
EDIT: here is my actual table
CREATE TABLE `my_table` (
`row_id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(50) DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
`unit_num` int(3) DEFAULT NULL,
`string` int(3) DEFAULT NULL,
`voltage` float(6,4) DEFAULT NULL,
`impedance` float(6,4) DEFAULT NULL,
`amb` float(6,2) DEFAULT NULL,
`ripple_v` float(8,6) DEFAULT NULL,
PRIMARY KEY (`row_id`),
UNIQUE KEY `timestamp` (`timestamp`,`filename`,`string`,`unit_num`),
KEY `index1` (`filename`),
KEY `index2` (`timestamp`),
KEY `index3` (`timestamp`,`filename`,`string`),
KEY `index4` (`filename`,`unit_num`)
) ENGINE=MyISAM AUTO_INCREMENT=690892041 DEFAULT CHARSET=latin1
and an example query for a graph ...
SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num
FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY)
AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2'ORDER BY timestamp asc;
Here is a clarification for the request ...
mysql> explain SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2'ORDER BY timestamp asc;
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| 1 | SIMPLE | unit_tarma | ref | timestamp,index3,index4 | index4 | 58 | const,const | 13440 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
source to share
First, I have to ask, what benefit does Partitioning give you? Is there some query that is faster because of this?
No automatic partitioning.
Instead, you should have a task that runs every day and it counts the number of rows in the "last active" section to see if it is around 10 million. If so, add another section.
I recommend keeping the "last" section (one s MAXVALUE
) blank. This way you can REORGANIZE PARTITION
split it into two empty sections with zero overhead. And I recommend instead ADD PARTITION
, because you can slip away and put something in the last section.
It is not clear what will cause 10M. Are there multiple lines for each store_id? And are there new aisles for each store? If this is the case, then the partition into store_id, since all partitions will constantly grow.
OK, so store_id was just a lame example from the reference manual. Please provide SHOW CREATE TABLE
so that we can be specific and not waving our hands. There are too many ways to solve this problem.
What is activity?
If you mostly find yourself in the "last" partitions (sections), then uneven distribution can be guaranteed - periodically add a new section and merge adjacent pair of old partitions. (I did this successfully on one system.)
If you are going to clean up "old" data, then you obviously need to use PARTITION BY RANGE(TO_DAYS(...))
and use DROP PARTITION
plus REORGANIZE PARTITION
.
And there are many other scenarios. But I only know of 4 scenarios where Partitioning provides any performance advantage. See my blog .
source to share
(This answer focuses on schema and SELECT.)
Since you are expecting millions of lines, I want to point out some improvements to the schema first.
-
FLOAT(m,n)
this is usually the "wrong" thing because it results in two roundings. Either use plainFLOAT
(which seems to be "correct" for metrics like stress) or useDECIMAL(m,n)
.FLOAT
- 4 bytes; in the above cases itDECIMAL
will be 3 or 4 bytes. -
If you have both
INDEX(a)
so andINDEX(a,b)
, the former is not needed as the latter can cover for it. You have 3 unnecessary KEYS. It slows downINSERTs
. -
INT(3)
- You say "3-digit number"? If yes, considerTINYINT UNSIGNED
(values ββ0..255) for 1 byte instead ofINT
for 4 bytes. This will save a lot of MB of disk space, hence speed. (See alsoSMALLINT
, etc. andSIGNED
orUNSIGNED
.) -
If there
filename
is a lot of repetition, you can "normalize" it. This will save many megabytes. -
Use
NOT NULL
if you don't need itNULL
. -
AUTO_INCREMENT=690892041
implies that you have reached 1/3 of the road to disaster withid
, which will exceed about 2 billion. Do you useid
for anything? Getting rid of the column will avoid the problem; and change the valueUNIQUE KEY
toPRIMARY KEY
. (id
Talk further if you need to.) -
ENGINE=MyISAM
- Switching has several consequences, both favorable and unfavorable. The table will be 2-3 times larger. The "right" choicePRIMARY KEY
will greatly speed up this oneSELECT
. (And may or may not slow others downSELECTs
.)
Note on SELECT
: Since string
and unit_num
are constants in the query, the last two fields ORDER BY timestamp asc, string asc, unit_num asc
are unnecessary. If they are related to reasons not obvious in SELECT
, my advice may not be complete.
it
WHERE filename = 'foobar'
AND unit_num='40'
AND string='2'
AND timestamp >= ...
optimally processed INDEX(filename, unit_name, string, timestamp)
. The order of the columns is not important, except that it timestamp
should be last. By reordering the current key UNIQUE
, you give the optimal index. (In the meantime, none of the indexes are very good for this SELECT
.) By making it PRIMARY KEY
, an InnoDB table will make it even faster.
Markup? No benefits. Not for execution; not for anything else that you mentioned. A common use for separation is to clean up "old". If you intend to do this, let me go on.
In huge tables, it's best to look at all the important ones SELECTs
at the same time so that we don't speed up one when changing the speed of others. It may even turn out that splitting helps in such a compromise.
source to share