At what point does a MySQL primary key error occur?

If I have a batch insert statement like:

INSERT INTO TABLE VALUES (x,y,z),(x2,y2,z2),(x3,y3,z3);


and x2 violates the primary key, is the error thrown before or after x3 is processed?

Specifically, I run a series of batch inserts in try/except blocks using Python and PyMySQL, for example:

conn = myDB.cursor()
try:
    conn.execute("INSERT INTO TABLE VALUES (x,y,z),(x2,y2,z2),(x3,y3,z3);")
except pymysql.Error as msg:
    print("MYSQL ERROR!:{0}".format(msg))  # print error


I want to make sure that if one of the tuples in a batch insert fails, thus printing an error, the rest of the tuples in the same batch are still processed.

My motivation is that I am migrating a lot of data between two servers. On server 1, the data is stored in log files, and it gets inserted into MySQL on server 2. Some of the data is already in MySQL on server 2, so there are many duplicate-key collisions. However, if I don't use batch inserts and instead issue a separate INSERT INTO for each of the (millions of) records, the process is much slower. So I'm stuck: with batch inserts, duplicate failures blow up the whole statement, and without batch inserts, the process takes much longer.
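
For what it's worth, here is a rough sketch of the two approaches I am weighing (my_table and its columns are placeholders, and the values are bound as parameters rather than hard-coded):

rows = [(1, 'a', 'b'), (2, 'c', 'd'), (3, 'e', 'f')]  # parsed from the log files
cur = myDB.cursor()

# (1) One multi-row INSERT: fast, but a single duplicate key kills the whole statement.
placeholders = ", ".join(["(%s, %s, %s)"] * len(rows))
flat_values = [v for row in rows for v in row]
cur.execute("INSERT INTO my_table (a, b, c) VALUES " + placeholders, flat_values)

# (2) One INSERT per row: a duplicate only affects that row, but it is much slower
#     over millions of records.
for row in rows:
    cur.execute("INSERT INTO my_table (a, b, c) VALUES (%s, %s, %s)", row)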

+3




2 answers


The way MySQL handles a multi-row insert (or update) statement depends on the table engine and the server SQL mode.

While only the table engine really matters for the key constraint you're asking about here, it's important to understand the big picture, so I'm going to take the time to add some more detail. If you're in a hurry, feel free to just read the first and last sections below.

Table engines

In the case of a non-transactional engine such as MyISAM, you can easily end up with a partial update, because each row is inserted or updated sequentially and nothing can be rolled back when a bad row is encountered and the statement is aborted.

However, if you are using a transactional table engine such as InnoDB, any constraint violation during an insert or update statement will roll back all changes made up to that point, in addition to aborting the statement.
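
If you are not sure which engine a given table uses, you can ask the server; a minimal check from Python (reusing the myDB connection from the question, with a placeholder table name):

cur = myDB.cursor()
cur.execute(
    "SELECT ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = %s",
    ("my_table",)
)
print(cur.fetchone())  # e.g. ('InnoDB',) or ('MyISAM',)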

SQL modes

The server SQL mode becomes important when you are not violating a key constraint, but the data you are trying to insert or update does not match the definition of the column you are putting it into. For example:

  • inserting a row without specifying a value for every NOT NULL column

  • inserting the string '123' (rather than the number 123) into a column defined with a numeric type

  • updating a CHAR(3) column to store the value 'four'

In these cases, MySQL will throw an error if strict mode is in effect. However, if strict mode is not in effect, it will often silently "fix" your mistake, which can lead to all sorts of potentially dangerous behavior (see MySQL "Truncated Invalid INTEGER" and mysql string 0 conversion for two examples).
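
To see which SQL mode your session is actually running in (and to set it explicitly), something along these lines works from PyMySQL:

cur = myDB.cursor()
cur.execute("SELECT @@SESSION.sql_mode")
print(cur.fetchone())  # e.g. ('STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION',)

# Opt in to strict behavior for all table engines in this session.
cur.execute("SET SESSION sql_mode = 'STRICT_ALL_TABLES'")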

Danger, Will Robinson!

There are some potential "gotchas" with non-transactional tables and strict mode. You didn't tell us which table engine you are working with, but the other answer, as currently written, explicitly uses a non-transactional table, and it is important to know how that affects the result.

For example, consider the following set of operators:

SET sql_mode = '';  # This will make sure strict mode is not in effect

CREATE TABLE tbl (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  val INT
) ENGINE=MyISAM;  # A nontransactional table engine (this used to be the default)

INSERT INTO tbl (val) VALUES (1), ('two'), (3);

INSERT INTO tbl (val) VALUES ('four'), (5), (6);

INSERT INTO tbl (val) VALUES ('7'), (8), (9);


Since strict mode is not in effect, it's no surprise that all nine values are inserted and the invalid strings are coerced to integers. The server is smart enough to recognize '7' as a number, but does not recognize 'two' or 'four', so they are converted to the default value for numeric types in MySQL, which is 0:

mysql> SELECT val FROM tbl;
+------+
| val  |
+------+
|    1 |
|    0 |
|    3 |
|    0 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
+------+
9 rows in set (0.00 sec)


Now try it again with sql_mode = 'STRICT_ALL_TABLES'. In short, the first INSERT statement will result in a partial insertion, the second will fail outright, and the third will still coerce '7' to 7 (which doesn't look very "strict" if you ask me, but it is documented behavior, and it's not unreasonable).

But wait, there's more! Try it with sql_mode = 'STRICT_TRANS_TABLES'. Now you will find that the first statement gives a warning instead of an error, but the second statement still fails! This can be especially frustrating if you are using LOAD DATA with a bunch of files and some of them fail while others don't (see this bug report).



What to do

In the case of key violations, the only thing that really matters is whether the table engine is transactional (e.g. InnoDB) or not (e.g. MyISAM). If you are working with a transactional table, the Python code in your question will cause the MySQL server to do something in this order:

  • Parse the INSERT statement and start a transaction. *
  • Insert the first tuple.
  • Insert the second tuple (key constraint violated).
  • Roll back the transaction.
  • Send an error message to pymysql.

* It would be smart to parse the statement before starting the transaction, but I don't know the exact implementation, so I'll take it as one step.
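
If you want the commit/rollback handling to be explicit on the client side, a minimal sketch with PyMySQL (which does not autocommit by default) could look like this; the table and values are placeholders:

conn = myDB.cursor()
try:
    conn.execute("INSERT INTO my_table VALUES (%s, %s, %s), (%s, %s, %s)",
                 (1, 'a', 'b', 2, 'c', 'd'))
    myDB.commit()    # keep the batch only if every tuple went in
except pymysql.Error as msg:
    myDB.rollback()  # roll back the open transaction
    print("MYSQL ERROR!:{0}".format(msg))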

In this case, any changes made before the offending tuple will already have been rolled back by the time your script receives the error message from the server and enters the except block.

If you are working with a non-transactional table, the server will skip step 4 (and the corresponding part of step 1), since the table engine does not support transactions. In this case, by the time your script enters the except block, the first tuple has been inserted, the second has blown up, and you won't easily be able to determine how many rows were inserted successfully, because the function that normally reports that returns -1 if the last insert or update statement produced an error.
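
For reference, that count surfaces in Python as cursor.rowcount (PyMySQL's view of the affected-rows value), and it is only meaningful after a statement that succeeded:

cur = myDB.cursor()
cur.execute("INSERT INTO my_table VALUES (%s, %s, %s)", (1, 'a', 'b'))
print(cur.rowcount)  # 1 for a successful single-row insert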

Partial updates should be avoided; they are much harder to recover from than simply ensuring that your statement either succeeds completely or fails completely. For this kind of situation, the documentation suggests:

To avoid [a partial update], use single-row statements, which can be aborted without changing the table.

And in my opinion, that's exactly what you should be doing. It's hardly difficult to write a loop in Python, and you won't be repeating yourself as long as you pass your values as parameters rather than hard-coding them - which you are doing already, right? RIGHT??? >:(
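
A minimal sketch of that loop, assuming the myDB connection from the question, a placeholder table name, and rows already parsed from the log files; a duplicate key only skips the offending row:

import pymysql

cur = myDB.cursor()
for row in rows:  # rows: iterable of (x, y, z) tuples
    try:
        cur.execute("INSERT INTO my_table VALUES (%s, %s, %s)", row)
    except pymysql.err.IntegrityError as msg:
        # Duplicate key (or other constraint violation): log it and move on.
        print("MYSQL ERROR!:{0}".format(msg))
myDB.commit()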

Alternatives

If you only occasionally violate your constraint and want to take some other action when the row you are trying to insert already exists, then you might be interested in INSERT ... ON DUPLICATE KEY UPDATE. This allows you to perform such amazing feats of computational gymnastics as counting things:

mysql> create table counting_is_fun (
    -> stuff int primary key,
    -> count int unsigned not null default 1
    -> );
Query OK, 0 rows affected (0.12 sec)

mysql> insert into counting_is_fun (stuff)
    -> values (1), (2), (5), (3), (3)
    -> on duplicate key update count = count + 1;
Query OK, 6 rows affected (0.04 sec)
Records: 5  Duplicates: 1  Warnings: 0

mysql> select * from counting_is_fun;
+-------+-------+
| stuff | count |
+-------+-------+
|     1 |     1 |
|     2 |     1 |
|     3 |     2 |
|     5 |     1 |
+-------+-------+
4 rows in set (0.00 sec)


(Note: compare the number of tuples you inserted with the number of "affected rows" reported, and with the number of rows in the table afterward. Isn't counting fun?)
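
Applied to the original problem, the same clause can keep the multi-row inserts while tolerating duplicates; here is a sketch with hypothetical column names, where the no-op update simply leaves existing rows alone:

cur = myDB.cursor()
sql = ("INSERT INTO my_table (id, col1, col2) VALUES (%s, %s, %s), (%s, %s, %s) "
       "ON DUPLICATE KEY UPDATE id = id")  # no-op update: duplicates are left as-is
cur.execute(sql, (1, 'a', 'b', 2, 'c', 'd'))
myDB.commit()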

Or, if you think the data you are inserting right now is at least as good as the data already in the table, you can look into REPLACE INTO - but this is a MySQL-specific extension to the SQL standard and, as usual, it has its own quirks, particularly around AUTO_INCREMENT fields and ON DELETE actions associated with foreign key references.
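
A quick illustration of the syntax, again with placeholder names; keep in mind that on a duplicate key, REPLACE deletes the existing row and inserts the new one rather than updating it in place:

cur = myDB.cursor()
cur.execute("REPLACE INTO my_table (id, col1, col2) VALUES (%s, %s, %s)",
            (1, 'a', 'b'))
myDB.commit()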

Another approach that people like to suggest is INSERT IGNORE. This ignores the error and just keeps on rolling. Great, right? Who cares about errors anyway? The reasons I don't like it as a solution are:

  • INSERT IGNORE will ignore whatever error occurs during the statement, not just the particular error you think you don't care about.

  • The documentation states, "Ignored errors may generate warnings instead, although duplicate-key errors do not." So you can't even be sure which warnings to expect when using this keyword!

  • To me, using INSERT IGNORE says, "I don't know how to do this the right way, so I'll just do it the wrong way."

I do use INSERT IGNORE occasionally, but when the documentation comes right out and tells you the "correct" way to do something, don't try to outsmart it. Try that first; if you still have a good reason to do it the wrong way and risk compromising the integrity of your data and breaking things permanently, at least you've made an informed decision.

+4




After doing some experimentation on a MyISAM table, I see that if you try to insert two or more tuples of values into the table and one (or more) of them violates the table's constraints (such as a primary key or unique index rule), the tuples after the offending one will not be inserted:

create table test(
  id int unsigned not null primary key, 
  col varchar(100)
) Engine = MyISAM;

insert into test values
  (1, 'The first')
, (2, 'Should work')
, (2, 'Should fail') -- This one won't be inserted, and will be treated as an error
, (3, 'The last')    -- This one won't be inserted either, because of the
                     -- previous tuple "offense".
;
select * from test;
+----+-------------+
| id | col         |
+----+-------------+
|  1 | The first   |
|  2 | Should work |
+----+-------------+


In InnoDB tables the behavior is different (thanks to AirThomas for his comment): the insert fails completely, and nothing is inserted:

drop table test;
create table test(
  id int unsigned not null primary key, 
  col varchar(100)
) Engine = InnoDB;

insert into test values
  (1, 'The first')
, (2, 'Should work')
, (2, 'Should fail') -- This will cause the whole insert to fail
, (3, 'The last')
;
select * from test;
    Empty set




But there are alternatives. You can use the IGNORE keyword (this seems to work with both MyISAM and InnoDB tables):

truncate test; -- Let's work with an empty table
insert IGNORE into test values
  (1, 'The first')
, (2, 'Should work')
, (2, 'Should fail') -- This one won't be inserted, but will not cause the insert
                     -- to fail (because of the IGNORE keyword)
, (3, 'The last')    -- This one will be inserted, even given the previous
                     -- tuple "offence"
;
-- In MySQL CLI this will pop out a message like this:
-- Query OK, 3 rows affected
-- Records: 4 Duplicates: 1 Warnings: 0
select * from test;
+----+-------------+
| id | col         |
+----+-------------+
|  1 | The first   |
|  2 | Should work |
|  3 | The last    |
+----+-------------+


You can also use on duplicate key update... I leave that as homework for you. Read the documentation about insert ... on duplicate key update.

+1








