Master/Standby Inconsistency: Table definition on master and slave does not match

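Yesterday a colleague made a schema change in production. To keep the master stable, he did the work in three steps:
1. He disabled the binlog on the standby and ran the DDL there.
2. He performed an HA switchover to promote the standby to master.
3. He ran the same DDL on the old master.
At first glance nothing is wrong with this approach. However, right after the DDL on the standby finished and HA switched over to it, just as the change on the old master began, replication on the standby failed with the following error: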
Last_Error: Table definition on master and slave does not match: Column 10 type mismatch - received type 3, dbname.table_name has type 8
Skip_Counter: 0
Exec_Master_Log_Pos: 1046252634
Relay_Log_Space: 2910773181
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1535
Last_SQL_Error: Table definition on master and slave does not match: Column 10 type mismatch - received type 3, dbname.table_name has type 8
1 row in set (0.00 sec)
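The numbers in "received type 3 ... has type 8" are not SQL type names. They are MySQL's internal column type codes (the `enum_field_types` values from the server headers). "Received" is the type the master logged, and the second number is the type on the standby's own table.

The Python sketch below makes the message readable. The lookup table lists only common codes, and the names `MYSQL_FIELD_TYPES`, `type_name` and `decode_type_mismatch` are hypothetical helpers, not part of any MySQL tool.

```python
import re

# A subset of MySQL's enum_field_types codes (include/mysql_com.h), which is
# what the 5.1 error message prints. Only common types are listed.
MYSQL_FIELD_TYPES = {
    1: "MYSQL_TYPE_TINY (TINYINT)",
    2: "MYSQL_TYPE_SHORT (SMALLINT)",
    3: "MYSQL_TYPE_LONG (INT)",
    4: "MYSQL_TYPE_FLOAT (FLOAT)",
    5: "MYSQL_TYPE_DOUBLE (DOUBLE)",
    7: "MYSQL_TYPE_TIMESTAMP (TIMESTAMP)",
    8: "MYSQL_TYPE_LONGLONG (BIGINT)",
    9: "MYSQL_TYPE_INT24 (MEDIUMINT)",
    10: "MYSQL_TYPE_DATE (DATE)",
    12: "MYSQL_TYPE_DATETIME (DATETIME)",
    15: "MYSQL_TYPE_VARCHAR (VARCHAR)",
    246: "MYSQL_TYPE_NEWDECIMAL (DECIMAL)",
    252: "MYSQL_TYPE_BLOB (BLOB/TEXT)",
    254: "MYSQL_TYPE_STRING (CHAR)",
}

# Matches "Column N type mismatch - received type X, db.table has type Y";
# an en dash is accepted too, in case the text was copied from a web page.
MISMATCH = re.compile(
    r"Column (\d+) type mismatch\s*[-\u2013]\s*received type (\d+), (\S+) has type (\d+)")


def type_name(code):
    """Translate a numeric MySQL column type code into a readable name."""
    return MYSQL_FIELD_TYPES.get(code, f"unknown type {code}")


def decode_type_mismatch(message):
    """Return (column_index, type_from_master, table, type_on_standby) or None."""
    m = MISMATCH.search(message)
    if m is None:
        return None
    column, received, table, local = m.groups()
    return int(column), type_name(int(received)), table, type_name(int(local))
```

Fed the Last_SQL_Error text above, `decode_type_mismatch` returns `(10, 'MYSQL_TYPE_LONG (INT)', 'dbname.table_name', 'MYSQL_TYPE_LONGLONG (BIGINT)')`. In other words, the master sent an INT value for a column that is BIGINT on the standby.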
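<1> The error says the table definitions on master and standby differ. Replication had been working fine until then, so the problem must have been introduced by the DDL change. First, dump the table definition on both servers: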
master:
mysql -uroot dbname -e "show create table table_name" > master.sql
slave:
mysql -uroot dbname -e "show create table table_name" > slave.sql

Running diff -u master.sql slave.sql showed no significant difference between the two table definitions.
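Raw `diff -u` on these dumps can be noisy. Two things get in the way:
- The `AUTO_INCREMENT=N` table option usually differs between servers even when the columns are identical.
- When `mysql -e` writes to a file without `--raw`, the client escapes newlines inside the CREATE TABLE text as a literal `\n`, so the whole definition lands on one line.

Here is a minimal Python sketch of the same comparison that undoes both, assuming the files were produced by the commands above. `normalize` and `compare_definitions` are hypothetical helpers.

```python
import difflib
import re

AUTO_INCREMENT = re.compile(r"AUTO_INCREMENT=\d+")


def normalize(dump):
    """Turn `mysql -e "show create table ..."` output into comparable lines."""
    # In batch mode the mysql client escapes newlines inside values, so the
    # whole CREATE TABLE arrives as one line containing literal "\n" pairs.
    text = dump.replace("\\n", "\n")
    # The AUTO_INCREMENT counter normally differs between servers; mask it.
    text = AUTO_INCREMENT.sub("AUTO_INCREMENT=<n>", text)
    # Give every line a trailing newline so the diff output stays aligned.
    return [line + "\n" for line in text.splitlines()]


def compare_definitions(master_dump, slave_dump):
    """Return a unified diff of two SHOW CREATE TABLE dumps ('' if identical)."""
    return "".join(difflib.unified_diff(
        normalize(master_dump), normalize(slave_dump),
        fromfile="master.sql", tofile="slave.sql"))
```

For example, `print(compare_definitions(open("master.sql").read(), open("slave.sql").read()))`. An empty result means the definitions match apart from the AUTO_INCREMENT counter.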
<2> Next, look at the columns of the table involved:
root@127.0.0.1 : dbname 17:46:35> desc table_name;
+----------------+---------------------+------+-----+-------------------+-----------------------------+
| Field          | Type                | Null | Key | Default           | Extra                       |
+----------------+---------------------+------+-----+-------------------+-----------------------------+
| url            | varchar(333)        | NO   | UNI | NULL              |                             |
| Description    | varchar(255)        | YES  |     | NULL              |                             |
| HttpStatus     | int(11)             | YES  |     | NULL              |                             |
| AddTime        | timestamp           | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| ServerBanner   | varchar(255)        | YES  |     | NULL              |                             |
| TaskTag        | varchar(255)        | NO   | MUL | NULL              |                             |
| TaskTag2       | varchar(255)        | YES  |     | NULL              |                             |
| DomainName     | varchar(255)        | NO   | MUL | NULL              |                             |
| R_DomainName   | varchar(255)        | YES  | MUL | NULL              |                             |
| ScanTaskID     | int(11)             | YES  |     | NULL              |                             |
| SubTaskID      | bigint(20) unsigned | NO   | PRI | NULL              | auto_increment              |
| urlhash        | varchar(32)         | YES  | UNI |                   |                             |
| duplicateHash  | varchar(32)         | YES  |     |                   |                             |
| enable         | tinyint(1)          | YES  |     | 0                 |                             |
| webappid       | int(11)             | YES  |     | NULL              |                             |
| crc_DomainName | int(10) unsigned    | YES  | MUL | NULL              |                             |
| wapscore       | int(11)             | YES  |     | 0                 |                             |
| ip             | varchar(45)         | YES  |     | NULL              |                             |
+----------------+---------------------+------+-----+-------------------+-----------------------------+
slave:
root@127.0.0.1 : information_schema 14:59:40> select * from columns where table_schema="dbname" and table_name="table_name" and ORDINAL_POSITION= 10\G;
*************************** 1. row ***************************
TABLE_CATALOG: NULL
TABLE_SCHEMA: dbname
TABLE_NAME: table_name
COLUMN_NAME: ScanTaskID
ORDINAL_POSITION: 10
COLUMN_DEFAULT: NULL
IS_NULLABLE: YES
DATA_TYPE: int
CHARACTER_MAXIMUM_LENGTH: NULL
CHARACTER_OCTET_LENGTH: NULL
NUMERIC_PRECISION: 10
NUMERIC_SCALE: 0
CHARACTER_SET_NAME: NULL
COLLATION_NAME: NULL
COLUMN_TYPE: int(11)
COLUMN_KEY:
EXTRA:
PRIVILEGES: select,insert,update,references
COLUMN_COMMENT:
1 row in set (0.00 sec)
master:
root@127.0.0.1 : information_schema 14:59:19> select * from information_schema.columns where table_schema="dbname" and table_name="table_name" and ORDINAL_POSITION= 10\G;
*************************** 1. row ***************************
TABLE_CATALOG: NULL
TABLE_SCHEMA: dbname
TABLE_NAME: table_name
COLUMN_NAME: ScanTaskID
ORDINAL_POSITION: 10
COLUMN_DEFAULT: NULL
IS_NULLABLE: YES
DATA_TYPE: int
CHARACTER_MAXIMUM_LENGTH: NULL
CHARACTER_OCTET_LENGTH: NULL
NUMERIC_PRECISION: 10
NUMERIC_SCALE: 0
CHARACTER_SET_NAME: NULL
COLLATION_NAME: NULL
COLUMN_TYPE: int(11)
COLUMN_KEY:
EXTRA:
PRIVILEGES: select,insert,update,references
COLUMN_COMMENT:
1 row in set (0.00 sec)
The column definitions match on master and standby, and for a while we seemed to be at a dead end.
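<3> We then reviewed the DDL script my colleague ran yesterday. It added columns and increased the length of existing columns; nothing looked unusual.
We had applied the change on the standby first and then on the master, with the binlog disabled throughout. Our colleague Yinfeng (印风) then pointed out that the application kept writing to the master while the standby was being altered.
Once the standby's DDL was done, rows changed on the master still carried the old column definition. When those rows replicated to the standby, they no longer matched its new table definition, and replication failed. The script did lengthen a column, which matches the type mismatch in the error, and that solved the mystery.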
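The sequence above can be reproduced with a toy model in Python. This is not MySQL's applier, and it rests on simplifying assumptions:
- Each row event carries the column type codes the master had when it logged the row, roughly as the table map event does in row-based replication.
- The standby, as in MySQL 5.1, refuses an event whose types differ from its own table.
- Only type codes are compared.

`RowEvent`, `ReplicationError` and `apply_event` are hypothetical names.

```python
from dataclasses import dataclass

INT, BIGINT = 3, 8  # MySQL type codes, as printed in the 5.1 error message


@dataclass
class RowEvent:
    """A row change as logged by the master (toy model)."""
    column_types: list  # the master's column types when the row was logged
    values: list


class ReplicationError(Exception):
    pass


def apply_event(local_types, event):
    """Apply a row event on the standby; any type difference stops replication."""
    for i, (received, local) in enumerate(zip(event.column_types, local_types)):
        if received != local:
            raise ReplicationError(
                f"Column {i} type mismatch - received type {received}, "
                f"table has type {local}")
    return event.values


# The standby has already widened column 1 from INT to BIGINT (binlog off) ...
standby_types = [INT, BIGINT]
# ... while the application keeps writing rows on the unaltered master.
event = RowEvent(column_types=[INT, INT], values=[1, 42])

try:
    apply_event(standby_types, event)
except ReplicationError as err:
    print(err)  # Column 1 type mismatch - received type 3, table has type 8
```

The first row written on the master after the standby's DDL is enough to stop the standby's SQL thread. The column number is counted from 0, just as in the real error.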
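We asked plinux from the B2B team how they handle this. They apply a change on the standby first only when it adds columns; every other kind of change is run directly on the master.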
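The B2B team's rule can be written as a simple pre-check. It works because MySQL's row-based replication accepts a replica table with extra columns at the end, but not columns whose types differ.

This rough Python sketch assumes each change is a single `ALTER TABLE` and that new columns must be appended, not placed with `FIRST`/`AFTER`. `standby_first_ok` and its helpers are hypothetical, and the parsing is deliberately naive.

```python
import re

ALTER_TABLE = re.compile(r"^\s*ALTER\s+TABLE\s+\S+\s+(.*?)\s*;?\s*$",
                         re.IGNORECASE | re.DOTALL)
# Words after ADD that introduce something other than a column.
NOT_A_COLUMN = {"INDEX", "KEY", "UNIQUE", "PRIMARY", "CONSTRAINT",
                "FOREIGN", "FULLTEXT", "SPATIAL", "PARTITION"}


def split_clauses(body):
    """Split ALTER TABLE clauses on commas outside parentheses.

    Naive: commas inside quoted strings (e.g. COMMENT '...') are not handled.
    """
    clauses, current, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            clauses.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    clauses.append("".join(current).strip())
    return clauses


def appends_column(clause):
    """True for 'ADD [COLUMN] name type ...' without FIRST/AFTER."""
    words = clause.upper().split()
    if len(words) < 3 or words[0] != "ADD" or words[1] in NOT_A_COLUMN:
        return False
    return "FIRST" not in words and "AFTER" not in words


def standby_first_ok(ddl):
    """Apply on the standby first only if the ALTER just appends columns."""
    m = ALTER_TABLE.match(ddl)
    return m is not None and all(appends_column(c) for c in split_clauses(m.group(1)))
```

Under this check, a statement that lengthens a column (for example a `MODIFY` from INT to BIGINT) is sent to the master instead of the standby.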
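<4> Earlier, when we looked up the problem column in information_schema.columns, we plugged ORDINAL_POSITION = 10 straight into the query and got ScanTaskID.
But the column actually at fault is the 11th one, the column whose length we changed. Column numbers in the binlog (and in this error message) are counted from 0, while ORDINAL_POSITION counts from 1.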
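The off-by-one is easy to get wrong. A small Python helper can turn the error's column number into both the column name and the `ORDINAL_POSITION` to query. This sketch takes the column order from the `desc` output above; `COLUMNS` and `column_from_error_index` are hypothetical names.

```python
# Column order of dbname.table_name, copied from the desc output above.
COLUMNS = [
    "url", "Description", "HttpStatus", "AddTime", "ServerBanner", "TaskTag",
    "TaskTag2", "DomainName", "R_DomainName", "ScanTaskID", "SubTaskID",
    "urlhash", "duplicateHash", "enable", "webappid", "crc_DomainName",
    "wapscore", "ip",
]


def column_from_error_index(columns, error_index):
    """Map the 0-based column number in a replication error to
    (column name, 1-based information_schema ORDINAL_POSITION)."""
    if not 0 <= error_index < len(columns):
        raise IndexError(f"table has only {len(columns)} columns")
    return columns[error_index], error_index + 1


name, ordinal = column_from_error_index(COLUMNS, 10)
print(name, ordinal)  # SubTaskID 11
```

So the information_schema query should use `ORDINAL_POSITION = 11`, which points at SubTaskID, the BIGINT column in the table above.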
<5> MySQL 5.5 reports this error in a much more readable form:
Column 0 of table 'test.t3' cannot be converted from type 'int' to type 'bigint(20)'
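<6> So how can this be avoided? Our databases use row-based binlogging, so it is enough to switch replication to statement-based mode by changing binlog_format on the master from ROW to STATEMENT. The standby then re-executes the SQL statements instead of applying row images, so the events it receives no longer trigger this error.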
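The ordered steps of such a change window can be generated in Python, making the sequence explicit. This is a sketch under assumptions that are ours, not the post's:
- The binlog is disabled for each DDL session with `SET SESSION sql_log_bin = 0`, which requires the SUPER privilege.
- Both servers are switched to STATEMENT, since each acts as master at some point during the window.
- Both return to ROW at the end.

`SET GLOBAL binlog_format` only affects sessions opened after it runs. Application connections that are already open keep logging in ROW format until they reconnect. `rolling_ddl_plan` and `without_binlog` are hypothetical helpers.

```python
OLD_MASTER, OLD_STANDBY, HA = "old master", "old standby", "HA"


def without_binlog(server, ddl):
    """DDL session steps; sql_log_bin = 0 keeps the DDL out of this server's binlog."""
    return [
        (server, "SET SESSION sql_log_bin = 0"),
        (server, ddl),
        (server, "SET SESSION sql_log_bin = 1"),
    ]


def rolling_ddl_plan(ddl):
    """Ordered (server, action) steps for the rolling change in this post."""
    servers = (OLD_MASTER, OLD_STANDBY)
    # Statement-based logging for the whole window: the peer re-executes the
    # SQL instead of checking the column types of row images.
    plan = [(s, "SET GLOBAL binlog_format = 'STATEMENT'") for s in servers]
    plan += without_binlog(OLD_STANDBY, ddl)
    plan.append((HA, "switch over: old standby becomes the master"))
    plan += without_binlog(OLD_MASTER, ddl)
    # Assumption: return to row-based logging once both definitions match.
    plan += [(s, "SET GLOBAL binlog_format = 'ROW'") for s in servers]
    return plan


# Hypothetical DDL; the actual script from this incident is not shown.
for server, action in rolling_ddl_plan(
        "ALTER TABLE table_name MODIFY SubTaskID bigint(20) unsigned NOT NULL AUTO_INCREMENT"):
    print(f"[{server}] {action}")
```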

 

 
