Do you have the filesystems which host the datadir running on Logical Volumes? If not, you'll have to run mysqldump, and the outage will be longer. Also remember that when you copy all the databases from the primary master to the secondary master, the secondary will not yet have the replication user that the primary master uses, so you will need to create it there. I would also ensure that you do not have the log-slave-updates flag enabled, that the two masters have different server-id values, and that their auto-increment-offset values differ.
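As a rough sketch (the values are illustrative, not a recommendation for your environment), the relevant my.cnf settings on the two masters might look like this.
On Master 1, in /etc/my.cnf:
[mysqld]
server-id = 1
log-bin = mysql-bin
auto-increment-increment = 2
auto-increment-offset = 1
# log-slave-updates deliberately not set
And on Master 2:
[mysqld]
server-id = 2
log-bin = mysql-bin
auto-increment-increment = 2
auto-increment-offset = 2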
Here is a brief overview of the process:
- Lock tables on Master 1
- Record the binary log file & position on Master 1
- Create an LVM snapshot of the datadir partition
- Unlock tables on Master 1
- Transfer the database files to Master 2
- Bring up Master 2 as a slave of Master 1
- Lock tables on Master 2
- Record the binary log file & position on Master 2
- Start slave replication on Master 1 (pointing at Master 2)
- Unlock tables on Master 2
Connect to the Master 1 MySQL database with an admin account. Run
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
Record this data!
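The output will look roughly like this (the file name and position here are made up); the File value is what you will later plug into MASTER_LOG_FILE, and Position into MASTER_LOG_POS:
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000042 |  1234567 |              |                  |
+------------------+----------+--------------+------------------+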
Leave that session open, since exiting the MySQL shell would release the read lock. From a second shell, run
#/usr/sbin/lvcreate -L16G -s -n dbbackup <path/to/LVM/datadir/device>
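If you are not sure which logical volume holds the datadir, listing the volumes first can help (lvs and lvdisplay are standard LVM tools):
#/usr/sbin/lvs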
Back in the open MySQL session, run:
UNLOCK TABLES;
GRANT REPLICATION SLAVE ON *.* TO '<repuser>'@'<Master 2>' IDENTIFIED BY '<reppass>';
Return to the shell and run
#mkdir /mnt/dbbak
#mount <path/to/snapshot/device> /mnt/dbbak
If you are using an XFS filesystem, you will need to run mount with the -o nouuid,ro options!
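For example, with the same snapshot device and mount point as above:
#mount -o nouuid,ro <path/to/snapshot/device> /mnt/dbbak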
Now cd to the /mnt/dbbak directory.
run
#tar cvzf - ./* | ssh <username>@<Master 2 URL> "cat > /<target datadir>/dbbak.tgz"
After this completes, stop the mysql process on Master 2. Unmount the /mnt/dbbak directory on Master 1 and run lvremove against the dbbackup snapshot device. Unpack the tarball into the datadir on Master 2 and bring the database on Master 2 back up; a rough command sketch follows.
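Roughly, that teardown and unpack looks like this (the init script path and the mysql:mysql ownership are assumptions; use whatever your distro provides):
On Master 2, stop MySQL before touching the datadir:
#/etc/init.d/mysql stop
Back on Master 1, release the snapshot:
#umount /mnt/dbbak
#/usr/sbin/lvremove <path/to/snapshot/device>
On Master 2, unpack into the datadir, fix ownership, and start MySQL again:
#cd /<target datadir>
#tar xvzf dbbak.tgz
#chown -R mysql:mysql .
#/etc/init.d/mysql start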
Connect to the database on Master 2 and run
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST='<URL of Master 1>', MASTER_USER='<repuser>', MASTER_PASSWORD='<reppass>', MASTER_LOG_FILE='<File value recorded from SHOW MASTER STATUS on Master 1>', MASTER_LOG_POS=<Position value recorded from SHOW MASTER STATUS on Master 1>;
START SLAVE;
SHOW SLAVE STATUS\G
You should see Slave_IO_Running: Yes and Slave_SQL_Running: Yes. Once Seconds_Behind_Master reaches zero, we are ready to set up replication in the other direction, from Master 2 to Master 1. Run the GRANT statement from above on Master 2, with the appropriate values for the Master 1 replication user (i.e. grant to '<repuser>'@'<Master 1>'). Then, still on Master 2, run
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
Now, on Master 1, we can run
CHANGE MASTER TO MASTER_HOST='<URL of Master 2>', MASTER_USER='<repuser>', MASTER_PASSWORD='<reppass>', MASTER_LOG_FILE='<File value recorded from SHOW MASTER STATUS on Master 2>', MASTER_LOG_POS=<Position value recorded from SHOW MASTER STATUS on Master 2>;
START SLAVE;
SHOW SLAVE STATUS\G
Look for the output noted above.
On Master 2
UNLOCK TABLES;
You should see both slaves at 0 seconds behind master, with no errors. This can be tested by creating and dropping a test database on each master and verifying that the changes are reflected on the paired machine; a quick example follows.
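Something along these lines works (repl_test is just an example name):
On Master 1:
CREATE DATABASE repl_test;
On Master 2, SHOW DATABASES; should now list repl_test. Then, still on Master 2:
DROP DATABASE repl_test;
and confirm the database disappears on Master 1 as well.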
Finally, we need to add the machines to the V.I.P. and bring up the pulse daemon to enable failover. Run
#/sbin/modprobe bonding
Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-bond0 so that they are similar to the following:
ETH0:
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=static
BOND0:
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.11.8.22
NETMASK=255.255.255.0
NETWORK=10.11.8.0
Restart the network daemon, restart the pulse daemon, and verify that the V.I.P. is functional. A sample lvs.cf file is attached.
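On a RHEL-style system that is usually something along these lines (the service names are assumptions; verify against your install):
#service network restart
#service pulse restart
#ip addr show bond0
bond0 should carry its static address, and the V.I.P. should respond once pulse has decided which node is active.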
You are done!
-----Original Message-----
From: [email protected]
Sent: Monday, July 27, 2009 4:04 PM
To: [email protected]
Subject: master-master replication broken (conflicts) - what now?
Due to some problems with failover scripts, I have a broken master-master installation: some entries were inserted on one master, and some were inserted on the other master.
For now, I stopped one master to make the setup at least usable (yes, I know I lost some inserts this way - but it's OK, they are not important).
I'd like to remove the whole database on the "broken" master (the one which is currently disabled) and start the replication from scratch.
How should I proceed? I know I better start with backup, but what should I do next?
I wouldn't like to remove (drop) the database on the "broken" master just to find out that the whole database was cleared on the "good" master as
well...
Could anyone describe the procedure briefly, or point me to the documentation?
--
MySQL Replication Mailing List
For list archives: http://lists.mysql.com/replication
To unsubscribe: http://lists.mysql.com/[email protected]