Thread starter: liyihongcug

MySQL replication: do both machines really need HEARTBEAT?

#21 | Posted by the thread starter on 2010-5-22 11:19
http://www.nalai.net/content/view/305054/32/


Monitoring
Replication increases the complexity of monitoring MySQL's health. Are all the slaves replicating? Have the slaves encountered errors? How far behind is the slowest slave?

As you may have come to expect, MySQL provides all the data necessary to answer those questions (and many questions you probably haven't even considered), but extracting and understanding the data is something it won't do for you. In Section 7.5.4, later in this chapter, we'll provide details to help you make sense of all the data MySQL supplies, along with tools that are helpful in processing it.

Master status
Using the SHOW MASTER STATUS command, the master will tell you about its replication status:

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: binary-log.004
        Position: 635904327
    Binlog_do_db:
Binlog_ignore_db:
1 row in set (0.00 sec)
The output includes the current binary log filename and the position (or offset) into the binary log where the next query will be written. The other two fields correspond to the binlog-do-db and binlog-ignore-db filtering options in the server's my.cnf file. If you are filtering binary log records on the master, one or both of these will list the database names affected by your filters.

You can also ask the master which binary logs still exist on disk:

mysql> SHOW MASTER LOGS;
+----------------+
| Log_name       |
+----------------+
| binary-log.001 |
| binary-log.002 |
| binary-log.003 |
| binary-log.004 |
+----------------+
4 rows in set (0.02 sec)
But the output is quite limited. It would be helpful to know the sizes and ages of the files as well. MySQL is doing little more than reading and displaying the contents of the log-bin.index file. To get more information, you need to log on to the server and examine the files by hand.

Slave status
There is significantly more information available on the slave side of replication, mostly because the slaves have more information to keep track of. To start, the SHOW SLAVE STATUS command provides a good summary of the information from both the master.info and relay-log.info files:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
          Master_Host: master.example.com
          Master_User: repl
          Master_Port: 3306
        Connect_retry: 15
      Master_Log_File: binary-log.004
  Read_Master_Log_Pos: 635904807
       Relay_Log_File: relay-log.004
        Relay_Log_Pos: 846096118
Relay_Master_Log_File: binary-log.004
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
      Replicate_do_db:
  Replicate_ignore_db:
           Last_errno: 0
           Last_error:
         Skip_counter: 0
  Exec_master_log_pos: 635904807
      Relay_log_space: 846096122
1 row in set (0.00 sec)
In addition, there is some other metadata in the output. The Last_errno and Last_error fields provide information about the most recent replication-related error, if any. The Relay_log_space tells you how much space the relay log is consuming.

The two most important fields are Slave_IO_Running and Slave_SQL_Running. They tell you whether the I/O and SQL threads are running.

Replication heartbeat
Watching the values produced by SHOW MASTER STATUS and SHOW SLAVE STATUS can give you a rough idea of how up to date a slave is. The trouble with relying on that information is that you're only looking at bytes. You can determine how many more bytes of log data the slave needs to execute before it is current. However, that doesn't tell you how many queries need to be executed. There's no good way to figure that out, short of running the binary log through mysqlbinlog and counting queries.
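The byte arithmetic itself is simple enough to sketch. The following Python fragment (illustrative only; the helper name, field names passed in, and the assumption that logs rotate at a uniform size are ours, not MySQL's) estimates how many bytes of binary log a slave still has to execute, given the master's SHOW MASTER STATUS values and the slave's Relay_Master_Log_File / Exec_master_log_pos values:

```python
def bytes_behind(master_file, master_pos, slave_file, slave_pos, log_size):
    """Estimate unexecuted binary log bytes on a slave.

    master_file/master_pos come from SHOW MASTER STATUS on the master;
    slave_file/slave_pos are Relay_Master_Log_File and Exec_master_log_pos
    from SHOW SLAVE STATUS.  log_size is an assumed uniform rotation size.
    """
    # Binary log names end in a numeric suffix, e.g. binary-log.004.
    master_seq = int(master_file.rsplit(".", 1)[1])
    slave_seq = int(slave_file.rsplit(".", 1)[1])

    if master_seq == slave_seq:
        # Same log file: the gap is a simple difference of offsets.
        return master_pos - slave_pos

    # Slave is one or more whole logs behind: remainder of its current
    # log, plus any full intermediate logs, plus the master's offset.
    return (log_size - slave_pos) \
        + (master_seq - slave_seq - 1) * log_size \
        + master_pos
```

As the text notes, this tells you bytes, not queries: a 600-byte gap could be one large INSERT or dozens of small statements.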

It is possible to determine how out of date the slave is with some degree of accuracy by implementing a simple heartbeat system. The heartbeat principle is easy. At a fixed interval, say 20 seconds, a process on the master inserts a record with the latest timestamp into a table. On the slave, a corresponding process reads the most recent record every 20 seconds. Assuming that the system clocks on both machines are in sync, you can tell how far behind the slave is to within 20 seconds of accuracy.

See the write_heartbeat and read_heartbeat scripts in Section 7.5.4 for a sample implementation.

Log Rotation
Binary log files accumulate on the server until they are explicitly removed. The SHOW MASTER LOGS command tells you how many logs there are at any given time. To remove one or more logs, use the PURGE MASTER LOGS TO ... command. It removes all the logs up to but not including the given log name.

Here's an example:

mysql> SHOW MASTER LOGS;
+----------------+
| Log_name       |
+----------------+
| binary-log.001 |
| binary-log.002 |
| binary-log.003 |
| binary-log.004 |
+----------------+
4 rows in set (0.02 sec)

mysql> PURGE MASTER LOGS TO 'binary-log.004';
The command tells MySQL to remove binary-log.001, binary-log.002, and binary-log.003. Be careful not to remove logs too quickly. If a slave is offline for a significant period of time, there's a chance that it still needs one or more of the logs you removed. If you're in doubt, run SHOW SLAVE STATUS on each slave to verify which log it is using.
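That per-slave check can be folded into the purge decision. Here's a small Python sketch (the helper name and inputs are hypothetical): given the master's log list and the Master_Log_File value reported by each slave's SHOW SLAVE STATUS, it picks the oldest log any slave still needs, which is the safe PURGE boundary because PURGE removes everything up to but not including the named log:

```python
def safe_purge_target(master_logs, slave_logs_in_use):
    """Return a PURGE command that keeps every log a slave still needs.

    master_logs: names from SHOW MASTER LOGS, oldest first.
    slave_logs_in_use: each slave's Master_Log_File from SHOW SLAVE STATUS.
    """
    # The oldest log any slave is still reading is the purge boundary;
    # log names sort by their numeric suffix (binary-log.003 -> 3).
    oldest_needed = min(slave_logs_in_use,
                        key=lambda name: int(name.rsplit(".", 1)[1]))
    if oldest_needed not in master_logs:
        raise ValueError("slave references a log the master no longer has")
    return "PURGE MASTER LOGS TO '%s'" % oldest_needed
```

This only protects against slaves that are connected and reporting; a slave that has been offline for days still needs manual consideration.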

To automate this process, see the purge_binary_logs script in Section 7.5.4.

Changing Masters
Sooner or later you'll need to point your slaves at a new master. Maybe the old one is being replaced with a newer, faster computer; perhaps there was a failure, and you are promoting a slave to master. In MySQL 3.23 and 4.0, you need to inform the slaves about their new master. A future version of MySQL is supposed to include a fail-safe replication feature that automates the process.

A planned changing of masters is a straightforward process. (In the event of a master failure, it may not be so easy.) You simply need to issue the CHANGE MASTER TO ... command on each slave. In doing so, you inform the slave of the new master's parameters—the same ones specified in the my.cnf file. The slave will begin replicating from its new master, and MySQL will also update the master.info with the new information.

Using the right values
As usual, the devil is in the details. How do you decide which values to use? What if you get them wrong?

First, let's consider the easy case. If you are in control of the situation, the process is easy. Follow these steps:

Disconnect all clients (not slaves) from the master.

Make sure the new master is completely caught up.

Execute RESET MASTER on the new master.

Make sure each slave is caught up.

Shut down the old master.

Let all clients connect to the new master.

Issue a CHANGE MASTER TO ... command on each slave, pointing it to the new master.

The RESET MASTER command tells the master to flush all its binary logs and start fresh. By starting with a clean slate on the new master, there's no guesswork involved in determining the right log position. Since it's a brand new log, we know the position is 4, because each binary log has a 4-byte header that consumes positions 0-3.

The complete CHANGE MASTER TO ... command looks like this:

mysql> CHANGE MASTER TO
    -> MASTER_HOST='newmaster.example.com',
    -> MASTER_USER='repl',
    -> MASTER_PASSWORD='MySecret!',
    -> MASTER_PORT=3306,
    -> MASTER_LOG_FILE='log-bin.001',
    -> MASTER_LOG_POS=4;
If, on the other hand, the master crashes and you can't bring it back online in a reasonable amount of time, things aren't so clear-cut. If you have only one slave, of course, there's no decision to make. You use the slave. But if you have multiple slaves, you need to determine which one is the most up to date.

By examining the output of SHOW SLAVE STATUS on each slave, you can easily determine which one is closest to matching the master at the time it crashed. Once you know the log name and position, you can construct a CHANGE MASTER TO ... command to run on the remaining slaves.

In doing so, however, you'll likely cause some of the slaves to be slightly out of sync with their new master. To illustrate why, assume that each query is assigned an increasing unique ID number. The original master had just executed query 500 when it crashed. The "most up-to-date" slave, the new master, had executed query 496. That means your best slave is missing four queries, and there's no way to recover them unless your application logs every query it writes, which is unlikely.

Now, let's assume that there are two more slaves, slave2 and slave3; slave2 executed query 490, and slave3 executed query 493. You have a choice. You can either point both slaves at the new master's current position (query 496), or you can figure out the corresponding offset for each slave in the new master's binary log. That will take more time, but it means you lose less data.

To find the matching log position for each slave, you need to have the binary log enabled on each slave. Use the mysqlbinlog command (described in Section 7.5.4) to locate the last query executed. Then locate exactly the same query in the new master's binary log. Once you find the query, you'll have the offset you need. The output of mysqlbinlog always includes the offset in a comment right before the query. For example:

$ mysqlbinlog log-bin.001
...
# at 683
#021103 18:36:33 server id 1  log_pos 683   Query  thread_id=288 exec_time=0 error_code=0
SET TIMESTAMP=1036377393;
insert into test1 values (8);
The # at 683 line lists the position of the insert into test1 values (8) query in the log.
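Locating that offset by eye across a large log is tedious, so the search is easy to script. A minimal Python sketch (the function name is ours; it assumes you've already captured mysqlbinlog's text output, e.g. via a pipe) that remembers the most recent "# at N" comment and returns it when the target query appears:

```python
def find_query_offset(binlog_text, query):
    """Scan mysqlbinlog output and return the '# at N' offset that
    precedes the given query, or None if the query isn't found."""
    offset = None
    for line in binlog_text.splitlines():
        if line.startswith("# at "):
            # Remember the most recent position comment.
            offset = int(line[5:])
        elif line.strip() == query:
            return offset
    return None
```

For the sample output above, passing the insert statement returns 683, which is exactly the MASTER_LOG_POS value you'd feed to CHANGE MASTER TO.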

Tools
In this section, we'll look at some tools that can make dealing with replication a bit easier. A couple of the tools come straight out of the MySQL distribution, while others are home-grown and often ripe for improvement. The home-grown tools can serve as a starting point for solving your specific needs; such tools are available (and kept up to date) at http://highperformancemysql.com.

mysqlbinlog: Viewing data in logs
The mysqlbinlog utility has been mentioned several times in this chapter. It is used to decode the binary formats used by the binary log and relay log. Given a log file, it outputs the SQL queries contained in the log. Furthermore, it precedes each query with several pieces of metadata as comments.

$ mysqlbinlog log-bin.001
...
# at 683
#021103 18:36:33 server id 1  log_pos 683   Query  thread_id=288 exec_time=0 error_code=0
SET TIMESTAMP=1036377393;
insert into test1 values (8);
The first line contains the offset (or position) of the query in the log. The second line begins with a date and timestamp followed by the server ID of the server that first executed the query. The log position is repeated on this line and followed by the event type.

Finally, there's the ID of the thread that executed the query, followed by the time the query took to execute (in seconds) and the error code generated by the query.

You can use mysqlbinlog to pull the logs from a remote server by specifying a hostname, username, and password. Using the -o command-line option, you can specify the offset from which you'd like to start reading. For example:

$ mysqlbinlog -h slave3.example.com -u root -p -o 35532 log-bin.032
check_repl: Ensuring that replication takes place
As discussed earlier, it's important to check that your slaves are replicating properly when you expect them to. The following script connects to the local MySQL server and makes sure that replication is running by examining the output of SHOW SLAVE STATUS, checking both the 3.23.xx and 4.x values:

#!/usr/bin/perl -w
#
# On a slave server, check to see that the slave hasn't stopped.

use strict;
use DBIx::DWIW;

my $conn = DBIx::DWIW->Connect(
    DB      => "mysql",
    User    => "root",
    Pass    => "password",
    Host    => "localhost",
) or exit;

my $info = $conn->Hash("SHOW SLAVE STATUS");

if (exists $info->{Slave_SQL_Running} and $info->{Slave_SQL_Running} eq 'No')
{
    warn "slave SQL thread has stopped\n";
}
elsif (exists $info->{Slave_IO_Running} and $info->{Slave_IO_Running} eq 'No')
{
    warn "slave IO thread has stopped\n";
}
elsif (exists $info->{Slave_Running} and $info->{Slave_Running} eq 'No')
{
    warn "slave has stopped\n";
}
This script makes no effort to repair a problem; it simply reports when something is wrong. Without knowing why the failure occurred, it's probably not wise to blindly restart replication. To skip the problem query and restart replication, see the next section.

fix_repl: Skipping a bad query to continue replication
In the event that replication has stopped on a slave, you should tell the slave to skip the problem query and continue, unless the problem warrants further investigation. No restart of MySQL is necessary.

In MySQL 3.23.xx, execute:

SET SQL_SLAVE_SKIP_COUNTER=1
SLAVE START
In Versions 4.0.0-4.0.2, execute:

SET SQL_SLAVE_SKIP_COUNTER=1
SLAVE START SQL_THREAD
In Version 4.0.3 and beyond, execute:

SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1
SLAVE START SQL_THREAD
Yuck. If you're using a mixture of 3.23.xx and 4.0.x servers, it may be difficult to remember the exact syntax for each version. It's much easier to have a copy of the following fix_repl script on hand to do the hard work for you:

#!/usr/local/bin/perl -w
#
# fix mysql replication if it encounters a problem

$|=1;      # unbuffer stdout

use strict;
use DBIx::DWIW;

my $host = shift || 'localhost';

my $conn = DBIx::DWIW->Connect(
    DB   => "mysql",
    User => "root",
    Pass => "pa55word",
    Host => $host,
) or die "Couldn't connect to database!";

print "checking $host ... ";

my $info    = $conn->Hash("SHOW SLAVE STATUS");
my $version = $conn->Scalar("SHOW VARIABLES LIKE 'Version'");

my $fix_cmd;
my $start_cmd;

# 3.23
if ($version =~ /^3\.23/ and $info->{Slave_Running} eq 'No')
{
    $fix_cmd   = "SET SQL_SLAVE_SKIP_COUNTER = 1";
    $start_cmd = "SLAVE START";
}
# 4.0.0 - 4.0.2
elsif ($version =~ /^4\.0\.[012]/ and $info->{Slave_SQL_Running} eq 'No')
{
    $fix_cmd   = "SET SQL_SLAVE_SKIP_COUNTER = 1";
    $start_cmd = "SLAVE START SQL_THREAD";
}
# 4.0.3 - 4.0.xx, 4.1.xx.  Don't know what 5.0 will be like.
elsif ($version =~ /^4\.[01]\./ and $info->{Slave_SQL_Running} eq 'No')
{
    $fix_cmd   = "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1";
    $start_cmd = "SLAVE START SQL_THREAD";
}
# things are okay or unknown version?
else
{
    print "GOOD\n";
    exit;
}

print "FIXING ... ";
$conn->Execute($fix_cmd);
$conn->Execute($start_cmd);
print "DONE\n";

exit;
Be careful with this technique. Blindly skipping queries on a slave may cause it to become out of sync with the master. If the query is failing due to a duplicate key error, it's probably safe, but you should investigate how that happened in the first place.

purge_binary_logs: Reclaiming space used by binary logs
To make log rotation easier, you can use something like the following purge_binary_logs Perl script. It connects to the specified server and checks to see how many binary logs are sitting around. If there are more than the specified threshold, it removes any extras.

#!/usr/bin/perl -w
#
# On a slave server, purge the replication logs if there are too many
# sitting around sucking up disk space.

use strict;
use DBIx::DWIW;

my $MIN_LOGS = 4; ## keep main log plus three old binary logs around

my $conn = DBIx::DWIW->Connect(
    DB   => "mysql",
    User => "root",
    Pass => "password",
    Host => 'localhost',
);

die "Couldn't connect to database!" if not $conn;

## see if there are enough to bother, exit if not

my @logs = $conn->FlatArray("SHOW MASTER LOGS");
exit if (@logs < $MIN_LOGS);

## if so, figure out what the last one we want to keep is, then purge
## the rest

my $last_log = $logs[-$MIN_LOGS];
print "last log is $last_log\n" unless $ENV{CRON};
$conn->Execute("PURGE MASTER LOGS TO '$last_log'");

exit;
Depending on your needs, there's a lot of room for improvement in this script. It would be nice if the script took command-line arguments so you wouldn't need to hardcode the hostname, password, and so on. It would also be nice if the script could check the sizes of the log files. If a master is restarted very frequently, using the number of log files as a metric probably isn't as useful as checking the volume of log data. However, the script couldn't then be run remotely, because it would need to examine the log files directly.

A valuable but difficult addition would be for the script to remove logs only if it can tell that all slaves had already read them. That requires knowing all the slaves and contacting each one to verify its progress in the replication process.

mysqldiff: Replication sanity checks
As with anything new, you may not trust replication right away. To help convince yourself that it is really doing what it should do, it's good to perform spot checks on the data, making sure that the slaves have exactly the data they should have.

This checking can be done to varying degrees of paranoia:

Simple metadata checks: make sure each table on the slaves contains the same number of rows that the same master table does.

Verify all or some of the data by comparing rows on the master and slaves.

Perform application-specific checks by running custom queries and comparing the results across servers.

The first check is quite easy to implement with a bit of Perl code:

#!/usr/bin/perl -w

use strict;
use DBIx::DWIW;

$|=1;       # unbuffer stdout

my $db_user = 'root';
my $db_pass = 'password';
my $db_name = 'test';

my $master = 'master.example.com';

my @slaves = qw(
    slave1.example.com
    slave2.example.com
    slave3.example.com
);

my %master_count;

for my $server ($master)
{
    print "Checking master... ";
    my $conn = DBIx::DWIW->Connect(User => $db_user, Host => $server,
        Pass => $db_pass, DB => $db_name) or die "$!";

    for my $table ($conn->FlatArray("SHOW TABLES"))
    {
        my $count = $conn->Scalar("SELECT COUNT(*) FROM $table");
        $master_count{$table} = $count;
    }
    print "OK\n";
}

for my $server (@slaves)
{
    print "Checking $server... ";
    my $conn = DBIx::DWIW->Connect(User => $db_user, Host => $server,
        Pass => $db_pass, DB => $db_name) or die "$!";

    for my $table ($conn->FlatArray("SHOW TABLES"))
    {
        my $count = $conn->Scalar("SELECT COUNT(*) FROM $table");
        if ($count != $master_count{$table})
        {
            print "MISMATCH (got $count on $table, expecting $master_count{$table})\n";
        }
    }
    print "OK\n";
}

exit;
The script connects to the master and gets the number of rows in each table of the given database. Then it connects to each slave and checks to see that the counts match. If they don't, it issues a MISMATCH warning.

This framework can easily be extended to handle multiple databases, perform more specific checks, and even attempt to take corrective action. It is even ready to handle multiple masters.

write_heartbeat: Generating a periodic health check heartbeat
The following script can implement a heartbeat monitoring system as described earlier. To use it, create a database named MySQL_Admin and a table named Heartbeat with the following structure:

CREATE TABLE Heartbeat
(
    unix_time    INTEGER   NOT NULL,
    db_time      TIMESTAMP NOT NULL,
    INDEX        time_idx (unix_time)
)
The unix_time field holds the timestamp that is explicitly inserted into the table. The db_time field is set automatically by MySQL. By keeping track of both times and inserting new records instead of simply running an UPDATE on a single record, you maintain historical data in the event someone wants to graph or analyze it.

Let's look at the script that adds records:

#!/usr/bin/perl -w

use strict;
use DBIx::DWIW;

my $conn = DBIx::DWIW->Connect(
    DB   => "MySQL_Admin",
    User => "root",
    Pass => "password",
    Host => 'localhost',
) or die;

my $unix_time = time( );
my $sql = "INSERT INTO Heartbeat (unix_time, db_time) VALUES ($unix_time, NULL)";
$conn->Execute($sql);

exit;
Running the script at a fixed interval generates a heartbeat that can be used by the read_heartbeat script to monitor replication latency.

read_heartbeat: Measuring replication lag using heartbeat
The companion to write_heartbeat reads the most recent timestamp from the database and computes how far behind the slave might be. Remember that we can't know this time exactly unless the heartbeat records are generated every second, which is probably overkill for most installations.

#!/usr/bin/perl -w

use strict;
use DBIx::DWIW;

my $conn = DBIx::DWIW->Connect(
    DB   => "MySQL_Admin",
    User => "root",
    Pass => "password",
    Host => 'localhost',
) or die;

my $sql = "SELECT unix_time, db_time FROM Heartbeat
           ORDER BY unix_time DESC LIMIT 1";

my $info  = $conn->Hash($sql);
my $time  = $info->{unix_time};
my $delay = time( ) - $time;

print "slave is $delay seconds behind\n";

exit;
This script can also be extended to do far more than report on latency. If the latency is too great, it can send email or page a DBA.

Slave Data Changes
It should go without saying that manually changing data on a slave is usually a very bad idea. The same holds true for programmatically changing slave data. By accidentally making changes to data on a slave, you can easily introduce data inconsistencies that may cause replication to fail. It may take hours, days, weeks, or even months for the problem to surface, and when it does, you'll be hard pressed to explain what's going on.

Before MySQL 4.0.14 there was no way to tell MySQL not to allow any changes that don't originate from replication. Instead, the best solution in versions prior to 4.0.14 has an ironic aspect to it: you need to make a change on all the slaves, removing the permissions (or even the accounts) of users who can change data.

But that solution is problematic for other reasons. You'd probably forget about the change after a while. Then, late one night, the master would fail and you would need to promote a slave to master. You'd have to spend a bit of time trying to figure out why applications are mysteriously failing.

As of Version 4.0.14, adding read-only to the slave's configuration file allows the slave to process write queries only via replication.

It's worth remembering that MySQL is very trusting when it comes to replication. The slave threads don't switch identities to run each query as the same user that originally executed it on the master. Instead, the slave thread runs with the equivalent of root access on the slave. It can, by design, change any data it needs to change. The trust comes from the fact that the slaves never verify that a particular user has the necessary privileges to run a query that appears in the binary log. It blindly trusts the master and that the master's logs haven't been tampered with.

Nonunique Server IDs
This has to be one of the most elusive problems you can encounter with MySQL replication. If you accidentally configure two slaves with the same server ID, they'll appear to work just fine if you're not watching closely. But if you watch their error logs carefully or watch the master with mytop (covered in Appendix B), you'll notice something very odd.

On the master, you'll see only one of the two slaves connected at a given moment. Usually all slaves are connecting and replicating all the time. On the slave you'll see frequent disconnect/reconnect messages appearing in the error log, but none of those messages will lead you to believe that the server ID of one slave might be misconfigured.

The only real harm in this situation is that the slaves can't replicate very quickly. Because the slaves (not the master) keep track of their replication progress, there's no need to worry about giving one query to the first slave, one to the other slave, and so on. Both slaves get all the data; they just get it much more slowly.

The only solution to this problem is to be careful when setting up your slaves. If you see symptoms like this, double check the configuration of each slave to ensure that it has the server ID you expect it to. You may find it helpful to keep a master list of slave-to-server-ID mappings so that you don't lose track of which ID belongs to each slave. Consider using numeric values that have some sort of meaning in your setup, such as the last octet of each machine's IP address.
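That last-octet convention is trivial to automate when you generate slave configurations. A small Python sketch (the helper name is ours) that derives a server-id from a machine's IPv4 address and rejects the one value MySQL won't accept:

```python
def server_id_from_ip(ip):
    """Derive a replication server-id from the last octet of an IPv4
    address, so the slave-to-ID mapping is easy to remember."""
    last_octet = int(ip.split(".")[-1])
    # server-id must be a positive integer; 0 would disable replication.
    if last_octet == 0:
        raise ValueError("last octet is 0; pick another scheme for this host")
    return last_octet
```

Note the caveat in the comment: this scheme only works if every slave sits on the same subnet, so the last octets are guaranteed unique.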

Log Corruption or Partial Log Record
The second most elusive problem occurs when a binary log somehow becomes corrupted. When that happens, the slave will typically fail with an error message like:

Error in Log_event::read_log_event( ): '...', data_len=92, event_type=2
If that ever happens, there's little you can do. The slave is often confused enough that you can't simply try to skip the query and go to the next one. The only solution is to resync with the master and start over.

How does this happen? It's difficult to say. As long as the software is working properly, it could be a hardware or driver problem. Jeremy once saw a system have this problem repeatedly before he found that it had faulty RAM installed. We have heard of it happening on systems with disk controllers that don't have reliable drivers.

Bulk-Loading Data
While you can write code to load a lot of data into MySQL quickly, nothing beats the performance of using MySQL's LOAD DATA INFILE and LOAD DATA LOCAL INFILE commands to read data in from a flat file. In fact, the mysqlimport command-line tool uses LOAD DATA INFILE behind the scenes.

In all 3.23.xx versions of MySQL, replicating the LOAD DATA INFILE command is problematic. The contents of the file aren't stored in the binary log; only the query is, so the file must exist on the master until all slaves have copied it (they will do so automatically when they need it). If the file is removed prematurely, slaves can't copy the file, and replication will fail.

The LOAD DATA LOCAL INFILE command isn't affected. When the LOCAL option is specified, the mysql client reads the file from the client and generates the appropriate SQL to insert the data.

To avoid this problem, it's best either to load the data remotely using the latter syntax or to import the data programmatically. Either option ensures that the inserting is done via normal SQL statements that will all be properly logged.

Starting with Version 4.0, MySQL doesn't have this limitation. When a LOAD DATA INFILE command is issued, MySQL actually copies the entire file into the binary log. Slaves don't need to pull a copy of the original file from the master's disk.

Nonreplicated Dependencies
If you perform binary log filtering on either the master or the slave, it's quite easy to inadvertently break replication. For example, you may want to have a production database called production and a staging database called staging. The idea is to do all the necessary testing, development, and retesting in the staging database. When all the interim work is complete, you copy the data into the production database.

If the slave ignores queries from the staging database because of a filtering rule like the following, you'll probably end up frustrated:

replicate-do-db = production
You might try to run a query like this one to populate one of the production tables:

INSERT INTO production.sales SELECT * FROM staging.sales
This query works fine on the master, but the slaves will all fail because they don't have copies of the staging database. In fact, there's no easy way to make it work. Any attempt to reference the staging database is doomed to fail.

The only real solution in a case like this is to export all the data from the staging database and import it into the production database. You can do this programmatically if you want fine control over the process, or you can simply use mysqldump to dump the data to a text file and reimport it using mysql.

Missing Temporary Tables
This is really a special case of the previous example, but it warrants special attention because the real cause is a bit different. Instead of a filtering problem, this is a problem of restarting the slave at the wrong time.

Temporary tables replicate just fine, but if a series of queries that create and use a temporary table are interrupted on a slave by a restart or by stopping and starting replication, replication will fail.

Temporary tables are, by definition, temporary. When the server is restarted, they vanish. When the thread vanishes (such as with a SLAVE STOP or SLAVE STOP SQL_THREAD command), any temporary tables created by that thread vanish.

There is no good solution for this problem. On the application side, it's best if temporary tables are created as late as possible, which helps minimize the time between the creation of the table and when it is actually needed. But even this solution only decreases the likelihood of the problem occurring.

You can avoid temporary tables completely, but that may involve time-consuming application changes. You'd have to ensure that the nontemporary tables created by your application always have unique names and that they are dropped when appropriate.

Because they are transient, this problem also affects Heap tables. Heap tables must be dropped explicitly, however, so they vanish only when a slave is restarted. Stopping and restarting replication on the slave doesn't affect Heap tables.

Binary Log Out of Sync with Transaction Log
We know that MySQL records queries in the binary log after it executes them. We also know that MySQL writes transactions to the binary log after they have been committed. What happens if MySQL crashes, or someone pulls the plug in the microseconds after a transaction has been committed but before it writes the transaction to the binary log?

The result is that the master will contain the results of having completed the transaction, but the slaves will never see it. Ever. The transaction may have been a simple insert, or it could have been something as dramatic as a DROP TABLE command.

There is currently no workaround for this problem. Luckily MySQL crashes are rare. Make sure the power cables are plugged in tightly!

Slave Wants to Connect to the Wrong Master
If you change the hostname of your master, it's important to tell slaves using the CHANGE MASTER command:

mysql> CHANGE MASTER TO MASTER_HOST='newmaster.example.com';
You can't simply shut down the slave, edit the my.cnf file, and start it back up. MySQL always uses the master.info file if it exists, despite the settings contained in the my.cnf file.[8]

Alternatively, you can manually edit the master.info file, replacing the old hostname with the new one. The danger in relying on this method is that the master.info file can be deprecated, replaced, or radically changed in a future version of MySQL. It's best to stick to the documented way of doing things.

Eliminating the Snapshot
With MySQL's current implementation, it's difficult to add a slave to a master after the master has been running for a long period of time. Many of the original binary logs have probably been removed to save space. Without all the logs, you can't simply configure the slave and point it at the master.

Even if you have all the binary logs on the master, it may take days, weeks, or even months for a slave to execute all the queries and finally catch up to the master. If you're looking to add slaves in a hurry, this clearly isn't the way to do it.

In either case, the ideal solution is simply to configure the new slave and tell it to begin replicating. Behind the scenes, the slave contacts the master and requests copies of all the tables it needs, probably using a mechanism similar to LOAD TABLE FROM MASTER. The master will need a way to track all changes to tables between the time that the slave begins and finishes copying the tables. Upon completion of the copy, the slave receives all the necessary changes and begins replicating from the binary log.

An alternative is for all of MySQL's storage engines to implement a versioning scheme similar to InnoDB's. When a new slave connects and begins to copy the tables, it can get a snapshot from that moment in time. When the copy is complete, the slave can begin replicating from the binary log position corresponding to the moment when the snapshot was marked.

Fail-Safe Replication
When a master fails, you must select a new master and instruct all the slaves to connect to the new master and begin replicating. Not only is that process prone to errors, it can be time-consuming too. Ideally, MySQL should handle failover automatically.

The proposed solution involves each slave registering itself with the master so that the master can keep track of it. Not only will the master know which servers are slaves, it can also keep track of how up to date each slave is. The slaves, in turn, will also keep track of who all the other slaves are.

In the event that the master fails, the slaves can elect a master based on the available information. Ideally, they will find the slave that was the most up to date when the master went down.
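The election step itself is simple to sketch: given each surviving slave's last-applied binlog coordinates (log file name, position), the most up-to-date slave is the one with the greatest pair. This is only an illustration of the idea, not MySQL's actual failover code; the slave names and coordinate values below are invented:

```python
# Sketch: elect the most up-to-date slave as the new master.
# Coordinates compare first by binlog file name, then by position within it;
# tuple comparison in Python does exactly that.

def elect_new_master(slaves):
    """slaves: dict of slave name -> (master_log_file, exec_master_log_pos)."""
    return max(slaves, key=lambda name: slaves[name])

slaves = {
    "slave-a": ("binary-log.003", 991422),
    "slave-b": ("binary-log.004", 120044),   # furthest ahead
    "slave-c": ("binary-log.004", 98310),
}
print(elect_new_master(slaves))  # -> slave-b
```

Lexicographic comparison of the file names works because MySQL numbers binlogs with zero-padded suffixes.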

Safe Multi-Master Replication
Today it's possible to use replication in a multi-master architecture, as depicted earlier (see Figure 7-3). The major drawback to doing so, however, is that you can't rely on AUTO_INCREMENT columns to function properly.

Each MyISAM table has a single counter that controls the next AUTO_INCREMENT value. Once that value has increased, it can't easily be decreased. If inserts are timed properly, they cause data to become inconsistent between the two masters.

Imagine the following events occurring on two servers, master1 and master2:

Both servers start with an empty orders table.

master1 inserts a record for customer 58, which is assigned ID 1.

master2 inserts a record for customer 1232, which is assigned ID 1.

master2 replicates master1's insert, adding the record for customer 58 and trying to assign it an ID of 1. That fails and results in a duplicate key error.

master1 replicates master2's insert, adding the record for customer 1232 and trying to assign it an ID of 1. That fails and results in a duplicate key error.

Each master was given an insert by some client before it had replicated the other master's insert. The result is that both masters are out of sync.
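The sequence above can be replayed with a toy model of MyISAM's single-counter behavior (a sketch only, not MySQL code; `Table` and `replicate_insert` are invented names — in statement-based replication the generated ID travels with the replicated statement, which is what triggers the clash):

```python
class Table:
    def __init__(self):
        self.rows = {}      # id -> customer
        self.counter = 0    # the table's single AUTO_INCREMENT counter

    def insert(self, customer):
        # Local insert: the counter hands out the next ID.
        self.counter += 1
        self.rows[self.counter] = customer
        return self.counter

    def replicate_insert(self, row_id, customer):
        # Replaying the other master's statement: the ID it generated
        # comes along with it, so a clash raises a duplicate-key error.
        if row_id in self.rows:
            raise ValueError("duplicate key %d" % row_id)
        self.rows[row_id] = customer

master1, master2 = Table(), Table()
id1 = master1.insert(58)      # master1 assigns ID 1 to customer 58
id2 = master2.insert(1232)    # master2 independently assigns ID 1 to customer 1232

for tbl, rid, customer in ((master2, id1, 58), (master1, id2, 1232)):
    try:
        tbl.replicate_insert(rid, customer)
    except ValueError as err:
        print(err)            # prints "duplicate key 1" on both masters
```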

The current solution is to avoid using AUTO_INCREMENT fields completely and assign primary keys through some other means. You might use an MD5 hash of some values in the record, or perhaps use another library to generate a globally unique identifier (GUID).
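Both workarounds are easy to sketch in Python; `md5_key` and `guid_key` below are illustrative names, not any particular library's API:

```python
import hashlib
import uuid

def md5_key(customer_id, created_at):
    # Deterministic key derived from values already in the record
    return hashlib.md5(("%s|%s" % (customer_id, created_at)).encode()).hexdigest()

def guid_key():
    # Random GUID: safe to generate independently on every master
    return uuid.uuid4().hex

print(len(md5_key(58, "2010-05-22 11:19:00")))  # 32-char hex string
print(len(guid_key()))                          # also 32 hex chars
```

Either value can then be stored in a CHAR(32) primary key column instead of an AUTO_INCREMENT integer.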

Let's look at the two proposed solutions for the future.

Multipart auto-increment unique keys
The first is to use MyISAM's multipart auto-increment unique keys. Rather than using a single column as a primary key, you'd set up a table like this:

CREATE TABLE orders (
    server_id       INTEGER UNSIGNED NOT NULL,
    record_id       INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    stuff           VARCHAR(255)     NOT NULL,
    UNIQUE mykey (server_id, record_id)
);
Notice that the record_id is an AUTO_INCREMENT field and is the second part of a two-part unique key. When you insert NULL into the record_id column, MySQL will consider the value of server_id when automatically generating a value.

To illustrate this, notice the following:

mysql> insert into orders values (1, NULL, 'testing');
Query OK, 1 row affected (0.01 sec)

mysql> insert into orders values (1, NULL, 'testing');
Query OK, 1 row affected (0.00 sec)

mysql> insert into orders values (2, NULL, 'testing');
Query OK, 1 row affected (0.00 sec)

mysql> select * from orders;
+-----------+-----------+---------+
| server_id | record_id | stuff   |
+-----------+-----------+---------+
|         1 |         1 | testing |
|         1 |         2 | testing |
|         2 |         1 | testing |
+-----------+-----------+---------+
3 rows in set (0.03 sec)
MySQL, in effect, allows you to select from multiple AUTO_INCREMENT sequences based on the prefix you use. By adding a function such as SERVER_ID( ) to MySQL and rewriting the previous queries, you can use AUTO_INCREMENT with multi-master replication safely.

mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');
Query OK, 1 row affected (0.01 sec)

mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');
Query OK, 1 row affected (0.00 sec)

mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');
Query OK, 1 row affected (0.00 sec)
There are three problems with this approach. First, it works only for MyISAM tables. An ideal solution works across all table types. Another issue is that all slaves require some special logic. Today, when a slave reads the binary log of a master, it knows the master's server ID as well as its own, but it doesn't really do anything with the master's server ID. In this solution, the slave has to actually use the master's server ID any time it replicates a query that involves the mythical SERVER_ID( ) function. That makes the replication logic a bit trickier on the slaves.

You could work around the lack of a SERVER_ID( ) function by simply using the actual server ID in your SQL statements. If you know you're talking to server 12, write the query accordingly:

mysql> insert into orders values (12, NULL, 'testing');
Query OK, 1 row affected (0.01 sec)
But there's the rub. You need to know, in advance of each query, what the server's ID is. Granted, the server's ID doesn't change, but if you're accessing one of many servers via a load balancer or don't have a persistent connection, the server you're talking to may change often. So you'd have to deal with the overhead of obtaining the server's ID whenever you need it.

mysql> show variables like 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 102   |
+---------------+-------+
1 row in set (0.00 sec)
Finally, and most importantly, using two columns as the primary key just doesn't feel natural. It feels like a hack or a workaround. If this solution became widespread, other problems might arise. For example, setting up foreign-key relationships would be troublesome. Putting aside the fact that InnoDB doesn't even support multipart auto-increment unique keys, how would you define a foreign-key relationship with multipart keys?
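The per-prefix sequence behavior shown in the earlier SELECT output can be mimicked with a dictionary of counters, one per server_id — a sketch of the semantics only, not of MyISAM's implementation:

```python
from collections import defaultdict

counters = defaultdict(int)   # one AUTO_INCREMENT counter per server_id prefix

def next_record_id(server_id):
    counters[server_id] += 1
    return counters[server_id]

# Same three inserts as the orders example: server 1 twice, then server 2.
rows = [(sid, next_record_id(sid), "testing") for sid in (1, 1, 2)]
print(rows)   # [(1, 1, 'testing'), (1, 2, 'testing'), (2, 1, 'testing')]
```

Note how server 2's record_id restarts at 1, exactly as in the table output above.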

23#
 楼主| 发表于 2010-5-22 11:20 | 只看该作者
Mysql 复制的各种操作 (Various MySQL replication operations) - 我的编程小屋, 2010-04-20



I've been using MySQL replication quite a lot recently; to keep from forgetting, I'm recording the various operations here.
You do not need the line
master-port=<TCP/IP port for master>
in the slave's configuration if you have not changed the port. The default is 3306.

To add a user for the master server, you need this line:
mysql> GRANT FILE ON *.* TO repl@"%" IDENTIFIED BY '<password>';

You should change "%" to the IP address of the SLAVE.
Example: mysql> GRANT FILE ON *.* TO repl@192.168.0.2 IDENTIFIED BY 'yourpass';

A few things I came across while setting up replication:

- Passwords can only be 16 characters long. This will cause 'Access Denied' errors while trying to connect to the master if set too long.

- When running replication, numerous files are created that can cause problems getting back on track if something goes wrong. If there are problems after you edit your my.cnf and restart mysqld, here's some cleaning up that needs to be done while the server is shut down (your file names might differ):

1) On the slave (in the mysql data dir): remove the master.info file, remove all binary files created and their indexes, remove the .err and .pid files, remove the log.info file.

2) On the master (in the mysql data dir): remove all binary files created and their indexes, remove the .err and .pid files.

3) If for some reason you need to redo replication, I have found it is best to tar up the master and put a fresh copy of the database on the slave and start again, rather than trying to resolve every issue the slave spits out. Although it should be noted that this is not always possible - it's a judgement call.

Here is what I had to do to set up replication with LOAD DATA FROM MASTER:

1. Edit the my.cnf file on the slave and on the master server:
master my.cnf:
[mysqld]
server-id = 1
log-bin

slave my.cnf:
[mysqld]
server-id = 2
master-host = master-host.net
master-user = repl
master-password = repl_pass
replicate-do-db = database1
replicate-do-db = database2
log-warnings

2. Restart both the slave and the master.

3. On the master do:
GRANT SUPER,REPLICATION CLIENT,REPLICATION SLAVE,RELOAD ON *.* TO repl@"%" IDENTIFIED BY 'repl_pass';

4. On the slave do:
LOAD DATA FROM MASTER;

The slave should now download the databases specified with replicate-do-db in the slave's my.cnf and begin replicating new data.


Tested on MySQL version 4.0.20

1. Edit the my.cnf file on the slave and on the master server:
-master my.cnf:
[mysqld]
server-id = 1
log-bin

-slave my.cnf:
[mysqld]
server-id = 2
#replicate-do-db = database1 # for Replicating specific databases

2. Restart both the slave and the master.

3. SQL SYNTAX:
--On the master do

mysql> GRANT SUPER,REPLICATION CLIENT,REPLICATION SLAVE,RELOAD ON *.* TO repl@"slave.host" IDENTIFIED BY 'password';

--On the SLAVE do

mysql> CHANGE MASTER TO MASTER_HOST='master.host', MASTER_PORT=3306,
MASTER_USER='repl', MASTER_PASSWORD='password', MASTER_LOG_FILE='', MASTER_LOG_POS=4;

mysql> STOP SLAVE; #--If already started

mysql> START SLAVE;


4. On the slave do:

mysql> LOAD DATA FROM MASTER;

The slave should now download the databases specified with replicate-do-db in the slave's my.cnf and begin replicating new data.

NOTE: Please make sure that the MASTER_USER has the required privileges.
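If you script this setup, the CHANGE MASTER statement from step 3 can be assembled from parameters. The helper below is hypothetical — it is not part of any MySQL client library — and simply formats the same statement shown above:

```python
# Hypothetical helper: build the CHANGE MASTER TO statement from parameters.
# MASTER_LOG_FILE='' with MASTER_LOG_POS=4 mirrors the fresh-setup values above.

def change_master_sql(host, user, password, port=3306, log_file="", log_pos=4):
    return (
        "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, "
        "MASTER_USER='%s', MASTER_PASSWORD='%s', "
        "MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;"
        % (host, port, user, password, log_file, log_pos)
    )

print(change_master_sql("master.host", "repl", "password"))
```

You would then feed the resulting string to the mysql client on the slave.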


These are the configurations I have tried out:


MySQL Failover Circular Replication
===================================
Assume we have 2 servers: Server1 and Server2.


Server1 Settings
================
1. Put the option file my.cnf to Server1 path /etc with these settings:
[mysqld]
log-bin=mysql-bin
server-id = 1
auto_increment_increment = 10
auto_increment_offset = 1


2. Change the mode/permissions of my.cnf to -rw-r--r--, or else mysql will ignore it.
sudo chmod 644 my.cnf

3. Stop and start mysql.
cd /Library/StartupItems/MySQLCOM
sudo ./MySQLCOM stop
sudo ./MySQLCOM start


4. Configure the server:

# create a user for replication process:
cd /usr/local/mysql/bin
./mysql -p -u root

create user replicant@'%' identified by 'password';


# Grant access rights:
GRANT SELECT, PROCESS, FILE, SUPER, REPLICATION CLIENT, REPLICATION SLAVE, RELOAD ON *.* TO replicant@'%';
Flush Privileges;


# Specify the info for server2:
CHANGE MASTER TO
MASTER_HOST='ip_of_server2',
MASTER_USER='replication_user_name_on_server2',
MASTER_PASSWORD='replication_password_on_server2';

# Start the listener:
Start slave;


# Verify whether the replication is working:
show slave status\G


Server2 Settings
================
1. Put the option file my.cnf on to Server2 path /etc
with these settings:
[mysqld]
log-bin=mysql-bin
server-id = 2
auto_increment_increment = 10
auto_increment_offset = 2

2. Change the mode/permissions of my.cnf to -rw-r--r--, or else mysql will ignore it.
sudo chmod 644 my.cnf

3. Stop and start mysql.
cd /Library/StartupItems/MySQLCOM
sudo ./MySQLCOM stop
sudo ./MySQLCOM start


4. Configure the server:

# create a user for replication process:
cd /usr/local/mysql/bin
./mysql -p -u root

create user replicant@'%' identified by 'password';


# Grant access rights:
GRANT SELECT, PROCESS, FILE, SUPER, REPLICATION CLIENT, REPLICATION SLAVE, RELOAD ON *.* TO replicant@'%';
Flush Privileges;


# Specify the info for server1:
CHANGE MASTER TO
MASTER_HOST='ip_of_server1',
MASTER_USER='replication_user_name_on_server1',
MASTER_PASSWORD='replication_password_on_server1';

# Example:
#
# CHANGE MASTER TO MASTER_HOST='125.564.12.1',
# MASTER_USER='replicant', MASTER_PASSWORD='password';



# Load data from Server1:
Load Data from Master;



# Start the listener:
Start slave;
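As a quick sanity check of the auto_increment_increment / auto_increment_offset settings used on the two servers above: each server steps by 10 from its own offset, so their generated IDs can never collide. A small sketch of that arithmetic (the documented offset + N × increment pattern):

```python
# Sketch: IDs generated under auto_increment_increment=10 with different offsets.

def id_sequence(offset, increment=10, count=5):
    # MySQL generates offset, offset+increment, offset+2*increment, ...
    return [offset + k * increment for k in range(count)]

server1_ids = id_sequence(offset=1)   # [1, 11, 21, 31, 41]
server2_ids = id_sequence(offset=2)   # [2, 12, 22, 32, 42]
print(set(server1_ids) & set(server2_ids))   # set() -- no collisions possible
```

Setting the increment to 10 rather than 2 leaves room to add up to ten masters to the ring later without renumbering.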

http://bailkai.blog.163.com/blog/static/8201173620103205456804/

24#
发表于 2010-6-3 22:46 | 只看该作者
After studying MySQL for a while, I feel that MySQL replication is indeed fairly simple, but you still need some third-party software to guarantee data integrity.
For a master + slaves architecture, how does everyone implement the DAL (data access layer)?
