Source-Replica Tutorial
Last Update: January 2021
Exercise 7
In this exercise you will: intentionally remove data from the replica, re-checksum the cluster, and fix the data drift.
-
7.1 - Remove Data from the Replica
Under normal production environments, your replicas are ran in read-only mode (if they aren’t, fix this!). There are two ways you can override the read-only flag: replication and any user with SUPER privilege.
Let’s log in to our MySQL-DB2 instance and remove some data. We can do this because we have the SUPER privelege on our ‘root’ account.
mysql-db2> DELETE FROM imdb.company_name WHERE RAND()*100 < 5;
This will remove several thousand rows from the table.
-
7.2 - Detect Replica Drift
Now, let’s log back in to our MySQL-DB1 (source) instance and re-run our checksum. If we suspect a single table might have some problems in it, we can checksum just that one table to save some time.
pt-table-checksum h=localhost -d imdb -t company_name -u checksum -p Checksum1# \ --no-check-binlog-format --no-version-check
You’ll notice the command is almost the same as the previous exercise but with the addition of -d and -t which allow you to specify a single database and table or multiple databases and tables in a comma-separated list.
You should have seen similar to this as your output:
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 10-11T15:26:02 0 1 241457 5 0 3.661 imdb.company_name
Under the DIFFS column, it shows 1, indicating 1 of the 5 chunks had a problem on the replica.
-
7.3 - Fixing Replica Drift
Now that we have successfully detected the drift, we need to fix it. We do that with another Percona Toolkit tool named pt-table-sync.
pt-table-checksum saves all of the chunk/checksum information to a table called checksums in the percona database. pt-table-sync is able to read this data and isolate which chunk is bad and then do a row-by-row comparison to find that specific row (or rows) that may be missing or have incorrect column-level values.
Let’s use pt-table-sync to first display the SQL necessary to fix the replica. First, log back in to your MySQL-DB1 (source) instance.
pt-table-sync --replicate percona.checksums h=localhost,u=checksum,p=Checksum1# \ -d imdb -t company_name --no-version-check --print | wc -l
The output redirect (ie: pipe) to word count is optional. It serves as a quick check that the tool will execute the same number of SQL statements, as the number of rows we deleted originally, to fix the replica. You use –print to sanity check what will be executed on the source.
Remember that last part: executed on the source
We NEVER execute directly on the replica. We always execute on the source and let the changes flow through replication. Because the source already has this data, the command is considered a no-op (no operation) and doesn’t incur any significant load on the source. The DML still flows thru replication down to all replicas.
Run the tool again, this time specifying –execute to actually execute each statement.
pt-table-sync --replicate percona.checksums h=localhost,u=checksum,p=Checksum1# \ -d imdb -t company_name --execute --verbose
You should see output similar to this:
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 1001 0 0 Chunk 18:16:45 18:16:46 2 imdb.company_name
We can see here that 1001 REPLACE statements were executed. Let’s verify the checksum again.
-
7.4 - Verify Fixed Drift
Let’s run another checksum on just this table and verify our pt-table-sync worked.
pt-table-checksum h=localhost -d imdb -t company_name -u checksum -p Checksum1# \ --no-check-binlog-format --no-version-check TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 10-11T18:18:48 0 0 241457 5 0 2.670 imdb.company_name
Excellent! Our checksum now show no differences between our source and our replica.