Source-Replica Tutorial

Last Update: January 2021

Exercise 7

In this exercise you will: intentionally remove data from the replica, re-checksum the cluster, and fix the data drift.

7.1 - Remove Data from the Replica

Under normal production environments, your replicas are ran in read-only mode (if they aren’t, fix this!). There are two ways you can override the read-only flag: replication and any user with SUPER privilege.

Let’s log in to our MySQL-DB2 instance and remove some data. We can do this because we have the SUPER privelege on our ‘root’ account.

mysql-db2> DELETE FROM imdb.company_name WHERE RAND()*100 < 5;

This will remove several thousand rows from the table.
7.2 - Detect Replica Drift

Now, let’s log back in to our MySQL-DB1 (source) instance and re-run our checksum. If we suspect a single table might have some problems in it, we can checksum just that one table to save some time.
```
pt-table-checksum  h=localhost -d imdb -t company_name -u checksum -p Checksum1# \
  --no-check-binlog-format --no-version-check
```
You’ll notice the command is almost the same as the previous exercise but with the addition of -d and -t which allow you to specify a single database and table or multiple databases and tables in a comma-separated list.

You should have seen similar to this as your output:
```
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-11T15:26:02      0      1   241457       5       0   3.661 imdb.company_name
```
Under the DIFFS column, it shows 1, indicating 1 of the 5 chunks had a problem on the replica.
7.3 - Fixing Replica Drift

Now that we have successfully detected the drift, we need to fix it. We do that with another Percona Toolkit tool named pt-table-sync.

pt-table-checksum saves all of the chunk/checksum information to a table called checksums in the percona database. pt-table-sync is able to read this data and isolate which chunk is bad and then do a row-by-row comparison to find that specific row (or rows) that may be missing or have incorrect column-level values.

Let’s use pt-table-sync to first display the SQL necessary to fix the replica. First, log back in to your MySQL-DB1 (source) instance.
```
pt-table-sync --replicate percona.checksums h=localhost,u=checksum,p=Checksum1# \
  -d imdb -t company_name --no-version-check --print | wc -l
```
The output redirect (ie: pipe) to word count is optional. It serves as a quick check that the tool will execute the same number of SQL statements, as the number of rows we deleted originally, to fix the replica. You use –print to sanity check what will be executed on the source.

Remember that last part: executed on the source

We NEVER execute directly on the replica. We always execute on the source and let the changes flow through replication. Because the source already has this data, the command is considered a no-op (no operation) and doesn’t incur any significant load on the source. The DML still flows thru replication down to all replicas.

Run the tool again, this time specifying –execute to actually execute each statement.
```
pt-table-sync --replicate percona.checksums h=localhost,u=checksum,p=Checksum1# \
  -d imdb -t company_name --execute --verbose
```
You should see output similar to this:
```
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0    1001      0      0 Chunk     18:16:45 18:16:46 2    imdb.company_name
```
We can see here that 1001 REPLACE statements were executed. Let’s verify the checksum again.

7.4 - Verify Fixed Drift

Let’s run another checksum on just this table and verify our pt-table-sync worked.

pt-table-checksum  h=localhost -d imdb -t company_name -u checksum -p Checksum1# \
  --no-check-binlog-format --no-version-check
    
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-11T18:18:48      0      0   241457       5       0   2.670 imdb.company_name

Excellent! Our checksum now show no differences between our source and our replica.