Get started with Data Masking¶
This quickstart gets the data masking component installed and lets you try the component with a test database. You need access to a Percona Server for MySQL 8.4 server. Follow the steps to install, set permissions, create sample data, and run masking examples. For production use and limitations, see the Data masking overview.
Prerequisites¶
- Percona Server for MySQL (latest recommended).
- The
component_masking_functionslibrary in the server plugin directory. RunSHOW VARIABLES LIKE 'plugin_dir';and confirm the library exists in that directory. If the library is missing, see Install the data masking component.
Install the component and set permissions¶
Do these once, in order. Full details: Install the data masking component.
-
Create the dictionary table (required before step 2):
CREATE TABLE IF NOT EXISTS mysql.masking_dictionaries( Dictionary VARCHAR(256) NOT NULL, Term VARCHAR(256) NOT NULL, UNIQUE INDEX dictionary_term_idx (Dictionary, Term) ) ENGINE = InnoDB DEFAULT CHARSET=utf8mb4;Expected output
Query OK, 0 rows affected (0.01 sec) -
Install the component:
INSTALL COMPONENT 'file://component_masking_functions';Expected output
Query OK, 0 rows affected (0.00 sec) -
Grant
mysql.sessionaccess to the dictionary table (required forgen_dictionaryandgen_blockliston 8.4.4-1 and later):GRANT SELECT, INSERT, UPDATE, DELETE ON mysql.masking_dictionaries TO 'mysql.session'@'localhost';Expected output
Query OK, 0 rows affected (0.00 sec) -
Grant dictionary management to your user (required for the dictionary-based masking section below; skip if you will not try that section). Replace
<user>and<host>with the MySQL user you connect as (for example,rootandlocalhost):GRANT MASKING_DICTIONARIES_ADMIN ON *.* TO '<user>'@'<host>';Expected output
Query OK, 0 rows affected (0.00 sec)No
FLUSH PRIVILEGESis needed; the grant takes effect immediately. If the privilege does not appear, reconnect to the server.
Create the test database and tables¶
Run the following script to create the database and both tables with sample rows. The script drops the database if the database already exists so you can run the script again without creating duplicate rows. The column types and values are chosen so you can try masking functions in the next sections.
DROP DATABASE IF EXISTS masking_demo;
CREATE DATABASE masking_demo;
USE masking_demo;
CREATE TABLE contacts (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
ssn VARCHAR(11),
card_no VARCHAR(19),
email VARCHAR(255),
phone VARCHAR(20),
notes VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO contacts (name, ssn, card_no, email, phone, notes)
VALUES
('Priya Sharma', '123-45-6789', '4111111111111111', 'priya.sharma@example.com', '1-555-010-1234', 'VIP'),
('Yuki Tanaka', '987-65-4321', '5500000000000004', 'yuki.tanaka@example.com', '1-555-010-5678', 'Newsletter'),
('Fatima Hassan', '111-22-3333', '340000000000009', 'fatima.hassan@example.com', '1-555-010-9012', NULL);
CREATE TABLE contacts_intl (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
country VARCHAR(50),
canada_sin VARCHAR(11),
uk_nin VARCHAR(20),
iban VARCHAR(40)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO contacts_intl (name, country, canada_sin, uk_nin, iban)
VALUES
('Jean Lefebvre', 'Canada', '506-948-819', NULL, NULL),
('Siobhan O''Brien', 'UK', NULL, 'CT264683D', NULL),
('Hans Mueller', 'Germany', NULL, NULL, 'DE27 1002 02003 77495 4156');
Expected output
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Database changed
Query OK, 0 rows affected (0.01 sec)
Query OK, 3 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 3 rows affected (0.00 sec)
View raw data¶
SELECT id, name, ssn, card_no, email, phone FROM contacts;
Expected output
+----+--------------+-------------+------------------+-------------------------+----------------+
| id | name | ssn | card_no | email | phone |
+----+--------------+-------------+------------------+-------------------------+----------------+
| 1 | Priya Sharma | 123-45-6789 | 4111111111111111 | priya.sharma@example.com| 1-555-010-1234 |
| 2 | Yuki Tanaka | 987-65-4321 | 5500000000000004 | yuki.tanaka@example.com | 1-555-010-5678 |
| 3 | Fatima Hassan| 111-22-3333 | 340000000000009 | fatima.hassan@example.com| 1-555-010-9012 |
+----+--------------+-------------+------------------+-------------------------+----------------+
3 rows in set (0.00 sec)
Mask sensitive columns in queries¶
The examples below use masking functions in SELECT; the table data is unchanged. Expected outputs in this guide are illustrative; results from random or dictionary-based functions vary on each run and may not match the examples. To enforce masking for other users, use a view (see Expose masked data via a view) and restrict access to the base table. For limitations and security, see the Data masking overview.
Payment card and SSN¶
mask_pan leaves the last four digits visible; mask_ssn does the same for SSN.
SELECT
name,
mask_pan(card_no) AS card_masked,
mask_ssn(ssn) AS ssn_masked
FROM contacts;
Expected output
+--------------+------------------+-------------+
| name | card_masked | ssn_masked |
+--------------+------------------+-------------+
| Priya Sharma | XXXXXXXXXXXX1111 | ***-**-6789 |
| Yuki Tanaka | XXXXXXXXXXXX0004 | ***-**-4321 |
| Fatima Hassan| XXXXXXXXXX0009 | ***-**-3333 |
+--------------+------------------+-------------+
3 rows in set (0.00 sec)
Inner and outer parts of a string¶
mask_inner keeps the first and last character(s); mask_outer masks the left and right ends. The third and fourth arguments are the number of characters to keep unmasked on the left and right.
SELECT
name,
mask_inner(name, 1, 1) AS name_inner_masked,
mask_outer(phone, 2, 4) AS phone_outer_masked
FROM contacts;
Expected output
+--------------+------------------+------------------+
| name | name_inner_masked| phone_outer_masked|
+--------------+------------------+------------------+
| Priya Sharma | PXXXXXXXXXXa | XX5-010-1234 |
| Yuki Tanaka | YXXXXXXXXa | XX5-010-5678 |
| Fatima Hassan| FXXXXXXXXXXn | XX5-010-9012 |
+--------------+------------------+------------------+
3 rows in set (0.00 sec)
Generated values (for testing)¶
gen_rnd_email() and gen_rnd_ssn() return new values each time. They are useful for generating test data, not for masking existing values in place.
SELECT
id,
gen_rnd_email(5, 8, 'test.example') AS generated_email,
gen_rnd_ssn() AS generated_ssn
FROM contacts;
Expected output
+----+---------------------------+-------------+
| id | generated_email | generated_ssn|
+----+---------------------------+-------------+
| 1 | abcde.fghijkl@test.example| 912-34-5678 |
| 2 | vwxyz.abcdefg@test.example| 987-65-4321 |
| 3 | qwert.yuiopas@test.example| 923-45-6789 |
+----+---------------------------+-------------+
3 rows in set (0.00 sec)
Output will vary on each run and may not match the expected results above. For constraints and caveats, see Data masking component functions and variables.
International data masking functions¶
The component includes functions for region-specific identifiers: Canadian Social Insurance Number (SIN) (mask_canada_sin), United Kingdom National Insurance Number (NIN) (mask_uk_nin), and International Bank Account Number (IBAN) (mask_iban). The script above already created the contacts_intl table. View the raw data, then apply the region-specific masking functions:
SELECT name, country, canada_sin, uk_nin, iban FROM contacts_intl;
Expected output
+----------------+---------+-------------+----------------+------------------------------+
| name | country | canada_sin | uk_nin | iban |
+----------------+---------+-------------+----------------+------------------------------+
| Jean Lefebvre | Canada | 506-948-819 | NULL | NULL |
| Siobhan O'Brien| UK | NULL | CT264683D | NULL |
| Hans Mueller | Germany | NULL | NULL | DE27 1002 02003 77495 4156 |
+----------------+---------+-------------+----------------+------------------------------+
3 rows in set (0.00 sec)
SELECT
name,
country,
mask_canada_sin(canada_sin) AS canada_sin_masked,
mask_uk_nin(uk_nin) AS uk_nin_masked,
mask_iban(iban) AS iban_masked
FROM contacts_intl;
Expected output
+----------------+---------+------------------+---------------+----------------------------+
| name | country | canada_sin_masked| uk_nin_masked | iban_masked |
+----------------+---------+------------------+---------------+----------------------------+
| Jean Lefebvre | Canada | XXX-XXX-XXX | NULL | NULL |
| Siobhan O'Brien| UK | NULL | CT******* | NULL |
| Hans Mueller | Germany | NULL | NULL | DE** **** **** **** **** |
+----------------+---------+------------------+---------------+----------------------------+
3 rows in set (0.00 sec)
Each function returns NULL when the input is NULL. For more options (for example, custom mask characters) and the corresponding generators (gen_rnd_canada_sin, gen_rnd_uk_nin, gen_rnd_iban), see Data masking component functions and variables.
Expose masked data via a view¶
To ensure that only masked data is visible, define a view that applies the masking functions and grant SELECT on the view (not on the base table) to users who should see masked data. Example:
DROP VIEW IF EXISTS contacts_masked;
CREATE VIEW contacts_masked AS
SELECT
id,
name,
mask_pan(card_no) AS card_no,
mask_ssn(ssn) AS ssn,
mask_outer(email, 2, 4) AS email,
mask_outer(phone, 2, 4) AS phone,
notes
FROM contacts;
SELECT id, name, card_no, ssn, email, phone FROM contacts_masked;
Expected output
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
+----+--------------+------------------+-------------+--------------------------+------------------+
| id | name | card_no | ssn | email | phone |
+----+--------------+------------------+-------------+--------------------------+------------------+
| 1 | Priya Sharma | XXXXXXXXXXXX1111 | ***-**-6789 | XXriya.sharma@exampXXXX | XX5-010-1234 |
| 2 | Yuki Tanaka | XXXXXXXXXXXX0004 | ***-**-4321 | XXuki.tanaka@exampXXXX | XX5-010-5678 |
| 3 | Fatima Hassan| XXXXXXXXXX0009 | ***-**-3333 | XXtima.hassan@examXXXX | XX5-010-9012 |
+----+--------------+------------------+-------------+--------------------------+------------------+
3 rows in set (0.00 sec)
The view uses mask_outer(..., 2, 4) for email and phone. Grant SELECT on the view (not the base table) to users who should see masked data only. For access control and limitations, see the Data masking overview.
Optional: dictionary-based masking¶
Functions such as gen_dictionary and gen_blocklist use the mysql.masking_dictionaries table.
You need the step 4 grant (MASKING_DICTIONARIES_ADMIN) and, on 8.4.4-1 and later, the step 3 grant to mysql.session described in the install guide.
Add a small dictionary with masking_dictionary_term_add, then try gen_dictionary.
The role column is chosen at random from the dictionary; your output will vary and may not match the expected results.
SELECT masking_dictionary_term_add('roles', 'Engineer');
SELECT masking_dictionary_term_add('roles', 'Analyst');
SELECT masking_dictionary_term_add('roles', 'Manager');
SELECT id, name, gen_dictionary('roles') AS role FROM contacts;
Expected output (role column varies; your output may not match)
+-----------------------------------------------+
| masking_dictionary_term_add('roles', 'Engineer')|
+-----------------------------------------------+
| 1 |
+-----------------------------------------------+
1 row in set (0.00 sec)
+----------------------------------------------+
| masking_dictionary_term_add('roles', 'Analyst')|
+----------------------------------------------+
| 1 |
+----------------------------------------------+
1 row in set (0.00 sec)
+----------------------------------------------+
| masking_dictionary_term_add('roles', 'Manager') |
+----------------------------------------------+
| 1 |
+----------------------------------------------+
1 row in set (0.00 sec)
+----+---------------+----------+
| id | name | role |
+----+---------------+----------+
| 1 | Priya Sharma | Manager |
| 2 | Yuki Tanaka | Engineer |
| 3 | Fatima Hassan | Manager |
+----+--------------+----------+
3 rows in set (0.00 sec)
To remove the dictionary when finished: SELECT masking_dictionary_remove('roles'); (masking_dictionary_remove). For large dictionaries or production use, see the Data masking overview and function list.
Clean up (optional)¶
To remove the test database and all objects in the database (tables, the contacts_masked view, and data):
DROP DATABASE IF EXISTS masking_demo;
Expected output
Query OK, 0 rows affected (0.01 sec)