Somacon.com: Articles on websites & etc.

§ Home > Index > Databases

SQL to Upgrade Entire MySQL database to utf8mb4

Here are SQL queries to help migrate an entire MySQL database to the utf8mb4 character set, and utf8mb4_unicode_ci collation.

Warning: Backup your database before running these commands. This article assumes you are a MySQL pro. Do not use if you do not know what you are doing, or on databases you are not familiar with!

The queries below will generate ALTER queries to perform the conversions, for anything that needs to be converted. This way, you can review the queries before running them. If there are no results, then everything is already converted.

HeidiSQL Export Generated Queries

To export the generated queries easily, follow these steps:

  1. Modify the query to only select the GeneratedQuery column
  2. Run the query in HeidiSQL
  3. Select Tools > Export grid rows from the menu
  4. Select Copy to clipboard
  5. Select Delimited text
  6. Select Complete for the row selection
  7. Uncheck Include column names
  8. Click the OK button
  9. Paste the results in a new Query tab

Check current variable settings

The first step in the conversion process is to check your current variables using this query.

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

my.cnf or my.ini Settings to enable utf8mb4

Modify your MySQL server configuration file to enable UTF8MB4 at the server level with the following lines:

skip-character-set-client-handshake
collation_server              = utf8mb4_unicode_ci
character_set_server          = utf8mb4

Upgrade all tables to InnoDB engine and Dynamic row format

This is to support longer key prefixes.

SELECT *,
  CONCAT('ALTER TABLE ',TABLE_SCHEMA, '.',TABLE_NAME,' ENGINE=InnoDB,ROW_FORMAT=Dynamic;') AS GeneratedQuery
FROM INFORMATION_SCHEMA.TABLES WHERE (ENGINE != 'InnoDB' OR ROW_FORMAT != 'Dynamic')
AND TABLE_SCHEMA NOT IN ('sys','performance_schema','mysql','information_schema')
ORDER BY TABLE_SCHEMA;

Upgrade databases to utf8mb4

SELECT *,
CONCAT('ALTER DATABASE ',SCHEMA_NAME,' CHARACTER SET = utf8mb4 COLLATE = UTF8MB4_UNICODE_CI;') AS GeneratedQuery
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME NOT IN ('sys','performance_schema','mysql','information_schema')
	AND (DEFAULT_CHARACTER_SET_NAME != 'utf8mb4'
		OR DEFAULT_COLLATION_NAME != 'utf8mb4_unicode_ci')

Upgrade tables and columns to utf8mb4

The naive approach (and the one I used first) would be to generate queries like this on INFORMATION_SCHEMA.COLUMNS:

ALTER TABLE table_schema.table_name MODIFY `column_name` column_type CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; DO NOT USE QUERIES GENERATED LIKE THIS FROM INFORMATION_SCHEMA.COLUMNS

However, that is a bad way to do it. That query will remove null constraints, column comments, and default settings on the modified columns! This will definitely hose a typical database schema. Instead, use MySQL's handy-dandy ALTER TABLE CONVERT TO command as follows:

SELECT *,
CONCAT('ALTER TABLE `',TABLE_SCHEMA, '`.`',TABLE_NAME,'` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;') AS GeneratedQuery
FROM INFORMATION_SCHEMA.TABLES
WHERE 1=1
	AND TABLE_SCHEMA NOT IN ('sys','performance_schema','mysql','information_schema')
	AND CONCAT(TABLE_SCHEMA,'.',TABLE_NAME) IN
	(
		SELECT CONCAT(TABLE_SCHEMA,'.',TABLE_NAME)
		FROM INFORMATION_SCHEMA.COLUMNS
		WHERE TABLE_SCHEMA NOT IN ('sys','performance_schema','mysql','information_schema')
			AND ((CHARACTER_SET_NAME IS NOT NULL AND CHARACTER_SET_NAME != 'utf8mb4')
					OR (COLLATION_NAME IS NOT NULL AND COLLATION_NAME != 'utf8mb4_unicode_ci'))
	)
ORDER BY TABLE_SCHEMA, TABLE_NAME;

This will generate the queries for each table that has any column that needs to be updated. The generated queries will update both the table and all the columns in the table.

NOTE: TEXT columns may be converted automatically to MEDIUMTEXT.

NOTE: This command could cause data loss if the character sets are incompatible. You still need to know what you are doing.

Thanks to Mathias Bynens for his article: How to support full Unicode in MySQL databases

See also Set MySQL character set and collation to UTF-8 (RDS)


Have you heard of the new, free Automated Feeds offered by Google Merchant Center? Learn more in Aten Software's latest blog post comparing them to traditional data feed files.
Created 2020-05-04, Last Modified 2020-05-08, © Shailesh N. Humbad
Disclaimer: This content is provided as-is. The information may be incorrect.