Somacon.com: Articles on websites & etc.

§ Home > Index > Web Development

Set MySQL character set and collation to UTF-8 (RDS)

These are notes for how to set UTF-8 (UTF8) collation and character set in MySQL database server. There are also notes on how to set the dfeault collation and characters set to UTF-8 on Amazon RDS MySQL database instances.

utf8_unicode_ci vs utf8_general_ci collation

In general, utf8_unicode_ci should always be used. It provides correct sorting in all languages. The utf8_general_ci collation takes shortcuts and does not sort exactly right in certain languages, and its only benefit is a tiny performance increase.

Notes


# Show current collation variable settings
SHOW VARIABLES LIKE '%colla%';

# This is the output you want to see
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+

# Show current character set variable settings
SHOW VARIABLES LIKE '%char%';

# This is the output you want to see
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+

# Change the MySQL database collation to UTF-8
ALTER DATABASE my_database DEFAULT COLLATE utf8_unicode_ci;

# Change the MySQL database character set to UTF-8
ALTER DATABASE my_database DEFAULT CHARACTER SET utf8;

# These settings should be in my.cnf
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
collation_connection=utf8_unicode_ci
collation_database=utf8_unicode_ci
character_set_server=utf8
character_set_database=utf8

Amazon RDS Set Database Parameters

# You can create a DB PARAMETER GROUP FROM AMAZON RDS CONSOLE

# Download RDS CLI command line tools
wget http://s3.amazonaws.com/rds-downloads/RDSCli.zip

# Unzip the tools
unzip RDSCli.zip

# Check the version of Java (must be at least 1.6 for RDSCLI 1.12+)
java -version

# Add the following lines to a batch file called setup-rds-environment.bat
# Double-click this batch file to open a pre-prepared command prompt
# It will show RDS status upon opening
SET AWS_RDS_HOME="C:\RDSCli\RDSCli-1.12.001"
SET JAVA_HOME="C:\Program Files\Java\jre6"
SET AWS_CREDENTIAL_FILE="C:\RDSCli\aws-credential-file.txt"
CD C:\RDSCli\RDSCli-1.12.001\bin
CMD /K "rds-describe-db-instances"


# Check if rds tools are working
rds --help

# Show current status of database instances (rebooting, active, etc.)
rds-describe-db-instances

# Use the following to specify the 
#  --I ACCESSKEY --S "SECRETKEY"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=max_allowed_packet, value=67108864, method=immediate"

# must be 'pending-reboot'
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_buffer_pool_size, value=268435456, method=pending-reboot"

# Does not work
# rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_log_file_size, value=67108864, method=pending-reboot"


rds-modify-db-parameter-group "my-database-param-group" --parameters "name=character_set_server, value=utf8, method=immediate"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=collation_server, value=utf8_unicode_ci, method=immediate"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=character_set_database, value=utf8, method=immediate"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=collation_connection, value=utf8_unicode_ci, method=immediate"

# Note, collation_database is not a valid setting

# You can assigin DB PARAMETER GROUP FROM AMAZON RDS CONSOLE

# Reboot the MySQL database
# Reboot is also possible from the Amazon RDS Console
rds-reboot-db-instance "my-database-param-groups"

------------------------------------------------------------

# the following parameter must be 'pending-reboot'
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_buffer_pool_size, value=134217728, method=pending-reboot"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=myisam_sort_buffer_size, value=1048576, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=key_buffer_size, value=8388608, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=max_connections, value=24, method=immediate"



DEFAULTS-except max_allowed_packet-----------------------------------------------------------
# must be 'pending-reboot'
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_buffer_pool_size, value=402653184, method=pending-reboot"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=myisam_sort_buffer_size, value=8388608, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=key_buffer_size, value=16777216, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=max_connections, value=48, method=immediate"


Altering MySQL database, table, and column collations to utf8_unicode_ci


# Show database collations
select * from information_schema.schemata;

# Alter database collations
ALTER DATABASE dbname DEFAULT COLLATE utf8_unicode_ci;

# Show tables using utf8_general_ci
select distinct table_schema, table_name, table_collation from information_schema.tables where table_collation = 'utf8_general_ci' and table_type != 'SYSTEM VIEW' AND table_schema != 'mysql';

# Alter table collation (note, this does not change the data in the columns. 
#     you must convert the data using the command below)
ALTER TABLE dbname.tablename COLLATE utf8_unicode_ci;

# Show columns
select table_schema, table_name, column_name, collation_name, character_set_name
from information_schema.columns 
where table_schema not in ('information_schema','mysql','tmp') 
and collation_name is not null and character_set_name is not null 
and collation_name='utf8_general_ci';

# Alter the data of all columns in a table
# When converting UTF-8 data stored as latin1, this command will NOT convert that to UTF-8
#  For example, a 3-byte UTF-8 character stored as 3 latin1 characters will not be converted
#   to a single UTF-8 character.  It will be converted to 3 separate UTF-8 characters 
#  representing each latin1 (gibberish) characters. You'll need to reload the data into the table.
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;



Created 2013-01-31, Last Modified 2016-12-01, © Shailesh N. Humbad
Disclaimer: This content is provided as-is. The information may be incorrect.