Somacon.com: Articles on websites & etc.

§ Home > Index > Web Development

Set MySQL character set and collation to UTF-8 (RDS)

These are notes for how to set UTF-8 (UTF8) collation and character set in MySQL database server. There are also notes on how to set the dfeault collation and characters set to UTF-8 on Amazon RDS MySQL database instances.

utf8_unicode_ci vs utf8_general_ci collation

In general, utf8_unicode_ci should always be used. It provides correct sorting in all languages. The utf8_general_ci collation takes shortcuts and does not sort exactly right in certain languages, and its only benefit is a tiny performance increase.

Notes


# Show current collation variable settings
SHOW VARIABLES LIKE '%colla%';

# This is the output you want to see
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+

# Show current character set variable settings
SHOW VARIABLES LIKE '%char%';

# This is the output you want to see
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+

# Change the MySQL database collation to UTF-8
ALTER DATABASE my_database DEFAULT COLLATE utf8_unicode_ci;

# Change the MySQL database character set to UTF-8
ALTER DATABASE my_database DEFAULT CHARACTER SET utf8;

# These settings should be in my.cnf
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
collation_connection=utf8_unicode_ci
collation_database=utf8_unicode_ci
character_set_server=utf8
character_set_database=utf8

Amazon RDS Set Database Parameters

# You can create a DB PARAMETER GROUP FROM AMAZON RDS CONSOLE

# Download RDS CLI command line tools
wget http://s3.amazonaws.com/rds-downloads/RDSCli.zip

# Unzip the tools
unzip RDSCli.zip

# Check the version of Java (must be at least 1.6 for RDSCLI 1.12+)
java -version

# Add the following lines to a batch file called setup-rds-environment.bat
# Double-click this batch file to open a pre-prepared command prompt
# It will show RDS status upon opening
SET AWS_RDS_HOME="C:\RDSCli\RDSCli-1.12.001"
SET JAVA_HOME="C:\Program Files\Java\jre6"
SET AWS_CREDENTIAL_FILE="C:\RDSCli\aws-credential-file.txt"
CD C:\RDSCli\RDSCli-1.12.001\bin
CMD /K "rds-describe-db-instances"


# Check if rds tools are working
rds --help

# Show current status of database instances (rebooting, active, etc.)
rds-describe-db-instances

# Use the following to specify the 
#  --I ACCESSKEY --S "SECRETKEY"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=max_allowed_packet, value=67108864, method=immediate"

# must be 'pending-reboot'
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_buffer_pool_size, value=268435456, method=pending-reboot"

# Does not work
# rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_log_file_size, value=67108864, method=pending-reboot"


rds-modify-db-parameter-group "my-database-param-group" --parameters "name=character_set_server, value=utf8, method=immediate"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=collation_server, value=utf8_unicode_ci, method=immediate"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=character_set_database, value=utf8, method=immediate"

rds-modify-db-parameter-group "my-database-param-group" --parameters "name=collation_connection, value=utf8_unicode_ci, method=immediate"

# Note, collation_database is not a valid setting

# You can assigin DB PARAMETER GROUP FROM AMAZON RDS CONSOLE

# Reboot the MySQL database
# Reboot is also possible from the Amazon RDS Console
rds-reboot-db-instance "my-database-param-groups"

------------------------------------------------------------

# the following parameter must be 'pending-reboot'
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_buffer_pool_size, value=134217728, method=pending-reboot"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=myisam_sort_buffer_size, value=1048576, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=key_buffer_size, value=8388608, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=max_connections, value=24, method=immediate"



DEFAULTS-except max_allowed_packet-----------------------------------------------------------
# must be 'pending-reboot'
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=innodb_buffer_pool_size, value=402653184, method=pending-reboot"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=myisam_sort_buffer_size, value=8388608, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=key_buffer_size, value=16777216, method=immediate"
rds-modify-db-parameter-group "my-database-param-group" --parameters "name=max_connections, value=48, method=immediate"


Altering MySQL database, table, and column collations to utf8_unicode_ci


# Show database collations
select * from information_schema.schemata;

# Alter database collations
ALTER DATABASE dbname DEFAULT COLLATE utf8_unicode_ci;

# Show tables using utf8_general_ci
select distinct table_schema, table_name, table_collation from information_schema.tables where table_collation = 'utf8_general_ci' and table_type != 'SYSTEM VIEW' AND table_schema != 'mysql';

# Alter table collation (note, this does not change the data in the columns. 
#     you must convert the data using the command below)
ALTER TABLE dbname.tablename COLLATE utf8_unicode_ci;

# Show columns
select table_schema, table_name, column_name, collation_name, character_set_name
from information_schema.columns 
where table_schema not in ('information_schema','mysql','tmp') 
and collation_name is not null and character_set_name is not null 
and collation_name='utf8_general_ci';

# Alter the data of all columns in a table
# When converting UTF-8 data stored as latin1, this command will NOT convert that to UTF-8
#  For example, a 3-byte UTF-8 character stored as 3 latin1 characters will not be converted
#   to a single UTF-8 character.  It will be converted to 3 separate UTF-8 characters 
#  representing each latin1 (gibberish) characters. You'll need to reload the data into the table.
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;



Have you heard of the new, free Automated Feeds offered by Google Merchant Center? Learn more in Aten Software's latest blog post comparing them to traditional data feed files.
Created 2013-01-31, Last Modified 2016-12-01, © Shailesh N. Humbad
Disclaimer: This content is provided as-is. The information may be incorrect.