miniBB ® 

miniBB

®
Support Forums
  
 | Start | Register | Search | Statistics | File Bank | Manual |
The Other miniBB Support Forums / The Other /  
 

How to store 4 byte unicode characters (emojis)?

 
Author Karel
Partaker
#1 | Posted: 28 Sep 2020 21:40 
Recently someone tried to post a message in my forum, containing only one character:

πŸ‘πŸ»

I've got an e-mail notification about this (the symbol was properly displayed in the e-mail body).

However when I checked the forum itself, I've discovered that the post is empty (minibbtable_posts.post_text contained an empty string, instead of πŸ‘πŸ»).

When I tried to insert the πŸ‘πŸ» into the minibbtable_posts.post_text manually (using phpMyAdmin), I've got the following warning:

Warning: #1366 Incorrect string value: '\xF0\x9F\x91\x8D\xF0\x9F...' for column 'post_text' at row 1

and instead of πŸ‘πŸ», a couple of question marks ?? was inserted.

At first I thought that it was a bug, but when I tested it in your demo forum, everything was ok. So it seems like a problem on my side.

By default, I use utf8_czech_ci collation for string type columns (in miniBB tables and all others). When I change collation to utf8mb4_czech_ci for example (on table column level), i. e. something like

ALTER TABLE `minibbtable_posts` CHANGE `post_text` `post_text` TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_czech_ci NOT NULL;

I can insert everything, including πŸ‘πŸ», πŸ’© etc. But only in phpMyAdmin. When I submit a post using miniBB, πŸ‘πŸ»s are replaced with question marks.

So the question is, what is the proper database/miniBB setup for proper 4 byte characters support.

Author Karel
Partaker
#2 | Posted: 13 Oct 2020 13:26 
UPDATE: After migrating to new server with MariaDB 10.3.17, miniBB "crashes" when one tries to submit post containing a 4-byte emoji:

post body:
Unicode Character "πŸ‘" (U+1F44D)

clicking on "Post message" results in this database level error:
insert into minibbtable_posts (forum_id,topic_id,poster_id,poster_name,post_text,post_time,poster_ip,post_status) values ('3','9068','1','Admin','Unicode Character "Δ‘ΕΊ'Ε€Δ‘ΕΊΕΉΒ»" (U+1F44D)','2020-10-13 12:21:45','37.221.243.246',0)
Incorrect string value: '\xF0\x9F\x91\x8D\xF0\x9F...' for column `www-eknih`.`minibbtable_posts`.`post_text` at row 1. Try another name or value.


Other than that, the forum behaves just fine after the upgrade (i. e. suports unicode messages, without emojis).

Author tom322
Active Member
#3 | Posted: 13 Oct 2020 16:46 

Author Karel
Partaker
#4 | Posted: 13 Oct 2020 17:31 
tom322:
Possible solution:
Thanks for the reply. My server character set is cp1252 and I most probably cannot change that (webhosting limitations), but it seems that it doesn't matter and that altering the tables

ALTER TABLE `minibbtable_posts` CHANGE `post_text` `post_text` TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_czech_ci NOT NULL;
ALTER TABLE `minibbtable_topics` CHANGE `topic_title` `topic_title` VARCHAR(700) CHARACTER SET utf8mb4 COLLATE utf8mb4_czech_ci NOT NULL DEFAULT '';

in combination with changing the value of mysql_set_charset parameter in setup_options.php

$mysql_set_charset='utf8mb4';

does the trick, I can submit posts containing emojis now. So the problem seems to be solved.

Author Paul
Lead Developer 
#5 | Posted: 15 Oct 2020 13:13 
Karel:
So the problem seems to be solved.
Thanks for posting the solution and solving this; however just for any case, I'd like to mention that on our forum, as well as on the demo forum, we do not have anything specific. 'character_set'_*' and 'collation_*' values in DB are mostly latin1 (default); 'post_text' and 'topic_title' are utf8_unicode_ci. As you see, it all works for emojis. But maybe it works only for specific emojis and not all :-)

I'd recommend more reading in a dedicated Manual Chapter; you could use some test scripts to analyze your database. Resetting everything to 'utf8mb4' seems to be a good solution, 'cause 'utf8' (alias 'utf8mb3') uses only up to 3 bytes per symbol storage, however 'utf8mb4' uses 4 bytes. This allows to store emojis and some rich symbols from languages like Chinese.

I will consider switching default miniBB setup to 'utf8mb4' too, seems to be the right time for it.

Also it seems in mySQL 8.0 and up, the 'utf8' alias will handle the 'utf8mb4' by default. Take care of your mySQL version.

Author Karel
Partaker
#6 | Posted: 15 Oct 2020 23:47 
Paul:
I'd recommend more reading in a dedicated Manual Chapter; you could use some test scripts to analyze your database.
...
Take care of your mySQL version.
I am glad it works and I do not mind a bit of black magic in the background which I do not fully understand. :)

I tried to post the same emoji to several forums. miniBB demo forum works well. This forum works well too (test β€” πŸ±πŸ”«). Even my prehistoric installation still running 10 year old version of miniBB with no mention of utf8mb4 anywhere works well (stores and shows problematic emojis correctly, without need to change any default setting):

--

1) Old semi-abandoned forum, miniBB 2.5a, MySQL 5.5.60, miniBB tables created with utf8_general_ci collation, there is no $mysql_set_charset in setup_options.php.

dbs_check.php says:
character_set_client: latin1
character_set_connection: latin1
character_set_database: latin1
character_set_filesystem: binary
character_set_results: latin1
character_set_server: latin1
character_set_system: utf8
character_sets_dir: /usr/share/mysql/charsets/
collation_connection: latin1_swedish_ci
collation_database: latin1_swedish_ci
collation_server: latin1_swedish_ci


Emojis work flawlessly.

--

2) Current actively used forum, miniBB 3.4.2, MariaDB 10.3.17, miniBB tables created with utf8_czech_ci collation, $mysql_set_charset was set to 'utf8' in setup_options.php.

dbs_check.php says:
character_set_client: utf8mb4
character_set_connection: utf8mb4
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8mb4
character_set_server: latin1
character_set_system: utf8
character_sets_dir: /usr/share/mariadb/charsets/
collation_connection: utf8mb4_general_ci
collation_database: utf8_czech_ci
collation_server: latin1_swedish_ci


Some emojis cannot be used in posts (they cause "white screen of death" with SQL command and database error displayed).

Only after I did the abovementioned 2 things, i. e.
β€” changed collation of table columns (in tables 'posts' and 'topics') to utf8mb4_czech_ci
β€” set $mysql_set_charset = 'utf8mb4' in setup_options.php

it started working as expected. Only altering tables (without changing $mysql_set_charset) was not enough.

When I remove the variable $mysql_set_charset from setup_options.php, all posts turn to gibberish.

Setting $mysql_set_charset = 'utf8' turns gibberish to Czech.

Setting $mysql_set_charset = 'utf8mb4' turns gibberish to Czech too and in addition it allows using problematic emojis; but I need to change collation of table columns to utf8mb4_czech_ci also, $mysql_set_charset = 'utf8mb4' alone is not enough.

--

So these are my observations, but I am afraid there is not much universal truth in them.

Author Paul
Lead Developer 
#7 | Posted: 18 Oct 2020 23:26 
Karel
Thank you for this report. I may expect that the browser you are using also makes sense. As about MariaDB, it also seems some way doubtful, as from your reports... anyway, I suppose, switching defaults to 'utf8mb4' truly could make sense in most of cases. I'll take care of it :)

The Other miniBB Support Forums / The Other /
 How to store 4 byte unicode characters (emojis)?
 Share Topic's Link

Your Reply Click this icon to move up to the quoted message


  ?
Post as a Guest, leaving the Password field blank. You could also enter a Guest name, if it's not taken by a member yet. Sign-in and post at once, or just sign-in, bypassing the message's text.


Before posting, make sure your message is compliant with forum rules; otherwise it could be locked or removed with no explanation.

 

 
miniBB Support Forums Powered by Forum Software miniBB ® Home  Features  Requirements  Demo  Download  Showcase  Gallery of Arts
Compiler  Premium Extensions  Premium Support  License  Contact Us
Try the Captcha add-on: protect your miniBB-forums from the automated spam and flood.


  ⇑