miniBB ® miniBB®
miniBB Support Forums
 | Forums | Register | Reply | Search | Statistics | Manual |
Specific miniBB Support Forums / Specific /   

problem with non-standard character set in MySQL

Author gorin
Registered
#1 | Posted: 23 Mar 2005 22:39 | Edited by: gorin 
In my MySQL configuration default-character-set is koi8r . When miniBB database is creating it's take a default charset of MySQL server for all miniBB's tables (koi8r in my situation).
Meanwhile, php's 'mysql_connect' function always set connection to database with latin1 charset. And when data are selected from db and inserted in generating html pages all non-english (e.g. russian) symbols are displayed like a '?' . This is not configuration of languagepak/apache (all charsets there are correct) - this is a problem with charset chosing by mysql_connect.

If user have an access to my.cnf file then their can put in [mysqld] section of my.cnf:
init-connect="SET NAMES your_charset"
This will cause ,,set names'' command to be issued on every connection.

Or another solution (which have to be implemented by developers of miniBB if it's right) is to define one additional variable in option_setup.php (e.g. $mysql_charset) which have to bet setted to suitable charset by user and execute an additional query after mysql_connect function:
"SET NAMES $mysql_charset"

Author Ivan
Advanced Member
#2 | Posted: 23 Mar 2005 23:09 | Edited by: Ivan 
The default current character set of MySQL database can be changed through your database manager. (In my case this is phpMyAdmin, but this is a standard option in the others, too). Simply set it to 1251...

Author gorin
Registered
#3 | Posted: 23 Mar 2005 23:27 | Edited by: gorin 
Ivan
This is not a problem. I have a correct charset in miniBB's tables (koi8r) and all data in miniBB's database are actually in koi8r too. And charset in lang/*.php is set to koi8r. But php's function 'mysql_connect' choose a latin1 charset when setting up a connection to database.

Phrases inserted from files (e.g. from lang/rus.php) to generated html's are displayed correctly. But data which has been selected from db are displayed via questions signs (''??? ??????").

I've solve this problem via first solution which i'd posted above. But this is too rough method imho.

Second solution required Team's participation is more correct i think.

Author Ivan
Advanced Member
#4 | Posted: 23 Mar 2005 23:54 | Edited by: Ivan 
You obviously have a special consideration to have your database set in KOI8-R, because your language file must be in Windows-1251. At least, the language file rus.php that is in "Downloads" is in Windows-1251, not in KOI8-R - I checked it a minute ago.

So I think, that the reason for the question marks is that your database is KOI8-R, not in miniBB. Check again your rus.php file, I think it's in Windows-1251 (what else if you got it from here?).

By the way, your e-mail templates, too, are readable with Windows-1251, with KOI8-R they become a mess :)

Author gorin
Registered
#5 | Posted: 24 Mar 2005 00:34 
Ivan
Fortunately I'm not so stupid :)

Yes, the langpak in download area are really in cp1251 but i'd use
recode CP1251..KOI8R rus.php
and related email templates are has been recoded too.

I have a reason to store all my data in koi8r - sometimes the only way to fix some misspelling or other error in text on my server (not only in mysql's database) is to fix it via remote shell. And the shell is tuned to accept koi8r charset (this is freebsd box).

Well I repeat: I have solved a problem (see my post above). It's trully problem of choosing charset by mysql_connect.

And another evidence:
untill i've not make php's 'mysql_connect' choose koi8r when connecting to db, all data selected from db have been broken. I mean a can't choose a proper charset to read selected from db data (while data from e.g. rus.php are readable in koi8r). I'd simply open my forum's main page and check all possible russian encodings one by one.

You can try yourself to create table with koi8r charset options, put into it data in koi8r and then connect via console (mysql -u user -p --set-default-charset=latin1) to db and select the data.
--set-default-charset=latin1 - this options guarantee that you will connect as mysql_connect do this.

Author Team
8-)
#6 | Posted: 24 Mar 2005 10:18 
gorin
I guess, it depends not on connect, but on which character sets are defined in mySQL options. I am developing miniBB with character_set default 'latin' and have no troubles at all with russian language. Don't forge that Apache also stores encoding directives, and I know many cases, when customly modified Apache (russian-localized) having such big troubles with encoding. Use original software from vendors, not mods.

Author gorin
Registered
#7 | Posted: 24 Mar 2005 12:27 
Use original software from vendors, not mods.

I use original software:
apache-2.0.53_1 = up-to-date with port
mysql-client-4.1.10a = up-to-date with port
mysql-server-4.1.10a = up-to-date with port
php5-extensions-1.0 = up-to-date with port
php5-mysql-5.0.3_2 = up-to-date with port

I guess, it depends not on connect, but on which character sets are defined in mySQL options.

Yeah :) you are getting close to problem. I have:
-------------------------------------------------------------------
-bash-2.05b$ mysql -u minibb -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4 to server version: 4.1.10a

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> SHOW VARIABLES;
...
| character_set_client | koi8r
| character_set_connection | koi8r
| character_set_database | koi8r
| character_set_results | koi8r
| character_set_server | koi8r
.....
---------------------------------------------------------------------- --
minibb - is mysql user which pointed in setup_options.php
That's means that i've correct character sets defined in mySQL options (i use wide-system configuration for mysql and my.cnf is placed in /etc/my.cnf)

There is a part from my.cnf file:
---------------------------------------------------------------------- ----
[client]
#password = your_password
port = 3306
socket = /tmp/mysql.sock
default-character-set = koi8r

[mysqld]
port = 3306
socket = /tmp/mysql.sock
default-character-set = koi8r
#init-connect="SET NAMES koi8r"

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates
default-character-set = koi8r
----------------------------------------------------------------------

With this settings i have russian phrases which were selected from mysql displayed via "??? ??????? ???", while phrases from rus.php are displayed correctly.

But when i uncomment string "#init-connect="SET NAMES koi8r"" in my.cnf and restart mysql - I get a right result. All symbols are displayed correctly.
Repeat: this string cause ,,set names koi8r'' command to be issued on every connection.

Author Team
8-)
#8 | Posted: 24 Mar 2005 12:32 
Check character_sets, not default character_set. All possible and supported sets must be listed here.

Author gorin
Registered
#9 | Posted: 24 Mar 2005 12:37 
Probably i dont understand you, but ofcourse all charsets are supported :)

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+------ --+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+------ --+
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | ISO 8859-1 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
+----------+-----------------------------+---------------------+------ --+

Author Team
8-)
#10 | Posted: 24 Mar 2005 12:45 
Ok.

But probably, problem might be in PHP5. We have not tested miniBB on it.

Anwyay, we are not going to change something in connection this time in default version.

Author Ivan
Advanced Member
#11 | Posted: 24 Mar 2005 13:54 | Edited by: Ivan 
Team, the problem is in the recoding from KOI8-R to standard Cyrillic, I think. PHP does not deal directly with the text, but through the codepage you have set up to work with.

gorin, you have this:

| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |

I cannot see the problem in miniBB that appears from some very particular Cyrillic server.

This:

| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |

is made for specific Russian needs despite there is one Cyrillic already, and is not standard.

We all here, in Bulgaria, write in Cyrillic, nevermind under 1251 (Cyrillic) or 1252 (Latin 1), and nobody has problems :)

Author gorin
Registered
#12 | Posted: 24 Mar 2005 14:24 | Edited by: gorin 
There is a simple script demonstrating that php chose non-mysql-server(or client) default charset, but it's (php's) default charset - latin1.
-------------------test.php------------------------------------
<?php

$link = mysql_connect('localhost', 'minibb', '=)');

if (!$link) {
printf("Connect failed: %s\n", mysql_error());
exit();
}

$charset = mysql_client_encoding($link);

printf ("Current character set is %s\n",$charset);

mysql_close($link);

?>
-------------------------------------------------------------------

after executing test.php via browser I see this string:
Current character set is latin1.
While in ALL mysql configuration sections I type koi8r. I don't know is this bug or feature of PHP5 (or even PHP4). But in php's documentation comments some people suggest to use additional query "SET NAMES" directly after mysql_connect to get valid charset.

Author Team
8-)
#13 | Posted: 24 Mar 2005 14:32 
gorin
I myself put miniBB on 4 russian sites, and they have never reported such kind of problem within 2 years already. I think, problem is somewhere in PHP, or mySQL, or deep inside the server, and not related to miniBB which uses very standard and clear procedures for conneting to database.

Author gorin
Registered
#14 | Posted: 24 Mar 2005 14:59 | Edited by: gorin 
probably you never discovered such problem because all databases in your servers are in standart latin1 encoding ?? And since mysql 4.1 branch there is many differences with charsets handling in comparation with old branches.

IMHO problem is evident: lack of convergence between database encoding and php-client encoding.

I dont know is php should try to determine database encoding automatically (probably from my.cnf) or this work lie on developers via using 'set names' query.

I suppose that second variant is more probable :)

Author tuvi
Registered
#15 | Posted: 26 Dec 2005 22:51 
I find that the data stored in the DB is latin1-encoded even though my data is utf-8... phpMyAdmin cannot display the data correctly.. Of course, the forum is running fine...

The problem is when I start to transfer the data to a different server (website)... After exporting data, when importing from phpMyAdmin, the data is latin1 or utf8?? I check the data from the exporting file, and I find that it is encoded in latin1 (ENGINE=MyISAM)... but when treating utf8 data as latin1, some characters are unknown and hence result in the symbol "?".

Gorin is right. The default charset of mysql (at least on my server) is latin_swedish_1. And when miniBB connects to mysql, does not identify the charset, the default will take effect. I think.

The workaround for me now then:

* From the exporting data file, I re-encode the data back to utf-8 (luckily for me, I just start the forum for a week, so I only have 300 posts to re-encode)

* change to ENGINE=MyISAM DEFAULT CHARSET=utf8, for all the tables in the exporting data file

* Import the data back in the new server

* In the file setup_mysql.php, add the line:
mysql_query("SET NAMES 'utf8'");
right after
@mysql_select_db($DBname) or die ('<b>Database/configuration error (DB is missing).</b>');

After the above line is added to the code, now I can post new messages and retrieve old messages... successfully in utf-8. And a bonus, now I can see actual data displayed correctly in phyMyAdmin!!!

Unless anyone can come up with a better solution???

Specific miniBB Support Forums / Specific / problem with non-standard character set in MySQL Top

Your Reply Click this icon to move up to the quoted message

 Short link for this topic:

 ?
Only registered users are allowed to post here. Please, enter your username/password details upon posting a message, or register first.


Before posting, make sure your message is compliant with our forum posting rules. If not, it may be locked or deleted with no explanation.
 
miniBB Support Forums Powered by Forum Software miniBB ® Features  Requirements  Demo  Download  Showcase  Gallery of Arts
Compiler  Premium Extensions  Premium Support  License  Contacts
Get the Captcha add-on: protect your miniBB-forums from the automated spam and flood.
Captcha Addon for miniBB