File encoding - UTF-8 / ISO

Community Veteran
Posts: 14,345
Thanks: 685
Fixes: 10
Registered: 01-08-2007

Hi
I'm having trouble with a PHP site I'm helping to convert to dynamic. The site is UTF-8 but the PHP scripts are ANSI. The data has been scraped and stored in utf8_unicode fields in MySQL.
The RSS feed is not displaying; instead it outputs garbage characters at the beginning. I had the same problem a few days ago, and it was down to the included files using a different encoding. They're now all the same, but since the files were uploaded to the server the problem has returned. I suspect there is mixed encoding in the files.
Does anyone know of a program that will let me examine the output character by character to see whether it contains mixed encodings?
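One way to check this without a dedicated program is a short script. The sketch below assumes Python 3 is available, and the function name `find_invalid_utf8` is illustrative, not from any tool mentioned in this thread. It walks the raw bytes and reports each offset where the data stops being valid UTF-8, which is where a stray ISO-8859-1/ANSI byte would show up.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, byte as hex) for every spot where data is not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice data[pos:]
            bad = pos + err.start
            problems.append((bad, f"0x{data[bad]:02X}"))
            pos += err.end  # resume scanning after the offending byte(s)
    return problems


if __name__ == "__main__":
    # "café" once as UTF-8 and once as ISO-8859-1: only the second é is invalid UTF-8.
    sample = "café".encode("utf-8") + "café".encode("iso-8859-1")
    print(find_invalid_utf8(sample))  # [(8, '0xE9')]
```

It only catches bytes that are invalid as UTF-8. Text that was double-encoded (é stored as Ã©, for example) is still valid UTF-8, so it will not be flagged; a hex view is needed for that.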
I need a new signature... I'm bored of the old one!
kmilburn
Grafter
Posts: 902
Thanks: 2
Registered: 30-07-2007

Re: File encoding - UTF-8 / ISO

The garbled characters at the beginning are probably the Byte Order Mark (BOM), which identifies the encoding of the file/data.
No BOM can mean the file is plain ASCII/ANSI text, or it might still be UTF-8 (a BOM is optional in UTF-8).
If the characters are 0xEF, 0xBB, 0xBF (shown as ï»¿ when read as ISO-8859-1), the data is UTF-8.
For UTF-16, it'll be either 0xFE, 0xFF (þÿ) for big endian or 0xFF, 0xFE (ÿþ) for little endian.
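To check the first few bytes programmatically, here is a minimal Python 3 sketch (`detect_bom` is a hypothetical helper name) that matches the signatures listed above using the constants in the standard `codecs` module. It deliberately covers only UTF-8 and UTF-16; a UTF-32 LE BOM (FF FE 00 00) would be reported as UTF-16 LE.

```python
import codecs
from typing import Optional

# The signatures from the post, as provided by the standard codecs module.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE
]


def detect_bom(data: bytes) -> Optional[str]:
    """Name the encoding indicated by a leading BOM, or return None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(detect_bom(b"\xef\xbb\xbf<?xml version='1.0'?>"))  # UTF-8
    print(detect_bom(b"<?xml version='1.0'?>"))  # None
```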
A good editor for checking and changing the file encoding is Programmer's Notepad.
The only way to confirm the exact contents of the files is to use a hex editor; XVI32 and HxD are good choices.
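If a hex editor isn't to hand, a rough equivalent of its view can be sketched in a few lines of Python 3. This is a stand-in under assumptions, not a replacement for XVI32 or HxD: `hex_dump` is a made-up name, and the layout (offset, hex bytes, printable ASCII) simply mimics the usual hex editor columns.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as 'offset  hex bytes  printable ASCII' lines, like a hex editor view."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes (including anything above 0x7E) are shown as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A UTF-8 BOM in front of an XML declaration, as it might appear at the start of a feed.
    print(hex_dump(b"\xef\xbb\xbf<?xml version=\"1.0\"?>"))
```

For a real file, pass in its first bytes, e.g. `hex_dump(open("feed.php", "rb").read(64))` (the file name is hypothetical); a UTF-8 BOM shows up as `EF BB BF` at offset 00000000.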
It's not uncommon to have to write code to read or skip the BOM.
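In Python 3, for example, both approaches might be sketched like this (the helper names are hypothetical): strip the three BOM bytes from raw data, or let the standard `utf-8-sig` codec skip the BOM while decoding.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; other bytes are left untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_text_skipping_bom(path: str) -> str:
    """Read a UTF-8 text file; the 'utf-8-sig' codec skips a leading BOM if there is one."""
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


if __name__ == "__main__":
    print(strip_utf8_bom(b"\xef\xbb\xbf<rss>"))  # b'<rss>'
```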
Community Veteran
Posts: 14,345
Thanks: 685
Fixes: 10
Registered: 01-08-2007

Re: File encoding - UTF-8 / ISO

I use Notepad++, which lets you set the encoding. However, content is coming from included files, templates and even the database, so it's hard to know exactly what is what and where it comes from.
The hex editor tip is probably what I'm looking for. Hopefully it will point something out, although it's been years since I looked at hex, so maybe the practice will do me some good.
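With content spread over several included files and templates, one thing worth ruling out is a BOM at the start of any of them: PHP sends anything before the opening `<?php` tag, including those three bytes, straight to the output, where it would land in front of the RSS XML. A minimal Python 3 sketch that lists every such file under a directory (the function name and the default file patterns are assumptions; adjust them to the site's actual extensions):

```python
import codecs
from pathlib import Path
from typing import Iterable, List


def files_with_utf8_bom(root: str, patterns: Iterable[str] = ("*.php", "*.inc", "*.html")) -> List[Path]:
    """List files under root (recursively) whose first three bytes are the UTF-8 BOM."""
    hits = set()  # a set, so a file matched by two patterns is only reported once
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file():
                with path.open("rb") as f:
                    if f.read(3) == codecs.BOM_UTF8:
                        hits.add(path)
    return sorted(hits)
```

For example, `files_with_utf8_bom("/path/to/site")` (path hypothetical). It only checks files on disk; text coming from the database needs a separate check.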