Windows-1252 to utf8
We use OI to serve various web requests. Some of the responses contain characters like é. These characters appear incorrectly in users' browsers. The characters that appear incorrectly are the ones that OI encodes differently to utf-8. For example, the way OI encodes é is different to how utf-8 does.
Other than swapping é with "ampersand#x00E9;" in the response, is there any way to fix this? We can't use OI in utf-8 mode, so that's out of the question.
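To make the mismatch concrete, here is a minimal Python sketch (Python is used only for illustration; it is not how OI builds its responses). It shows that é is a single byte in Windows-1252 and two bytes in UTF-8, and that the single 1252 byte is not valid UTF-8 on its own. The helper name decode_as_utf8 is made up for this example.

```python
text = "é"

cp1252_bytes = text.encode("cp1252")  # one byte: 0xE9
utf8_bytes = text.encode("utf-8")     # two bytes: 0xC3 0xA9


def decode_as_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid input.

    This roughly mimics what a client expecting UTF-8 does when it
    receives bytes that were written in another encoding.
    """
    return data.decode("utf-8", errors="replace")


# The lone 0xE9 byte is an incomplete UTF-8 sequence, so it comes back
# as the replacement character instead of é.
shown = decode_as_utf8(cp1252_bytes)

print(cp1252_bytes.hex(), utf8_bytes.hex(), repr(shown))
```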
Comments
1252 unicode #
0x80 0x20ac ;Euro Sign
0x81 0x0081
0x82 0x201a ;Single Low-9 Quotation Mark
0x83 0x0192 ;Latin Small Letter F With Hook
0x84 0x201e ;Double Low-9 Quotation Mark
0x85 0x2026 ;Horizontal Ellipsis
0x86 0x2020 ;Dagger
0x87 0x2021 ;Double Dagger
0x88 0x02c6 ;Modifier Letter Circumflex Accent
0x89 0x2030 ;Per Mille Sign
0x8a 0x0160 ;Latin Capital Letter S With Caron
0x8b 0x2039 ;Single Left-Pointing Angle Quotation Mark
0x8c 0x0152 ;Latin Capital Ligature Oe
0x8d 0x008d
0x8e 0x017d ;Latin Capital Letter Z With Caron
0x8f 0x008f
0x90 0x0090
0x91 0x2018 ;Left Single Quotation Mark
0x92 0x2019 ;Right Single Quotation Mark
0x93 0x201c ;Left Double Quotation Mark
0x94 0x201d ;Right Double Quotation Mark
0x95 0x2022 ;Bullet
0x96 0x2013 ;En Dash
0x97 0x2014 ;Em Dash
0x98 0x02dc ;Small Tilde
0x99 0x2122 ;Trade Mark Sign
0x9a 0x0161 ;Latin Small Letter S With Caron
0x9b 0x203a ;Single Right-Pointing Angle Quotation Mark
0x9c 0x0153 ;Latin Small Ligature Oe
0x9d 0x009d
0x9e 0x017e ;Latin Small Letter Z With Caron
0x9f 0x0178 ;Latin Capital Letter Y With Diaeresis
Each of these characters' encoding in 1252 is different from its Unicode number. That seems to be the issue.
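As a rough sketch of what a conversion based on this table could look like, the Python below maps each 1252 byte in the 0x80-0x9F range to the Unicode number listed above and then writes the result as UTF-8. Python is only a stand-in here, and the names CP1252_HIGH and cp1252_to_utf8 are invented for the example. It assumes every other byte value already equals its Unicode number, which holds for 0x00-0x7F and 0xA0-0xFF in 1252.

```python
# Windows-1252 bytes 0x80-0x9F and their Unicode code points, copied from
# the table above. The five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D)
# map to the control code with the same number, as the table does.
CP1252_HIGH = {
    0x80: 0x20AC, 0x81: 0x0081, 0x82: 0x201A, 0x83: 0x0192,
    0x84: 0x201E, 0x85: 0x2026, 0x86: 0x2020, 0x87: 0x2021,
    0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160, 0x8B: 0x2039,
    0x8C: 0x0152, 0x8D: 0x008D, 0x8E: 0x017D, 0x8F: 0x008F,
    0x90: 0x0090, 0x91: 0x2018, 0x92: 0x2019, 0x93: 0x201C,
    0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A,
    0x9C: 0x0153, 0x9D: 0x009D, 0x9E: 0x017E, 0x9F: 0x0178,
}


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8 using the table above."""
    # Bytes outside 0x80-0x9F are used as their own code point.
    chars = [chr(CP1252_HIGH.get(byte, byte)) for byte in data]
    return "".join(chars).encode("utf-8")


# TM sign (0x99), a space, and é (0xE9) as they would appear in 1252.
print(cp1252_to_utf8(b"\x99 \xe9").hex())
```

Python's built-in "cp1252" codec gives the same result for the assigned bytes but raises an error on the five unassigned ones, which is why the sketch carries its own table.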
By the way, all of the characters appear correctly except the ones below. The character on the left was encoded in Windows-1252, and the character on the right was encoded using the HTML character reference. I don't care about most of these. It's the TM symbol that we need to be correct, and possibly the euro sign.
? 80 €
81
, 82 ‚
f 83 ƒ
" 84 „
. 85 …
? 86 †
? 87 ‡
^ 88 ˆ
? 89 ‰
S 8A Š
< 8B ‹
O 8C Œ
8D
Z 8E Ž
8F
90
' 91 ‘
' 92 ’
" 93 “
" 94 ”
. 95 •
- 96 –
- 97 —
~ 98 ˜
T 99 ™
s 9A š
> 9B ›
o 9C œ
9D
z 9E ž
Y 9F Ÿ
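For comparison, the character reference approach used for the right-hand column can be sketched in Python as below. This is only an illustration under the assumption that the response body is available as Windows-1252 bytes; the function name cp1252_to_char_refs is made up. It relies on Python's standard "xmlcharrefreplace" error handler, which emits decimal references (for example 8482 for the TM sign) rather than the hexadecimal form, and browsers treat both forms the same.

```python
def cp1252_to_char_refs(data: bytes) -> bytes:
    """Turn Windows-1252 bytes into ASCII with numeric character references.

    ASCII bytes pass through untouched, so existing markup is preserved.
    Bytes that 1252 leaves unassigned become U+FFFD before conversion.
    """
    text = data.decode("cp1252", errors="replace")
    return text.encode("ascii", errors="xmlcharrefreplace")


# "Café", a space, the TM sign (0x99), a space, and the euro sign (0x80).
sample = b"Caf\xe9 \x99 \x80"
print(cp1252_to_char_refs(sample))
```

Because the output contains only ASCII, it should display the same way whichever of the two encodings the browser assumes.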
In your eserver.cfg file, you should have some lines that look like this:
UTFPort_Disabled=0
UTFPortNumber=18089
I am not sure if anything more is needed, but that is a starting point.