
Windows-1252 to utf8

edited March 2021 in General
We use OI to serve various web requests. Some of the responses contain characters like é, and these characters appear incorrectly in users' browsers. The affected characters are the ones that OI encodes differently from UTF-8. For example, the way OI encodes é is different from how UTF-8 does.

Other than swapping é with "&#x00E9;" in the response, is there any way to fix this? We can't use OI in UTF-8 mode, so that's out of the question.
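
For reference, the character-reference workaround mentioned above can be sketched in Python (not OpenInsight code). This is a minimal sketch that assumes the response body is available as a Windows-1252 byte string; the function name `to_ascii_with_char_refs` is hypothetical. It emits decimal references such as `&#233;`, which browsers treat the same as the hexadecimal form `&#x00E9;`.

```python
def to_ascii_with_char_refs(raw: bytes) -> bytes:
    """Decode Windows-1252 bytes and replace every non-ASCII character
    with an HTML numeric character reference, so the output is pure ASCII
    and renders the same whatever charset the page declares."""
    # errors="replace" turns the five bytes that cp1252 leaves undefined
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D) into U+FFFD instead of raising.
    text = raw.decode("cp1252", errors="replace")
    # xmlcharrefreplace is a standard Python error handler that writes
    # &#NNNN; for anything the target encoding cannot represent.
    return text.encode("ascii", errors="xmlcharrefreplace")


# 0xE9 is é, 0x99 is the trade mark sign, 0x80 is the euro sign in cp1252.
print(to_ascii_with_char_refs(b"caf\xe9 Acme\x99 5\x80"))
# prints b'caf&#233; Acme&#8482; 5&#8364;'
```

Because the result contains only ASCII bytes, it sidesteps the encoding question entirely, at the cost of a larger response and less readable source.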

Comments

  • edited March 2021
    Actually, I have just realised that é is correct. It's only a few that are different, like the trademark symbol.
  • We support a lot of UTF-8 applications so I was surprised by your first post. It might help if you could identify a few (perhaps 3 or more) characters you think are problematic.
  • edited March 2021
    These are the ones, I believe. I will test each one, though.

    1252 Unicode ;Name
    0x80 0x20ac ;Euro Sign
    0x81 0x0081
    0x82 0x201a ;Single Low-9 Quotation Mark
    0x83 0x0192 ;Latin Small Letter F With Hook
    0x84 0x201e ;Double Low-9 Quotation Mark
    0x85 0x2026 ;Horizontal Ellipsis
    0x86 0x2020 ;Dagger
    0x87 0x2021 ;Double Dagger
    0x88 0x02c6 ;Modifier Letter Circumflex Accent
    0x89 0x2030 ;Per Mille Sign
    0x8a 0x0160 ;Latin Capital Letter S With Caron
    0x8b 0x2039 ;Single Left-Pointing Angle Quotation Mark
    0x8c 0x0152 ;Latin Capital Ligature Oe
    0x8d 0x008d
    0x8e 0x017d ;Latin Capital Letter Z With Caron
    0x8f 0x008f
    0x90 0x0090
    0x91 0x2018 ;Left Single Quotation Mark
    0x92 0x2019 ;Right Single Quotation Mark
    0x93 0x201c ;Left Double Quotation Mark
    0x94 0x201d ;Right Double Quotation Mark
    0x95 0x2022 ;Bullet
    0x96 0x2013 ;En Dash
    0x97 0x2014 ;Em Dash
    0x98 0x02dc ;Small Tilde
    0x99 0x2122 ;Trade Mark Sign
    0x9a 0x0161 ;Latin Small Letter S With Caron
    0x9b 0x203a ;Single Right-Pointing Angle Quotation Mark
    0x9c 0x0153 ;Latin Small Ligature Oe
    0x9d 0x009d
    0x9e 0x017e ;Latin Small Letter Z With Caron
    0x9f 0x0178 ;Latin Capital Letter Y With Diaeresis

    Each of these characters' encoding in 1252 is different from its Unicode code point. That seems to be the issue.
  • Before digging into this further, is your OECGI configured for the UTF-8 port?
  • No idea. I will have a look. How do you even check that?

    By the way, all of the characters appear OK except the ones below. The character on the left was encoded in Windows-1252, and the character on the right was encoded using the HTML character reference. Most of these I don't care about. It's the TM symbol that we need correct, and possibly the euro sign.

    ? 80 €
     81 
    , 82 ‚
    f 83 ƒ
    " 84 „
    . 85 …
    ? 86 †
    ? 87 ‡
    ^ 88 ˆ
    ? 89 ‰
    S 8A Š
    < 8B ‹
    O 8C Œ
     8D 
    Z 8E Ž
     8F 
     90 
    ' 91 ‘
    ' 92 ’
    " 93 “
    " 94 ”
    . 95 •
    - 96 –
    - 97 —
    ~ 98 ˜
    T 99 ™
    s 9A š
    > 9B ›
    o 9C œ
     9D 
    z 9E ž
    Y 9F Ÿ
  • Sorry for the delayed response. I got caught up with several issues this week and I wanted to take the time to test a few things for you. I can confidently report that OI is encoding the characters correctly. I literally pasted your encoded characters from the above post into an OI record. I saw the characters as they should appear and then I returned this data via a web API. The content in the browser looked correct as well. This confirms my suspicion that you don't have UTF8 configured for your web requests.

    In your eserver.cfg file, you should have some lines that look like this:
    UTFPort_Disabled=0
    UTFPortNumber=18089
    I am not sure if anything more is needed, but that is a starting point.
  • Thank you, I will have a look at this file once I find it lol.
  • It should be in your OI folder.
  • I had a look, and those variables do not exist in the file. Do you have any documentation on how this config file works and what the variables mean?
  • Unless they have been removed, you should have a Documents sub-folder underneath your OpenInsight folder. Look for the file called 103-966 OpenInsight OEngineServer Configuration.pdf. Note, I don't think this documents the UTF8 port but it should help with the other stuff (which you normally don't need to mess with anyway).
  • If our data in OI is not encoded in UTF-8, should we still use those settings you mentioned? The HTML that we send is UTF-8, though.
  • If you need to properly store and retrieve UTF-8 encoded data then you should use those settings.
  • OK, I will pass this on to my team.
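
The pattern reported in this thread (é displays correctly, while the trade mark and euro signs do not) is consistent with bytes in the 0x80 to 0x9F range being widened to the same-numbered Unicode code point instead of being mapped through the Windows-1252 table. Whether that is what actually happens in this OI/OECGI setup is an assumption. The Python sketch below only illustrates the difference, and `naive_utf8` and `correct_utf8` are hypothetical names.

```python
def naive_utf8(raw: bytes) -> bytes:
    """Treat each byte as the Unicode code point with the same number
    (equivalent to decoding as ISO-8859-1), then encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def correct_utf8(raw: bytes) -> bytes:
    """Map each byte through the Windows-1252 table, then encode as UTF-8.
    Raises UnicodeDecodeError for the bytes cp1252 leaves undefined
    (0x81, 0x8D, 0x8F, 0x90, 0x9D)."""
    return raw.decode("cp1252").encode("utf-8")


# 0xE9 is é, 0x80 is the euro sign, 0x99 is the trade mark sign in cp1252.
for byte in (0xE9, 0x80, 0x99):
    raw = bytes([byte])
    print(f"0x{byte:02X}  naive={naive_utf8(raw).hex()}  correct={correct_utf8(raw).hex()}")
# prints:
# 0xE9  naive=c3a9  correct=c3a9
# 0x80  naive=c280  correct=e282ac
# 0x99  naive=c299  correct=e284a2
```

For é both routes give the same UTF-8 bytes, because its Windows-1252 byte value equals its Unicode code point. For 0x80 and 0x99 the naive route produces C1 control characters (U+0080, U+0099), which browsers generally do not render as the intended symbol.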