SRP_Run_Command in UTF8 mode reconverts UTF8 data

Our app runs in UTF8 mode and calls SRP_Run_Command() to return image metadata. It seems that when SRP_Run_Command() is called in UTF8 mode, it converts any high-order bytes in the output to UTF8 characters. The trouble is that if the data is already UTF8 encoded, it gets converted again unnecessarily on return to Basic+.

For example, this command extracts a portion of metadata that contains the copyright symbol ©, UTF8 encoded as \C2A9\:

_run evalv "@ans = 'VAR'` d = 'C:\T\exiftool-13.30_32'` c = d: '\exiftool.exe -s2 ': quote( 'C:\T\Exif UTF8.jpg')` call srp_run_command( c, retval, d)` @ans = retval[ indexc( retval, 'Photo ', 1), \0D\]"



When run in ANSI mode, the data is returned as is, so it displays correctly when viewed in UTF8 mode:
Photo © Joshua Morris


However, when run in UTF8 mode, the data is converted again and so additional characters are inserted incorrectly:
Photo Â© Joshua Morris
where both Â and © end up as separate UTF8 encoded characters.
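
The doubled conversion can be reproduced outside OpenInsight. The Python sketch below is only an illustration: it assumes the ANSI code page is Windows-1252 (the post does not say which code page is in use) and treats the two UTF8 bytes of © as if they were two separate ANSI characters.

```python
# UTF-8 bytes for the copyright symbol U+00A9, as emitted by the command.
raw = "\u00a9".encode("utf-8")             # b'\xc2\xa9'

# Assumption: the bytes are misread as Windows-1252 (ANSI) text,
# so each byte becomes a character of its own (U+00C2 followed by U+00A9).
misread = raw.decode("cp1252")

# Re-encoding that text as UTF-8 yields four bytes instead of two.
double_encoded = misread.encode("utf-8")   # b'\xc3\x82\xc2\xa9'

print(raw.hex(), double_encoded.hex())
```

The first byte, \C2\, becomes a character in its own right, which is where the extra character in the UTF8 mode output comes from.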

The difficulty is that we don't know how the data is encoded before running SRP_Run_Command().
In this case, we need to switch temporarily to ANSI mode around the call (using SetUTF8) to get the raw data, and then convert it ourselves depending on flags in the data (such as byte order marks, or XML/HTML/Exif encoding tags).
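
As a rough illustration of the "convert it ourselves" step, here is a minimal Python sketch that picks a decoder from a byte order mark and otherwise falls back from strict UTF8 to ANSI. The helper name `decode_command_output` and the Windows-1252 default are assumptions made for this example. It does not inspect XML/HTML/Exif encoding tags, and it is not how our Basic+ code is actually written.

```python
import codecs

# Byte order marks to look for, each paired with a Python codec name.
# UTF-32 is deliberately ignored in this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_command_output(raw: bytes, ansi_codec: str = "cp1252") -> str:
    """Decode raw command output to text (hypothetical helper)."""
    # 1. A byte order mark, if present, decides the encoding.
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(codec)
    # 2. No BOM: accept the data as UTF-8 only if it is strictly valid.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Otherwise treat it as ANSI (the code page is an assumption).
        return raw.decode(ansi_codec, errors="replace")
```

With this fallback, both `b"Photo \xc2\xa9 Joshua Morris"` (UTF8) and `b"Photo \xa9 Joshua Morris"` (ANSI) decode to the same text. The strict UTF8 check is a heuristic: short ANSI strings can occasionally be valid UTF8 by coincidence.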

Ideally, it would be nice to tell SRP_Run_Command() not to do the UTF8 conversion automatically. I don't suppose I've missed a way to do that?

Cheers, M@

Comments

  • Try passing VARW instead of VAR to the Output parameter. Let me know if that is better or worse.
  • That sounds hopeful, but I don't see a difference in the output unfortunately, even at byte level.
    This is version 2.2.14 (and 2.2.2)

    I'm guessing that VARW invokes a DLL function that's prototyped differently?
  • It was worth a shot. It is OI that does the conversions, not me. The parameters for SRP_Run_Command are set to LPWSTR, so OI converts everything to UTF-16 (wide characters). I pass everything to the command in UTF-16, and then I decide whether or not to convert the output of the command to UTF-16. By default, the output is assumed to be UTF-8, so I convert it to UTF-16, which OI converts back to UTF-8 when it returns. Passing VARW tells SRP_Run_Command to assume the output is already UTF-16 and to do no conversion.

    It appears that this command is not returning either of these, but instead returns ANSI as its output, which is why disabling UTF8 mode in OI helps.
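
The effect of the assumed input encoding on such a conversion chain can be modelled in a few lines of Python. This is only a model, not SRP's or OpenInsight's actual code: the function `round_trip` and its `assumed` parameter are hypothetical, Windows-1252 stands in for the ANSI code page, and invalid input is assumed to be replaced with U+FFFD.

```python
def round_trip(raw: bytes, assumed: str) -> bytes:
    """Model: command bytes -> UTF-16 (via the assumed codec) -> UTF-8."""
    # Step 1: interpret the raw bytes with the assumed encoding and widen.
    wide = raw.decode(assumed, errors="replace").encode("utf-16-le")
    # Step 2: convert the wide string back to UTF-8 for the caller.
    return wide.decode("utf-16-le").encode("utf-8")


utf8_out = "\u00a9".encode("utf-8")    # b'\xc2\xa9'
ansi_out = "\u00a9".encode("cp1252")   # b'\xa9'

print(round_trip(utf8_out, "utf-8").hex())    # assumption matches: unchanged
print(round_trip(utf8_out, "cp1252").hex())   # UTF-8 read as ANSI: double-encoded
print(round_trip(ansi_out, "cp1252").hex())   # assumption matches: valid UTF-8
print(round_trip(ansi_out, "utf-8").hex())    # ANSI read as UTF-8: U+FFFD
```

In this model the output is only correct when the assumed encoding matches what the command actually wrote, which is why the caller needs either the raw bytes or a way to state the encoding.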