Welcome to the SRP Forum! Please refer to the SRP Forum FAQ post if you have any questions regarding how the forum works.

SRP_Run_Command in UTF8 mode reconverts UTF8 data

Our app runs in UTF8 mode and calls SRP_Run_Command() to return image metadata. It seems that when SRP_Run_Command() is called in UTF8 mode, it converts any high-order bytes in the output to UTF8 characters. The trouble is that if the data is already UTF8 encoded, it gets converted a second time, unnecessarily, on return to Basic+.

For example, this command extracts a portion of the metadata containing the copyright symbol ©, which is UTF8 encoded as \C2A9\:

_run evalv "@ans = 'VAR'` d = 'C:\T\exiftool-13.30_32'` c = d: '\exiftool.exe -s2 ': quote( 'C:\T\Exif UTF8.jpg')` call srp_run_command( c, retval, d)` @ans = retval[ indexc( retval, 'Photo ', 1), \0D\]"



When run in ANSI mode, the data is returned as-is, so it displays correctly when viewed in UTF8 mode:
Photo © Joshua Morris


However, when run in UTF8 mode, the data is converted again, so extra characters are incorrectly inserted:
Photo Â© Joshua Morris
where Â and © each end up as separate UTF8-encoded characters.
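The mangled output can be reproduced byte for byte. Here is a minimal Python sketch. It assumes that the second conversion reads each byte using the Windows-1252 (ANSI) code page. That is the usual Western default, but it is not confirmed for this system:

```python
# © is already UTF-8 encoded by the tool as two bytes: C2 A9
raw = b"\xc2\xa9"

# Misreading those bytes as Windows-1252 (ANSI) yields two characters:
# 0xC2 -> 'Â' (U+00C2) and 0xA9 -> '©' (U+00A9)
misread = raw.decode("cp1252")

# Re-encoding that text as UTF-8 doubles the encoding: C3 82 C2 A9
double_encoded = misread.encode("utf-8")

print(double_encoded.hex(" "))  # c3 82 c2 a9
```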

The difficulty is that we can't know how the data is encoded before running SRP_Run_Command().
In this case, we need to temporarily switch to ANSI mode around the call to SRP_Run_Command() (using SetUTF8) to get the raw data, and then convert it ourselves based on markers in the data (such as byte order marks, or XML/HTML/Exif encoding tags).
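The manual approach described above is to fetch the raw bytes with UTF8 mode off and then decide the encoding yourself. In Python it could look something like the sketch below. The function name `decode_tool_output` is hypothetical. The fallback order (BOM, then strict UTF-8, then Windows-1252) is also an assumption for illustration. The sketch ignores XML/HTML/Exif charset tags, which would need format-specific parsing.

```python
import codecs


def decode_tool_output(data: bytes) -> str:
    """Guess the encoding of raw command output and decode it (heuristic sketch)."""
    # A UTF-8 byte order mark is an explicit signal; strip it and decode.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")
    # No BOM: strict UTF-8 rejects most ANSI text containing high-order bytes.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to Windows-1252 (ANSI); a few bytes are undefined there.
        return data.decode("cp1252", errors="replace")


print(decode_tool_output(b"Photo \xc2\xa9 Joshua Morris") == "Photo \u00a9 Joshua Morris")  # True
print(decode_tool_output(b"Photo \xa9 Joshua Morris") == "Photo \u00a9 Joshua Morris")  # True
```

Trying strict UTF-8 before ANSI is a common heuristic because ANSI text containing high-order bytes rarely forms valid UTF-8 by accident. It can happen, though, so tags in the data should take priority where they exist.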

Ideally, it would be nice to tell SRP_Run_Command() not to do the UTF8 conversion automatically. I don't suppose I've missed a way to do that?

Cheers, M@

Comments

  • Try passing VARW instead of VAR to the Output parameter. Let me know if that is better or worse.
  • That sounds hopeful, but unfortunately I don't see any difference in the output, even at the byte level.
    This is version 2.2.14 (and 2.2.2).

    I'm guessing that VARW invokes a DLL function that's prototyped differently?
  • It was worth a shot. It is OI that does the conversions, not me. The parameters for SRP_Run_Command are set to LPWSTR, so OI converts everything to UTF-16 (wide characters). I pass everything to the command in UTF-16, and then I decide whether or not to convert the output of the command to UTF-16. By default, the output is assumed to be UTF-8, so I convert it to UTF-16, which OI converts back to UTF-8 when it returns. Passing VARW tells SRP_Run_Command to assume the output is already UTF-16 and to do no conversion.

    It appears that this command is not returning either of these, but instead returns ANSI as its output, which is why disabling UTF8 mode in OI helps.
  • Hmm, so why does it think the command is returning ANSI as its output, when the characters are UTF8 encoded? Is it the lack of a byte order mark, or some other encoding identifier?
  • I don't know. Is there documentation for the command?
  • Well, there are some docs on coded character sets, but none of those CHARSET options seem to make any difference.

    But this program is just writing a string of bytes to stdout, right? Apparently the default encoding for stdout is UTF-8. So how is it being identified as ANSI instead?
  • Let's try this. Download and install version 2.2.15.2, then set your Output variable to "VARUTF8" and see if that gets it right.
  • OK, that does seem to resolve the UTF8 issue - thanks :)

    One issue, though: sometimes strings come back with extra random characters at the end. It's as if some buffer lengths are getting a bit skewed.
  • Whoops. That's on me. Try 2.2.15.3. This zip file has full RDKs, but you only need the DLL for the fix.
  • That's much better - thanks :)
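The conversion chain described in the comments can be modelled in a few lines. This Python sketch is a hypothetical model, not SRP's actual code. The function `model_output_return` is invented, and its `assumed_encoding` parameter stands in for whatever encoding SRP_Run_Command assumes for the command's output. The model shows that the UTF-16 hop itself is lossless. The damage comes only from assuming the wrong encoding for the raw bytes.

```python
def model_output_return(raw: bytes, assumed_encoding: str) -> bytes:
    """Hypothetical model of the output path, ending in OI's UTF8 mode."""
    # Step 1: interpret the command's raw output using the assumed encoding.
    text = raw.decode(assumed_encoding)
    # Step 2: widen to UTF-16, as held by an LPWSTR parameter.
    wide = text.encode("utf-16-le")
    # Step 3: OI in UTF8 mode narrows the wide string back to UTF-8 bytes.
    return wide.decode("utf-16-le").encode("utf-8")


exif_bytes = b"Photo \xc2\xa9 Joshua Morris"  # tool output, already UTF-8

print(model_output_return(exif_bytes, "utf-8") == exif_bytes)  # True: bytes unchanged
print(model_output_return(exif_bytes, "cp1252"))  # b'Photo \xc3\x82\xc2\xa9 Joshua Morris'
```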
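This relates to the question of how stdout output gets classified. A pipe carries only bytes, with no encoding label attached, so whatever reads it has to assume an encoding. The Python sketch below uses a child Python process rather than exiftool. It captures the child's stdout as raw bytes without decoding, which is roughly the Python analogue of reading the output with UTF8 mode off:

```python
import subprocess
import sys

# Child process writes the UTF-8 bytes for "Photo ©" straight to its stdout.
child_code = "import sys; sys.stdout.buffer.write(b'Photo \\xc2\\xa9')"

# Without text=True (or an encoding argument), run() returns stdout undecoded.
result = subprocess.run([sys.executable, "-c", child_code], capture_output=True, check=True)

print(result.stdout)  # b'Photo \xc2\xa9'
```

In Python, passing `text=True` would instead decode with the locale's preferred encoding. On Windows that is often an ANSI code page rather than UTF-8, which is one general way tool output ends up misread.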
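The thread doesn't describe the cause of the trailing characters beyond the fix in 2.2.15.3. As a general, hypothetical illustration (not the confirmed cause), consider how the same string has different lengths depending on the unit counted. A length measured in UTF-8 bytes but applied to a UTF-16 buffer overruns by one unit per multibyte character. The buffer and its stale contents below are invented for the example.

```python
text = "Photo \u00a9 Joshua Morris"

char_count = len(text)                            # 21 characters
utf8_bytes = len(text.encode("utf-8"))            # 22 bytes: © takes two
utf16_units = len(text.encode("utf-16-le")) // 2  # 21 UTF-16 code units

# Hypothetical wide buffer: the text followed by leftover memory contents.
buffer = text + "\u2592\u2592"

# Hypothetical bug: the UTF-8 byte count is used as a count of UTF-16 code units.
returned = buffer[:utf8_bytes]

print(char_count, utf8_bytes, utf16_units)  # 21 22 21
print(len(returned) - len(text))            # 1
```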