SRP_Run_Command in UTF8 mode reconverts UTF8 data

MattCrozier · May 2025

Our app runs in UTF8 mode and calls SRP_Run_Command() to return image metadata. It seems that when SRP_Run_Command() is called in UTF8 mode, it converts any high-order bytes in the output to UTF8 characters. The trouble is that if the data is already UTF8 encoded, it gets converted again unnecessarily on return to Basic+.

For example, this command extracts a portion of metadata that contains the copyright symbol ©, UTF8 encoded as \C2A9\:

_run evalv "@ans = 'VAR'` d = 'C:\T\exiftool-13.30_32'` c = d: '\exiftool.exe -s2 ': quote( 'C:\T\Exif UTF8.jpg')` call srp_run_command( c, retval, d)` @ans = retval[ indexc( retval, 'Photo ', 1), \0D\]"

When run in ANSI mode, the data is returned as is and so shows correctly when viewed in UTF8 mode:

Photo © Joshua Morris

However, when run in UTF8 mode, the data is converted again and so additional characters are inserted incorrectly:

Photo Â© Joshua Morris

where both Â and © end up as separate UTF8 encoded characters.
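To make the byte-level effect concrete, here is a minimal Python sketch of the double conversion (Python is used only for illustration, it is not part of the OpenInsight setup). It assumes "ANSI" means Windows-1252; the real code page depends on the machine. It also shows that the damage is reversible as long as every intermediate character exists in that code page.

```python
# Raw bytes as written by the external command: "©" is UTF-8 encoded as C2 A9.
raw = b"Photo \xc2\xa9 Joshua Morris"

# Intended reading: the bytes are UTF-8.
correct = raw.decode("utf-8")

# Faulty reading: each byte is taken as an ANSI character
# (assumption: Windows-1252), then the text is UTF-8 encoded again.
misread = raw.decode("cp1252")
double_encoded = misread.encode("utf-8")

# Undo the second conversion: back to single bytes, then read as UTF-8.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")

print(correct)                  # Photo © Joshua Morris
print(misread)                  # Photo Â© Joshua Morris
print(double_encoded.hex(" "))  # C2 A9 has become C3 82 C2 A9
print(repaired == correct)      # True
```

The two original bytes C2 A9 become four bytes, C3 82 C2 A9, which is why Â and © both show up as separate characters.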

The difficulty is that we don't know how the data is encoded before running SRP_Run_Command().
In this instance, we have to temporarily switch to ANSI mode around the Run Command call (using SetUTF8) to get the raw data, and then convert it ourselves depending on flags in the data (such as byte order marks, or XML/HTML/Exif encoding tags).
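The "convert that ourselves" step could look roughly like the sketch below. It is a Python illustration only, not the Basic+ code we actually run: the function name `decode_output` and the Windows-1252 fallback are assumptions, and it only checks byte order marks and UTF-8 validity, not XML/HTML/Exif encoding tags.

```python
import codecs

def decode_output(raw: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of raw command output and return text.

    Hypothetical helper: checks byte order marks first, then tries strict
    UTF-8, and finally falls back to an ANSI code page (assumed cp1252).
    """
    # UTF-8 with BOM: strip the three marker bytes, then decode.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # UTF-16 with BOM: the "utf-16" codec consumes the BOM itself.
    # (UTF-32 is deliberately ignored in this sketch.)
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    # No BOM: valid UTF-8 is very unlikely to be anything else.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8, so treat it as single-byte ANSI text.
        return raw.decode(fallback, errors="replace")

print(decode_output(b"Photo \xc2\xa9 Joshua Morris"))  # UTF-8 input
print(decode_output(b"Photo \xa9 Joshua Morris"))      # ANSI input
```

Both calls print the same text, because the lone A9 byte is not valid UTF-8 and so falls through to the ANSI branch.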

Ideally, it would be nice to tell SRP_Run_Command() not to do the UTF8 conversion automatically. I don't suppose I've missed a way to do that?

Cheers, M@

KevinFournier · May 2025

Try passing VARW instead of VAR to the Output parameter. Let me know if that is better or worse.

MattCrozier · May 2025

That sounds hopeful, but I don't see a difference in the output unfortunately, even at byte level.
This is version 2.2.14 (and 2.2.2)

I'm guessing that VARW invokes a DLL function that's prototyped differently?

KevinFournier · May 2025

It was worth a shot. It is OI that does the conversions, not me. The parameters for SRP_Run_Command are set to LPWSTR, so OI converts everything to UTF-16 (or Wide characters). I pass everything to the command in UTF-16, and then I decide whether or not to convert the output of the command to UTF-16. By default, the output is assumed to be UTF-8, so I convert it to UTF-16, which OI will convert back to UTF-8 when it returns. By passing VARW, this tells SRP_Run_Command to assume the output is already UTF-16 and to do no conversion.

It appears that this command is not returning either of these, but instead returns ANSI as its output, which is why disabling UTF8 mode in OI helps.
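As a rough model of the pipeline described above (a Python sketch under assumptions, not the actual DLL or OI code): the output bytes are decoded using an assumed encoding, held as UTF-16, and then converted to UTF-8 on return. The function name `round_trip` is hypothetical and "ANSI" is taken to be Windows-1252. A true UTF-8 to UTF-16 to UTF-8 round trip is lossless, so the extra Â only appears when some step reads the bytes as ANSI.

```python
raw = b"Photo \xc2\xa9 Joshua Morris"  # what the command writes to stdout

def round_trip(data: bytes, assumed: str) -> bytes:
    """Model: bytes -> UTF-16 (DLL side) -> UTF-8 (host side)."""
    wide = data.decode(assumed).encode("utf-16-le")  # widen to UTF-16
    return wide.decode("utf-16-le").encode("utf-8")  # narrow to UTF-8

as_utf8 = round_trip(raw, "utf-8")   # bytes read as UTF-8
as_ansi = round_trip(raw, "cp1252")  # bytes read as ANSI (assumption)

print(as_utf8 == raw)    # True: nothing changes
print(as_ansi.hex(" "))  # contains c3 82 c2 a9, matching the bad output
```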

MattCrozier · June 2025

Hmm, so why does it think the command is returning ANSI as its output, when the characters are UTF8 encoded? Is it the lack of a byte order mark, or some other encoding identifier?

KevinFournier · June 2025

I don't know. Is there documentation for the command?

MattCrozier · June 2025

Well, there are some docs on coded character sets. None of those CHARSET options seems to make any difference.

But this program is just outputting a string of bytes to STDIO, right? Apparently the default encoding for stdio is UTF-8. So how is it being determined as ANSI instead?

KevinFournier · June 2025

Let's try this. Download and install version 2.2.15.2, then set your Output variable to "VARUTF8" and see if this gets it right.

MattCrozier · June 2025

Ok, that does seem to resolve the UTF8 - thanks :)

One issue though is that sometimes strings come back with extra random characters added at the end. It's as if some buffer lengths are getting a bit skewed..?

KevinFournier · June 2025

Whoops. That's on me. Try 2.2.15.3. This zip file has full RDKs, but you only need the DLL for the fix.

MattCrozier · June 2025

That's much better - thanks :)
