Encoding issues
I am having trouble finding the right solution for the encoding. I am pretty sure that the data coming from the API is ISO encoded, or at least not UTF-8. In order to get a correct JSON decode, I apply this in PHP: $arr = json_decode(utf8_encode($data_from_api));
Unfortunately this doesn't seem to work the other way around, which would be $json = utf8_decode(json_encode($arr)). I guess this makes sense because JSON should be UTF-8 anyway. But when I leave the utf8_decode out, the subvalue markers are not saved correctly to my record.
So what is the best practice for handling this? Does the SRP framework take care of encoding, regardless of the encoding settings of the server? Are there utilities or settings to force UTF-8 encoding?
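For reference, here is a minimal sketch of the round trip I'm attempting (the sample string and variable names are just illustrative; I'm assuming the incoming data is ISO-8859-1, with ü standing in for the subvalue mark):

    <?php
    // Hypothetical sample: ISO-8859-1 bytes from the API, with a ü (0xFC)
    // standing in for OpenInsight's subvalue mark.
    $data_from_api = "{\"files\":\"a.jpg\xFC12345\"}";

    // Incoming direction: convert ISO-8859-1 to UTF-8 first, then decode.
    // This works fine.
    $arr = json_decode(utf8_encode($data_from_api), true);

    // Outgoing direction: this is what I tried, but the subvalue markers
    // don't come through correctly when the record is saved.
    $json = utf8_decode(json_encode($arr));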
Comments
Having said that, your post raises a few questions:
1+2) PHP doesn't decode JSON nicely with ISO encoding and 'high ASCII'. When I UTF-8 encode the entire string first, it decodes perfectly. But somehow I can't get this to work the other way around, so I figured I might get past this by using UTF-8 on all levels (see the sketch at the end of this post).
This stackoverflow article seems to support my theory: http://stackoverflow.com/questions/995063/php-convert-to-iso-8859-9
3) Good question. The framework didn't automatically convert the SVM markers into an array. I was trying to keep the HTTP framework intact, and maybe I was a little bit lazy too. So I figured I would convert this on my PHP side, where I map the JSON data to objects anyway. It would be very easy to convert the data to an array there. Otherwise I'd need to write custom code for every object that uses multivalue fields like this, while in PHP I'd create two functions and have the data mapper do the work for me.
4) I was thinking the exact same thing on my way home. I haven't checked this yet; I assumed the unirest class would handle this properly, but I will certainly check. In the 'Registry Configuration' article on the wiki the last key/value is AdditionalValues="HTTP_AUTHORIZATION,HTTP_MEDIA_TYPE,HTTP_ACCEPT_ENCODING,HTTP_ACCEPT_CHARSET,HTTP_ACCEPT_LANGUAGE". To me, the 'ACCEPT_ENCODING' suggests that the framework maybe does SetUTF8(1) automatically, as you suggested. So this is worth a try :-)
5) Encoding with base64 might be my last resort. I don't like the extra overhead if it's not needed, and having human-readable input and output is nice. I find base64 a lot harder to read in the log files than JSON ;-).
Thank you for your input, there sure are some options left to try out.
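In case it helps anyone reading along, this is the kind of thing I have in mind for the outgoing direction. It's just a sketch, assuming the OI side expects ANSI/ISO-8859-1 bytes (and mb_convert_encoding needs the mbstring extension):

    <?php
    // Build the JSON in UTF-8 without escaping non-ASCII characters, so the
    // subvalue mark stays a real character instead of a \u00fc escape...
    $arr  = ['files' => "a.jpg\u{00FC}12345"];          // ü standing in for @SVM
    $json = json_encode($arr, JSON_UNESCAPED_UNICODE);

    // ...and only convert back to single-byte ISO-8859-1 if the server side
    // really does expect ANSI bytes. If SetUTF8(1) is active on the OI side,
    // this step should not be needed at all.
    $ansi = mb_convert_encoding($json, 'ISO-8859-1', 'UTF-8');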
How can I check what encoding OpenInsight is running in? Is there a TCL command I can issue?
However, this does not cause OECGI engines to run in UTF8 mode. You should always plan to use the SetUTF8() function.
Now suppose I turn on the UTF8 flag for my API within HTTP_MCP, as you suggested. What will happen to the data that I retrieve from files with a Read statement? Will OI assume it's UTF-8 encoded because the SetUTF8 flag is set to 1 (which would then be a false assumption)? Wouldn't that cause problems, because the data isn't stored as UTF-8?
Or is the OI engine smart enough to convert between these encodings, so that data storage is independent of the UTF8 flag? I guess it should be, if you are able to change the UTF-8 flag at runtime.
I think I am a bit further in finding the problem. It seems that my Apache2 server is sending the data to the CGI script URL-encoded, because when I do exactly the same post to request.bin, I see the correct data. I guess it's normal that the data is URL-encoded, because I now see a DecodePercentageString in HTTP_SERVICES. I tried adding the subvalue marker there: I added a mapping from %FC to ü. Unfortunately that didn't help much.
So in the end I am probably better off working with proper JSON instead of keeping the subvalue marker in. I'll try to figure out the best way to overload this for specific endpoints.
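For what it's worth, this is roughly how I plan to send the request so the body is not form-urlencoded (just a sketch using plain curl rather than the unirest class; the URL and payload are made up):

    <?php
    // Send the JSON as a raw request body with an explicit charset, instead of
    // letting the client post it as application/x-www-form-urlencoded.
    $json = '{"product_id":1,"files":[["a.jpg",12345],["b.jpg",67890]]}';

    $ch = curl_init('https://example.com/api/products/1');   // made-up endpoint
    curl_setopt_array($ch, [
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => $json,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json; charset=utf-8',
            'Accept: application/json',
        ],
    ]);
    $response = curl_exec($ch);
    curl_close($ch);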
Supporting MV or AMV was relatively easy. If a field is just MV, I create an array and name it after the field. If several MV fields are grouped together as AMVs, I create multiple arrays and make them subordinate to an object named after the AMV group.
But how do I organize @SVM, @TM, and @STM data? I have no name that I can use other than making one up based on the field. How do I distinguish between the different delimiters? If a field only has @STM data, do I embed several objects in order to denote depth, or do I flatten it out? The challenge here is going back the other way: I have to make assumptions about which delimiters to use for each array, and that is only possible based on the depth of the array within the JSON object itself.
For these reasons, I do recommend custom logic in your end point API rather than a complete reliance upon HTTP_Resource_Services. Your idea of an "overload" might work out best. I've done something similar: I let HTTP_Resource_Services do the heavy lifting, bring the resulting JSON string back into my end point API, use SRP_JSON to turn it into an object again, and then customize the JSON as needed.
Another idea I had is that maybe I can pass in a callback-like structure, so that HTTP_Resource_Services could check whether this parameter is set and then execute that bit of code.
Anyway, while I was driving home I thought it would be better to have this SVM decoding in the standard procedure, because I think I will need it again. I'm not sure what AMV means (I still consider myself an OI rookie), but in my case I use the SVM and VM delimiters to have an array in an array. It basically represents a list of files that go with a product. For each file I want to store several properties, so the list looks something like this:
1 :@svm: filename.jpg :@svm: filesize :@vm: 2 :@svm: file2.jpg :@svm: filesize
This is just an example but you get the idea. So in my case I think it's perfectly fine to bind it to the json without having to add a name. It's just an array in an array.
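To make it concrete, this is roughly how I map that field in my PHP data mapper. It's only a sketch: the function names are made up, and I'm assuming the usual Revelation delimiter values (@VM = char 253, @SVM = char 252).

    <?php
    // Revelation-style delimiters (assuming the usual values).
    const VM  = "\xFD";   // @VM  - value mark, separates the files
    const SVM = "\xFC";   // @SVM - subvalue mark, separates one file's properties

    // Field value -> array of [id, filename, filesize] entries.
    function fieldToArray(string $field): array {
        $files = [];
        foreach (explode(VM, $field) as $value) {
            $files[] = explode(SVM, $value);
        }
        return $files;
    }

    // ...and back again when saving the record.
    function arrayToField(array $files): string {
        $values = [];
        foreach ($files as $props) {
            $values[] = implode(SVM, $props);
        }
        return implode(VM, $values);
    }

    // Example matching the field above (sizes are made up).
    $field = "1" . SVM . "filename.jpg" . SVM . "12345" . VM . "2" . SVM . "file2.jpg" . SVM . "67890";
    print_r(fieldToArray($field));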
Or is this already implemented and am I doing something wrong?
@KevinFournier thanks for pointing that out. I suppose it's best to avoid those characters in the content anyway :-)
Another option for you to consider is to create a symbolic column that helps with some of the normalization of your data...or you can use it to create a partial JSON string so that you do not have to do this in your end point API code as I originally suggested. For instance, you could convert each group of @SVM delimited data into a JSON object and then keep the @VM delimiter. I haven't tested this kind of solution, but I think it could work. Then you would just call HTTP_Resource_Services and exclude the data columns that you are reworking as a calculated column.
That said, this really isn't my recommendation. I was merely trying to suggest one way to avoid re-writing a core Framework routine.
In HTTP_RESOURCE_SERVICES I changed the code at line 373. In HTTP_JSON_SERVICES there are two locations where basically the same change needs to be made. The first is in the SetHALItem label, at line 182. The same check for @VM is done in the SetHALCollectionEmbedded label; after this code edit, it's around line 472.
I am probably missing your point, Don, about binding to names, so if this is a dumb thing to do, please let me know.
BTW, I am not sure how the Basic+ syntax highlighter thingy is supposed to work; in my preview it screws up the line numbers. Maybe a nice topic for the FAQ, because I couldn't find it there :-).