
Encoding issues

I am having trouble finding the right solution for the encoding. I am pretty sure the data coming from the API is ISO-8859-1 encoded, or at least not UTF-8. To get a correct JSON decode in PHP, I apply this: $arr = json_decode(utf8_encode($data_from_api));

Unfortunately this doesn't seem to work the other way around, which would be $json = utf8_decode(json_encode($arr)). I guess this makes sense, because JSON should be in UTF-8 anyway. But when I leave the utf8_decode out, the subvalue markers are not saved correctly to my record.
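For reference, this is roughly what the round trip looks like, assuming the source data really is ISO-8859-1 (a sketch, not tested against the actual API; note that utf8_encode/utf8_decode are deprecated as of PHP 8.2, and mb_convert_encoding covers the same Latin-1 case):

```php
<?php
// Hypothetical round trip, assuming the API sends ISO-8859-1 (Latin-1) bytes.
$data_from_api = "{\"name\":\"pr\xFCfen\"}";   // "prüfen" in ISO-8859-1

// Incoming: transcode to UTF-8 first, then decode. json_decode returns null
// on invalid UTF-8, which is why the raw bytes fail without this step.
$arr = json_decode(mb_convert_encoding($data_from_api, 'UTF-8', 'ISO-8859-1'), true);

// Outgoing: encode to JSON first (JSON is always UTF-8), then transcode.
// JSON_UNESCAPED_UNICODE matters here; without it non-ASCII characters come
// out as \uXXXX escapes and the transcode back to Latin-1 changes nothing.
$json = mb_convert_encoding(json_encode($arr, JSON_UNESCAPED_UNICODE), 'ISO-8859-1', 'UTF-8');
```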

So what is the best practice for handling this? Does the SRP framework take care of encoding, regardless of what the encoding settings of the server are? Are there utilities or settings to force UTF-8 encoding?

Comments

  • Are you starting OpenInsight in UTF8 mode?
  • To follow up on Jared's question, if your web app needs to support Unicode then you should do something like calling SetUTF8(1) either in your end point service or even at the beginning of HTTP_MCP.

    Having said that, your post raises a few questions:
    • I'm not a PHP programmer, but shouldn't your second code example look like $json = json_encode(utf8_decode($arr))?
    • Subvalue marks are not wide characters, so I'm not even sure why UTF8 encoding is necessary. Are you supporting UTF8 for other reasons and you just discovered the problem with subvalue marks or were you having trouble with subvalue marks and then you tried to use UTF8 as a way of rectifying the problem?
    • Why are subvalue marks being transported in the JSON object in the first place? This is more of a philosophical and design question, but I am curious. If you are converting your record to JSON (and back), it seems to me that all of your system delimiters should be converted into proper JSON array objects.
    • Probably not relevant to your situation, but always make sure that the Content-Type request header has an appropriate value (e.g., application/json or text/plain). We worked with a 3rd party mobile app developer who continually passed in application/x-www-form-urlencoded because his library defaulted to that value. This changes the content of the body enough to cause problems.
    • Finally, if you ever run into a problem where your content isn't transmitting back and forth as expected, you can always BASE64 encode/decode your data to make it safe for transport.
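On the PHP side, that last option could look something like this minimal sketch (the delimiter bytes shown are the standard OI marks, Char(252) and Char(253)):

```php
<?php
// Sketch: wrapping a delimited OI record for transport. base64_encode makes
// arbitrary bytes (including the Char(250)-Char(255) delimiters) 7-bit safe.
$record = "A" . chr(253) . "B" . chr(252) . "C";   // @VM / @SVM delimited data
$safe   = base64_encode($record);                  // "Qf1C/EM="
$back   = base64_decode($safe);                    // byte-identical round trip
assert($back === $record);
```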
  • I haven't tried all options, but to answer your questions:

    1+2) PHP doesn't decode JSON nicely with ISO encoding and 'high ASCII'. So when I UTF-8 encode the entire string, it decodes perfectly. But somehow I can't get this to work the other way around, so I figured I might get past this by using UTF-8 on all levels.

    This stackoverflow article seems to support my theory: http://stackoverflow.com/questions/995063/php-convert-to-iso-8859-9

    3) Good question. The framework didn't automatically convert the SVM markers into an array. I was trying to keep the HTTP framework intact, and maybe was a little bit lazy too. So I figured I would convert this on my PHP side, where I map the JSON data to objects anyway. It would be very easy to convert the data to an array there. Otherwise I'd need to write custom code for every object that uses multivalue fields like this, while in PHP I'd create two functions and have the data mapper do the work for me.
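Those two PHP mapper functions could be sketched like this (hypothetical helper names; assumes the standard OI delimiter bytes, @VM = Char(253) and @SVM = Char(252)):

```php
<?php
// Split a @VM/@SVM delimited field into a nested array, and back again.
// mvToArray/arrayToMv are made-up names for illustration.
function mvToArray(string $field): array {
    return array_map(fn($v) => explode(chr(252), $v), explode(chr(253), $field));
}

function arrayToMv(array $rows): string {
    return implode(chr(253), array_map(fn($r) => implode(chr(252), $r), $rows));
}

$field = "1" . chr(252) . "file1.jpg" . chr(253) . "2" . chr(252) . "file2.jpg";
$rows  = mvToArray($field);        // [["1","file1.jpg"], ["2","file2.jpg"]]
assert(arrayToMv($rows) === $field);
```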

    4) I was thinking the exact same thing on my way home. I haven't checked this yet; I assumed the unirest class would handle this properly, but I will certainly check it. In the 'Registry Configuration' article on the wiki the last key/value is AdditionalValues="HTTP_AUTHORIZATION,HTTP_MEDIA_TYPE,HTTP_ACCEPT_ENCODING,HTTP_ACCEPT_CHARSET,HTTP_ACCEPT_LANGUAGE". To me, the 'ACCEPT_ENCODING' suggests that the framework might automatically do SetUTF8(1) as you suggested. So this is worth a try :-)

    5) Encoding with base64 might be my last resort. I don't like the extra overhead if it's not needed, and the human-readable input and output is nice to have. I find base64 a lot harder to read in the log files than JSON ;-).

    Thank you for your input, there sure are some options left to try out.
  • Another question about this encoding issue: if I call SetUTF8(1), will this only affect the data that is going through the web server? Or will this also assume that data that is stored in tables is in UTF-8?

    How can I check in what encoding Open Insight is running? Is there a TCL command I can issue?
  • Another question about this encoding issue: if I call SetUTF8(1), will this only affect the data that is going through the web server? Or will this also assume that data that is stored in tables is in UTF-8?
    The SetUTF8() function does not permanently change the UTF8 flag for OpenInsight, if that is what you are asking. It only affects the current session.
    How can I check in what encoding Open Insight is running? Is there a TCL command I can issue?
    To tell if the desktop OpenInsight application is running in UTF8 mode just open the Application Properties dialog and check the value of the UTF8 character mode box.

    However, this does not cause OECGI engines to run in UTF8 mode. You should always plan to use the SetUTF8() function.
  • The SetUTF8() function does not permanently change the UTF8 flag for OpenInsight, if that is what you are asking. It only affects the current session.
    No, that's not quite what I meant. I was wondering if the SetUTF8 mode only applies to the HTTP framework routines, or also to the data that is stored in OI. So imagine that normally our OpenInsight does not run with the UTF8 flag. It then stores data to a file. This data is probably not UTF-8 encoded, right?

    Now I turn on the UTF8 flag for my API, within HTTP_MCP as you suggested. What will happen with the data that I retrieve from files with a read statement? Will OI assume that it's UTF-8 encoded, because the SetUTF8 flag is set to 1? (which is then a false assumption?) Wouldn't this cause problems, because the data isn't stored as UTF-8?

    Or is the OI engine smart enough to convert between these encodings; will data storage be independent from the UTF8 flag? I guess it should be, if you are able to change the UTF-8 flag at runtime.
  • edited February 2017
    I think I am a bit further in finding the problem. It seems to me that my Apache2 server is sending the data to the CGI script URL encoded, because when I do exactly the same POST to request.bin, I see the correct data.

    I guess it's normal that the data is URL encoded, because I see a DecodePercentageString now in HTTP_SERVICES. I tried adding the subvalue marker here: I added a map from %FC to ü. Unfortunately, this didn't help much.

    So in the end I am probably better off working with proper json, instead of keeping in the subvalue marker. So I'll try to figure out what the best way is to overload this for specific endpoints.
  • I would just like to point out that OI's use of Char(255) - Char(250) as delimiters makes it impossible for them to be decoded properly by anything outside of OI. Char(252) is a sub-value mark in OI, but in UTF-8 it marks the beginning of a multi-byte character. So, depending on what follows, it could become anything once converted to a true Unicode character. Thus, you must base64 encode your delimited data to preserve it, have OI encode it to Unicode, or break it apart as discussed here.
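A quick PHP illustration of that point (a sketch using the standard mbstring functions): a lone Char(252) byte is never valid UTF-8, while a Latin-1-to-UTF-8 conversion turns it into the two-byte sequence for "ü".

```php
<?php
// 0xFC was defined as the lead byte of a (now-forbidden) 6-byte UTF-8
// sequence, so a raw OI sub-value mark on its own is never valid UTF-8.
var_dump(mb_check_encoding(chr(252), 'UTF-8'));              // bool(false)

// Converting it from ISO-8859-1 yields U+00FC ("ü"), encoded as 0xC3 0xBC,
// which is valid UTF-8 but no longer the single delimiter byte OI expects.
$converted = mb_convert_encoding(chr(252), 'UTF-8', 'ISO-8859-1');
var_dump($converted === "\xC3\xBC");                         // bool(true)
```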
  • So in the end I am probably better off working with proper json, instead of keeping in the subvalue marker. So I'll try to figure out what the best way is to overload this for specific endpoints.
    I would consider making an official update to HTTP Framework to support lower delimiters if I had a good way to associate the resulting arrays to the entire JSON object. Thus, I'm open to input from you or other customers.

    Supporting MV or AMV was relatively easy. If a field is just MV, then I create an array and name it after the field. If several MV fields are grouped together as AMVs, then I create multiple arrays and make them subordinate to an object named after the AMV Group.

    But how do I organize @SVM, @TM, and @STM data? I have no name that I can use other than making one up based on the field. How do I distinguish between the different delimiters? If one field only has @STM data, do I embed several objects in order to denote depth, or do I flatten it out? The challenge here is going back the other way. I have to make assumptions about which delimiters to use for each array, and that is only possible based on the depth of the array within the JSON object itself.

    For these reasons, I do recommend custom logic in your end point API rather than a complete reliance upon HTTP_Resource_Services. Your idea of an "overload" might work out best. I've done something similar. That is, I let HTTP_Resource_Services do the heavy lifting, bring the resulting JSON string back into my end point API, use SRP_JSON to make it into an object again, and then customize the JSON as needed.
  • To begin at the bottom: could you maybe post your code so I have an example to work with? I was actually going the same route, although my developer instinct says it's a pity to undo work that was already done by the code before. But I cannot extend the class as in other languages, so I guess it's the best place to put the code.

    Another idea I had was that maybe I can pass in a callback-like structure. So then the HTTP_Resource_Service could check if this parameter is set, and then execute that bit of code.

    Anyway, when I was driving home I thought it would be better to have this SVM decoding in the standard procedure, because I think I will need it again. I'm not sure what AVM means (I consider myself still an OI rookie), but in my case I use the SVM and VM delimiters to have an array within an array. It basically represents a list of files that go with a product. For each file I want to store several properties, so now the list is something like this:

    1:@svm:filename.jpg:@svm:filesize:@vm:2:@svm:file2.jpg:@svm:filesize

    This is just an example but you get the idea. So in my case I think it's perfectly fine to bind it to the json without having to add a name. It's just an array in an array.

    Or is this already implemented and am I doing something wrong?

    @KevinFournier, thanks for pointing that out. I suppose it's best to avoid those characters in the content anyway :-)
  • To begin at the bottom: could you maybe post your code so I have an example to work with? I was actually going the same route, although my developer instinct says it's a pity to undo work that was already done by the code before. But I cannot extend the class as in other languages, so I guess it's the best place to put the code.
    Actually, you should already have an example of how this can work in your package. Look for the GetItem service within HTTP_Contacts_Services. You will see that the code calls the GetDatabaseItem service and then updates the JSON object with a Base64 encoded image.
  • Anyway, when I was driving home I thought it would be better to have this SVM decoding in the standard procedure, because I think I will need it again. I'm not sure what AVM means (I consider myself still an OI rookie), but in my case I use the SVM and VM delimiters to have an array within an array.
    It is AMV, not AVM. This stands for Associated MultiValue. OpenInsight allows you to define in the data dictionary multiple MV (MultiValue) fields as being "associated" to each other. This is the OI way of defining embedded child-table relationships using the MultiValue architecture. This association is really just a shared label (referred to as the AMV Group Name) that all relevant MV fields share. Our code looks for the AMV Group Name and uses this to name the object that contains the associated MV fields.
    It basically represents a list of files that go with a product. For each file I want to store several properties, so now the list is something like this:

    1:@svm:filename.jpg:@svm:filesize:@vm:2:@svm:file2.jpg:@svm:filesize

    This is just an example but you get the idea. So in my case I think it's perfectly fine to bind it to the json without having to add a name. It's just an array in an array.

    Or is this already implemented and am I doing something wrong?
    I understand your data structure. The problem is that the code used by HTTP_Resource_Services is designed to be data-dictionary driven. That is how it is able to create human readable JSON. It is also how it supports POST methods. In your structure above, there is no way for the code to know what the data in <1, 1, 1> means.

    Another option for you to consider is to create a symbolic column that helps with some of the normalization of your data...or you can use it to create a partial JSON string so that you do not have to do this in your end point API code as I originally suggested. For instance, you could convert each group of @SVM delimited data into a JSON object and then keep the @VM delimiter. I haven't tested this kind of solution, but I think it could work. Then you would just call HTTP_Resource_Services and exclude the data columns that you are reworking as a calculated column.
  • always make sure that the Content-Type request header has an appropriate value (e.g., application/json or text/plain)
    I'm not sure if this helps here, but I've found that we need to set 'charset=utf-8' in the Content-Type response header for our situation.
  • Another option for you to consider is to create a symbolic column that helps with some of the normalization of your data...or you can use it to create a partial JSON string so that you do not have to do this in your end point API code as I originally suggested. For instance, you could convert each group of @SVM delimited data into a JSON object and then keep the @VM delimiter. I haven't tested this kind of solution, but I think it could work. Then you would just call HTTP_Resource_Services and exclude the data columns that you are reworking as a calculated column.
    So the calculated column would return json, right? But wouldn't that result in json encoded json being returned by the HTTP_Resource_Services? How would it know that it's already json?
    I'm not sure if this helps here, but I've found that we need to set 'charset=utf-8' in the Content-Type response header for our situation.
    I tried adding the charset, but it didn't make any difference.
  • So the calculated column would return json, right? But wouldn't that result in json encoded json being returned by the HTTP_Resource_Services? How would it know that it's already json?
    To be honest, I threw that in at the last moment so I did not think it through completely. However, I had considered the issue you are describing. This is why I suggested "partial" JSON. What your calculated column would produce is a JSON object that would become the "value" of a JSON name/value pair. In this case, the "name" would be based on the calculated column's name. It still might not work. I have not tried to test this.

    That said, this really isn't my recommendation. I was merely trying to suggest one way to avoid re-writing a core Framework routine.
  • Stubborn as I am, I implemented this in HTTP_RESOURCE_SERVICES and HTTP_JSON_SERVICES. It seems to be working OK. I also found out that it probably already worked from the beginning, but because I was returning ID fields (which have pos 0) my ItemRec got screwed up. Anyway, for the sake of sharing, this is my code.

    In HTTP_RESOURCE_SERVICES at line 373, I changed it to:
    
    
    Case JSONType _EQC 'Array'
        // Count the number of members in the array then loop through to build the @VM value list.
        ColumnValue = ''
        NumValues   = SRP_JSON(hColumn, 'GETCOUNT')
        For ValueCnt = 1 to NumValues
            val           = ''
            subColumn     = SRP_JSON(hBody, 'GET', ColumnName : '[' : ValueCnt : ']')
            subColumnType = SRP_JSON(subColumn, 'TYPE')
            If subColumnType _EQC 'Array' then
                // Nested array: build an @SVM delimited list for this value.
                NumSubValues = SRP_JSON(subColumn, 'GETCOUNT')
                For SubValueCnt = 1 to NumSubValues
                    val := SRP_JSON(hBody, 'GETVALUE', ColumnName : '[' : ValueCnt : '][' : SubValueCnt : ']', '') : @SVM
                Next SubValueCnt
                val[-1, 1] = '' ; // Strip the trailing @SVM
                SRP_JSON(subColumn, 'RELEASE')
            end else
                val = SRP_JSON(hBody, 'GETVALUE', ColumnName : '[' : ValueCnt : ']', '')
            end
            ColumnValue := val : @VM
        Next ValueCnt
        ColumnValue[-1, 1] = '' ; // Strip the trailing @VM
        GoSub Update_ItemRec
    
    
    In HTTP_JSON_SERVICES there are two locations where basically the same change needs to be done. First one is in the SetHALItem label, at line 182.
    
    
    If Index(Value, @VM, 1) then
        If SRP_JSON(hColumnArray, 'NEW', 'ARRAY') then
            NumValues = Count(Value, @VM) + (Value NE '')
            For ValueCnt = 1 to NumValues
                If Index(Value<0, ValueCnt>, @SVM, 1) then
                    // Nested @SVM data: build a sub-array for this value.
                    If SRP_JSON(hSubColumnArray, 'NEW', 'ARRAY') then
                        NumSubValues = DCount(Value<0, ValueCnt>, @SVM)
                        For SubValueCnt = 1 to NumSubValues
                            SRP_JSON(hSubColumnArray, 'ADDVALUE', Value<0, ValueCnt, SubValueCnt>)
                        Next SubValueCnt
                        SRP_JSON(hColumnArray, 'ADD', hSubColumnArray)
                        SRP_JSON(hSubColumnArray, 'RELEASE')
                    end
                end else
                    SRP_JSON(hColumnArray, 'ADDVALUE', Value<0, ValueCnt>)
                end
            Next ValueCnt
            If SRP_JSON(HALRootObj@, 'SET', Name, hColumnArray) else HTTP_Services('SetResponseStatus', 500, '')
            SRP_JSON(hColumnArray, 'RELEASE')
        end
    end else
        If SRP_JSON(HALRootObj@, 'SETVALUE', Name, Value, Type) else HTTP_Services('SetResponseStatus', 500, '')
    end
    
    The same check for @VM is done in the SetHALCollectionEmbedded label. After this code edit, it's around line 472.

    I am probably missing your point Don, about binding to names, so if this is a dumb thing to do, please let me know.

    BTW, I am not sure how the Basic+ syntax highlighter thingy should work; in my preview it screws up the line numbers. Maybe a nice topic for the FAQ, because I couldn't find it there :-).