UTF-8 Question and Preview of v3.0.1

DonBakke · March 2017

We will soon be releasing v3.0.1 of the SRP HTTP Framework. Some of you may have picked up on this in our latest blog post regarding error management. The items managed by the setup record are increasing. This is what brings me to my UTF-8 question.

Since a fair number of our customers are running outside of the USA, we suspect many of you might be using UTF-8 in your desktop and web-based OI applications. We have one installation where we wrote the whole application and HTTP_MCP has been customized to always set the UTF-8 mode. My two-part question to those abroad is: 1.) Do you use UTF-8 mode in your web applications, and 2.) If your answer is yes, do you turn it off and on as needed or do you keep it on for the entire duration of the request?

The purpose for asking is because I'm considering adding UTF-8 mode as another configuration option in the setup form/record. HTTP_MCP would then set the mode accordingly as it does with the other configuration options. It's not a big deal to add it, but if customers find that they need to have more granular control over when UTF-8 mode is turned on then it might not be a worthwhile feature to add.

Here is a summarized list of changes coming in the next release:

Support for a designated routine to handle system error aborts. Previously this was just handled in HTTP_MCP in a minimalist way. A template routine has been provided so developers can customize how this works.
Support to set the debugger mode as enabled, disabled, or in intercept mode. A template routine has been provided for debugger intercept mode.
The logic that creates logs has been moved into its own service: CreateLogFile.
Logging has been extended to support system error aborts, the debugger intercept, as well as custom log types that the developer may want to create.
Logging is now governed by a flag rather than just the absence of a valid capture path.
Logging for requests and responses has been significantly updated to provide more detail and more useful information.
The filename for the log files has been overhauled to make it easier to sort by execution order. Included in the filename is the Windows ProcessID for the engine that created the log. This helps to understand when one engine aborted and another one was launched.
HTTP Services no longer relies upon querying the Registry for its stored information. This data is already passed in through the HTTP Request array so we are now just relying on this information. This helps customers who rename OECGI4 to OECGI3 but forget to update the code that looks for the Registry key.
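To illustrate the log filename item above, here is a minimal sketch in Python of a name that sorts by execution order and embeds the process ID of the engine. The layout (timestamp, then PID, then log type) and the function name `build_log_filename` are assumptions made for illustration; the actual format used by the framework is not described in this post.

```python
import os
from datetime import datetime, timezone


def build_log_filename(log_type, when=None, pid=None):
    """Return a log filename that sorts chronologically and records a process ID.

    Hypothetical layout: <timestamp>_<pid>_<log type>.log
    """
    when = when or datetime.now(timezone.utc)
    pid = os.getpid() if pid is None else pid
    # Zero-padded fields ordered from year down to microseconds, so a plain
    # alphabetical sort of the filenames matches the order of execution.
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S_%f")
    return f"{stamp}_{pid}_{log_type}.log"
```

Because the PID sits right after the timestamp, a change in that field between two consecutive files suggests that one engine stopped and another took over.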

MattCrozier · April 2017

Yes, we have implemented our web service as UTF-8. It calls many supporting routines from the OI application which is set to run in UTF-8 mode, thus those routines assume UTF-8 unless otherwise stated.

Likewise, we've patched HTTP_MCP to turn on UTF-8 mode (I haven't found a way to determine this from the application properties). Also, we append "; charset=utf-8" to any textual Content-Type header.
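As a rough illustration of the header change, the following Python sketch appends the charset parameter to textual media types only. It is not the code used in HTTP_MCP; the function name `with_utf8_charset` and the choice of which types count as "textual" are assumptions.

```python
def with_utf8_charset(content_type):
    """Append '; charset=utf-8' to textual media types that do not name a charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: text/*, JSON and XML (including +json / +xml suffixes) are "textual".
    is_textual = (
        media_type.startswith("text/")
        or media_type in ("application/json", "application/xml")
        or media_type.endswith(("+json", "+xml"))
    )
    # Leave the header alone if it is binary or already declares a charset.
    if is_textual and "charset=" not in content_type.lower():
        return f"{content_type}; charset=utf-8"
    return content_type
```

For example, `with_utf8_charset("application/json")` gives `application/json; charset=utf-8`, while `image/png` is returned unchanged.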

It would be great to have this configurable.

HTH, M@

DonBakke · April 2017

Matt - Thanks for the feedback. Surprisingly you are the only one who has responded. I take it then that simply turning on UTF-8 mode at the beginning of HTTP_MCP is sufficient and if you needed to temporarily disable this within the execution of your web service you would manage that on your own? This is what we are doing with our applications, but I wanted to make sure I wasn't overlooking something else that could be put into the setup configuration.

MattCrozier · April 2017

Hi Don - Adding charset=utf-8 to the Content-Type header would be the only other requirement we have - that affects HTTP_SERVICES and HTTP_RESOURCE_SERVICES.

The HTTP Framework itself doesn't need to be run in UTF-8 mode, so one alternative to turning it on in HTTP_MCP could be to turn UTF-8 mode on only when getting data, reading database items, or constructing a response (again in HTTP_SERVICES and HTTP_RESOURCE_SERVICES). I haven't tried it, but it should make the framework run slightly faster.

MattCrozier · June 2017

So far I've just been working with UTF-8 data for the HTTP responses. This works well with the HTTP Framework running mainly in ANSI mode (setting UTF8 mode only where required), and with a charset=utf-8 Content-Type header.

Just to update this thread, I'm now having to deal with UTF-8 data coming in the HTTP request. This affects both the GET string (for queries) and the POST string (for generic data updates, i.e. not necessarily URL-encoded).

So I've extended the DecodePercentString service in HTTP_SERVICES to handle the general case of any %hh encoding. The urlDecodeValue logic in the URL_FORMAT user defined conversion seems to do the trick, with the exception of converting a "+" to a space.
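The general %hh case can be sketched in Python as follows. This is not the DecodePercentString or URL_FORMAT code; the function name `decode_percent_string` and the `plus_as_space` switch are assumptions for illustration. The idea is to turn every escape into a raw byte first and only then interpret the whole byte sequence as UTF-8, so that multi-byte characters survive.

```python
from urllib.parse import unquote_plus

HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_percent_string(value, plus_as_space=True):
    """Decode every %hh escape to a byte, then read the bytes as UTF-8."""
    raw = bytearray()
    i = 0
    while i < len(value):
        ch = value[i]
        hex_pair = value[i + 1:i + 3]
        if ch == "%" and len(hex_pair) == 2 and all(c in HEX_DIGITS for c in hex_pair):
            raw.append(int(hex_pair, 16))  # one escape is one byte, not one character
            i += 3
        elif ch == "+" and plus_as_space:
            raw.append(0x20)  # form-encoded data uses "+" for a space
            i += 1
        else:
            raw.extend(ch.encode("utf-8"))  # literal characters pass through unchanged
            i += 1
    # Invalid byte sequences become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")
```

With the default settings this agrees with Python's own `unquote_plus` on inputs such as `caf%C3%A9+au+lait`. Passing `plus_as_space=False` leaves "+" untouched, which matters when the string is not form-encoded.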

DonBakke · July 2017

Matt - Sorry for the long delay in getting back to this topic. Just an FYI, I am going to release 3.0.1 soon but I am not going to add anything new for UTF-8. I am still considering how I want to handle this.

If you have anything you would like to submit as a change to the official product please email this to me directly.

I also wanted to revisit charset=utf8. I get the impression you think it would be useful if the framework appended this automatically to the Content-Type header. How would that help you? I mean, doesn't this only benefit the web server?

MattCrozier · July 2017

Hi Don,

Just playing safe, really, as I don't know who/what our API will be serving to, and following W3.org's advice. I know there are some clients that don't render the response correctly without the charset=utf-8 header (e.g. Chrome - although an internet browser is not a good example of an API client ;). Postman seems to assume UTF8 responses though.

DonBakke · July 2017

I think I am not quite on the same page as you. Content-Type is both a request and response header. From prior discussions, I thought you came to the conclusion that appending charset=utf8 to the header value was necessary in order for your UTF8 encoded data to come into the request properly. Is this the case? If so, are you under the impression that if the framework does this for you upon processing the incoming request that this will help you?

Or are you merely talking about the response header? Perhaps you are simply suggesting that the framework be smart enough to append this for you either when UTF8 mode is enabled or when UTF8 characters are in the response body.

MattCrozier · July 2017

Sorry, yes I was on the wrong page - talking just about the response header.

For the request, I hadn't considered examining the Content-Type header for a charset=utf-8. I decode for any UTF-8 characters in the query strings regardless. Maybe I should check the Content-Type header first!
I see that Postman automatically sets charset=utf-8 if it sees any Unicode in the request.
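Checking the request header first might look something like the Python sketch below, which uses the standard library's header parser to pull out the charset parameter. The function name `request_charset` and the fallback to UTF-8 when no charset is declared are assumptions, not framework behavior.

```python
from email.message import Message


def request_charset(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or a default if none is given."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the parameter in lower case, or None if it is absent.
    return msg.get_content_charset() or default
```

The result could then be used to decode the body, for example `body_bytes.decode(request_charset(header_value))`, where `body_bytes` and `header_value` stand for the raw request body and the incoming header.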
