UTF-8 Question and Preview of v3.0.1
We will soon be releasing v3.0.1 of the SRP HTTP Framework. Some of you may have picked up on this in our latest blog post regarding error management. The items managed by the setup record are increasing. This is what brings me to my UTF-8 question.
Since a fair number of our customers are running outside of the USA, we suspect many of you might be using UTF-8 in your desktop and web-based OI applications. We have one installation where we wrote the whole application, and HTTP_MCP has been customized to always set UTF-8 mode. My two-part question to those abroad is: 1) Do you use UTF-8 mode in your web applications? 2) If so, do you turn it on and off as needed, or do you keep it on for the entire duration of the request?
The reason I ask is that I'm considering adding UTF-8 mode as another configuration option in the setup form/record. HTTP_MCP would then set the mode accordingly, as it does with the other configuration options. It's not a big deal to add, but if customers find that they need more granular control over when UTF-8 mode is turned on, then it might not be a worthwhile feature.
Here is a summarized list of changes coming in the next release:
- Support for a designated routine to handle system error aborts. Previously this was just handled in HTTP_MCP in a minimalist way. A template routine has been provided so developers can customize how this works.
- Support to set the debugger mode as enabled, disabled, or in intercept mode. A template routine has been provided for debugger intercept mode.
- The logic that creates logs has been moved into its own service: CreateLogFile.
- Logging has been extended to support system error aborts, the debugger intercept, and custom log types that the developer may want to create.
- Logging is now governed by a flag rather than just the absence of a valid capture path.
- Logging for requests and responses has been significantly updated to provide more detail and more useful information.
- The filename for the log files has been overhauled to make it easier to sort by execution order. Included in the filename is the Windows ProcessID for the engine that created the log. This helps to understand when one engine aborted and another one was launched.
- HTTP Services no longer queries the Registry for its stored information. This data is already passed in through the HTTP Request array, so we now rely on that instead. This helps customers who rename OECGI4 to OECGI3 but forget to update the code that looks for the Registry key.
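To illustrate why the filename overhaul helps, here is a minimal Python sketch. The naming pattern, the function names and the `logging_enabled` flag are hypothetical and are not the framework's actual CreateLogFile service. The idea is that a zero-padded timestamp placed first makes a plain alphabetical sort follow execution order, while the ProcessID alongside it shows where one engine stopped and another took over.

```python
import os
from datetime import datetime
from typing import Optional


def make_log_filename(log_type: str, when: datetime, pid: int, ext: str = "log") -> str:
    """Build a log filename whose alphabetical order matches execution order."""
    # Zero-padded timestamp, most significant field first, so a plain sort of
    # filenames follows the order in which the logs were written.
    stamp = when.strftime("%Y%m%d_%H%M%S_%f")
    # The Windows ProcessID of the engine that wrote the log: a change in PID
    # between consecutive files shows where one engine ended and another began.
    return f"{stamp}_PID{pid}_{log_type}.{ext}"


def log_path(capture_dir: str, logging_enabled: bool, log_type: str,
             when: Optional[datetime] = None, pid: Optional[int] = None) -> Optional[str]:
    """Return the full path for a new log file, or None when logging is off."""
    # Hypothetical flag: logging is switched on or off explicitly, instead of
    # being inferred from whether capture_dir happens to be missing or invalid.
    if not logging_enabled:
        return None
    when = when if when is not None else datetime.now()
    pid = pid if pid is not None else os.getpid()
    return os.path.join(capture_dir, make_log_filename(log_type, when, pid))
```

For example, `make_log_filename("Request", datetime(2024, 1, 2, 3, 4, 5), 4242)` returns `"20240102_030405_000000_PID4242_Request.log"`. Putting the log type first instead would group files by type and lose the execution order.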
Comments
Likewise, we've had to patch HTTP_MCP to turn on UTF-8 mode (I haven't found a way to determine this from the application properties). We also append "; charset=utf-8" to any textual Content-Type header.
It would be great to have this configurable.
HTH, M@
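The "append charset to textual Content-Type headers" patch might look roughly like this Python sketch. Which media types count as textual is an assumption (here `text/*` plus a small hypothetical `TEXTUAL_TYPES` set), and a header that already declares a charset is left untouched.

```python
# Assumption: which non-text/* media types count as "textual" is a local choice.
# Note: RFC 8259 defines no charset parameter for application/json; compliant
# recipients ignore it, but some clients still look for it.
TEXTUAL_TYPES = {"application/json", "application/xml"}


def add_utf8_charset(content_type: str) -> str:
    """Append '; charset=utf-8' to a textual Content-Type that lacks a charset."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()
    is_textual = media_type.startswith("text/") or media_type in TEXTUAL_TYPES
    # Leave the header alone if it is binary or already declares a charset.
    has_charset = any(p.lower().startswith("charset=") for p in parts[1:])
    if not is_textual or has_charset:
        return content_type
    # Drop any trailing ';' or spaces before adding the parameter.
    return content_type.rstrip("; ") + "; charset=utf-8"
```

So `add_utf8_charset("text/html")` gives `"text/html; charset=utf-8"`, while `"image/png"` and `"text/plain; charset=ISO-8859-1"` come back unchanged.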
The HTTP Framework itself doesn't need to run in UTF-8 mode, so one alternative to turning it on in HTTP_MCP would be to turn UTF-8 mode on only when getting data or database items, or when constructing a response (again, in HTTP_SERVICES and HTTP_RESOURCE_SERVICES). I haven't tried it, but it should make the framework run slightly faster.
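The "only turn it on when needed" approach is essentially a scoped toggle: switch UTF-8 mode on, do the work, then restore whatever mode was active before, even if the work fails. Here is a Python sketch of that pattern; the `Engine` class and its `utf8_mode` attribute are hypothetical stand-ins for the OpenInsight engine setting, not a real API.

```python
from contextlib import contextmanager
from typing import Iterator


class Engine:
    """Hypothetical stand-in for the engine's UTF-8 mode switch."""

    def __init__(self) -> None:
        self.utf8_mode = False


@contextmanager
def utf8_mode_on(engine: Engine) -> Iterator[Engine]:
    """Turn UTF-8 mode on for the duration of a block, then restore it."""
    previous = engine.utf8_mode
    engine.utf8_mode = True
    try:
        yield engine
    finally:
        # Restore the previous mode even if the block raised an error, so a
        # failed response build cannot leave the engine in the wrong mode.
        engine.utf8_mode = previous
```

Saving and restoring the previous value, rather than always switching back off, keeps nested calls safe when an outer caller already had UTF-8 mode enabled.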
Just to update this thread, I'm now having to deal with UTF-8 data coming in the HTTP request. This affects both the GET string (for queries) and the POST string (for generic data updates, i.e. not necessarily URL-formatted).
So I've extended the DecodePercentString service in HTTP_SERVICES to handle the general case of any %hh encoding. The urlDecodeValue logic in the URL_FORMAT user-defined conversion seems to do the trick, with the exception of converting a "+" to a space.
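The key point with general %hh decoding is that one UTF-8 character can span several escapes (for example "é" is `%C3%A9`), so the escapes must be collected as raw bytes and decoded as UTF-8 only at the end. Here is a Python sketch of that logic; `decode_percent_string` is a hypothetical analogue, not the actual DecodePercentString service. In Python itself, `urllib.parse.unquote_plus` already does this job. The "+" handling is behind a flag because "+" means a space only in form-encoded data (`application/x-www-form-urlencoded`).

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_percent_string(s: str, plus_as_space: bool = True) -> str:
    """Decode %hh escapes (and optionally '+') into a UTF-8 string."""
    out = bytearray()
    i = 0
    while i < len(s):
        ch = s[i]
        # A valid escape is '%' followed by exactly two hex digits.
        if ch == "%" and i + 2 < len(s) and s[i + 1] in HEX_DIGITS and s[i + 2] in HEX_DIGITS:
            out.append(int(s[i + 1:i + 3], 16))
            i += 3
        elif ch == "+" and plus_as_space:
            out.append(0x20)
            i += 1
        else:
            # Literal character (including a stray '%'): keep its UTF-8 bytes.
            out.extend(ch.encode("utf-8"))
            i += 1
    # Decode all bytes at once so multi-byte sequences are reassembled;
    # malformed sequences become U+FFFD instead of raising an error.
    return out.decode("utf-8", errors="replace")
```

For example, `decode_percent_string("caf%C3%A9+ok")` returns `"café ok"`, and a trailing `"100%"` is left as-is.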
If you have anything you would like to submit as a change to the official product please email this to me directly.
I also wanted to revisit charset=utf-8. I get the impression you think it would be useful if the framework appended this automatically to the Content-Type header. How would that help you? I mean, doesn't this only benefit the web server?
Just playing it safe, really, as I don't know who or what our API will be serving, and I'm following W3.org's advice. I know there are some clients that don't render the response correctly without the charset=utf-8 header (e.g. Chrome, although an internet browser is not a good example of an API client ;). Postman seems to assume UTF-8 responses, though.
Or are you merely talking about the response header? Perhaps you are simply suggesting that the framework be smart enough to append this for you either when UTF-8 mode is enabled or when UTF-8 characters are in the response body.
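The "smart enough" rule could be expressed as a small predicate, sketched here in Python under stated assumptions. The `utf8_mode` argument stands in for a hypothetical framework setting. Pure-ASCII bodies don't need the declaration. One extra check seems worth having: a non-ASCII body that isn't valid UTF-8 (for example, text produced in a legacy single-byte charset) should not be labelled UTF-8.

```python
def should_declare_utf8(body: bytes, utf8_mode: bool) -> bool:
    """Decide whether to append '; charset=utf-8' to the response Content-Type."""
    # Option 1: the (hypothetical) UTF-8 mode setting is enabled.
    if utf8_mode:
        return True
    # Option 2: look at the body. ASCII bytes mean the same thing in UTF-8,
    # so a pure-ASCII body gains nothing from the declaration.
    if body.isascii():
        return False
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        # Non-ASCII but not valid UTF-8: declaring UTF-8 would be wrong.
        return False
    return True
```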
For the request, I hadn't considered examining the Content-Type header for charset=utf-8. I decode any UTF-8 characters in the query strings regardless. Maybe I should check the Content-Type header first!
I see that Postman automatically sets charset=utf-8 if it sees any Unicode in the request.
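Checking the request's Content-Type first could look like the following Python sketch. The function names and the UTF-8 fallback are assumptions. Note that this header describes the request body (the POST data). It says nothing about the query string in the URL, so query strings still need their own decoding rule.

```python
import codecs
from typing import Optional


def charset_from_content_type(header: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type header, if any."""
    if not header:
        return None
    # Parameters follow the media type, separated by ';', e.g. "text/plain; charset=UTF-8".
    for param in header.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # Parameter values may be quoted; an empty value counts as absent.
            return value.strip().strip('"').lower() or None
    return None


def decode_body(raw: bytes, content_type: Optional[str], default: str = "utf-8") -> str:
    """Decode a request body using its declared charset, else the default."""
    charset = charset_from_content_type(content_type) or default
    try:
        codecs.lookup(charset)
    except LookupError:
        # Unknown or misspelled charset name: fall back rather than fail the request.
        charset = default
    return raw.decode(charset, errors="replace")
```

Falling back to UTF-8 when no charset is declared matches what Postman and most modern clients send, but that default is a judgement call rather than a guarantee.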