Efficiency in a SELECT for non-keys
Can somebody please 'Tell him he's Dreamin' (reference for all my fellow Aussies out there!) or shine a light on my ignorance?
If I have a table, CLIENTS, that has the Code (key) and Category, as well as some other fields, and I want a distinct list of the categories in use:
In SQL I would simply do something like
'Select distinct category from clients'.
In OI:
I do a Select, which generates a Keylist of Codes.
Then I have to loop through the Keylist to get the categories list.
Then I can use one of several mechanisms to get rid of nulls and/or duplicates (in the above loop or after).
Is there a more efficient way to do this sort of thing in OI without the multiple steps? (A sketch of the approach I mean is below.)
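For concreteness, here is a minimal (untested) sketch of the multi-step approach described above; the field position and the dedupe-in-the-loop handling are my assumptions for illustration:

Open 'CLIENTS' To hClients Else Return
Select hClients

Categories = ''
EOF = False$
Loop
    ReadNext ID Else EOF = True$
Until EOF
    Read Record From hClients, ID Then
        Category = Record<2> ; // Hypothetical field position for Category.
        If Len(Category) Then
            // Keep only distinct, non-null values (deduping inside the loop).
            Locate Category In Categories Using @FM Setting Pos Else
                Categories := Category : @FM
            End
        End
    End
Repeat
Categories[-1, 1] = '' ; // Strip the trailing delimiter.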
Comments
I was trying to be simple in my example but to give a little more color, the field is actually a UDF.
We are giving users the capability to use some UDFs in certain screens as extra grouping methods of their choice.
So 'Category' is in fact a UDF sub-category grouping here (or it could be a UDF Certificate Status on another site, for example).
The list we are trying to get will be for a dropdown so the users can select a value they have used before, for some sort of consistency, since, as a UDF, we are not validating entries.
We have a maintenance screen that has the ability to lock out a UDF from client use (we might use it to code something specific for them, for example), give it a proper site-specific label, and determine if it is meant to be unique (in which case we probably wouldn't offer a dropdown but would probably instead enforce a uniqueness validation).
So the dropdown is meant to be frequently used and some of these tables are quite large.
We were trying not to index multiple fields (3 UDFs per table for maybe 6-12 tables for now). As noted, these may or may not be used site by site.
So we know how to get it to work unindexed (or indexed for that matter). I was wanting to understand if there were more efficient ways to do things unindexed.
Speaking of which then, Don, what is your rule in determining whether or not to index?
And while I am at it, didn't I read somewhere you were supposed to be on holidays??
Description
Creates a distinct list of delimited values using passed values.
Syntax
newList = RTI_DistinctList(inList, inDelim)
Parameters
The function has the following parameters:
inList: An array of delimited values.
inDelim: The delimiter used in the inList parameter. Default is @vm.
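For example, a hedged usage sketch (the sample list is mine; behavior per the description above):

Declare Function RTI_DistinctList
CategoryList = 'Gold' : @VM : 'Silver' : @VM : 'Gold'
DistinctList = RTI_DistinctList(CategoryList, @VM)
// DistinctList should now be 'Gold' : @VM : 'Silver' (duplicates removed).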
Actually, Collect.IXVals would be your best option.
UDF? User Defined Field?
Yep, which is what Collect.IXVals is used for frequently by Revelation and developers.
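For anyone following along, a sketch of how it might be called; I'm assuming the conventional three-argument form (output variable, table name, indexed column name), so check the signature against your system:

Declare Subroutine Collect.IXVals
Values = ''
Collect.IXVals(Values, 'CLIENTS', 'CATEGORY')
// Values should come back as a delimited list of the distinct values held in the CATEGORY Btree index.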
Let's assume you do not intend to index the column. Your basic approach is correct, but there are ways of optimizing performance (more code) and ways of minimizing the code (less optimized performance). You can also mix and match for middle of the road performance.
I tend to look at everything I design for optimum performance, especially if it is a client-facing feature, which it sounds like this is. I would do a standard select and then create a long delimited string of values using the method of creating a presized string of null characters and filling in the bytes. When finished, I would strip off the remaining null characters. This avoids memory swapping and is generally the fastest way to get your list of values created.
Note, this method assumes the overall length of your string isn't going to exceed the memory constraint of OI. While that is >1GB, my opinion is that if you have to exceed more than a few MB then this might need to be architected differently.
Barry mentioned the RTI_DistinctList function to remove duplicates. That will work, but I would personally use the SRP_Array Clean service which can do the exact same thing. RTI_DistinctList is based on .Net but SRP_Array is based on C++. I have not benchmarked the differences with these, but in other situations where Revelation and SRP have a similar function, ours tends to be faster.
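For example, a sketch of the SRP_Array call (the option name is as I recall it from the SRP_Array Clean documentation; verify against your version):

Declare Function SRP_Array
RawList = 'Gold' : @FM : '' : @FM : 'Silver' : @FM : 'Gold'
CleanList = SRP_Array('Clean', RawList, 'TrimAndMakeUnique', @FM)
// CleanList should now be 'Gold' : @FM : 'Silver' (nulls and duplicates removed).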
I'll give you one use case: "So the dropdown is meant to be frequently used and some of these tables are quite large." :) In all seriousness, this encapsulates much of my rule of thumb.
Yes you did...just got back earlier today.
Thanks for that.
I was counting RTI_DistinctList and SRP_ARRAY('CLEAN') as some of the 'several mechanisms' I could use to process that list.
I was more interested in more efficient ways to get that list in the first place (for unindexed columns).
Yes, Collect.IXVals would indeed be my best option. Exactly what I would be looking for with an indexed field. Now if only there was a Collect.NonIXVals!!!!!
I have already learned my new thing for the day so it looks like I can pack up and head home now!
Yes, User Defined Field.
I concur.
This one is a new concept for me. We have already moved to 'text appending' delimited values to an array/list variable instead of doing a <-1> to save the parsing every time (noticeable on large arrays), but I don't think I have ever created a presized string in OI. Actually, I am not quite sure what that syntax would be. What sort of efficiencies would you expect from that?
Good to know. I tend to use the SRP 'toolkit' because of functionality or ease of use but tend to expect maybe a slight performance hit sometimes simply because it is non-native. Makes sense that something going off and doing some heavy processing can still be more efficient.
He has jokes!
I honestly expected as much.
I hope it was a good break for you.
If you end up writing a routine that works well, you can certainly package it up in your own Collect.NonIXVals routine and just pass in arguments. Many of our best utilities were given birth this way.
The "text appending" vs. the <-1> approach probably produces the same performance, which is in the middle. Old school developers maintain a counter and then use < Cntr >. This is very inefficient.
Here is some sample (untested) code:
ReturnString = Str(\00\, 1000000) ; // Presize to 1MB.
CharPos = 1
TotalLength = 0 ; // Represents the total length of the string being built.
Select hTable
EOF = False$
Loop
    ReadNext ID Else EOF = True$
Until EOF EQ True$
    Read Record From hTable, ID Then
        Category = Record<5>
        LenString = Len(Category) + 1 ; // Length of the category value plus one for the delimiter.
        If TotalLength + LenString GT 1000000 Then
            // The presized string wasn't long enough. Increase the size or add logic to handle this in chunks.
        End Else
            // "Fill in" the presized string by replacing null characters with real values.
            // This does not change the size of the string, so memory management is better.
            ReturnString[CharPos, LenString] = Category : @FM
            TotalLength += LenString
            CharPos += LenString
        End
    End
Repeat
// Remove the trailing delimiter and the remaining null characters.
ReturnString[TotalLength, 1000000] = ''
As your string gets bigger, the comparative performance results will be better.
That would be expected if the code was native to the engine. But in most cases, the engine is calling external DLLs (both .Net and C++ written), which is no different in principle than calling our DLL. At this point it is a matter of which library is quicker.
ReturnString[CharPos, LenString] = Category : @FM
is more efficient than

ReturnString := Category : @FM

then? Note, I'm clarifying, not debating.
Yes, that has been my understanding. When you create a local variable, it is really a pointer to an address in memory with a preallocated size. When your local variable exceeds this size, a new memory address is allocated with a larger size and the pointer is shifted. While negligible for smaller operations, this can become noticeable when a string continually grows inside a loop, as we are considering above. By contrast, when the local variable is presized and all you are doing is updating specific bytes within that variable (and not using syntax that grows or shrinks it), you keep the same pointer and address, so you save the overhead of memory management. The difference between the two patterns looks like the sketch below.
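A minimal side-by-side sketch of the two patterns (sizes and values are arbitrary; untested):

// Growing concatenation: the variable's allocation is repeatedly outgrown,
// so the engine periodically reallocates and copies the string.
Grown = ''
For I = 1 To 100000
    Grown := 'VALUE' : @FM
Next I

// Presized fill: one generous allocation up front, then in-place byte replacement,
// so the pointer and address never change.
Presized = Str(\00\, 700000)
Pos = 1
For I = 1 To 100000
    Presized[Pos, 6] = 'VALUE' : @FM ; // 5 bytes of value + 1 delimiter.
    Pos += 6
Next I
Presized[Pos - 1, 700000] = '' ; // Strip the trailing delimiter and the unused null padding.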
Yeap. My tests were pretty conclusive about this.
Very informative. Picked up two good tips for my efforts!
Thanks for that legwork.