Efficiency in a SELECT for non keys

Opto_Will · August 2023

Can somebody please 'Tell him he's Dreamin' (reference for all my fellow Aussies our there!) or shine a light on my ignorance?

If I have a table , CLIENTS, that has the Code (key) and Category, as well as some other fields, and I want a distinct list of categories in use I would:

In SQL I would simple do something like
'Select distiinct category from clients'.

In OI
I do a Select, this generates a Keylist of Codes
Then I have to loop through the Keylist to get the categoiries list
Then I can use one of several mechanisms to get rid of nulls and/or duplicates (in the above loop or after)

Is there a more efficient way to do this sort of thing in OI without the multi steps?

DonBakke · August 2023

Is the Category column indexed?

Opto_Will · August 2023

Unfortunately It is not @DOnBakke. Otherewise BTree.Extract would be the way I would go.

I was trying to be simple in my example but to give a little more color, the field is actually a UDF.
We are giving users the capability to use some UDFs in certain screens as extra grouping methods of their choice.
So 'Category' is in fact be a UDF sub catgory grouping here (or it could be a UDF Certifiate Status on another site for example).

The list we are trying to get will be for a dropdown list so the users can select a value they have currently used before for some sort of consistently since , as a UDF, we are not validating entries.

We have a maintenance screen that has the ability to lock out a UDF from client use (we might use it to code something specific for them for example), give it a proper site-specific label, and determine if it is meant to be unique (in which case we probably wouldnt offer a dropdown but probably instead enforce a uniqueness validation).

So the dropdown is meant to be frequently used and some of these tables are quite large.

We were trying not to index multiple fields (3 UDFs per table for maybe 6-12 tables for now). As noted, these may or may not be used site by site.

So we know how to get it to work unindexed (or indexed for that matter). I was wanting to understand if there were more efficient ways to do things unindexed.

Speaking of which then Don, what is your rule in determing whether or not to index?
And while I am at it, didnt I read somewhere you were supposed to be on holidays??

BarryStevens · August 2023

RTI_DistinctList Function
Description

Creates a distinct list of delimited values using passed values.

Syntax

newList = RTI_DistinctList( inList, inDelimiter)

Parameters

The function has the following parameters:

Parameter

Description

inList

An array of delimited values.

inDelim

The delimiter used in the inList parameter. Default is @vm.

DonBakke · August 2023

Otherewise BTree.Extract would be the way I would go.

Actually, Collect.IXVals would be your best option.

I was trying to be simple in my example but to give a little more color, the field is actually a UDF.

UDF? User Defined Field?

The list we are trying to get will be for a dropdown list so the users can select a value they have currently used before for some sort of consistently since , as a UDF, we are not validating entries.

Yep, which is what Collect.IXVals is used for frequently by Revelation and developers.

So we know how to get it to work unindexed (or indexed for that matter). I was wanting to understand if there were more efficient ways to do things unindexed.

Let's assume you do not intend to index the column. Your basic approach is correct, but there are ways of optimizing performance (more code) and ways of minimizing the code (less optimized performance). You can also mix and match for middle of the road performance.

I tend to look at everything I design for optimum performance, especially if this is a client facing feature, which it sounds like this is. I would do a standard select and then create a long delimited string of values using the method of creating a presized string of null characters and filling in the bytes. When finished I would strip off the remaining null characters. This will avoid memory swapping and generally be the fastest way to get your list of values created.

Note, this method assumes the overall length of your string isn't going to exceed the memory constraint of OI. While that is >1GB, my opinion is that If you have to exceed more than a few MB then this might need to be architected differently.

Barry mentioned the RTI_DistinctList function to remove duplicates. That will work, but I would personally use the SRP_Array Clean service which can do the exact same thing. RTI_DistinctList is based on .Net but SRP_Array is based on C++. I have not benchmarked the differences with these, but in other situations where Revelation and SRP have a similar function, ours tends to be faster.

Speaking of which then Don, what is your rule in determing whether or not to index?

I'll give you one use case: So the dropdown is meant to be frequently used and some of these tables are quite large. :) In all seriousness, this encapsulates much of what is my rule of thumb.

And while I am at it, didnt I read somewhere you were supposed to be on holidays??

Yes you did...just got back earlier today.

Opto_Will · August 2023

@BarryStevens.
Thanks for that.

I was counting RTI_DistinctList and SRP_ARRAY('CLEAN') as some of my 'several mechanisms' I could use to processing that list.

I was more interested in more efficient ways to get that list in the first place (for a unindexed columns).

Opto_Will · August 2023

@DonBakke,

Yes, Collect.IXVals would indeed be my best option. Exactly what I would be looking for an indexed field. Now if only there was a Collect.NonIXVals!!!!!

I have already learned my new thing for the day so it looks like I can pack up and head home now!

UDF? User Defined Field?

Yes, User Defined Field.

I tend to look at everything I design for optimum performance, especially if this is a client facing feature, which it sounds like this is.

I concur.

I would do a standard select and then create a long delimited string of values using the method of creating a presized string of null characters and filling in the bytes. When finished I would strip off the remaining null characters. This will avoid memory swapping and generally be the fastest way to get your list of values created.

This one is a new concept for me. We already have moved to 'text appending' delimited values to an array/list variable instead of doing a <-1> to save the parsing every time (noicble on large arrays) but I dont think I have ever created a presized string in OI. Actually, I am not quite sure what that sytax woul dbe. What sort of efficiencies would you expect from that?

Barry mentioned the RTI_DistinctList function to remove duplicates. That will work, but I would personally use the SRP_Array Clean service which can do the exact same thing. RTI_DistinctList is based on .Net but SRP_Array is based on C++. I have not benchmarked the differences with these, but in other situations where Revelation and SRP have a similar function, ours tends to be faster.

Good to know. I tend to use the SRP 'toolkit' because of functionality or ease of use but tend to expect maybe a slight performance hit sometimes simply because it is non-native. Makes sense that something going off and doing some heavy processing can still be more efficient.

I'll give you one use case: So the dropdown is meant to be frequently used and some of these tables are quite large. :) In all seriousness, this encapsulates much of what is my rule of thumb.

He has jokes!.
I honestly expected as much.

Yes you did...just got back earlier today.

I hope it was a good break for you.

DonBakke · August 2023

@Opto_Will,

Now if only there was a Collect.NonIXVals!!!!!

If you end up writing a routine that works well, you can certainly package it up in your own Collect.NonIXVals routine and just pass in arguments. Many of our best utilities were given birth this way.

We already have moved to 'text appending' delimited values to an array/list variable instead of doing a <-1> to save the parsing every time (noicble on large arrays) but I dont think I have ever created a presized string in OI. Actually, I am not quite sure what that sytax woul dbe. What sort of efficiencies would you expect from that?

The "text appending" vs. the <-1> approach probably produces the same performance, which is in the middle. Old school developers maintain a counter and then use < Cntr >. This is very inefficient.

Here is some sample (untested) code:

ReturnString = Str(\00\, 1000000)   ; // Presize to 1MB.
CharPos      = 1
TotalLength  = 0    ; // Represents the total length of the string being built.
Select hTable
EOF          = False$
Loop
    Readnext ID else EOF = True$
Until EOF EQ True$
    Read Record from hTable, ID then
        Category                         = Record<5>
        LenString                        = Len(Category) + 1   ; // Calculate the length of the category value and add one for the delimiter.
        If TotalLength + LenString GT 1000000 then
            // The presized string wasn't long enough. Increase the size or add logic to handle this in chunks.
        end else
            // "Fill in" the presized string by replacing null characters with real values. This will not change the size of the string so memory management is better.
            ReturnString[CharPos, LenString]  = Category : @FM
            TotalLength                      += LenString
            CharPos                          += LenString 
        end
    end
Repeat

// Remove the remaining null characters.
ReturnString[TotalLength, 1000000] = ''

As your string gets bigger, the comparative performance results will be better.

...but tend to expect maybe a slight performance hit sometimes simply because it is non-native.

That would be expected if the code was native to the engine. But in most cases, the engine is calling external DLLs (both .Net and C++ written), which is no different in principle than calling our DLL. At this point it is a matter of which library is quicker.

AusMarkB · August 2023

So, @DonBakke, you're saying, the brackets operator is more efficient that straight string concatenation?
ReturnString[CharPos, LenString] = Category : @FM is more efficient than returnString := category:@FM
Note, I'm clarifying not debating.

DonBakke · August 2023

@AusMarkB,

Yes, that has been my understanding. When you create a local variable, this is really a pointer to an address in memory with a preallocated size. When your local variable exceeds this size, a new memory address is allocated with a larger size and the pointer is shifted. While negligible for smaller operations, it can become noticeable when strings continuously become larger strings in a loop as we are considering above. By contrast, when the local variable is presized and all you are doing is updating specific bytes within that variable (and not using syntax that grows or shrinks the size), then you are keeping the same pointer and address so you save by not having the overhead of memory management.

AusMarkB · August 2023

MattCrozier · August 2023

I did some benchmarking on these techniques a while ago. This showed (among other things) that manipulating a pre-filled string (using putBinaryValue, in my case) performed linearly as N increased, compared to order N-squared using concatenation.

The "text appending" vs. the <-1> approach probably produces the same performance, which is in the middle. Old school developers maintain a counter and then use < Cntr >. This is very inefficient.

Yeap. My tests were pretty conclusive about this.

AusMarkB · August 2023

Opto_Will · August 2023

Thanks @DonBakke and all who contributed.

Very informative. Picked up two good tips for my efforts!

Opto_Will · August 2023

Oh, and @MattCrozier, I liked your Graph!
Thanks for that legwork.

Efficiency in a SELECT for non keys

Comments