Welcome to the SRP Forum! Please refer to the SRP Forum FAQ post if you have any questions regarding how the forum works.
Is having large records bad?
I have been told that having large records is bad (i.e., records with lots of data in them). Is this true?
What I am trying to do:
We have a table called CUSTOMER_NOTES. Each customer can have 1 or many notes, and they're all stored in this table. A note contains a fairly long string (let's say 2000 characters), among other things.
A note can be replied to 1 or many times. At the moment, we store the replies to a note in the original note's record (multi-value). But I've been told that this is bad, because the record's size will grow too large. Is this true? The proposed solution to get around this is to create a new record for each reply, and link a reply to the note it's replying to (though a pointer field)
For example:
Note 1
blah blah this is note 1
Note 2
blah blah blah this is note 2
* I am not sure why this is. I am guessing it's because the records are stored by OI far away from each other on the hard disk, so when you read records (keys), you need to fetch each one from a different "place" (page) on the disk, rather than one read getting a whole bunch.
What I am trying to do:
We have a table called CUSTOMER_NOTES. Each customer can have 1 or many notes, and they're all stored in this table. A note contains a fairly long string (let's say 2000 characters), among other things.
A note can be replied to 1 or many times. At the moment, we store the replies to a note in the original note's record (multi-value). But I've been told that this is bad, because the record's size will grow too large. Is this true? The proposed solution to get around this is to create a new record for each reply, and link a reply to the note it's replying to (though a pointer field)
For example:
Note 1
blah blah this is note 1
Note 2
blah blah blah this is note 2
- Solution 1: Store both note 1 and note 2 in the same OI record as separate values (multi-value)
- Solution 2: Store note 1 and note 2 in separate records, but store a reference to 1 in 2, and a reference to 2 in 1.
* I am not sure why this is. I am guessing it's because the records are stored by OI far away from each other on the hard disk, so when you read records (keys), you need to fetch each one from a different "place" (page) on the disk, rather than one read getting a whole bunch.
Comments
Questions like these are usually hard to generalize. "Large" can mean different things to different people. My first consideration what is the best design architecture and then I consider constraints (such as network lag, slow disks, etc.) and then try to optimize my design around those.
Large records can be bad. I would also argue that a record that gets sufficiently large probably needs a design overhaul. Regardless of the way you design the system you will want to configure the database table so it will handle your data as efficiently as possible. This is where you provide the Table Builder the Average Size, Frame Size, and Threshold information. You could also consider presizing the table with a sizelock that prevents shrinking.
Just because you can use a MultiValue field doesn't mean you should. I would design this so each note gets its own record. This would be a design architecture decision. Records that maintain data like this have a tendency to suffer from lock contention. I don't know what kind of use cases can exist, but if is possible for multiple users to add notes to the same customer at the same time then you'll want each note to be a new record so no one gets locked out. I would then create a relational index so that each child note relates back to the parent. This will automate the pointer system.
* I am not sure why this is. I am guessing it's because the records are stored by OI far away from each other on the hard disk, so when you read records (keys), you need to fetch each one from a different "place" (page) on the disk, rather than one read getting a whole bunch.
So your preference is based on platform related issues. This is one reason why it is hard to generalize. The engineer in me would really want to understand the network layout better and work toward a resolution. The benefits of this are far reaching.
But i think if I limit the amount of data returned to the last x amount days of notes by default, it won't be too slow. But it's still worrying that the system is going to get slower and slower with time.
What I am trying to do:
Create an OI form that when opened, displays a subset of all of a customer's notes, which the user can browse through, respond to, add another to, etc (like an email client)
It's unlikely that a customer would have so many notes within the last x days that the form would take a noticeably long time to read them all.
We have 5 million customers, and on average, a customer only has 9 notes...which is nothing. And that's 23 years worth of data.
We also store the data in Sybase, so I'm probably just going to read the data from there. OI is just so slow. reading 1000 records takes more than 3 seconds (hence my preference for creating as few records as possible*). Do you know if OI 10 fixes this bug?
*reading a lot of records is slow, but reading one record with lots of data in it is fast. That's the infuriating thing about OI.
Also note:
We're not creating this from scratch. The table already exists, and currently has 5 million records in it. Creating an index on a table like this sounds scary, as in our system, indexes on large tables have a tendency of breaking. Having to rebuild an index on a table with 5 million + records in it would take at least 7 hours....scary
What is the difference between a relation index, and a btree index?
CUSTOMER_NOTES_REPLIES has field CUSTOMER_NOTES_KEY
A relational index setup will maintain keys in CUSTOMER_NOTES_REPLIES_KEY for all records created in CUSTOMER_NOTES_REPLIES relating to the speicific customer note.
to browse all replies to a note just throw the CUSTOMER_NOTES_REPLIES_KEY field at the CUSTOMER_NOTES_REPLIES screen (CreateParam)
A relational index maintains a relationship between a source and target table using a "join" relationship that you define. The end result is that all relevant child Key IDs are automatically maintained in a target column in the source table. Thus, there is no searching needed. You just pull the list of Key IDs and you automatically have all of your children.
My original idea was to store everything thing in one table, and link notes via pointer fields...
Isn't one table simpler than 2 though?
That said, I still prefer a design where the data resides in one table and the relational indexes (aka pointers as you put them) in a different table. This will help avoid lock contention.
here is an article related to Don's 'relational indexes in a different table'
This principal applies also to OI9
We want to create a form that works like gmail. When you open the form, it will display a preview of all of customer's notes within the last 120 days in a grid. But we only want to show the most recent reply, like gmail. When you click on a row, it will load in a panel on the right the full reply chain and other info.
Let me explain more.
This company has been around for 23 years, and the notes system has been in place since then. So far, we have 5 million note records. The key to a note is:
customer_number*note_no
The note_no is a sequential key. We store the last note_no for a customer, so getting the last 120 days worth of notes is easy (read down from the last note no).
yes.
Le t me explain more of the context here.
We are remaking my company's callcentre system in OI. At the moment, it's a javascript app. As part of the redesign, I want to "fix" customer notes. This is the problem with the current system:
A reply to a note is stored within the note itself, which is bad for the reasons you outlined, but also bad because we enforce a limit on the size of the record. This means that users are prevented from writing more replies once the size limit has been reached, which infuriates them.
My idea to fix this was to make each reply its own note, but linked to the original note (through a pointer field, or as you suggest a relational index).
Current system design:
The key to the CUSTOMER_NOTES table is customer_number*note_no. So the key is a multi-part key. Also, the note_no is sequential, and we store the last note_no for each customer.
When I run OI on my workstation, reads and btree.index are really slow. But when I run OI on the same computer that the "oi server" is running on (not really sure what OI server means -- to be honest i don't understand the client-server architecture of oI, if someone could explain it that would be useful) it's fast.
That's what I would have expected. The problem isn't with OI or the server, but with your network. This is why my first response in this thread was this:
The engineer in me would really want to understand the network layout better and work toward a resolution. The benefits of this are far reaching.
This means you are running OI on the same server where the OI binaries (and presumably database tables) are stored. This implies you aren't routing anything through the network. OI isn't really client-server, which is why it is a bit hard to understand if you are familiar with traditional client-server systems. When you run OI from a workstation that is connected to the server via a network, you are running everything over the wire. This is why network latency can have such a huge impact upon OI performance.
One advantage of using relational indexes is that the system just maintains the list of keys for you. Relational indexes are updated immediately unless there is a lock on the source row. Then it just gets queued up like Btree and Cross-Reference indexes and updated as soon as possible.