Handling very large resultsets in DFC code

If you have been working with Documentum for a while, particularly with administration tasks, you are likely to run into a need to handle very large resultsets in DFC code. The need may arise in a standalone piece of code, or in a custom job, for example. In a job, it may be prudent to do this anyway if you are not sure how big a resultset you may be dealing with.

Here are some pointers to avoid problems in such a situation:

  1. Don’t keep the collection open for a duration longer than necessary. Try to keep the processing out of the loop that reads from the collection. This leads to holding data in memory for later processing, but you don’t want the process to run out of memory. So …
  2. Don’t store large amounts of data in memory. This appears to be a bit of conflicting recommendation with #1. However, we can mitigate the issue in ways other than keeping the collection open. Usually, we can keep just the object IDs in memory from #1. Other metadata can be pulled as needed when processing the specific object. In addition, we can …
  3. Batch it up and limit the batch size. In order to measure the actual resource usage and to deal with known quantities, limit the number of objects we would deal with in one batch. Given that we may be dealing with large resultsets, any help with performance would be welcome. Hints to the rescue …
  4. Consider RETURN_TOP, OPTIMIZE_TOP, and ROW_BASED DQL hints. The TOP hints will use the batch size as an argument. However, know that OPTIMIZE_TOP is not useful if you are using ORDER BY in the query. ROW_BASED is great for fast retrieval and may be the only way if you have to join on repeating attributes. Sometimes, a handy mechanism to batch objects up is to …
  5. Use object IDs with a mask to control the batch size. If there is no obvious grouping of objects for creating batches and you want to mix it up, you can use a mask on the object ID. For example, if you fix the first 14 characters in the object ID, it allows up to 256 objects in that batch.

It was a liberating experience the first time I did it. I could just limit the batch size with a configuration value and not worry about collections or memory running out.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s