Handling very large resultsets in DFC code

If you have been working with Documentum for a while, particularly with administration tasks, you are likely to run into a need to handle very large resultsets in DFC code. The need may arise in a standalone piece of code, or in a custom job, for example. In a job, it may be prudent to do this anyway if you are not sure how big a resultset you may be dealing with.

Here are some pointers to avoid problems in such a situation:

  1. Don’t keep the collection open for a duration longer than necessary. Try to keep the processing out of the loop that reads from the collection. This leads to holding data in memory for later processing, but you don’t want the process to run out of memory. So …
  2. Don’t store large amounts of data in memory. This appears to be a bit of conflicting recommendation with #1. However, we can mitigate the issue in ways other than keeping the collection open. Usually, we can keep just the object IDs in memory from #1. Other metadata can be pulled as needed when processing the specific object. In addition, we can …
  3. Batch it up and limit the batch size. In order to measure the actual resource usage and to deal with known quantities, limit the number of objects we would deal with in one batch. Given that we may be dealing with large resultsets, any help with performance would be welcome. Hints to the rescue …
  4. Consider RETURN_TOP, OPTIMIZE_TOP, and ROW_BASED DQL hints. The TOP hints will use the batch size as an argument. However, know that OPTIMIZE_TOP is not useful if you are using ORDER BY in the query. ROW_BASED is great for fast retrieval and may be the only way if you have to join on repeating attributes. Sometimes, a handy mechanism to batch objects up is to …
  5. Use object IDs with a mask to control the batch size. If there is no obvious grouping of objects for creating batches and you want to mix it up, you can use a mask on the object ID. For example, if you fix the first 14 characters in the object ID, it allows up to 256 objects in that batch.

It was a liberating experience the first time I did it. I could just limit the batch size with a configuration value and not worry about collections or memory running out.

Handy DQL: Listing Folder Paths for Objects

It is a common need to print folder paths of sysobjects (documents are also sysobjects) as a part of DQL query results. The ROW_BASED hint makes it convenient to do so as shown in the query below.

SELECT s.object_name, f.r_folder_path
FROM dm_sysobject s, dm_folder f
WHERE s.i_folder_id = f.r_object_id
AND f.r_folder_path != ' '
AND additional conditions
ENABLE (ROW_BASED)

Note that you can replace dm_sysobject with another custom type, which is a descendant (direct or indirect subtype) of dm_sysobject. Don’t forget to add conditions for the where clause lest you should retrieve a large number of objects (unless that’s what you need).