Handling very large resultsets in DFC code

If you have been working with Documentum for a while, particularly with administration tasks, you are likely to run into a need to handle very large resultsets in DFC code. The need may arise in a standalone piece of code, or in a custom job, for example. In a job, it may be prudent to do this anyway if you are not sure how big a resultset you may be dealing with.

Here are some pointers to avoid problems in such a situation:

  1. Don’t keep the collection open for a duration longer than necessary. Try to keep the processing out of the loop that reads from the collection. This leads to holding data in memory for later processing, but you don’t want the process to run out of memory. So …
  2. Don’t store large amounts of data in memory. This appears to be a bit of conflicting recommendation with #1. However, we can mitigate the issue in ways other than keeping the collection open. Usually, we can keep just the object IDs in memory from #1. Other metadata can be pulled as needed when processing the specific object. In addition, we can …
  3. Batch it up and limit the batch size. In order to measure the actual resource usage and to deal with known quantities, limit the number of objects we would deal with in one batch. Given that we may be dealing with large resultsets, any help with performance would be welcome. Hints to the rescue …
  4. Consider RETURN_TOP, OPTIMIZE_TOP, and ROW_BASED DQL hints. The TOP hints will use the batch size as an argument. However, know that OPTIMIZE_TOP is not useful if you are using ORDER BY in the query. ROW_BASED is great for fast retrieval and may be the only way if you have to join on repeating attributes. Sometimes, a handy mechanism to batch objects up is to …
  5. Use object IDs with a mask to control the batch size. If there is no obvious grouping of objects for creating batches and you want to mix it up, you can use a mask on the object ID. For example, if you fix the first 14 characters in the object ID, it allows up to 256 objects in that batch.

It was a liberating experience the first time I did it. I could just limit the batch size with a configuration value and not worry about collections or memory running out.

Handy DQL: Audit Report by Month

Recently, I received a request for a DQL as follows:

In an audit trail, I want to count the number of users who viewed (dm_getfile event)
all the documents that belong to a particular object type every month. These documents are in a specific folder of a cabinet.

I made couple of assumptions in coming up with the DQL, but these are easy to adjust to suit specific needs. When looking for documents in a folder, I assumed the containment to be recursive. If you don’t need to look in the subfolders, just remove “, descend” in the folder() predicate below.

The other assumption I made was that the period of check was 1 calendar year.
For example, create a report for 2009 by each month. Again, if the need is different, the just modify the date range condition.

Alter the following DQL, by replacing <OBJECT_TYPE>, <FOLDER_PATH>, <FOUR_DIGIT_YEAR> below.

SELECT DATETOSTRING(time_stamp, 'yyyy/mm') AS period, COUNT(distinct user_name) AS viewers
FROM dm_audittrail
WHERE object_type = '<OBJECT_TYPE>'
AND event_name = 'dm_getfile'
AND DATEFLOOR(year, "time_stamp") = DATEFLOOR(year, DATE('01/01/<FOUR_DIGIT_YEAR>', 'mm/dd/yyyy'))
AND audited_obj_id IN (
SELECT r_object_id FROM dm_sysobject (all)
WHERE folder('<FOLDER_PATH>', descend)
)
GROUP BY DATETOSTRING(time_stamp, 'yyyy/mm')

A sample result from this query is shown below:

period viewers
2009/01 3
2009/02 1
2009/04 6

Handy DQL: Listing Folder Paths for Objects

It is a common need to print folder paths of sysobjects (documents are also sysobjects) as a part of DQL query results. The ROW_BASED hint makes it convenient to do so as shown in the query below.

SELECT s.object_name, f.r_folder_path
FROM dm_sysobject s, dm_folder f
WHERE s.i_folder_id = f.r_object_id
AND f.r_folder_path != ' '
AND additional conditions
ENABLE (ROW_BASED)

Note that you can replace dm_sysobject with another custom type, which is a descendant (direct or indirect subtype) of dm_sysobject. Don’t forget to add conditions for the where clause lest you should retrieve a large number of objects (unless that’s what you need).

Handy DQL: What is being audited?

Whether we need to configure auditing for a new solution or figure out why some event was not audited, we may have to answer the question, “What is currently being audited in my repository?” Here are some handy queries to help you do just that.

List all configured audit events

This query is not very helpful other than getting the total count and the object ID’s. The queries that follow later join with various types to provide user-friendly information.

select r.registered_id, r.event,r.user_name
from dmi_registry r
where r.is_audittrail = 1

Across-the-board  audit configuration

select 'ALL',r.event,r.user_name
from dmi_registry r
where r.is_audittrail = 1
and r.registered_id = '0000000000000000'

Audit configuration for types

select t.name,r.event,r.user_name
from dmi_registry r, dm_type t
where r.is_audittrail = 1
and t.r_object_id = r.registered_id

Audit configuration for Sysobjects

select t.object_name,r.event,r.user_name
from dmi_registry r, dm_sysobject t
where r.is_audittrail = 1
and t.r_object_id = r.registered_id

Audit configuration for users

select t.user_name,r.event,r.user_name
from dmi_registry r, dm_user t
where r.is_audittrail = 1
and t.r_object_id = r.registered_id

Handy DQL: Count Objects by Month or Year

There are times when you want to count objects by month or year for a date attribute. The following DQL shows how to count objects of type dm_document by the month of their creation using the DATETOSTRING function.

select DATETOSTRING(r_creation_date, 'yy/mm') as period, count(*)
from dm_document
group by DATETOSTRING(r_creation_date, 'yy/mm')

You can add a where clause as usual if you want to add conditions to the query. You can also play with the date format but know that some format strings may not work as you might expect them to.

Handy DQL: Find DocApp containing a Type

If several customizations are present in a Documentum repository, it can get tricky to identify where a particular object type comes from. The following DQL can be used to identify the DocApps that contain an object type. Just replace my_object_type with the actual type name below.

select object_name from dm_application
where r_object_id in
(select c.parent_id
from dmr_containment c, dm_app_ref ar, dm_type t
where c.component_id = ar.i_chronicle_id
and ar.application_obj_id = t.r_object_id
and t.name = 'my_object_type')

It is straightforward to modify this DQL to identify containing DocApps for things other than object types.

[UPDATE] The DQL in the original form would only work with DocApps that haven’t been installed more than once in the same repository. The DQL has been corrected by replacing ar.r_object_id with ar.i_chronicle_id.