Handling very large resultsets in DFC code

If you have been working with Documentum for a while, particularly on administration tasks, you will likely need to handle very large resultsets in DFC code at some point. The need may arise in a standalone piece of code or in a custom job, for example. In a job, it is prudent to plan for this anyway if you are not sure how big a resultset you may be dealing with.

Here are some pointers to avoid problems in such a situation:

  1. Don’t keep the collection open any longer than necessary. Try to keep the processing out of the loop that reads from the collection. This means holding data in memory for later processing, and you don’t want the process to run out of memory. So …
  2. Don’t store large amounts of data in memory. This appears to conflict with #1. However, we can mitigate the issue in ways other than keeping the collection open. Usually, we can keep just the object IDs in memory from #1 and pull other metadata as needed when processing each object. In addition, we can …
  3. Batch it up and limit the batch size. To measure actual resource usage and work with known quantities, limit the number of objects handled in one batch. Since we may be dealing with large resultsets, any help with performance is welcome. Hints to the rescue …
  4. Consider the RETURN_TOP, OPTIMIZE_TOP, and ROW_BASED DQL hints. The TOP hints take the batch size as an argument. However, know that OPTIMIZE_TOP is not useful if the query uses ORDER BY. ROW_BASED is great for fast retrieval and may be the only option if you have to join on repeating attributes. Sometimes, a handy mechanism to batch objects up is to …
  5. Use object IDs with a mask to control the batch size. If there is no obvious grouping of objects for creating batches and you want to mix it up, you can use a mask on the object ID. For example, an object ID has 16 hexadecimal characters, so if you fix the first 14, the remaining two allow up to 16 × 16 = 256 objects in that batch.
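
To make pointers 4 and 5 concrete, here is a minimal sketch in Java. It only does the mask arithmetic and builds DQL strings; it does not touch DFC. The class and method names are hypothetical, and the query assumes that your repository accepts a LIKE predicate on r_object_id, so verify that against your own Content Server version before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of DFC): object-ID masks as batch keys. */
final class IdMaskBatches {

    /** Documentum object IDs are 16 hexadecimal characters long. */
    static final int ID_LENGTH = 16;

    /** Upper bound on the number of objects that can share a prefix of this length. */
    static long maxBatchSize(int prefixLength) {
        if (prefixLength < 1 || prefixLength > ID_LENGTH) {
            throw new IllegalArgumentException("prefix length must be 1.." + ID_LENGTH);
        }
        // Each free hex character multiplies the possibilities by 16 (4 bits).
        return 1L << (4 * (ID_LENGTH - prefixLength));
    }

    /** Appends every two-digit hex suffix to the stem, giving 256 longer masks. */
    static List<String> masksUnder(String stem) {
        List<String> masks = new ArrayList<>(256);
        for (int i = 0; i < 256; i++) {
            masks.add(stem + String.format("%02x", i));
        }
        return masks;
    }

    /** Builds the query for one batch. Assumption: LIKE works on r_object_id. */
    static String dqlForMask(String typeName, String mask, int batchSize) {
        return "SELECT r_object_id FROM " + typeName
             + " WHERE r_object_id LIKE '" + mask + "%'"
             + " ENABLE (RETURN_TOP " + batchSize + ")";
    }
}
```

Calling masksUnder with a 12-character stem yields the 256 possible 14-character masks beneath it, and each of those can be fed to dqlForMask with a batch size of 256 taken from configuration.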

It was a liberating experience the first time I did it. I could just limit the batch size with a configuration value and not worry about collections or memory running out.
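
The read-then-process pattern from pointers 1 to 3 can be sketched roughly as follows. To keep the example runnable without a repository, IdCursor and BatchSource are hypothetical stand-ins: in real code the cursor would be an IDfCollection obtained from a query, and the batch key could be an object-ID mask. Treat this as an illustration of the control flow under those assumptions, not as a finished implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: read IDs, close the cursor, then process. */
final class BatchedProcessing {

    /** Stand-in for an open query result (an IDfCollection in real DFC code). */
    interface IdCursor extends AutoCloseable {
        boolean next();   // advance to the next row; false when exhausted
        String id();      // object ID of the current row
        @Override void close();
    }

    /** Stand-in for "run the query for this batch key". */
    interface BatchSource {
        IdCursor open(String batchKey);
    }

    /** Processes each batch in turn and returns the number of objects handled. */
    static int run(List<String> batchKeys, BatchSource source, Consumer<String> worker) {
        int total = 0;
        for (String key : batchKeys) {
            List<String> ids = new ArrayList<>();
            // The cursor stays open only while IDs are being copied out.
            try (IdCursor cursor = source.open(key)) {
                while (cursor.next()) {
                    ids.add(cursor.id());
                }
            }
            // Processing starts after the cursor is closed; only IDs are held in memory.
            for (String id : ids) {
                worker.accept(id);
            }
            total += ids.size();
        }
        return total;
    }

    /** In-memory cursor, for trying the loop out without a repository. */
    static IdCursor cursorOver(List<String> ids) {
        final Iterator<String> it = ids.iterator();
        return new IdCursor() {
            private String current;
            public boolean next() {
                if (!it.hasNext()) return false;
                current = it.next();
                return true;
            }
            public String id() { return current; }
            public void close() { /* nothing to release in memory */ }
        };
    }
}
```

Because each batch is bounded, the list of IDs never grows beyond the configured batch size, and any other metadata can be fetched inside the worker for one object at a time.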


Java Evaluation of Docbasic Expressions

Huh? Exactly.

After an upgrade to SP5 from 5.3 SP3, we started getting regular crashes of WebLogic, which was hosting the WDK apps. After looking at various crashes, we were still struggling to find something common among the crash instances. Finally, we discovered that every crash that left a dump file showed a stack trace of expression evaluation for conditional value assistance. It didn’t indicate which expression it was evaluating, but it did show that it was going outside the JVM via JNI and crashing there.

Fast forwarding to the EMC response: they had encountered something similar in another case, and the recommendation was to migrate Docbasic expressions to a Java implementation! Huh?

Well, Appendix B of the DQL Reference describes this process in reasonable detail. Apparently, for evaluating the conditions in value assistance (among other things), DFC uses a DLL that is a Docbasic runtime. However, it is possible to provide a Java implementation for expressions so that the JNI calls are no longer needed. The methods available for this purpose are called dmc_MigrateDbExprsToJava and dmc_MigrateDbExprsToJavaForType.

If you follow this documentation in conjunction with the object reference it will all begin to make sense. The only additional worthwhile thing to know, and probably the biggest value addition offered by this post, is the following.

Once you have created the Java implementations for Docbasic expressions, you can disable and enable the use of these Java implementations. However, when we tried to disable the use of these expressions using dmc_SetJavaExprEnabled, the command failed. When we got to the logs, we found various errors but one caught our eye. It complained about missing a parameter named “enabled”. However, the reference names the parameter as “enable” (which turns out to be incorrect). Once we corrected the parameter name to be “enabled” in the command line, it started to work as expected.

In order to troubleshoot this issue, we wrote a small DFC client that iterated through docbase objects and exercised all the conditions present in value assistance. We embedded this client in a JSP so that we could reproduce it in the environment that was encountering the problem. Using this tool, we were able to reproduce the problem consistently. The tests following the change have been encouraging since we haven’t seen any more crashes.

Is Punctuation a Virtue?

Maybe not, if you use a comma or an apostrophe in a string argument to API calls through DFC.

This happened in a DFC 5.3 SP2 environment in a call to the queue() method on a workflow item. The user name argument had a comma since the name was in the “last name, first name” format. Apparently, under the hood, an API call string is created and the string argument is not quoted. As a result, the name is split into two arguments and all the following arguments are shifted by one. This results in a type mismatch on certain arguments, and an exception is thrown. The BAD DATE exception is a common symptom of this situation. Note that this particular problem has been fixed in DFC 5.3 SP3.

The research on Powerlink showed that this is a more common problem and can occur with any API call (as a support note explains). It is not clear if the problem has been fixed for all API calls in DFC 5.3 SP3.

The workaround for the problem is to enclose string arguments in single quotes, as in 'last name, first name'. An apostrophe in the argument should be escaped with another apostrophe, as in 'O''Hare, Chicago'.
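
The workaround is easy to wrap in a small helper. The sketch below is a hypothetical utility, not part of DFC; it only applies the two rules above (wrap in single quotes, double any apostrophe) and says nothing about which DFC calls need it in your version.

```java
/** Hypothetical helper (not part of DFC) for quoting string arguments. */
final class ApiArgs {

    /** Wraps the value in single quotes and doubles any embedded apostrophe. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }
}
```

For example, ApiArgs.quote("Smith, John") returns 'Smith, John', and ApiArgs.quote("O'Hare, Chicago") returns 'O''Hare, Chicago'.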