Handling very large resultsets in DFC code

If you have been working with Documentum for a while, particularly on administration tasks, you are likely to run into a need to handle very large resultsets in DFC code. The need may arise in a standalone program or in a custom job, for example. In a job, it may be prudent to plan for a large resultset anyway if you are not sure how big it may get.

Here are some pointers to avoid problems in such a situation:

  1. Don’t keep the collection open for longer than necessary. Try to keep the processing out of the loop that reads from the collection. This means holding data in memory for later processing, but you don’t want the process to run out of memory. So …
  2. Don’t store large amounts of data in memory. This may seem to conflict with #1, but we can mitigate the memory issue in ways other than keeping the collection open. Usually, it is enough to keep just the object IDs read in #1 in memory; other metadata can be pulled as needed when processing each object. In addition, we can …
  3. Batch it up and limit the batch size. To measure actual resource usage and deal with known quantities, limit the number of objects handled in one batch. Given that we may be dealing with large resultsets, any help with performance would be welcome. Hints to the rescue …
  4. Consider the RETURN_TOP, OPTIMIZE_TOP, and ROW_BASED DQL hints. The TOP hints take the batch size as an argument. Note, however, that OPTIMIZE_TOP is not useful if the query uses ORDER BY. ROW_BASED is great for fast retrieval and may be the only option if you have to join on repeating attributes. Sometimes, a handy mechanism to batch objects up is to …
  5. Use object IDs with a mask to control the batch size. If there is no obvious grouping of objects for creating batches and you want to mix it up, you can use a mask on the object ID. For example, an object ID is 16 hexadecimal characters long, so fixing the first 14 characters leaves two free hexadecimal digits, and a batch can contain at most 256 (16 x 16) objects.
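
Pointers 4 and 5 can be combined when building the batch query. The following is a minimal sketch that only builds DQL text and does not run anything. The class and method names are hypothetical, and it assumes the mask is applied with a LIKE match on the 14-character r_object_id prefix. There is no ORDER BY, so OPTIMIZE_TOP is still meaningful here.

```java
// Hypothetical helper (not part of DFC): builds DQL text only; it does not run queries.
final class IdMaskBatches {

    // Documentum object IDs are 16 hexadecimal characters.
    static final int ID_LENGTH = 16;

    // Fixing 14 characters leaves 2 hex digits free: 16 * 16 = 256 IDs per batch.
    static final int PREFIX_LENGTH = 14;

    // Returns the fixed prefix (the batch key) of a 16-character object ID.
    static String batchPrefix(String objectId) {
        if (objectId == null || objectId.length() != ID_LENGTH) {
            throw new IllegalArgumentException("Expected a 16-character object ID: " + objectId);
        }
        return objectId.substring(0, PREFIX_LENGTH);
    }

    // Maximum number of object IDs sharing a prefix of the given length.
    static int maxIdsPerPrefix(int prefixLength) {
        // Each free hex digit contributes 4 bits.
        return 1 << (4 * (ID_LENGTH - prefixLength));
    }

    // Selects only the IDs in one masked batch. The TOP hints take the batch size;
    // ROW_BASED could be added to the same ENABLE list if repeating attributes are joined.
    static String batchQuery(String typeName, String prefix, int batchSize) {
        return "SELECT r_object_id FROM " + typeName
                + " WHERE r_object_id LIKE '" + prefix + "%'"
                + " ENABLE (RETURN_TOP " + batchSize + ", OPTIMIZE_TOP " + batchSize + ")";
    }
}
```

In a DFC job, the generated string could be passed to `DfQuery.setDQL()` and executed with `IDfQuery.DF_READ_QUERY`. Reading only `r_object_id` and closing the `IDfCollection` right after the read loop follows pointers 1 and 2.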

It was a liberating experience the first time I did it. I could just limit the batch size with a configuration value and not worry about collections or memory running out.
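
The "batch size from a configuration value" approach can be sketched in plain Java, assuming the object IDs have already been read into a list and the collection has been closed. The property name `batch.size`, its default of 500, and the `BatchRunner` class are hypothetical; per-object metadata would be fetched inside the processor one batch at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.function.Consumer;

// Hypothetical sketch: processes already-read object IDs in configurable batches.
final class BatchRunner {

    // Hypothetical default used when "batch.size" is not configured.
    static final int DEFAULT_BATCH_SIZE = 500;

    // Reads the batch size from configuration, falling back to the default.
    static int batchSize(Properties config) {
        String raw = config.getProperty("batch.size", String.valueOf(DEFAULT_BATCH_SIZE));
        int size = Integer.parseInt(raw.trim());
        if (size <= 0) {
            throw new IllegalArgumentException("batch.size must be positive: " + size);
        }
        return size;
    }

    // Splits the IDs into consecutive batches of at most batchSize elements.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(Collections.unmodifiableList(new ArrayList<>(ids.subList(start, end))));
        }
        return batches;
    }

    // Hands each batch to the processor and returns the number of batches run.
    static int processAll(List<String> ids, Properties config, Consumer<List<String>> processor) {
        List<List<String>> batches = partition(ids, batchSize(config));
        for (List<String> batch : batches) {
            processor.accept(batch);
        }
        return batches.size();
    }
}
```

Only the list of IDs stays in memory for the whole run. Changing `batch.size` changes the unit of work without touching the code.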


DM_OBJ_MGR_E_VERSION_MISMATCH when installing DAR

Recently, I had to script DAR installation for a bunch of DAR files as a part of an upgrade effort. The target repository version was 6.7 SP1. During DAR installation, I ran into the DM_OBJ_MGR_E_VERSION_MISMATCH error quite frequently. The message for this error looks like:

MSG: [DM_OBJ_MGR_E_VERSION_MISMATCH]error: “save of object xxxxxxx of type yyyyyy failed because of version mismatch: old version was zzz”;

When this error occurred, the same error appeared even with dardeployer. I tried forcing a full data dictionary publish and a cache cleanup, but the error wouldn’t go away. Eventually, I found a way to clear the error consistently.

Each time the error occurred, bouncing the Content Server service for the repository got rid of it.

xCP 1.6 Sample Application Tutorial – Mail Manager

I recently implemented the sample xCP 1.6 Mail Manager application by following the tutorial, and it was an absolute pleasure going through it. I did not encounter any real problems with the documentation or with the execution of the steps. The small issues I had to deal with included some differences in the screenshots (such as the Model Type field missing when creating a type in TaskSpace) and a missing step in composing the application in TaskSpace (I was expecting the step by the time I got there, so it was easy to catch).

I was expecting some issues, as my Documentum environment is set up on Hyper-V VMs, which are not listed in the supported infrastructure. I am also using the 64-bit version of Content Server. No patches are installed, though patch 6 is available at this time. My Documentum setup is summarized below:

Content Server Host

Content Server 6.7 SP1 (64-bit)
Oracle 11.2.0.2 (64-bit)
Windows 2008 R2 SP1 Enterprise (64-bit) (Hyper-V VM)

Application Server Host

Documentum applications 6.7 SP1
Tomcat 6.0.32 (64-bit)
Java 6 u33 (64-bit)
Windows 2008 R2 SP1 Enterprise (64-bit) (Hyper-V VM)

Client/Developer Desktop

Internet Explorer 9.0.9
Java 6 u33 (32-bit)
Windows 7 Ultimate SP1 (64-bit) (Hyper-V VM)

The xCP 1.5 version of the tutorial may be found on the EMC Community Network. The xCP 1.6 version is available on Powerlink and in the download area.

While following this tutorial was easy, it took quite some patience to get all the products in the xCP bundle set up properly. For example, I installed BAM on Tomcat on the app server host and not on the Java Method Server, and that meant performing various installation steps manually. Spending time on the installation guides for all Documentum components is highly recommended.

What is xCP anyway?

EMC have been emphasizing case management and xCP (xCelerated Composition Platform) for a while now. The introductions and overviews talk about case-based business solutions, configuration rather than customization, pre-built components, and best-practice patterns. All that is fine, but what is the bottom line? What products/components are we talking about? Is this something completely new? Let’s take a look.

Some documentation indicates that xCP consists of:

  • TaskSpace
  • Process Builder
  • Forms Builder
  • Process Engine
  • Business Activity Monitor (BAM)

Other documentation details two product bundles under the xCP umbrella – xCP Designer and xCP User. xCP Designer products are used for developing the solutions:

  • Process Builder – used by developers to build process templates
  • Forms Builder – used by developers to build user interfaces for processes
  • Process Reporting Services (should be named Reports Builder) – used by developers to build reports
  • Process Analyzer – used by business analysts to analyze processes before and after they have been built using Process Builder
  • Composer – used by developers to develop, package, and deploy artifacts such as content types

xCP User products are used by the end users of the deployed solutions:

  • TaskSpace – the main user interface for xCP solutions
  • Content Server – core platform
  • Process Engine – executes processes in coordination with the Content Server
  • Process Integrator – formerly Business Process Services, enables (mainly) inbound integration with processes. For example, an incoming Web Service request or an incoming email can initiate a process.
  • Business Activity Monitor (BAM) – collects and prepares process execution data and uses it for serving reports and alerts. In addition to the usual benefits of reporting, it helps with troubleshooting and provides access to historical data.

If we don’t count the core platform and Composer (a core development tool), xCP essentially consists of the former Documentum process suite. As of 6.7 SP1, it is probably more robust and better documented than earlier releases, and it has many useful features.

So, if you haven’t had a chance to work with it, it probably won’t be too difficult to learn. Your primary hurdle will likely be setting up all these products properly. In my experience, it wasn’t too bad, but it required a decent amount of time and patience with the installation guides. It was also handy to use three virtual machines (VMs): one for Content Server, Process Engine, and the database; a second for the application server (DA, Webtop, TaskSpace, BAM server, Process Integrator); and a third for the development desktop (Composer, Process Builder, Forms Builder, Process Reporting Services, browser).

Owner Minimum Permissions Feature Doesn’t Work in D6.6

Documentum Content Server version 6.6 introduced a new feature, “Owner Minimum Permissions”, which allows the minimum permissions for owners to be set across the repository. These permissions override any ACL-based calculation: no ACL can make the effective owner permissions more restrictive than the ones set up in this configuration. Version 5.3 behaves similarly, except that the owner minimum permissions are fixed (not configurable): READ plus all extended permissions other than EXTENDED DELETE.

It has been confirmed that this feature does not work as expected in version 6.6. At least for one operating system and database combination, these settings have no effect and the behavior is identical to 5.3. It is quite likely that this problem is not restricted to this one environment configuration.
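
To check a given environment, it helps to spell out the expected result of the rule described above. The following is a simplified model, not Content Server code. The class and method names are hypothetical, and extended permissions are modeled as plain strings. The basic permit levels follow Documentum's numbering, from 1 (NONE) to 7 (DELETE). The owner's effective level is the higher of the ACL-derived level and the configured minimum, and the minimum extended permissions are added to whatever the ACL grants.

```java
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of the intended owner-minimum-permission rule.
final class OwnerMinimumPermissions {

    // Documentum basic permit levels.
    static final int NONE = 1, BROWSE = 2, READ = 3, RELATE = 4,
            VERSION = 5, WRITE = 6, DELETE = 7;

    // The configured minimum wins whenever the ACL grants the owner less.
    static int effectiveOwnerPermit(int aclPermit, int minimumPermit) {
        return Math.max(aclPermit, minimumPermit);
    }

    // Minimum extended permissions are added to those granted by the ACL.
    static Set<String> effectiveOwnerXPermits(Set<String> aclXPermits, Set<String> minimumXPermits) {
        Set<String> result = new TreeSet<>(aclXPermits);
        result.addAll(minimumXPermits);
        return result;
    }
}
```

If the configured minimum is, for example, WRITE while a test document's ACL gives its owner only READ, this model expects WRITE. On the affected 6.6 environment, the observed result would instead match the fixed 5.3 behavior.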

Updated E20-120 Content Management Foundations Exam

The E20-120 Content Management Foundations Exam is being updated, with the changes going into effect on September 2, 2011. The following topics are being included in the updated exam:

  1. CenterStage search features
  2. Queues
  3. Configuration of workflow reports and dashboards
  4. Business Activity Monitor (BAM)
  5. TaskSpace
  6. Process Reporting Services (PRS)
  7. Integrating workflows with external systems
  8. High-Volume Server (HVS)
  9. Lightweight SysObjects (LWSO)

Among these new topics, only Lightweight SysObjects are discussed adequately in the book Documentum 6.5 Content Management Foundations. Exam candidates who are using this book to prepare and who plan to take the test after September 2, 2011 will need to use reference documentation or other resources to learn topics 1-8 listed above.

Handy DQL: Audit Report by Month

Recently, I received a request for a DQL query as follows:

In an audit trail, I want to count the number of users who viewed (dm_getfile event) all the documents that belong to a particular object type every month. These documents are in a specific folder of a cabinet.

I made a couple of assumptions in coming up with the DQL, but these are easy to adjust to suit specific needs. When looking for documents in a folder, I assumed the containment to be recursive. If you don’t need to look in the subfolders, just remove “, descend” in the folder() predicate below.

The other assumption I made was that the reporting period is one calendar year, for example, a month-by-month report for 2009. Again, if your needs are different, just modify the date range condition.

Adapt the following DQL by replacing <OBJECT_TYPE>, <FOLDER_PATH>, and <FOUR_DIGIT_YEAR>:

SELECT DATETOSTRING(time_stamp, 'yyyy/mm') AS period, COUNT(DISTINCT user_name) AS viewers
FROM dm_audittrail
WHERE object_type = '<OBJECT_TYPE>'
AND event_name = 'dm_getfile'
AND DATEFLOOR(year, time_stamp) = DATEFLOOR(year, DATE('01/01/<FOUR_DIGIT_YEAR>', 'mm/dd/yyyy'))
AND audited_obj_id IN (
    SELECT r_object_id FROM dm_sysobject (all)
    WHERE folder('<FOLDER_PATH>', descend)
)
GROUP BY DATETOSTRING(time_stamp, 'yyyy/mm')

A sample result from this query is shown below. Months without any matching dm_getfile events (2009/03 in this sample) produce no row:

period    viewers
2009/01   3
2009/02   1
2009/04   6
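
When this report is run for several years or folders, filling in the placeholders by hand gets tedious. The following is a minimal sketch of a hypothetical Java helper (`AuditReportQuery` is not part of DFC) that produces the query above. It doubles single quotes inside values, which is how a quote is escaped in a DQL string literal. Passing `recursive = false` drops “, descend”, matching the adjustment described earlier.

```java
// Hypothetical helper: builds the monthly audit report DQL; it does not execute it.
final class AuditReportQuery {

    private static final String TEMPLATE =
            "SELECT DATETOSTRING(time_stamp, 'yyyy/mm') AS period, COUNT(DISTINCT user_name) AS viewers\n"
            + "FROM dm_audittrail\n"
            + "WHERE object_type = '<OBJECT_TYPE>'\n"
            + "AND event_name = 'dm_getfile'\n"
            + "AND DATEFLOOR(year, time_stamp) = DATEFLOOR(year, DATE('01/01/<FOUR_DIGIT_YEAR>', 'mm/dd/yyyy'))\n"
            + "AND audited_obj_id IN (\n"
            + "    SELECT r_object_id FROM dm_sysobject (all)\n"
            + "    WHERE <FOLDER_PREDICATE>\n"
            + ")\n"
            + "GROUP BY DATETOSTRING(time_stamp, 'yyyy/mm')";

    // Escapes a value for use inside a single-quoted DQL literal.
    static String quote(String value) {
        return value.replace("'", "''");
    }

    // Fills in type, folder, and year; recursive controls the ", descend" option.
    static String build(String objectType, String folderPath, int year, boolean recursive) {
        if (year < 1000 || year > 9999) {
            throw new IllegalArgumentException("Year must have four digits: " + year);
        }
        String folderPredicate = "folder('" + quote(folderPath) + "'"
                + (recursive ? ", descend)" : ")");
        return TEMPLATE
                .replace("<FOLDER_PREDICATE>", folderPredicate)
                .replace("<FOUR_DIGIT_YEAR>", String.valueOf(year))
                .replace("<OBJECT_TYPE>", quote(objectType));
    }
}
```

For example, `AuditReportQuery.build("my_doc", "/Reports", 2009, true)` yields the 2009 report for a hypothetical `my_doc` type, searching `/Reports` and its subfolders.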