How big is my document?

[UPDATE] In version 5.3 SP6 and later, the “first file” mentioned below means the file at page 0 of the attached content object, no matter how many times the content has been replaced via “check in as same version”. I no longer have access to an environment where I could try to reproduce the check-in-as-same-version behavior discussed below. See the discussion in the comments for more information on the current behavior.


Of course, I could retrieve the document using one of the available mechanisms (Webtop, DFC, API, etc.) and save it locally and then inspect the file size on the filesystem.

However, if all I want to know is the size of the document in the repository, one would think it should be easy to tell from the metadata. In most cases it is, but being really sure takes some effort.

The document object (of type dm_sysobject or a subtype) has two size attributes: r_content_size and r_full_content_size. Most of the time, you can check either of these to get the number of bytes in the content file. However, if the file is larger than 2 GB, only r_full_content_size reports the correct size.
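The 2 GB threshold matches the largest value a signed 32-bit integer can hold. The sketch below assumes that r_content_size is stored as such an integer; this post does not state the datatype, so treat that as an assumption. The helper name fits_in_int32 is made up for illustration.

```python
# Assumption (not stated in this post): r_content_size is a signed 32-bit integer.
INT32_MAX = 2**31 - 1  # 2,147,483,647 bytes, one byte short of 2 GiB


def fits_in_int32(size_in_bytes: int) -> bool:
    """Return True if the byte count is representable as a signed 32-bit integer."""
    return 0 <= size_in_bytes <= INT32_MAX


if __name__ == "__main__":
    for size in (1288, INT32_MAX, INT32_MAX + 1, 5 * 1024**3):
        print(size, fits_in_int32(size))
```

Under that assumption, any size for which fits_in_int32 returns False is one where only r_full_content_size can be trusted.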

Is this always true? If you look up the object reference for these attributes, you will find the following in the definition:

Size, in bytes, of the first content file associated with the document.

The key word is “first”, and it really means first ever. How can a document have a second content file associated with it? A common scenario is adding a rendition. For example, you may have a Word document and then add a PDF rendition. However, this is not a problem, since your primary document is still of the size reported by these attributes.

Now consider a more interesting scenario. Suppose you check out the document, make some changes (or choose a different local file) so that the file size is different, and check it in as the same version. Since you are checking in as the same version, no new sysobject is created. However, a new content object (type dmr_content) is created and associated with the sysobject, and the first content object is disassociated from it. So the file size is now different, but the size attributes on the sysobject are still the same and potentially incorrect!

So where can we get the right information about the size? We need to get closer to the file and look at the content object (dmr_content). The content object is directly associated with the stored file and captures the size in its content_size and full_content_size attributes. As before, only full_content_size is accurate if the size is more than 2 GB.

Identify the content object associated with the sysobject and look at its full_content_size attribute. The following DQL should do it:

select c.full_content_size
from dm_sysobject s, dmr_content c
where s.i_contents_id = c.r_object_id
and s.r_object_id = 'object_id_of_doc'
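If you run this query from a script, it helps to fill in the object ID programmatically rather than by hand. Here is a minimal sketch in Python that only builds the query text; it does not talk to a repository. The function name content_size_query is made up for illustration, and the check that an object ID is 16 hexadecimal characters is my assumption about the ID format.

```python
import re

# Assumption: a Documentum r_object_id is a string of 16 hexadecimal characters.
_OBJECT_ID = re.compile(r"[0-9a-fA-F]{16}")


def content_size_query(object_id: str) -> str:
    """Build the DQL shown above for one document, rejecting malformed IDs."""
    if not _OBJECT_ID.fullmatch(object_id):
        raise ValueError(f"not a valid object id: {object_id!r}")
    return (
        "select c.full_content_size "
        "from dm_sysobject s, dmr_content c "
        "where s.i_contents_id = c.r_object_id "
        f"and s.r_object_id = '{object_id}'"
    )


if __name__ == "__main__":
    # The ID below is a made-up placeholder, not a real object.
    print(content_size_query("0900000180001234"))
```

Validating the ID before interpolating it keeps stray quotes out of the query string.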

That’s it, right? Well, most of the time! There is one more exception to consider: when the content file is stored on external storage. (It is not clear to me what identifies a storage area as external.) When the content file is stored on external storage, the size attributes on the content object are not set. In this case, you cannot really tell the size of the content file without retrieving the file itself.

So how do you know whether the content file is stored on external storage? Look at dmr_content.data_ticket. It is 0 for external storage and for turbo storage. Thus if data_ticket=0 and storage_id does not indicate dm_turbo_store then it is using external storage.
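The rule above can be written down as a small decision function. This is only a sketch of the logic as described in this post: the function name and the labels it returns are made up, and it assumes you have already resolved storage_id to find out whether the storage object is a dm_turbo_store.

```python
def classify_storage(data_ticket: int, is_turbo_store: bool) -> str:
    """Classify a dmr_content row using the data_ticket rule described above.

    data_ticket    -- value of dmr_content.data_ticket
    is_turbo_store -- True if storage_id points at a dm_turbo_store object
                      (resolving that is left to the caller)
    """
    if data_ticket != 0:
        # A non-zero ticket means neither external nor turbo storage.
        return "other"
    # data_ticket is 0 for both turbo and external storage.
    return "turbo" if is_turbo_store else "external"


if __name__ == "__main__":
    print(classify_storage(0, False))  # external: size attributes are not set
    print(classify_storage(0, True))
    print(classify_storage(-2147474649, False))
```

Only the "external" result signals that the size attributes on the content object cannot be relied on.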


7 thoughts on “How big is my document?”

  1. How do I find the file size of a content file in Documentum? For one of my files, I see a content size of 1288 in Properties, but I don’t know if 1288 is KB or some other unit of measure.

    Also, is there one place to go to see the total size of the doc base?

    1. Amy, your questions are answered in the post above. In general, size is stored as number of bytes.

      As far as the total size of the docbase goes, check the filestores under Storage using DA. Size for each filestore is displayed there and you could add it up. If you want to query it:

      select name, current_use from dm_filestore

      and then add it up.

  2. Hello,

    I’ve tested your scenario on Documentum 6.5:

    “Suppose, you check out the document, make some changes or choose a different local file such that the file size is different and check it in as the same version. […] So the file size is now different but the size attributes on the sysobject are still the same and potentially incorrect!”

    and when I checked the value of “r_content_size” and “r_full_content_size” they contained the correct size of the new checked in file.

    Are you sure that in the sentence “Size, in bytes, of the first content file associated with the document.” the phrase “first content” refers to that?

    1. Ana,

      This post was written based on experience with version 5.3. The object reference for 6.5 still says the same thing. If you discover different semantics for the “first content”, please share here.

      1. Hello again,

        I really don’t know what “first content file” actually means, although I’ve searched everywhere. But I’ve noticed that this phrase is used a lot in the “DQL Reference Manual 6.5” and, as I understand it, it is somehow associated with page number 0 of the dmr_content object. If this is correct, then it doesn’t matter how many times the content of an object is modified (by using checkin as same version, or DFC setFile()), because the content will always be added as the first content file and the value of r_content_size will be the correct size of the content.

  3. Ana,

    You are correct. This behavior is the same as you describe in 5.3 SP6. First content file should mean page 0 of the attached content object.

    I do not recall the exact version based on which I wrote this post. However, we were facing some metadata discrepancies at the time and, if I recall correctly, the EMC support engineer explained some of them away citing the above semantics of the “first file”.

    In any case I will update the post with a correction. Thanks for providing this valuable feedback.
