[UPDATE] In version 5.3 SP6 and later, the “first file” mentioned below means the file at page 0 of the attached content object, no matter how many times the content object has been replaced via “check in as same version”. I don’t have access to an environment where I could reproduce the check-in-as-same-version behavior discussed below. See the discussion in the comments for more information on the current behavior.
Of course, I could retrieve the document using one of the available mechanisms (Webtop, DFC, API, etc.), save it locally, and then check the file size on the filesystem.
However, if all I want to know is the size of the document in the repository, one would think the metadata should make that easy. It does in most cases, but being really sure takes some effort.
The document object (of type dm_sysobject or a subtype) has two attributes, r_content_size and r_full_content_size. Most of the time you can check either of these to get the number of bytes in the content file. However, if the file is larger than 2GB, only r_full_content_size reports the correct size.
Is this always true? If you look up the object reference for these attributes, you will find the following in the definition:
Size, in bytes, of the first content file associated with the document.
The key word is “first”, and it really means first ever. How can a document have a second content file associated with it? A common way is by adding a rendition. For example, you may have a Word document and then add a PDF rendition. However, this is not a problem, since the primary content is still the size reported by these attributes.
Now consider the more interesting scenario. Suppose you check out the document, make some changes (or choose a different local file) so that the file size changes, and check it in as the same version. Since you are checking in as the same version, no new sysobject is created. However, a new content object (type dmr_content) is created and associated with the sysobject, and the first content object is disassociated from it. So the file size is now different, but the size attributes on the sysobject are unchanged and potentially incorrect!
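To make the failure mode concrete, here is a toy in-memory model of the scenario as described above. It is not DFC or server code: the classes, the repo dictionary and the function names are hypothetical, and only the attribute names mirror the real ones. It also reflects the behavior described in this post, which the update at the top notes may differ in 5.3 SP6 and later.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical stand-in for r_object_id generation


@dataclass
class Content:
    """Toy dmr_content: records the size of the stored file."""
    full_content_size: int
    r_object_id: int = field(default_factory=lambda: next(_next_id))


@dataclass
class SysObject:
    """Toy dm_sysobject: caches the size of its *first* content file."""
    r_full_content_size: int
    i_contents_id: int


def create_document(repo: dict, size: int) -> SysObject:
    content = Content(size)
    repo[content.r_object_id] = content
    return SysObject(r_full_content_size=size, i_contents_id=content.r_object_id)


def checkin_same_version(repo: dict, doc: SysObject, new_size: int) -> None:
    # No new sysobject: a new content object is created and attached instead.
    content = Content(new_size)
    repo[content.r_object_id] = content
    doc.i_contents_id = content.r_object_id  # the old content is no longer referenced
    # Per the behavior described above, r_full_content_size is NOT updated.


def size_is_stale(repo: dict, doc: SysObject) -> bool:
    """True when the sysobject's cached size disagrees with its current content."""
    return doc.r_full_content_size != repo[doc.i_contents_id].full_content_size
```

For example, after `doc = create_document(repo, 1000)` the sizes agree, but after `checkin_same_version(repo, doc, 2500)` the sysobject still reports 1000 bytes and `size_is_stale(repo, doc)` becomes true.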
So where can we get the right size? We need to get closer to the file and look at the content object (dmr_content). The content object is directly associated with the stored file and records its size in the content_size and full_content_size attributes. As before, only full_content_size is accurate if the file is larger than 2GB.
Identify the content object associated with the sysobject and look at its full_content_size attribute. The following DQL should do it:
select c.r_object_id, c.full_content_size
from dm_sysobject s, dmr_content c
where s.i_contents_id = c.r_object_id
and s.r_object_id = 'object_id_of_doc'
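If you run this query from a script rather than by hand, a small helper can build it for a given document. The sketch below is a hypothetical helper, not part of any Documentum API. It assumes object IDs are the usual 16 hexadecimal characters and rejects anything else, so arbitrary text never ends up inside the query string. It also selects data_ticket and storage_id, which matter for the external-storage check discussed further down.

```python
import re

# Documentum object IDs are 16 hexadecimal characters (e.g. 0900000180001234).
_OBJECT_ID = re.compile(r"[0-9a-fA-F]{16}")


def content_size_query(object_id: str) -> str:
    """Build the DQL that returns the content object and its size for one sysobject."""
    if not _OBJECT_ID.fullmatch(object_id):
        raise ValueError(f"not a Documentum object ID: {object_id!r}")
    return (
        "select c.r_object_id, c.full_content_size, c.data_ticket, c.storage_id "
        "from dm_sysobject s, dmr_content c "
        "where s.i_contents_id = c.r_object_id "
        f"and s.r_object_id = '{object_id}'"
    )
```

How the resulting string is executed (DQL editor, IDQL, DFC) is left open here; the ID `0900000180001234` above is only an illustrative value.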
That’s it, right? Well, most of the time! There is one more exception to consider: content files stored in external storage (it is not clear to me what identifies a storage area as external). When the content file is in external storage, the size attributes on the content object are not set, so you cannot tell its size without retrieving the file itself.
So how do you know whether the content file is in external storage? Look at dmr_content.data_ticket. It is 0 for both external storage and turbo storage, so if data_ticket is 0 and storage_id does not point to a dm_turbo_store, the content is in external storage.
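The decision rule above, together with the earlier point that external storage leaves the size attributes unset, can be sketched as two small functions. This is a minimal sketch under one loud assumption: `store_type` is the object type (for example dm_turbo_store) of the store that storage_id points to, which you would have to look up separately; the function names are hypothetical.

```python
from typing import Optional


def is_external_storage(data_ticket: int, store_type: str) -> bool:
    """data_ticket is 0 for both external and turbo storage, so the type of the
    store referenced by storage_id (assumed to be fetched separately) breaks the tie."""
    return data_ticket == 0 and store_type != "dm_turbo_store"


def known_content_size(full_content_size: float, data_ticket: int,
                       store_type: str) -> Optional[int]:
    """Return the size recorded on dmr_content, or None when the content is in
    external storage and the size attributes are not set."""
    if is_external_storage(data_ticket, store_type):
        return None  # the file itself must be retrieved to learn its size
    return int(full_content_size)
```

Returning None rather than 0 keeps "size unknown" distinct from a genuinely empty file.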