On my Windows 8.1 laptop with a small SSD (100GB), disk space was running out. When I checked disk usage by folder using TreeSize, I found that C:\Windows\Installer was taking up over 22GB! Quick online research revealed that I couldn’t safely delete files from this folder. However, I also learned that it could be holding files that were no longer in use or needed.
On searching further, I found a neat little utility called Patch Cleaner, which looks for such unused files and gives you the option to delete them or back them up to another location. When I ran it, I found that about 15GB of those 22 were being hogged by files that were no longer needed! All 400+ unused files were MSP files. I have moved these files to an external hard disk just in case there is an issue later; restoring them is as easy as copying them back to the Installer folder.
Thank you Patch Cleaner for freeing up over 15% of my disk space!
This is a review of the book Learning Alfresco Web Scripts, written by Ramesh Chauhan and published by Packt. I did not receive any compensation for writing this review, though I did receive an electronic copy for reviewing.
Alfresco web scripts are used to integrate clients with Alfresco via RESTful web services. They are an alternative to CMIS and SOAP and provide tighter integration and extra features.
I found the book easy to follow and an easy read in general. It is suitable for new Alfresco developers, and it offers enough depth for experienced developers as well. Some chapters could also serve as reference material.
The book is structured as follows. It starts with an overview of Alfresco web scripts. Then it gets hands-on right away with a simple web script. It gradually adds detail to the implementation before diving into the details of the architecture and implementation. Then it moves to more practical concerns such as deployment, troubleshooting, and configuration management using Maven. It ends with guidance on extending the framework.
In technical books, I always look for tips and information that come from experience. Reference material alone doesn’t cut it, as much of it may be available online, and printed material becomes outdated quickly. This book does have tips on when and where to use specific choices, best-practice recommendations for various options, and troubleshooting tips for various errors. It also provides some general tips, such as dealing with client limitations.
The approach to the technical material is in a form somewhat similar to a tutorial. It shows code samples, discusses the related concepts, and provides steps for trying it out.
I did notice some naive statements though the intent of the author is obvious in such situations. I could only smile at the statements to the effect that “every question has an answer” and “production servers cannot be restarted”.
On a more serious note, it would have been nice to use a single scenario throughout the book, tying all the examples together into one solution at the end. This was feedback I received on my first book, and acting on it made a huge improvement in its second edition.
Overall, I feel that this book is a good resource for anyone integrating systems with Alfresco as a back-end system.
If you have been working with Documentum for a while, particularly with administration tasks, you are likely to run into a need to handle very large resultsets in DFC code. The need may arise in a standalone piece of code, or in a custom job, for example. In a job, it may be prudent to do this anyway if you are not sure how big a resultset you may be dealing with.
Here are some pointers to avoid problems in such a situation:
Don’t keep the collection open for longer than necessary. Try to keep the processing out of the loop that reads from the collection. This means holding data in memory for later processing, but you don’t want the process to run out of memory either. So …
Don’t store large amounts of data in memory. This appears to conflict with #1. However, we can mitigate the issue in ways other than keeping the collection open. Usually, we need to keep just the object IDs in memory while reading the collection in #1. Other metadata can be pulled as needed when processing a specific object. In addition, we can …
Batch it up and limit the batch size. In order to measure the actual resource usage and to deal with known quantities, limit the number of objects processed in one batch. Given that we may be dealing with large resultsets, any help with performance is welcome. Hints to the rescue …
Consider the RETURN_TOP, OPTIMIZE_TOP, and ROW_BASED DQL hints. The TOP hints take the batch size as an argument. However, know that OPTIMIZE_TOP is not useful if the query uses ORDER BY. ROW_BASED is great for fast retrieval and may be the only way to go if you have to join on repeating attributes. Sometimes, a handy mechanism to batch objects up is to …
Use object IDs with a mask to control the batch size. If there is no obvious grouping of objects for creating batches and you want to mix it up, you can use a mask on the object ID. For example, fixing the first 14 characters of the 16-character object ID leaves 2 hexadecimal characters free, allowing up to 16 × 16 = 256 objects in that batch.
It was a liberating experience the first time I did it. I could just limit the batch size with a configuration value and not worry about collections or memory running out.
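Putting these pointers together, here is a minimal sketch of the query-building side of such a batching approach. The DFC calls themselves (IDfQuery, IDfCollection) are omitted; the helper only builds per-batch DQL strings. The type name dm_document, the batch size, and the exact hint usage are illustrative assumptions, not taken from any particular job.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BatchedQueries {

    // Build a batch query using the RETURN_TOP hint, so the server returns
    // at most batchSize rows (hint syntax: ENABLE (RETURN_TOP n)).
    static String topQuery(String type, int batchSize) {
        return "SELECT r_object_id FROM " + type
                + " ENABLE (RETURN_TOP " + batchSize + ")";
    }

    // Group object IDs into batches by fixing the first 14 of the 16 hex
    // characters of r_object_id; each mask covers at most 16 * 16 = 256 IDs.
    static List<String> maskQueries(String type, List<String> objectIds) {
        Set<String> masks = new LinkedHashSet<>();
        for (String id : objectIds) {
            masks.add(id.substring(0, 14));
        }
        List<String> queries = new ArrayList<>();
        for (String mask : masks) {
            // '_' matches exactly one character in DQL LIKE, as in SQL.
            queries.add("SELECT r_object_id FROM " + type
                    + " WHERE r_object_id LIKE '" + mask + "__'");
        }
        return queries;
    }
}
```

In the actual job, each query would be executed, only the r_object_id values would be collected while the collection is open, the collection would be closed immediately, and the per-object processing would then run against the saved IDs.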
Recently, I had to script DAR installation for a bunch of DAR files as a part of an upgrade effort. The target repository version was 6.7 SP1. During DAR installation, I ran into the DM_OBJ_MGR_E_VERSION_MISMATCH error quite frequently. The message for this error looks like:
MSG: [DM_OBJ_MGR_E_VERSION_MISMATCH]error: “save of object xxxxxxx of type yyyyyy failed because of version mismatch: old version was zzz”;
When this error occurred, I got the same error even with dardeployer. I tried forcing a full data dictionary publish and a cache cleanup, but the error wouldn’t go away. Eventually, I found a way to make the error go away consistently.
Each time the error occurred, bouncing the content server service for the repository got rid of the error.
In my Hyper-V setup, I had spaces in the VM names (what we see in the server manager), which, by default, also added spaces to the folder names where the VHD and other VM files were stored. When I started using PowerShell to work with the VMs, I had to quote the VM names because of the spaces. After some time, I decided to rename the VMs to match the network names of the VMs (the names that can be pinged), which had no spaces. This renaming appeared to be a simple task because there was a right-click > Rename option.
I renamed the VMs easily and got rid of the spaces in the names. However, the folder paths and the VHD names remained unchanged and continued to use the old names with spaces. While my original problem had been resolved, the discrepancy between the VM names and the file/folder names was a new annoyance, even though I could have ignored it quite easily. Renaming a file or folder (any portion of the path) essentially creates a new path for the files/folders.
Now, there may be situations where I would actually want to move the VM files from one location to another. Suppose that the disk on which the VM files exist is getting filled up and I need to expand the disk for one of the VMs. In this case, I would like to relocate the files to another drive. In another situation, I might want to distribute my existing VMs to different physical disks to minimize the I/O contention as multiple VMs run concurrently. In both these cases, I need to move the files for an existing VM from one location to another.
There are probably other alternatives to the approach that I adopted. The steps described below just summarize what worked for me and may or may not work in other situations. The following steps accomplished this change for me:
Shut down the VM.
Back up the VM folder.
If there are any snapshots for the VM, delete the snapshots. This step may or may not be necessary, but I suspect that there must be references to the AVHD files which are likely to be broken by path changes.
Wait for any merge to complete. (A merge is indicated by a “Cancel Merge in progress …” button in the Actions pane for the VM. A merge combines the VHD and AVHD files into one VHD file. After the merge is complete, the cancel button disappears, and the AVHD files also disappear from the disk.)
Create the new VM folder, and move the VHD file to the new folder.
Rename the VHD file now, if renaming the VHD is desired for any reason.
In the Hyper-V manager, open settings for the VM.
Under Hardware > IDE Controller x > Hard Drive, change the path of the VHD file by browsing to the new location/name of the VHD file.
Under Management > Snapshot File Location, select the new path to the base VM folder. Carefully look at the path before changing it so as to keep it at the same level.
Stop Hyper-V service in Windows Services.
Move the remaining contents of the VM folder to the new location.
Hyper-V internally tracks the location of the VM folder/files using symlinks stored in C:\ProgramData\Microsoft\Windows\Hyper-V. The link to the VM under consideration needs to be updated.
Using Windows Explorer or the command line, do the following:
Find the symlink whose name matches the VM ID. It is a long ID with an .xml extension. It can be matched easily by looking for the file with the identical name (as the symlink) in the VM folder.
Move this link out of this folder as a backup.
Remove the file with the same name from the cache folder – C:\ProgramData\Microsoft\Windows\Hyper-V\Virtual Machines Cache
Start cmd (Run as Administrator), and run the following commands after inserting the correct VM ID and path in the commands. The mklink command recreates the symlink so that it points at the moved configuration file (substitute the actual VM ID and the new VM folder path for the placeholders):
cd C:\ProgramData\Microsoft\Windows\Hyper-V\Virtual Machines
mklink "<VM ID>.xml" "<new VM folder>\Virtual Machines\<VM ID>.xml"
When I started tinkering with Hyper-V, I was looking for some guidance on setting up a VM lab behind a cable modem and a router. While I found plenty of how-to posts on specifics of Hyper-V tasks, I found little in terms of networking and other best-practice concerns specific to a home network. Granted, the Hyper-V activities don’t have to be any different on a home network vis-a-vis any other kind of network, but I didn’t know enough at the time to be sure of it. Now that I have learned some lessons the hard way, with this post I am trying to compile and share my thoughts and experience related to such a setup.
I had the following requirements for this lab setup:
The lab will run a Windows domain.
Some VMs will not be in the domain.
Every VM will be able to access the Internet.
I had numerous questions and dealt with several issues during this setup process. Addressing these questions and issues guided my decision making as discussed below.
One of the first questions that we need to answer is, “What hardware do I need?” The basic requirement for the hardware turns out to be that the CPU should support virtualization. Most servers today will probably meet this requirement, but it won’t hurt to check the CPU features on the server under consideration.
Another item to address is the resource capacity such as the number of CPUs and the memory capacity on the motherboard. The answer really depends on the number of VMs you plan to use concurrently. I went with 2 quad-core CPUs and 32GB RAM capacity. I started with 16GB installed and later expanded it to full capacity. One thing to remember about server RAM installation is that there are usually restrictions on each memory stick capacity and the combinations in which the sticks could be installed. It is important to consult the server manual about the allowed combinations and the slots to use for those combinations while buying and installing memory.
As hard disks are cheap, we may consider loading up the server to utilize the available slots. However, the bigger question to answer is whether to RAID or not? If yes, then which RAID configuration? Is there a hardware RAID controller in the server? My server had a software RAID controller only, and I went with RAID 1 to keep identical copies on two disks. Basically, that gave me redundancy with no performance benefit. As the number of VMs grew, I realized that they were all trying to access the same disk resources simultaneously. Further, I was also restricted in the kind and number of disks I could install in the remaining slots. After some research and heartburn, I rebuilt the setup without any RAID. My thought process was that this was just a lab setup, and I figured out a backup approach to ease the restore process should there be a disk failure.
One of the first questions on the software side is, “Which virtualization software should I use?” This one was easy, as I wanted to learn Hyper-V. The VMware hypervisor was another option. The free hypervisor license has some limitations, though they were no big problem for this setup.
Once I had decided on Hyper-V, the next question was about the host OS edition (the OS installed on the physical server). Initially, I went with 2008 R2 Server Core with Hyper-V. This provided a minimal text-based interface for managing the server, and I was using the Hyper-V Manager client (remote management) on Windows 7 to manage the VMs. It was a nightmare to make the client connect, and every so often it would stop connecting to the Hyper-V server. I found a utility that temporarily eased my pain, but I was amazed that someone needed to create a utility just to configure the client and server for remote management. On the other hand, I have toyed with the VMware hypervisor a little bit on another server, and its remote management just works without additional setup.
Another issue that I ran into was that certain Windows updates would keep failing on the server. Therefore, I ended up rebuilding the setup with full Windows 2008 R2 Server with Hyper-V role as the host OS. Now, I manage the VMs on the server via Remote Desktop using Server Manager on the host OS (no remote management). In this setup, I have not encountered any of the issues mentioned above.
My key networking concern was about the co-existence of the existing network and a separate Windows domain for the VMs. While it was a bit confusing earlier, the setup turned out to be quite simple. Each VM participates in two networks – a private virtual network (say, with IPs 10.0.1.*) and an external virtual network (say, with IPs 192.168.1.*). In other words, there are two virtual NICs (Network Interface Cards) on each VM. One is connected to the private virtual network for the Windows domain, and the other is connected to the existing network. The connection to the existing network is needed by each VM to connect to the Internet. There are probably other network configurations that would work, but this is what I set up.
Note that the virtual NIC (VNIC) for the private virtual network is not needed on any VM that would not participate in the Windows domain. I did encounter some cryptic issues in connecting to the Internet from the VMs in the Windows domain, even when everything seemed to be working fine. Finally, I ended up with static IP configuration for the VNICs connected to the private virtual network, and DHCP-provided IP addresses for the VNICs connected to the external virtual network. However, I set up static IP reservation for these VNICs in the router. With this setup, all VNICs get fixed IP addresses, and all network connectivity is working properly.
My server hardware has two NIC cards. I have dedicated one to the external virtual network and use the other for connecting to the Hyper-V host.
Windows VM Cloning
Once we have created a VM with the base OS install, it’s easy to create another VM by copying its virtual hard disk (VHD) file. We need a separate install and VHD copy for each OS type/edition. For example, we need separate VHDs for Windows 2008 R2 Server, Windows 7, and Ubuntu Server. For all Windows VMs cloned by copying the VHD file, it will serve you well to run SysPrep (C:\Windows\System32\Sysprep\sysprep.exe) with the Generalize option selected. This needs to be done right after the cloned VM is started up for the first time. This process prevents duplication of internal IDs assigned by Windows. Until I learned about SysPrep, the cloned VMs were unable to participate in the Windows domain security properly.
One bottleneck mentioned above is the contention on the disk resources. Therefore, I decided to distribute the VHDs among all the hard disks on the server. This approach allows multiple VMs to be working off separate disks concurrently. Very likely, there will still be multiple VHDs on each disk (once there are more VMs than disks), but that is better than all VHDs being on one disk.
Linux is not an “enlightened” guest OS for Hyper-V (I do not like this term, but it is the Hyper-V jargon). Basically, it is not designed to fully participate in the Hyper-V virtualization scheme. I tried a few recent versions of Fedora as the guest OS but couldn’t get the mouse to work. I also ran into some networking issues. On the other hand, setting up a 64-bit Ubuntu VM was a piece of cake: the mouse worked, and the network was configured automatically.
The following diagram summarizes my setup.
I will update the post with additional details if/when I recall them.