Avamar Tidbits

May 18, 2009

I have recently been working with Avamar.  If you are not familiar with Avamar, it's EMC's client-server backup and restore solution that includes a unique global data de-duplication technology.  What makes Avamar stand out is that it identifies redundant data at the source, not only reducing the amount of backup data traveling across your network, but drastically decreasing the amount of time backups take to complete.

To learn more about the basics of Avamar, see the following blogs:

http://jasonnash.wordpress.com/2009/03/16/avamar-and-vmware/

http://mitchellzblog.blogspot.com/2008/12/what-exactly-is-avamar.html

So, as I was saying, I have recently been working with Avamar and have come across some things I found quite interesting.  I thought I would share some of these factoids; let me know what you think!

One area of great concern when backing up to disk is the multiple points of failure that exist.  To address these concerns, Avamar provides fault tolerance at several different levels:

  • Avamar ensures protection from disk and data corruption through the use of RAID (redundant array of independent disks).  The type of RAID depends on the particular node type.  The first thing to know is that all storage nodes have 6 physical disks. Beginning with Avamar release 4.0, two sizes are supported: nodes with 1TB or 2TB of licensable capacity. Single-node servers with a capacity of 1TB actually have higher-performance disks than the 2TB flavor.  The 1TB storage nodes use 6 300GB 15k SAS drives set up in RAID-5 and configured into 4 LUNs (virtual disks).  If you opt for the 2TB capacity, the storage node will have 6 SATA disks configured as 3 RAID-1 LUNs. Just FYI, the utility node and NDMP accelerator node both have only 2 146GB physical disks configured with RAID-1.
  • In a multi-node system (1 utility node and 3 or more storage nodes), Avamar provides failover and fault tolerance across all nodes using RAIN (redundant array of independent nodes).  If a node failure occurs, the Avamar server continues to function; during this time, backup data needed for recovery is reconstructed on the remaining nodes using parity. Once the failed node is replaced (a spare node is always included in a RAIN configuration), the capacity across all disks can be rebalanced using a very simple process.
  • Stand-alone configurations and what are referred to as 1×2 configurations (made up of 1 utility node and 2 storage nodes) do not have the luxury of RAIN to protect against a node failure, so EMC requires that the Avamar be configured with replication.  Replication is the third type of fault tolerance; it is optional for RAIN configurations but comes in handy when you have multiple remote offices that you want to replicate to a centrally managed location for protection against server failure.  If your stand-alone or 1×2 Avamar server were to fail, your backups would be unavailable until the failed node was replaced and the disaster recovery Avamar (the one you were replicating to) was replicated back to the production site.  In summary, to take advantage of RAIN fault tolerance, remember that you MUST have a utility node and at least 3 storage nodes.  For a stand-alone or 1×2 Avamar server, RAIN is not available and it is highly recommended to use replication to a second Avamar server onsite, or preferably offsite, for fault tolerance.
  • Avamar protects itself from operational failures through the use of checkpoints: read-only snapshots of the server taken twice a day that can be used for server rollbacks.  The checkpoints are actually hard links to all of the stripes and are validated by performing hash file system (HFS) checks.  HFS checks are run once a day during the morning cron jobs. If backups are running when a checkpoint begins, they are suspended and the system becomes read-only until the checkpoint is complete; then the system is made read-write again and backups resume.  Avamar retains the last 2 checkpoints and the last validated checkpoint (a small sketch of that retention rule follows this list).  Checkpoints can also be created manually at any time by administrators, as well as deleted.
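
Since the checkpoint retention rule above is easy to misread, here is a minimal sketch of it in Python.  Everything in it is illustrative: the Checkpoint class and checkpoints_to_keep function are my own stand-ins, not Avamar tooling; the only thing taken from the post is the rule itself (keep the last two checkpoints plus the last validated one).

    from dataclasses import dataclass

    @dataclass
    class Checkpoint:
        taken_at: str      # e.g. "2009-05-18 06:00"
        validated: bool    # True once an HFS check has validated this checkpoint

    def checkpoints_to_keep(checkpoints):
        """Given checkpoints ordered oldest-first, return the ones retained:
        the two most recent plus the most recent validated one."""
        keep = list(checkpoints[-2:])
        for cp in reversed(checkpoints):
            if cp.validated:
                if cp not in keep:
                    keep.insert(0, cp)
                break
        return keep

    # Example: the only validated checkpoint so far is the oldest one
    history = [
        Checkpoint("2009-05-17 06:00", validated=True),
        Checkpoint("2009-05-17 18:00", validated=False),
        Checkpoint("2009-05-18 06:00", validated=False),
    ]
    for cp in checkpoints_to_keep(history):
        print(cp.taken_at, "(validated)" if cp.validated else "(not yet validated)")

In that example all three checkpoints survive; once a newer checkpoint passes its HFS check, the older unvalidated ones would no longer be covered by the rule.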

Another area that I find quite interesting is actually the key feature of Avamar: the de-duplication process.  If you are like me, you probably wonder how the de-duplication process works.  I will step you through the process that the Avamar server goes through during a backup; a rough sketch in code follows the list.

  1. During a scheduled backup or an on-demand backup, the administrator server generates a work order and then either pages the client, or the client checks in periodically to pick up the work order.
  2. The client's local file cache is checked to see if the files being backed up have been backed up before; files that have already been backed up are skipped.
  3. If there is no match in the local cache, the file is divided into variable-sized chunks.
  4. Each data chunk may be compressed.  A compressed data chunk is then hashed into an atomic hash.
  5. Atomic hashes are combined to create composites.
  6. The atomic and composite hashes are compared to the entries in the client's hash cache to determine if they have been stored before.
  7. If there is no match, the hash cache is updated and the hash is sent to the Avamar server to check whether it is present there.
  8. If there is no match on the Avamar server, then the hash and the data are both sent to the Avamar server.
  9. This process continues until a single root hash for the backup is created.  The root hash is effectively a full point-in-time backup: through the series of hashes, it links to all the files and data that comprise the backup.
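
To make the flow above more concrete, here is a rough sketch of steps 3 through 9 in Python.  It is deliberately simplified and makes assumptions the real product does not: fixed-size chunks instead of Avamar's variable-sized chunking, SHA-1 as a stand-in hash, and an in-memory dictionary playing the role of the Avamar server; server_has, client_hash_cache, and backup_file are all illustrative names.

    import hashlib
    import zlib

    client_hash_cache = set()   # stand-in for the client's local hash cache
    server_store = {}           # hash -> compressed chunk; stand-in for the Avamar server

    def server_has(chunk_hash):
        """Pretend round-trip asking the server whether it already holds this hash (step 7)."""
        return chunk_hash in server_store

    def backup_file(data, chunk_size=64 * 1024):
        """Steps 3-9 in miniature: chunk, compress, hash, check the caches, and
        send only chunks that neither the client cache nor the server has seen."""
        atomic_hashes = []
        for offset in range(0, len(data), chunk_size):       # step 3: divide into chunks
            chunk = data[offset:offset + chunk_size]
            compressed = zlib.compress(chunk)                 # step 4: compress the chunk
            h = hashlib.sha1(compressed).hexdigest()          # step 4: hash it into an "atomic"
            atomic_hashes.append(h)
            if h in client_hash_cache:                        # step 6: seen by this client before?
                continue
            client_hash_cache.add(h)                          # step 7: update the local cache
            if not server_has(h):                             # step 7: ask the server
                server_store[h] = compressed                  # step 8: send hash and data
        # step 9 (simplified): a single root hash derived from all the atomic hashes
        return hashlib.sha1("".join(atomic_hashes).encode()).hexdigest()

    first = backup_file(b"mostly unchanging data " * 100000)
    second = backup_file(b"mostly unchanging data " * 100000)   # nothing new is sent this time
    print(first == second, len(server_store), "unique chunks stored")

The point of the sketch is the ordering: the cheap local cache check happens before the hash is ever sent to the server, and the data itself travels only when both checks miss, which is why so little traffic crosses the network after the first backup.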

Another topic that I have come across with Avamar, and that seems to be a hot topic, is the difference between backing up databases as opposed to file systems. There is more unique data inside databases than in file systems.  EMC estimates that the daily change rate is typically around 3%, versus 0.3% for file systems.   The first, or initial, backup usually sees a change rate of around 65% for databases as opposed to 35% for file systems.  EMC also estimates that 100GB/hour is typical for database backups.  It is a best practice to limit Avamar to databases 500GB in size or less, since backup windows are usually around 10 hours.  Keep in mind that if you are using replication, only backups that are complete are replicated; unfinished backups are replicated the next day.   Avamar will still find and eliminate redundant data, speeding up backups and reducing overall storage requirements by tenfold; the drag is waiting for the entire database to be read.  One helpful thing to consider: offloading backup processing to another machine may reduce CPU utilization on the database server being backed up.
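
To put those numbers together, here is a quick back-of-the-envelope calculation using only the figures quoted above (roughly 100GB/hour for database backups and a backup window of around 10 hours); the function and the sample sizes are just for illustration.

    def estimated_backup_hours(db_size_gb, throughput_gb_per_hr=100):
        """Rough read/transfer time at the quoted ~100GB/hour; ignores any overhead."""
        return db_size_gb / throughput_gb_per_hr

    for size_gb in (250, 500, 1500):
        hours = estimated_backup_hours(size_gb)
        verdict = "fits" if hours <= 10 else "exceeds"
        print(f"{size_gb}GB database: ~{hours:.1f} hours -> {verdict} a 10-hour window")

A 500GB database comes in around 5 hours by this rough estimate, which leaves headroom within a 10-hour window; much larger databases start eating into that window, which lines up with the 500GB-or-less guidance.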

One last interesting tidbit for today's blog is the subject of when to schedule your backups.  We discussed earlier that backups are paused while checkpoints and the HFS check occur, but backups cannot run during garbage collection at all; in fact, garbage collection will not start if backups are running.  Garbage collection is the process Avamar goes through to remove expired and unused chunks.  The daily maintenance schedule that includes the garbage collection process starts at 6am.  The morning maintenance flows as follows: garbage collection runs for 2 hours and then stops whether or not all expired data has been removed; it is followed by a checkpoint, an HFS check, and a second checkpoint.

To ensure backups and daily maintenance do not affect each other, I would suggest starting backups after 8pm for a duration of no more than 10 hours; be sure to configure backups to stop by 6am.  After the first initial backup is performed for each client, this should not be an issue because most backups will complete in the first 2 to 3 hours.  If replication is being used, overlapping it with backup activities is unlikely to affect the amount of time required to perform replications and will have only a slight impact on backup performance.  For the record, 18 simultaneous backups can be performed per storage node, unless a checkpoint or HFS check occurs.
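
As a small sanity check on that scheduling advice, here is a sketch that tests a proposed backup window against the 8pm start and the 6am garbage-collection start described above; the function name and the hard-coded hours are my own, taken only from the figures in this post.

    def backup_window_ok(start_hour, duration_hours):
        """Check a proposed overnight backup window against the guidance above:
        start after 8pm, run for at most 10 hours, and finish by 6am."""
        if start_hour < 20:
            return False, "starts before 8pm"
        if duration_hours > 10:
            return False, "runs longer than 10 hours"
        end_hour = (start_hour + duration_hours) % 24
        # anything ending after 6am but before the evening start overlaps maintenance
        if 6 < end_hour < 20:
            return False, "runs past the 6am garbage collection start"
        return True, "fits between the evening start and the 6am maintenance window"

    print(backup_window_ok(20, 10))   # 8pm + 10 hours ends right at 6am: just fits
    print(backup_window_ok(22, 9))    # 10pm + 9 hours ends at 7am: collides with maintenance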

Well, that is it for today; be sure to check back from time to time for more interesting tidbits about Avamar.


3 comments

  1. Thanks for the information. We are now using Avamar at a big client, but we've been running into problems with a very large HPUX server which houses several 400GB+ databases in completely different filesystems. We are finding that just to back up one of the databases, with the DB offline, the backup is taking over 9 hours to back up the 5 filesystems! Are you using RMAN to get the database backup down to 2-3 hours? Because the backups are taking so long, we are having to run Avamar processes pretty much all day, starting at 6pm, and these run into the garbage collection time. In addition, the avtar.bin process seems to be taking a very large chunk of server capacity when it runs; it's continuously the top process in CPU and memory usage, never mind the disk I/O. To minimize its footprint, I've set up separate Avamar cache files for each Avamar policy, but I'm not sure it's going to help until we let it run for a few days. Just curious if anyone else out there has seen Avamar perform in this way.


  2. Nice doc, simple and easy to understand. Being new to Avamar, this is good stuff.


  3. Cool stuff, very neatly explained.


