Considerations when using the Site Snapshot Tool
Establish an appropriate snapshot frequency
The SST lets you set the frequency at which snapshots of your origin site are taken. Establish a reasonable frequency based on the complexity of the targeted site and on how often that site actually changes (that is, whether a new snapshot is actually merited). For example, a site that is snapshotted very frequently (for example, daily, at a peak usage time) but is not updated that often may leave an excessive number of files in the NetStorage Storage Group targeted by the snapshots, and may exceed the size limit of the Storage Group. In addition to an appropriate frequency, the SST offers further features that can streamline your snapshots.
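To make the frequency trade-off concrete, the following minimal Python sketch estimates how long a Storage Group would last at a given snapshot frequency. It is only a planning aid under a worst-case assumption that every snapshot adds its full size to the Storage Group; the function name and all figures are hypothetical and are not part of the SST.

```python
def weeks_until_full(limit_gb: float, used_gb: float,
                     snapshot_gb: float, snapshots_per_week: float) -> float:
    """Estimate the weeks until the Storage Group limit is reached.

    Worst-case assumption: each snapshot adds its full size and nothing
    is ever removed. All inputs are hypothetical planning figures.
    """
    added_per_week = snapshot_gb * snapshots_per_week
    if added_per_week <= 0:
        raise ValueError("snapshot size and frequency must be positive")
    remaining_gb = max(limit_gb - used_gb, 0.0)
    return remaining_gb / added_per_week


# Hypothetical figures: 100 GB limit, 20 GB already used, 2 GB per snapshot.
daily = weeks_until_full(100, 20, 2, snapshots_per_week=7)
weekly = weeks_until_full(100, 20, 2, snapshots_per_week=1)
print(f"daily snapshots: about {daily:.1f} weeks of headroom")
print(f"weekly snapshots: about {weekly:.1f} weeks of headroom")
```

With these made-up numbers, moving from daily to weekly snapshots extends the headroom sevenfold, which is the kind of comparison worth making before choosing a frequency.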
Recursion limits, input files, and cookies
The SST downloads URLs and the various files related to them. In a snapshot, recursion refers to following actual links, not traversing a directory tree. The SST can download the HTML and FTP symbolic links it finds within a URL, as well as the links it finds in those URLs; it can also download the images and style sheets referenced in the URLs.
- There is a recursion level limit in NetStorage: The recursion maximum for a storage group is ten levels. Recursion limits apply to directory depth and following links.
- Use caution when enabling recursion: Depending on the complexity of the target URL, a considerable amount of content may be downloaded with this option enabled. Ensure that this will not require more space than you currently have available.
- Default recursion depth: If omitted, the default recursion depth is set to 10.
- Maximum recursion depth: 10
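The effect of a recursion depth on link following can be illustrated with a small Python sketch. It is not the SST's actual algorithm; it walks a hypothetical in-memory link graph breadth-first, and it assumes that depth 0 means "the starting URL only". The function name, the graph, and that depth convention are all assumptions made for illustration.

```python
from collections import deque

MAX_DEPTH = 10  # maximum (and default) recursion depth stated above


def reachable(links: dict[str, list[str]], start: str,
              depth: int = MAX_DEPTH) -> list[str]:
    """Return the URLs reached from `start` by following at most `depth` links."""
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError(f"depth must be between 0 and {MAX_DEPTH}")
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, level = queue.popleft()
        order.append(url)
        if level == depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:  # never fetch the same URL twice
                seen.add(target)
                queue.append((target, level + 1))
    return order


# Hypothetical link graph: each page maps to the links found on it.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/deep"],
    "/b": ["/"],
}
print(reachable(site, "/", depth=1))  # ['/', '/a', '/b']
print(reachable(site, "/", depth=2))  # ['/', '/a', '/b', '/a/1']
```

Even in this tiny graph each extra level pulls in more pages, which is why a deep recursion on a complex site can download far more content than expected.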
Know your site complexity
It is unreasonable to assume that a single configured snapshot can appropriately traverse a complex commerce property or web site. With the above points in mind, when incorporating recursion, you should:
- Plan Multiple, Complementary Snapshots: Establish separate snapshots for individual complex pages within your site. This will allow more granularity of control, and promote proper recursion in a snapshot.
- Use Multiple NetStorage Sub-directories: Generate multiple sub-directories within your target Storage Group, and send each snapshot to a different directory. This avoids an over-abundance of files in a single directory and makes the content easier to manage.
You can ignore robots
SST can ignore robots: origin files meant to prevent spiders from downloading objects you do not want them to download. Unless you tell SST to ignore it, SST obeys the robots file and may therefore skip files that you do want downloaded.
Using the sst command interface, “more than one configuration” simply means issuing multiple commands via the interface.
In the SST interface in Control Center, however, you can set up a number of different configurations at the same frequency. For example, you could set up three configurations that all download weekly on Sundays at 2:00 a.m.
To illustrate using pseudo-code, you might create three different commands or configurations which, taken together, download your entire site.
-- get the host www.example.com and its page requisites and links
-- get the menu objects specified in menus.txt
-- get the art and image objects needed for dynamically created pages
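The same three-part plan can be written down as plain data before it is entered in either interface. The Python sketch below is only a planning aid: the class, its field names, and the sub-directory names are hypothetical and do not correspond to actual SST options. It records one entry per configuration and checks that no two snapshots share a target sub-directory.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotPlan:
    """One planned configuration. All field names are hypothetical."""
    description: str
    target_dir: str  # sub-directory within the target Storage Group
    schedule: str


SCHEDULE = "weekly, Sundays at 2:00 a.m."

PLANS = [
    SnapshotPlan("host www.example.com and its page requisites and links",
                 "snapshots/site", SCHEDULE),
    SnapshotPlan("menu objects specified in menus.txt",
                 "snapshots/menus", SCHEDULE),
    SnapshotPlan("art and image objects for dynamically created pages",
                 "snapshots/images", SCHEDULE),
]


def shared_directories(plans: list[SnapshotPlan]) -> list[str]:
    """Return the target directories used by more than one plan."""
    counts = Counter(plan.target_dir for plan in plans)
    return sorted(d for d, n in counts.items() if n > 1)


print(shared_directories(PLANS))  # [] means every snapshot has its own directory
```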
How long will the first snapshot take?
Your initial snapshot takes longer than subsequent snapshots.
Downloading an entire site the first time can require a significant amount of time: as much as a day, or possibly more. Subsequent downloads can take much less time if you download only those files that have been modified since the previous download.
Therefore, it is strongly recommended that you take this time requirement into consideration for your first snapshot download, and plan accordingly.
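Why later snapshots are faster can be sketched in a few lines of Python. This is not how the SST itself decides what to fetch; it is an illustrative filter, with hypothetical file names and timestamps, that keeps only files modified after the previous snapshot.

```python
from datetime import datetime


def changed_since(last_modified: dict[str, datetime],
                  previous_snapshot: datetime) -> list[str]:
    """Return the paths modified after the previous snapshot, sorted by name."""
    return sorted(path for path, stamp in last_modified.items()
                  if stamp > previous_snapshot)


# Hypothetical modification times on the origin.
origin_files = {
    "index.html": datetime(2024, 3, 10, 9, 0),
    "css/site.css": datetime(2024, 1, 5, 12, 0),
    "img/banner.png": datetime(2024, 3, 9, 18, 30),
}
previous = datetime(2024, 3, 3, 2, 0)  # hypothetical time of the last snapshot

print(changed_since(origin_files, previous))  # ['img/banner.png', 'index.html']
```

In this example only two of the three files would be downloaded again; on a large site where few files change between snapshots, the saving is proportionally greater.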