Archiving Policy for User Groups

 

The Guillimin Data Storage and Archival (DataSTAR) resource consists of multiple high-performance parallel file systems interconnected with a large-capacity, high-throughput tape library infrastructure.  Access to the tape library system is granted on a case-by-case basis to designated research groups, either through a RAC allocation or through the purchase of storage capacity within the tape library system.

This document describes the data archival policy and explains how to interact with the tape library system.  To have data archived in our tape library, users should follow the instructions below and be aware of the limitations they describe.

 

1. Data archiving location:

We have created archiving spaces where specific project or research groups can upload their data. There are three types of file spaces for each project group (see section 2 for policies affecting each space):

     a. /gs/archive/<RapID>/working_area:

The working_area directory is a temporary working space where users can upload data that is incomplete or not yet ready for archiving. Data in this directory will not be archived.

     b. /gs/archive/<RapID>/1copy:

The 1copy directory is for completed data that is ready to be archived. Users may move data here from the working_area directory or directly from their project space. The archiving process writes the data to tape and then deletes it from the 1copy directory on disk once archiving has completed.

     c. /gs/archive/<RapID>/2copy:

The 2copy directory is for completed data that should be archived twice, on two different tapes, for redundancy. Users may move data here from the working_area directory or directly from their project space. The archiving process writes the data to tape and then deletes it from the 2copy directory on disk once archiving has completed.

     d.  Users may also request that a second copy be kept off-site.  Please contact support by email if such external copies of the data archive are required.
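
As an illustration of the layout above, the following is a minimal Python sketch of how a finished file might be moved from working_area into 1copy or 2copy. The function names, the `root` parameter, and the RapID used in the usage note are hypothetical and only serve the example; the only details taken from this policy are the /gs/archive/<RapID> path and the three directory names.

```python
import shutil
from pathlib import Path

# Taken from the policy text: archive spaces live under /gs/archive/<RapID>.
ARCHIVE_ROOT = Path("/gs/archive")


def archive_dirs(rap_id: str, root: Path = ARCHIVE_ROOT) -> dict[str, Path]:
    """Return the three per-project directories for a given RapID."""
    base = root / rap_id
    return {name: base / name for name in ("working_area", "1copy", "2copy")}


def submit_for_archiving(filename: str, rap_id: str, copies: int = 1,
                         root: Path = ARCHIVE_ROOT) -> Path:
    """Move a finished file from working_area into 1copy or 2copy.

    Hypothetical helper: it only moves the file; the archiving itself is
    done later by the site's own process.
    """
    if copies not in (1, 2):
        raise ValueError("copies must be 1 or 2")
    dirs = archive_dirs(rap_id, root)
    source = dirs["working_area"] / filename
    target = dirs[f"{copies}copy"] / filename
    # shutil.move renames within one file system and copies otherwise.
    return Path(shutil.move(str(source), str(target)))
```

For example, `submit_for_archiving("run1.tar.gz", "abc-123-aa", copies=2)` would place the file in the 2copy directory of the (made-up) RapID abc-123-aa.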

 

2. Archiving Policy:

One or two copies of the data will be stored on tape, depending on whether the data was placed in the 1copy or the 2copy directory (see section 1).  A few restrictions apply when working in the data-archiving space:

  1. The project archiving location is hosted on the /gs file system under /gs/archive/<RapID>, with 5 TB of space for each project group. This space is only for working on data that is to be archived or restored. It is not intended as additional disk storage.

  2. The data under /gs/archive/<RapID>/working_area will not be archived, but this space can be used to prepare data for archiving, for example by organizing it into suitably sized .tar.gz archives.  Once the data preparation is complete, the data should be moved to either the 1copy or the 2copy directory, where it will be archived to tape.

  3. When preparing data for archiving, files must be organized into large .tar.gz archives of at least 1 GB each.  Files smaller than 1 GB will not be archived.  This minimum size is required for efficient movement of data to and from the tape library system.

  4. Our archiving task begins at 10 AM every day. It archives all content in the 1copy and 2copy directories. A maximum of 5 TB can be archived per day.

  5. Data in the 1copy directory is archived to a single tape. In this mode of operation there is no redundancy, so data may be lost if the tape becomes damaged. On average, the failure rate of tapes in the library is about 0.1% per year.

  6. Data in the 2copy directory is archived on two different tapes. This offers some redundancy at the cost of requiring twice as much storage space within the tape library.

  7. Periodically, a list of archived data will be sent to either the Principal Investigator or the designated contact person responsible for the research group's archive usage.
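
The preparation step described in items 2 and 3 can be sketched in Python with the standard tarfile module. This is only one possible approach, not a site-provided tool: the function names are hypothetical, and the reading of "1 GB" as 10**9 bytes is an assumption, since the policy does not say whether decimal or binary units are meant.

```python
import tarfile
from pathlib import Path

# Assumption: "1 GB" is read as 10**9 bytes; the policy does not specify
# decimal versus binary units.
MIN_ARCHIVE_BYTES = 10**9


def pack_directory(source_dir: Path, archive_path: Path) -> int:
    """Bundle source_dir into a gzip-compressed tar file; return its size in bytes."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(source_dir, arcname=source_dir.name)
    return archive_path.stat().st_size


def is_large_enough(archive_path: Path, minimum: int = MIN_ARCHIVE_BYTES) -> bool:
    """Check an archive against the minimum size accepted for tape."""
    return archive_path.stat().st_size >= minimum
```

A typical use would be to pack a results directory inside working_area, call `is_large_enough` on the resulting .tar.gz, and combine several small directories into one archive if the check fails, before moving the archive into 1copy or 2copy.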

 

3. Restoration Policy:

Users can request data restoration by submitting a support ticket by email. The request should include a list of the data files to be restored.  We will then retrieve the archived data files and restore them, by default to the directory /gs/archive/<RapID>/working_area/restore/, or upon request directly to the group's project space. Note that the 5 TB quota on the archive space also applies to restored data. The quota can be temporarily adjusted upon request.
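
Because restored data counts against the quota, a large restoration may need to be split across several requests. The following Python sketch groups files into batches that each fit within the quota. It rests on several assumptions: the function name is hypothetical, file sizes are taken to be known (for example from the periodic list of archived data), the archive space is assumed to be otherwise empty, and 5 TB is read as 5 * 10**12 bytes.

```python
# Assumption: the 5 TB quota is read as 5 * 10**12 bytes.
QUOTA_BYTES = 5 * 10**12


def plan_restore_batches(sizes: dict[str, int],
                         quota: int = QUOTA_BYTES) -> list[list[str]]:
    """Group file names into consecutive batches whose total size fits the quota.

    sizes maps each archived file name to its size in bytes.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, size in sizes.items():
        if size > quota:
            # Such a file would need a temporary quota increase.
            raise ValueError(f"{name} alone exceeds the quota")
        if current and used + size > quota:
            # Close the current batch and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

Each batch could then be listed in its own restoration request, with the previous batch cleared from the restore directory before the next one is submitted.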