Sunday, December 22, 2013

An interesting backup solution: Google Drive and the BSDIFF algorithm in action

I recently wondered whether a simple file history tracking system built on cloud storage exists, and decided to code a working prototype of my own. Let's see how it turned out.


Requirements


A complete solution has to provide the following abilities:
  • Uploading a file to remote cloud storage
  • Taking a snapshot of the changes made to a file and uploading it to remote cloud storage
  • Retrieving any version of a file from cloud storage
It has to:
  • Be simple, fast and portable
  • Not bloat the cloud storage
  • Not store its data locally
  • Run and initialize on any machine for any user

Choosing the tools


  • Development environment: Microsoft Visual Studio 2012, Windows Presentation Foundation (WPF) and C#
  • Cloud storage: Google Drive, chosen as a popular and simple option. The Google Drive API client library is available on NuGet
  • Binary difference algorithm: BSDIFF, chosen as a widely used and well-established one

How it is supposed to work


Adding a file for tracking


First of all, we need to upload the file to cloud storage. To achieve this, the program performs the following steps:
  1. Check the authorization state and require signing in to a Google Account via OAuth 2.0 if that has not happened yet
  2. Create a separate directory in the cloud for the file (directories are identified by ID, so uploaded files with identical names are fine)
  3. Create and execute an API upload request, passing the stream of the chosen file
  4. Serialize and upload the file tracking information, so that it does not have to be stored locally
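Step 4 could look roughly like the sketch below. The post does not say which fields the tracking information contains or which format it is serialized to, so the TrackedFileInfo class, its properties and the choice of XML are assumptions made purely for illustration. The sketch only produces a stream; that stream would then be handed to an upload request.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical tracking record; the real project may store different fields.
public class TrackedFileInfo
{
    public string FileName { get; set; }
    public string SourceFileDriveId { get; set; }
    public List<string> PatchDriveIds { get; set; }

    public TrackedFileInfo()
    {
        PatchDriveIds = new List<string>();
    }
}

public static class TrackingInfoSerializer
{
    // Serializes the record to an in-memory XML stream, ready to be uploaded.
    public static MemoryStream ToStream(TrackedFileInfo info)
    {
        var serializer = new XmlSerializer(typeof(TrackedFileInfo));
        var stream = new MemoryStream();
        serializer.Serialize(stream, info);
        stream.Position = 0; // rewind so the consumer reads from the start
        return stream;
    }

    // Restores the record from a stream downloaded from the cloud.
    public static TrackedFileInfo FromStream(Stream stream)
    {
        var serializer = new XmlSerializer(typeof(TrackedFileInfo));
        return (TrackedFileInfo)serializer.Deserialize(stream);
    }
}
```

Because the record travels with the file's directory in the cloud, any machine that signs in to the same account can rebuild the list of tracked files without local state.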
Sample code for the upload request is given below:
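using Google.Apis.Drive.v2;
using Google.Apis.Drive.v2.Data;
using Google.Apis.Upload;

// TryGetAuthorizer is the application's own helper; it yields an authorized DriveService
DriveService driveService = await TryGetAuthorizer();
// Describe the file to create: its title and the ID of the folder it goes into
File fileToUpload = new File
{
    Title = fileTitle,
    Parents = new ParentReference[] { new ParentReference { Id = parentFolderGoogleDriveId } }
};
// Build an upload request from the metadata, the content stream and the MIME type
var insertUploadRequest = driveService.Files.Insert(fileToUpload, uploadFileStream, fileMimeType);
// Upload in chunks of twice the minimum allowed size
insertUploadRequest.ChunkSize = FilesResource.InsertMediaUpload.MinimumChunkSize * 2;
// Upload() runs synchronously; the returned progress object carries the final status
IUploadProgress uploadProgress = insertUploadRequest.Upload();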
Somewhat surprisingly, the Google Drive API treats directories as files with the special MIME type "application/vnd.google-apps.folder". A file is uploaded into a directory by putting the target directory's ID into the "parents" parameter.
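Creating such a directory might look like the sketch below. It assumes the same Drive API v2 client library (the Google.Apis.Drive.v2 NuGet package) that provides the File and ParentReference types used in the upload request; the DriveFolders class and its method names are made up for this example and are not taken from the project.

```csharp
using Google.Apis.Drive.v2;
using Google.Apis.Drive.v2.Data;

// Hypothetical helper, not part of the original project.
public static class DriveFolders
{
    // MIME type that marks a Drive "file" as a folder.
    public const string FolderMimeType = "application/vnd.google-apps.folder";

    // Builds the metadata only; a folder has no content stream.
    public static File BuildFolderMetadata(string title, string parentFolderId)
    {
        return new File
        {
            Title = title,
            MimeType = FolderMimeType,
            Parents = new[] { new ParentReference { Id = parentFolderId } }
        };
    }

    // Sends a metadata-only insert request and returns the ID of the new folder.
    public static string CreateFolder(DriveService driveService, string title, string parentFolderId)
    {
        File created = driveService.Files
            .Insert(BuildFolderMetadata(title, parentFolderId))
            .Execute();
        return created.Id;
    }
}
```

The returned ID is what later goes into the "parents" parameter of the file upload, which is why identical file names in different directories do not clash.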

Taking a snapshot of differences


Simply uploading files to the cloud is far from enough for file history tracking. The program must also be able to take a snapshot that uploads not a whole new version of the file, but only a descriptor of the differences between the last uploaded version and the current one. The steps for taking a snapshot are:
  1. Download the source file from the cloud
  2. Download all existing difference descriptors and apply them to the source file in order, from the oldest to the newest
  3. Create a new difference descriptor between the freshly patched file from the cloud and the current file on the user's PC, using the binary diff algorithm BSDIFF for .NET
  4. Upload the new descriptor (patch) to the cloud, into the corresponding file's directory
Keeping a chain of difference descriptors makes it possible to recover any version of a file without using much cloud storage space: a particular version is obtained by applying a sequence of patches to the source file.
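The reconstruction logic can be sketched independently of the cloud and of BSDIFF itself. In the sketch below the patcher is passed in as a delegate, which is where a BSDIFF port for .NET would plug in; the VersionRebuilder class is hypothetical, and AppendPatch is a deliberately trivial stand-in (it just appends bytes) that is not BSDIFF and exists only so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original project.
public static class VersionRebuilder
{
    // Returns the requested version of the file:
    // version 0 is the source file, version k is the source with the first k patches applied.
    // applyPatch takes (oldBytes, patchBytes) and returns the patched bytes.
    public static byte[] Rebuild(
        byte[] source,
        IList<byte[]> patches,
        int version,
        Func<byte[], byte[], byte[]> applyPatch)
    {
        if (version < 0 || version > patches.Count)
            throw new ArgumentOutOfRangeException("version");

        byte[] current = source;
        for (int i = 0; i < version; i++)
            current = applyPatch(current, patches[i]); // oldest patch first
        return current;
    }

    // Toy stand-in patcher, NOT BSDIFF: the "patch" is simply appended to the old bytes.
    public static byte[] AppendPatch(byte[] oldBytes, byte[] patch)
    {
        return oldBytes.Concat(patch).ToArray();
    }
}
```

Taking a snapshot uses the same routine: rebuild the latest version (version equal to the number of patches), diff it against the local file, and upload the resulting patch as the next element of the chain. The trade-off is that retrieving a late version requires downloading and applying every patch before it.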

Implementation


As mentioned before, the solution was developed in Microsoft Visual Studio 2012 using WPF and C#. Its source code is available in a GitHub repository (Windows Filse Time Machine for Google Drive). Let's have a look at some screenshots:

Google Drive after adding a file for tracking. The solution directory and the file-specific directories are created automatically. A file directory contains the target file and its additional info


Every action is performed with an authorization check. This window appears if authorization is currently missing


A file can be added simply by dragging it into the main application window


This is what the tracked files list interface looks like


Adding a snapshot for a file


Getting a certain version of a file
