Difference between revisions of "Algorithm"

From Rsyncrypto
(Move page from old installation)
 
m (make sure the equations are displayed)
 
Line 22: Line 22:
  
 
= The Decision Function =
 
= The Decision Function =
In the current version, rsyncrypto uses the same decision function used by gzip. This triggers if the sum of the past 8196 bytes divides by 4096 with zero remainder. In other words, if <math>P_i</math> denotes the plain text at position i and n the current position, then the decision function will trigger if <math>\left( \sum_{i=n-8196}^n P_i \right) \equiv 0 \pmod {4096}</math>
+
In the current version, rsyncrypto uses the same decision function used by gzip. This triggers if the sum of the past 8196 bytes divides by 4096 with zero remainder. In other words, if <math>P_i</math> denotes the plain text at position i and n the current position, then the decision function will trigger if <math>\left( \sum_{i=n-8196}^n P_i \right) \equiv 0 \pmod {4096}</math>

Latest revision as of 05:49, 23 April 2019

Rsyncrypto Algorithm

Rsyncrypto was designed as part of the backup service offered by Lingnu Open Source Consulting. The challenges such a service poses require an answer to two basic needs:

  • Transfer incremental data over poor upload bandwidths
  • Encrypt the data in a way that does not allow the server to decrypt it

Rsyncrypto solves this problem by performing an encryption that maintains a "bounded change" property - re encrypting the same file after a minor change in the plain text should produce a minor change in the cypher text.

Basics

In an attempt to reinvent as little as possible, rsyncrypto is based on industry standards wherever they apply to its unique requirements. In particular, broadly speaking, rsyncrypto uses the standard method of public key encryption, where each file gets its own, randomly generated, symmetric key (called "session key"), and that key is encrypted using a public key into the encrypted file. In addition to that, the algorithms used are as standard as possible. Rsyncrypto uses RSA for the public key encryption and AES for the symmetric key encryption.

Further more, the basic schema of encryption is largely based on CBC mode encryption.

What isn't the same

Straight out CBC does not provide the bounded change property. When re encrypting a slightly changed plain file, the CBC-encrypted file will be different from the point of the change until the end of the file. This is where rsyncrypto deviates from the industry standard.

In particular, rsyncrypto uses a decision function to look at a bounded window of history of plain text. Based on the history, the decision function will "trigger". It is important to understand that, because the decision function looks only at the plain text as input, even if the file changes in length, once the file before and after the change turn the same, sooner or later the decision function will start triggering at the same points along the data. This will happen even without having the old file, encrypted or otherwise, to compare against. This has the very desirable property that rsyncrypto needs very little state about each file in order to repeatedly encrypt it in an rsyncable way. Current version of rsyncrypto needs just 68 bytes of state (assuming 256 bit AES key) in order to encrypt the file again. The old version of the file itself need not be kept.

Plain CBC uses the previous cypher block as a basis for encryption of the current block. This is the mechanism by which a change to the plain text propagates until the end of the file. Since the first encryption block does not have a "previous block", an initial vector (IV) is used as a pseudo previous block for the encryption.

The entire rsyncrypto can be summarized with one sentence. Rsyncrypto uses the standard CBC encryption mode, except every time the decision function triggers, it uses the original IV for the file instead of the previous cypher block.

The Decision Function

In the current version, rsyncrypto uses the same decision function used by gzip. This triggers if the sum of the past 8196 bytes divides by 4096 with zero remainder. In other words, if denotes the plain text at position i and n the current position, then the decision function will trigger if