PEA file format specifications version 1.5
PEA file extension
PEA (.pea file extension), an acronym for Pack, Encrypt, Authenticate, designates a file format focused on data security. It aims to provide archiving, compression, and multi-volume file split (spanning) in a single pass, along with flexible schemes of optional checksum / hash integrity check and authenticated file encryption (AES in EAX or HMAC mode, alternatively Twofish and Serpent in EAX mode). PEA file format specifications are released into the public domain.
PEA specifications document
PEA file format specifications and implementation notes (pdf)
PEA file format compression specs
PEA compression is optional. At the current level of implementation only the following levels are defined: PCOMPRESS0 (store only, no compression), and PCOMPRESS1..3 based on deflate (reference: zlib's compress/uncompress algorithm code), at compression level 3, 6 and 9 respectively.
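The level mapping above can be illustrated with a minimal Python sketch using the standard `zlib` module. The dictionary and the two helper functions are hypothetical names introduced here for illustration; the sketch only shows how the four schemes relate to deflate levels and does not reproduce the actual PEA on-disk framing.

```python
import zlib

# Hypothetical mapping of PEA compression scheme names to zlib levels,
# following the levels quoted in the text (PCOMPRESS0 = store only).
PCOMPRESS_LEVELS = {"PCOMPRESS0": 0, "PCOMPRESS1": 3, "PCOMPRESS2": 6, "PCOMPRESS3": 9}

def pea_like_compress(data: bytes, scheme: str) -> bytes:
    """Compress data with the deflate level associated with the scheme name."""
    level = PCOMPRESS_LEVELS[scheme]
    if level == 0:
        return data  # store only: bytes are passed through unchanged
    return zlib.compress(data, level)

def pea_like_decompress(blob: bytes, scheme: str) -> bytes:
    """Reverse pea_like_compress for the same scheme name."""
    if PCOMPRESS_LEVELS[scheme] == 0:
        return blob
    return zlib.decompress(blob)

sample = b"PEA: Pack, Encrypt, Authenticate. " * 200
for scheme in PCOMPRESS_LEVELS:
    packed = pea_like_compress(sample, scheme)
    assert pea_like_decompress(packed, scheme) == sample
    print(scheme, len(sample), "->", len(packed))
```

Higher levels trade speed for ratio; on highly repetitive input like the sample the difference between levels 3, 6 and 9 is small.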
PEA file format encryption and hashing specs
The PEA format security model operates at 3 levels: objects (input files and folders sent to the .pea archive), volumes (output archive files, which can be spanned to a user defined size) and streams (the actual output data stream, which is formed by multiple input files and can be written to multiple output volumes); the control at each of those levels can be omitted as needed by the user.
- Object level integrity checking is performed to detect errors with object level granularity on raw input data and all associated data (name, size, attributes, date-time);
- Current implementation allows: Checksum (Adler32, CRC32, CRC64), Hash (MD5, SHA1, RIPEMD-160, SHA256, SHA512, Whirlpool, SHA3-256, SHA3-512, BLAKE2S, BLAKE2B)
- Volume level integrity check is communication oriented and allows discarding single corrupted volumes in order to minimize, in case of error, the retransmission overhead;
- Current implementation allows the same Checksum and Hash algorithms featured by Object level check
- Stream level check offers a wide choice of algorithms, up to authenticated encryption, protecting privacy and authenticity of a group of objects sharing the same security needs, including tags generated by object level checks;
- Current implementation allows the same Checksum and Hash algorithms featured by Object and Volume levels, plus Authenticated encryption schemes: HMAC mode AES128; EAX mode AES128, AES256, Serpent128, Serpent256, Twofish128, Twofish256; triple cascade encryption combining AES, Twofish, and Serpent, each 256 bit in EAX mode.
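As a rough illustration of the object level check described above, the following Python sketch computes a few of the listed checksums and a hash covering both the object data and its associated metadata. The function names and the way metadata is serialized before hashing are assumptions made for this example; the actual PEA field layout is defined in the specifications document, not here.

```python
import hashlib
import zlib

def checksums(data: bytes) -> dict:
    """Two of the checksum algorithms named in the text, as hex strings."""
    return {
        "adler32": format(zlib.adler32(data), "08x"),
        "crc32": format(zlib.crc32(data), "08x"),
    }

def object_digest(name: str, data: bytes, mtime: int, attributes: int,
                  algo: str = "sha256") -> str:
    """Hash an object's data together with its metadata.

    The serialization below (UTF-8 name, little-endian integers) is an
    assumption for illustration only, not the PEA layout.
    """
    h = hashlib.new(algo)
    h.update(name.encode("utf-8"))
    h.update(len(data).to_bytes(8, "little"))   # object size
    h.update(attributes.to_bytes(4, "little"))  # object attributes
    h.update(mtime.to_bytes(8, "little"))       # last modification time
    h.update(data)
    return h.hexdigest()

payload = b"example object content"
print(checksums(payload))
for algo in ("sha256", "sha3_256", "blake2b"):
    print(algo, object_digest("docs/readme.txt", payload, 1700000000, 32, algo))
```

Because the metadata is fed into the hash, a renamed or re-dated object produces a different tag even when its content is unchanged.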
PEA file format volume spanning specs
Arbitrarily sized volume spanning allows the archive to be split into volumes of arbitrary size, with the only constraint that volumes must be at least 10 bytes bigger than the volume control tag, in order to pass (through the archive's header) the minimum information needed by the extraction application.
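The spanning idea, together with the volume level check, can be sketched in Python as follows. This is a simplified model under stated assumptions: each volume is a payload followed by a 4 byte CRC32 control tag, and the names `TAG_SIZE`, `split_volumes`, `corrupted_volumes` and `join_volumes` are hypothetical. The real PEA volume layout and header are not reproduced.

```python
import zlib

TAG_SIZE = 4  # CRC32 used as volume control tag in this sketch (assumption)

def split_volumes(stream: bytes, volume_size: int) -> list:
    """Split a stream into volumes, each ending with a CRC32 tag of its payload."""
    if volume_size < TAG_SIZE + 10:
        raise ValueError("volume must be at least 10 bytes bigger than its control tag")
    payload_size = volume_size - TAG_SIZE
    volumes = []
    for offset in range(0, len(stream), payload_size):
        payload = stream[offset:offset + payload_size]
        volumes.append(payload + zlib.crc32(payload).to_bytes(TAG_SIZE, "big"))
    return volumes

def corrupted_volumes(volumes: list) -> list:
    """Return the indexes of volumes whose tag does not match their payload."""
    bad = []
    for index, volume in enumerate(volumes):
        payload, tag = volume[:-TAG_SIZE], volume[-TAG_SIZE:]
        if zlib.crc32(payload).to_bytes(TAG_SIZE, "big") != tag:
            bad.append(index)
    return bad

def join_volumes(volumes: list) -> bytes:
    """Rebuild the original stream by dropping each volume's tag."""
    return b"".join(volume[:-TAG_SIZE] for volume in volumes)

stream = bytes(range(256)) * 4
volumes = split_volumes(stream, volume_size=100)
assert join_volumes(volumes) == stream
print(len(volumes), "volumes, corrupted:", corrupted_volumes(volumes))
```

Since each volume carries its own tag, a receiver can identify the damaged volumes and request only those again instead of the whole archive.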
PEA specs revisions
The PEA file format standard, as defined in the version 1 revision 5 specification, can store a single stream containing unlimited objects, each up to 2^64 bytes in size; the current Pea executable supports the 1.5 file format specifications (in practice, archives are memory and filesystem limited rather than format limited) and is backward compatible with previous revisions of the format.
PEA 2.0 file format specifications extend the concepts behind the PEA 1.x file format and can store an unlimited number of streams, but the format is not currently supported by the Pea archiving utility.
PEA format specifications table: max file size, compression, security...
Here is a brief table of features and limitations applying to the file format and to the current implementation:
Feature | PEA file format | Current utility implementation
Archive
Maximum archive size | PEA archive maximum size is unlimited: no upper limit is set by the format design for maximum archive size, only filesystem size limitations apply | Maximum PEA archive size is limited to 16 YB (yottabyte), up to 999999 volumes of 2^64-1 bytes each. Please note that, under current understanding, with 128 bit block encryption it is advisable not to encrypt more than 2^64 bytes with the same key, preferably staying one or more orders of magnitude below that.
Stream number | 1.3: single stream; 2.0: unlimited number of streams | Single stream (1.3 file format)
Output
Security | Optional authenticated encryption, at stream level only. HMAC mode: AES128; EAX mode: AES 128 or 256 bit, Serpent 128 / 256, Twofish 128 / 256; triple cascade encryption: AES+Twofish+Serpent, Twofish+Serpent+AES, Serpent+AES+Twofish, each 256 bit in EAX mode
Integrity check | AE tag (see security section) or hash or checksum at stream level, plus hash or checksum for input objects and for output volumes. Currently supported: Adler32, CRC32, CRC64 checksum algorithms; MD5, SHA1, RIPEMD-160, SHA-2 and SHA-3 families, and Whirlpool hash algorithms.
Error correction | No scheme featured at current level of development
Communication recovery | Independent volume control check allows identifying corrupted volumes (the first volume may be needed to know the volume check algorithm) | No specific tool developed; volume check is done during extraction, so that only corrupted volumes need to be downloaded again
Data recovery | Stream control tags allow recognizing correct streams; if finer granularity is needed, object control tags allow recognizing correct objects; input object names and POD trigger allow identifying objects and streams within the archive data | No specific tool developed to attempt error resistant data extraction; however, object check errors are reported to identify corrupted and non corrupted data if the extraction is successful
Support for multi volume output | Native, requires a single pass. Raw file spanning compatible with Unix split command, and applications like HJSplit and 7-Zip.
Volume number | 1..unlimited | 1..999999 (6 digit counter string in output file name, after .pea file extension)
Volume size | Volume tag size +1..unlimited; the first volume must contain at least 10 bytes of data to allow parsing of the archive header, so that the unpacking application can calculate the volume tag size | Volume tag size +1..2^64-1 (qword variable); the first volume must contain at least 10 bytes of data
Compression | Native, requires a single pass; schemes: PCOMPRESS0: no compression; PCOMPRESS1..3: based on deflate using zlib's compress/uncompress, level 3, 6 and 9 respectively
Solid archive | Not implemented: no compression mode currently offers the possibility of creating solid archives
Input
Input types | 1.3: files and dirs; 2.0: files, dirs, metadata stored as message triggers | Files and dirs (1.3)
Maximum number of files / objects in a PEA archive | 1..unlimited; theoretically a PEA archive can accept an unlimited number of input files | Limited by host system memory (input object list is stored in a dynamic array of strings)
Maximum size of input file for PEA archive | 0..2^64-1 bytes, 16 EB maximum size for each input file | 0..2^64-1 bytes, 16 EB maximum size, likely limited by the underlying filesystem technology
Input object qualified name size (size 0 means that the archive object is a trigger, with no input object mapped to the archive object) | 1..2^16-1, 64 KB of characters under any encoding | 1..32K (exceeding needs; longer values are considered errors)
Metadata | Object attributes and last modification time, optionally comments and any kind of meta content using messages | Saves object attributes and object last modification time. Restores only object attributes (on Microsoft Windows), nothing on *x
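The implementation limits quoted in the table for volume count and volume size can be checked with a few lines of Python arithmetic. The constant and function names are hypothetical; the figures come from the table, and the "16 YB" value quoted there is an approximation sitting between the decimal and binary results computed below.

```python
MAX_VOLUMES = 999_999          # 6 digit volume counter
MAX_VOLUME_BYTES = 2**64 - 1   # qword volume size

max_archive_bytes = MAX_VOLUMES * MAX_VOLUME_BYTES

def volume_counter(n: int) -> str:
    """Format a volume number as the 6 digit counter string used in file names."""
    if not 1 <= n <= MAX_VOLUMES:
        raise ValueError("volume number out of range for a 6 digit counter")
    return f"{n:06d}"

print("max archive size in bytes:", max_archive_bytes)
print("in units of 10**24 bytes :", round(max_archive_bytes / 10**24, 2))
print("in units of 2**80 bytes  :", round(max_archive_bytes / 2**80, 2))
print("first and last counter   :", volume_counter(1), volume_counter(MAX_VOLUMES))
```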
Triple cascaded encryption: AES, Twofish, Serpent each 256 bit in EAX mode
PEA supports multiple chained encryption, cascading AES, Twofish, and Serpent, each 256 bit in EAX mode
- Each cipher is separately keyed through PBKDF2 or scrypt (default)
- KDF options:
- With PBKDF2, the key schedule of each cipher is based on a different hash primitive, run for a different number of iterations: Whirlpool x 25000 for AES, SHA512 x 50000 for Twofish, SHA3-512 x 75000 for Serpent (Whirlpool is significantly slower than SHA512, which in turn is slower than SHA3-512). PEA format revision 1.4 introduced a variable, user defined number of KDF rounds for the triple cascaded encryption, up to 25 million rounds for each of the 3 algorithms. Please note that rounds are based on 512 bit hash primitives, which are more resource intensive than their 256 bit counterparts.
- With scrypt, the key schedule workload impacts not only the CPU but also memory, in order to increase resilience to dictionary attacks. Requiring from 64 MB up to 1 GB of RAM (depending on the KDF workload option) for each instance severely raises the requirements for building a hardware setup to brute force the password, making it difficult to implement such a machine with ASIC or FPGA.
- The key schedule of each cipher is provided with a separate 96 byte pseudorandom salt
- The password is modified when provided as input to the key schedule of each cipher; the modifications are trivial XORs with non secret values and counters, with the sole purpose of initializing each key derivation with different values and being a further factor (alongside different salt, and different hash / iteration number) to guarantee that the keys are statistically independent
- The password verification tag is the XOR of the 3 password verification tags of the encryption functions, and is written / verified only after all 3 key initialization functions are completed
- Each block between the password verification tag and the stream authentication tag is encrypted with all 3 ciphers
- A 1..128 byte block of random data is added after the password verification tag in order to mask the exact archive size (this is the first block to be encrypted / decrypted)
- Each cipher generates its own 128 bit stream authentication tag; the tags are concatenated and hashed with SHA3-384, and the SHA3-384 value is checked for verification. This requires all 3 tags to match the expected values and does not allow ciphers to be authenticated separately
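The keying and tag combination steps listed above can be sketched with Python's standard `hashlib`. This is only an illustration under loud assumptions: the scrypt parameters (`n`, `r`, `p`), the exact password tweak, and all function names are hypothetical and are not taken from the PEA specification. The ciphers themselves are not implemented, so the per-cipher tags are random placeholders.

```python
import hashlib
import os

CIPHERS = ("AES", "Twofish", "Serpent")

def tweak_password(password: bytes, counter: int) -> bytes:
    """XOR the password with a non secret counter byte (hypothetical tweak)."""
    return bytes(b ^ counter for b in password)

def derive_keys(password: bytes, salts: dict, n: int = 2**14) -> dict:
    """Derive one independent 256 bit key per cipher with scrypt."""
    keys = {}
    for counter, name in enumerate(CIPHERS, start=1):
        keys[name] = hashlib.scrypt(
            tweak_password(password, counter),
            salt=salts[name],       # separate 96 byte salt per cipher
            n=n, r=8, p=1,          # assumed workload, not PEA's actual values
            dklen=32,
        )
    return keys

def combined_verification_tag(tags: list) -> bytes:
    """XOR equally sized per-cipher password verification tags together."""
    result = bytes(len(tags[0]))
    for tag in tags:
        result = bytes(x ^ y for x, y in zip(result, tag))
    return result

def combined_auth_tag(tags: list) -> bytes:
    """Concatenate the per-cipher 128 bit tags and hash them with SHA3-384."""
    return hashlib.sha3_384(b"".join(tags)).digest()

salts = {name: os.urandom(96) for name in CIPHERS}
keys = derive_keys(b"correct horse battery staple", salts)
placeholder_tags = [os.urandom(16) for _ in CIPHERS]  # stand-ins for real cipher tags
print({name: key.hex()[:16] for name, key in keys.items()})
print(combined_auth_tag(placeholder_tags).hex())
```

Checking a single combined value means a mismatch in any one of the three tags fails the whole verification, which matches the behavior described in the last list item.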
Multiple encryption, if correctly implemented, is meant under current understanding to:
- Provide a larger keyspace than each single cipher, although the effective key length is smaller than the sum of the key lengths due to the possibility of meet-in-the-middle type attacks. Such a large keyspace may be overkill even in the event of significant quantum computing advancements: Grover's quantum algorithm, which is asymptotically optimal for unstructured search problems such as brute force key search, provides a quadratic speed-up over classical computing. Under those assumptions, as a rule of thumb, a quantum computer will not be able to brute force a 256 bit keyspace faster than a classical machine can brute force a 128 bit keyspace, which is currently considered safe by a wide margin.
- Provide a security margin even if all but one of the algorithms used as cipher (or key schedule hash) are compromised by a breakthrough in cryptanalysis, which seems unlikely given the amount of theoretical work and real life testing behind the mainstream primitives available today.
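The rule of thumb about quantum brute force can be made concrete with a short calculation. The sketch assumes the textbook estimate that Grover's algorithm needs about (pi/4) * sqrt(N) iterations to search N candidates, and ignores the cost of each quantum iteration; the function name is hypothetical.

```python
import math

def grover_log2_iterations(key_bits: int) -> float:
    """log2 of the approximate Grover iteration count, (pi/4) * sqrt(2**key_bits)."""
    return key_bits / 2 + math.log2(math.pi / 4)

for bits in (128, 256):
    print(f"{bits} bit key: classical about 2^{bits} trials, "
          f"Grover about 2^{grover_log2_iterations(bits):.2f} iterations")
```

For a 256 bit key the quantum effort stays on the order of 2^128 iterations, comparable to a classical search over a 128 bit keyspace.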
Drawbacks of multiple encryption are:
- The inherent added complexity makes multiple encryption more prone to implementation errors
- Running multiple algorithms requires more computing power and consequently reduces performance.
The performance penalty of cascaded encryption may be decisive for some classes of applications, but for file archiving as in PeaZip, where many other operations are involved (some potentially far slower, such as reading from / writing to disk), the performance hit is quite reasonable:
- Test machine: notebook with Intel Core i7-8565U CPU, 4 physical cores with hyper-threading (8 logical cores), 8 GB RAM, 512 GB PCIe NVMe SSD, NTFS filesystem
- Benchmark: creation of a PEA archive from 100 MB input:
- 7 seconds archive creation, 3 seconds archive extraction with AES256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks
- 8 seconds archive creation, 4 seconds archive extraction with Serpent 256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks (slower than AES and Twofish)
- 10 seconds archive creation, 6 seconds archive extraction with AES+Twofish+Serpent 256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks; the purposely slower key schedule, employed at startup for multiple encryption modes, also accounts for part of the extra time
For a more complete explanation and discussion of the PEA format specifications please see the documentation about Pea archive format design (.pdf).
Use cases for PEA archive format
When it is recommended to use PEA format: it is a good choice for reasonably fast archiving and backup when it is necessary to guarantee confidentiality (data cannot be accessed without the password), integrity, and authenticity: data can only be modified by someone who knows the password, as it is subject to password-dependent, cryptographically strong verification.
.PEA
Author: Giorgio Tani, 2006
no maximum number of input files
no maximum archive size
2^64 bytes max size for each input file
SPEED
PEA format features average speed, due to its lightweight, quick Deflate-based compression algorithm, and efficient encryption and hashing algorithms.
COMPRESSION RATIO
PEA format features moderate compression, due to fast Deflate-based compression, comparable with the compression ratios of GZ and classic ZIP formats, making it suitable for archiving or backing up large quantities of data in reasonable time.
ADVANCED OPTIONS
PEA format lacks some features of competing formats, but offers advanced security focused characteristics, such as AES-based authenticated encryption (which can optionally be replaced by Serpent or Twofish EAX mode authenticated encryption), and triple cascade encryption.