Introduction
- The ZipArchive Library allows creating seekable compressed data. Such data is organized
in blocks, some of which are synchronization blocks. Decompression can start from any
synchronization block.
- Only the deflate compression method supports creating seekable data. The CZipArchive
class uses the deflate method by default (see CZipArchive::SetCompressionMethod()).
- The seekable compressed data cannot be encrypted, because encryption is applied
after data is compressed.
- The seekable compressed data can be created in segmented archives.
- The disadvantage of seekable compressed data is a lower compression ratio, because
each synchronization block starts compression with an empty dictionary (otherwise
decompression could not start there). You can, however, adjust how often synchronization
blocks are created to balance compression ratio against seeking granularity (see
Creating Seekable Data below).
Enabling Seeking Feature in the ZipArchive Library
To use the seeking feature, make sure that _ZIP_SEEK is defined in the _features.h
file. It is disabled by default. If you modify this definition, rebuild the ZipArchive
Library and your application.
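Because the feature is switched on at compile time, an application may want to detect
early that it was built without seeking support. The following is a minimal sketch,
assuming that the _ZIP_SEEK definition is visible to the translation unit (normally
through the library headers, which are left out here so that the fragment stands on
its own). The names kSeekingEnabled and RequireSeeking are hypothetical and are not
part of the library.

```cpp
#include <stdexcept>

// In a real build, the library headers (which pull in _features.h) would be
// included before this point. They are omitted in this sketch, so _ZIP_SEEK
// is only seen here if it is passed on the compiler command line.
#if defined(_ZIP_SEEK)
constexpr bool kSeekingEnabled = true;
#else
constexpr bool kSeekingEnabled = false;
#endif

// Hypothetical helper: fail early instead of discovering the missing
// feature in the middle of an extraction.
inline void RequireSeeking()
{
    if (!kSeekingEnabled)
        throw std::runtime_error("built without _ZIP_SEEK: seeking is unavailable");
}
```

Calling such a helper once at startup keeps the check in one place.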
Necessary Code Setup
To use the seeking feature, include the proper header and use the ZipArchiveLib
namespace, as shown in the sample code below.
Sample Code
#include "DeflateCompressor.h"
using namespace ZipArchiveLib;
All the following samples assume that the above declarations are already made.
Creating Seekable Data
To create seekable compressed data, set the appropriate option for the
ZipArchiveLib::CDeflateCompressor compressor before compressing a file. The option
that controls the creation of synchronization blocks is the
ZipArchiveLib::CDeflateCompressor::COptions::m_iSyncRatio member variable. It determines
how often synchronization blocks are created. See Compressing Data for more information
about setting compressor options.
Immediately after a file is compressed, you can retrieve the array of offsets pairs
that describe the locations of synchronization blocks in the compressed data and the
corresponding offsets in the uncompressed data. Use the
ZipArchiveLib::CDeflateCompressor::GetOffsetsArray() method for that. Save the returned
array to a buffer with the CZipCompressor::COffsetsArray::Save() method, because the
next compression operation invalidates the array object (the next decompression
operation may invalidate it too). You will need this array later to seek in the
compressed data.
Sample Code
CZipArchive zip;
zip.Open(_T("C:\\Temp\\archive.zip"), CZipArchive::zipCreate);
// request synchronization blocks for the files added from now on
CDeflateCompressor::COptions options;
options.m_iSyncRatio = 1;
zip.SetCompressionOptions(&options);
zip.AddNewFile(_T("C:\\Temp\\file1.dat"), CZipCompressor::levelBest);
// make sure the current compressor is the deflate compressor before casting
const CZipCompressor::COptions* pOptions = zip.GetCurrentCompressor()->GetOptions();
ASSERT(pOptions && pOptions->GetType() == CZipCompressor::typeDeflate);
const CDeflateCompressor* pCompressor = (CDeflateCompressor*)zip.GetCurrentCompressor();
// save the offsets array before the next compression invalidates it
CZipCompressor::COffsetsArray* pArray = pCompressor->GetOffsetsArray();
ASSERT(pArray);
CZipAutoBuffer buffer1;
pArray->Save(buffer1);
// the second file gets its own offsets array, saved to a separate buffer
zip.AddNewFile(_T("C:\\Temp\\file2.dat"), CZipCompressor::levelBest);
CZipAutoBuffer buffer2;
((CDeflateCompressor*)zip.GetCurrentCompressor())->GetOffsetsArray()->Save(buffer2);
// reset the ratio to 0 for any files added later
options.m_iSyncRatio = 0;
zip.SetCompressionOptions(&options);
zip.Close();
Determining Statistics of the Compressed Data
To find the balance between the compression ratio and the frequency of creating
synchronization blocks, you can use the CZipCompressor::COffsetsArray::GetStatistics()
method to gather information about block sizes. You can then adjust the
ZipArchiveLib::CDeflateCompressor::COptions::m_iSyncRatio value and see how the block
sizes change with respect to the compression ratio (see
CZipFileHeader::GetCompressionRatio()).
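To make "information about block sizes" concrete, the following sketch computes the
minimum, maximum and average distance between consecutive synchronization blocks from
a list of offsets pairs. It does not use the library: OffsetsPair, BlockStats and
ComputeBlockStats are hypothetical stand-ins, and the values reported by
GetStatistics() itself may be defined differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in for one entry of an offsets array: where a synchronization block
// starts in the compressed stream and which uncompressed offset it maps to.
struct OffsetsPair
{
    std::uint64_t compressed;
    std::uint64_t uncompressed;
};

struct BlockStats
{
    std::uint64_t minSize = 0;
    std::uint64_t maxSize = 0;
    double averageSize = 0.0;
};

// Measures the uncompressed distance between consecutive synchronization
// blocks. Preconditions: `pairs` is sorted by uncompressed offset and
// `totalSize` (the uncompressed file size, which closes the last block)
// is not smaller than the last uncompressed offset.
inline BlockStats ComputeBlockStats(const std::vector<OffsetsPair>& pairs,
                                    std::uint64_t totalSize)
{
    BlockStats stats;
    if (pairs.empty())
        return stats;

    stats.minSize = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < pairs.size(); ++i)
    {
        const std::uint64_t end =
            (i + 1 < pairs.size()) ? pairs[i + 1].uncompressed : totalSize;
        const std::uint64_t size = end - pairs[i].uncompressed;
        stats.minSize = std::min(stats.minSize, size);
        stats.maxSize = std::max(stats.maxSize, size);
        sum += size;
    }
    stats.averageSize = static_cast<double>(sum) / static_cast<double>(pairs.size());
    return stats;
}
```

The maximum block size is the figure that bounds seeking granularity: in the worst
case, that many bytes have to be decompressed and discarded to reach a position
just before the next synchronization block.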
Seeking in Compressed Data
To perform seeking in compressed data, you need the offsets array
(CZipCompressor::COffsetsArray) created during compression. You can load a previously
saved array with the CZipCompressor::COffsetsArray::Load() method. Retrieve the desired
offsets pair (CZipCompressor::COffsetsPair) from the array and pass it as an argument
to one of the CZipArchive::ExtractFile() methods.
The seeking operation causes the CRC value to be ignored while decompressing data.
This has the same effect as calling the CZipArchive::SetIgnoredConsistencyChecks()
method with the CZipArchive::checkLocalCRC argument for the current file.
Sample Code
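zip.Open(_T("C:\\Temp\\archive.zip"));
// restore the offsets array saved when file1.dat was compressed
CZipCompressor::COffsetsArray offsets1;
offsets1.Load(buffer1);
// look up the offsets pair for the 10 MB offset
CZipCompressor::COffsetsPair* pPair = offsets1.FindMax(10 * 1024 * 1024);
ASSERT(pPair);
// extract the first file, starting decompression at the selected block
zip.ExtractFile(0, _T("C:\\Temp"), true, NULL, ZipPlatform::fomRegular, pPair);
zip.Close();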
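Since decompression can start only at a synchronization block, reaching an arbitrary
position means picking a block at or before it and discarding the bytes in between.
The sketch below illustrates that lookup on a plain sorted vector. It is an assumption
that this "last block at or before the target" rule matches what FindMax() does (the
text above does not spell out its exact semantics); OffsetsPair, FindAtOrBefore and
BytesToSkip are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for one entry of an offsets array.
struct OffsetsPair
{
    std::uint64_t compressed;    // start of the block in compressed data
    std::uint64_t uncompressed;  // matching offset in uncompressed data
};

// Returns the last pair whose uncompressed offset is <= target, or nullptr
// when every synchronization block starts after the target.
// `pairs` must be sorted by uncompressed offset.
inline const OffsetsPair* FindAtOrBefore(const std::vector<OffsetsPair>& pairs,
                                         std::uint64_t target)
{
    // upper_bound finds the first pair that starts strictly after the target
    auto it = std::upper_bound(
        pairs.begin(), pairs.end(), target,
        [](std::uint64_t value, const OffsetsPair& p) { return value < p.uncompressed; });
    if (it == pairs.begin())
        return nullptr;
    return &*(it - 1);
}

// Number of decompressed bytes to discard after starting at `pair`
// in order to arrive exactly at `target` (requires pair.uncompressed <= target).
inline std::uint64_t BytesToSkip(const OffsetsPair& pair, std::uint64_t target)
{
    return target - pair.uncompressed;
}
```

With blocks at uncompressed offsets 0, 10000 and 20000, a request for offset 15000
selects the block at 10000 and leaves 5000 bytes to skip.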
Multiple Seeking in Data
You can perform multiple seek and extract operations on a file that is opened for
decompression. This is possible only with the advanced decompression method (see
Extracting Data and Testing Archives for more information). To seek, use the
CZipArchive::SeekInFile() method; you can then start decompressing the file with calls
to the CZipArchive::ReadFile() method.
Sample Code
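zip.Open(_T("C:\\Temp\\archive.zip"));
// restore the offsets array saved when file2.dat was compressed
CZipCompressor::COffsetsArray offsets2;
offsets2.Load(buffer2);
// open the second file for the advanced decompression
zip.OpenFile(1);
// pick two seek targets: one looked up by offset and the last pair in the array
CZipCompressor::COffsetsPair* pPair1 = offsets2.FindMin(100 * 1024);
ASSERT(pPair1);
CZipCompressor::COffsetsPair* pPair2 = offsets2.GetAt(offsets2.GetSize() - 1);
ASSERT(pPair2);
CZipAutoBuffer buffer;
buffer.Allocate(64 * 1024);
// seek to the first target and read up to 64 KB from there
zip.SeekInFile(pPair1);
DWORD read = zip.ReadFile(buffer, buffer.GetSize());
// seek again within the same opened file and read from the last block
zip.SeekInFile(pPair2);
read = zip.ReadFile(buffer, buffer.GetSize());
zip.CloseFile();
zip.Close();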
The offsets array (CZipCompressor::COffsetsArray) created during compression is
necessary when decompressing data, because it contains the locations of synchronization
blocks and decompression can start only from those blocks. You can preserve this array
in multiple ways (e.g. as a file in the archive or inside another file). One way is to
store the array for a particular file in the central extra data of that file. For more
information about using extra data, please refer to Providing Custom Data: Extra Fields.
It is recommended that you use the ZIP_EXTRA_ZARCH_SEEK identifier for the extra data.
To save an offsets array to a buffer, use the CZipCompressor::COffsetsArray::Save()
method. To load an array from a buffer, use the CZipCompressor::COffsetsArray::Load()
method.
When saving, the offsets array tries to use 4 bytes for offsets. However, when any
of the offsets does not fit into 4 bytes, 8 bytes are automatically used for each of
the offsets. When loading, the library automatically detects the number of bytes used
during saving. To use 8 bytes for offsets, the ZipArchive Library must be compiled with
Zip64 support (see Zip64 Format: Crossing the Limits of File Sizes and Number of Files
and Segments for more information about Zip64 support).
Sample Code
CZipArchive zip;
zip.Open(_T("C:\\Temp\\archive.zip"), CZipArchive::zipCreate);
// compress a file with synchronization blocks
CDeflateCompressor::COptions options;
options.m_iSyncRatio = 10;
zip.SetCompressionOptions(&options);
zip.AddNewFile(_T("C:\\Temp\\file1.dat"));
// store the offsets array in the central extra data of the file
CZipExtraData* extra = zip[0]->m_aCentralExtraData.CreateNew(ZIP_EXTRA_ZARCH_SEEK);
((CDeflateCompressor*)zip.GetCurrentCompressor())
    ->GetOffsetsArray()->Save(extra->m_data);
zip.Close();
// reopen the archive and load the offsets array from the extra data
zip.Open(_T("C:\\Temp\\archive.zip"));
ASSERT(zip.GetCount() == 1);
CZipExtraData* extraData = zip[0]->m_aCentralExtraData.Lookup(ZIP_EXTRA_ZARCH_SEEK);
ASSERT(extraData);
CZipCompressor::COffsetsArray offsets;
offsets.Load(extraData->m_data);
zip.Close();
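The rule described above (4 bytes per offset unless some offset needs 8, with the
width detected again on loading) can be sketched as follows. The byte layout used
here, a leading width byte followed by little-endian values, is purely an assumption
made for illustration and is not the library's actual format; OffsetsPair,
SaveOffsets and LoadOffsets are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct OffsetsPair
{
    std::uint64_t compressed;
    std::uint64_t uncompressed;
};

// Appends `value` as `width` little-endian bytes.
inline void PutLE(std::vector<unsigned char>& out, std::uint64_t value, int width)
{
    for (int i = 0; i < width; ++i)
        out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
}

// Reads `width` little-endian bytes starting at `pos`.
inline std::uint64_t GetLE(const std::vector<unsigned char>& in, std::size_t pos, int width)
{
    std::uint64_t value = 0;
    for (int i = 0; i < width; ++i)
        value |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
    return value;
}

// Hypothetical layout: [width byte: 4 or 8][compressed][uncompressed]...
inline std::vector<unsigned char> SaveOffsets(const std::vector<OffsetsPair>& pairs)
{
    // use 4 bytes unless at least one offset does not fit
    int width = 4;
    for (const OffsetsPair& p : pairs)
    {
        if (p.compressed > 0xFFFFFFFFu || p.uncompressed > 0xFFFFFFFFu)
        {
            width = 8;
            break;
        }
    }
    std::vector<unsigned char> out;
    out.push_back(static_cast<unsigned char>(width));
    for (const OffsetsPair& p : pairs)
    {
        PutLE(out, p.compressed, width);
        PutLE(out, p.uncompressed, width);
    }
    return out;
}

inline std::vector<OffsetsPair> LoadOffsets(const std::vector<unsigned char>& in)
{
    if (in.empty())
        throw std::invalid_argument("empty buffer");
    const int width = in[0];  // detect the width chosen when saving
    if (width != 4 && width != 8)
        throw std::invalid_argument("unexpected offset width");
    const std::size_t pairBytes = 2 * static_cast<std::size_t>(width);
    if ((in.size() - 1) % pairBytes != 0)
        throw std::invalid_argument("truncated buffer");

    std::vector<OffsetsPair> pairs;
    for (std::size_t pos = 1; pos < in.size(); pos += pairBytes)
        pairs.push_back({GetLE(in, pos, width), GetLE(in, pos + width, width)});
    return pairs;
}
```

Choosing one width for the whole array keeps every entry the same size, so the
number of pairs follows directly from the buffer length.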