Compressing Data
Applies To: All. Bzip2 algorithm is available in the Full Version only.

Introduction

Compression Algorithms

Files inside zip archives can be compressed using different algorithms. The ZipArchive Library supports the deflate and bzip2 algorithms for compression. Choose the algorithm that fits your needs and select it by calling the CZipArchive::SetCompressionMethod() method before compressing a file. There is no need to call this method when decompressing, because the decompression process automatically detects the algorithm used.
Sample Code
CZipArchive zip;
// create a new archive
zip.Open(_T("C:\\Temp\\test.zip"), CZipArchive::zipCreate);
// set the compression method
zip.SetCompressionMethod(CZipCompressor::methodBzip2);
zip.AddNewFile(_T("C:\\Temp\\file1.dat"));
zip.Close();

Deflate

Deflate is the most frequently used algorithm in zip archives and is supported by all standard zip utilities. The implementation of this algorithm is provided by the Zlib library (see Acknowledgements: Credits and Used Third-Party Code Licensing Information for more information).

Bzip2

Bzip2 generally achieves a better compression ratio than the deflate algorithm, but it is slower. It is supported by PKZIP since version 4.6 and by WinZip since version 10.0. Earlier versions of these programs will not decompress archives that use the bzip2 algorithm. Also, at the time of writing, Windows Compressed Folders are not capable of extracting such archives. The implementation of this algorithm is provided by the bzip2 data compressor (see Acknowledgements: Credits and Used Third-Party Code Licensing Information for more information).

Enabling Bzip2 Functionality

The bzip2 algorithm is available in the Full Version of the Library only and is enabled by default. If you don't need it, you can disable it by commenting out the _ZIP_BZIP2 definition in the file _features.h.

Using External Bzip2 Library

The ZipArchive Library already comes with the source files for the bzip2 algorithm from the original bzip2 library distribution, but you can also use the bzip2 library that comes with your system, which is usually the case on Linux/OS X systems. To use the bzip2 sources that come with the ZipArchive Library, make sure that _ZIP_BZIP2_INTERNAL is defined in the _features.h file while compiling the library. Undefine it to use the bzip2 library that comes with your system.
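How the two definitions combine can be summarized in a small compile-time sketch. The macro names _ZIP_BZIP2 and _ZIP_BZIP2_INTERNAL come from the text above. The Bzip2Backend enumeration and the SelectedBzip2Backend() function are hypothetical names used only for illustration and are not part of the library.

```cpp
// Stand-ins for the definitions in _features.h, repeated here only so the
// sketch compiles on its own. In a real build, edit _features.h instead.
#define _ZIP_BZIP2           // comment out to disable bzip2 support
#define _ZIP_BZIP2_INTERNAL  // comment out to use the system bzip2 library

// Hypothetical names, not part of the ZipArchive Library.
enum class Bzip2Backend { Disabled, Bundled, System };

// Reports which of the three configurations the definitions select.
constexpr Bzip2Backend SelectedBzip2Backend()
{
#if !defined(_ZIP_BZIP2)
    return Bzip2Backend::Disabled;
#elif defined(_ZIP_BZIP2_INTERNAL)
    return Bzip2Backend::Bundled;
#else
    return Bzip2Backend::System;
#endif
}
```

With both definitions present, as in the default configuration described above, the function evaluates to Bzip2Backend::Bundled.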

Compiling with Bzip2 under Linux/OS X

Easy Single File Compression

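To quickly add a file to an archive, use the CZipArchive::AddNewFile(CZipAddNewFileInfo&) method or one of its overloads. You need to specify the file to compress. Additionally, you may specify options such as the name of the file inside the archive, the compression level, and whether the path information is stored.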
Sample Code
CZipArchive zip;
// create a new archive
zip.Open(_T("C:\\Temp\\test.zip"), CZipArchive::zipCreate);
// simple add with the default compression level
zip.AddNewFile(_T("C:\\Temp\\file1.dat"));
// add a file and specify its name inside the archive
// to be different from original
zip.AddNewFile(_T("C:\\Temp\\file2.dat"), _T("renamed.dat"));
// add a file without compression
zip.AddNewFile(_T("C:\\Temp\\file3.dat"), 0);
// add a file with default compression and
// without the path information
zip.AddNewFile(_T("C:\\Temp\\file4.dat"), -1, false);
zip.Close();
/* the resulting archive has the following structure:
\-
|--Temp
| |-file1.dat
| |-file3.dat
|-file4.dat
|-renamed.dat
*/

Callbacks Called

The methods for easy compression can call callbacks to notify about the progress. To read more about using callback objects, see Progress Notifications: Using Callback Objects.

Easy Multiple Files Compression

To quickly add multiple files to an archive, use one of the CZipArchive::AddNewFiles() methods. You need to specify the directory that contains the files to compress. Additionally, you may filter the files and adjust other options, such as the compression level and the smart level.

Using Filters

To have more control over which files are added to an archive, you can use the filters with the CZipArchive::AddNewFiles() method.
Sample Code
// use the namespace where the default filters are located
using namespace ZipArchiveLib;
// create a custom filter that accepts files depending on their size.
class CSizeFileFilter : public CFileFilter
{
    ZIP_FILE_USIZE m_uMinSize;
    ZIP_FILE_USIZE m_uMaxSize;
public:
    CSizeFileFilter(ZIP_FILE_USIZE uMinSize, ZIP_FILE_USIZE uMaxSize, bool bInverted = false)
        : CFileFilter(bInverted), m_uMinSize(uMinSize), m_uMaxSize(uMaxSize)
    {
    }
    bool Accept(LPCTSTR, LPCTSTR, const CFileInfo& info)
    {
        // There is no need to check for a directory,
        // because the CFileFilter by default does not handle directories
        // (see the HandlesFile() method).
        /*if (info.IsDirectory())
            return true;*/
        // evaluate files based on their size
        return info.m_uSize >= m_uMinSize && info.m_uSize <= m_uMaxSize;
    }
};
void EasyMultiCompress()
{
    CZipArchive zip;
    // create a new archive
    zip.Open(_T("C:\\Temp\\test.zip"), CZipArchive::zipCreate);
    CGroupFileFilter groupFilter;
    // add files if their size is not larger than 20kB
    groupFilter.Add(new CSizeFileFilter(0, 20 * 1024));
    // there is no need to release memory for filters
    // - CGroupFileFilter will take care of that
    // add files, if their extension is NOT .tmp, .dat or .zip
    // (the filters are set to work in the inverted mode).
    groupFilter.Add(new CNameFileFilter(_T("*.tmp"), true));
    groupFilter.Add(new CNameFileFilter(_T("*.dat"), true));
    groupFilter.Add(new CNameFileFilter(_T("*.zip"), true));
    // skip traversing temporary directories
    groupFilter.Add(new CNameFileFilter(_T("*tmp*"), true, CNameFileFilter::toDirectory));
    groupFilter.Add(new CNameFileFilter(_T("*temp*"), true, CNameFileFilter::toDirectory));
    zip.AddNewFiles(_T("C:\\Temp"), groupFilter);
    zip.Close();
}
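
The file rules set up in EasyMultiCompress() can be modelled without the library to see which files would pass. The sketch below is only an approximation: it assumes that the group filter requires every added filter to accept a file, it replaces the wildcard patterns with a plain case-sensitive suffix check, and it ignores the directory filters. FileEntry, EndsWith() and WouldBeAdded() are hypothetical names, not ZipArchive Library types or functions.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a file entry; not a ZipArchive Library type.
struct FileEntry
{
    std::string name;
    unsigned long long size;
};

// True if `name` ends with `suffix` (a simplified stand-in for "*.ext").
inline bool EndsWith(const std::string& name, const std::string& suffix)
{
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Models the file rules from EasyMultiCompress(): a file is added only if
// it passes the size filter and every inverted name filter.
inline bool WouldBeAdded(const FileEntry& file)
{
    const unsigned long long maxSize = 20 * 1024;
    if (file.size > maxSize)
        return false;                       // CSizeFileFilter(0, 20 * 1024)
    const std::vector<std::string> excluded = {".tmp", ".dat", ".zip"};
    for (const std::string& ext : excluded)
        if (EndsWith(file.name, ext))
            return false;                   // inverted CNameFileFilter
    return true;
}
```

Under these assumptions, a 1 kB notes.txt would be added, while cache.tmp and any file larger than 20 kB would be skipped.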

Filtering Directories

To match directories, use the _T("*") pattern in the name filter (ZipArchiveLib::CNameFileFilter). The _T("*.*") pattern would only match directories with the dot character in the name. Also, use the ZipArchiveLib::CNameFileFilter::toAll type.

To ignore empty directories with this filter, include the CZipArchive::zipsmIgnoreDirectories flag in the iSmartLevel parameter of the CZipArchive::AddNewFiles() method.

Sample Code
CZipArchive zip;
zip.Open(_T("C:\\Temp\\test.zip"), CZipArchive::zipCreate);
// to include empty directories, use the following filter
CNameFileFilter filter(_T("*"), false, CNameFileFilter::toAll);
// This will include empty directories
zip.AddNewFiles(_T("C:\\Temp\\Input1\\"), filter);
// This will exclude empty directories.
// The CZipArchive::zipsmIgnoreDirectories flag would be unnecessary,
// if the filter was using CNameFileFilter::toFile.
zip.AddNewFiles(_T("C:\\Temp\\Input2\\"), filter, true, -1, true,
CZipArchive::zipsmSafeSmart | CZipArchive::zipsmIgnoreDirectories);
zip.Close();

Additional Considerations

Callbacks Called

When adding multiple files, callbacks are called to notify about the progress. To read more about using callback objects when performing multiple operations, see Progress Notifications: Using Callback Objects.

Advanced Compression: More Control Over How Data is Written

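The CZipArchive::AddNewFile(CZipAddNewFileInfo&) method and its overloads do most of the work for you, but you may want more control over this process. To compress a file manually, open a new record with CZipArchive::OpenNewFile(), write the data with one or more calls to CZipArchive::WriteNewFile(), and close the record with CZipArchive::CloseNewFile().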
Sample Code
CZipArchive zip;
// open an existing archive
zip.Open(_T("C:\\Temp\\test.zip"));
// specify a template for the file to be added
CZipFileHeader templ;
templ.SetFileName(_T("data.txt"));
// set the desired attributes
templ.SetSystemAttr(FILE_ATTRIBUTE_READONLY);
// open the new record in the archive;
// set the maximum compression level
zip.OpenNewFile(templ, 9);
LPCTSTR data1 = _T("This is data\r\n");
LPCTSTR data2 = _T("to be written");
// write data
zip.WriteNewFile(data1, (DWORD)(_tcslen(data1) * sizeof(TCHAR)));
zip.WriteNewFile(data2, (DWORD)(_tcslen(data2) * sizeof(TCHAR)));
// close the new record
zip.CloseNewFile();
// close the archive
zip.Close();

Adding Directories

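You can add a directory in two ways: by passing the path of an existing directory to CZipArchive::AddNewFile(), or by opening a new record with CZipArchive::OpenNewFile() using a header that describes a directory. The ZipArchive Library treats file names ending with a path separator as directories.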
Sample Code
CZipArchive zip;
// create a zip archive
zip.Open(_T("C:\\Temp\\test.zip"), CZipArchive::zipCreate);
// add an existing directory (no files from that directory will be added)
zip.AddNewFile(_T("c:\\windows"), CZipCompressor::levelStore);
// add a non-existing directory
CZipFileHeader header;
// you can skip the next line if you add a path separator to the end of the file name
header.SetSystemAttr(ZipPlatform::GetDefaultDirAttributes());
header.SetFileName(_T("empty dir"));
header.SetModificationTime(time(NULL));
zip.OpenNewFile(header, CZipCompressor::levelStore);
zip.CloseNewFile();
zip.Close();

Other Functionality

Adding Files From Other Archives

If you wish to add files from other archives to your archive without extracting and recompressing them, use one of the CZipArchive::GetFromArchive() methods.
Sample Code
// create a new archive
CZipArchive zipDest;
zipDest.Open(_T("C:\\Temp\\testDest.zip"), CZipArchive::zipCreate);
// open an existing source archive
CZipArchive zipSource;
zipSource.Open(_T("C:\\Temp\\test.zip"));
// add files from the source archive to the destination archive
CZipIndexesArray indexes;
indexes.Add(0);
indexes.Add(1);
zipDest.GetFromArchive(zipSource, indexes);
zipSource.Close();
zipDest.Close();

Multithreaded Compression

Although compression to a single archive from multiple threads is not possible, you can perform multithreaded compression to some extent using the following steps:

Finalizing Archives and Preventing Archive Corruption

During an archive modification, the central directory is removed from the archive and kept in memory. It is written back when you call CZipArchive::Close(). However, if a crash occurs before the central directory is written, the archive will be unusable. You can request writing the central directory back to the archive after each modification with the CZipArchive::SetAutoFinalize() method, or perform it manually with the CZipArchive::Finalize() method. Use the finalizing methods sparingly, because frequent finalizing can degrade performance.

The CZipArchive::Finalize() (called manually or automatically) will not execute when there are any pending changes. See Modification of Archives: Replacing, Renaming, Deleting and Changing Data for more information.

To flush file buffers alone without writing the central directory to the disk, call the CZipArchive::FlushBuffers() method.

When removing files, you can remove them only from the central directory for safety. See CZipArchive::RemoveFile for more information (set the bRemoveData parameter to false).
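
One way to balance safety against the cost of finalizing is to call CZipArchive::Finalize() only after every few additions rather than after each one. The helper below is a minimal sketch of that policy, not a library feature. AddWithPeriodicFinalize() is a hypothetical name, and the template assumes an archive type with AddNewFile(path) returning bool and a Finalize() member. It also assumes a non-Unicode build, so that a char string can be passed as the file path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Adds every path in `files` and finalizes after each `interval` additions.
// `Archive` is any type offering AddNewFile(path) and Finalize(), as
// CZipArchive is assumed to here. Returns the number of files added.
template <class Archive>
std::size_t AddWithPeriodicFinalize(Archive& zip,
                                    const std::vector<std::string>& files,
                                    std::size_t interval)
{
    assert(interval > 0);
    std::size_t added = 0;
    for (const std::string& path : files)
    {
        if (!zip.AddNewFile(path.c_str()))
            continue;               // skip files that could not be added
        ++added;
        if (added % interval == 0)
            zip.Finalize();         // request writing the central directory
    }
    return added;
}
```

A larger interval means fewer central directory writes but more files at risk if a crash occurs. The archive still has to be closed with CZipArchive::Close() afterwards.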

Segmented Archives

If you finalize a segmented archive in creation, it will not be closed, but its state will change from "an archive in creation" to "an existing segmented archive". Finalize a segmented archive when you have finished adding files to it and you want to begin extracting or testing it. This means that you can finalize a segmented archive only once. However, if after finalizing it turns out that the archive consists of one segment only, the archive is converted to a normal archive and you can use it as such. To check the state of the archive after finalizing it, call CZipArchive::GetStorage() and then CZipStorage::IsSegmented(). The method returns false if the archive was converted to a normal archive.

Committing Modification Changes

To prevent archive corruption you may also want to adjust the commit changes mode. See Modification of Archives: Replacing, Renaming, Deleting and Changing Data for more information.

System Compatibility

Setting Compressor Options

You can adjust the options of the Deflate or the Bzip2 compressor by calling the CZipArchive::SetCompressionOptions() method and passing an appropriate options object derived from the CZipCompressor::COptions class. Use ZipArchiveLib::CDeflateCompressor::COptions and ZipArchiveLib::CBzip2Compressor::COptions, respectively. Please refer to the sample code below and the documentation of these classes.
Sample Code
// These headers need to be included.
#include "DeflateCompressor.h"
#include "Bzip2Compressor.h"
// use the following namespace or prefix the classes with its name
using namespace ZipArchiveLib;
void SetOptions()
{
    CZipArchive zip;
    CDeflateCompressor::COptions deflateOptions;
    // set a larger buffer for deflate compression / decompression
    deflateOptions.m_iBufferSize = 4 * 65536;
    zip.SetCompressionOptions(&deflateOptions);
    CBzip2Compressor::COptions bzip2Options;
    // set a smaller buffer for bzip2 compression / decompression
    bzip2Options.m_iBufferSize = 65536;
    zip.SetCompressionOptions(&bzip2Options);
    // ... Process files
}

Additional Considerations (Windows Only)

If your system uses a large amount of memory during extensive file operations, see Modification of Archives: Replacing, Renaming, Deleting and Changing Data for a possible solution.


Article ID: 0610231446
Copyright © 2000 - 2022 Artpol Software - Tadeusz Dracz