Segmented Archives: Splitting and Spanning
Applies To: All

Introduction

The ZipArchive Library can create segmented archives using the following methods: splitting, binary splitting and spanning. The differences between splitting and spanning are summarized below:

Feature | Splitting | Spanning
Destination media | not limited to any | removable
Archive Structure | splits into volumes (usually in the same folder) | spans multiple disks
Naming | extension is based on the volume number (it is possible to implement a custom naming scheme) | each volume has the same name
Single Volume Size | declared by the user when creating an archive | auto-detected from the free space on the current disk
Callback | not needed, but possible | needed for changing volume

Conversion Between Split and Spanned Archives

To convert between split and spanned archives, it is enough to change the names of volumes and copy the volumes to appropriate locations.

Limits in Number of Volumes

Zip format has the following limits on the number of volumes:

Format | Splitting | Spanning
Standard Zip Format | 65,535 | 999
Zip64 Format | 4,294,967,295 - 1 | 4,294,967,295 - 1
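
The limits listed above can be checked before creating an archive. The following is a minimal standalone sketch, not part of the ZipArchive Library: the names Segmentation, MaxVolumes, EstimateVolumeCount and NeedsZip64ForVolumeCount are hypothetical and introduced only for illustration. The estimate is a simple ceiling division, so the real number of volumes may be higher, and Zip64 may also be required for reasons other than the number of volumes.

```cpp
#include <cstdint>

// Hypothetical helper types, not library classes.
enum class Segmentation { Split, Span };

// Returns the maximum number of volumes, using the limits from the table above.
std::uint64_t MaxVolumes(Segmentation kind, bool zip64)
{
    if (zip64)
        return 4294967295ULL - 1;
    return kind == Segmentation::Split ? 65535ULL : 999ULL;
}

// Rough estimate: the archive size divided by the volume size, rounded up.
// Returns 0 when the volume size is 0 (an invalid input).
std::uint64_t EstimateVolumeCount(std::uint64_t archiveSize, std::uint64_t volumeSize)
{
    if (volumeSize == 0)
        return 0;
    return archiveSize / volumeSize + (archiveSize % volumeSize != 0 ? 1 : 0);
}

// True when the volume count alone exceeds the standard zip format limit.
bool NeedsZip64ForVolumeCount(Segmentation kind, std::uint64_t volumes)
{
    return volumes > MaxVolumes(kind, false);
}
```

For example, an archive of 64 * 1024 * 1024 * 1024 bytes with 1MB volumes is estimated at 65,536 volumes, which is one more than the standard limit for split archives.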

Splitting: All Volumes in One Folder

The volumes of a split archive are usually located in the same folder. You need to specify the size of a single volume when creating a split archive. In a regular split, internal zip structures, such as file headers, are not split across volumes. As a result, a volume may be slightly smaller than the declared size when a structure did not fit entirely into the current volume and was stored in the next volume instead. If the declared volume size is too small to hold an entire internal structure, that particular volume is enlarged. It is recommended to use volume sizes not smaller than 64KB.

Under Linux/OS X, when opening an existing split archive, pass the CZipArchive::zipOpenSplit mode to the CZipArchive::Open(LPCTSTR) method. This is necessary because the ZipPlatform::IsDriveRemovable() function is not implemented on these platforms and the device containing the archive is always assumed to be removable.

Sample Code
LPCTSTR zipFileName = _T("C:\\Temp\\test.zip");
CZipArchive zip;
// specify the segment size to be 1MB
zip.Open(zipFileName, CZipArchive::zipCreateSplit, 1024 * 1024);
zip.AddNewFile(_T("C:\\Temp\\big.dat"));
zip.Close();
// the segmentation type will be auto-detected as splitting
// (the archive is on a non-removable device)
zip.Open(zipFileName);
// under Linux/OS X, call instead: zip.Open(zipFileName, CZipArchive::zipOpenSplit);
zip.ExtractFile(0, _T("C:\\Temp"), false, _T("big.ext"));
zip.Close();

Using Callback with Split Archives

Using a callback with split archives is not necessary, but it is possible. It is useful, for example, when you need to prompt a user for the location of a volume or perform some other action.

When the callback is set, the CZipCallback::Callback method will be called every time a volume changes.

Sample Code
class CSplitCallback : public CZipSegmCallback
{
    bool Callback(ZIP_SIZE_TYPE)
    {
        switch (m_iCode)
        {
        case scVolumeNeededForRead:
        case scVolumeNeededForWrite:
        case scFileNameDuplicated:
            {
                if (m_iCode == scFileNameDuplicated)
                {
                    // this can happen only when writing an archive;
                    // delete the file, if it already exists;
                    // it would be more efficient to check for the file existence
                    // when scVolumeNeededForWrite was called to save one turn, but
                    // this code is provided to illustrate the possible events
                    if (!ZipPlatform::RemoveFile(m_szExternalFile))
                    {
                        _tprintf(_T("Removing of the existing file failed."));
                        return false;
                    }
                }
                // it would be possible here to change the filename of the archive volume
                // and assign it to m_szExternalFile
                break;
            }
        case scFileCreationFailure:
            _tprintf(_T("Could not create the file. ")
                _T("Check if you have write permissions to the given location.\r\n"));
            // abort processing
            return false;
        case scFileNotFound:
            _tprintf(_T("The given volume could not be found.\r\n"));
            // abort processing, although we could ask a user here
            // to provide the location of our volume
            return false;
        default:
            _tprintf(_T("An unexpected code detected.\r\n"));
            // abort processing
            return false;
        }
        return true;
    }
};
void SplittingWithCallback()
{
    // this code is identical to the previous sample with the
    // exception of setting the callback
    LPCTSTR zipFileName = _T("C:\\Temp\\test.zip");
    CZipArchive zip;
    CSplitCallback callback;
    // set the callback before creating the archive;
    // note the second parameter value
    zip.SetSegmCallback(&callback, CZipArchive::scSplit);
    zip.Open(zipFileName, CZipArchive::zipCreateSplit, 1024 * 1024);
    zip.AddNewFile(_T("C:\\Temp\\big.dat"));
    zip.Close();
    // under Linux/OS X, call instead: zip.Open(zipFileName, CZipArchive::zipOpenSplit);
    zip.Open(zipFileName);
    zip.ExtractFile(0, _T("C:\\Temp"), false, _T("big.ext"));
    zip.Close();
}

Custom Naming Scheme of Volumes

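You can implement a custom naming scheme of volumes for split archives. In order to do that, derive a class from CZipSplitNamesHandler, override its GetVolumeName method and set the handler with the CZipArchive::SetSplitNamesHandler method before creating or opening the archive, as the sample code in this section does. If the last volume name is different from the archive name, you can retrieve it when closing the archive (it is the return value of the CZipArchive::Close() method).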
Sample Code
class CCustomNamesHandler : public CZipSplitNamesHandler
{
public:
    CZipString GetVolumeName(const CZipString& archiveName,
        ZIP_VOLUME_TYPE uCurrentVolume,
        ZipArchiveLib::CBitFlag flags) const
    {
        CZipString szExt;
        // pad the volume number to at least three digits
        if (uCurrentVolume < 1000)
            szExt.Format(_T("vol%.3u"), uCurrentVolume);
        else
            szExt.Format(_T("vol%u"), uCurrentVolume);
        if (flags.IsSetAny(CZipSplitNamesHandler::flExisting))
        {
            // change the extension, if the archive name is the name of an existing archive
            CZipPathComponent zpc(archiveName);
            zpc.SetExtension(szExt);
            return zpc.GetFullPath();
        }
        else
        {
            // otherwise, just append the extension
            return archiveName + _T(".") + szExt;
        }
    }
};
void CustomNaming()
{
    LPCTSTR zipFileName = _T("C:\\Temp\\test.zip");
    CZipArchive zip;
    CCustomNamesHandler namesHandler;
    // set a custom names handler before creating the archive
    zip.SetSplitNamesHandler(namesHandler);
    // specify the segment size to be 1MB
    zip.Open(zipFileName, CZipArchive::zipCreateSplit, 1024 * 1024);
    zip.AddNewFile(_T("C:\\Temp\\big.dat"));
    // get the last volume name - needed for opening the archive
    CZipString szLastVolumeName = zip.Close();
    if (szLastVolumeName.IsEmpty())
    {
        _tprintf(_T("An unexpected error occurred.\r\n"));
        return;
    }
    // set a custom names handler before opening the archive
    zip.SetSplitNamesHandler(namesHandler);
    // under Linux/OS X, call instead: zip.Open(szLastVolumeName, CZipArchive::zipOpenSplit);
    zip.Open(szLastVolumeName);
    zip.ExtractFile(0, _T("C:\\Temp"), false, _T("big.ext"));
    zip.Close();
}
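
To see which names the sample handler above is meant to produce, its logic can be modeled without the library. This is only a sketch under stated assumptions: it uses std::string instead of CZipString, the functions VolumeExtension, ReplaceExtension and VolumeName are hypothetical, and ReplaceExtension is a simplified stand-in for CZipPathComponent::SetExtension whose exact behavior may differ.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Builds the extension in the same way as the sample handler:
// at least three digits below 1000, plain digits otherwise.
std::string VolumeExtension(unsigned volume)
{
    char buffer[32];
    if (volume < 1000)
        std::snprintf(buffer, sizeof(buffer), "vol%.3u", volume);
    else
        std::snprintf(buffer, sizeof(buffer), "vol%u", volume);
    return buffer;
}

// Simplified stand-in for changing the extension of a path:
// replaces the text after the last dot in the file name part.
std::string ReplaceExtension(const std::string& path, const std::string& ext)
{
    const std::size_t separator = path.find_last_of("/\\");
    const std::size_t dot = path.find_last_of('.');
    const bool dotInFileName = dot != std::string::npos
        && (separator == std::string::npos || dot > separator);
    if (!dotInFileName)
        return path + "." + ext;
    return path.substr(0, dot + 1) + ext;
}

// Models the handler: replace the extension for an existing archive,
// append it otherwise.
std::string VolumeName(const std::string& archiveName, unsigned volume, bool existing)
{
    const std::string ext = VolumeExtension(volume);
    return existing ? ReplaceExtension(archiveName, ext) : archiveName + "." + ext;
}
```

With this model, creating C:\Temp\test.zip gives volume names such as C:\Temp\test.zip.vol001, and opening a volume named C:\Temp\test.zip.vol003 maps volume 1 back to C:\Temp\test.zip.vol001.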

Binary Split

Binary splitting produces archives with the internal structure of a single-segment archive, but splits the archive into multiple files. Here is a comparison between regular splitting and binary splitting:

Feature | Regular Splitting | Binary Splitting
Internal Archive Structure | Multi-segment. Each volume is logically represented inside of the archive. | Single-segment archive.
Volumes Extension | Replaced with z%.2u pattern to create volume filenames (e.g. archive.z01). | Consecutive numbers (%.3u pattern) are appended as an extension to an archive filename (e.g. archive.zip.001).
Last Volume's Filename | The same as the filename of the archive provided to the CZipArchive::Open(LPCTSTR) method (does not contain a volume number). | The filename is formed as any other volume name (contains a volume number).
Default Name Handler | CZipRegularSplitNamesHandler | CZipBinSplitNamesHandler
Opening of Existing Archive | The mode is automatically detected. You need to open the last volume. | You need to specify CZipArchive::zipOpenBinSplit when calling the CZipArchive::Open(LPCTSTR) method. You need to open the last volume.
Sample Code
CZipString zipFileName = _T("C:\\Temp\\test.zip");
CZipArchive zip;
// specify the segment size to be 1MB
zip.Open(zipFileName, CZipArchive::zipCreateBinSplit, 1024 * 1024);
zip.AddNewFile(_T("C:\\Temp\\big.dat"));
// get the last volume name - needed for opening of the archive
zipFileName = zip.Close();
if (zipFileName.IsEmpty())
{
    _tprintf(_T("An unexpected error occurred.\r\n"));
    return;
}
// the segmentation mode needs to be specified
zip.Open(zipFileName, CZipArchive::zipOpenBinSplit);
zip.ExtractFile(0, _T("C:\\Temp"), false, _T("big.ext"));
zip.Close();
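
The naming rules from the comparison above can be illustrated with a short standalone sketch. It does not use the library's name handlers: RegularSplitNames and BinarySplitNames are hypothetical functions, and the sketch assumes that volume numbering starts at 1, which the examples archive.z01 and archive.zip.001 suggest but do not state.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Regular split: the extension is replaced using the z%.2u pattern,
// and the last volume keeps the archive filename.
std::vector<std::string> RegularSplitNames(const std::string& archive, unsigned count)
{
    std::vector<std::string> names;
    const std::size_t separator = archive.find_last_of("/\\");
    const std::size_t dot = archive.find_last_of('.');
    const bool hasExt = dot != std::string::npos
        && (separator == std::string::npos || dot > separator);
    // the filename without its extension
    const std::string base = hasExt ? archive.substr(0, dot) : archive;
    for (unsigned volume = 1; volume < count; ++volume)
    {
        char ext[32];
        std::snprintf(ext, sizeof(ext), "z%.2u", volume);
        names.push_back(base + "." + ext);
    }
    if (count > 0)
        names.push_back(archive);
    return names;
}

// Binary split: the number (%.3u pattern) is appended to the archive
// filename for every volume, including the last one.
std::vector<std::string> BinarySplitNames(const std::string& archive, unsigned count)
{
    std::vector<std::string> names;
    for (unsigned volume = 1; volume <= count; ++volume)
    {
        char ext[32];
        std::snprintf(ext, sizeof(ext), "%.3u", volume);
        names.push_back(archive + "." + ext);
    }
    return names;
}
```

For a three-volume archive.zip, the regular split sketch gives archive.z01, archive.z02 and archive.zip, while the binary split sketch gives archive.zip.001, archive.zip.002 and archive.zip.003. In both cases the last name in the list is the one to open.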

Spanning: Use on Removable Media

Sample Code
#include <conio.h> // for _getch()
class CSpanCallback : public CZipSegmCallback
{
    bool Callback(ZIP_SIZE_TYPE)
    {
        switch (m_iCode)
        {
        case scVolumeNeededForRead:
        case scVolumeNeededForWrite:
            _tprintf(_T("Insert the disk number %d\r\n"), m_uVolumeNeeded);
            break;
        case scFileNameDuplicated:
            _tprintf(_T("The file with the given name already ")
                _T("exists on the disk.\r\n"));
            break;
        case scCannotSetVolLabel:
            _tprintf(_T("Cannot set the disk volume label. ")
                _T("Check if the disk is not write-protected.\r\n"));
            break;
        case scFileCreationFailure:
            _tprintf(_T("Could not create file. ")
                _T("Check if the disk is not write-protected.\r\n"));
            break;
        default:
            _tprintf(_T("An unexpected code detected.\r\n"));
            return false;
        }
        // wait for a key press before continuing
        _getch();
        _tprintf(_T("...\r\n"));
        // return false here to abort processing
        return true;
    }
};
void Spanning()
{
    LPCTSTR zipFileName = _T("a:\\test.zip");
    CZipArchive zip;
    CSpanCallback callback;
    // set the callback before creating the archive
    zip.SetSegmCallback(&callback);
    zip.Open(zipFileName, CZipArchive::zipCreateSpan);
    zip.AddNewFile(_T("C:\\Temp\\big.dat"));
    zip.Close();
    // the callback is already set
    // and the segmentation type will be auto-detected as spanning
    // (the archive is on a removable device)
    zip.Open(zipFileName);
    zip.ExtractFile(0, _T("C:\\Temp"), false, _T("big.ext"));
    zip.Close();
}

Detecting Last Disk in Drive

When extracting a spanned archive, you need to insert the last disk into the drive before opening the archive. The central directory is written on it, and the extraction starts from reading the central directory. There is no simple way to detect whether the right disk is in the drive, but the ZipArchive Library throws the CZipException with the CZipException::cdirNotFound code when the archive you are trying to open does not have the central directory. In the case of a spanned archive, this may mean that a user has not inserted the last disk into the drive.
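
One way to act on this exception is to ask the user to change the disk and try again. The sketch below only shows the shape of such a retry loop and does not use the library: ArchiveError is a hypothetical stand-in for CZipException that models nothing but the error code, and OpenWithRetry, open and askRetry are names assumed for illustration. With the real library, the archive object would probably also need to be cleaned up after a failed attempt, which is not modeled here.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the library exception; models only the code.
struct ArchiveError : std::runtime_error
{
    enum Code { cdirNotFound, other };
    Code code;
    explicit ArchiveError(Code c) : std::runtime_error("archive error"), code(c) {}
};

// Calls open() until it succeeds or the user gives up.
// open() throws ArchiveError on failure; askRetry() prompts the user
// to insert the last disk and returns false to abort.
// Returns true when the archive was opened, false when the user aborted.
bool OpenWithRetry(const std::function<void()>& open,
                   const std::function<bool()>& askRetry)
{
    for (;;)
    {
        try
        {
            open();
            return true;
        }
        catch (const ArchiveError& e)
        {
            // only a missing central directory suggests a wrong disk;
            // any other error is passed on to the caller
            if (e.code != ArchiveError::cdirNotFound)
                throw;
            if (!askRetry())
                return false;
        }
    }
}
```

Passing the two operations as callable objects keeps the loop independent of how the archive is opened and how the user is prompted.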

Recovering from Invalid Disk Inserted

Invalid Last Disk

To recover from the situation when a user does not insert the last disk:

Invalid Disk During Extraction

To recover from the situation when a user does not insert a correct disk during extraction:

Callbacks Called

While processing a segmented archive, the following callbacks are the most important:

To read more about using callback objects, see Progress Notifications: Using Callback Objects.


Article ID: 0610051553
Copyright © 2000 - 2022 Artpol Software - Tadeusz Dracz