Posted By: David Faust
Data volumes are exploding, and database file sizes are growing with them. File size remains an important factor in sizing disk arrays, which can be quite expensive for high-availability systems, and simply moving large data sets for archiving and backup becomes a challenge. Compression is a valuable way to ease this storage burden: by reducing the size of each data record, it can substantially reduce file sizes.
c‑treeACE now supports Data Compression. Ever-larger HUGE files, downsizing needs, migration compatibility with other systems that support data compression, and specific customer requirements made this a core database necessity.
c‑treeACE now intervenes at the point where low-level data records pass to and from the operating system’s file system: it compresses each data record just before writing it and decompresses it just after reading it.
The default compression algorithm comes from the standard zlib library, written by Jean-loup Gailly and Mark Adler. zlib is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program.
c‑treeACE also supports a proprietary run-length encoding (RLE) option. Perhaps even more important is the ability to implement a callback in which you define your own compression technique.
With c‑treeACE compression enabled, compression ratios of up to 80:1 have been observed in testing. The ratio of course depends on the type of data. Text data and sparsely populated tables (lots of zeros and spaces in the data) compress especially well, and the FairCom RLE algorithm is best suited to this type of data. More complex data, such as binary values, tends to compress better with zlib compression.
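To make the point about zeros and spaces concrete, here is a minimal run-length encoder in C. It is a generic textbook scheme of (count, byte) pairs, not FairCom's proprietary RLE algorithm, and the function names rle_encode and rle_decode are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic RLE sketch (NOT FairCom's algorithm): each run of identical
   bytes becomes a (count, byte) pair, with runs capped at 255.
   Returns the number of bytes written, or 0 if dst is too small. */
size_t rle_encode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        unsigned char b = src[i];
        size_t run = 1;

        /* Extend the run while the next byte matches. */
        while (i + run < n && src[i + run] == b && run < 255)
            run++;
        if (out + 2 > cap)
            return 0;
        dst[out++] = (unsigned char)run;
        dst[out++] = b;
        i += run;
    }
    return out;
}

/* Inverse of rle_encode. Returns the number of bytes restored, or 0 if
   the input is malformed or dst is too small. */
size_t rle_decode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    if (n % 2 != 0)
        return 0;
    for (i = 0; i < n; i += 2) {
        size_t run = src[i];

        if (run == 0 || out + run > cap)
            return 0;
        memset(dst + out, src[i + 1], run);
        out += run;
    }
    return out;
}
```

With this scheme, a 64-byte field holding "SMITH" followed by 59 spaces encodes to 12 bytes: five single-byte runs for the letters plus one run for the padding. Data with few repeated bytes would grow under the same scheme, which is one reason a general-purpose algorithm such as zlib suits binary values better.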
Enable c‑treeACE Data Compression
There are two ways to enable compression support with c‑treeACE: through a server configuration keyword or programmatically at file create time.
Easy Server Setup
The c‑treeACE configuration accepts a COMPRESS_FILE option where you can specify which files to compress by name. Using wildcards, you can specify only the subsets of files you wish to apply compression to. A default compression type is specified with the CMPREC_TYPE option.
Any new files created matching the filename specification will be created with compression enabled. For existing files, it will be necessary to either recreate them, or in some cases, compact the file with the c‑treeACE compact utility or API call to enable the compression (if the filename matches the name specification).
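As a rough sketch, the two options might appear in the server configuration file like this. The *.dat pattern and the ZLIB value spelling are assumptions made for illustration, as is the semicolon comment style; check the c‑treeACE server documentation for the exact accepted values.

```
; Sketch only: the file pattern and type value below are assumptions.
COMPRESS_FILE  *.dat
CMPREC_TYPE    ZLIB
```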
c‑treeACE file compression requires an extended file header, XCREblk.
To enable compression for files created directly in the application, use the ctSetCompress() API call. This makes the file compression-ready when it is created. Then, to turn compression on, OR the ctCompressRec mode into the splval member of the XCREblk extended header.
The ctSetCompress() API can specify the exact level of compression for the chosen compression algorithm. zlib compression can be tuned for speed vs. size. RLE compression can also be configured for a range of values.
Using programmatic routines allows you to fine-tune the compression attributes of the chosen compression algorithm. It also allows you to override the default server configuration option. For example, choose fast RLE compression for your heavily space-padded files, and use an optimized zlib compression for other files.
Custom Compression Options
Many customers have invested heavily in tuning particularly well-performing compression algorithms. c‑treeACE is designed for maximum flexibility and can load your custom compression algorithms. A compression type of “USER” enables custom support. Specify the compression shared library the server should load with the CMPREC_DLL option. Refer to the User Defined Compression documentation for implementing your algorithms for use with c‑treeACE.
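The general shape of a pluggable codec can be sketched in C as a matched pair of function pointers that a storage layer calls before each write and after each read. Everything below is hypothetical: the codec_fn signature, the record_codec struct, and the store_record and load_record helpers are invented for illustration and are not the c‑treeACE callback interface, which is defined in the User Defined Compression documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec signature for this sketch only; it is NOT the
   c-treeACE callback interface. Returns bytes written, or 0 on failure. */
typedef size_t (*codec_fn)(const unsigned char *src, size_t n,
                           unsigned char *dst, size_t cap);

/* A pluggable codec: a matched compress/decompress pair. */
typedef struct {
    const char *name;
    codec_fn compress;
    codec_fn decompress;
} record_codec;

/* Example user codec: drop up to 255 trailing spaces and keep the count
   in a one-byte prefix. */
size_t trim_compress(const unsigned char *src, size_t n,
                     unsigned char *dst, size_t cap)
{
    size_t pad = 0;

    while (pad < n && pad < 255 && src[n - 1 - pad] == ' ')
        pad++;
    if (1 + (n - pad) > cap)
        return 0;
    dst[0] = (unsigned char)pad;
    memcpy(dst + 1, src, n - pad);
    return 1 + (n - pad);
}

/* Inverse of trim_compress: copy the body, then restore the padding. */
size_t trim_decompress(const unsigned char *src, size_t n,
                       unsigned char *dst, size_t cap)
{
    size_t pad, body;

    if (n < 1)
        return 0;
    pad = src[0];
    body = n - 1;
    if (body + pad > cap)
        return 0;
    memcpy(dst, src + 1, body);
    memset(dst + body, ' ', pad);
    return body + pad;
}

/* Stand-ins for the storage layer: the codec runs just before the
   "write" and just after the "read". */
size_t store_record(const record_codec *c, const unsigned char *rec, size_t n,
                    unsigned char *disk, size_t cap)
{
    assert(c != NULL && c->compress != NULL);
    return c->compress(rec, n, disk, cap);
}

size_t load_record(const record_codec *c, const unsigned char *disk, size_t n,
                   unsigned char *rec, size_t cap)
{
    assert(c != NULL && c->decompress != NULL);
    return c->decompress(disk, n, rec, cap);
}
```

In this sketch, a 32-byte field holding "ACME" padded with spaces is stored as 5 bytes (one count byte plus four letters) and restored to the full 32 bytes on load. Keeping the two directions behind one struct means the storage layer never needs to know which algorithm is in use.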