The headline here is that, starting with 12.10.xC10, BLOBspace BLOBs can be compressed. We also made further enhancements to partition BLOB compression and to the auto-compress feature in general. Details follow.
First, some terminology. A partition BLOB column is a TEXT or BYTE column whose data is stored in a DBspace, in the same RSAM partition as the home row. A BLOBspace BLOB column has exactly the same data type (TEXT or BYTE), but its BLOB data is stored in a BLOBspace. The BLOB data itself is identical; the main difference is where it’s stored. By default, TEXT and BYTE columns are partition BLOBs (stored in-table). To store the BLOB data in a BLOBspace, you must add the optional IN <BLOBspace name> clause to the column definition when creating the column.
In HCL Informix 12.10.xC9, for example, suppose you create a table with a partition BLOB column, load it with 5000 rows, and compress that table. Intuitively, you might expect any rows you insert from then on to have their partition BLOBs compressed, just as their home rows are. Unfortunately, that isn’t the case. Newly inserted partition BLOBs are stored uncompressed, so periodic compression operations are needed to keep uncompressed BLOBs from accumulating in the table. This deficiency has been corrected in 12.10.xC10: once a compression dictionary has been created for a BLOB column, all new inserts use that dictionary.
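To make the before-and-after behavior concrete, here is a toy Python model. It is not how Informix implements compression. It assumes Python's `zlib` with a preset dictionary (`zdict`) as a stand-in for an Informix compression dictionary. The `ToyBlobColumn` class, the `apply_dict_on_insert` flag and the `load_compress_then_insert` helper are hypothetical names invented for this sketch, and 5 rows stand in for 5000 to keep it short.

```python
import zlib
from typing import List, Optional, Tuple


class ToyBlobColumn:
    """Hypothetical model of one partition BLOB column (not Informix internals)."""

    def __init__(self, apply_dict_on_insert: bool):
        # False mimics 12.10.xC9: new BLOBs are stored uncompressed.
        # True mimics 12.10.xC10: new BLOBs use the existing dictionary.
        self.apply_dict_on_insert = apply_dict_on_insert
        self.dictionary: Optional[bytes] = None
        self.blobs: List[Tuple[bytes, bool]] = []  # (stored bytes, is_compressed)

    def _compress(self, data: bytes) -> bytes:
        comp = zlib.compressobj(level=9, zdict=self.dictionary)
        return comp.compress(data) + comp.flush()

    def insert(self, data: bytes) -> None:
        if self.dictionary is not None and self.apply_dict_on_insert:
            self.blobs.append((self._compress(data), True))
        else:
            self.blobs.append((data, False))

    def compress_table(self) -> None:
        """Build a dictionary once from existing BLOBs, then compress the rest."""
        if self.dictionary is None:
            pending = [data for data, done in self.blobs if not done]
            if not pending:
                return
            # zlib only uses the last 32 KB of a preset dictionary.
            self.dictionary = b"".join(pending)[-32768:]
        self.blobs = [(d, True) if done else (self._compress(d), True)
                      for d, done in self.blobs]

    def read(self, i: int) -> bytes:
        stored, done = self.blobs[i]
        if not done:
            return stored
        dec = zlib.decompressobj(zdict=self.dictionary)
        return dec.decompress(stored) + dec.flush()


def load_compress_then_insert(apply_dict_on_insert: bool, blob: bytes,
                              n: int = 5) -> ToyBlobColumn:
    """Load n BLOBs, compress the table, then insert one more BLOB."""
    col = ToyBlobColumn(apply_dict_on_insert)
    for _ in range(n):
        col.insert(blob)
    col.compress_table()
    col.insert(blob)
    return col
```

With `apply_dict_on_insert=False` (xC9-style), the BLOB inserted after `compress_table()` is stored raw. With `True` (xC10-style), it is compressed against the existing dictionary and still reads back intact.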
This change also improves how the auto-compress feature handles partition BLOBs. Automatic Compression lets you mark a table “compressed” from its inception, even though at that point the table contains no rows from which a dictionary could be built. Once a minimum number of rows has been inserted (the threshold defaults to 2000), a compression dictionary is automatically built for the table, and from then on newly inserted home rows are compressed. Before xC10, however, partition BLOBs were exempt: the auto-compress feature built no dictionaries for TEXT or BYTE columns. Starting in xC10, those dictionaries are created, and the change described in the previous paragraph ensures that they are used for all subsequent partition BLOB inserts.
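Here is a minimal sketch of the threshold logic, assuming the default threshold of 2000 rows from the text. `AutoCompressTable` and the `notes` column in the usage are hypothetical. The sketch also simplifies by having dictionaries appear the instant the threshold is reached, whereas a real dictionary takes time to build.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

DEFAULT_THRESHOLD = 2000  # default row threshold quoted in the text


@dataclass
class AutoCompressTable:
    """Hypothetical model of a table created with auto-compress enabled."""
    blob_columns: List[str]
    builds_blob_dictionaries: bool  # False ~ before xC10, True ~ xC10
    threshold: int = DEFAULT_THRESHOLD
    row_count: int = 0
    dictionaries: Set[str] = field(default_factory=set)

    def insert_row(self) -> Dict[str, bool]:
        """Insert one row and report which of its parts were stored compressed."""
        # A row can only use dictionaries that already exist when it arrives.
        stored = {"home_row": "home_row" in self.dictionaries}
        for col in self.blob_columns:
            stored[col] = col in self.dictionaries
        self.row_count += 1
        # Simplification: dictionaries appear the instant the threshold is hit.
        if self.row_count >= self.threshold and not self.dictionaries:
            self.dictionaries.add("home_row")
            if self.builds_blob_dictionaries:
                self.dictionaries.update(self.blob_columns)
        return stored
```

For example, loading 2500 rows into `AutoCompressTable(["notes"], builds_blob_dictionaries=True)` leaves the first 2000 rows uncompressed and compresses both the home row and the `notes` BLOB of the last 500. With `builds_blob_dictionaries=False`, only the home rows of those 500 are compressed.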
Until 12.10.xC10, BLOBspace BLOBs were never compressed under any circumstances. Now they are treated just like any other BLOB column in the table. If you compress the whole table, all BLOBs are compressed, including those stored in BLOBspaces. If you compress only the BLOBs, both BLOBspace BLOBs and partition BLOBs are compressed.
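The behavior described above can be summarized as a small Python lookup table. The version and operation labels are descriptive strings chosen for this sketch. They are not Informix SQL syntax or admin API command names. The xC9 entry reflects the earlier statements that compressing a table in xC9 compressed its partition BLOBs, and that BLOBspace BLOBs were never compressed before xC10.

```python
from typing import FrozenSet

# Which parts of a table a compress operation covers, per the text.
# Labels are descriptive only; they are not Informix syntax (hypothetical).
COMPRESSES = {
    ("12.10.xC9", "whole table"): frozenset({"home rows", "partition BLOBs"}),
    ("12.10.xC10", "whole table"): frozenset(
        {"home rows", "partition BLOBs", "BLOBspace BLOBs"}),
    ("12.10.xC10", "BLOBs only"): frozenset(
        {"partition BLOBs", "BLOBspace BLOBs"}),
}


def compressed_parts(version: str, operation: str) -> FrozenSet[str]:
    """Return the parts of the table the given operation compresses."""
    return COMPRESSES[(version, operation)]


for (version, operation), parts in COMPRESSES.items():
    print(f"{version:11} {operation:12} -> {', '.join(sorted(parts))}")
```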
BLOBpage size matters
There’s one caveat. BLOBspace BLOBs are stored in BLOBpages, whose size is configurable, and a BLOBpage holds data from only one BLOB, no matter how much of the page that BLOB uses. So you have to think a bit about your BLOBpage size in order to benefit from BLOBspace BLOB compression. If your BLOBpages are large enough that your uncompressed BLOBs never spill over onto multiple BLOBpages, compressing them won't save any space: each BLOB still occupies one BLOBpage, so you'll be using the same number of BLOBpages as before. In that case, where your BLOBpage size was configured perfectly to begin with, you'd need a smaller BLOBpage size to take advantage of compression. Since you can't change a BLOBspace's BLOBpage size, you'd have to create a new BLOBspace with a smaller BLOBpage size and migrate the data to it.
The ideal scenario for BLOBspace BLOB compression is one in which your uncompressed BLOBs tend to use multiple BLOBpages. In this case compressing those BLOBs is likely to free up BLOBpages that can be reused for future inserts, saving you space.
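The arithmetic behind this can be sketched in a few lines of Python. The function names are hypothetical. The sketch ignores per-BLOBpage overhead, and the examples assume a 2:1 compression ratio. Real BLOBpages may reserve some space for page headers, so actual page counts could differ slightly.

```python
def blobpages_needed(blob_bytes: int, blobpage_bytes: int) -> int:
    """BLOBpages one BLOB occupies. A BLOBpage never holds data from two BLOBs,
    so the count rounds up. Per-page overhead is ignored (simplifying assumption)."""
    if blobpage_bytes <= 0:
        raise ValueError("blobpage_bytes must be positive")
    return -(-blob_bytes // blobpage_bytes)  # ceiling division on integers


def blobpages_saved(raw_bytes: int, compressed_bytes: int, blobpage_bytes: int) -> int:
    """BLOBpages freed by compressing one BLOB."""
    return (blobpages_needed(raw_bytes, blobpage_bytes)
            - blobpages_needed(compressed_bytes, blobpage_bytes))


KB = 1024
# Hypothetical figures: 2:1 compression ratio.
print(blobpages_saved(12 * KB, 6 * KB, 16 * KB))   # 0: fits on one page either way
print(blobpages_saved(40 * KB, 20 * KB, 16 * KB))  # 1: spans 3 pages, then 2
print(blobpages_saved(12 * KB, 6 * KB, 4 * KB))    # 1: smaller BLOBpages, 3 then 2
```

The first case is the "perfectly sized" BLOBpage situation, where compression saves nothing. The other two show the ways to benefit: BLOBs that already span several BLOBpages, or a smaller BLOBpage size.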
Change in auto-compress behavior
Prior to 12.10.xC10, the auto-compress feature would neglect to compress several thousand rows at the beginning of the table. This was less a defect than a consequence of the order of operations during a table’s initial load:
1) Wait until the table contains enough rows to build a decent compression dictionary.
2) Once the row count passes the threshold, build the dictionary.
3) Compress any new inserts using the new dictionary.
There was no attempt to go back and compress the original rows that were sampled to build the dictionary. Doing so would have required an extra operation, performed by a session separate from the inserter and potentially clashing with it, and it was deemed not worth the trouble. A compression dictionary isn’t built instantaneously, however, and rows keep arriving while it is being built. Depending on how fast you're loading, you might end up with tens of thousands of uncompressed rows at the head of the “compressed” table.
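Here is a back-of-the-envelope estimate of that uncompressed head, written as a hypothetical Python helper. It assumes a constant insert rate. It also assumes that every row inserted before the dictionary is ready stays uncompressed, which was the behavior before xC10. The 10,000 rows/s rate and 3-second build time are made-up figures, not measurements.

```python
def uncompressed_head_rows(threshold: int, rows_per_second: float,
                           build_seconds: float) -> int:
    """Rows left uncompressed at the head of an auto-compressed table (pre-xC10):
    the rows inserted before the threshold was reached, plus the rows inserted
    while the dictionary was still being built. Assumes a constant insert rate."""
    return threshold + int(rows_per_second * build_seconds)


# Hypothetical load: default threshold of 2000 rows, 10,000 rows/s,
# 3 seconds to build the dictionary.
head = uncompressed_head_rows(2000, 10_000, 3.0)
print(head)  # 32000
```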
We decided to correct this deficiency in 12.10.xC10. After the compression dictionaries are automatically created, we now run a separate, internal “compress” operation in the background that picks up any uncompressed rows. The only thing to keep in mind is that this background operation, which runs in a different thread from the one creating or loading the table, keeps the table open until it completes. As a consequence, an attempt to drop or alter the table immediately after loading it may fail in some cases with a locking or non-exclusive access error.
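Here is a toy illustration, using Python threads, of why an immediate DROP or ALTER can fail. `ToyTable`, `background_compress` and `drop_right_after_load` are hypothetical. The model treats the background pass as a second session that holds the table open, and the table refuses an exclusive operation while any handle is open. This is an analogy for the behavior, not Informix's locking implementation.

```python
import threading
from typing import List, Tuple


class ToyTable:
    """Hypothetical model: a table that refuses to be dropped while open."""

    def __init__(self, rows: List[Tuple[str, bool]]):
        self.rows = rows  # (payload, is_compressed)
        self.dropped = False
        self._open_handles = 0
        self._guard = threading.Lock()

    def open(self) -> None:
        with self._guard:
            self._open_handles += 1

    def close(self) -> None:
        with self._guard:
            self._open_handles -= 1

    def drop(self) -> None:
        with self._guard:
            if self._open_handles:
                raise RuntimeError("table is in use (non-exclusive access)")
            self.dropped = True


def background_compress(table: ToyTable, started: threading.Event,
                        finish: threading.Event) -> None:
    """Stand-in for the background pass: it holds the table open while it
    compresses any rows the initial load left uncompressed."""
    table.open()
    try:
        started.set()
        finish.wait()  # represents the time the real pass takes
        table.rows = [(payload, True) for payload, _ in table.rows]
    finally:
        table.close()


def drop_right_after_load() -> Tuple[bool, bool]:
    """Return (first drop failed, retry after the pass succeeded)."""
    table = ToyTable([(f"row{i}", i >= 2) for i in range(4)])  # head rows raw
    started, finish = threading.Event(), threading.Event()
    worker = threading.Thread(target=background_compress,
                              args=(table, started, finish))
    worker.start()
    started.wait()  # the background pass now has the table open
    try:
        table.drop()
        first_failed = False
    except RuntimeError:
        first_failed = True
    finish.set()
    worker.join()
    table.drop()  # succeeds: nothing else holds the table open
    return first_failed, table.dropped and all(done for _, done in table.rows)
```

In the sketch, the first `drop()` fails because the background pass still has the table open, and the retry after the pass finishes succeeds. This suggests that retrying once background compression has completed is a reasonable response to such an error.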
If necessary, the auto-compress feature can be configured to some degree in 12.10.xC10. For example, background compression and automatic BLOBspace BLOB compression can be switched off using the (currently undocumented) AUTOCOMPRESS configuration parameter. Note that the AUTOCOMPRESS parameter can be modified on the fly.
John (JC) Lengyel
Lead Engineer at HCL
Informix is a trademark of IBM Corporation in at least one jurisdiction and is used under license.