February 7, 2018
7 Tips for a Fast Data Load
Note: These tips were provided by FairCom Vice President of Engineering Services Randal Hoff.
Once you have c-treeACE integrated into your application, you will be faced with a real-world problem: loading massive amounts of data into your database. This can be a time-consuming task.
The first temptation might be to write a script that inserts your data into the database one record at a time. Fortunately, c-treeACE offers many features that streamline this process. Instead of records trickling in one at a time, your c-treeACE database can be gulping multiple records simultaneously.
In one customer case where we’ve used this process, the time to load billions of records (with several indices) went from approximately 2 weeks to less than 2 days.
The following tips can be used to speed up the process of inserting data into your c-treeACE database:
1. Turn off transaction processing
Transaction processing control can be turned off during the data load. As long as the source data is preserved so that you can start over if a problem occurs, you don’t need transaction processing control for this process.
If you want transaction processing control later, create the data and index files with the TRNLOG file mode active. Once a file has been created with TRNLOG enabled, you can speed up the load by disabling TRNLOG programmatically, as described in the c-treeACE Programmer Reference Guide topic titled Transaction Processing On/Off: http://docs.faircom.com/doc/ctreeplus/29980.htm
Or you can call the cttrnmod program, explained in the topic titled cttrnmod – Change Transaction Mode Utility: http://docs.faircom.com/doc/ctreeplus/49162.htm
Don’t forget to turn transaction processing back on after you have completed the data load. (The topics cited above explain how to do this.)
2. Use SHARED MEMORY protocol
If at all possible, run the data load program on the same machine that hosts the c-treeACE data. This allows the c-treeACE Server to use the shared memory communication protocol, which is much faster than TCP/IP.
If you need to use TCP/IP, run multiple loader threads per CPU core to compensate for the network latency.
3. Use Direct I/O (V11 and later only)
If you are using c-treeACE V11 or later, review the Direct I/O support, which can help when building and working with larger files: http://docs.faircom.com/doc/v11ace/66369.htm
4. Multi-thread the inserts
The next way to boost performance is to use one of the non-relational c-tree APIs, such as the ISAM or c-treeDB API.
If you can break the data coming into the program into multiple chunks, these APIs allow you to take advantage of multi-threading to do the inserts. A good rule of thumb is to use one to two threads for each virtual CPU core.
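The chunk-and-thread approach described above can be sketched as follows. This is a minimal, language-neutral illustration written in Python, not c-tree code: `split_into_chunks`, `load_chunk`, `parallel_load`, and the `insert_record` callback are hypothetical names introduced for this example, and `insert_record` stands in for whatever ISAM or c-treeDB insert call your loader actually makes. It assumes the upper end of the rule of thumb (two threads per core) and that the input can be split into independent chunks.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split records into n_chunks contiguous slices of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(records) or 1))
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def load_chunk(chunk, insert_record):
    """Insert every record of one chunk; return how many were inserted."""
    for record in chunk:
        insert_record(record)  # hypothetical placeholder for the real insert call
    return len(chunk)


def parallel_load(records, insert_record, threads_per_core=2):
    """Load records with one to two worker threads per CPU core."""
    workers = max(1, (os.cpu_count() or 1) * threads_per_core)
    chunks = split_into_chunks(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda c: load_chunk(c, insert_record), chunks)
        return sum(counts)
```

In a real loader, each worker would most likely need its own database session rather than sharing one; that detail is an assumption here and is omitted from the sketch.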
5. Disable indices using CTOPEN_DATAONLY file mode
You can open the table without index support (the CTOPEN_DATAONLY file mode) while you are doing the data load. This gets the data into the data file in the fastest manner and avoids the time it takes to update your indices on the fly.
See Opening a Table in the c-treeDB C API Developer’s Guide: http://docs.faircom.com/doc/ctreedb/23603.htm
6. Insert in batches
With c-treeACE V10 and newer, you can use batch inserts. This is quicker than individual adds because we can maximize the OS packet size and get the maximum amount of data fed into the c-treeACE Server process with each batch call.
Review c-treeDB batches here: http://docs.faircom.com/doc/ctreedb/15406.htm
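To illustrate why batching reduces the number of calls to the server, here is a minimal sketch in Python of grouping records before handing them to an insert routine. It does not use the c-treeDB batch API: `batched`, `load_in_batches`, and the `insert_batch` callback are hypothetical names for this example, and the batch size of 1000 is an arbitrary assumption that you would tune for your record size.

```python
def batched(records, batch_size):
    """Yield successive lists of up to batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def load_in_batches(records, insert_batch, batch_size=1000):
    """Send records in batches; return the number of batch calls made."""
    calls = 0
    for batch in batched(records, batch_size):
        insert_batch(batch)  # hypothetical placeholder for the real batch insert
        calls += 1
    return calls
```

With 2,500 records and a batch size of 1000, this makes three calls instead of 2,500 individual inserts.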
7. “Rebuild” to create indices
Once you have all of the data loaded into the data files, do a rebuild to generate the indices. This is the fastest way to build the indices because you now have all the data in the c-tree data files, so the indices can be built from scratch with a known set of data. To generate your indices, use the function call ctdbRebuildTable discussed here: http://docs.faircom.com/doc/ctreedb/48516.htm
Or you can call the ctrbldif program discussed here: http://docs.faircom.com/doc/ctreeplus/31093.htm
To improve the performance of an index rebuild through the Server, increase these two settings in your ctsrvr.cfg file:
The tips given above should help you complete the data load process in much less time than a single-threaded program using ctdbWriteRecord() inserts.