< All Blogs
minute read

7 Tips for a Fast Data Load

Blog isometric illustration
Note: These tips were provided by FairCom Vice President of Engineering Services Randal Hoff.

Once you have FairCom DB integrated into your application, you will be faced with a real-world problem: loading massive amounts of data into your database. This can be a time-consuming task.

The first temptation might be to write a script that inserts your data into the database one record at a time. Fortunately, FairCom DB offers many features that streamline this process. Instead of records trickling in one at a time, your FairCom DB database can be gulping multiple records simultaneously.

In one customer case where we’ve used this process, the time to load billions of records--with several indices--went from approximately two weeks to less than two days.

The following tips can be used to speed up the process of inserting data into your FairCom DB database:

1. Turn off transaction processing

Transaction processing control can be turned off during these steps. Assuming you have the data preserved where you can start over in the event of a problem, you don't need transaction processing control for this process.

If you desire to have transaction processing control down the road, then ensure you create the data and index files with TRNLOG file mode active. Once you create the file initially with TRNLOG enabled, you can speed up the operations by disabling TRNLOG programmatically as indicated in the FairCom DB Programmer Reference Guide topic titled Transaction Processing On/Off: http://docs.faircom.com/doc/ctreeplus/29980.htm

Or you can call the cttrnmod program, explained in the topic titled cttrnmod - Change Transaction Mode Utility: http://docs.faircom.com/doc/ctreeplus/49162.htm

Don’t forget to turn transaction processing back on after you have completed the data load. (The topics cited above explain how to do this.)

2. Use SHARED MEMORY protocol

If at all possible, run the data load program on the same machine hosting the FairCom DB data. This will allow the FairCom DB Server to use the shared memory communication protocol which is much faster than TCP/IP.

If you need to use TCP/IP, increase the number of threads loading data to multiple threads per CPU core to compensate for the network latency.

3. Use Direct I/O (V11 and later only)

When using FairCom DB V11 and later, please review Direct I/O support. This will provide some help when building and working with larger files: http://docs.faircom.com/doc/v11ace/66369.htm

4. Multi-thread the inserts

The next way to boost performance is to use one of the non-relational c-tree APIs, such as the ISAM or FairCom DB API.

If you can break the data coming into the program into multiple chunks, these APIs allow you to take advantage of multi-threading to do the inserts. A good rule of thumb is to use one to two threads for each virtual CPU core.

5. Disable indices using CTOPEN_DATAONLY file mode

You can drop the index support when you are doing the data load. This will get the data into the data file in the fastest manner and will avoid the time it takes to update your indices on the fly.

See Opening a Table in the FairCom DB C API Developer's Guide: http://docs.faircom.com/doc/ctreedb/23603.htm

6. Insert in batches

With FairCom DB V10 and newer, you can use batch inserts. This is quicker than individual adds because we can maximize the OS packet size and get the maximum amount of data fed into the FairCom DB Server process with each batch call.

Review FairCom DB batches here: http://docs.faircom.com/doc/ctreedb/15406.htm

7. “Rebuild” to create indices

Once you have all of the data loaded into the data files, do a rebuild to generate the indices. This is the fastest way to build the indices because you now have all the data in the c-tree data files, so the indices can be built from scratch with a known set of data. To generate your indices, use the function call ctdbRebuildTable discussed here: http://docs.faircom.com/doc/ctreedb/48516.htm

Or you can call the ctrbldif program discussed here: http://docs.faircom.com/doc/ctreeplus/31093.htm

To improve the performance of an index rebuild through the Server, increase these two settings in your ctsrvr.cfg file:

MAX_HANDLES http://docs.faircom.com/doc/ctserver/52143.htm

SORT_MEMORY http://docs.faircom.com/doc/ctserver/27977.htm

The tips given above should help you complete the data load process in much less time than a single-threaded program using ctdbWriteRecord() inserts.

Written by:
No items found.
Last Update:
March 22, 2024
Tags:
FairCom DB