Posted By: Evaldo de Oliveira
Sometimes old technology is not compatible with the new. Consider the historic data your organization holds. You may have decades’ worth of data about customers, their buying patterns, inventory, and seasonal sales variations…the sort of information gold you would love to be mining.
This data was often gathered by legacy applications running on mainframe computers. Alas, these applications may not have been written with ease of access in mind. The data may have been stored in a format that resembles a flat file more closely than a relational database. It may not have conformed to the modern notion of tables with a rigid schema of consistent rows and columns. Developers, pressed for space—a very expensive commodity in the early days of IT—may have crammed dissimilar records into the same file. Directly accessing this data from a SQL application was out of the question.
Many of those applications ran on character-based “green screen” terminals. In some cases, it was easier to collect the data straight from the screen than to upload it from the unstructured data files. To capture the data, a program would emulate a terminal and navigate through the application, reading the data as it appeared on the screen, a process lovingly referred to as “screen scraping.”
Some organizations have written custom programs to access this data. These programs read the legacy data in its native format and reformat it into a more useful structure. Some of these programs use techniques similar to “Extract, Transform, Load” (ETL) to extract the data, transform it into the desired format, and load it into its destination, typically an RDBMS. They may be run nightly, weekly, or monthly to export data from its original source—with the result that the exported data is never fully up-to-date.
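To make the pattern concrete, here is a minimal sketch of such a nightly export in Java. The fixed-width file layout, table, column names, and connection details are all invented for illustration; a real legacy record layout would come from the application’s copybooks.

```java
import java.io.BufferedReader;
import java.math.BigDecimal;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Minimal nightly-export sketch: extract fixed-width records from a
// legacy flat file, transform them into typed columns, and load them
// into a relational table. Layout and connection details are invented.
public class LegacyCustomerEtl {
    public static void main(String[] args) throws Exception {
        // Hypothetical layout: CUST-ID (6), NAME (30), BALANCE (9, implied 2 decimals)
        String url = "jdbc:postgresql://localhost/warehouse"; // illustrative target
        try (Connection con = DriverManager.getConnection(url, "etl", "secret");
             BufferedReader in = Files.newBufferedReader(Path.of("CUSTMAST.DAT"));
             PreparedStatement ins = con.prepareStatement(
                     "INSERT INTO customers (cust_id, name, balance) VALUES (?, ?, ?)")) {
            String rec;
            while ((rec = in.readLine()) != null) {
                // Extract: slice the fixed-width fields out of the record.
                String id   = rec.substring(0, 6).trim();
                String name = rec.substring(6, 36).trim();
                // Transform: COBOL-style implied decimal, e.g. "000123456" -> 1234.56
                long cents = Long.parseLong(rec.substring(36, 45));
                ins.setString(1, id);
                ins.setString(2, name);
                ins.setBigDecimal(3, BigDecimal.valueOf(cents, 2));
                ins.addBatch(); // Load: batch the inserts for efficiency
            }
            ins.executeBatch();
        }
    }
}
```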
Many companies have completely rewritten their applications simply to gain access to their legacy data. This process can be extremely time-consuming and often very costly. Worst of all, it is risky on several fronts.
Just as the information has accumulated over the years, so has the business logic around it. Your applications have gone through numerous generations of upgrades to serve your needs. If they are custom applications written in-house, you have invested time, effort, and money in fine-tuning them. They have undergone years of testing, ensuring that they do what they are supposed to do. Your staff has invested years in learning these applications.
Let’s examine that option, along with a preferable alternative, for addressing legacy data:
- Cut over to a whole new system all at once in a single “big bang”
- Update one module at a time in a “phased migration”
The Big Bang
The first thought that comes to mind when some companies look at legacy applications is to rewrite them. This means a long process of developing, testing, and deploying. The new application—or, in most cases, the new suite of applications—will be written in a shiny new language (often replacing COBOL with Java or C#). The data will be moved to an RDBMS, so it will need to be reformatted to fit a relational schema.
The move to a new relational database will make the data accessible to many modern applications. Unfortunately, it will no longer be accessible to the existing legacy applications. This makes it necessary to cut over to the new database—and the new application—all at once. If the legacy application includes a suite of programs (accounting, G/L, inventory, and so on), they will all need to be updated at the same time—in one “big bang.”
Needless to say, the cost of designing, coding, and testing a suite of new applications is non-trivial. Cutting over to a brand-new version of every mission-critical application at once carries considerable risk. Employee retraining adds yet another expense.
All in all, there is considerable possibility that the big bang will blow up in your face.
Learning Curve: Steep
The Phased Migration

New technology has allowed a new approach. Many legacy applications were written in COBOL. Some may view this language as extinct, but, to paraphrase Mark Twain, the rumors of its death are greatly exaggerated. In fact, more than 75 percent of the world’s business data is still being processed by COBOL applications. Although today’s developers may scoff at this language, 5 billion lines of COBOL code are written each year.
Most COBOL compilers support the ability to load alternate file systems. Micro Focus provides the ExtFS extension, which can be dynamically loaded at runtime. ACUCOBOL-GT allows a program to interface with an external file system simply by setting the DEFAULT_HOST configuration variable. isCOBOL ships with an alternate file system and can be pointed at another through an entry in the iscobol.properties file.
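To give a sense of how small the switch is, here is a sketch of where those settings live. DEFAULT_HOST and iscobol.properties are named above; the isCOBOL property key and both values shown are placeholders for illustration only, so consult your vendor’s manual for working settings.

```
# ACUCOBOL-GT: point file I/O at the external file system in the
# runtime configuration file (value is a placeholder):
DEFAULT_HOST  <external-file-system>

# isCOBOL: select the alternate file handler in iscobol.properties
# (property key and value are assumptions for illustration):
iscobol.file.index=<alternate-file-handler>
```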
All of these COBOL compilers allow the file system to be upgraded without altering the COBOL source code in the original application. The implication is that no changes need to be made in the business logic. Testing is minimized because the application has not changed. Employee retraining is completely eliminated.
If the replacement file system is designed right, it can offer considerable advantages to the application. We have developed a file system based on the same ISAM access method that COBOL uses, making it completely compatible with the COBOL runtime. The result is that no changes need to be made to the program’s source code.
We have greatly enhanced the file system’s core ISAM technology to support relational access through a variety of SQL interfaces. We have also built in some welcome features, such as online transaction processing (OLTP) with fully ACID (atomicity, consistency, isolation, durability) transactions. These are important because COBOL is often used in mission-critical applications where data consistency is mandatory.
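As a simplified sketch of what ACID transactions buy a new module, consider a classic two-step update performed over JDBC. The driver URL, credentials, and table names below are hypothetical; the point is that both updates commit or roll back as a unit.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of an ACID transaction against an ISAM-backed SQL engine.
// The JDBC URL, credentials, and table are assumptions for illustration.
public class TransferFunds {
    public static void transfer(String from, String to, long cents) throws SQLException {
        String url = "jdbc:ctree://localhost:6597/LEGACYDB"; // hypothetical URL
        try (Connection con = DriverManager.getConnection(url, "app", "secret")) {
            con.setAutoCommit(false); // group both updates into one transaction
            try (PreparedStatement debit = con.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE acct_id = ?");
                 PreparedStatement credit = con.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE acct_id = ?")) {
                debit.setLong(1, cents);
                debit.setString(2, from);
                debit.executeUpdate();
                credit.setLong(1, cents);
                credit.setString(2, to);
                credit.executeUpdate();
                con.commit();   // atomicity: both updates become durable together
            } catch (SQLException e) {
                con.rollback(); // on failure, neither update is applied
                throw e;
            }
        }
    }
}
```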
In real-world customer systems, this file system replacement has enhanced the performance and reliability of legacy applications. The addition of SQL access is frosting on the cake.
This approach places modern SQL APIs (including JDBC, ODBC, and ADO.NET) over the legacy data. This achieves the goal of opening up the data to modern reporting, analytical, and Business Intelligence applications, such as Tableau, Oracle BI, GrapeCity, and Crystal Reports.
Remember that the SQL APIs are added with no change to the data. The existing applications can continue to read and write in exactly the way they always have. As new modules are developed, they have full read and write access to the same data.
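For instance, a reporting tool or a newly written module might read the same legacy records through plain JDBC. Here is a minimal sketch, assuming a hypothetical connection URL and sales table; neither comes from any product’s documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Read legacy COBOL data through a standard SQL API. Nothing about the
// underlying ISAM files changes; URL and schema names are illustrative.
public class SeasonalSalesReport {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:ctree://localhost:6597/LEGACYDB"; // hypothetical URL
        try (Connection con = DriverManager.getConnection(url, "report", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT region, SUM(amount) AS total " +
                     "FROM sales WHERE sale_qtr = 'Q4' GROUP BY region")) {
            while (rs.next()) {
                System.out.printf("%-10s %,12.2f%n",
                        rs.getString("region"), rs.getBigDecimal("total"));
            }
        }
    }
}
```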
This implies that there is no urgency to make application changes. Applications can be updated in a phased process, one module at a time. Meanwhile, the existing legacy applications can continue to use the same data that the new modules access.
Compared with the big bang approach, companies reduce the chances of data corruption and loss, greatly cut capital costs, and keep their applications running with no downtime.
Cost: Amortized over time
Learning Curve: Mitigated
A Third Option
With a robust file system and SQL/non-SQL access, some companies are asking, “Why migrate at all?” It could be argued that the file system replacement creates a third option: simply keep the applications your business relies upon…at least until you are ready to change. That means you can wait until you have a solid business case for replacing the old system. When you do make a change, you can take your time to get it right because your legacy system will be performing like a champ.
Learning Curve: None
Companies are starting to learn that it is not necessary to sacrifice their legacy applications merely to be able to access their legacy data. If the problem is the data, the best approach is to focus on the data. After all, we’re all living in the era of the data-driven economy.