Posted By: David Faust
If you are keeping up with the news from the NoSQL marketplace, it seems the latest thing in NoSQL is, well, SQL. This comes as no surprise to those of us who remember that one definition of “NoSQL” is “Not only SQL.” Until recently, NoSQL vendors have been focused on the many important aspects of NoSQL, such as flexibility and scalability. Now they are learning there is a place for traditional SQL in their NoSQL world. Their success in addressing that challenge will determine if NoSQL can evolve into the mainstream.
Exploring the Schemaless
One of the big advantages of NoSQL is it allows organizations, who are faced with an onslaught of new data coming in from new sources, to make sense of this data. Social media, Internet of Things (IoT), and other big data sources are now streaming data by the terabyte. This data can be explored to reveal trends and correlations that may not have been observed before.
When it comes time for consistent, reliable information, it is time to apply a schema.
In a Forbes article with the provocative title Is Your Data Lying To You?,* Oliver Ratzesberger of Teradata suggests that the schemaless world of NoSQL is valuable for new data coming in from new sources. He asserts that once exploration is done, it is important to look at structure:
“The fact is that schema less/free is great for an initial prototype, but once we move past the prototype stage, the lack of schema quickly becomes a governance nightmare.”
Schema Gives Data Its Meaning
We must distinguish between experimental uses of raw data and more formal disciplines. We can glean many insights from unstructured data, but repeatable results require more structure.
In an article titled Database schemas still needed, despite Hadoop and NoSQL pretensions,* David A. Teich offers this valuable advice: “It’s not until some structure is applied that the data can become real information.”
Indeed, for data to be meaningful, we must define what it means. A column of numbers does not have meaning until we define the units. Does an entry of “1” mean “1 hour,” “1 minute,” or “1 second”? If that number is measuring the duration of a customer support call, is it the total time, including the time the customer spent on hold, or is it only the time spent talking to an agent? If we are going to make any definitive statements about the data, we must know these answers.
During the exploration phase, it was fine to look at unknown data and notice vague trends. For example, if we see a correlation between call duration and customer satisfaction, we don’t need to know exactly how duration was measured to sense a trend. It is when we apply more formal analysis that we need to precisely define the data.
In his article, Ratzesberger* argues that an evolution toward schema is necessary for any data product development process:
“Schema is not just a nuisance. It’s not there to be painful, it’s there to control structure and actually reject mismatches along the way.”
A Change in Schema Can Render Your Database Useless
Schema has another important role: it doesn’t just define the data – it provides a lens through which we can examine incoming data to ensure it complies with our model. If someone changes the source of our data, we need to catch that change and react accordingly.
Oliver Ratzesberger* tells the horror story of a project at eBay in the mid-2000s when hundreds of experiments were rendered useless because the incoming data changed without any controls or governance to catch the change.
In Search of a Query Language
Timothy Stephan of Couchbase is talking about the need to have an effective language for querying and analyzing NoSQL data. In an article in DataInformed, he argues that What NoSQL Needs Most Is SQL.* As you may guess from the title, the language he recommends is SQL. His company has gone as far as partnering with Metanautix to allow NoSQL data to be accessed by standard SQL queries.
Of course, the trick the NoSQL world is struggling to master is adapting a structured query language to an unstructured world.
Developers Need More than a Language
In the decade that FairCom has been providing NoSQL/SQL integration, we have seen two types of customers. The groups can be distinguished by asking the question “Which came first, the data or the app?”
Customers who are developing a database for a specific application (“the app came first”) may be able to design their data store around a relational architecture, complete with schema. They are able to achieve the benefits of a rock-solid, engineering-level database that can be accessed through a variety of interfaces, both SQL and NoSQL.
Another set of customers have existing data stores and must design their applications around them (“the data came first”). This group may be accessing non-relational data gathered from diverse sources such as social media, documents, Internet of Things, etc. In some cases, they may need to deal with legacy data that was never intended for SQL access.
FairCom has focused on providing the tools for both sets of developers. All developers can benefit from features that boost data integrity, such as ACID transaction processing. For those accessing non-relational data through relational APIs, we have developed a suite of tools to make the job possible.
Multiple Record Types in the Same Table
Modern NoSQL data often contains a variety of record types in a single table. Big Data, gathered from multiple sources, is an excellent example. This mix of formats is at odds with the relational model, which requires all records in a table to conform to a single schema.
FairCom engineers overcame this problem with a unique Multiple Record Type Table (MRT Table), which provides robust SQL access to this “mixed” data, eliminating the need to reformat your NoSQL data for SQL access.
In an MRT Table, the record structure varies from record to record so there are multiple schemas in a single table. When accessed via SQL, the MRT Table appears as multiple “virtual” tables, one for each schema. The SQL application is unaware of the non-relational structure of the underlying table because it simply sees multiple tables.
SQL Types SDK
Existing NoSQL data and index files may include field types that are incompatible with SQL access. Field types such as times and dates are the most common examples of data that may pose compatibility concerns.
FairCom developed the SQL Types SDK to enable developers to convert non-standard data types to SQL-compliant equivalents on-the-fly. It intercepts certain operations and runs custom code to manipulate record or field data to make it compatible with SQL. All of this happens transparently as the data is being accessed.
ACID-Compliant Transaction Engine
Transaction control is often thought of as part of the relational model. In reality, it does not have to be added at the SQL level. Transaction processing is built into c‑treeACE at the lowest level, making it available to relational and non-relational APIs.
As the NoSQL market addresses the need for data integrity, we see a trend toward integration of SQL and NoSQL features, particularly ACID (Atomicity, Consistency, Isolation, Durability). The need for a transaction to be reflected consistently throughout the system is a property of ACID. The requirement that data cannot be allowed to remain inconsistent for any length of time, however short, is referred to as “immediate consistency.”
Many NoSQL databases sacrifice immediate consistency to make it easier to scale horizontally. To achieve high throughput, they add nodes (horizontal scaling), each with its own copy of the data. The nodes are synchronized after the transactions are over (“eventual consistency”).
Instead of ACID-compliance, those databases offer BASE (Basically Available, Soft-state, Eventually-consistent), which can result in data that is out-of-sync between nodes. This state is acceptable for some data, such as analytical data that will be used for statistics where inconsistencies may average out across many nodes.
By using a NoSQL database that offers ACID transaction processing, you don’t have to settle for “eventual consistency” when you implement a NoSQL solution.
FairCom has developed a long list of features that thrust NoSQL technology squarely into the mainstream of the enterprise:
row-level and key-level locking
triggers and procedures
Because these features are available all the way down to the NoSQL level, developers are able to mix and match any combination of SQL and NoSQL APIs without sacrificing the features required by mission-critical enterprise applications.
The FairCom Perspective
We are delighted to see the industry embracing our worldview. For over a decade, our flagship c‑treeACE database has integrated SQL and NoSQL into a high-performance engineering-level database solution.
With substantial experience in providing an integrated solution, FairCom has focused on providing the tools developers need to fully take advantage of NoSQL data through SQL APIs. Features such as Multi-Record Tables and the SQL Types SDK allow relational access to data that would not otherwise be accessible though SQL.
SQL: Welcome to the world of NoSQL!
- Is Your Data Lying To You? by Oliver Ratzesberger of Teradata
- What NoSQL Needs Most Is SQL by Timothy Stephan of Couchbase
- Database schemas still needed, despite Hadoop and NoSQL pretensions by David A. Teich