As businesses grow, and the amount of data increases, software developers are faced with a challenging problem — how to store and process more data without losing performance?
The idea that traditional databases are no longer viable for some business cases has dawned upon many. Significant changes are required to maintain the needed capabilities of systems that used to be applicable but have suddenly lost the ability to handle the same processes on time.
And so, we are starting to see a natural shift towards more optimal technologies — the so-called NoSQL databases — designed specifically to handle certain tasks with greater proficiency. Although many would argue that NoSQL DBs limit users greatly in their functionality, the idea to sacrifice some flexibility for performance is the desired effect, not a drawback.
There are several considerations on whether to migrate to NoSQL:
– Data complexity
– Development time
– Business needs
Understanding which of these we can sacrifice, and which are critical for our system, will determine whether NoSQL is needed, and in what way.
· Deciding on a DB
Of course, less functionality means that we need to know which database to utilize in a specific scenario. If our business case suggests looking up a value based on an index, we would use a Key-Value DB. If we need complex objects with nested data, perhaps a Document database would be appropriate. Analyzing the relational properties between objects would infer a Graph database. In fact, there are dozens of different database types, that supply optimizations to specific functionality.
· The drawbacks
Does that mean that if I choose a NoSQL DB, I’ll lose my ability to use relationships, or things such as transactions or stored procedures?
Depending on our choices, we may end up losing important functionality. Some DBs provide support for several of the features we would normally find in an RDBMS. However, they are not optimized to work in a way that makes these options efficient. Let’s take MongoDB for example — it would be much faster to read and write documents to Mongo than to MSSQL. Our storage requirements would be less. Searching for data within the document would be much faster. And we could still create relations between documents by key. But executing complex queries that need to build those relations would result in much slower performance than it would in a relational database.
But then what if we require multiple efficient features from our database? How do we choose a single one?
Well, we don’t necessarily have to. In a case where the requirements are to find certain text within a document and then detect all related stats to those documents, we could use one DB to manage our data search needs to find the text, and another DB to satisfy our stat aggregation needs.
So, does that mean that the complexity of the system is greatly increased?
Not at all. While the infrastructure may need some adjustment to meet the new requirements, our code can remain relatively simple. We just need to start thinking a bit differently and model our data in ways that are more suitable to such an approach.
No more stored procedures, no more ACID, no more transactions, no more guarantees for data with missing relations. How could we possibly expect our data to be consistent and correct? How can we safely manipulate our data?
Everything we’re missing can be managed by our backend in the way we want it — it may be more work, but it also gives us much greater control to execute everything we want exactly the way we want it. We may need to sacrifice more development time to ensure we get the performance benefits of NoSQL DBs. But that mindset comes from a certain complacency — SQL DBs had it all, which meant we didn’t have to do all the extra coding before, therefore we are now wasting time doing it. We still have to do the work in SQL — design schemas, adjust for data types, decide on foreign keys, configure relationships, create stored procedures, account for rollbacks, table locks, dirty reads, erroneous writes, non-unique hashes, and many, many more.
We are not sacrificing that much to be unable to get back in a few lines of code. Throwing away many of the above will result in easier development, and new ways to think about our data.
· Schemas — redesign or start from scratch
Migrating to NoSQL means adjusting not just the functionality of our system, but more importantly — the way the data itself is structured.
And this may come as a surprise to some, however, in NoSQL, we don’t need a schema. We don’t need strict rules to store our data. We don’t need constraints or even tables. There is no requirement to structure data. That doesn’t mean we cannot use these methods. Only that we are not forced to.
That leaves the only obstacle in the migration — how was the old model implemented?
In any case, the schema will require alterations — my advice is to completely rewrite everything to fit the new technology. However, if you’re limited on resources and your old implementation allows for it, you can keep old business logic, and only rewrite some parts of your data transformation processes, and even get away with unoptimized models and a vanilla NoSQL configuration.
However, we still need to migrate the data.
Fortunately, many NoSQL DBs support dedicated migration tools. There are ways to migrate schemas into a contextual structure in some DBs. Tables can be converted into collections. Records can be made into objects, arrays, even complex structures that were previously unavailable.
With the proper tool and a compatible design on the old DB, migrating from SQL to NoSQL could even be a pleasant endeavour. Tools that are sophisticated enough will automatically build all required structures, and transfer all data straight into the new solution, without much more than the press of a button.
Relational databases, which have a more complex model, however, will require some manual power.
That’s where tools, such as Object Mappers come in. By utilizing a Relational Mapper, we can programmatically extract the data from the RDB and transform it into the desired object, that will fit the model. Then, if we were to use a document database as the new destination, a Document Mapper can help us adjust to fit the document DB’s requirements and insert the new object.
Object Mappers enable us to connect to the databases, create the methods we need to: extract the data from the source, transform it so it meets our new model requirements, and load the data into our new solution.
· Making up for missing RDBMS features
There are many factors to consider when migrating to NoSQL, which stem, not just from missing functionality, but from the complete overhaul in the way we think about data in a non-relational, non-ACID way.
The way relationships are implemented in NoSQL databases results in some side effects. When migrating data, a strategy needs to be put in place to identify and handle cases where data may become incomplete or inconsistent. Orphan records, missing data, missing record identity — these are all issues that were taken care of by default in older implementations.
The biggest concern to developers is perhaps the lack of an ACID (Atomicity, Consistency, Isolation, Durability) transactional process. This is indeed the tricky part and the thing that requires the greatest amount of consideration, development time, and code. Keep in mind that to be able to replicate this behaviour, we may need to drastically alter our system.
NoSQL DBs rely on eventual consistency, meaning that data may not be immediately available or correct whenever a process completes. It will only be complete at an unknown time. Therefore, we have no way of knowing whether some data is consistent with the processes we’ve executed on it just by looking at it. This can be overcome by designing code to emulate a transactional process, or by adjusting the system’s architecture to make it event-based to keep track of when and if data is updated. Of course, all these need to be considered before starting the database implementation.
Luckily, some features can be replicated or can exist out-of-the-box (although they are not strictly enforced). Data types, indexes, data structures, security features — can increase productivity and consistency greatly, even if they no longer look the same.
In the end, losing even a few milliseconds on a query may cascade into significant losses down the line. NoSQL DBs give great options in vast areas of data management and should definitely be considered when facing expanding system requirements.