In our previous posts, the ReachForce development team demonstrated how a CI/CD process works in a serverless environment and how to incorporate additional components into the deployment process using JSON and Node.js. One piece that has traditionally been missing from a CI/CD process, though, is how you handle the data stores, and that is what this post covers.

Database Challenges
Those who have been developing software for a while will recall that databases were late to the party when agile methodologies gained popularity. The reasons for this are varied, but in my experience they include the fact that database administrators tend to be more conservative than other software specialists, the lack of database-focused automation tools, and the fact that databases have to protect the data in addition to the structure. Unlike other specializations, you can't just wholesale replace an executable or other deliverable when you make a change; you have to build everything up incrementally and continually rely on the database being in the state you expect before you can upgrade to the next version. The insidious thing about these challenges is that they feed on each other, so in order to solve one of them, you have to solve them all.
Now that we have identified a few challenges, what do we do to solve them? The first step is to set up a way to easily deploy DDL and DML changes to your database. There are several common options available in both the Free/Open Source Software and paid worlds, and examining them is beyond the scope of this article. I have personally used Redgate, FlywayDB, and Liquibase, often in conjunction with an automation product like Jenkins. Once that is set up, you have a way to easily deploy all of your changes to every environment.
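The core idea behind these migration tools is simple: apply versioned change scripts in order and record which ones have already run, so the same process is safe to execute against any environment. Below is a minimal sketch of that idea; the migration contents and the in-memory "applied versions" list are hypothetical stand-ins for real SQL files and the schema-history table a tool like Flyway or Liquibase maintains in the database.

```javascript
// Hypothetical versioned migrations, ordinarily stored as SQL files.
const migrations = [
  { version: 1, name: "create_customers",
    sql: "CREATE TABLE customers (id INT PRIMARY KEY, email VARCHAR(255));" },
  { version: 2, name: "add_customer_name",
    sql: "ALTER TABLE customers ADD COLUMN name VARCHAR(255);" },
];

// In a real tool, appliedVersions is read from a history table in the
// target database, which is what keeps every environment in a known state.
function migrate(migrations, appliedVersions, execute) {
  const pending = migrations
    .filter((m) => !appliedVersions.includes(m.version))
    .sort((a, b) => a.version - b.version); // always apply in version order
  for (const m of pending) {
    execute(m.sql);                  // run the DDL against the database
    appliedVersions.push(m.version); // record it so it never runs twice
  }
  return pending.map((m) => m.version);
}

// First run applies both migrations; a second run applies nothing.
const applied = [];
const ran = [];
migrate(migrations, applied, (sql) => ran.push(sql));
```

Because a second run with the same history is a no-op, the migration step can be wired into every deployment unconditionally, which is what makes the process cheap enough that engineers will actually use it.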
Now comes the hardest part - you have to enforce discipline and not allow engineers to manually make changes to the databases, as that can cause the automated database upgrade process to fail. Software engineers don't like things that are complicated or time consuming, so the automated process must be easy to run and keep everything up to date with minimal effort on the part of the database team. And we have to write our update scripts in such a way that they maintain the integrity of the data - think alter tables instead of drop and recreate. Once this process is set up and running correctly, our databases are in a known state, which allows us to do other fun things with the data. Unfortunately, in my experience this is the easy part. Now it is time to move data around and make business decisions based on that data.
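To make the "alter instead of drop and recreate" point concrete, here is a hypothetical example of renaming a column without losing any rows: expand first, backfill, and only then remove the old column once nothing reads it. The table and column names are invented for illustration; each step would be its own versioned migration.

```javascript
// Data-preserving rename of customers.email to contact_email,
// split into three forward-only migrations (hypothetical schema).
const renameColumnSteps = [
  "ALTER TABLE customers ADD COLUMN contact_email VARCHAR(255);", // expand
  "UPDATE customers SET contact_email = email;",                  // backfill existing rows
  "ALTER TABLE customers DROP COLUMN email;",                     // contract, after code stops reading `email`
];

// The naive alternative is a single destructive rewrite that loses every row:
const destructiveSteps = [
  "DROP TABLE customers;",
  "CREATE TABLE customers (id INT PRIMARY KEY, contact_email VARCHAR(255));",
];
```

The multi-step version also leaves the database usable at every intermediate state, so an application release can land between any two of the migrations.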
As helpful and time saving as running a database as a service is (be it MySQL, Oracle, Redshift, or Mongo), the solutions for keeping those databases up to date in AWS are essentially identical to the way we would do it if we were managing the database directly on bare metal. The more challenging and interesting processes come in when we start needing to move data around and answer business questions based on it. AWS allows for the centralized management of permissions, which simplifies interconnecting the various resources that process your data and draw conclusions from it, e.g., moving data from our transactional database to a data warehouse. The next step is to write the code that does the movement and processing, and AWS supports this step with a wide variety of tools. The most data-specific of these is Data Pipeline, though it works best in a batch environment; adding Kinesis-based tools can facilitate streaming processing of data. With both of those (as well as other AWS tool sets), we can export the code in JSON format, which then allows the environment to be created automatically through your automation engine (e.g. Jenkins). It is even possible to separate the code from some variables and store them in different files, so you have one set of files containing all of the needed code and a separate "values" file holding things that are environment specific, such as database connection strings or S3 paths.
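The code/values split can be sketched as follows. The placeholder style loosely follows AWS Data Pipeline's `#{parameterName}` parameter syntax, but the definition, the parameter names (`myS3OutputPath`, `myDbConnection`), and the resolver function are all hypothetical illustrations, not the service's actual API.

```javascript
// One exported "code" file, shared by every environment (hypothetical shape).
const pipelineDefinition = {
  objects: [
    { id: "CopyActivity", output: "#{myS3OutputPath}" },
    { id: "SourceDb", connectionString: "#{myDbConnection}" },
  ],
};

// One "values" file per environment; only this file changes between
// dev, staging, and production.
const prodValues = {
  myS3OutputPath: "s3://example-bucket/warehouse/",
  myDbConnection: "jdbc:mysql://prod-db.example.com/sales",
};

// Recursively replace #{name} placeholders with the environment's values,
// leaving unknown placeholders untouched.
function resolve(node, values) {
  if (typeof node === "string") {
    return node.replace(/#\{(\w+)\}/g, (whole, name) =>
      name in values ? values[name] : whole
    );
  }
  if (Array.isArray(node)) return node.map((n) => resolve(n, values));
  if (node && typeof node === "object") {
    return Object.fromEntries(
      Object.entries(node).map(([k, v]) => [k, resolve(v, values)])
    );
  }
  return node;
}

const prodPipeline = resolve(pipelineDefinition, prodValues);
```

An automation engine like Jenkins can then run the same resolution step against each environment's values file before handing the resolved definition to the AWS APIs.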
Now the tearing down and rebuilding of an environment can be automated with no user intervention. In addition, it can happen rapidly, enabling additional tests and assurances that what is going to production is exactly what has been deployed in the lower environments. CI/CD now includes data and its processing.