Grant Fritchey of scarydba.com is a Data Platform MVP, author of multiple books, international speaker and active blogger.
We recently asked Grant for his insight on the challenges facing organizations today in data management. Here's what he shared:
Can you tell us about your background and interest in data management? How did you become an expert in your field?
I started working on databases at the very beginning of my career in IT. I was doing typing to augment my income while I worked in the film industry in New York. One morning I was asked if I knew enough about databases to help the company build one. I said "Sure" and went to the book store. I read all weekend and I knew enough to get started. From there, though, it was another 10 years of work, primarily in development, before I really knew enough to transition to being a full-time data professional.
What are some of the biggest challenges or oversights facing organizations in data management today?
I would say there are two giant issues in front of everyone today. First up, they can see this whole thing around data analysis and data science and they can see it working. Unfortunately, they either don't know how to get it implemented within their organization, or their datasets are so disjointed and dirty that they can't successfully make it work. This leads to my second giant issue, and that is that there are not enough people who both understand the business and understand how to manipulate data in order to arrive at analysis that truly makes a positive impact on the bottom line.
What advice do you find yourself repeating over and over again to organizations about how they manage data?
Stick to the fundamentals. Everyone sees the new shiny stuff being done, sometimes successfully, sometimes not, and they all want in on it. However, not all of us have data that works well in unstructured or semi-structured formats. Sometimes, the old-fashioned relational database actually is the right answer. Further, designing proper data structures to ensure data integrity leads to cleaner data which leads to successful implementation of data analysis. To do data science well, you have to have the fundamentals first.
What are your best practices for building and managing a database?
That will be truly difficult to answer meaningfully here. We have to talk about so much. Design, backups, high availability, disaster recovery, structured data, unstructured data, semi-structured data, transaction volume, throughput, latency, data loss. There's just too much to mention. However, I'll go back to the fundamentals. Just because you've moved to the cloud (which I advocate for pretty vigorously) doesn't mean you're done in terms of high availability and disaster recovery. Everyone knows of the recent outage with AWS on the Eastern Seaboard of the U.S. because of the companies and services that were offline. However, a lot more companies were affected by that outage than you know about. Why don't you know about them? Because they set up their services in more than one data center or even in more than one service. When the one data center went down, they failed over in a classical DR scenario and just kept going. It's always about the fundamentals.
How should organizations approach backing up their data? What are the dos and don'ts? What are the most common oversights and mistakes you see organizations making with data backup?
You must have backups in place and you must test your backups if you care about your data. The single best way to test backups is to restore the database. I'd say this applies to everyone, whether you're a mom and pop grocery store or the best funded startup in Silicon Valley. If you can't restore your database in the event of a catastrophe, your business may be finished. Far too many people are relying on some type of replication or mirroring, or even on a RAID array, instead of having tested backups. Then corruption hits their database or the server goes away and suddenly they no longer have a database. Do the backups and do the testing or risk utter failure.
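As a small illustration of "test your backups by restoring them," here is a minimal Python sketch using SQLite from the standard library. It is not taken from the interview: the `orders` table, the file names, and the helper names `backup_database`, `verify_restore`, and `demo` are all hypothetical, and a production system such as SQL Server would use its own backup and restore tooling. The idea is the same, though: take the backup, restore it somewhere else, and check that the restored copy is usable.

```python
import sqlite3
import tempfile
from pathlib import Path


def backup_database(source_path: str, backup_path: str) -> None:
    """Copy a SQLite database to backup_path with the online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()


def verify_restore(backup_path: str, expected_rows: int) -> bool:
    """Restore the backup into a scratch database and check that it is usable."""
    with tempfile.TemporaryDirectory() as scratch:
        restored_path = str(Path(scratch) / "restored.db")
        backup = sqlite3.connect(backup_path)
        restored = sqlite3.connect(restored_path)
        try:
            # The "restore": copy the backup file into a brand new database.
            backup.backup(restored)
            # Structural check, then a check against data we expect to find.
            integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
            row_count = restored.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        finally:
            restored.close()
            backup.close()
    return integrity == "ok" and row_count == expected_rows


def demo() -> bool:
    """Build a tiny hypothetical database, back it up, and test the restore."""
    with tempfile.TemporaryDirectory() as workdir:
        live = str(Path(workdir) / "live.db")
        copy = str(Path(workdir) / "backup.db")
        conn = sqlite3.connect(live)
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)")
        conn.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,), (3.25,)])
        conn.commit()
        conn.close()
        backup_database(live, copy)
        return verify_restore(copy, expected_rows=3)


if __name__ == "__main__":
    print("restore test passed" if demo() else "restore test FAILED")
```

A real restore test would typically run on a schedule against a separate server and compare more than a row count, but even a check this small catches a backup file that cannot be opened.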
How can organizations cleanse their data more effectively and efficiently? Why is this an important part of data management?
The best way to clean your data is to correctly design your database up front. Ensure only numbers get into number columns and dates go into date columns and you're going to be a lot happier. Now, that advice doesn't work well if you're looking at an IoT process that's collecting telemetry data from a hundred thousand different locations at once. That's a case where completely unstructured data is the way to go for data collection. However, when you start consuming that data, again, we're back to putting numbers where numbers go and dates where dates go. Get that right and the amount of cleaning you do after the fact is radically reduced. Want to reduce it further? Where appropriate, use relational databases. They've been designed and functioning for years to ensure that correct values are stored by enforcing the relationships between tables, thereby helping to keep the data clean. After that, there are services that will help you clean your data, especially stuff like addresses and spatial data. Getting that stuff right is worth the extra overhead.
The reason you're going to clean your data is that you want it to accurately reflect the business. With this accuracy, you can move into analysis and arrive at insights that can make your business billions of dollars.
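To make "numbers where numbers go and dates where dates go" concrete, here is a minimal Python sketch using SQLite from the standard library. The schema, the table names `customers` and `orders`, and the helpers `open_database` and `try_insert` are hypothetical, chosen only for illustration. SQLite is loosely typed by default, so this sketch adds explicit CHECK constraints; other relational databases enforce column types more strictly out of the box.

```python
import sqlite3

# Hypothetical schema: the constraints, not the application, keep bad rows out.
SCHEMA = """
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    quantity    INTEGER NOT NULL
                CHECK (typeof(quantity) = 'integer' AND quantity > 0),
    order_date  TEXT NOT NULL
                CHECK (order_date IS date(order_date))
);
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with one known customer."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Example Grocer')")
    return conn


def try_insert(conn: sqlite3.Connection, customer_id, quantity, order_date) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, quantity, order_date) VALUES (?, ?, ?)",
            (customer_id, quantity, order_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = open_database()
    print(try_insert(db, 1, 3, "2024-03-15"))       # well-formed row
    print(try_insert(db, 1, "five", "2024-03-15"))  # text in a number column
    print(try_insert(db, 1, 3, "15/03/2024"))       # not an ISO date
    print(try_insert(db, 99, 3, "2024-03-15"))      # customer does not exist
```

Only the first row satisfies every constraint. The other three are rejected at insert time, which is cleaning that never has to happen after the fact.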
What are your go-to tools and resources for smarter data management?
I'm absolutely prejudiced here. I work for a software company, Redgate Software, that specializes in tools for the DBA and the database developer. We have management for the complete lifecycle of your database. We're focused on managing and automating your database builds and deployments in a DevOps process. This means you can deliver changes into production faster and safer.
That said, the one tool that I think people are under-utilizing, at least within the Windows space where I do most of my work, is PowerShell. You should be automating as much as humanly possible in order to do more, faster. PowerShell is the way to get that done. Oh, and by the way, many of the Redgate tools work extremely well with PowerShell.
What types of training do you think more data managers should have?
I think more people should have the one thing that I don't have enough of: training in statistics. The data we're working with may be extremely accurate information from health care, banking or manufacturing. However, the way it gets maintained within most database systems, especially with regard to query optimization, is through statistics. To better understand how this works, you need a good grounding in the math. Further, more and more companies are moving further into data analysis and data science. The foundation for these processes is, again, statistics. You'll be able to better serve your business if you speak and understand the language of analysis. That starts with statistics.
What trends or innovations in database management are you following today? Why do they interest you?
If you can't tell from the rest of the interview, it's data analysis. Now, I'm not going to become a data scientist at my age. However, I can learn enough about what they do to better support them. I can get them better structures and more appropriate access. I can learn to tune queries for their particular needs. There's a lot there.
My second passion, though, is the cloud. I think the data professional's job is changing, and a lot of it is because of services like Azure SQL Database or Azure SQL Data Warehouse. These take most of the more mundane tasks, such as deciding where to put a disk, away from the DBA. Instead, we get to focus on the exciting stuff like database design, query tuning and similar functionality. Also, as I mentioned earlier, moving to the cloud doesn't change some of the fundamentals. We'll still need to configure our DR to work successfully with the cloud, and deal with similar issues. However, the work is very different and there's so much to learn. That's why I'm very excited about it. I truly enjoy learning new skills and sharing them with others.