Why Data Enrichment Should Be Your First Step After ETL

The first step in a big data initiative is usually offloading data from legacy systems and other software applications into the big data infrastructure of choice, such as Hadoop. That requires a process called ETL, or Extract, Transform, Load. ETL is the technical term for all the small tasks needed to get data off its original source system and into something else. In this case, that means an infrastructure designed for big data. ETL is the beginning of making your data accessible for marketing endeavors like lead generation, lead nurturing, and content that drives conversions.

Offloaded Data is Generally a Mess

There are numerous proprietary systems for getting ETL done so that you can crank up your big data projects. Many are quite good. But even the best leave data, well, a bit dirty. The fact is, ETL isn't an exact science, because every source system and every destination system has its own unique formatting. Without getting into the technical nitty-gritty, suffice it to say that your ETL should always be followed immediately by data enrichment.

The enrichment process ensures that the data is cleansed, completed, and validated as not corrupted, and it usually removes any duplicated data you may have. For example, say you created a spreadsheet for your 2010 budget and saved it as a completely new document each time you worked on it, for a total of about 20 spreadsheets, only the last of which is worth anything. What if each of your company's 2,000 employees did similar things with their documents and files? Eliminating those duplicates through data enrichment not only improves the quality of any analysis you run on the data, it also saves a ton of valuable storage space!
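As a rough illustration of the de-duplication idea in the budget example, here is a minimal Python sketch that keeps only the most recently modified version of each document. The field names (owner, name, modified) and the sample records are assumptions made up for this example; they do not come from any particular ETL or enrichment tool.

```python
from datetime import date


def keep_latest(records):
    """Keep only the most recently modified record for each (owner, name) pair."""
    latest = {}
    for rec in records:
        key = (rec["owner"], rec["name"])
        # Replace the stored record only if this one is newer.
        if key not in latest or rec["modified"] > latest[key]["modified"]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical sample data: two saved copies of the same budget, plus one other file.
records = [
    {"owner": "alice", "name": "budget_2010", "modified": date(2010, 1, 5)},
    {"owner": "alice", "name": "budget_2010", "modified": date(2010, 3, 18)},
    {"owner": "bob", "name": "budget_2010", "modified": date(2010, 2, 1)},
]
deduped = keep_latest(records)
```

Real de-duplication usually needs fuzzier matching than an exact key, but the principle is the same: decide what counts as "the same item" and which copy wins.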

Types of Data That Need Enriching After ETL

Data enrichment is more than just data cleansing and de-duplication. It also involves filling in the gaps with information from other sources, such as people's social media and other online profiles.
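To make "filling in the gaps" concrete, here is a small Python sketch that completes a lead record from a second source without overwriting anything already known. The enrich function, the field names, and the sample values are all hypothetical and chosen only for illustration.

```python
def enrich(record, reference):
    """Fill missing (None or empty-string) fields in record from reference.

    Existing values in record are never overwritten.
    """
    enriched = dict(record)  # copy, so the original record is left untouched
    for field, value in reference.items():
        if enriched.get(field) in (None, ""):
            enriched[field] = value
    return enriched


# Hypothetical lead with gaps, and a hypothetical profile from another source.
lead = {"email": "pat@example.com", "company": "", "job_title": None}
profile = {
    "company": "Example Corp",
    "job_title": "Analyst",
    "email": "other@example.com",
}
enriched_lead = enrich(lead, profile)
```

Treating the original record as authoritative and using the outside source only for blanks is one possible policy; a different project might instead prefer whichever source is more recent.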

Aside from these types of documents and files (text documents, spreadsheets, audio and video files, presentations, etc.), there are several types of data that are particularly in need of data enrichment directly following the ETL process. For example:

Geographic data like postal codes, county names, political districts, etc. tend to change or become outdated relatively quickly. This includes data from things like social media, mobile apps, and customer or marketing databases.
Behavioral data like purchase histories, credit scores, and channels of communication.
Demographic data such as income levels, marital statuses, educational levels, how many children people have, etc.
Psychographic data like a person's hobbies, personal interests, political leanings, etc.
Census data, like household data and data pertinent to a person's community.

Not only do these types of data tend to become outdated rather quickly, but they are also the kinds of data that are often incomplete, inaccurate, or both. For example, what about people who deliberately answer census questions incorrectly to throw off the government? What about people who lie about their income levels or how much education they attained? These types of data are ripe for data enrichment, which can help validate data and fill in the blanks of incomplete data sets.

Data enrichment ensures that you start your big data analytics with the most complete, accurate, and current information possible. To learn more about how ReachForce SmartForms can help you optimize lead generation and improve your impact on revenue, sign up for a free trial and get a demo today.
