Data enrichment is a general term for processes that enhance, refine, or otherwise improve raw data. Together with similar concepts, it helps make data a valuable asset for almost any modern business or enterprise. It also reflects a common imperative: organizations are expected to put this data to proactive use in a variety of ways.
HOW DOES IT WORK?
Data enrichment can work in many different ways, but many of the tools built for it refine data that may contain small errors. For example, a common data enrichment process corrects likely misspellings or typographical errors in a database by running matching algorithms over its values. Following the same logic, data enrichment tools can also add information to simple data tables.
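As a rough illustration of the misspelling case, the sketch below snaps a free-text value to the closest entry in a reference list using Python's standard `difflib` module. The city list, the `correct_city` helper, and the 0.8 similarity cutoff are assumptions made for this example, not part of any particular enrichment product.

```python
from difflib import get_close_matches

# Hypothetical reference list of valid values; a real tool would use a far larger dictionary.
KNOWN_CITIES = ["Chicago", "Houston", "Phoenix", "Philadelphia"]

def correct_city(value, known=KNOWN_CITIES, cutoff=0.8):
    """Return the closest known city, or the original value if nothing is similar enough."""
    by_lower = {name.lower(): name for name in known}
    matches = get_close_matches(value.strip().lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else value

rows = [
    {"id": 1, "city": "Chicgo"},       # likely typo
    {"id": 2, "city": "Huston"},       # likely typo
    {"id": 3, "city": "Springfield"},  # not in the reference list, left untouched
]
cleaned = [{**row, "city": correct_city(row["city"])} for row in rows]
```

Leaving unmatched values unchanged is a deliberate choice in this sketch: a correction step that guesses too eagerly can introduce more errors than it removes.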
Data enrichment can also work by extrapolating data. Through methodologies such as fuzzy logic, engineers can derive more information from a given raw data set. These and similar projects can all be described as data enrichment activities.
HOW CAN IT HELP YOU?
Every organization has its own goals for adding value to its data, but many data enrichment tools do the same basic job: they refine content and documents to weed out errors and inconsistencies. This is something any enterprise can appreciate. From ensuring the accuracy of algorithms to adding new data to tables to correcting typographical or spelling errors, these tools are designed to improve quality across all data fronts. Some processes are more complex than others, though, and require more sophisticated tools. Your organization may be interested in refining and simplifying its data, for example. Or you may plan to migrate all of your data to a new content management system after removing inconsistencies and errors, both to improve quality and to make your raw data easier to access. Whatever your goals, it’s a matter of developing a strategic plan and implementing the right tools to get you there.
TYPES
The most obvious enrichment example is address correction. When you enter your address on some US e-commerce sites, the site corrects it by standardizing the street, city, and state fields and adding the last four digits of the ZIP code. ETL vendors tout many possibilities beyond address correction. One vendor lists the following types of information that can be added, or “augmented”, to a demographics database, presumably from databases that the vendor can provide:
- Geographic: such as postcode, county name, longitude and latitude, and political district
- Behavioral: including purchases, credit risk, and preferred communication channels
- Demographic: such as income, marital status, education, age and number of children
- Psychographic: ranging from hobbies and interests to political affiliation
- Census: household and community data
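Augmentation of this kind usually comes down to looking up extra attributes by a shared key. The sketch below is a minimal illustration, assuming a hypothetical reference table keyed by postcode; the postcodes, county names, and coordinates are placeholders invented for the example, not real vendor data.

```python
# Hypothetical reference table keyed by postcode; all values are placeholders.
GEO_BY_POSTCODE = {
    "00001": {"county": "Example County A", "latitude": 10.0, "longitude": 20.0},
    "00002": {"county": "Example County B", "latitude": 11.0, "longitude": 21.0},
}

def enrich(record, reference=GEO_BY_POSTCODE):
    """Return a copy of record with geographic fields added when the postcode is known."""
    extra = reference.get(record.get("postcode"), {})
    # Fields already present in the record take precedence, so enrichment
    # only fills gaps and never overwrites source data.
    return {**extra, **record}

customers = [
    {"name": "Customer 1", "postcode": "00001"},
    {"name": "Customer 2", "postcode": "99999"},  # unknown postcode: nothing is added
]
enriched = [enrich(customer) for customer in customers]
```

The same pattern would apply to behavioral, demographic, or census attributes, given a reference source that shares a key with the records being enriched.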
Enrichment isn’t limited to demographics. Some data quality tools allow the definition of rules that integrate into the ETL stream for any data source:
- Matching incoming records with existing data, such as identifying which insured member a claim applies to
- Correcting invalid data based on other data in the record, such as correcting an out-of-bounds hand-entered measurement based on an independent automated data feed
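Both kinds of rule can be sketched in a few lines. The example below is only an illustration under stated assumptions: the field names (`member_id`, `name`, `dob`), the matching key of normalized name plus date of birth, and the valid range passed to `correct_measurement` are all hypothetical, and real data quality tools express such rules in their own configuration.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(text.lower().split())

def match_member(claim, members):
    """Return the member_id whose normalized name and date of birth match the claim, else None."""
    index = {(normalize(m["name"]), m["dob"]): m["member_id"] for m in members}
    return index.get((normalize(claim["name"]), claim["dob"]))

def correct_measurement(manual, automated, low, high):
    """Keep an in-range manual value; otherwise fall back to an in-range automated value.

    Returns None when neither value is usable, so the record can be flagged for review.
    """
    if low <= manual <= high:
        return manual
    if automated is not None and low <= automated <= high:
        return automated
    return None

# Hypothetical sample data for illustration only.
members = [
    {"member_id": "M1", "name": "Jane Doe", "dob": "1980-02-14"},
    {"member_id": "M2", "name": "John Roe", "dob": "1975-09-30"},
]
claim = {"name": "  jane   DOE ", "dob": "1980-02-14"}
matched_id = match_member(claim, members)

# A hand-entered value of 980.0 is outside the assumed valid range of 30.0 to 45.0,
# so the automated reading of 37.1 is used instead.
corrected = correct_measurement(manual=980.0, automated=37.1, low=30.0, high=45.0)
```

Returning `None` rather than guessing when both values are out of range keeps the rule conservative; whether to reject, flag, or impute in that case is a policy decision for the organization.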