Imagine trying to cross the ocean with a boat that has holes in it. You’ll get wet. You might even sink. You certainly won’t make it across smoothly.
The chances of this happening are quite small, as any sensible person would thoroughly check their boat before embarking on such an endeavor.
But what about the CRM data your business uses to contact leads, segment customers, and make strategic decisions? Do you ever check if that has holes in it?
Dirty data negatively affects workflows, marketing efforts, and your customers’ experience. It can even get you into legal trouble.
But what exactly is dirty data?
Dirty data, or unclean data, is data that is in some way faulty: it might contain duplicates, or be outdated, insecure, incomplete, inaccurate, or inconsistent. Examples of dirty data include misspelled addresses, missing field values, outdated phone numbers, and duplicate customer records.
When ignored, dirty data can cause serious issues for your business. It can jeopardize the customer experience, lead to the misrepresentation of business results, and negatively impact strategic decisions.
Data can get dirty when it’s entered, stored, or used incorrectly. Oftentimes, this comes down to human error or a lack of standardization rules for data entry, but technical issues can also lead to dirty data.
Duplicate data refers to records that partially or fully share the same information. Duplicates appear when the same information is entered multiple times, sometimes in different formats. A typical example is a single customer who exists in your CRM multiple times, often because their name was written slightly differently each time it was entered.
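To make the "slightly different name" problem concrete, here is a minimal sketch of how a script might flag likely duplicates. It uses Python's standard `difflib.SequenceMatcher` to score name similarity. The `(record_id, name)` tuple format, the `likely_duplicates` function, and the 0.85 cutoff are all hypothetical choices for illustration, not a prescribed method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name: str) -> str:
    # Lowercase and collapse whitespace so "Jane  Doe" and "jane doe" compare equal
    return " ".join(name.lower().split())


def likely_duplicates(records, threshold=0.85):
    """Return (id_a, id_b, score) for record pairs whose names look alike.

    `records` is a hypothetical list of (record_id, name) tuples, and
    `threshold` is an assumed cutoff you would tune on your own data.
    """
    pairs = []
    # Compare every pair of records (fine for small lists; see note below)
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        score = SequenceMatcher(
            None, normalize_name(name_a), normalize_name(name_b)
        ).ratio()
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


records = [(1, "Jonathan Smith"), (2, "jonathon  smith"), (3, "Maria Garcia")]
print(likely_duplicates(records))
```

Comparing every pair grows quadratically with the number of records, so dedicated tools typically narrow the candidates first (for example, by only comparing records that share an email domain or postal code) before scoring them.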
Because customer information is scattered across different records, duplicate customer data leads to:
Insecure data is data that is not encrypted or protected by access controls. It may be accessible to anyone in your company and, in the worst case, even to third parties. Insecure data is not just a privacy risk but also a legal one, as companies that fail to protect it risk non-compliance with laws such as GDPR and CCPA.
Incomplete data is data with missing values. For example, your newsletter sign-up form might have a field for the lead’s first name that isn’t required. Leads can then sign up without giving their name, which makes your personalized email campaigns less effective.
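One way to prevent this is to validate form submissions before they reach your CRM. The sketch below assumes a hypothetical form that submits a plain dictionary with `email` and `first_name` keys. It treats a field as missing if it is absent or contains only whitespace.

```python
# Hypothetical field names; replace with the fields your own form collects
REQUIRED_FIELDS = ("email", "first_name")


def missing_required_fields(form_data, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a submission."""
    missing = []
    for field in required:
        value = form_data.get(field)
        # A value of only spaces is as useless as no value at all
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


submission = {"email": "lead@example.com", "first_name": "   "}
print(missing_required_fields(submission))
```

A sign-up handler would reject the submission (and show the lead an error) whenever this list is non-empty, so incomplete records never get stored in the first place.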
Inaccurate data is data that contains mistakes. For example, a customer might make a typo when entering their last name on one of your forms. You have the customer’s last name, but it’s wrong, so the record is dirty.
Another example would be a sales representative logging an incorrect phone number for a lead in Salesforce. Until that number is corrected in Salesforce, your team can’t continue the conversation with this lead.
Outdated data is inaccurate not because it was entered incorrectly, but because it was once accurate and no longer is. A typical example is a CRM that still lists a customer’s old address after they’ve moved.
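Because outdated data looks perfectly valid, it usually can’t be caught by format checks. A common workaround is to track when each record was last confirmed and flag anything that hasn’t been checked in a while. The sketch below assumes a hypothetical `last_verified` date field on each record and an arbitrary one-year cutoff. Your CRM may store this information differently, or not at all.

```python
from datetime import date, timedelta


def stale_records(records, today, max_age_days=365):
    """Return the IDs of records not verified within `max_age_days`.

    Each record is assumed to be a dict with hypothetical "id" and
    "last_verified" (datetime.date) keys.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["last_verified"] < cutoff]


records = [
    {"id": 1, "last_verified": date(2022, 1, 10)},
    {"id": 2, "last_verified": date(2024, 3, 1)},
]
print(stale_records(records, today=date(2024, 6, 1)))
```

Flagged records are candidates for re-verification (for example, asking the customer to confirm their address), not for automatic deletion.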
Other examples of outdated data are:
Incorrect data is data that falls outside of previously specified parameters. Because those parameters are known in advance, this type of dirty data is easier to prevent. For example, if customers enter their birthdate using dropdown menus, your system will likely only let them select one of 12 months and one of 31 days. It may also stop them from selecting a birth year that would make them older than 130.
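The same parameters can be enforced on the server side, which also catches bad values arriving through imports or integrations rather than the dropdown. This sketch uses Python’s `datetime.date`, which already rejects impossible combinations such as month 13 or February 30. The 130-year limit comes from the example above, and the `validate_birthdate` function is hypothetical.

```python
from datetime import date

MAX_AGE_YEARS = 130  # upper bound from the dropdown example


def validate_birthdate(year, month, day, today):
    """Build a birthdate, raising ValueError if it falls outside the allowed range."""
    # date() raises ValueError for month 13, day 32, February 30, and so on
    birthdate = date(year, month, day)
    if birthdate > today:
        raise ValueError("birthdate is in the future")
    # Subtract one year if this year's birthday hasn't happened yet
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    age = today.year - birthdate.year - (0 if had_birthday else 1)
    if age > MAX_AGE_YEARS:
        raise ValueError(f"age {age} exceeds {MAX_AGE_YEARS}")
    return birthdate


print(validate_birthdate(1990, 5, 17, today=date(2024, 1, 1)))
```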
Inconsistent data is usually the result of data redundancy: it occurs when companies store the same information in different places without syncing it. A prime example is a company storing customer information in both its CRM and its email marketing tool, so that an update made in one system never reaches the other.
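Until the two systems are properly synced, a periodic comparison can at least surface where they disagree. The sketch below assumes you can export both systems’ contacts as hypothetical dictionaries keyed by email address. It reports every shared contact whose chosen fields differ between the two systems.

```python
def find_inconsistencies(crm, email_tool, fields=("first_name", "last_name")):
    """List (email, field, crm_value, email_tool_value) for mismatched fields.

    `crm` and `email_tool` are assumed to map email -> dict of field values.
    Only contacts present in both systems are compared.
    """
    mismatches = []
    for email in crm.keys() & email_tool.keys():
        for field in fields:
            crm_value = crm[email].get(field)
            tool_value = email_tool[email].get(field)
            if crm_value != tool_value:
                mismatches.append((email, field, crm_value, tool_value))
    # Sort for stable, readable output
    return sorted(mismatches)


crm = {"ana@example.com": {"first_name": "Ana", "last_name": "Silva"}}
email_tool = {
    "ana@example.com": {"first_name": "Ana", "last_name": "Souza"},
    "bo@example.com": {"first_name": "Bo", "last_name": "Lee"},
}
print(find_inconsistencies(crm, email_tool))
```

A report like this tells you where the systems disagree, but not which value is correct. Someone still has to decide which system is the source of truth for each field.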
All of the above types of dirty data create risks for your company, so cleaning data and avoiding these situations is crucial.
Here’s how to get your data from dirty to clean:
Before you start cleaning your data, define what a clean data set looks like for your company and which best practices should be followed to keep your data as clean as possible.
Having a data quality strategy includes defining a way to standardize data as soon as it enters your system. List all the ways you are gathering data right now, what the points of entry are for that data, and how you’ll ensure that all of that data is input in the same way, regardless of the point of origin.
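In practice, "input in the same way" often means running every incoming contact through one shared normalization function, whether it came from a web form, a CSV import, or a sales rep. The sketch below is one hypothetical set of rules for illustration, not a standard. It lowercases emails (on the assumption that your email providers treat addresses case-insensitively, as most do), strips phone numbers down to digits plus an optional leading `+`, and collapses extra spaces in names.

```python
import re


def standardize_contact(raw):
    """Apply one set of formatting rules to a contact, whatever its point of entry.

    `raw` is assumed to be a dict of strings with hypothetical
    "email", "phone", and "name" keys.
    """
    return {
        # Trim and lowercase so "Jane@Example.com" and "jane@example.com" match
        "email": raw.get("email", "").strip().lower(),
        # Drop every non-digit except a leading "+", so "(555) 123-4567"
        # and "555.123.4567" are stored identically
        "phone": re.sub(r"(?!^\+)\D", "", raw.get("phone", "").strip()),
        # Collapse runs of whitespace inside names
        "name": " ".join(raw.get("name", "").split()),
    }


print(standardize_contact({
    "email": "  Jane.Doe@Example.COM ",
    "phone": "(555) 123-4567",
    "name": "Jane   Doe",
}))
```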
Once you’ve established your company’s data quality rules and are sure that all new data will be entered in a standardized way, it’s time to perform an audit of your existing data. Unfortunately, finding all dirty data is not easy, and while you should aim for 100 percent detection, know that you’re likely to miss some issues. That’s why it’s important to do an audit not just once, but regularly.
One way to make this process easier is to continuously gather feedback from the various departments within your company that work with data. This type of feedback shows you where dirty data is causing issues in day-to-day activities.
An example: your marketing team reports that first names in personalized emails are sometimes not capitalized. This tells you that first-name values are not always formatted consistently, probably because email subscribers don’t always bother to capitalize their own names.
Once you have an overview of your dirty data, start the cleaning process. Data cleansing can be a grueling, time-consuming task, and there are several ways to go about it, each with its own pros and cons.
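Once an issue like this is spotted, the fix is often a small rule. The sketch below (a hypothetical `fix_first_name_case` helper) only touches names typed entirely in lowercase. Blindly title-casing every name would damage deliberate capitalization such as "DeShawn" or "McKenzie". It capitalizes each space- or hyphen-separated part, so "anna-lena" becomes "Anna-Lena".

```python
def fix_first_name_case(name):
    """Capitalize an all-lowercase first name, leaving other names as typed."""
    cleaned = " ".join(name.split())  # also trims stray whitespace
    # Only fix names with no capitals at all; mixed case is likely intentional
    if not cleaned.islower():
        return cleaned
    return " ".join(
        "-".join(part[:1].upper() + part[1:] for part in word.split("-"))
        for word in cleaned.split(" ")
    )


for raw in ["jane", "anna-lena", "mary ann", "DeShawn"]:
    print(raw, "->", fix_first_name_case(raw))
```

Names typed entirely in capitals are left unchanged by this rule. Whether to fix those too is a policy decision worth writing into your data quality rules.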
Manual cleaning should be used sparingly. It’s fine to clean up a record you need to use right now, but manually cleaning all of the data your company owns is an impossible task.
Not only would it take forever, but you’re also bound to miss things and make mistakes, causing even more errors.
Using Excel formulas can speed up the cleaning process, but it’s still quite manual. You need to build the formulas yourself, and some data issues might be too complicated to solve with an Excel formula.
On top of that, Excel isn’t built for massive data sets: a single worksheet tops out at 1,048,576 rows, so you’d have to work in chunks and keep track of which ones you’ve already cleaned.
Lastly, Excel only works on static snapshots of your data. If you import customer data on Monday, it’s likely already outdated by Friday.
If you don’t want to allocate internal time to your data cleanse, hiring a data consultant can be a good option. Data consultants are specialists who do more than just clean up your dirty data. They can also run an audit for you and help improve your existing data processes so there’s less chance of dirty data being created in the future.
The downsides to hiring consultants include the high costs and the fact that you’ll likely have to give them access to all of your data, which may raise privacy concerns.
As data management is an ongoing project, you could hire one or more developers who dedicate themselves fully to keeping your data clean. Since these people will work in-house, they’ll likely be more loyal to your company than an outside consultant would be, and they’ll become more familiar with your offering.
Plus, for an ongoing project such as data maintenance, an in-house hire is often cheaper than paying an outside consultant over the long term.
There’s a variety of tools out there that help you identify and clean dirty data. These tools are often cheaper than hiring a consultant or a dedicated developer, and they aren’t prone to human errors such as typos.
However, not all of these tools are created equal. Pick one that can spot data mismatches, check formatting (of dates, for example), and recognize which fields to merge.
You’ll also want to run a few tests on small data samples to make sure the tool works the way it’s supposed to. If you don’t do this and let it loose on your entire database, you risk ending up with larger problems than you started with.
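If your tool or script supports it, a "dry run" is a simple way to do this: apply the cleaning rules to a random sample, review what would change, and write nothing back. The sketch below assumes records are dictionaries and that `clean` is any function returning a cleaned copy of a record. Both the `dry_run` helper and the sample size of 100 are hypothetical.

```python
import random


def dry_run(records, clean, sample_size=100, seed=0):
    """Return (before, after) pairs for sampled records that `clean` would change.

    Nothing is written back: `clean` only ever receives copies.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    sample = rng.sample(records, min(sample_size, len(records)))
    changes = []
    for record in sample:
        cleaned = clean(dict(record))  # shallow copy protects the original
        if cleaned != record:
            changes.append((record, cleaned))
    return changes


records = [{"email": "A@X.COM"}, {"email": "b@x.com"}]
lowercase_email = lambda r: {**r, "email": r["email"].lower()}
for before, after in dry_run(records, lowercase_email):
    print(before, "->", after)
```

Reviewing these pairs by hand before a full run makes it much easier to spot a rule that is merging or overwriting things it shouldn’t.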
Hopefully, you already have database management in place. If not, it’s high time to set it up. While you’ll likely need to clean your data at regular intervals, it’s bad practice to let issues build up until they undermine the overall quality of your database.
As a company, you are constantly gathering, organizing, storing, and manipulating new data. Ongoing data management includes the processes and practices needed to safeguard the quality of that data and prevent it from getting dirty.
With the volume of data companies gather and handle nowadays, it’s practically impossible to keep all of it from getting dirty. Different types of dirty data have different consequences for your business, so you’ll want to clean records on a regular basis to keep issues from escalating.
You can clean data manually, use Excel, hire a third party, build an in-house team of data cleaners, and/or rely on specialized software.
Want to learn more?
For a step-by-step guide to cleaning your CRM data, check out our eBook: “The Dirt on Data Quality.”