How to perform Data Quality Checks on Amorphic?
info
- Follow the steps mentioned below.
- Total time taken for this task: 10 Minutes.
- Pre-requisites: You have completed user registration, logged in to Amorphic, and switched roles.
- A dataset has been created. If you don't have a dataset, create one before continuing.
What are Data Quality Checks on Amorphic?
- Amorphic data quality checks help you ‘unit-test’ data to find errors early, before the data is fed to consuming systems or machine learning algorithms.
- Using Amorphic data quality checks, users can create a set of constraints (rules) on columns of structured datasets.
- When the data quality (DQ) rules are executed, Amorphic provides a report showing which constraints succeeded and which failed.
- A data quality check execution is considered failed if even one constraint fails.
The Amorphic Data Quality Checks page lets you list existing data quality checks or create a new one. You can sort the list by attributes such as name, created by, and creation time.
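The behaviour described above can be sketched in plain Python. This is only an illustration of the idea (constraints on columns, a per-constraint report, and an overall result that fails when any constraint fails), not Amorphic's implementation. The `Constraint` class, the `run_checks` function, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Constraint:
    """Hypothetical rule applied to all values of one column."""
    name: str
    column: str
    check: Callable[[List[object]], bool]


def run_checks(rows: List[Row], constraints: List[Constraint]) -> Tuple[Dict[str, bool], str]:
    """Evaluate every constraint and derive an overall status."""
    report = {}
    for constraint in constraints:
        values = [row.get(constraint.column) for row in rows]
        report[constraint.name] = constraint.check(values)
    # A single failed constraint makes the whole execution fail.
    status = "Satisfactory" if all(report.values()) else "Un-Satisfactory"
    return report, status


# Made-up sample data: one row is missing its amount.
rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 15.5},
]

constraints = [
    Constraint("order_id_is_unique", "order_id", lambda v: len(set(v)) == len(v)),
    Constraint("amount_is_complete", "amount", lambda v: all(x is not None for x in v)),
]

report, status = run_checks(rows, constraints)
print(report)
print(status)
```

Here the uniqueness constraint passes and the completeness constraint fails, so the overall status is reported as failed even though only one rule was violated.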
Create a new Data Quality Check
To create a data quality check, follow the steps below:
- Go to Home -> Datasets -> Data Quality Checks -> ➕ Create a new Data Quality Check
- Enter the required information and click submit.
- Make sure to replace <your_userid> with your own user ID before clicking submit.
Data Quality Check Name: dq_retail_sales_<your_userid>
Description: Perform Data quality check on sales data
Domain: workshop
Dataset Name: retail_sales_transformed_<your_userid>
Auto-Constraint Suggestions Enabled: Yes
Keywords:
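As a small aid for filling in the form, the sketch below builds the field values from a user ID and refuses an unreplaced placeholder. It does not call any Amorphic API. The `dq_check_form` helper and the example user ID `jdoe` are hypothetical, and it assumes the user ID is appended directly to the names without quotes.

```python
def dq_check_form(user_id: str) -> dict:
    """Return the form values from this guide for a given user ID (hypothetical helper)."""
    # Guard against submitting the literal placeholder by mistake.
    if not user_id or "<" in user_id or ">" in user_id:
        raise ValueError("replace the <your_userid> placeholder with a real user ID")
    return {
        "Data Quality Check Name": f"dq_retail_sales_{user_id}",
        "Description": "Perform Data quality check on sales data",
        "Domain": "workshop",
        "Dataset Name": f"retail_sales_transformed_{user_id}",
        "Auto-Constraint Suggestions Enabled": "Yes",
        "Keywords": "",
    }


# "jdoe" is an example user ID, not a real account.
for field, value in dq_check_form("jdoe").items():
    print(f"{field}: {value}")
```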
The image below shows an example of constraint types. Because auto-constraint suggestions are enabled, you can pick constraints from the suggested list. Then click Create Data Quality Check.
- Once the Data Quality Check is created successfully, click the Play button ▶️ at the top right corner to start the execution.
- Go to the Executions tab and click the refresh 🔄 icon to update the status.
- The screen below will be displayed.
- The execution takes about 2-3 minutes to finish.
- If the constraints are not satisfied, Data Quality will be updated to Un-Satisfactory.
- Click View Results for more details about the failure.
- If the constraints are satisfied, Data Quality will be updated to Satisfactory.
Modify Data Quality Checks
- Click the Pencil ✏️ icon at the top right corner to edit the data quality check.
- You can add or remove constraints as needed.
Add Data Quality Checks to a Schedule
- Click SCHEDULES in the left navigation bar.
- Create a schedule for the data quality check created above so that it runs as often as you require.
tip
Align the data quality check (DQC) schedule with the dataset's update schedule.