
How to perform Data Quality Checks on Amorphic?


info

What are Data Quality Checks on Amorphic?

  • Amorphic data quality checks help you "unit-test" data so that errors are found early, before the data is fed to consuming systems or machine learning algorithms.
  • Using Amorphic data quality checks, users can create a set of constraints (rules) on the columns of structured datasets.
  • When the DQ rules are executed, Amorphic produces a report showing which constraints succeeded and which failed.
  • A data quality check execution is considered a failure if even one constraint fails.
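
The pass/fail rule described above can be illustrated with a minimal Python sketch. This is not Amorphic's implementation or API. The `Constraint` class, the `run_checks` function, and the sample rows are all hypothetical names and data chosen for illustration. The sketch only shows the idea of evaluating column-level constraints and failing the whole execution when any single constraint fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Constraint:
    """A hypothetical column-level rule (not an Amorphic type)."""
    name: str
    column: str
    check: Callable[[List[object]], bool]  # receives every value of the column


def run_checks(rows: List[Row], constraints: List[Constraint]) -> Dict[str, object]:
    """Evaluate each constraint and report per-constraint and overall results."""
    results = {}
    for constraint in constraints:
        values = [row.get(constraint.column) for row in rows]
        results[constraint.name] = constraint.check(values)
    # The execution passes only if every constraint succeeded.
    return {"constraints": results, "passed": all(results.values())}


# Illustrative data: the second row is missing its amount.
rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 15.5},
]

constraints = [
    Constraint("order_id_is_unique", "order_id", lambda v: len(set(v)) == len(v)),
    Constraint("amount_is_complete", "amount", lambda v: all(x is not None for x in v)),
]

report = run_checks(rows, constraints)
for name, ok in report["constraints"].items():
    print(f"{name}: {'succeeded' if ok else 'failed'}")
print("overall:", "passed" if report["passed"] else "failed")
```

Here the uniqueness constraint succeeds, but the completeness constraint fails, so the execution as a whole is reported as failed.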

The Amorphic Data Quality Checks page lets you list existing data quality checks or create a new one. You can sort the list by attributes such as name, created by, and creation time.

Create a new Data Quality Check

To create a data quality check, follow the steps below:

  • Go to Home -> Datasets -> Data Quality Checks -> ➕ Create a new Data Quality Check
  • Enter the required information shown below.
  • Make sure to replace <your_userid> with your own user ID before clicking submit.
Data Quality Check Name: dq_retail_sales_<your_userid>
Description: Perform Data quality check on sales data
Domain: workshop
Dataset Name: retail_sales_transformed_<your_userid>
Auto-Constraint Suggestions Enabled: Yes
Keywords:
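
As a small aid for filling in the form, the following Python sketch substitutes a user ID into the field values listed above. It does not call Amorphic. The function name `dq_check_form` and the user ID `jdoe` are hypothetical and used only for illustration.

```python
def dq_check_form(user_id: str) -> dict:
    """Return the form values from this walkthrough with the user ID filled in."""
    return {
        "Data Quality Check Name": f"dq_retail_sales_{user_id}",
        "Description": "Perform Data quality check on sales data",
        "Domain": "workshop",
        "Dataset Name": f"retail_sales_transformed_{user_id}",
        "Auto-Constraint Suggestions Enabled": "Yes",
        "Keywords": "",  # left empty, as in the walkthrough
    }


# "jdoe" is a placeholder user ID; substitute your own.
form = dq_check_form("jdoe")
for field, value in form.items():
    print(f"{field}: {value}")
```

Printing the values this way makes it easy to confirm that the check name and the dataset name carry the same user ID before you submit the form.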

Because auto-constraint suggestions are enabled, you can pick constraint types from the suggested list. Once the constraints are selected, click Create Data Quality Check.

  • Once the data quality check is created successfully, click the Play button ▶️ at the top right corner to start the execution.
  • Go to the Executions tab and click the refresh 🔄 icon to update the status.
  • The Executions tab shows the run and its current status.


  • The execution typically takes 2-3 minutes to finish.
  • If any constraint is not satisfied, Data Quality is updated to Un-Satisfactory.


  • Click View Results for details of the failed constraints.
  • If all constraints are satisfied, Data Quality is updated to Satisfactory.


Modify Data Quality Checks

  • Click the pencil ✏️ icon at the top right corner to edit the data quality check.
  • You can add or remove constraints as needed.

Add Data Quality Checks to a Schedule

  • Click SCHEDULES in the left navigation bar.
  • Create a schedule that runs the data quality check above as often as required.


tip

Base the DQC schedule on the dataset's update schedule, so that the check runs after the dataset has been refreshed.