How to migrate external API data to the Amorphic dataset?
info
- Follow the steps mentioned below.
- Total time taken for this task: 20 Minutes.
- Pre-requisites: User registration is complete, and you are logged in to Amorphic and have switched to your role.
Tidbits
- External API connections are used to migrate data from an API endpoint to Amorphic's dataset.
- Usually, these API endpoints are created by AWS API Gateway.
- Only Basic authentication is supported.
- For this workshop, let's ingest publicly available country-wise COVID-19 data using covid-api.mmediagroup.fr/v1. The API is served by code running in AWS Lambda. More details at https://github.com/M-Media-Group/Covid-19-API.
- This will fetch data in the JSON format.
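As a rough illustration of what a GET request with Basic authentication looks like, here is a minimal Python sketch using only the standard library. It is not how Amorphic performs the call internally. The helper names (`basic_auth_header`, `build_request`, `fetch_json`) and the credentials are hypothetical placeholders, and the demo only builds the request without sending it.

```python
import base64
import json
import urllib.request

# Endpoint taken from the workshop; the credentials used below are placeholders.
API_ENDPOINT = "https://covid-api.mmediagroup.fr/v1/cases"


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare a GET request carrying Basic credentials (nothing is sent yet)."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(username, password))
    request.add_header("Accept", "application/json")
    return request


def fetch_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only build and inspect the request; call fetch_json(req) to actually send it.
    req = build_request(API_ENDPOINT, "demo-user", "demo-password")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

The request is built separately from the network call so the headers can be inspected before anything is sent to the endpoint.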
Create a source connection
- Click on the 'Connections' widget on the home screen, or click on INGESTION --> Connections in the left-side navigation bar, or click on Navigator in the top right corner and search for Connections.
- Click on the ➕ icon at the top right corner.
- Enter the following details and click on Create Connection.
{
"Connection Name": "remote-api-2-amorphic-<your-userid>"
"Connection Type": "External API"
"Description": "Ingest from a publicly available country-wise `COVID-19` data using `covid-api.mmediagroup.fr/v1` to Amorphic. "
"Authorized Users": "Select your user name and any other user names you want to grant permission"
"Keywords": "Add relevant keywords like 'ext-api'. This will be useful for search"
"Version": "1.0"
"API Endpoint": "https://covid-api.mmediagroup.fr/v1/cases"
"API Authentication": "BASIC"
"Method": "GET"
}
Create a target dataset
- Click on 'DATASETS' --> 'Datasets' in the left navigation bar.
- Click on the ➕ icon at the top right corner.
- Enter the following information and click on 'Register'.
{
"Dataset Name": "extapi_2_amd_ds_<your_userid>"
"Description": "This dataset is a destination for external API connection remote-api-2-amorphic-<your-userid>"
"Domain": "workshop(workshop)"
"Data Classifications":
"Keywords": "ext-api, covid-19"
"Connection Type": "External API"
"File Type": "Others"
"Target Location": "S3"
"Update Method": "Append"
"Connection": "remote-api-2-amorphic-<your-userid>"
"Enable Malware Detection": "No"
"Enable AI Services": "No"
"Enable Data Cleanup": "No"
}
Setup a schedule
- Click on 'SCHEDULES' in the left navigation bar.
- Click on the ➕ icon at the top right corner.
- Enter the following information and click on 'Create'.
{
"Schedule Name": "extapi_2_ds_sched_<your_userid>"
"Description": "This schedule runs every 5 minutes to pull data from an external API to the Amorphic dataset."
"Type Of Job": "Data Ingestion"
"Select Dataset": "extapi_2_amd_ds_<your_userid> | ext-api" <-- Click ↩️ icon to refresh the list
"Keywords": "your_userid, ext-api"
"Allocated Capacity":
"Schedule Type": "Time Based"
"Schedule Expression": "rate(5 minutes)"
}
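The schedule expression above looks like the AWS EventBridge rate syntax, `rate(value unit)`. Assuming Amorphic follows that syntax (the workshop does not state this explicitly), the following Python sketch shows how such an expression maps to a time interval. The function name `rate_to_timedelta` is hypothetical and is not part of any Amorphic or AWS library.

```python
import re
from datetime import timedelta

# Assumed grammar: rate(<positive integer> <unit>), as in AWS EventBridge rate expressions.
_RATE_PATTERN = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def rate_to_timedelta(expression: str) -> timedelta:
    """Convert an expression such as 'rate(5 minutes)' into a timedelta."""
    match = _RATE_PATTERN.match(expression.strip())
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value = int(match.group(1))
    unit = match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise.
    if (value == 1) != (not unit.endswith("s")):
        raise ValueError(f"unit {unit!r} does not agree with value {value}")
    keyword = unit if unit.endswith("s") else unit + "s"
    return timedelta(**{keyword: value})


if __name__ == "__main__":
    interval = rate_to_timedelta("rate(5 minutes)")
    runs_per_day = timedelta(days=1) // interval
    print(interval, runs_per_day)
```

With a 5-minute rate the schedule triggers 288 times a day, which is one reason to disable it once the workshop is finished.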
Check data transfer
- The Execution Status tab of the schedule shows the status of each execution.
- Hover over the message icon ✅ to check the job status.
- For more details, click on the 'three dots' and check the output logs.
- Check the Files tab of the dataset. The latest COVID-19 data in the JSON format is migrated here.
Disable schedule
- Don't leave the schedule running indefinitely. Disabling it reduces the load on the API.
- Click on the Disable Schedule icon on the schedule page.
- Click Yes.
You can do more...
- Analyze the JSON data in an ETL job to get insights from the COVID-19 data.
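As a starting point for such an analysis, here is a small Python sketch that computes a deaths-to-confirmed ratio per country. It assumes the ingested JSON maps each country name to an object whose "All" entry holds "confirmed" and "deaths" counts. That layout is an assumption about the API response and should be checked against the files in your dataset. The sample payload uses made-up placeholder values, and `fatality_rates` is a hypothetical helper name.

```python
import json

# Placeholder payload with made-up numbers; the real file comes from the dataset's Files tab.
SAMPLE_PAYLOAD = """
{
  "CountryA": {"All": {"confirmed": 1000, "deaths": 20}},
  "CountryB": {"All": {"confirmed": 250, "deaths": 10}},
  "CountryC": {"All": {"confirmed": 0, "deaths": 0}}
}
"""


def fatality_rates(payload: dict) -> list:
    """Return (country, deaths / confirmed) pairs, highest ratio first."""
    rows = []
    for country, regions in payload.items():
        totals = regions.get("All", {})  # assumed country-level summary entry
        confirmed = totals.get("confirmed", 0)
        deaths = totals.get("deaths", 0)
        if confirmed > 0:  # skip countries with no confirmed cases to avoid dividing by zero
            rows.append((country, deaths / confirmed))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    data = json.loads(SAMPLE_PAYLOAD)
    for country, rate in fatality_rates(data):
        print(f"{country}: {rate:.2%}")
```

In an ETL job the same logic would read the ingested file instead of the inline sample.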