How to create and use an ETL endpoint and notebook on Amorphic?


Follow the steps mentioned below.
Total time taken for this task: 20 Minutes.
Prerequisites: User registration is complete, and you are logged in to Amorphic with your role switched.

Generate a public and private key pair

Run the following command. Hit enter when prompted for 'Enter passphrase' and 'Enter same passphrase again'.

ssh-keygen -t rsa -C your_email@example.com

This saves two files under the .ssh folder of your home directory.
- The private key is named id_rsa.
- The public key is named id_rsa.pub.
Copy the contents of the id_rsa.pub file.
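
If you prefer to script this step, the following is a minimal sketch rather than an official Amorphic procedure. It assumes OpenSSH's ssh-keygen is installed. The KEY_DIR and KEY_FILE names are illustrative, and the key is written to a scratch directory so that an existing ~/.ssh/id_rsa is not overwritten.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory; move the files to ~/.ssh afterwards if you wish.
KEY_DIR="$(mktemp -d)"
KEY_FILE="$KEY_DIR/id_rsa"

# -N "" sets an empty passphrase (the same as pressing Enter at both prompts),
# -f chooses the output path, and -q suppresses the progress output.
ssh-keygen -t rsa -C "your_email@example.com" -N "" -f "$KEY_FILE" -q

# Print the public key; this is the text to paste into the 'Public Keys' field.
cat "$KEY_FILE.pub"
```

The private key ends up in "$KEY_FILE" and the public key in "$KEY_FILE.pub".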

Create an ETL endpoint

Click on 'ETL' --> 'Endpoints' from the left navigation bar.
Click on the ➕ icon at the top right corner.
Enter the following information and click on 'Create'.

{
  "Endpoint Name": "etl_endpoint_<your_userid>"
  "Description": "This is an ETL endpoint for developing scripts in the local environment."
  "Capacity": 2
  "Glue Python Version": 3
  "Auto Terminate": "Yes"
  "Auto Termination Time": "Choose next day same time"
  "Extra Python Libs S3Path":
  "Extra Jars S3Path": "Time Based"
  "Datasets With Write Access": Any datasets that you want to write
  "Datasets With Read Access": Any datasets that you want to read
  "Keywords": "ETL, Endpoint"
  "Public Keys": Paste the content of `id_rsa.pub` file
}

[Image: Create ETL Endpoint]

Once the endpoint is created, Glue Endpoint Status will be 'provisioning' as shown below.

[Image: Create ETL Endpoint]

Click 🔃 to refresh the status.
It takes approximately 10 minutes for the status to change to ready.
You may click on the Edit Endpoint icon to add datasets or extend the auto termination time.
Once the endpoint reaches ready status, you will see a Connect tab as shown below.

[Image: Create ETL Endpoint]

Use Glue Endpoint

Before using the Glue endpoint, copy the id_rsa private key to your home directory and restrict its permissions.
- On Mac or Linux, chmod 400 id_rsa
- On Windows, right click on id_rsa file --> 'Properties' --> click 'Edit' to remove other users/groups. Allow full control for owner --> Click apply and OK.
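
On Mac or Linux, the permission change can be wrapped in a small helper that also reports the resulting mode. This is only a sketch: the function name lock_down_key is hypothetical, and the stat fallback assumes either GNU stat (-c) or BSD/macOS stat (-f) is available.

```shell
# Restrict a private key to owner read-only and print its octal mode.
# lock_down_key is an illustrative name, not an Amorphic or OpenSSH command.
lock_down_key() {
  local key_file="$1"
  chmod 400 "$key_file"
  # GNU stat understands -c; BSD/macOS stat understands -f instead.
  stat -c '%a' "$key_file" 2>/dev/null || stat -f '%Lp' "$key_file"
}
```

For example, running lock_down_key "$HOME/id_rsa" after copying the key there should print 400. ssh refuses a private key whose permissions are too open, which is why this step comes before connecting.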
Use Pyspark shell
- ssh -i id_rsa glue@ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com -t gluepyspark
Use Spark Scala shell
- ssh -i id_rsa glue@ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com -t glue-spark-shell
SSH to EMR Master
- ssh -i id_rsa glue@ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com
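
The three commands above differ only in the remote program passed after -t. A minimal sketch of a helper that assembles them is shown here. GLUE_HOST, KEY_FILE and glue_ssh_cmd are illustrative names, and the host is the same placeholder used above, to be replaced with the address from the endpoint's Connect tab. The helper only prints the command so it can be reviewed before running.

```shell
# Placeholder address; copy the real one from the endpoint's Connect tab.
GLUE_HOST="ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com"
# Assumed location of the private key copied to the home directory.
KEY_FILE="$HOME/id_rsa"

# Print the ssh command for an optional remote program
# (gluepyspark or glue-spark-shell); with no argument, print a plain login.
glue_ssh_cmd() {
  local remote="${1:-}"
  if [ -n "$remote" ]; then
    printf 'ssh -i %s glue@%s -t %s\n' "$KEY_FILE" "$GLUE_HOST" "$remote"
  else
    printf 'ssh -i %s glue@%s\n' "$KEY_FILE" "$GLUE_HOST"
  fi
}

glue_ssh_cmd gluepyspark
glue_ssh_cmd glue-spark-shell
glue_ssh_cmd
```

Copy the printed line into a terminal once GLUE_HOST holds the real endpoint address.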

Create and use an ETL notebook

Click on 'ETL' --> 'ETL Notebooks' from the left navigation bar.
Click on the ➕ icon at the top right corner.
Enter the following information and click on 'Create'.

{
  "Notebook Name": "etl_notebook_<your_userid>"
  "Description": "This is an ETL notebook for developing scripts in the local environment."
  "Keywords": "ETL, Endpoint"
  "Instance Type": "ml.t2.large"
  "Volume Size": 10
  "Endpoint Name": "etl_endpoint_<your_userid>"
  "Auto Terminate": "Yes"
  "Auto Termination Time": "Choose next day same time"
}

[Image: Create ETL Notebook]

Once the notebook is created, Notebook Status will be 'Pending'.
Click 🔃 to refresh the status.
It takes approximately 10 minutes for the status to change to InService.
Once the notebook reaches InService status, you will see a link under the Notebook URL tab as shown below.

[Image: Create ETL Endpoint]

Click on the link to go to a Jupyter notebook.
Choose the kernel needed for your development as shown below.

[Image: Create ETL Endpoint]

Cleanup

Click on the Stop Notebook icon at the top to stop the notebook instance.
Click on 'Delete notebook' to delete etl_notebook_<your_userid>.
Go to ETL endpoints and delete etl_endpoint_<your_userid>.

Congratulations!!!

You've learned how to use ETL tools on Amorphic.
