Search
  • ashan3112

Get XML files to Oracle from S3 Using AWS Glue

Utilizing AWS Glue to get XML files from S3 to Oracle RDS

S3 Bucket:

- Create S3 bucket

- Upload sample xml file

IAM Role:

- CREATE IAM role that has sufficient access to your database, S3, AWS Glue

CREATE S3 VPC Endpoint:

- Go To VPC

- Click Endpoints from the left panel.

- Click Create Endpoint

- Find com.amazonaws.us-east-1.s3 and select the radio button.

- Select the appropriate route table to allow access to the subnets to access this new endpoint

- Click on Create Endpoint

- Close

Glue:

Add Tables:

- Go to AWS Glue and click on Tables from left panel under Databases

- Click on Add table using crawler

- Enter the name of the crawler as xml-oracle-crawler

- Click Next

- Select Data Source for Crawler source type

- Click Next

- Choose a data store to S3

- Select specific S3 bucket.

- Click Next

- Select add another data store to No.

- Click Next.

- Choose create IAM role radio button and enter a suffix for Role name or choose the one if already created.

- Click Next

- Set appropriate frequency.

- Click Next

- Click on Add Database button

- Add database name as oracle connection

- Expand Description and location (optional) and add database endpoint(in location field)

- Click Create

- Expand Grouping behavior of S3 data and Configuration options and set appropriate properties there as per client need.

- Click Next or Create.

Database => Connection

- Click on Connection from the left panel under databases

- Click on Add Connection

- Give a connection name and connection type as Amazon RDS

- Choose database engine as Oracle EE

- Click Next

- Choose the instance as appropriate from the suggestion dropdown

- Enter database, root user and root user pwd.

- Click Next and then Finish

:: Test the connection

- Select the connection that we just created

- And click on test connection button on the top.

- Choose appropriate IAM role that has sufficient access.

- Click Test (It will take a few moments to complete the process)

Create Job:

- Click on Jobs from the left panel under ETL

- Click on Add Job button.

- Give appropriate name to the job.

- Select appropriate IRM role (Amazon Glue Role)

- Keep the type Spark

- Keep Glue version to Spark 2.4, Python 3(Glue version 1.0)

- Select This job runs option as => A proposed script generated by AWS Glue

- Keep Script file name, S3 path where the script is stored and Temporary directory value as it is.

- Click Next.

- Select a data source we created in Add Table section.

- Click Next

- Choose a data target as Create tables in your data target

- Select Data Store as JDBC

- Select Connection

- Add database name.

- Click Next.

- Click Save Job and Edit script

- You can run script manually here to check if it is working fine. It will take 10 to 15 minutes to finish the execution.

Create Trigger:

- Go to AWS Glue

- Click Triggers from left panel under ETL.

- Click on Add Trigger

- Give a trigger name

- Select trigger type as schedule

- Click Enable Trigger on creation checkbox

- Click Finish


1 view0 comments

Recent Posts

See All