- ashan3112
Get XML files to Oracle from S3 Using AWS Glue
Utilizing AWS Glue to get XML files from S3 to Oracle RDS
S3 Bucket:
- Create S3 bucket
- Upload sample xml file
IAM Role:
- CREATE IAM role that has sufficient access to your database, S3, AWS Glue
CREATE S3 VPC Endpoint:
- Go To VPC
- Click Endpoints from the left panel.
- Click Create Endpoint
- Find com.amazonaws.us-east-1.s3 and select the radio button.
- Select the appropriate route table to allow access to the subnets to access this new endpoint
- Click on Create Endpoint
- Close
Glue:
Add Tables:
- Go to AWS Glue and click on Tables from left panel under Databases
- Click on Add table using crawler
- Enter the name of the crawler as xml-oracle-crawler
- Click Next
- Select Data Source for Crawler source type
- Click Next
- Choose a data store to S3
- Select specific S3 bucket.
- Click Next
- Select add another data store to No.
- Click Next.
- Choose create IAM role radio button and enter a suffix for Role name or choose the one if already created.
- Click Next
- Set appropriate frequency.
- Click Next
- Click on Add Database button
- Add database name as oracle connection
- Expand Description and location (optional) and add database endpoint(in location field)
- Click Create
- Expand Grouping behavior of S3 data and Configuration options and set appropriate properties there as per client need.
- Click Next or Create.
Database => Connection
- Click on Connection from the left panel under databases
- Click on Add Connection
- Give a connection name and connection type as Amazon RDS
- Choose database engine as Oracle EE
- Click Next
- Choose the instance as appropriate from the suggestion dropdown
- Enter database, root user and root user pwd.
- Click Next and then Finish
:: Test the connection
- Select the connection that we just created
- And click on test connection button on the top.
- Choose appropriate IAM role that has sufficient access.
- Click Test (It will take a few moments to complete the process)
Create Job:
- Click on Jobs from the left panel under ETL
- Click on Add Job button.
- Give appropriate name to the job.
- Select appropriate IRM role (Amazon Glue Role)
- Keep the type Spark
- Keep Glue version to Spark 2.4, Python 3(Glue version 1.0)
- Select This job runs option as => A proposed script generated by AWS Glue
- Keep Script file name, S3 path where the script is stored and Temporary directory value as it is.
- Click Next.
- Select a data source we created in Add Table section.
- Click Next
- Choose a data target as Create tables in your data target
- Select Data Store as JDBC
- Select Connection
- Add database name.
- Click Next.
- Click Save Job and Edit script
- You can run script manually here to check if it is working fine. It will take 10 to 15 minutes to finish the execution.
Create Trigger:
- Go to AWS Glue
- Click Triggers from left panel under ETL.
- Click on Add Trigger
- Give a trigger name
- Select trigger type as schedule
- Click Enable Trigger on creation checkbox
- Click Finish