- ashan3112
Database Replication: AWS Database Migration Service
In August 2016, Amazon Web Services released Database Migration Service (DMS). DMS is database replication software focused on making it easier to migrate data from a source database to a target destination like a data warehouse or data lake (within AWS).
What is AWS Database Migration Service?
The service supports migrating data from multiple sources via a few different replication methods. Examples include Oracle to Oracle, or Oracle or Microsoft SQL Server to Amazon Aurora and Redshift. The product also supports replication from MySQL, Postgres, and others.
In addition to replicating data from a database, AWS DMS allows you to continuously replicate your data with high availability and consolidate databases into cloud targets like Amazon RDS, Amazon Redshift, or Amazon S3 object storage. The S3 destination becomes a landing zone for a data lake.
Types of Database Replication: One-time, Ongoing
Typically, you will be doing either a one-time migration or continuous data replication. A one-time replication is often used to seed a new system for testing or production.
In the case of continuous replication, you may run the process on a schedule, such as a nightly job, or undertake near real-time replication. Ongoing near real-time replication does have specific configuration requirements for read/write access to the source system.
If you only have read access to the source database, you will need an alternate replication pattern. Why? Most near real-time processes require write operations on the source to keep track of changes and replicate updates. Read-only access prevents most services from doing that, so you need a pattern that can track changes in the source system some other way. An alternative is to switch to less frequent replication tasks.
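The decision described above can be sketched as a small Python helper. The three return strings are migration types the DMS API accepts for a task (`full-load`, `cdc`, `full-load-and-cdc`); the function name and its two flags are hypothetical and only mirror the reasoning in this section.

```python
def choose_migration_type(ongoing: bool, can_track_changes: bool) -> str:
    """Pick a DMS migration type for a replication task.

    ongoing: True for continuous replication, False for a one-time seed.
    can_track_changes: True if your access to the source allows change
        tracking (assumption from the text: read-only access does not).
    """
    if not ongoing:
        return "full-load"          # one-time copy of the existing data
    if can_track_changes:
        return "full-load-and-cdc"  # initial copy, then ongoing changes
    # Read-only source: fall back to a full load that you run on a schedule.
    return "full-load"


if __name__ == "__main__":
    print(choose_migration_type(ongoing=True, can_track_changes=False))
```

With a read-only source, the helper falls back to a plain full load, which you would then rerun on whatever schedule fits your needs.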
In many cases near real-time replication may be nice to have, but a scheduled migration task is more than adequate. Scheduled DMS tasks can also be very cost-effective given you only need to pay for instances while running.
Another benefit of the AWS product is the opportunity to automate processing according to your specific requirements beyond the AWS user interface. For example, you can accomplish automation with AWS CLI, CloudFormation, or a third-party solution like Terraform.
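The same kind of automation is also possible from Python with boto3, the AWS SDK. The sketch below only builds the arguments for the DMS `create_replication_task` call, so it can be reviewed or tested without touching AWS. The helper names, the task identifier, and the ARNs are placeholders, not values from a real account.

```python
import json


def build_task_request(task_id, source_arn, target_arn, instance_arn,
                       schema="%", table="%", migration_type="full-load"):
    """Return keyword arguments for the DMS create_replication_task call."""
    # A single selection rule: include tables matching schema/table.
    # "%" is the DMS wildcard, so the defaults select every table.
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-tables",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        }]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": migration_type,
        # The API expects the mappings as a JSON string, not a dict.
        "TableMappings": json.dumps(table_mappings),
    }


def create_task(dms_client, request):
    """Submit the request; dms_client is assumed to be boto3.client("dms")."""
    return dms_client.create_replication_task(**request)
```

In a real script you would create the client with `boto3.client("dms")`, pass it to `create_task`, and trigger the script from a scheduler for nightly runs.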
DMS and Data Lake Landing Zone
One of the intriguing and less obvious use cases of DMS is using S3 as a target destination. Using the DMS S3 target creates a cost-effective, high-quality data lake landing zone for tables exported from a source system.
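As a rough illustration, an S3 target is defined as a DMS endpoint whose engine is `s3`. The sketch below builds the arguments for the boto3 `create_endpoint` call. The bucket, folder, endpoint identifier, and IAM role ARN are placeholders, and the choice of Parquet with gzip compression is an assumption rather than a recommendation from this article.

```python
def build_s3_target_endpoint(endpoint_id, bucket, folder, role_arn,
                             data_format="parquet"):
    """Return keyword arguments for the DMS create_endpoint call (S3 target)."""
    if data_format not in ("csv", "parquet"):
        raise ValueError("data_format must be 'csv' or 'parquet'")
    return {
        "EndpointIdentifier": endpoint_id,
        "EndpointType": "target",
        "EngineName": "s3",
        "S3Settings": {
            # IAM role that lets DMS write to the bucket (placeholder ARN).
            "ServiceAccessRoleArn": role_arn,
            "BucketName": bucket,
            "BucketFolder": folder,
            "DataFormat": data_format,
            "CompressionType": "gzip",
        },
    }


if __name__ == "__main__":
    request = build_s3_target_endpoint(
        "example-s3-target", "example-landing-zone", "dms",
        "arn:aws:iam::123456789012:role/example-dms-s3-role",
    )
    print(request["S3Settings"]["BucketName"])
```

Passing the resulting dictionary to `boto3.client("dms").create_endpoint(**request)` would register the endpoint, provided the role and bucket actually exist.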
From your source system landing zone, you can create scalable, zero-administration data pipelines to data lakes or cloud warehouses like Azure Data Lake, AWS Redshift, AWS Athena, and AWS Redshift Spectrum.
Database replication software comparison
Up until a couple of years ago, tools like DMS were hard to come by or difficult to employ. The lack of cost-effective and quick setup solutions led several SaaS vendors like Fivetran, Stitch, Alooma, and Openbridge to roll out solutions.
Why did these companies build out solutions? Customers needed to support data replication from a source database like Postgres, MySQL, and others to a cloud warehouse like Redshift or BigQuery for data analytics. Moving data into a data lake or cloud warehouse opened new opportunities to use tools like Tableau, Looker, Power BI, and others.
So why would you use a SaaS tool like Fivetran for data replication over DMS? Today, you likely would not unless you are already heavily invested in Fivetran. Given the emergence of DMS and the refinement of the product by AWS over the past 12–24 months, it is a go-to offering for replication. The only use case where we still leverage our Openbridge replication tools is for read-only data sources.
So how much does a SaaS tool like Fivetran cost compared to AWS Database Migration Service? Fivetran costs will range from USD 36K to USD 120K per year. The Fivetran pricing model would likely be 20x more than a base AWS configuration. In fairness to Fivetran, you would never select them for database replication services alone. The Fivetran cost would be prohibitive for a data replication use case. If you are thinking of Fivetran as a primary solution for replication, you should explore DMS as an alternative first.
What about other SaaS vendors? Fivetran alternatives like Stitch also offer replication services. While Fivetran competitors like Stitch are less expensive, DMS still affords greater flexibility and cost efficiencies, especially given Stitch's pricing model (number of replicated rows).
For replication use cases, this is less about SaaS comparisons like Stitch vs. Fivetran and more about how these services compare to the latest Amazon DMS offering. If you need to continuously replicate a read-only system, feel free to reach out to the Openbridge team for details on our service.
Getting Started
Getting started with the AWS Database Migration Service requires that you create an AWS account and set up a migration process with its associated replication instance(s). In more sophisticated use cases, such as migrating between different database engines, you may need to employ the AWS Schema Conversion Tool.
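The order of operations can be sketched in Python. The function below takes an object that behaves like `boto3.client("dms")`, creates a replication instance, waits for it to become available, and then creates a task on it. It assumes the source and target endpoints already exist. The identifiers, the `dms.t3.micro` instance class, and the 20 GB of storage are illustrative assumptions, not sizing advice.

```python
import json


def provision_migration(dms, source_endpoint_arn, target_endpoint_arn):
    """Create a replication instance, then a one-time task that uses it."""
    instance = dms.create_replication_instance(
        ReplicationInstanceIdentifier="example-instance",  # hypothetical name
        ReplicationInstanceClass="dms.t3.micro",           # assumed size
        AllocatedStorage=20,                               # GB, assumed
    )
    instance_arn = instance["ReplicationInstance"]["ReplicationInstanceArn"]

    # Block until the instance is ready to host tasks.
    dms.get_waiter("replication_instance_available").wait(
        Filters=[{"Name": "replication-instance-arn",
                  "Values": [instance_arn]}]
    )

    # Select every table in every schema; narrow this for real workloads.
    table_mappings = {"rules": [{
        "rule-type": "selection", "rule-id": "1", "rule-name": "all-tables",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]}
    task = dms.create_replication_task(
        ReplicationTaskIdentifier="example-task",          # hypothetical name
        SourceEndpointArn=source_endpoint_arn,
        TargetEndpointArn=target_endpoint_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load",
        TableMappings=json.dumps(table_mappings),
    )
    return task["ReplicationTask"]["ReplicationTaskArn"]
```

Starting the task is a separate `start_replication_task` call once the task reports it is ready. It is left out here to keep the sketch short.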
As with any system migration of data from one location to another, make sure you have a database migration plan in place. This is critical for testing and signoff of the processes. Without this plan, subtle shifts, gaps, or bugs can corrupt the downstream processing.
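One simple check such a plan might include is a per-table row count comparison between source and target. The sketch below assumes you have already collected the counts into two dictionaries (for example, from `SELECT COUNT(*)` queries). The function name and the sample table names are hypothetical.

```python
def compare_row_counts(source_counts, target_counts):
    """Return {table: (source_count, target_count)} for every mismatch.

    A table missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        tgt = target_counts.get(table)
        if src != tgt:
            mismatches[table] = (src, tgt)
    return mismatches


if __name__ == "__main__":
    source = {"orders": 1200, "customers": 310, "refunds": 45}
    target = {"orders": 1200, "customers": 308}
    for table, (src, tgt) in compare_row_counts(source, target).items():
        print(f"{table}: source={src} target={tgt}")
```

Matching counts do not prove the data is identical, so treat this as a first gate before deeper checks such as column-level comparisons.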
Openbridge provides a fully managed AWS Database Migration Service (DMS) offering to customers. The typical pattern for our customers is to use the service to deliver data to an AWS S3 landing zone. We then ingest the data into a curated data lake, register everything in a data catalog, and create corresponding tables/views in Athena or Redshift Spectrum.