Data migration to the Cloud with Talend

Smooth Move – Taking Data to the Cloud with Talend

Organizations are rapidly making the move to cloud computing, with no signs of slowing down. The solution is cheaper and more widely used and, as a result, cross database conversion is becoming prevalent. Let’s look at how you can move data from an on-prem PostgreSQL Database to an Azure SQL Database, leveraging the Talend Open Studio for data integration.

Industry-leading relational database engines are very similar to one another; however, they are not identical in their supported data types, metadata or internal data manipulation capabilities. For example, you might need to extract data from a cloud-based storage for processing on-premises and load back into the cloud. Your evaluation and planning process will highlight the physical limitations inherent to migrating data from on-premises locations into the cloud.

Therefore, it is especially important to evaluate and plan a graceful move from an on-prem storage to your new cloud. To succeed, you will need to adopt the right suite of tools or find a technology partner to help you move your data.

How the transfer works

The complexity of this process is largely dependent on your individual business requirements. Here’s a simple ETL job transferring data between an Azure SQL Database and a PostgreSQL DB utilizing Talend’s Data Integration tool (to follow this process, we assume you are familiar with the basic functionality of the software).

The first step is to initialize your connection to both the PostgreSQL DB and Azure SQL DB. While this may seem obvious, there are steps you’ll need to consider in order to make a successful connection to Azure using this tool. Moving data from Azure with Talend requires the JDBC library. Consequently, Microsoft provides a series of connection libraries for SQL Database and SQL Server. Follow the link provided to download the latest Java Language driver highlighted.

JDBC Library download

Figure 1 – JDBC Library download

Next, make sure this is part of your connection library in the Talend folder, since this is essential to establishing a JDBC connection to Azure.

Now we are ready to make our connection to Azure and PostgreSQL (we are going to safely assume you are comfortable connecting to PostgreSQL). To connect to Azure SQL Database, launch Talend and create your metadata under DB connections as shown below. The available connections in your metadata in Talend should resemble the figure below, one connection for source and target.

Metadata

Figure 2 – Metadata

Create a JDBC Connection to Azure SQL Database

1. Name your DB connection

New Database Connection

Figure 3 – New Database Connection

2. Input the connection strings and settings as depicted in the figures below. Replace with your parameters as required.
DB Type: General JDBC
JDBC URL: jdbc:sqlserver://servername.database.windows.net:1433;databaseName=YourDB;user= YourUserName;password= YourPWD

Driver jar: Point to the Talend connection library folder. To do this, select the eclipse and in the resulting dialogue box, click external modules and navigate to the path where your jar file downloaded above is stored and add it.
Class name: com.microsoft.sqlserver.jdbc.SQLServerDriver

User name: YourUserName Password:YourPassword Schema:YourSchema Mapping file:mysql_id

Select connection module

Figure 4 – Select connection module

 

New Database Connection

Figure 5 – New Database Connection

Now you check the connection.

Test Connection

Figure 6 – Test Connection

As you can already see, there is a difference between how you connect to Azure on Talend other databases. What we have achieved here is an introduction to another powerful component of Talend Data Integration Tool.

From PostgreSQL to Azure SQL DB

We are now ready to move data from our PostgreSQL to Azure SQL DB. For the purposes of this demo, we will only interact with a single table between the source and target. First, drag the components into your workspace as shown in figure 8 below. For your Azure component, drag the connection and select the component to create – tJDBCOutput since it is our target DB. See figure below:

tJDBC components

Figure 7 – tJDBC Components


The Azure component setting should have the following setting. Keep in mind, the target table should already exist in your Azure SQL Database.

Azure component setting

Figure 8 – Azure component setting


Now it’s time to run the job, as the figure below shows the job status and notification.

DI Job

Figure 9 – DI Job


 

Results in Azure SQL DB

Figure 10 – Results in Azure SQL DB

 

Tags> , , , ,

Olu Fadiya

Olu Fadiya

Olu Fadiya is a Senior Consultant, Data Architect at SPR Consulting