Moving Legacy Data into a Warehouse

The IDB wanted to repurpose some of its data assets so that a new generation of users could benefit

The Challenge

More than 20 years ago, the Inter-American Development Bank (IDB) started to build an information system to track national trade and integration. Initial data sets were for trade flows by product; later, it added tariffs and preferential tariffs, tariffs quotas, rules of origin, sanitary and phytosanitary measures, rejected exports and more. As the data sources and data types grew, the system grew more complex and evolved into a loosely-connected hybrid of three databases connected by ad hoc data migrations. The production processes were arcane, redundant and took far too long to complete. The databases themselves contained redundant and inconsistent data, made worse by missing or corrupted fields. In fact, the three distinct databases are connected together by ad-hoc scripts that “glue” the data together as it was needed.

In addition, the system’s front-end was supposed to allow different kinds of users to query the data to obtain results relevant to them. However, the queries were slow and the workflow to complete them was cumbersome to the point of impossible. For example, changing a data filter, such as a country of interest or a product grouping or a year, required the user to restart the query from scratch, resulting in a frustrating and time-consuming process that all too often did not completely deliver the desired outcome. The consequence was that despite the inherent value of the data, few researchers were using it and those who did were barely realizing its potential.

The Solution

The IDB engaged Looi Consulting to evaluate the INTrade data architecture and make recommendations to optimize the system in order to produce interactive visualizations and reports on the fly with simpler yet more effective workflows. Rather than redesign the existing data tables and workflows (an option that was not considered economically viable), we proposed to design and develop a solution to migrate existing data without affecting the stable data curation processes or existing tables.

Based on those recommendations, Looi Consulting was further engaged to design and develop the following:

  • Data Warehouse with the proper fact tables, views and dimensions to serve the correct data on-demand.
  • ETLs to move and transform the data from the three databases to the Data Warehouse.
  • Web services/APIs to produce JSON objects that can be consumed by applications to produce interactive visualizations and reports.

A subsequent project then leveraged these changes to perform more engaging and fully interactive experiences, providing users with better ways to explore and compare the data.

Capabilities

Looi Consulting was a unique consultancy in that it provided strategy-level advice as to how to evolve and manage the IDB’’s data assets, but didn’t stop there. In addition, we developed a plan to realize it and then implemented it from prototype to production.

Our demonstrated capabilities included:

  • Data Management Consulting: Broad and deep understanding of client business and technical problems.
  • Data Modeling: Capturing business problems in data and designing solutions.
  • Data Migration: Use of Microsoft’s SSIS; management of semi-structured data.
  • Data Systems Planning and Architecture: Data cleansing and Data extraction.
  • Data Warehouse: Design and development.
  • ETL, ELT and related data management skills: Enterprise data warehousing design.

Timeframe

The initial engagement for discovery, architecture and design lasted 4 months. Design and build-out of the first phase of the Enterprise Data Warehouse took 8 months; it has been successfully deployed and continues to be improved. Today, it supports the work of economists, policy advisors, governments and investors throughout the Americas.

%

Query Time Reduction

%

Redundant Data Eliminated

Bad Data Records Removed

Simplified Architecture

Performance Increased 100X

Reduced Query Time

The new Data Warehouse has views and fact tables that combine data from two or the three databases, returning results from requests (via RESTful APIs) in a fraction of the time it requires in the original system, from 25% to less than 1% of the original time per request.

Eliminated Redundant Data

The three original databases have a lot of redundant data that was maintained for consistency. When we ETLed the data we made sure to create only the necessary fact tables, views and dimensions needed to get the desired request, eliminating the 75% of the redundant data.

Discovered and Removed Bad Data

We identified and cleaned 60,000 bad and duplicate records.

Simplified Architecture

The architecture in the new Data Warehouse is easier to maintain because the data ETLed from three separated databases has been organized into a single streamlined repository.
  • Query Times 1% 1%
  • Data Size 25% 25%

Impressive Results

By implementing a new Data Warehouse and ETLs of legacy data, Looi Consulting was able to dramatically simplify the increasingly fragile and complex system of databases. This improvement didn’t impact production or the on-going workflows and yet provided breakthrough value for the consumers of the data. They were suddenly able to generate visualizations and gain insights that were simply unattainable because the data was tied down by the system itself. Looi Consulting understood the full range of business and technical complexities and implemented solutions that met stakeholders’ needs.