Data Science

A modern discipline built on a storied foundation for today's enterprises

An Intersection of Disciplines

Data Science is a discipline that combines theories, practices, methods and tools from well-established fields such as mathematics, probability and statistics, computer science and engineering. Specialized subjects in those fields, such as machine learning, statistical learning, pattern recognition, visualization, databases, data warehousing and artificial intelligence, are applied to solve problems in innovative ways. Businesses, governments and other organizations use Data Science to improve ongoing operations, identify strategic weaknesses, sharpen customer insights and model complex systems, among many other applications. It is used across industries such as finance, marketing, manufacturing, retailing and healthcare.

Although Data Science borrows heavily from many fields, it does so in ways that were unattainable to statisticians and computer scientists just a generation ago. The growth of data, combined with the rise of low-cost storage and compute power, has enabled a qualitatively different set of business and technical applications.

Most people are familiar with recommendation systems on eCommerce sites such as Amazon. In principle, simple associations or product extensions can be implemented without reliance on big data. For example, if a buyer places a DSLR camera in a shopping cart, the recommender system might suggest purchasing a lens based on a predetermined set of related products. A more sophisticated system might employ machine learning to learn, over the course of many such purchases, which recommendations are most likely to convert into actual sales; it might also learn from the buyer’s purchase or browsing history to recommend products that are even more likely to convert. In this case, Data Science could be used to design such recommendation systems. As the eCommerce site matures and more data becomes available, greater sophistication and complexity become possible.
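As a rough illustration of learning recommendations from purchase history, the following minimal sketch counts which products appear together in past orders and suggests the most frequent companions. The order data, product names and function names are all invented for this example; a production recommender would weigh conversions, browsing history and much more.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(orders):
    """Count how often each pair of products appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        # Use a set so a product bought twice in one order counts once.
        for bought, other in permutations(set(order), 2):
            counts[bought][other] += 1
    return counts


def recommend(counts, product, top_n=1):
    """Return up to top_n products most often bought together with `product`."""
    companions = counts.get(product, Counter())
    return [item for item, _ in companions.most_common(top_n)]


# Hypothetical order history, invented for this illustration.
orders = [
    ["dslr_camera", "lens", "memory_card"],
    ["dslr_camera", "lens"],
    ["dslr_camera", "tripod"],
    ["lens", "lens_cap"],
]

counts = build_co_purchase_counts(orders)
print(recommend(counts, "dslr_camera"))
```

With this sample history, the lens is the most frequent companion of the camera, so it is the top suggestion. As more orders accumulate, the counts shift and the suggestions follow the data rather than a fixed list.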

The Challenges of Data Science

The challenges of using data science in the enterprise are numerous, but some of the most common ones include:

  • Is the data set adequate for the problem at hand?
  • What is the appropriate technology to manage the data and to derive insights?
  • Does the data show correlation or causation?

In some cases, enterprises have collections of data and want to learn what secrets may be hidden just below the surface. In others, enterprises have a problem to solve but do not know whether the available data is sufficient. In both cases, a sound knowledge of Data Science, its methods and its successful applications is highly desirable. As an example, we encountered a situation where legacy data was available in abundance but generating insights was difficult; therefore, solutions such as ETL, data warehousing, and visualization were employed. We have more details in a related case study.

Where a problem must be solved but the data is too sparse, bootstrap approaches involving heuristic techniques may be successful until enough data can be collected.
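One way to read this, sketched below for the recommendation scenario described earlier, is to serve hand-curated suggestions until enough purchases have been observed and then switch to learned ones. The rule table, the threshold of 50 observations and the function name are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical hand-curated rules, used while purchase data is still sparse.
HEURISTIC_RELATED = {
    "dslr_camera": ["lens", "camera_bag"],
}

MIN_OBSERVATIONS = 50  # assumed threshold; tune for the application


def recommend(product, co_purchases, top_n=2):
    """Prefer learned recommendations once enough purchases have been seen."""
    observed = co_purchases.get(product, Counter())
    if sum(observed.values()) >= MIN_OBSERVATIONS:
        return [item for item, _ in observed.most_common(top_n)]
    # Not enough data yet: fall back to the curated heuristic list.
    return HEURISTIC_RELATED.get(product, [])[:top_n]


# Invented co-purchase counts: one sparse history, one rich history.
sparse = {"dslr_camera": Counter({"tripod": 3})}
rich = {"dslr_camera": Counter({"tripod": 40, "lens": 30, "flash": 5})}

print(recommend("dslr_camera", sparse))  # served from the heuristic table
print(recommend("dslr_camera", rich))    # served from observed purchases
```

The design choice here is that the heuristic and the learned model share one interface, so the switch from one to the other requires no change for the caller.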

A Thoughtful Approach

Technology is often the starting point for discussions about Data Science. However, that’s often premature. It is better to define the nature of the problem first and then evaluate technologies. For example, Enterprise Search is a key Data Science issue, but specific applications often call for different technologies. We present one case where interesting tradeoffs were made.

Finally, there is still a tendency, even among statisticians, to blur the distinction between correlation and causation. Data collected in the raw is unlikely to show causation, though it is often seductively tempting to make this connection, especially when our intuitions seem to confirm it. Knowing how to treat the data and its presumptive conclusions is also a skill the Data Scientist must master.
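A small simulation can make the distinction concrete. In the sketch below, which uses entirely synthetic data and assumed parameters, a hidden factor drives two variables that have no effect on each other. The raw data shows a strong correlation, yet the correlation disappears once one variable is assigned at random.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 5000

# A hidden factor drives both variables; x has no effect on y at all.
hidden = [rng.gauss(0, 1) for _ in range(n)]
x_observed = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

# Assigning x at random cuts its link to the hidden factor.
x_randomized = [rng.gauss(0, 1) for _ in range(n)]

observed_corr = pearson(x_observed, y)
randomized_corr = pearson(x_randomized, y)
print(f"observed: {observed_corr:.2f}, randomized: {randomized_corr:.2f}")
```

The observed correlation is strong even though changing x would do nothing to y, which is exactly the trap that intuition tends to fall into.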

Of course, the standard for demonstrating causation is a randomized controlled trial, common in medical and scientific fields. In this respect, Data Scientists are in the throes of conducting the largest number of randomized controlled experiments in scientific history: the well-known A/B tests or, in some cases, multivariate tests that websites run continuously. These techniques draw heavily from statistics and the scientific method.
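To show the statistics behind a simple A/B test, here is a minimal sketch of a two-proportion z-test comparing the conversion rates of two page variants. The visitor and conversion counts are invented, and the pooled z-test is only one of several reasonable ways to analyze such an experiment.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, invented for this illustration.
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone. Because visitors are assigned to variants at random, such a difference can reasonably be attributed to the variant itself.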

 

Looi Consulting Assists from Design to Development

Data Science stands on the shoulders of great science; it is inherently multidisciplinary. Its application and practice require a thoughtful approach that combines technology practice with principles of science. Our consultants can assist enterprises from discovery through design and implementation of solutions that leverage Data Science.

We provide both consulting and staffing services so that firms can get the most benefit from their legacy or new data investments.

 


    LCL helped us develop tools to better present the data we had been collecting for years. In just a few months, they designed and delivered a solution that yielded new insights and allowed us to view the data in ways we couldn’t before. The visualizations they designed have led to new ideas both in presenting our data and for the data products we offer.

    Jeremy T. Harris

    Economist, Inter-American Development Bank