
Breaking Analytics Out of the Box - Literally

The lines between open source and commercial products are blurring rapidly as our options for building and executing analytics grow by the day. The range of options and price points available today enables anyone from a large enterprise to a single researcher to gain access to affordable, powerful analytic tools and infrastructure. As a result, analytics will continue to become more pervasive and more impactful.

Author’s note: I typically avoid mentioning specific products or services in my blogs. However, it is unavoidable for this topic. While I will mention a number of my company’s offerings here to illustrate specific examples of the themes, the themes themselves are broad and industry-wide.


Given the cost and overhead, organizations used to face an either/or choice when selecting data platforms and analytical tools. Even with the advent of the open source movement, common opinions espoused either avoiding open source altogether or migrating completely to open source options. Time has shown that this either/or choice was a false one. Most organizations now use a mixture of open source and commercial products to achieve maximum effectiveness.

From a platform perspective, large organizations now typically use enterprise-class commercial products alongside open source products, leveraging each to its strengths. At Teradata, for example, we have our Unified Data Architecture (UDA), which provides a single, scalable environment for performing analytics whether data sits in Teradata, Hadoop, or a number of other platforms. Users no longer have to be aware of which platform the data is sitting on - they only need to worry about their analytic logic and let the system handle the allocation of processing. If you ask them whether they’re using open source or commercial platforms, the best answer is a simple “Yes”! The right answer is “both/and”, not “either/or”.


We are also seeing a broad expansion of options for executing analytic logic as various tools and platforms now allow users to access, if not directly execute, code and processes from other tools and platforms. For example, many people still think that the open source analytic toolset R must be executed within a dedicated R environment. However, this is no longer true. R can now be executed from within any number of other tools and platforms including a Teradata database through Teradata R, a Teradata Aster Analytics environment through Teradata Aster R, SAS, and Hadoop. In a similar twist, Teradata Aster Analytics is now available to run directly on a commodity hardware Hadoop platform. So, you can submit R code to Aster running on Hadoop! Mix and match at will.

The power in this is that one piece of code written in R can be submitted for execution to a wide range of places with little to no change to the code itself. Why does this matter? Teradata Aster R, for example, allows parallel and scalable execution of R code. It also allows the mixing and matching of R with non-R code, so you can use the right mix of code for your specific needs, and the same code behaves consistently across environments. Once again, it isn’t either/or but both/and.
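To make the idea concrete, here is a minimal Python sketch of the write-once, run-on-many-backends pattern described above. To be clear, this is an illustrative analogy only: the engine classes and their interfaces are hypothetical and are not Teradata's actual R or Aster APIs. The point is simply that the analytic logic stays fixed while the execution engine is swapped underneath it.

```python
from statistics import mean

def group_means(rows):
    """The analytic logic: average 'value' per 'group'. Written once."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["value"])
    return {g: mean(vs) for g, vs in groups.items()}

class LocalEngine:
    """Runs the analytic in-process, like a desktop R session."""
    def run(self, analytic, data):
        return analytic(data)

class ParallelEngine:
    """Stand-in for a scalable backend (e.g., an MPP database or Hadoop).

    A real engine would ship the logic to where the data lives; this stub
    just splits the data and merges partial results to illustrate the
    contract. (The merge step here is simplified for the sketch.)
    """
    def run(self, analytic, data):
        mid = len(data) // 2
        partials = [analytic(data[:mid]), analytic(data[mid:])]
        merged = {}
        for part in partials:
            for g, v in part.items():
                merged.setdefault(g, []).append(v)
        return {g: mean(vs) for g, vs in merged.items()}

data = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
    {"group": "b", "value": 20.0},
]

# The same analytic, unchanged, runs on either engine and yields
# the same result - the calling code never changes.
for engine in (LocalEngine(), ParallelEngine()):
    print(engine.run(group_means, data))
```

The design point is the stable contract between analytic and engine: because `group_means` never references where the data lives, the choice of backend becomes a deployment decision rather than a rewrite.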


Another disruptive force is the cloud. Just a few years ago, if you wanted to test new analytics against a large new set of data, the investment would have been significant. It was necessary to first buy enough storage and processing capacity to handle the data and license the right software before you could even begin. Of course, those expenses greatly limited the ability of small organizations to even get started and stopped large organizations from exploring all but the most important new ideas.

Now with the cloud, it is possible to quickly set up whatever platform you need, rent the right software, and get started. You’ll only pay for what you use. This means that if an experiment goes amiss, the cost for the failure is affordable. The cloud truly takes the whole concept of analytic sandboxes or data labs to another level of affordability and ease of access.

Of course, lines are blurring here as well. Major commercial products are now accessible within cloud environments. For example, both Teradata and Teradata Aster are now available on Amazon Web Services (AWS). If you want to make sure something will work as you hope before investing, test it in the cloud. Then simply migrate your processes to your internal environment if you decide to go that way. It is yet another both/and rather than either/or.

By the way, many people still think processes have to be either cloud based or local. Even that is no longer an either/or, as new approaches like Teradata’s Hybrid Cloud allow a single process to span cloud and on-premises environments. Yet another both/and win.


In order to maximize the effectiveness of analytic initiatives, organizations need to make use of a wide range of options. Between the tight integration of different platforms, the ability to utilize software and code in many environments, and the affordability and flexibility of the cloud, a new era is upon us. You have many options, and you no longer face huge up-front investments in hardware boxes or lock-in to a single toolset.

Technology has changed. It is now reasonable and affordable to take advantage of the both/and options available today. Push back on anyone who claims you need to make an either/or decision. You can cost-effectively drive your analytics from discovery to deployment while making use of a breadth of world-class options that was not just unaffordable, but unthinkable, even a few years ago.