“We are ready to get started with Hadoop and we know exactly what we need to do…” said no one ever. “We are interested in implementing a Hadoop based solution, but are struggling with how to get started…” is more typical.

You get the strategic value of Big Data (and so do your competitors): it is ultimately about providing company-wide insights that help drive revenue, reduce costs, and generally outpace the competition. Tactically, you understand that it’s a major paradigm shift relative to traditional database technologies.

You also realize that this is not necessarily a replacement for your existing investment in a relational database or EDW; it’s a “+1”. At your disposal is now an open-source Hadoop ecosystem, such as the Hortonworks Data Platform (HDP), enabling you to quickly ingest, aggregate, query, disseminate, and analyze very large sets of what was once unusable data (unstructured/semi-structured data, error logs, sensor readings, etc.) from a variety of sources (databases, machines, flat files, instruments, web/app servers, legacy platforms, mobile devices, etc.).

Additionally, you understand that Big Data does not have to mean Big $$$. Setting up an on-premises Hadoop cluster on commodity hardware is a relatively modest investment: the tools are free to download, and scaling is a matter of adding nodes. HDFS also stores three copies of each data block by default, providing built-in redundancy that reduces the need for separate storage or backup solutions.
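That three-way redundancy is simply HDFS’s default replication factor, which is tunable per cluster or even per file. As a minimal sketch, the relevant setting lives in hdfs-site.xml (the property name is standard HDFS; the value shown is the default):

```xml
<!-- hdfs-site.xml: controls how many copies of each block HDFS keeps -->
<property>
  <name>dfs.replication</name>
  <!-- 3 is the default: each block is stored on three separate DataNodes -->
  <value>3</value>
</property>
```

Raising the value buys more fault tolerance at the cost of disk; lowering it (say, to 2 on a small POC cluster) conserves space while keeping some redundancy.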

So you get the value proposition and you’re ready to start, but you are not sure which use case(s) to implement first as your proof of concept (POC). So the question is, “WHERE do we begin?” You are not alone; this question is a typical stalling point and the most frequent one we hear.

In choosing a use case, the goal should be to show either A) a considerable improvement over an existing activity or process, or B) real value from a new activity or process that is not feasible in the current environment due to cost or technical limitations. Give some thought to the following scenarios:

  • Which existing database process or critical query can you improve on? A query that once took hours might run in seconds.
  • What instrument or sensor data, when analyzed in real time, would provide warnings or alerts on your shop floor that could prevent injury or increase efficiency?
  • How can you complement your valuable investment in database, BI, and a host of other technology platforms to drive faster, more actionable data analytics and reporting?
  • What if you could influence customer buying behaviors from data collected in real-time, while the customer is still in your store or on your site?
  • Is there social media data that, if available and actionable in real time, would allow you to better understand brand sentiment and in turn drive more effective product placement and supply?
  • You have valuable historical data in a legacy format or on a mainframe, where querying, reporting, and analytics are inefficient. What if you could move it all into a “data lake” that enables faster, more efficient analytics and reporting?

These are but a few examples; check out other real-world examples on Hortonworks’ blog. As a Hortonworks Systems Integration partner, eHire Labs helps organizations navigate these questions toward a path of Big Data enablement and adoption, whether you are an enterprise, a Fortune 500 company, or a start-up.