Tuesday, September 3, 2013

Big Data Analytics solutions for Online Marketing - Use Case 1

A sample Online Marketing application deployed on a Big Data architecture is shown below.



Online users search for products, services and topics of interest not only on Google and other search engines but also, more importantly, on the site itself. (On the eCommerce site Amazon.com, for example, search is the top product-finding method used by site visitors.) Providing relevant search results is something that online search providers like Google and Bing, as well as site search providers, continuously optimize and calibrate.

From an Online Marketing perspective, the page on which searchers arrive after clicking a search result is called the landing page. It is either the page of the website they reach from an external search engine like Google, or the product or topic page they reach through the site's internal search. The landing page is very important for:
  • Improving Conversion Rate (%) of the site.
  • Traffic dispersion to subsequent stages of the site.
  • Improving site engagement for the users.

As already discussed in a previous post, delivering dynamic, search-relevant landing pages is very important, particularly for large websites like eCommerce stores, Music & Movie download sites and Travel websites. Delivering keyword-relevant landing pages dynamically across thousands of keywords, perhaps hundreds of thousands for large websites, is itself a big challenge. An even bigger challenge is to deliver these dynamic, search-relevant landing pages targeted to each of the different user segments. As discussed previously, Big Data Analytics solutions are now available to solve these Big Data challenges in Online Marketing.

Large websites generate, and also need to process, huge volumes of different varieties of data, such as:

  • Website clickstream data collected through Web Analytics applications like Omniture and from webserver logs.
  • Website content such as product content, marketing content and navigation, in formats like text, images and video, which is available in the web content management systems.
  • External web content typically collected by web crawlers, which includes content such as
    • Product content from competitor websites
    • Marketing collateral from external industry websites etc.
  • User generated content such as product reviews, user survey feedback, social media posts, online discussions, tweets, blog posts, online comments, Wiki articles etc.

Most of the above varieties of data are unstructured or semi-structured, and hence are difficult to collect and process at this scale in traditional relational databases (RDBMS) like Oracle or MySQL.

For large websites, it is important not just to collect large volumes and varieties of data as shown above, but also to handle the velocity at which all this data is generated online, particularly clickstream data and user generated content.

This is where Big Data Analytics solutions come in. In the example above, a typical architecture to support Big Data Analytics is built on the open source Apache Hadoop framework. In a Hadoop architecture, big volumes, variety and velocity of online data are collected and then stored in HDFS, the Hadoop Distributed File System. The Hadoop ecosystem also provides table-oriented stores such as HBase (a column-oriented NoSQL database that runs on top of HDFS, not a relational one), which give row-and-column access to big data and can feel more familiar to beginners and new users of these Big Data architectures. As we can see in this example, a big data landing zone is set up on a Hadoop cluster to collect big data, which is then stored in HDFS.

Using the Map-Reduce programming method, Online Marketing Analysts or Big Data Scientists develop and deploy various algorithms on a Hadoop cluster for performing Big Data Analytics. These algorithms can be implemented in Java, the language in which Hadoop itself is written and in which its services for collecting, storing and analyzing big data run. Higher-level languages like Pig and Hive, or general-purpose languages like Python and R, can be used to implement the same algorithms in fewer lines of code. Pig and Hive scripts are translated into Map-Reduce jobs that run on the Java Virtual Machine, while Python and R code typically runs through Hadoop Streaming, which passes records to external programs over standard input and output rather than compiling them into Java.
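To make the Map-Reduce method concrete, here is a minimal sketch in plain Python that simulates the map, shuffle and reduce steps locally, without a Hadoop cluster. The tab-separated log format (user id, landing page, converted flag) and the names `map_click`, `reduce_page`, `run_job` and `SAMPLE_LOG` are assumptions made up for this illustration; they are not part of Hadoop or of the architecture described here.

```python
from collections import defaultdict


def map_click(line):
    """Map step: emit (landing_page, (visits, conversions)) for one log line."""
    # Hypothetical log format: user_id <TAB> landing_page <TAB> converted (0 or 1)
    user_id, page, converted = line.rstrip("\n").split("\t")
    yield page, (1, int(converted))


def reduce_page(page, values):
    """Reduce step: total visits and conversion rate for one landing page."""
    visits = sum(v for v, _ in values)
    conversions = sum(c for _, c in values)
    return page, visits, conversions / visits


def run_job(lines):
    """Simulate the shuffle locally: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_click(line):
            grouped[key].append(value)
    return [reduce_page(key, values) for key, values in sorted(grouped.items())]


# Made-up sample records, for illustration only.
SAMPLE_LOG = [
    "u1\t/shoes\t1",
    "u2\t/shoes\t0",
    "u3\t/bags\t0",
    "u4\t/shoes\t1",
]

if __name__ == "__main__":
    for page, visits, rate in run_job(SAMPLE_LOG):
        print(page, visits, round(rate, 2))
```

On a real cluster the grouping done inside `run_job` would be performed by Hadoop's shuffle phase across many machines; the mapper and reducer logic stays the same in spirit.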

Some of the use cases of Online Marketing Algorithms which can be implemented on Hadoop Architecture for deriving Analytics are shown in the same example. All these algorithms are deployed using the Map-Reduce programming method.

  • Keyword Research: Counting how often each of hundreds of thousands of keywords occurs in content and in searches, across the diverse variety of data collected into Hadoop and stored in HDFS. This algorithm helps identify the top keywords by volume, and also the long tail of hundreds of thousands of keywords searched by users. It can even uncover new hidden gems among keywords to deploy in SEM/SEO campaigns.
  • Content Classification / Themes: Classifying user generated content and web content into specific themes. Thanks to the processing capacity of a Hadoop architecture, huge volumes of content can be processed and classified into dozens of major themes and hundreds of sub-themes.
  • User Segmentation: Individual user behavior available in web clickstream data is combined with online user generated content, and further combined with user targeted content available in web content management systems, to generate dozens of user segments, both major and minor. This algorithm then identifies the top keywords and the right content themes for each of those user segments, by combining the output of the Keyword Research and Content Classification algorithms.
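As an illustration of the Keyword Research use case, the following sketch counts keyword occurrences in a handful of texts and splits the result into top keywords and the long tail. It is a simplified, single-machine sketch: it treats every single word as a keyword, whereas real keywords are often multi-word phrases, and the names `map_keywords`, `count_keywords`, `split_head_and_tail` and `SAMPLE_TEXTS` are hypothetical.

```python
import re
from collections import Counter


def map_keywords(text):
    """Map step: emit (keyword, 1) for every lowercase word token in the text."""
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        yield token, 1


def count_keywords(texts):
    """Reduce step, done locally: sum the emitted counts per keyword."""
    counts = Counter()
    for text in texts:
        for keyword, one in map_keywords(text):
            counts[keyword] += one
    return counts


def split_head_and_tail(counts, head_size):
    """Split keywords into the top `head_size` by volume and the long tail."""
    ranked = counts.most_common()
    return ranked[:head_size], ranked[head_size:]


# Made-up search queries and content snippets, for illustration only.
SAMPLE_TEXTS = [
    "red running shoes",
    "Running shoes for trail running",
    "leather bags",
]

if __name__ == "__main__":
    head, tail = split_head_and_tail(count_keywords(SAMPLE_TEXTS), head_size=2)
    print("top keywords:", head)
    print("long tail size:", len(tail))
```

The long tail returned here is where the "hidden gems" would be looked for: keywords with low individual volume that may still be worth targeting.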

Also, since the Hadoop architecture runs on clusters of computers, all the above algorithms can not only process huge volumes and varieties of data, but can also keep up with data in motion, which keeps arriving in the Hadoop Big Data landing zone in near real time. Since Map-Reduce jobs run as batches, this means rerunning them frequently over the newly arrived data. This would enable Online Marketing campaigns to be tweaked in near real time to derive better ROI from Online Marketing spend. In the example illustrated above, the output from the 3 algorithms running in parallel is a set of dynamic, keyword-relevant, content-rich, user-targeted landing pages generated in near real time, for hundreds of thousands of keywords, across dozens of content themes and dozens of user segments. This output would be integrated with eCommerce platforms, Web Content Management Systems or Web Portals for the creation, production and delivery of these landing pages in near real time.
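One possible way to picture how the three outputs come together is a simple join keyed by keyword and user segment. The sketch below is only an assumption about the shape of that data: the dictionaries `KEYWORD_THEMES` and `SEGMENT_KEYWORDS` and the function `build_landing_page_index` are hypothetical, and a production system would hold this in the eCommerce platform or content management system rather than in memory.

```python
def build_landing_page_index(keyword_themes, segment_keywords):
    """Join keyword -> theme with segment -> keywords.

    Returns a dict mapping (keyword, segment) to a landing page spec.
    """
    index = {}
    for segment, keywords in segment_keywords.items():
        for keyword in keywords:
            theme = keyword_themes.get(keyword)
            if theme is None:
                continue  # no classified content for this keyword yet
            index[(keyword, segment)] = {
                "keyword": keyword,
                "theme": theme,
                "segment": segment,
            }
    return index


# Made-up outputs of the classification and segmentation steps.
KEYWORD_THEMES = {
    "running shoes": "footwear",
    "leather bags": "accessories",
}
SEGMENT_KEYWORDS = {
    "athletes": ["running shoes", "trail gear"],
    "commuters": ["leather bags", "running shoes"],
}

if __name__ == "__main__":
    index = build_landing_page_index(KEYWORD_THEMES, SEGMENT_KEYWORDS)
    for key, spec in index.items():
        print(key, "->", spec["theme"])
```

When a visitor from a known segment arrives on a keyword, the delivery system could look up `(keyword, segment)` in such an index to choose which themed content to render.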


Signature: Roopkumar T.V.
