Twitter Analytics powered by MongoDB, MapReduce and JQGraph

Posted: April 3, 2011 in Uncategorized

Lately I have been preaching and sipping a lot of NoSQL, and the results have been good so far. After several years of purely SQL-driven, data-oriented design, it was time for a fresh start! If you want a touch of humor on why NoSQL, you may read this link.

The core of the VentureAlly [@Ventureally] business recommendation engine is built on a carefully selected, proven technology stack.

I would like to walk through a feature of VentureAlly that was developed over a weekend as a mini-hackathon.

I am sharing this now, while social media analysis is still red hot, so that readers can apply some of the concepts used here to similar or derived architectures.

Here we go:

Problem statement: ABC Inc manufactures product XYZ and wants to understand how many users like or dislike the product at a given time.

Solution provider's (VentureAlly's) problem statement: Capture tweets about ABC from the Twittersphere, process the tweets that mention XYZ, and classify each one as a “thumbs up” or a “thumbs down”.

Technology stack: Java, Apache HttpClient, basic text mining algorithms for lexical matching, MongoDB for data storage and MapReduce, and JQGraph to graphically represent the interpreted data.

Overview:

As shown below, an application runs periodically and downloads tweets through the Twitter Search API. It then parses the tweets using standard text mining algorithms that pick out what can be perceived as a “thumbs up” or a “thumbs down” tweet. The results are then stored in MongoDB.

[Figure: Twitter Analytics Architecture. A technology stack to analyze tweets.]
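The lexical matching step could look roughly like the sketch below. This is not VentureAlly's actual algorithm: the class name TweetSentiment, the Verdict enum and both word lists are made up for illustration. It simply counts positive and negative cue words and lets the majority decide; tweets that land on neither side would be dropped before storage.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical lexical matcher: counts positive and negative cue words in a tweet. */
final class TweetSentiment {
    enum Verdict { THUMBS_UP, THUMBS_DOWN, UNDECIDED }

    // Illustrative word lists only (assumption); a real lexicon would be far larger.
    private static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("love", "great", "awesome", "good", "like"));
    private static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("hate", "bad", "awful", "terrible", "broken"));

    static Verdict classify(String tweet) {
        int score = 0;
        // Lower-case the text, then split on anything that is not a letter or apostrophe.
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        if (score > 0) return Verdict.THUMBS_UP;
        if (score < 0) return Verdict.THUMBS_DOWN;
        return Verdict.UNDECIDED;
    }

    public static void main(String[] args) {
        System.out.println(classify("I love my new XYZ, great battery"));
        System.out.println(classify("XYZ is awful and the screen is broken"));
        System.out.println(classify("Just bought an XYZ"));
    }
}
```

A plain word count like this misses negation ("not good") and sarcasm, so treat it as a starting point rather than a finished classifier.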

The user loads an analysis dashboard, which makes an Ajax call to a data handler (e.g. a servlet), which in turn runs a MapReduce job against the MongoDB data store containing a collection of filtered tweets.

Here’s a snippet of what the MapReduce part looks like:

import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;

// Returns a two-element list: the thumbs-up aggregate, then the thumbs-down aggregate.
public DBObject getTwitterAggregate(long orgid, String keyword) {
    DBObject positiveResult = getTwitterAggregate(orgid, keyword, true);
    DBObject negativeResult = getTwitterAggregate(orgid, keyword, false);
    BasicDBList list = new BasicDBList();
    list.add(positiveResult);
    list.add(negativeResult);
    return list;
}

public DBObject getTwitterAggregate(long orgid, String keyword, boolean assertion) {
    // prepare the response
    DBObject responseObj = new BasicDBObject();
    responseObj.put("orgid", orgid);
    responseObj.put("assertion", assertion);

    DBObject mapkey = new BasicDBObject("tweetdate", true); // map: group by tweet date
    String reduce = "function(doc, prev) { prev.sum += 1; }"; // reduce: count tweets per date
    // apply filter (note: keyword is not part of the filter in this snippet)
    DBObject cond = new BasicDBObject();
    cond.put("orgid", orgid);
    cond.put("assertion", assertion);
    DBObject initial = new BasicDBObject("sum", 0); // start each count at 0
    // fetch response: group() runs the aggregation with the key, filter and reduce function above
    DBObject x = mongo.getDbTable(TWEETS).group(mapkey, cond, initial, reduce);
    responseObj.put("results", x);
    return responseObj;
}
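To make the grouping logic concrete, here is an in-memory sketch in plain Java of what the group call computes: keep only the documents matching orgid and assertion, then count them per tweetdate. The Tweet class, the countByDate method and the day-string format of tweetdate are assumptions for illustration; in the real feature this work happens inside MongoDB.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of the group step: filter, then count per tweetdate. */
final class TweetGroupSketch {
    /** Hypothetical stand-in for a stored tweet document. */
    static final class Tweet {
        final long orgid;
        final String tweetdate;  // assumed to be a day string such as "2011-04-01"
        final boolean assertion; // true = thumbs up, false = thumbs down

        Tweet(long orgid, String tweetdate, boolean assertion) {
            this.orgid = orgid;
            this.tweetdate = tweetdate;
            this.assertion = assertion;
        }
    }

    static Map<String, Integer> countByDate(List<Tweet> tweets, long orgid, boolean assertion) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Tweet t : tweets) {
            // Same role as the "cond" filter in the snippet above.
            if (t.orgid != orgid || t.assertion != assertion) continue;
            // Same role as "initial" (sum starts at 0) plus "reduce" (sum += 1).
            sums.merge(t.tweetdate, 1, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(
                new Tweet(1L, "2011-04-01", true),
                new Tweet(1L, "2011-04-01", true),
                new Tweet(1L, "2011-04-01", false),
                new Tweet(1L, "2011-04-02", true),
                new Tweet(2L, "2011-04-02", true));
        System.out.println("thumbs up:   " + countByDate(tweets, 1L, true));
        System.out.println("thumbs down: " + countByDate(tweets, 1L, false));
    }
}
```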


And finally the end result:

Twitter analytics on Consumer brand perception
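Before plotting, the thumbs-up and thumbs-down counts have to be lined up by date so that both series share one x-axis. The sketch below shows one way to do that in plain Java. It assumes the per-date counts have already been pulled into two maps keyed by a sortable date string; ChartSeriesSketch and alignSeries are hypothetical names, and the row layout is not tied to any particular charting library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper that aligns two per-date count series for charting. */
final class ChartSeriesSketch {
    /** Returns rows of {date, thumbsUpCount, thumbsDownCount}, sorted by date string. */
    static List<String[]> alignSeries(Map<String, Integer> up, Map<String, Integer> down) {
        // Union of all dates seen in either series, in sorted order.
        TreeSet<String> dates = new TreeSet<>(up.keySet());
        dates.addAll(down.keySet());

        List<String[]> rows = new ArrayList<>();
        for (String date : dates) {
            rows.add(new String[] {
                date,
                String.valueOf(up.getOrDefault(date, 0)),   // 0 when no thumbs-up that day
                String.valueOf(down.getOrDefault(date, 0))  // 0 when no thumbs-down that day
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Integer> up = new HashMap<>();
        up.put("2011-04-01", 2);
        up.put("2011-04-02", 1);
        Map<String, Integer> down = new HashMap<>();
        down.put("2011-04-01", 1);
        for (String[] row : alignSeries(up, down)) {
            System.out.println(String.join(",", row));
        }
    }
}
```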


Pretty simple, huh? Drop a line if you want to try this out.

