DZone released the new Spark RefCard earlier this week. Since I peer reviewed it, this is a good time to discuss the topic.
Tuning an Out-of-the-Box Solution
Spark ships with an out-of-the-box example that, with some tuning, gives you the top ten trending Twitter hashtags over the last 10 and 60 seconds.
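To make the idea of "top tags in a window" concrete, here is a minimal sketch in plain shell. It is not the Spark example's code, only an illustration of the counting and ranking step under simple assumptions: tweets arrive one per line, and a hashtag is a # followed by letters, digits or underscores. The function name top_tags and the sample tweets are made up for this illustration.

```shell
#!/usr/bin/env bash
# Not Spark code: a plain shell sketch of ranking hashtags in one window.
# top_tags is a hypothetical helper. It reads tweet text on stdin,
# one tweet per line, and prints the N most frequent hashtags.
top_tags() {
  local limit="${1:-10}"
  grep -oE '#[[:alnum:]_]+' |   # pull out every hashtag occurrence
    sort |                      # group identical tags together
    uniq -c |                   # count each group
    sort -k1,1nr -k2,2 |        # highest count first, ties by tag name
    head -n "$limit" |          # keep only the top N
    awk '{ printf "%s (%d tweets)\n", $2, $1 }'
}

# Made-up sample window of tweets, for illustration only.
sample_window='Watching #MTVStars tonight
#MTVStars all day
New video out #NashsNewVideo
#MTVStars and #NashsNewVideo
just a tweet without tags'

printf '%s\n' "$sample_window" | top_tags 10
```

On this sample the sketch prints #MTVStars (3 tweets) followed by #NashsNewVideo (2 tweets). The streaming example does the same kind of ranking continuously, over sliding windows of live tweets.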
- Create a new Twitter App or use your existing app credentials at Twitter Apps.
- Download and install Java, Scala and Spark.
- Adjust the environment variables:
- Scala home
export SCALA_HOME=/usr/lib/scala
- PATH to run scala
export PATH=$PATH:$SCALA_HOME/bin
- Add the location of spark-streaming-twitter_2.10-1.0.0.jar, twitter4j-core-3.0.3.jar and twitter4j-stream-3.0.3.jar to CLASSPATH
export CLASSPATH=$CLASSPATH:/root/spark/lib/twitter4j/
- Run the code after tuning some parameters:
- Get into the spark folder
cd /var/lib/spark/spark-1.0.1/
- If you are running on a single-core machine (or you just want to make sure you get results and not only "WARN BlockManager: Block input-0-XXXXXXXX already exists on this machine; not re-adding it"), change the master in ./bin/run-example from local[*] to local[2], since the receiver occupies one thread and at least one more is needed to process the data:
sudo sed -i 's/local\[\*\]/local\[2\]/g' ./bin/run-example
- Run the example (please remember to write the class name, including the streaming. prefix, and to leave out the path, the .scala extension or any other fancy stuff):
sudo ./bin/run-example streaming.TwitterPopularTags
- The result will be shown after several seconds:
Popular topics in last 60 seconds (194 total):
#MTVStars (42 tweets)
#NashsNewVideo (9 tweets)
#IShipKarma (6 tweets)
#SledgehammerSaturday (6 tweets)
#NoKiam (5 tweets)
#mufc's (3 tweets)
#gameinsight (3 tweets)
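If you would rather see what the sed expression from the steps above does before letting it edit a real file, you can try it on a scratch file first. The sketch below is only a dry run under stated assumptions: the MASTER_URL line is a hypothetical stand-in, not the actual contents of the Spark script, and sed is run without -i so nothing is modified in place.

```shell
#!/usr/bin/env bash
# Dry run of the local[*] -> local[2] substitution on a scratch file,
# so the real ./bin/run-example is left alone while experimenting.
scratch="$(mktemp)"

# Hypothetical stand-in line; the real script's contents may differ.
printf '%s\n' 'MASTER_URL="local[*]"' > "$scratch"

# Same expression as in the step above, but without -i:
# the rewritten text goes to stdout and the file stays unchanged.
patched="$(sed 's/local\[\*\]/local\[2\]/g' "$scratch")"

echo "before: $(cat "$scratch")"
echo "after:  $patched"

# Clean up the scratch file.
rm -f "$scratch"
```

The brackets and the asterisk are escaped in the pattern because they are special characters in a sed regular expression; unescaped, the pattern would not match the literal text local[*].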
Bottom Line
Spark is an amazing platform. With a few small adjustments, you will be able to enjoy it within a few minutes.
Keep Performing,
Moshe Kaplan