Introduction
============

WordCount hadoop example: Inserts a bunch of words across multiple rows,
and counts them, with RandomPartitioner. The word_count_counters example sums
the value of counter columns for a key.

The scripts in bin/ assume you are running with cwd of examples/word_count.


Running
=======

First build and start a Cassandra server with the default configuration*. Ensure that the Thrift
interface is enabled, either by setting start_rpc:true in cassandra.yaml or by running
`nodetool enablethrift` after startup.
Once Cassandra has started and the Thrift interface is available, run

contrib/word_count$ ant
contrib/word_count$ bin/word_count_setup
contrib/word_count$ bin/word_count
contrib/word_count$ bin/word_count_counters

In order to view the results in Cassandra, one can use bin/cqlsh and
perform the following operations:
$ bin/cqlsh localhost
> use cql3_wordcount;
> select * from output_words;

The output of the word count can now be configured. In the bin/word_count
file, you can specify the OUTPUT_REDUCER. The two options are 'filesystem'
and 'cassandra'. The filesystem option outputs to the /tmp/word_count*
directories. The cassandra option outputs to the 'output_words' column family
in the 'cql3_wordcount' keyspace.  'cassandra' is the default.

Read the code in src/ for more details.

The word_count_counters example sums the counter columns for a row. The output
is written to a text file in /tmp/word_count_counters.

*It is recommended to turn off vnodes when running Cassandra with hadoop. 
This is done by setting "num_tokens: 1" in cassandra.yaml. If you want to
point wordcount at a real cluster, modify the seed and listenaddress 
settings accordingly.


Troubleshooting
===============

word_count uses conf/logback.xml to log to wc.out.

