NLP Data Sets

Finegrained Sentiment Dataset

The finegrained sentiment dataset contains 294 product reviews from various online sources, manually annotated with sentence-level sentiment by Ryan McDonald and me. The data is approximately balanced with respect to domain (books, DVDs, electronics, music, video games) and overall review sentiment (positive, negative, neutral).

The dataset and annotation scheme are described in more detail in [1].

Feel free to use the data set for any friendly purpose, but please cite [1] if you use it for any of your publications.

Any comments and suggestions are welcome by e-mail to Oscar Täckström: 

Downloads

The data set is available as a tar.gz archive: finegrained.tar.gz. The archive contains the data set file and a readme describing the data format and encoding.
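
For readers who prefer to unpack the archive programmatically, the following is a minimal sketch using Python's standard tarfile module. It assumes finegrained.tar.gz has already been downloaded into the working directory; the function name unpack_archive and the destination directory name are hypothetical choices for illustration. The file names inside the archive are not stated here, so the sketch only lists whatever it finds, and the readme remains the authority on format and encoding.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract the regular files of a .tar.gz archive into dest_dir.

    Returns the sorted member paths, relative to dest_dir.
    (Hypothetical helper, not part of the data set distribution.)
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, mode="r:gz") as archive:
        members = [m for m in archive.getmembers() if m.isfile()]
        # Guard against members that would be written outside dest_dir.
        for member in members:
            target = (root / member.name).resolve()
            if root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest, members=members)
    return sorted(m.name for m in members)


if __name__ == "__main__":
    # Assumed local path of the downloaded archive.
    archive_file = Path("finegrained.tar.gz")
    if archive_file.exists():
        for name in unpack_archive(archive_file, "finegrained_data"):
            print(name)
```

After extraction, read the readme first: it specifies the encoding to pass when opening the data file.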

References 

The data set was used for the following publication:

[1] Oscar Täckström and Ryan McDonald. Discovering fine-grained sentiment with latent variable structured prediction models. European Conference on Information Retrieval (ECIR 2011), Dublin, Ireland, 2011.

A longer version of this paper with more details on the data and the annotation is available as the SICS Technical Report T2011:02.