Title: | Sentiment Analysis Scorer |
---|---|
Description: | Sentiment analysis is a popular technique in text mining that attempts to determine the emotional state of some text. We provide a new implementation of a common method for computing sentiment, whereby words are scored as positive or negative according to a dictionary lookup. Then the sum of those scores is returned for the document. We use the 'Hu' and 'Liu' sentiment dictionary ('Hu' and 'Liu', 2004) <doi:10.1145/1014052.1014073> for determining sentiment. The scoring function is 'vectorized' by document, and scores for multiple documents are computed in parallel via 'OpenMP'. |
Authors: | Drew Schmidt [aut, cre] |
Maintainer: | Drew Schmidt <[email protected]> |
License: | BSD 2-clause License + file LICENSE |
Version: | 0.1-6 |
Built: | 2024-12-31 05:02:09 UTC |
Source: | https://github.com/wrathematics/meanr |
Sentiment analysis is a popular technique in text mining. Roughly speaking, it attempts to determine the overall emotional attitude of a piece of text (i.e., positive or negative). We provide a new implementation of a common method for computing sentiment, whereby words are scored as positive or negative according to a "dictionary", and then a sum of those scores for the document is produced. We use the 'Hu' and 'Liu' sentiment dictionary for determining sentiment. The scoring function is 'vectorized' by document, and scores for multiple documents are computed in parallel via 'OpenMP'.
Drew Schmidt
Returns the number of cores + hyperthreads on the system. The function respects the environment variable OMP_NUM_THREADS.
meanr.nthreads()
The number of cores + hyperthreads on the system (an integer).
Computes the sentiment score for each document: the sum of the scores of its positive and negative words. The function is vectorized, returning one row per input string. Scoring ignores case (upper/lower) and punctuation.
score(s, nthreads = meanr.nthreads())
s | A string or vector of strings. |
nthreads | Number of threads to use. By default it will use the total number of cores + hyperthreads. |
The scoring function uses OpenMP to process text in parallel.
The function uses the Hu and Liu sentiment dictionary (same as everybody else) available here: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
A dataframe, consisting of columns "positive", "negative", "score", and "wc". With the exception of "score", these are counts; that is, "positive" is the number of positive sentiment words, "negative" is the number of negative sentiment words, and "wc" is the wordcount (total number of words).
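The lookup-and-sum method described above can be illustrated with a minimal base R sketch. This is not the package's implementation (which is compiled code parallelized with OpenMP). The function name `toy_score` and the two tiny word lists are hypothetical stand-ins for the Hu and Liu dictionary, and the sketch assumes each positive word contributes +1 and each negative word -1 to the score.

```r
# Hypothetical miniature dictionary; the real Hu and Liu lists are far larger.
positive_words <- c("good", "great", "abundant")
negative_words <- c("bad", "awful", "abnormal")

# Illustrative only: scores each string by dictionary lookup, one row per string.
toy_score <- function(s) {
  rows <- lapply(s, function(doc) {
    # Ignore case and punctuation, then split into words on whitespace.
    cleaned <- gsub("[[:punct:]]", "", tolower(doc))
    words <- strsplit(cleaned, "[[:space:]]+")[[1]]
    words <- words[nzchar(words)]

    pos <- sum(words %in% positive_words)
    neg <- sum(words %in% negative_words)

    # Assumption: score is positive count minus negative count.
    data.frame(positive = pos, negative = neg,
               score = pos - neg, wc = length(words))
  })
  do.call(rbind, rows)
}

res <- toy_score(c("Good, GREAT day!", "An awful, bad idea.", "Banana apple orange."))
print(res)
```

With this toy dictionary, the first string scores 2, the second -2, and the third 0, since none of its words appear in either list.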
Hu, M., & Liu, B. (2004). Mining opinion features in customer reviews. National Conference on Artificial Intelligence.
    library(meanr)

    s1 = "Abundance abundant accessable."
    s2 = "Banana apple orange."
    s3 = "Abnormal abolish abominable."
    s = c(s1, s2, s3)

    # as separate 'documents'
    score(s, nthreads=1)

    # as one document
    score(paste0(s, collapse=" "), nthreads=1)