This week I upgraded jargonBot by adding some machine learning to its existing decision making. Previously, the way jargonBot chose whether or not to define a word was very basic: if the string was not among the 80,000 most commonly typed words in English and it existed in the dictionary, it would be defined. Now, jargonBot is equipped with the ability to learn. It uses a LinearRegression model, with word popularity, word length, and comment length as input variables, to estimate the number of upvotes the definition of a word will receive. If it expects to get more upvotes than downvotes, jargonBot defines the word. It then remembers the comment it made and checks back on it later, updating the model with the number of upvotes the comment actually got.

To improve the bot's interactivity and let it get more information from each comment, the bot also goes through all replies to its comment. Using a "sentiment analyzer," which attempts to determine whether the overall sentiment of a phrase is positive or negative, it adds or subtracts each reply's score accordingly. For example, if an entirely positive reply to the bot (e.g., "this bot rocks!") got five upvotes, it would add five upvotes to its score for that comment. I was lucky enough to find a sentiment analyzer that returns not just 1 or -1 (positive or negative, respectively) but a decimal describing the strength of the message, which lets me make more precise adjustments. For example, if a relatively neutral reply that leans slightly negative ("this bot is useful but could be better") gets 10 upvotes, I can subtract just one or two from my score.
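The loop described above can be sketched roughly like this. Everything here is a hypothetical stand-in: the feature values, the training data, and the sentiment numbers are made up, and where the bot presumably uses scikit-learn's LinearRegression, this sketch fits the same kind of linear model with numpy's least squares just to show the idea.

```python
import numpy as np

# Toy history: one row per past definition comment the bot has made.
# Columns (hypothetical stand-ins): word popularity rank, word length,
# and the length of the comment the word appeared in.
X = np.array([
    [120000.0,  9.0,  40.0],
    [300000.0, 14.0, 220.0],
    [ 90000.0,  7.0,  15.0],
    [500000.0, 12.0, 180.0],
])
# Net score (upvotes minus downvotes) each of those definitions received.
y = np.array([3.0, 8.0, -2.0, 12.0])

# Fit y ≈ X·w + b by least squares (the same model class as LinearRegression).
A = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predicted_score(popularity, word_len, comment_len):
    """Model's estimate of the score a definition would get."""
    return float(np.dot(coef, [popularity, word_len, comment_len, 1.0]))

def should_define(popularity, word_len, comment_len):
    # Define the word only if the model expects more upvotes than downvotes,
    # i.e. a net-positive score.
    return predicted_score(popularity, word_len, comment_len) > 0

def adjusted_score(comment_score, replies):
    """Fold replies into a comment's score, weighted by sentiment.

    Each reply is (reply_score, sentiment) with sentiment in [-1, 1],
    so "this bot rocks!" at +5 with sentiment ~0.9 adds about 4.5,
    while a mildly negative reply at +10 subtracts only 1 or 2.
    """
    total = comment_score
    for reply_score, sentiment in replies:
        total += sentiment * reply_score
    return total
```

For example, a comment at 10 points with one strongly positive reply at 5 points and one slightly negative reply at 10 points would come out to `adjusted_score(10, [(5, 0.9), (10, -0.15)])`, i.e. 10 + 4.5 − 1.5 = 13.0 toward the next model update.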
Now that jargonBot has the ability to learn, I expect its defining skills to improve once it starts actively running on a number of subreddits. I'm not sure exactly what timeframe that will happen on, but I expect a noticeable change within the first few weeks it runs.