View the full project on GitHub here.

Background

I started this project intending to perform some financial analysis on stock prices using Yahoo Finance’s API, but after going down a machine learning rabbit hole on YouTube, I wanted to incorporate some ideas from the videos I watched. So I decided to upgrade the project by also building a prediction model for stock prices that uses sentiment analysis. I wanted to find as much data as possible, so I chose to focus on Alphabet Inc., as they have been in the news lately.

While in my machine learning rabbit hole, I found a video by Nicholas Renotte about using a BERT model to conduct sentiment analysis. I was ecstatic to learn that there were pre-trained sentiment models, as I had only ever trained my own. Funnily enough, I only found out that BERT was developed by Google after I had already decided to focus on their parent company.

The Data

I downloaded Alphabet’s stock data and S&P 500 data from the Yahoo Finance API. Since I initially wanted to perform sentiment analysis on news headlines, I downloaded data from NewsAPI, but I was only able to obtain about 100 rows of headlines about Alphabet Inc. due to the limitations of their developer-level license. I could also only get data from the past month, which would not be sufficient for my model, as I planned on using the last year of data for training.

In a bind, I chose to also download data from Reddit using their API to get more text for my sentiment model. Though this gave me more rows, I still only had 183 rows of text data against 553 rows of financial data. I’ll discuss how I handled this further down.

Financial Analysis

Once I downloaded the data from Yahoo Finance, I analyzed Alphabet’s stock and created measures for the 50-day moving average, the 200-day moving average, and the Relative Strength Index (RSI). Then I created some visualizations to get a better view of Alphabet’s stock performance.
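A minimal sketch of how these measures can be computed, written in plain Python so it runs without dependencies. The function names are illustrative and not taken from the project, and the RSI here uses simple averages of gains and losses rather than Wilder's smoothed averages, so values may differ slightly from charting tools.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` prices; None until enough data exists."""
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            running_sum -= prices[i - window]  # drop the price that left the window
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


def rsi(prices: List[float], period: int = 14) -> Optional[float]:
    """RSI over the last `period` price changes (simple-average variant)."""
    if len(prices) < period + 1:
        return None  # not enough price changes yet
    recent = prices[-(period + 1):]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window (a flat window also lands here)
    relative_strength = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)
```

For the measures described above, `window` would be 50 or 200 applied to the daily closing prices; 14 is a common default RSI period, not a value stated in this post.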

Sentiment Analysis

Excited to try out BERT, I cleaned and merged my NewsAPI and Reddit data before creating the model. Since I could only get data for the past month from NewsAPI, I filtered my Reddit data so it would only include post titles from that same period.
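The filtering and merging step can be sketched roughly as follows. This assumes each row is a dictionary with a `title` and a `date` field and that the NewsAPI window is about 30 days; the field names, the function names, and the 30-day default are all illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List


def filter_recent(posts: List[Dict], today: date, days: int = 30) -> List[Dict]:
    """Keep posts dated on or after the cutoff (`today` minus `days`)."""
    cutoff = today - timedelta(days=days)
    return [post for post in posts if post["date"] >= cutoff]


def combine_titles(news: List[Dict], reddit: List[Dict],
                   today: date, days: int = 30) -> List[Dict]:
    """Merge headlines with recent Reddit titles, ordered by date."""
    combined = news + filter_recent(reddit, today, days)
    return sorted(combined, key=lambda row: row["date"])
```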

I appreciated the simplicity of using BERT compared to building a model more manually. I do plan on doing more research on the model specs in the future, though, as I’m curious to see how it comes up with the score. Once all the titles in the dataset were scored from 1 to 5, I sorted the dataset by the date column to prepare for merging with the financial data frame.
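On how a score like this typically comes about: a sentiment classifier usually outputs one raw number (a logit) per class, and the rating is the class with the highest probability. The sketch below assumes a five-class model whose classes map to 1 through 5 stars; the post does not name the exact checkpoint, so treat this as an illustration of the general idea and not a description of the model's internals.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def star_rating(logits: List[float]) -> Tuple[int, float]:
    """Return (rating, confidence), where rating is the most likely class index + 1."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]
```

For example, the made-up logits `[0.1, 0.2, 0.3, 2.5, 1.0]` peak at the fourth class, so the rating would be 4.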

Combining All The Data

Due to my limited number of articles, my datasets were unequal in size. I chose to fill the null or blank values in the Sentiment Score column with 3, as I wanted the score to be neutral to help the performance of my model. I also figured that I could later create a model using a different data source for the sentiment analysis and then compare the two to see which is more accurate. Once my data frames were an equal size, I merged them and prepared them for the model.
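The fill-and-merge step amounts to a left join on date with a default value. A minimal sketch in plain Python, assuming one sentiment score per date and ISO date strings as keys (both are assumptions, as are the names used here):

```python
from typing import Dict, List

NEUTRAL_SCORE = 3  # middle of the 1-5 scale


def merge_with_sentiment(price_rows: List[Dict],
                         sentiment_by_date: Dict[str, int],
                         default: int = NEUTRAL_SCORE) -> List[Dict]:
    """Attach a sentiment score to every price row, using `default` when a date has none."""
    merged = []
    for row in price_rows:
        score = sentiment_by_date.get(row["date"], default)
        merged.append({**row, "sentiment": score})
    return merged
```

Every financial row survives the merge, so the result has the same number of rows as the price data. One trade-off of this approach is that the model cannot tell a genuinely neutral day from a day with no text data.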

The LSTM Model

Once I finished combining the data, I created an LSTM model for training and passed the data through. My prediction model was ready to go! I plan on tracking the predictions over the next week or month to see how they compare to the actual stock prices. Stay tuned for more, as I’ll continuously be improving the model.
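An LSTM expects its input as sequences, so the merged rows usually have to be cut into overlapping windows first. The post does not describe this preprocessing, so the sketch below is only one common way to do it; `make_windows`, `lookback`, and `target_index` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def make_windows(rows: Sequence[Sequence[float]], target_index: int,
                 lookback: int) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (inputs, targets): each input is `lookback` consecutive rows,
    and its target is one column of the row that follows them."""
    inputs: List[List[List[float]]] = []
    targets: List[float] = []
    for end in range(lookback, len(rows)):
        inputs.append([list(row) for row in rows[end - lookback:end]])
        targets.append(rows[end][target_index])
    return inputs, targets
```

With rows of `[close, sentiment]` and `target_index=0`, each window of past days is paired with the next day's closing price. In practice the values are typically scaled before training, which is left out here.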

- Mali