
Sentiment drives stock markets. Which markets go UP and which securities go DOWN is highly correlated with investors’ overall sentiment.
Investor sentiment is fueled by:
- Economy
- Political Circumstances
- Short & Long-Term Trends
- Inflation
- Liquidity
- Market Insight
- And many more
Can Machine Learning help?
Machine learning is applied in almost every field to maximize potential and returns, but its use in financial analysis is particularly prominent.
In recent years, domains like robo-advisors for portfolio management, algorithmic trading, fraud detection, and insurance/loan underwriting have emerged as prime implementations of machine learning.
Algorithms have been used in computer-based trading since the early ’70s to execute trades, but recent advances in AI have made it possible for programmers to develop programs that improve themselves over time.
So, the question is, can we employ machine learning to predict the stock price?
Sentiment Analysis
One of the domains in machine learning is sentiment analysis. It combines machine learning and natural language processing to classify and score input text. It is highly effective at making sense of unstructured data (tweets, emails, comments, etc.).
One way to determine whether a customer’s feedback is positive or negative is to read it yourself, but what if you have thousands upon thousands of comments to analyze? That quickly becomes humanly impossible. Sentiment analysis can do this job easily and can even score each text at a granular polarity level, such as Very Positive, Positive, Neutral, Negative, or Very Negative.
This capability, in turn, allows you to optimize your offering and be more efficient.
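To make the five polarity levels concrete, here is a minimal sketch that maps a sentiment score between -1 and 1 onto those labels. The function name `polarity_label` and the cut-off values are assumptions chosen for illustration; they are not taken from any particular library.

```python
def polarity_label(score: float) -> str:
    """Map a sentiment score in [-1, 1] to one of five polarity labels.

    The cut-offs (0.05 and 0.5) are illustrative assumptions, not a standard.
    """
    if score >= 0.5:
        return "Very Positive"
    if score >= 0.05:
        return "Positive"
    if score > -0.05:
        return "Neutral"
    if score > -0.5:
        return "Negative"
    return "Very Negative"


# Label a few sample scores.
for s in (0.8, 0.2, 0.0, -0.2, -0.8):
    print(f"{s:+.1f} -> {polarity_label(s)}")
```

In practice the cut-offs would be tuned to the scoring model and the kind of text being analyzed.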
Another use case for sentiment analysis is financial analysis, specifically market insight. We know that market insight heavily influences investors’ attitude toward a security or a market.
Is it possible to develop a program that scrapes daily financial news from different sources, scores the news sentiment, and tells you whether the AAPL share price will go UP or DOWN when the market opens tomorrow? Let’s see.
The Program
To build this small program, I will use Python and the VADER sentiment analysis model.
The best way to predict the future is to learn from the past, so I will try to scrape as many dated news articles as are freely available 🙂
The approach will be:
- Scrape dated financial news related to AAPL security from a news aggregator website
- Extract historical data for AAPL for the same date range for which we have extracted news
- Prepare the news data for sentiment scoring
- Calculate daily returns based on Adjusted Closing
- Correlate the returns with sentiment scoring
Financial News
It’s easier to find a week or less worth of free news archive, but if you are looking for news for the last month or more, it gets trickier. The only reliable sources are news aggregator websites. Still, most of these financial news aggregators are behind a paywall, and getting the free news articles requires you to be ever so patient.
For scraping the news, I found a financial news aggregator called wallmine.com. Its free offering was limited to news archives from September and older, up to nine pages. So I scraped a little over 1200 headlines and summaries from the news archives related to the AAPL security.

Because they were simply headlines and single-line summaries, I only used the ones that mention Apple or AAPL in the content. Filtering slashed my news dataset from 1200+ to 684.
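A minimal sketch of this filtering step with pandas is shown below. The column names `headline` and `summary`, the toy rows, and the choice of a whole-word, case-insensitive match are all assumptions; the original filter may have been a plain substring check.

```python
import pandas as pd

# Toy stand-in for the scraped data; the column names are assumptions.
news = pd.DataFrame(
    {
        "headline": [
            "Apple unveils new iPhone lineup",
            "Tech stocks rally as yields fall",
            "Analysts split on chipmakers",
        ],
        "summary": [
            "The company expects strong holiday demand.",
            "AAPL and MSFT led the gains on Monday.",
            "Semiconductor outlook remains mixed.",
        ],
    }
)

# Whole-word, case-insensitive match so "pineapple" or "Applebee's" do not count.
pattern = r"\b(?:apple|aapl)\b"
mentions_apple = (
    news["headline"].str.contains(pattern, case=False, regex=True)
    | news["summary"].str.contains(pattern, case=False, regex=True)
)

# Keep a row if either the headline or the summary mentions the company.
apple_news = news[mentions_apple].reset_index(drop=True)
print(f"kept {len(apple_news)} of {len(news)} rows")
```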

By this time, I’m 50% sure that our sample is not large enough, so the outcomes might not be accurate or might not prove anything. But I continued with it, as we are more interested in the approach for now; we will go through the lessons learned later.
After prepping the news dataset, it looked something like this.

Historical Prices
After the news, it was time to extract the historical prices. First, I found the MIN and MAX dates, i.e., the date range of our news archives.

I used these start and end dates to retrieve the historical data for AAPL security using Yahoo Finance.
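The two steps above can be sketched as follows. The `date` column name and the toy dates are assumptions, and so is the use of the `yfinance` package: the post only says "Yahoo Finance", so the actual client may have been different. The download is wrapped in a function and left commented out because it needs network access.

```python
import pandas as pd

# Toy news dates; in the real dataset these come from the scraped archive.
news = pd.DataFrame({"date": ["2020-09-03", "2020-08-17", "2020-09-25"]})
news["date"] = pd.to_datetime(news["date"])

# The date range covered by the news archive.
start_date = news["date"].min()
end_date = news["date"].max()
print(start_date.date(), "to", end_date.date())


def fetch_prices(ticker, start, end):
    """Download daily prices from Yahoo Finance via yfinance (an assumption)."""
    import yfinance as yf  # imported lazily; requires `pip install yfinance`

    # yfinance treats `end` as exclusive, so add one day to include it.
    # auto_adjust=False keeps the separate "Adj Close" column.
    return yf.download(
        ticker,
        start=start,
        end=end + pd.Timedelta(days=1),
        auto_adjust=False,
    )


# prices = fetch_prices("AAPL", start_date, end_date)  # needs network access
```

Depending on the installed `yfinance` version, the returned columns may be a two-level index, so it is worth inspecting `prices.columns` before selecting the adjusted close.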

I performed some data type conversions and dropped the columns that were not relevant to the analysis, as we are only interested in the date and the adjusted closing price.

I then calculated daily returns, each based on the previous day’s adjusted closing price.
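As a rough sketch of the column selection and the return calculation, assuming a price table with `Date` and `Adj Close` columns (the names and the toy values are illustrative, not the actual AAPL data):

```python
import pandas as pd

# Toy price history; the column names mirror a typical Yahoo Finance export.
prices = pd.DataFrame(
    {
        "Date": ["2020-09-01", "2020-09-02", "2020-09-03"],
        "Open": [99.0, 101.0, 109.0],
        "Adj Close": [100.0, 110.0, 99.0],
    }
)

# Keep only the date and adjusted close, with a proper datetime type.
prices["Date"] = pd.to_datetime(prices["Date"])
prices = prices[["Date", "Adj Close"]].sort_values("Date")

# Daily return = (today's adjusted close / previous adjusted close) - 1.
# The first row has no previous close, so its return is NaN.
prices["returns"] = prices["Adj Close"].pct_change()
print(prices)
```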

Scoring the Sentiments
I imported the VADER library and started by scoring the headlines.

Then I scored the news summaries.

I took VADER’s compound scores for both the headline and the summary and averaged the two to get each news item’s overall score.
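A minimal sketch of this averaging is shown below. It assumes the `vaderSentiment` package (NLTK also ships a VADER implementation, and it is not clear which one was used here); the helper names `overall_score` and `make_vader_analyzer` are hypothetical.

```python
def overall_score(analyzer, headline, summary):
    """Average the compound scores of a headline and its summary.

    `analyzer` is any object with a VADER-style `polarity_scores(text)`
    method that returns a dict containing a "compound" key.
    """
    headline_score = analyzer.polarity_scores(headline)["compound"]
    summary_score = analyzer.polarity_scores(summary)["compound"]
    return (headline_score + summary_score) / 2


def make_vader_analyzer():
    """Create a VADER analyzer (requires `pip install vaderSentiment`)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()
```

With the package installed, `overall_score(make_vader_analyzer(), headline, summary)` returns a value between -1 and 1, since each compound score lies in that range.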

Our news dataset looks like this now.

Merging Scores and Returns
Our news dataset includes multiple news items for each date, but our historical price data contains only a single data point per date. We will sum all the SA (sentiment analysis) scores for each date to get a daily news sentiment score.
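A small sketch of the per-date aggregation, assuming the scored news sits in a DataFrame with `date` and `score` columns (both names and the toy values are assumptions):

```python
import pandas as pd

# Toy scored news; several rows can share a date.
news = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-09-01", "2020-09-01", "2020-09-02"]),
        "score": [0.40, -0.10, 0.25],
    }
)

# One row per date: the sum of all article scores published that day.
daily_score = news.groupby("date", as_index=False)["score"].sum()
print(daily_score)
```

Note that a sum grows with the number of articles published on a day; averaging instead (`.mean()`) would be a reasonable alternative if news volume should not influence the score.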

Now let’s shift the scores by one day so that each day’s return is compared with the previous day’s score.
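One way to line up each day’s return with the score from the day before is a one-row shift, sketched below. The column names and the direction of the shift are assumptions based on the goal of predicting tomorrow’s move from today’s news.

```python
import pandas as pd

daily_score = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-09-01", "2020-09-02", "2020-09-03"]),
        "score": [0.30, 0.25, -0.50],
    }
)

# shift(1) moves every score down one row, so each date now carries the
# score from the previous row; the first date has no predecessor (NaN).
daily_score["prev_score"] = daily_score["score"].shift(1)
print(daily_score)
```

`shift` works on row positions, not calendar days, so after a weekend or a day without news the "previous" score comes from the last date that has a row.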

Let’s merge the score with returns based on the date.

Replacing NaN scores with 0.

I am only considering non-zero scores.
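The merge, NaN replacement, and non-zero filter can be sketched together as follows. The column names, the toy values, and the choice of a left join on the returns table are assumptions.

```python
import pandas as pd

returns = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-09-01", "2020-09-02", "2020-09-03"]),
        "returns": [0.010, -0.020, 0.015],
    }
)
scores = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-09-01", "2020-09-03"]),
        "score": [0.30, -0.50],
    }
)

# Left join keeps every trading day, even those without any news score.
merged = returns.merge(scores, on="date", how="left")

# Days without news get a score of 0 ...
merged["score"] = merged["score"].fillna(0)

# ... and are then excluded, leaving only days with a non-zero score.
merged = merged[merged["score"] != 0].reset_index(drop=True)
print(merged)
```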

Evaluating the findings
The correlation coefficient turned out to be 0.00794, which is almost 0 and means that our SA scores for headlines and summaries were essentially uncorrelated with the calculated returns.
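For reference, a correlation coefficient like this can be computed in one line with pandas. The sketch below uses made-up numbers and assumed column names, so it does not reproduce the 0.00794 figure.

```python
import pandas as pd

# Toy merged data: one sentiment score and one return per day.
merged = pd.DataFrame(
    {
        "score": [0.30, -0.50, 0.10, 0.45],
        "returns": [0.010, 0.012, -0.008, -0.003],
    }
)

# Series.corr computes the Pearson correlation coefficient by default.
correlation = merged["score"].corr(merged["returns"])
print(f"correlation: {correlation:.5f}")
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates none.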

The plot also did not show any trend.


Conclusion
Though the results were not what I was looking for, there are concrete reasons behind this. Let’s see what we learned:
- It’s all about data. Our chances of a meaningful result shrank when we ended up with only 684 rows mentioning AAPL or Apple, out of 1200+ news items.
- The news content should be richer than a single-line summary. If we had access to full news articles, we might get better scoring from VADER.
- The more, the merrier, as the saying goes, and it holds here as well. To come up with a sophisticated and reasonably accurate prediction, you need access to more data sources. These should not be limited to news archives; they should include press releases, quarterly and yearly reports, blogs, or anything relevant you can get your hands on.
- Market insight through news and reports is not the only factor. You should leverage other data points such as rating agency inputs, overall market conditions, and the pipeline of current and future events.
Going back to the question: is it doable? Yes, but you would need a trove of data sources, and you cannot simply rely on news archives.
Let me know what you think about it.
All the work done for this analysis can be found here on GitHub.
Credits to Lucas Liew from algotrading101.com