Leveraging Unstructured Data - Valuepickr Example

It goes without saying that there is a heck of a lot of unstructured data out there which has the potential to give investors an edge. The only challenge is the lack of time to aggregate all the findings. But as someone looking to learn a thing or two from the massive trend-based investing (trading?) frenzy that we have recently witnessed (think GameStop), I ran a little experiment.

What does the Valuepickr community as a whole say about stocks that are in vogue? About people who are revered as investors? A quick web scraping script that I built in Python fetches data on the most discussed, most liked & most viewed topics on Valuepickr. I ran the program on 20 Feb and then again today. The output shortlists the ‘hottest’ topics in this forum between then and now.
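For anyone wanting to reproduce the "between then and now" step, here is a minimal sketch of how two snapshots could be compared. The scraper itself is not shown in this post, so the snapshot shape (topic title mapped to cumulative counters) and the function names `snapshot_delta` and `top_topics` are assumptions made for illustration, and the numbers are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: topic title -> cumulative counters.
Snapshot = Dict[str, Dict[str, int]]


def snapshot_delta(earlier: Snapshot, later: Snapshot) -> Snapshot:
    """Return the per-topic change in each counter between two snapshots."""
    delta: Snapshot = {}
    for topic, now in later.items():
        before = earlier.get(topic, {})  # a topic created in between starts from zero
        delta[topic] = {key: value - before.get(key, 0) for key, value in now.items()}
    return delta


def top_topics(delta: Snapshot, metric: str, n: int = 5) -> List[Tuple[str, int]]:
    """Rank topics by the change in one metric, largest first."""
    ranked = sorted(delta.items(), key=lambda item: item[1][metric], reverse=True)
    return [(topic, counters[metric]) for topic, counters in ranked[:n]]


# Illustrative numbers only, not taken from the forum.
feb = {"Topic A": {"posts": 100, "views": 5000, "likes": 40}}
mar = {
    "Topic A": {"posts": 130, "views": 6200, "likes": 55},
    "Topic B": {"posts": 12, "views": 900, "likes": 20},
}
delta = snapshot_delta(feb, mar)
print(top_topics(delta, "posts"))
```

Working on differences rather than raw totals keeps old, very large threads from dominating every run simply because of their history.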

1. Topics with the most posts

Malkd’s Core Portfolio 67
Potential wealth creators portfolio: Views Invited 50
Sahil’s Portfolio 45
Apollo Tricoat Ltd(ATL) 43
IDFC First Bank Limited 30

2. Topics with the most views

Hitesh portfolio 11761
Laurus Labs - Can Business Transform to Next Level? 10989
ITC 9413
Malkd’s Core Portfolio 8706
IDFC First Bank Limited 8206

3. Topics with the most likes

Hitesh portfolio 231
Laurus Labs - Can Business Transform to Next Level? 206
Apollo Tricoat Ltd(ATL) 203
IDFC First Bank Limited 153
Malkd’s Core Portfolio 140

4. Topics with the highest Likes-to-Posts Ratio

Likes to Posts Ratio
Poly Medicure - at an inflection point! 22
AA - Abhishek’s Attic (place to store stuff to clear my head)! 19
Pix transmissions - low profile microcap company 15.5
Multi-Disciplinary Reading - Book Reviews 13.6
Deepak Nitrite 13.5

5. Topics with the highest Likes-to-Views Ratio

Likes to Views Ratio
Best Valuepickr contest 2015 0.073170732
Bambino Agro-Horse inside the stable? 0.043478261
Transpek Industry limited 0.041979673
Ion Exchange (India) Limited 0.038834951
Equity Investing as a full time career? 0.037900875

6. Topics with the highest Views-to-Posts ratio

Views to Posts Ratio
Deepak Nitrite 1797.75
Vaibhav Global : Back from the dead 1481
HDFC Life Insurance Company 1176
Poly Medicure - at an inflection point! 1134
Equity Advisory Services in India 1132
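
The three ratios in tables 4 to 6 can be derived from the same per-topic counters. A rough sketch follows; the function name `engagement_ratios` and the example counts are hypothetical, and returning `None` for a zero denominator is just one possible design choice.

```python
from typing import Dict, Optional


def engagement_ratios(posts: int, views: int, likes: int) -> Dict[str, Optional[float]]:
    """Compute the three ratios; a ratio is None when its denominator is zero."""

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "likes_to_posts": ratio(likes, posts),
        "likes_to_views": ratio(likes, views),
        "views_to_posts": ratio(views, posts),
    }


# Made-up counts for a single topic over one observation window.
example = engagement_ratios(posts=10, views=2000, likes=50)
print(example)
```

Topics with very few posts in the window can produce large ratios from tiny counts, so a minimum-posts filter may be worth adding before ranking.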

I have had this running once every 20 days or so for the last 3 months and it has enabled me to find businesses that I’d have otherwise not found by myself. The community as a whole has the power to guide a trend-following investor in a direction where everyone is collectively headed and this insight is quite powerful.

We can however only really unlock the value of NLP/Web scraping technologies by widening the scope of where data is collected from. Valuepickr forum is just one place. Reddit has a group called Indianstockbets (I think?). Perhaps there are other pockets. But, I am hoping to find out.


How could we benefit from this data? Could it, after a while, be used as a screener?

The active members belong not to one group but to different groups, with different points of view and perspectives. It is beneficial to follow a topic with active replies, or a few posts that summarize a company, or a member with a particular style of investing, even if it contradicts our own school of thought.

But how could this aggregation of topics by posts, views and likes lead to anything conclusive? A popular thread with the most views, posts or likes indicates nothing more than a discussion, which could be about 1 company or 10, and not necessarily about a company in vogue for trading purposes.

A topic like ITC, which will be in many members’ PF, will obviously have more views and posts, as an article about the company is written every few days and gets posted in the thread. And Hitesh sir’s thread obviously has more views and likes, as it provides answers to almost all kinds of questions from all of us. A thread could be as simple as 2 members discussing a company they have invested in, not to mention the discussion that goes on for days when there is a huge fall in a stock’s price.

My limited point is this: each thread has its own characteristics and discussions, so how can the numbers we get, be they views, posts or likes, be used to find a company, when there are no keywords in the discussion and no charts are presented every time a discussion happens?

Oh absolutely. These sets of tables by themselves are not going to lead to an investment decision. But here are the possibilities.

  1. These tables can lead one to a more in-depth study/analysis of a particular topic/stock that comes up. To give you an example, RACL geartech featured on this list in the first week of January. I did not know about it before and eventually got to analysing it. It makes the stock discovery process easier. It is not the process by and in itself.

  2. What I have not done as part of this version of the code is to get it to actually go in and read the posts themselves. And that is possible. If and when I get that done, a natural language processing technique can evaluate sentiment (positive vs negative). It can also cluster similar types of discussions/conversations. Once again, not the outcome by itself, but another way to make sense of the direction of the conversation.

  3. Taking it one step further, these company/stock-wise discussions can be vectorised into numerical features. And if I can manage to collect data for, let’s say, a year or so and regress those numerical vectors against the market returns of each stock, it should hypothetically be possible to determine whether discussions in this forum are followed by price movement in the market. It should be possible to separate the kind of discussions that move prices from the kind that don’t. And the trained model could then be used for prediction.
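
To make point 2 a little more concrete, here is a very rough sketch of sentiment scoring using a simple word-list (lexicon) approach. This is a deliberately simple stand-in and not a trained NLP model; the word lists and the function name `sentiment_score` are invented purely for illustration.

```python
import re

# Hypothetical, tiny word lists; a real lexicon would be far larger
# and tuned to how forum members actually write.
POSITIVE = {"growth", "strong", "improving", "undervalued"}
NEGATIVE = {"debt", "weak", "decline", "overvalued"}


def sentiment_score(post: str) -> float:
    """Score a post in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = re.findall(r"[a-z']+", post.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    total_hits = positive_hits + negative_hits
    if total_hits == 0:
        return 0.0  # no lexicon words found, treat as neutral
    return (positive_hits - negative_hits) / total_hits


print(sentiment_score("Strong growth, but debt is rising"))
```

A word list like this cannot handle negation, sarcasm or context, which is why a trained classifier would be the natural next step.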

The possibilities are endless. This above is just the start.


Will you please also share the businesses that you have discovered other than RACL?

Looks like a technofunda system, except the fundamental part is outsourced to a code. And I could sense limitations too, as with any model.

On the 2nd point, all VP members do not articulate the same way, so I think it is impossible for any code, however sophisticated, or any language technique to evaluate sentiment from what it reads, as there could be limitless interpretations of what people say. Also, sometimes the opinions and views expressed are completely subjective, and sometimes there will be polarized views, as with ITC or with any stock trading at a high PE.

Looks promising if one could get useful and actionable data. Please keep updating with your findings.

Nice effort. Meta observation: in the 2017 bull run, the in-thing was coat-tailing famous investors. In the current bull run, it looks like we are like dogs chasing our own tails. :)


Good effort. Is it possible to do more data science stuff, especially sentiment analysis on trending companies?

Vignesh, you have a very structured thought process. Very nice ideas which can be used anywhere and not just in stocks. And as you say the possibilities are endless.

One of the biggest benefits of using this forum is that there are several gurus who share deep insights about companies. The other advantage is that people review earnings calls and evaluate whether managements are walking the talk.
Liked your thoughts, and would like to share mine. IMHO, if you can focus on a few things, this group can benefit immensely:

  1. You may want to identify the gurus and have a small section on the gurus’ posts; just company names might be OK.
  2. Good if your code can identify managements that walked the talk.
  3. In general, most dashboards focus on information. IMHO, it would be great if you could focus on actionable insights.

Lastly, you might need the webmaster’s or administrators’ permission.
BTW: I am sure there are several developers in this group who may chip in their time. I can volunteer 6 FTEs per month.

One other prominent finding was Sandur Manganese. Interesting company.

Disclosure: I do not hold stocks. Not investment advice.

Welcome to the era of retail investors! :)

Thanks - definitely some very good ideas. I will take you up on your offer to help. And anybody else who is able to help out.

For the sake of running this as an experiment once, I have extracted all the posts from Hitesh Portfolio. It is such a rich discussion channel. The attached Excel file has all the posts, the date each post was made, the ID of the person posting, the number of likes and the post ID.

stocks.xlsx (116.0 KB)
hitesh_portfolio_allposts_19MAR2021.xlsx (1.3 MB)

I have also attached another Excel file with the names of all the stocks listed on the stock exchange. One of the first tasks would be to figure out other ‘variations’ in the names of each of these stocks so that a rule-based engine can pick up references to these stocks in the text. For example, Ajanta Pharma, Ajanta, ajanta, Ajnta pharma, ajnta phrma etc…
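
As a starting point for the name-variation problem, fuzzy string matching from Python's standard library `difflib` could catch many misspellings without hand-listing every variant. The sketch below is only one possible approach: the helper names `build_alias_index` and `find_mentions`, the first-word alias rule and the 0.8 cutoff are all assumptions that would need tuning on real posts.

```python
import difflib
import re
from typing import Dict, Iterable, Set


def build_alias_index(names: Iterable[str]) -> Dict[str, str]:
    """Map lowercase aliases (full name and first word) to the canonical name."""
    index: Dict[str, str] = {}
    for name in names:
        index[name.lower()] = name
        # Assumption: the first word alone is a usable short alias.
        # If two companies share a first word, the earlier one wins.
        index.setdefault(name.split()[0].lower(), name)
    return index


def find_mentions(text: str, index: Dict[str, str], cutoff: float = 0.8) -> Set[str]:
    """Return canonical names whose aliases approximately match words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Try single words and adjacent word pairs as candidate mentions.
    candidates = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    aliases = list(index)
    found: Set[str] = set()
    for candidate in candidates:
        match = difflib.get_close_matches(candidate, aliases, n=1, cutoff=cutoff)
        if match:
            found.add(index[match[0]])
    return found


index = build_alias_index(["Ajanta Pharma", "Deepak Nitrite"])
print(find_mentions("I think ajnta phrma looks good", index))
```

Short or generic first words will produce false positives with this rule, so a manual review list or a stricter cutoff for short aliases would likely be needed.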

Once we have that done, it is a matter of wading through the data to try and answer questions that we frame. I’d appreciate the forum’s thoughts first on problem statement framing. What is it that we should try and figure out? It needs to be specific enough, but at the same time should generalise well to other data sets/other text data from Twitter/Reddit etc. Here are my thoughts.

  1. At any given point in time, how positive are the discussions around the stock names mentioned?
  2. Correlation between stock forward returns and velocity of discussions in the forum.
  3. Accurately classify a given statement about a stock as positive or negative. Some help is already available on this. Researchgate(dot)net has a financial phrase bank called FinancialPhraseBank-v1.0 which has positive/negative training data for about 15K statements. It can be used to start training the model. But forum discussions are a lot more informal.
  4. How does the opinion of the expert (in our case Hitesh Bhai) influence/change the opinion of the other fellow boarders?
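
For question 2, one simple way to frame it is a Pearson correlation between posts per period (velocity) and the return over the following periods. The sketch below assumes the two series are already aligned by period; the function names and all the numbers are illustrative only, and a correlation on its own would not establish that discussions cause price moves.

```python
from math import sqrt
from typing import List, Sequence


def forward_returns(prices: Sequence[float], horizon: int) -> List[float]:
    """Return over the next `horizon` periods, for each period that has one."""
    return [prices[i + horizon] / prices[i] - 1 for i in range(len(prices) - horizon)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up weekly data for one stock: posts per week and closing prices.
velocity = [3, 8, 15, 6, 4]
prices = [100.0, 102.0, 110.0, 108.0, 109.0, 111.0]
returns = forward_returns(prices, horizon=1)  # one value per week of velocity
print(pearson(velocity, returns))
```

Using forward returns (rather than same-period returns) keeps the question about whether discussion precedes price moves instead of merely reacting to them.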

Would be keen to hear more perspectives on problem statement framing before we start working on the data. Happy to move the discussion to Kaggle/GitHub if there is sufficient interest.

Except Sakar, every other stock is picked from seniors’ conversations. My biggest influencers are Mr. Ayush Mittal, Mr. Sandeep Patel, Mr. Hitesh bhai and others. Immensely benefited.

Of course, not a recommendation