Some thoughts on comparable companies analysis and machine learning

I’ve been reading up on investment banking lately, partly for work and partly out of interest. The first chapter of Investment Banking: Valuation, Leveraged Buyouts, and Mergers and Acquisitions by Joshua Rosenbaum and Joshua Pearl covers comparable companies analysis, also called trading comps. It’s one way investment bankers value private companies that are about to go public, or companies that may merge with or be acquired by another company.

Automating comparable companies analysis

The process seems straightforward enough: it involves data collection, then essentially comparing the “target” (the company you’d like to value) to those in a “comparables universe.” It’s all manual work, and the most difficult part is amassing the universe of comparable companies. The rest essentially comes down to simple addition, subtraction, multiplication, and division. And it makes me think… why can’t we just automate it with machine learning?
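To make the "simple arithmetic" part concrete, here is a minimal sketch of one common final step: compute a valuation multiple for each comparable, take the median, and apply it to the target. The choice of EV/EBITDA as the multiple, the function name, and every number below are made up for illustration; none of it comes from the book.

```python
from statistics import median


def implied_enterprise_value(comps, target_ebitda):
    """Apply the median EV/EBITDA multiple of the comps to the target.

    comps: list of (enterprise_value, ebitda) pairs (hypothetical figures).
    """
    # Skip companies with non-positive EBITDA, where the multiple is not meaningful.
    multiples = [ev / ebitda for ev, ebitda in comps if ebitda > 0]
    return median(multiples) * target_ebitda


# Made-up comparables universe, in millions of dollars.
comps = [(1200.0, 150.0), (900.0, 100.0), (2000.0, 200.0)]
print(implied_enterprise_value(comps, target_ebitda=80.0))
```

The arithmetic really is trivial once the comparables are chosen, which is why picking the universe is the interesting part.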

A quick Google search for “comparable companies analysis machine learning” returns a Morningstar article from Feb 21, 2019, touting their new “Equity Comparables” tool, which uses machine learning to value companies. Why has it taken until 2019 for something like this to come up? It seems like a no-brainer, doesn’t it? I admit that I don’t know much about investment banking, but I’d love for someone to explain to me why ML is not used in this context. Or, please tell me that it is.

Let’s focus on automating the seemingly most difficult part: identifying a universe of comparable companies.

Could we define some sort of learning task that allows us to obtain company embeddings? That way, we could obtain the comparable universe by doing a cosine similarity search, for example (see my post on common similarity metrics).
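Assuming we somehow had embeddings, the search itself would be simple. Here is a rough sketch in plain Python; the company names and the 3-dimensional vectors are hypothetical placeholders, since real embeddings would have to come from whatever learning task we end up defining.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(target, embeddings, k=2):
    """Return the k companies whose embeddings are closest to the target's."""
    scores = [
        (name, cosine_similarity(embeddings[target], vec))
        for name, vec in embeddings.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical embeddings; real ones would come from a trained model.
embeddings = {
    "TARGET": [1.0, 0.0, 1.0],
    "A": [0.9, 0.1, 1.1],
    "B": [-1.0, 0.5, 0.0],
    "C": [1.0, 1.0, 1.0],
}
comparables = most_similar("TARGET", embeddings, k=2)
print(comparables)
```

With thousands of companies, a brute-force scan like this would likely still be fast enough, so the hard part remains learning good embeddings.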

But what would this learning task be?

Embeddings are the goal!


Perhaps we could use graph convolutional neural networks, exploiting relationships between companies to learn embeddings. But what would those relationships be? Supply chain relations?
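For intuition, here is a sketch of a single graph-convolution step written in plain Python, using the common symmetric normalization with self-loops. The graph (one hypothetical supplier link between companies 0 and 1), the feature vectors, and the identity weight matrix are all assumptions for illustration; a real model would learn the weights against some training objective.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so each company keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    out = []
    for i in range(n):
        # Aggregate features from the company and its neighbors.
        agg = [sum(norm[i][k] * features[k][f] for k in range(n))
               for f in range(n_in)]
        # Linear map followed by ReLU.
        out.append([max(0.0, sum(agg[f] * weights[f][o] for f in range(n_in)))
                    for o in range(n_out)])
    return out


# Hypothetical graph: companies 0 and 1 are linked, company 2 is isolated.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 2.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # untrained identity weights
hidden = gcn_layer(adjacency, features, weights)
print(hidden)
```

Linked companies end up with blended representations while the isolated one keeps its own, which is the behavior we would want if the edges really do capture comparability.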

Technical analysis and sentiment embeddings

What if we used technical analysis indicators from a recent time step, and concatenated those with sentiment embeddings obtained from recent news stories? Maybe that would work?
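A minimal sketch of what that feature vector might look like is below. The two indicators (a simple moving average and a momentum-style percent change), the window size, and the 2-dimensional sentiment vector are my own placeholder choices; the sentiment embedding would have to come from some separate model run over news text.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` prices."""
    return sum(prices[-window:]) / window


def momentum(prices, window):
    """Fractional price change over the last `window` steps."""
    start = prices[-1 - window]
    return (prices[-1] - start) / start


def company_features(prices, sentiment_embedding, window=3):
    """Concatenate technical indicators with a sentiment embedding."""
    indicators = [simple_moving_average(prices, window), momentum(prices, window)]
    return indicators + list(sentiment_embedding)


# Hypothetical inputs: four recent prices and a made-up sentiment vector.
prices = [10.0, 11.0, 12.0, 13.0]
sentiment = [0.2, -0.1]
features = company_features(prices, sentiment)
print(features)
```

The indicators and the sentiment values live on very different scales, so they would probably need to be normalized before any similarity search over the result.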

Stock prices

We could take weekly stock prices as our “sentences,” with our “words” being stock prices rounded to the nearest whole number. Stocks that have similar prices should then end up with similar embeddings. Concretely, we could represent each week’s sentence as a list of stocks sorted by price.

But maybe using percent change instead of absolute price would be more appropriate, so we can find stocks that move together. Perhaps absolute percent change would be interesting? So we could represent a sentence as a list of stocks sorted by the percent change.
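Here is a sketch of how both kinds of sentences could be built, one per week. The tickers and prices are invented, and the helper names are hypothetical. The resulting lists of tickers are the sort of input a word2vec-style model would train on, though whether the learned embeddings would be meaningful is exactly the open question.

```python
def sorted_sentence(values):
    """Return tickers ordered by their value, smallest first."""
    return [ticker for ticker, _ in sorted(values.items(), key=lambda kv: kv[1])]


def percent_changes(prev_week, this_week):
    """Fractional change per ticker between two consecutive weeks."""
    return {
        ticker: (this_week[ticker] - prev_week[ticker]) / prev_week[ticker]
        for ticker in this_week
        if ticker in prev_week
    }


# Made-up weekly closing prices for three hypothetical tickers.
weeks = [
    {"AAA": 10.0, "BBB": 50.0, "CCC": 12.0},
    {"AAA": 11.0, "BBB": 49.0, "CCC": 15.0},
]

# One sentence per week, sorted by rounded price.
price_sentences = [
    sorted_sentence({t: round(p) for t, p in week.items()}) for week in weeks
]

# One sentence per pair of consecutive weeks, sorted by percent change.
change_sentences = [
    sorted_sentence(percent_changes(prev, cur))
    for prev, cur in zip(weeks, weeks[1:])
]

print(price_sentences)
print(change_sentences)
```

Sorting by percent change puts stocks that moved together next to each other, so they would share context windows during training.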

I really don’t know

As mentioned, this may be something that’s already done, or isn’t practical to do. If that’s the case, please let me know. I’m very curious.

Also published on Medium.
