How Does the TF IDF Algorithm Work?

cnxt_dev
2019/09/24 20:00

The TF IDF algorithm is something that Google has been using to rank content for quite some time. The formula can look complex at first glance, but it is straightforward once you know how it works.

Let’s break the algorithm down into two parts:

TF

TF refers to term frequency: the number of times a word appears in a document, relative to the length of that document. For example, if a 500-word blog post contains the word ‘juice’ 20 times, the TF for ‘juice’ is 20/500 = 0.04.
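The term frequency calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a production tokenizer: the function name `term_frequency` and the simple whitespace split are assumptions made for the example.

```python
def term_frequency(term, document):
    """Return how often `term` occurs in `document`, relative to its length."""
    # Naive tokenisation (an assumption for this sketch): lowercase, split on whitespace.
    words = document.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# A 500-word document in which 'juice' appears 20 times, mirroring the example above.
doc = " ".join(["juice"] * 20 + ["filler"] * 480)
print(term_frequency("juice", doc))  # 0.04
```

Real content would need punctuation stripped before counting, otherwise ‘juice.’ and ‘juice’ are treated as different words.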

IDF

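IDF refers to inverse document frequency: a measure of how rare, and therefore how significant, a word is across a whole body of content. It depends only on how many documents contain the word, not on how often the word appears inside them. For example, if the word ‘juice’ appears in 1,000 blog posts out of a set of 50,000, the IDF in its simplest form is 50,000/1,000 = 50. Most formulations take the logarithm of this ratio so that very rare words do not dominate, but the raw ratio is used here to keep the arithmetic easy to follow.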
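As a rough sketch, the IDF ratio can be computed as follows. The function name `inverse_document_frequency` and the optional `use_log` flag are assumptions for this example; the flag applies the natural logarithm to the ratio, which is a common variant.

```python
import math


def inverse_document_frequency(total_docs, docs_with_term, use_log=False):
    """Ratio of all documents to those containing the term, optionally log-scaled."""
    if docs_with_term == 0:
        raise ValueError("term does not appear in any document")
    ratio = total_docs / docs_with_term
    return math.log(ratio) if use_log else ratio


# 'juice' appears in 1,000 of 50,000 blog posts.
print(inverse_document_frequency(50_000, 1_000))                          # 50.0
print(round(inverse_document_frequency(50_000, 1_000, use_log=True), 2))  # 3.91
```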

Once you put it all together, you would end up with this TF IDF calculation for the word ‘juice’: 0.04 * 50 = 2.
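To see the two parts working together, here is a small sketch that scores a word against a toy collection of documents. The function name `tf_idf`, the four sample sentences, and the whitespace tokenisation are all invented for illustration. It uses the raw ratio for IDF, as in the calculation above.

```python
def tf_idf(corpus, term, doc_index):
    """Score `term` for one document in `corpus` as TF multiplied by raw-ratio IDF."""
    # Naive tokenisation (an assumption for this sketch).
    docs = [text.lower().split() for text in corpus]
    words = docs[doc_index]
    # Count how many documents contain the term at least once.
    containing = sum(1 for doc in docs if term in doc)
    if not words or containing == 0:
        return 0.0
    tf = words.count(term) / len(words)
    idf = len(docs) / containing
    return tf * idf


# Toy corpus, invented for this example.
corpus = [
    "fresh orange juice every morning",
    "the morning news",
    "orange trees need sun",
    "the sun rises every morning",
]

print(round(tf_idf(corpus, "juice", 0), 3))    # 0.8
print(round(tf_idf(corpus, "morning", 0), 3))  # 0.267
```

Both words appear once in the first document, but ‘juice’ scores higher because it appears in only one of the four documents, while ‘morning’ appears in three.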

Why You Should Care about the TF IDF Algorithm

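The algorithm matters most to content marketers who want to calculate the weight of their keywords. A high value means a word appears often in your document but rarely across the wider collection, which makes it distinctive. A low value means the word is common everywhere and says little about your content in particular.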

Once you have the weight of your target words, take the terms with the highest weights and look at their search volumes on the web. You can now select the terms with high search volumes and low competition to give your content the best chance of ranking well.
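The selection step could be sketched like this. Every number below is a made-up placeholder, and the `shortlist` function, its thresholds, and the `volume` and `competition` fields are hypothetical. In practice these figures would come from whichever keyword research tool you use.

```python
# Hypothetical TF IDF weights for terms in a piece of content (placeholder values).
weights = {"juice": 2.0, "cold-pressed": 3.1, "the": 0.01, "detox": 1.4}

# Hypothetical keyword metrics (placeholder values, not real search data).
metrics = {
    "juice": {"volume": 9000, "competition": 0.8},
    "cold-pressed": {"volume": 2500, "competition": 0.3},
    "detox": {"volume": 4000, "competition": 0.4},
}


def shortlist(weights, metrics, top_n=3, min_volume=2000, max_competition=0.5):
    """Keep the highest-weighted terms that also meet the volume and competition limits."""
    top_terms = sorted(weights, key=weights.get, reverse=True)[:top_n]
    return [
        term
        for term in top_terms
        if term in metrics
        and metrics[term]["volume"] >= min_volume
        and metrics[term]["competition"] <= max_competition
    ]


print(shortlist(weights, metrics))  # ['cold-pressed', 'detox']
```

Here ‘juice’ has the highest search volume but is dropped because its competition score is above the chosen limit.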

While this is a great way to work smart when it comes to optimising your content, it’s still crucial for your content to make sense to the user if you want it to rank well.