
What does WDF*IDF mean?

WDF*IDF is a formula for weighing how often a term occurs in your own document against how common it is across "all available" documents on the Internet. WDF*IDF stands for "within-document frequency" * "inverse document frequency", i.e. the frequency of the term in your own document multiplied by the ratio of all available documents to the number of documents containing the term.

In more detail, the first part of the formula (WDF) delivers the following:

i = word

j = document

L = total number of words in document j

Freq(i,j) = Frequency of word i in document j

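Explanation of the "+1": without it, Freq(i,j) = 0 would require log2(0), which is undefined. With the "+1", the numerator becomes log2(1) = 0, so a term that does not occur in the document gets a WDF of 0. The result is a logarithmically dampened value that indicates the frequency of the term in relation to all terms in the text.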
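The formula itself is not reproduced in the text above. The variable definitions and the note on the "+1" match the commonly cited form of WDF, shown here as a reconstruction under that assumption:

```latex
\mathrm{WDF}(i,j) = \frac{\log_2\bigl(\mathrm{Freq}(i,j) + 1\bigr)}{\log_2(L)}
```

For example, a term that occurs 3 times in a text of 256 words would get log2(4) / log2(256) = 2 / 8 = 0.25.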

The second part of the formula (IDF) returns this:

where N_D denotes the number of documents and f_t the number of documents that contain the term t. If the document frequency increases, the fraction becomes smaller.
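The IDF formula is likewise missing from the text. Based on the symbols described and the remark about the fraction, a common form looks like the following. The base of the logarithm is an assumption, and variants of this formula exist:

```latex
\mathrm{IDF}_t = \log\left(\frac{N_D}{f_t}\right)
```

With a base-10 logarithm, a term found in 10 of 1,000 documents would get log(100) = 2, while a term found in all 1,000 documents would get log(1) = 0.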

Multiplying the two parts yields a weight that indicates how prominent the entered term is in your own text in relation to all available texts. The higher this value, the more relevant the term is (for the topic).
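The multiplication can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool is implemented: the function names are hypothetical, the three-document corpus is invented, and the base-10 logarithm in the IDF part is an assumption.

```python
import math
from collections import Counter


def wdf(term, tokens):
    """Within-document frequency: log2(Freq + 1) / log2(L)."""
    if len(tokens) < 2:
        return 0.0  # log2(1) = 0 would make the denominator zero
    freq = Counter(tokens)[term]
    return math.log2(freq + 1) / math.log2(len(tokens))


def idf(term, corpus):
    """Inverse document frequency: log10(N_D / f_t) (log base is an assumption)."""
    containing = sum(1 for tokens in corpus if term in tokens)
    if containing == 0:
        return 0.0  # assumption: a term found in no document gets weight 0
    return math.log10(len(corpus) / containing)


def wdf_idf(term, tokens, corpus):
    """Product of both parts for one term in one document."""
    return wdf(term, tokens) * idf(term, corpus)


# Invented toy corpus; a real analysis would use far more documents.
corpus = [
    "anchor text is the visible text of a link".split(),
    "a backlink is a link from another site".split(),
    "press portals publish press releases".split(),
]
own = corpus[0]

for term in ("anchor", "link", "press"):
    print(term, round(wdf_idf(term, own, corpus), 3))
```

In this toy corpus, "anchor" and "link" occur equally often in the first document, but "anchor" scores higher because fewer documents contain it. "press" scores 0 because it does not occur in the first document at all.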

There are tools that perform this calculation for your desired keywords and display the results in a chart. One paid tool is WDF*IDF from onPage.org. A free alternative is https://www.wdfidf-tool.com/, which, however, only offers 100 queries per hour (for all users together).

How does the WDF*IDF analysis work?

The tools (usually) check the first 10 search results that Google delivers for the keyword entered as the analysis term. These pages form the data basis and serve as the reference for the comparison. The tools determine the frequency of various terms on all these pages. You can also submit your own URL for the check. The tools calculate the term frequencies for it in the same way and then compare your URL with the reference pages.
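The comparison step described above could look roughly like this. The sketch assumes the page texts have already been fetched (retrieving search results is left out), computes IDF over the reference pages only, and uses hypothetical function names and invented sample texts. Actual tools may work differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wdf(freq, length):
    """log2(Freq + 1) / log2(L), with 0 for texts too short to normalize."""
    return math.log2(freq + 1) / math.log2(length) if length > 1 else 0.0


def analyze(reference_texts, own_text):
    """Return {term: (max over references, average over references, own value)}."""
    refs = [tokenize(text) for text in reference_texts]
    own = tokenize(own_text)
    n_docs = len(refs)
    # f_t: number of reference pages that contain each term
    doc_freq = Counter(term for tokens in refs for term in set(tokens))
    ref_counts = [Counter(tokens) for tokens in refs]
    own_counts = Counter(own)

    report = {}
    for term, f_t in doc_freq.items():
        # Assumption: IDF from the reference set only, so a term that
        # appears on every reference page gets a weight of 0.
        idf = math.log10(n_docs / f_t)
        values = [wdf(counts[term], len(tokens)) * idf
                  for counts, tokens in zip(ref_counts, refs)]
        own_value = wdf(own_counts[term], len(own)) * idf
        report[term] = (max(values), sum(values) / n_docs, own_value)
    return report


# Invented sample texts standing in for the top search results.
references = [
    "anchor text is the clickable text of a link",
    "the link text or anchor text describes the link target",
    "a backlink from press portals helps link building",
]
own_page = "anchor text anchor text anchor text is the best anchor text"

report = analyze(references, own_page)
for term in ("anchor", "backlink", "link"):
    print(term, tuple(round(value, 3) for value in report[term]))
```

Because the reference set here is tiny, every term that all three texts share ends up with a weight of 0. A larger document collection for the IDF part would avoid that effect.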

The result is a chart, usually a bar chart. Each bar is assigned to a term and its height corresponds to the WDF*IDF value (chart from the free tool):

If your comparison point (yellow) is above the bar curve, your page is above the so-called spam line for that term. You should avoid this because search engines might suspect "keyword stuffing" on your site or blog.

The table below provides more precise figures:

You can see that the terms "anchor text", "anchor texts" and "link text" are relevant for the topic "anchor text", whereas "press portals" and "vserver" are not.

The tool also shows the "competition", i.e. the pages that make up the data basis:

It is worth taking a closer look at these pages as "best practice" examples and modeling your own content on them.

How can I optimize my site with WDF*IDF?

Use this analysis primarily to see whether your terms are above the spam line. If this is the case, you should reduce these terms by omitting them or replacing them with synonyms. Secondly, the analysis is very useful for keyword research. Especially if you want to write a detailed text, you should check whether you cover all aspects (i.e. the most important terms listed here). For the topic "anchor text", for example, related aspects include link building, backlinks, optimization and context.
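The first two checks can be expressed as a small helper. This is only a sketch: it assumes the spam line for a term is simply the reference value shown by the bar, the function name is hypothetical, and the scores are made-up numbers for illustration, not real measurements.

```python
def review_terms(scores, min_reference=0.0):
    """Split terms into over-used and missing ones.

    scores maps term -> (reference_value, own_value).
    """
    overused, missing = [], []
    for term, (reference, own) in scores.items():
        if own > reference:
            # Above the assumed spam line: reduce or replace with synonyms.
            overused.append(term)
        elif own == 0 and reference > min_reference:
            # Used by the reference pages but absent from your text.
            missing.append(term)
    return sorted(overused), sorted(missing)


# Made-up example values, not output from any real tool.
scores = {
    "anchor text": (0.9, 1.4),
    "link text": (0.6, 0.3),
    "backlink": (0.5, 0.0),
    "vserver": (0.0, 0.0),
}

overused, missing = review_terms(scores)
print("reduce:", overused)
print("consider adding:", missing)
```

The `min_reference` parameter is there so that terms with only a marginal reference value can be ignored when looking for gaps.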

Thirdly, the analysis serves to check whether you use the terms that already appear on the top-ranking pages frequently enough. The data basis consists of the top results ("they must be doing it right"), so it is also worth looking at what the "others" are doing better than you. If you're not quite happy with the usual WDF*IDF tools such as those from Ryte, the convenient "Sistrix Content Assistant" may be an alternative. However, the Sistrix modules are associated with considerable costs.

This video was uploaded by the creator of the free WDF*IDF tool and is intended to give you a broader overview of the tool: