A statistical model for a friend from the old times

Recently I received an interesting request at work, or rather an inquiry about whether I could take on a job: analyzing data from Google Search Console (GSC).

I knew the basic concept behind GSC and the general idea of what the SEO industry does, but only superficially, because in my professional life as an analyst I had never worked with this type of data.

So when I received an email from Matthew, an old friend from college who runs an SEO agency in the United Kingdom, asking for a quite specific data analysis, I decided to take a closer look because the topic interested me.

At the time of writing this post I have spent about a week working on the data packages I received from the SEO agency, and I can tell you more or less what these data are about, and why and how you can build a statistical model from them.

In general, Google uses GSC to provide information about the so-called index, i.e. the copy of our website that it keeps in its resources, and about the way people search for the content placed on our websites.
The Google index is a huge distributed database that probably holds almost all websites in the world. It is a kind of roadmap of the internet.

Of course, Google also collects data on how this index is used, that is, on the searches that Internet users run every day.
These are very large datasets containing the user’s country, the device used, the phrase entered in the search engine, and the overall number of impressions (hits) and clicks in the search results.


There are also several important indicators, such as CTR (click-through rate) and the average position in the search results.

These data are aggregated and available only to the website owner, who must prove that the website belongs to them. We do not have access to data on individual people, only to aggregated values. They are aggregated by the website's URLs or by the phrases entered (the search keywords used by users).
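To make the aggregation concrete, here is a minimal Python sketch that sums clicks and impressions per search phrase and derives CTR and an impression-weighted average position. The rows are made up for illustration, and the field names (`query`, `device`, `clicks`, `impressions`, `position`) are my assumption about how an export might look, not the exact files I received.

```python
from collections import defaultdict

# Made-up sample rows; the field names are assumptions for illustration only.
rows = [
    {"query": "red shoes", "device": "MOBILE", "clicks": 12, "impressions": 240, "position": 3.0},
    {"query": "red shoes", "device": "DESKTOP", "clicks": 8, "impressions": 160, "position": 5.0},
    {"query": "blue hats", "device": "MOBILE", "clicks": 3, "impressions": 300, "position": 9.0},
]


def aggregate_by_query(rows):
    """Sum clicks and impressions per query, then derive CTR and the
    impression-weighted average position."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0, "pos_weighted": 0.0})
    for r in rows:
        t = totals[r["query"]]
        t["clicks"] += r["clicks"]
        t["impressions"] += r["impressions"]
        # Weight each position by its impressions so large rows count more.
        t["pos_weighted"] += r["position"] * r["impressions"]

    result = {}
    for query, t in totals.items():
        imp = t["impressions"]
        result[query] = {
            "clicks": t["clicks"],
            "impressions": imp,
            # CTR is clicks divided by impressions; guard against empty groups.
            "ctr": t["clicks"] / imp if imp else 0.0,
            "avg_position": t["pos_weighted"] / imp if imp else None,
        }
    return result


summary = aggregate_by_query(rows)
for query, stats in summary.items():
    print(query, stats)
```

Weighting the position by impressions is a design choice of this sketch: a plain mean of the per-row positions would treat a row with 10 impressions the same as a row with 10,000.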

In effect, it is a search log for a given website as seen through the Google search engine.

Depending on the size of the website and the number of visitors, these data may amount to anywhere from several to several hundred megabytes for a single month.

The agency provided me with samples of data from the last three months, aggregated month by month, and asked me to analyze indicators such as CTR and average page position.

The main task I received from Matthew was to develop a statistical model for the data from the Google search index.

Developing a statistical model obviously requires the analyst to understand the structure of the data and the relationships within it, for example in order to identify a correlation or an inverse correlation between two sets of data.
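As an illustration of what checking for a correlation can look like, here is a small Python sketch of the Pearson correlation coefficient. The position and CTR values are invented for the example and are not taken from the agency's data; the function name `pearson` is my own.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values: average position in the results and the matching CTR.
positions = [1.0, 2.0, 3.0, 5.0, 8.0]
ctrs = [0.30, 0.16, 0.10, 0.05, 0.02]

r = pearson(positions, ctrs)
print(f"r = {r:.3f}")  # a negative r means an inverse correlation
```

A value of r close to -1 would indicate a strong inverse relationship (a worse position goes with a lower CTR), while a value near 0 would suggest no linear relationship. Note that Pearson's r only captures linear dependence, so it is a first check rather than a full picture.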

In some cases we use interpolation techniques to find suitable mathematical functions that let us estimate (“guess”) a value for a missing period, or we extend those functions to predict future values.
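The idea can be sketched in a few lines of Python: linear interpolation fills a missing month between two known ones, and a least-squares line extends the trend one month ahead. The monthly figures are made up, the helper names are my own, and a straight line is only the simplest possible choice of function, not necessarily the one the final model will use.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: month index -> average position; month 3 is missing.
months = [1, 2, 4]
avg_position = [9.0, 8.0, 6.0]

# Fill the gap from the two neighbouring months.
filled_month_3 = linear_interpolate(2, 8.0, 4, 6.0, 3)

# Fit a trend line to the known months and extend it to month 5.
slope, intercept = fit_line(months, avg_position)
forecast_month_5 = slope * 5 + intercept

print(filled_month_3, forecast_month_5)
```

Filling a gap between known points is usually much safer than extending the line beyond the data, so a forecast like this should be read with caution, especially with only three months of history.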

I am currently at the stage of preparing the model, and the task seems very interesting and ambitious, because from what I have learned this type of model can be used to predict the results of the activities that SEO agencies perform for individual phrases, landing pages, and customers.

Such data are crucial for further search engine optimization and internet marketing strategies.

So I am happy to work on this order, and I will soon share the results of my research with you. The agency I am working with assured me that if the model turns out to be effective at prediction, it will be applied to other clients of this SEO agency.

At the moment this is a research and development project and is not yet intended for commercial use. Only after testing its effectiveness will it be possible to decide whether the classical statistical approach to analyzing data from the Google index works.

Of course, it may turn out that modern methods, such as neural networks or machine learning, will work more effectively for this model, but the first trials and analyses seem promising.
