Student Projects/How to calculate importance score of a webpage

From Wikiversity

When you have a question about something, the first thing you are likely to do is search on Google. How is Google able to give you the best results from among so many irrelevant web pages? If you are a regular Wikipedia user, the first result Google shows will often be a Wikipedia page. How does Google know your preferences? Part of the magic behind this is Google's PageRank algorithm. PageRank, as the name suggests, rates the importance of each page on the web, which allows Google to rank pages and present the user with the most useful pages first.

Let us start with the term importance score, or simply score, of a page. The importance score of any page is always a non-negative number. The score of a page is derived from the links pointing to it, called the back links of the page. It depends not only on the number of back links to the page, but also on the scores of the linking pages and on the total number of outgoing links each of those pages contains.

Suppose the web of interest contains $n$ pages, each indexed by an integer $k$ with $1 \le k \le n$. Such a web is an example of a directed graph, in which an arrow from page A to page B indicates a link from A to B. Let $x_k$ denote the score of page $k$ in the web. Then $x_k \ge 0$, and $x_j > x_k$ indicates that page $j$ is more important than page $k$. If page $j$ contains $n_j$ links, one of which points to page $k$, then page $k$'s score is boosted by $x_j / n_j$.

Let $L_k \subset \{1, 2, \ldots, n\}$ denote the set of pages with a link to page $k$; that is, $L_k$ is the set of page $k$'s back links.
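To make the notation concrete, the following minimal sketch builds the out-link counts and the back-link sets for a small web. The four-page web and the names `links`, `out_link_counts`, and `back_links` are illustrative assumptions, not part of the original text.

```python
# Hypothetical four-page web (an assumption for illustration):
# links[j] lists the pages that page j links to.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}


def out_link_counts(links):
    """Return n_j, the number of outgoing links from each page j."""
    return {j: len(targets) for j, targets in links.items()}


def back_links(links):
    """Return L_k, the set of pages that link to each page k."""
    result = {k: set() for k in links}
    for j, targets in links.items():
        for k in targets:
            result[k].add(j)
    return result


n_out = out_link_counts(links)
back = back_links(links)
print(n_out)  # {1: 3, 2: 2, 3: 1, 4: 2}
print(back)   # {1: {3, 4}, 2: {1}, 3: {1, 2, 4}, 4: {1, 2}}
```

In this example page 3 has three back links (pages 1, 2, and 4), while page 2 has only one (page 1).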

For each $k$ we have

$$x_k = \sum_{j \in L_k} \frac{x_j}{n_j}$$

where $n_j$ is the number of outgoing links from page $j$ (see Bryan and Leise, "The $25,000,000,000 Eigenvector: The Linear Algebra behind Google").
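As a rough illustration, the equations for the scores can be solved numerically by repeated substitution: start from equal scores and apply the formula again and again until the values settle. This is only a sketch under stated assumptions. The four-page web, the function name `importance_scores`, and the fixed iteration count are hypothetical; the code assumes every page has at least one outgoing link, and repeated substitution is one possible technique that happens to converge for this particular web, not necessarily for every web.

```python
# Hypothetical four-page web (an assumption for illustration):
# links[j] lists the pages that page j links to.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}


def importance_scores(links, iterations=100):
    """Approximate the scores x_k by repeatedly applying
    x_k = sum over back links j of x_j / n_j.

    Assumes every page has at least one outgoing link and that
    every link target is itself a key of `links`.
    """
    pages = sorted(links)
    n_out = {j: len(links[j]) for j in pages}
    # Start with equal scores that sum to 1.
    x = {k: 1.0 / len(pages) for k in pages}
    for _ in range(iterations):
        new_x = {k: 0.0 for k in pages}
        for j in pages:
            # Page j passes x_j / n_j to each page it links to.
            share = x[j] / n_out[j]
            for k in links[j]:
                new_x[k] += share
        x = new_x
    return x


scores = importance_scores(links)
for k in sorted(scores):
    print(k, round(scores[k], 3))
# 1 0.387
# 2 0.129
# 3 0.29
# 4 0.194
```

For this web the scores approach 12/31, 4/31, 9/31, and 6/31, so page 1 comes out as the most important even though page 3 has more back links: page 3 gives its entire score to page 1 through its single outgoing link.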