Descriptive statistics of the web graph

Learning goals

  1. Know terms such as size and (unique) volume
  2. Be able to count the in-degree and out-degree of web pages
  3. Have an idea of what kind of law the in-degree and out-degree distributions follow
  4. Know that degree is not distributed fairly (evenly) across web pages
  5. Know that the Gini coefficient can be used to measure fairness
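
Goals 2 and 3 can be made concrete with a minimal sketch. It assumes the crawl is already available as a list of (source, target) hyperlink pairs; the page names and the helper `degree_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy crawl: each pair (source, target) is one hyperlink.
edges = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("d.html", "c.html"),
    ("c.html", "a.html"),
]

def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    # Out-degree: number of links leaving a page.
    out_degree = Counter(src for src, _ in edges)
    # In-degree: number of links pointing at a page from pages in the crawl.
    in_degree = Counter(dst for _, dst in edges)
    return in_degree, out_degree

in_degree, out_degree = degree_counts(edges)

# All pages seen either as a link source or as a link target.
pages = sorted({page for edge in edges for page in edge})
for page in pages:
    # A Counter returns 0 for pages that never appear on that side of a link.
    print(page, "in:", in_degree[page], "out:", out_degree[page])

# Degree distribution: how many pages have in-degree k.
in_degree_distribution = Counter(in_degree[page] for page in pages)
print(sorted(in_degree_distribution.items()))
```

The out-degree of a page can be read off the page itself, whereas its in-degree depends on links found on other pages, so a crawl only yields the in-degree within the crawled set.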

Video

Script

The slide deck can be found at File:Descriptive statistics of the web graph.pdf
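
As a rough illustration of how the Gini coefficient from the learning goals can measure the fairness of a degree distribution, the following sketch uses one common closed-form formula over the sorted values. The function name `gini` and the sample degree lists are assumptions made for this example and are not taken from the slide deck.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers (e.g. in-degrees)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Convention chosen for this sketch: no mass means no inequality.
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are the values in ascending order and i runs from 1 to n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical in-degree lists for four pages each.
equal_degrees = [1, 1, 1, 1]      # every page receives the same number of links
unequal_degrees = [0, 0, 0, 4]    # a single page receives all links

print(gini(equal_degrees))
print(gini(unequal_degrees))
```

With this formula a perfectly even list gives 0, and a list in which one of n pages holds all links gives (n - 1) / n, which approaches 1 as n grows.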

Quiz

1. Given a random web crawl, which of the following statements would you expect to be true?

The highest in-degree would be smaller than the highest out-degree.
Counting the anchor tags in one HTML document gives the in-degree of the node representing this document.
In-degrees can be counted exactly.
The in-degree and out-degree distributions will take the same values.

2. Which statements about the Gini coefficient are true?

High values mean that the measured distribution is not very equal.
Low values mean perfect equality.
The Gini coefficient can take values between 0 and infinity.
The Gini coefficient can take values between -1 and 1.
The Gini coefficient can take values between 0 and 1.


Further reading

  1. tba

Discussion