# Web Science/Part2: Emerging Web Properties/MoocIndex/Quizes

This is a collection of quiz questions for Part 2 of the Web Science MOOC. As long as the part is not completely structured, we collect the questions here.

## ideas for exercises

• generate small-world graphs with 10, 100, 1000, 10000 and 100000 nodes and plot their diameters.
• plot the degree distribution of Wikipedia
• plot the word distribution of Wikipedia
• calculate different similarity measures on Wikipedia articles
• fill out a diagram that has no description.

1 Given this in-degree distribution:

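| in-degree | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| number of nodes | 0 | 1 | 4 | 3 | 6 | 8 | 4 | 5 | 2 | 3 | 0 |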

How many nodes are in this network?

• 9
• 11
• 36
• 55
• 184

2 Given this in-degree distribution:

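| in-degree | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| number of nodes | 0 | 1 | 4 | 3 | 6 | 8 | 4 | 5 | 2 | 3 | 0 |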

How many edges are in this network?

• 9
• 11
• 36
• 55
• 184
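The two questions above can be checked with a few lines of Python. This is only a sketch, and it assumes that the first row of the distribution lists the in-degree values and the second row lists how many nodes have that in-degree. The names `in_degrees`, `node_counts`, `count_nodes` and `count_edges` are made up for this illustration.

```python
# In-degree distribution from the two questions above.
# Assumption: node_counts[k] is the number of nodes whose in-degree is in_degrees[k].
in_degrees = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
node_counts = [0, 1, 4, 3, 6, 8, 4, 5, 2, 3, 0]


def count_nodes(counts):
    """Every node falls into exactly one in-degree class, so the counts add up."""
    return sum(counts)


def count_edges(degrees, counts):
    """Every edge ends in exactly one node, so it contributes 1 to one in-degree."""
    return sum(degree * count for degree, count in zip(degrees, counts))


num_nodes = count_nodes(node_counts)
num_edges = count_edges(in_degrees, node_counts)
print("nodes:", num_nodes, "edges:", num_edges)
```

The same idea works for an out-degree distribution, because every edge also starts in exactly one node.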

3 Why is an in-degree distribution of Web pages different from an out-degree distribution of Web pages?

• They are not different, but the same.
• Large numbers of out-degrees are less likely to occur than large numbers of in-degrees
• Large numbers of in-degrees are less likely to occur than large numbers of out-degrees

4 A logarithmic function, y=log x, appears as a straight line in...

• a standard diagram
• a log plot
• a log-log plot
• an exponential plot
• no kind of diagram

5 A complex polynomial, e.g. y=3+x^1.5+x^7+x^11, appears as a straight line in...

• a standard diagram
• a log plot
• a log-log plot
• an exponential plot
• no kind of diagram

6 Which of these statements are true?

• probabilities represent how often events have been observed in an experiment
• a histogram represents how often events have been observed in an experiment
• the values of a probability distribution add up to 1
• the values of a histogram add up to 1

7 Given a plot with a logarithmic y-axis, which of the following is true?

• a curve that appears as a straight line in this plot represents a linear function
• a curve that appears as a straight line in this plot represents a logarithmic function
• going from one numerical axis label to the next one can be achieved by multiplying the first label with a constant number
• going from one numerical axis label to the next one can be achieved by adding a constant number to the first label
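The effect of logarithmic axes can be explored numerically without drawing anything. The sketch below takes the logarithm of the coordinates that sit on a logarithmic axis and then tests whether the slope between consecutive points stays constant. The two sample functions and the helper names `slopes` and `is_straight` are assumptions chosen for this illustration.

```python
import math


def slopes(xs, ys):
    """Slopes between consecutive points of a sampled curve."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


def is_straight(xs, ys, tol=1e-9):
    """A sampled curve is a straight line if all consecutive slopes agree."""
    s = slopes(xs, ys)
    return all(abs(value - s[0]) <= tol for value in s)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
exponential = [2.0 * 3.0 ** x for x in xs]  # y = 2 * 3^x (hypothetical example)
power_law = [5.0 * x ** 2 for x in xs]      # y = 5 * x^2 (hypothetical example)

log = math.log10

# Logarithmic y-axis only: plot (x, log y).
exp_on_semilog = is_straight(xs, [log(y) for y in exponential])
power_on_semilog = is_straight(xs, [log(y) for y in power_law])

# Both axes logarithmic: plot (log x, log y).
power_on_loglog = is_straight([log(x) for x in xs], [log(y) for y in power_law])

# Ordinary linear axes: plot (x, y).
exp_on_linear = is_straight(xs, exponential)

print(exp_on_semilog, power_on_semilog, power_on_loglog, exp_on_linear)
```

Equal distances on a logarithmic axis correspond to equal ratios, which is why the check works on the logarithms of the values rather than on the values themselves.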

8 Have a look at the following plot of the functions ${\displaystyle f}$ and ${\displaystyle g}$. Which of the following statements is true?

• ${\displaystyle g(x)=c*f(x)}$
• ${\displaystyle g(x)=f(x)+f(x)}$
• ${\displaystyle g(x)=f(x)*f(x)}$
• ${\displaystyle g(x)=f(x)+c}$

## modeling text in a vector space

1 You have 10 documents consisting of 20 words. How many basis vectors do you need to represent the documents using TF-IDF scores?

• 10 (as many as documents)
• 20 (as many as words)
• 30 (number of documents plus number of words)
• 200 (number of documents times number of words)

2 Map the following formulas to the metrics they represent.

Metrics: Tanimoto Coefficient, Cosine Similarity

Formulas:

• ${\displaystyle {\frac {A\cap B}{A\cup B}}}$
• ${\displaystyle {\frac {A\cup B}{A\cap B}}}$
• ${\displaystyle {\frac {\langle {\vec {a}},{\vec {b}}\rangle }{\|{\vec {a}}\|\cdot \|{\vec {b}}\|}}}$
• ${\displaystyle {\frac {|A\cap B|}{|A\cup B|}}}$
• ${\displaystyle {\frac {|A\cup B|}{|A\cap B|}}}$
• ${\displaystyle {\frac {\sum _{i=1}^{n}a_{i}b_{i}}{\sum _{i=1}^{n}b_{i}b_{i}\sum _{i=1}^{n}a_{i}a_{i}}}}$
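For experimenting with the two metrics, a minimal sketch in Python could look as follows. It assumes the set-based form of the Tanimoto coefficient (for sets this is also known as the Jaccard index) and the usual definition of cosine similarity as the inner product divided by the product of the Euclidean norms. The function names and the handling of empty inputs are choices made for this illustration.

```python
import math


def tanimoto(a, b):
    """Set-based Tanimoto coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def cosine_similarity(u, v):
    """Inner product of u and v divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


print(tanimoto({"web", "science"}, {"web", "graph"}))
print(cosine_similarity([1, 2, 0], [2, 4, 0]))
```

Both measures return 1 for identical inputs. The Tanimoto coefficient compares which elements are present, while the cosine similarity compares the direction of two vectors and ignores their length.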

3 Given the following documents:

• d1 = the web science class is the most interesting class at the university
• d2 = we study web science all day long

Map the following quantities to their values.

Quantities: tf(the), df(the), idf(the), tf(web), df(web), idf(web)

Values: 1/2, 1/3, 1, 2, 3
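Term and document frequencies for the two documents can be counted with a short script. The definitions used here are assumptions, because several variants are in use: `tf` is the raw number of occurrences of a term in one document, `df` is the number of documents containing the term, and `idf` is computed as log2(N / df) for N documents. The course may define `tf` or `idf` differently, for example with normalisation or without the logarithm, so the numbers should be read against the definition actually used there. The document labels `d1` and `d2` are also chosen for this sketch.

```python
import math
from collections import Counter

documents = {
    "d1": "the web science class is the most interesting class at the university",
    "d2": "we study web science all day long",
}

# Split on whitespace; the sample documents contain no punctuation.
tokens = {name: text.split() for name, text in documents.items()}


def tf(term, doc):
    """Term frequency as a raw count (assumption: no length normalisation)."""
    return Counter(tokens[doc])[term]


def df(term):
    """Document frequency: number of documents that contain the term."""
    return sum(1 for words in tokens.values() if term in words)


def idf(term):
    """One common variant (assumption): log2(N / df). Needs df(term) > 0."""
    return math.log2(len(tokens) / df(term))


for term in ("the", "web"):
    print(term, [tf(term, d) for d in tokens], df(term), idf(term))
```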

4 Given the following probabilities ${\displaystyle P(a)=0.1,P(b)=0.3,P(c)=0.6}$, which of the following is true?

• a a a b b is more likely to occur than b b a a a
• c c c b b is more likely to occur than b b a a a
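Sequence probabilities like these can be compared with a small helper. The sketch assumes that words are drawn independently of each other, so that the probability of a sequence is the product of the probabilities of its words. The function name `sequence_probability` is made up for this illustration.

```python
import math

probabilities = {"a": 0.1, "b": 0.3, "c": 0.6}


def sequence_probability(words, probs):
    """Probability of a word sequence, assuming independent draws."""
    return math.prod(probs[word] for word in words)


p_aaabb = sequence_probability("a a a b b".split(), probabilities)
p_bbaaa = sequence_probability("b b a a a".split(), probabilities)
p_cccbb = sequence_probability("c c c b b".split(), probabilities)

print(p_aaabb, p_bbaaa, p_cccbb)
```

Under the independence assumption the order of the words does not matter, because multiplication is commutative. Only the number of occurrences of each word influences the result.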

5 In the urn process for generating words, the following words have been generated: a a b a b a a c. Which of the following statements is true?

• a new word d can be generated with a probability of 1/4
• a new word d can be generated with a probability of 1/8
• a new word d can be generated with a probability of 1/9
• the probability of the next generated word to be a is 1/3
• the probability of the next generated word to be a is 1/4
• the probability of the next generated word to be a is 2/3
• the probability of the next generated word to be a is 3/4

## properties of the web graph

Resources: http://www9.org/w9cdrom/160/160.html (graph structures on the web)

1 When modelling the World Wide Web as a graph, what are the nodes?

• a single Web page that can be retrieved via an HTTP request
• a single Web site
• a single IP address
• every URL
• every URI

2 When creating the adjacency matrix of the web graph, which of the following is true?

• the matrix is dense.
• the matrix is diagonalisable.
• more than 95% of all entries in the matrix will be 0.
• using the Bellman-Ford algorithm we would not detect cycles.
• when squaring the matrix the number of zero entries will increase.

3 Which of the following statements are true about the graph of Web pages?

• pages are represented as edges
• the graph represents a scale-free network
• the strongly connected component represents a scale-free network
• the degree distribution is similar to that of an Erdős–Rényi graph
• the graph is connected
• eventually every web page can be reached by clicking and following links
• the most central node can always be reached by clicking around
• it consists of a bow-tie structure.

4 What is true for small-world networks?

• Subway systems or the street system are important examples of small-world networks.
• removing a random node decreases the average diameter of the network.
• the diameter of the network grows proportionally to the logarithm of the number of nodes.
• the diameter of the network grows proportionally to the logarithm of the logarithm of the number of nodes.
• if the graph has ${\displaystyle N}$ nodes, the average node degree is proportional to ${\displaystyle \log N}$
• when picking two random nodes, the path between them is about ${\displaystyle \log N}$

5 Which of the following is a definition of the diameter of a network?

• the average path length of the shortest path between two nodes is called the diameter.
• the longest value from calculating the shortest path between all pairs of nodes is called the diameter.
• the highest node degree is called the diameter.
• the number of edges in the graph divided by the number of nodes is called the diameter.
• the average node degree is called the diameter.
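The quantities named in these options can be computed for a small example with breadth-first search. The sketch below assumes an unweighted, connected, undirected graph stored as a dictionary of neighbour lists. The example graph and the function names are hypothetical and only serve to make the difference between the longest and the average shortest path visible.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of edges on a shortest path from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def longest_shortest_path(graph):
    """Maximum over all node pairs of the shortest path length (assumes a connected graph)."""
    return max(max(shortest_path_lengths(graph, s).values()) for s in graph)


def average_shortest_path(graph):
    """Mean shortest path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for s in graph:
        for t, d in shortest_path_lengths(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical example: a path 0-1-2-3 with an extra node 4 attached to node 1.
graph = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}

print(longest_shortest_path(graph), average_shortest_path(graph))
```

Running one breadth-first search per node takes time proportional to the number of nodes times the number of edges, so this approach is only practical for small graphs.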

## working with graphs

1 You are given the adjacency matrix of the web graph. Which of the following statements hold true?

• the number of rows equals the number of columns
• the number of rows equals the number of web pages
• the number of columns equals the number of links
• squaring the matrix has no effect
• there will be as many non-zero entries as there are links
• ${\displaystyle a_{ij}=a_{ji}}$
• the matrix is not symmetric

2 What is correct about the indegree and outdegree distribution of graphs?

• the sum of all in degrees is smaller than the sum of all out degrees.
• the sum of all in degrees equals the sum of all out degrees.
• the sum of all in degrees is bigger than the sum of all out degrees.
• the above statements depend on the kind of data that are modeled in the graph.

3 You are given the adjacency matrix A of the web graph. Which of the following statements about degree computations hold true?

• ${\displaystyle a_{ij}}$ encodes if there is a link from web site ${\displaystyle i}$ to web site ${\displaystyle j}$.
• ${\displaystyle a_{ij}}$ is the indegree of ${\displaystyle i}$ and the outdegree of web site ${\displaystyle j}$.
• ${\displaystyle a_{ij}}$ is the outdegree of ${\displaystyle i}$ and the indegree of web site ${\displaystyle j}$.
• ${\displaystyle a_{ij}}$ encodes if there is a link from web site ${\displaystyle j}$ to web site ${\displaystyle i}$.
• none of the above.

4 You are given the adjacency matrix A of the web graph. Map the formulas to their interpretations.

Formulas: ${\displaystyle f_{1}=\sum _{i}a_{ij}}$ and ${\displaystyle f_{2}=\sum _{j}a_{ij}}$

Interpretations:

• gives the outdegree of website i
• gives the outdegree of website j
• gives the indegree of website i
• gives the indegree of website j
• sum of entries in column i
• sum of entries in column j
• sum of entries in row i
• sum of entries in row j
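Row and column sums are easy to try out on a tiny example. The sketch below assumes the convention that `A[i][j] == 1` means page i links to page j. With the opposite convention the roles of rows and columns are swapped. The 3-page graph and the helper names are hypothetical.

```python
# Hypothetical web graph with 3 pages.
# Assumption: A[i][j] == 1 means "page i links to page j".
A = [
    [0, 1, 1],  # page 0 links to pages 1 and 2
    [0, 0, 1],  # page 1 links to page 2
    [1, 0, 0],  # page 2 links to page 0
]


def row_sums(matrix):
    """Sum over j of a_ij for every row i."""
    return [sum(row) for row in matrix]


def column_sums(matrix):
    """Sum over i of a_ij for every column j."""
    return [sum(column) for column in zip(*matrix)]


out_degrees = row_sums(A)    # links leaving each page under the assumed convention
in_degrees = column_sums(A)  # links arriving at each page under the assumed convention
num_links = sum(sum(row) for row in A)

print(out_degrees, in_degrees, num_links)
```

Both degree lists add up to the number of links, because every link leaves exactly one page and arrives at exactly one page.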

5 You are given the adjacency matrix A of the web graph and let ${\displaystyle {\vec {e_{i}}}}$ be the i-th basis vector. What holds true?

• ${\displaystyle A{\vec {e_{i}}}}$ gives the ${\displaystyle i}$-th column
• ${\displaystyle A{\vec {e_{i}}}}$ gives the ${\displaystyle i}$-th row
• ${\displaystyle A{\vec {e_{i}}}}$ encodes the pages linking to page ${\displaystyle i}$
• ${\displaystyle A{\vec {e_{i}}}}$ encodes the pages page ${\displaystyle i}$ links to.
• ${\displaystyle {\vec {e_{i}}}^{T}A}$ gives the ${\displaystyle i}$-th column
• ${\displaystyle {\vec {e_{i}}}^{T}A}$ gives the ${\displaystyle i}$-th row
• ${\displaystyle {\vec {e_{i}}}^{T}A}$ encodes the pages linking to page ${\displaystyle i}$
• ${\displaystyle {\vec {e_{i}}}^{T}A}$ encodes the pages page ${\displaystyle i}$ links to.
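The two products can be tried out on a small matrix with plain Python lists. The sketch uses zero-based indices and a hypothetical 3-page graph. How the resulting vectors are read in terms of links depends on the convention for the adjacency matrix. Here it is assumed that `A[i][j] == 1` means page i links to page j.

```python
def basis_vector(i, n):
    """i-th basis vector of length n (zero-based index)."""
    return [1 if k == i else 0 for k in range(n)]


def mat_vec(matrix, vector):
    """Matrix times column vector."""
    return [sum(row[c] * vector[c] for c in range(len(vector))) for row in matrix]


def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(vector)))
        for c in range(len(matrix[0]))
    ]


# Hypothetical graph. Assumption: A[i][j] == 1 means "page i links to page j".
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

e1 = basis_vector(1, 3)
A_times_e1 = mat_vec(A, e1)   # picks out column 1 of A
e1T_times_A = vec_mat(e1, A)  # picks out row 1 of A

print(A_times_e1, e1T_times_A)
```

Under the assumed convention, column 1 marks the pages that link to page 1 and row 1 marks the pages that page 1 links to.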

6 You are given the adjacency matrix A of the web graph and let ${\displaystyle {\vec {e_{i}}}}$ be the i-th basis vector. Now you want to find websites that are similar to each other. Which of the following strategies would work?

• Using the Dijkstra algorithm to calculate all shortest paths of length 2 starting from a given website and take them as similar.
• Using breadth-first search to calculate all shortest paths of length 2 starting from a given website and take them as similar.
• Using the Bellman-Ford algorithm to calculate all shortest paths of length 2 starting from a given website and take them as similar.
• for all pairs of rows, interpret them as vectors and calculate the cosine similarity
• for all pairs of columns, interpret them as vectors and calculate the cosine similarity
• for all pairs ${\displaystyle i,j}$ of pages use the following formula: ${\displaystyle {\frac {\sum _{k}a_{ki}a_{kj}}{(\sum _{k}a_{ki}a_{ki})(\sum _{k}a_{kj}a_{kj})}}}$
• start from one website and do ${\displaystyle k}$ random walks

7 Take the adjacency matrix of the largest strongly connected component of the web graph. Now you do a random walk starting at the page ${\displaystyle i}$, which is encoded in your starting vector ${\displaystyle {\vec {v}}}$. Which of the following is true?

• ${\displaystyle {\vec {v}}={\vec {e_{i}}}}$ where ${\displaystyle {\vec {e_{i}}}}$ is the ${\displaystyle i}$-th basis vector
• ${\displaystyle {\vec {v}}={\vec {e_{i}}}}$ where ${\displaystyle {\vec {e_{i}}}}$ encodes the ${\displaystyle i}$-th web page
• ${\displaystyle (A^{n}{\vec {v}})_{j}}$ gives you the probability of being at page ${\displaystyle j}$ after ${\displaystyle n}$ steps in the random walk
• ${\displaystyle (A^{n}{\vec {v}})}$ can be seen as a probability distribution, where each component is the probability to be at the page represented by this component after ${\displaystyle n}$ steps in the random walk.
• ${\displaystyle {\frac {(A^{n}{\vec {v}})}{||(A^{n}{\vec {v}})||}}}$ can be seen as a probability distribution, where each component is the probability to be at the page represented by this component after ${\displaystyle n}$ steps in the random walk.
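A random walk on a small graph can be simulated by repeated matrix-vector multiplication. The sketch below rests on several assumptions that are not fixed by the question: `adj[i][j] == 1` means page i links to page j, the walker chooses uniformly among the outgoing links of the current page, and the transition matrix `P` is built so that `P[j][i]` is the probability of moving from page i to page j. The 3-page graph is hypothetical.

```python
def transition_matrix(adj):
    """Column-stochastic matrix: P[j][i] is the probability of stepping from page i to page j.

    Assumes adj[i][j] == 1 means "page i links to page j" and that every page has
    at least one outgoing link, which holds inside a strongly connected component
    with more than one page.
    """
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out_degree = sum(adj[i])
        for j in range(n):
            if adj[i][j]:
                P[j][i] = adj[i][j] / out_degree
    return P


def step(P, v):
    """One step of the walk: multiply the distribution v by P."""
    return [sum(P[r][c] * v[c] for c in range(len(v))) for r in range(len(P))]


def random_walk_distribution(adj, start, n_steps):
    """Distribution over pages after n_steps, starting with certainty at page `start`."""
    P = transition_matrix(adj)
    v = [1.0 if k == start else 0.0 for k in range(len(adj))]  # start-th basis vector
    for _ in range(n_steps):
        v = step(P, v)
    return v


# Hypothetical strongly connected graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

for n in range(4):
    print(n, random_walk_distribution(A, 0, n))
```

Because every column of `P` sums to 1, the components of the resulting vector keep summing to 1 after every step. Powers of the raw 0/1 adjacency matrix instead count the number of walks of a given length between two pages.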

8 preferential attachment

 cor

9 eigenvalues

 cor

## old questions

Who has created the Web?

• Tim Berners-Lee
• Vint Cerf
• Al Gore
• W3C
• IEEE
• Everyone
• No one

Which of the following items may constitute forces that might bring structure to the Web?

• Entropy
• Collaborative work
• Standards
• Search engine behavior
• Randomness of user behavior
• Users imitating users

Which of the following statements are true?

• Descriptive models cannot predict the future
• Independent variables cannot be observed
• The definition of dependent and independent variables depends on the creator of a model
• Predictive models describe causality
• Observed correlations lead to good predictive models

1 What do people produce on the Web that can be the subject of scientific investigation and modelling?

• Bookmarks
• Classification
• Colors
• File extensions
• Geolocations
• Keywords, tags, hashtags
• Likes
• Metadata
• Numbers
• Playlists
• Words
• Other

2 How can hyperlinks be reasonably modelled?

• As a directed pair of two Web pages (links from ... to ...)
• As a triplet of anchor text and two Web pages (using text for hyperlink ... links from ... to ...)
• As a quadruplet of anchor text and two Web pages and target anchor name (using text for hyperlink ... links from ... to ... targeting anchor of name ...)
• As a quintuplet of anchor text and two Web pages and target anchor name and link creator (using text for hyperlink ... links from ... to ... targeting anchor of name ... created by ...)