Web Science/Part2: Emerging Web Properties/MoocIndex/Quizes

From Wikiversity

This is a collection of quiz questions for part 2 of the Web Science MOOC. Until the part is completely structured, we need to collect the questions here.

ideas for exercises

  • generate small world graphs with 10, 100, 1000, 10000 and 100000 nodes and plot their diameters.
  • plot the degree distribution of Wikipedia
  • plot the word distribution of Wikipedia
  • calculate different similarity measures on Wikipedia articles
  • fill out a diagram that has no description.
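
The first exercise idea could be approached with a sketch like the following. It is only one possible setup: it assumes the Watts-Strogatz construction for small world graphs, and the function names `watts_strogatz` and `diameter` as well as the parameters `k = 4` and `p = 0.1` are illustrative choices, not part of the course material. It prints the diameters instead of plotting them and stops at 1000 nodes, because the all-pairs breadth-first search used here becomes slow for larger graphs.

```python
import random
from collections import deque


def watts_strogatz(n, k, p, rng):
    """Ring of n nodes, each linked to its k nearest neighbours (k even);
    every lattice edge is rewired to a random target with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                # pick a new endpoint that is neither v nor already a neighbour
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    u = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj


def diameter(adj):
    """Longest shortest path over all node pairs (inf if disconnected)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        if len(dist) < len(adj):
            return float("inf")
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (10, 100, 1000):
        print(n, diameter(watts_strogatz(n, 4, 0.1, rng)))
```

The printed pairs of node count and diameter can then be fed into any plotting tool.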

reading diagrams

1 Given this in-degree distribution:

in-degree:       0 1 2 3 4 5 6 7 8 9 10
number of nodes: 0 1 4 3 6 8 4 5 2 3 0

How many nodes are in this network?

9
11
36
55
184

2 Given this in-degree distribution:

in-degree:       0 1 2 3 4 5 6 7 8 9 10
number of nodes: 0 1 4 3 6 8 4 5 2 3 0

How many edges are in this network?

9
11
36
55
184
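
The two questions above can be checked with a few lines of Python. The sketch assumes that the first row of the table lists the in-degrees and the second row the number of nodes with that in-degree; the variable names are illustrative. Every node is counted once in the second row, and every edge contributes exactly one to the in-degree of exactly one node.

```python
# Assumption: row 1 = in-degree k, row 2 = number of nodes with in-degree k.
degrees = list(range(11))
node_counts = [0, 1, 4, 3, 6, 8, 4, 5, 2, 3, 0]

# Each node appears in exactly one degree class.
num_nodes = sum(node_counts)

# Each edge raises the in-degree of exactly one node by one.
num_edges = sum(k * count for k, count in zip(degrees, node_counts))

print(num_nodes, num_edges)  # prints "36 184"
```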

3 Why is an in-degree distribution of Web pages different from an out-degree distribution of Web pages?

They are not different, but the same.
Large numbers of out-degrees are less likely to occur than large numbers of in-degrees
Large numbers of in-degrees are less likely to occur than large numbers of out-degrees

4 A logarithmic function, y=log x, appears as a straight line in...

a standard diagram
a log plot
a log-log plot
an exponential plot
no kind of diagram

5 A complex polynomial, e.g. y=3+x^1.5+x^7+x^11, appears as a straight line in...

a standard diagram
a log plot
a log-log plot
an exponential plot
no kind of diagram
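
Whether a function appears as a straight line under a given axis scaling can be tested numerically, as in the sketch below. The helper names `is_straight` and `views` and the sample points are illustrative assumptions. The idea is to transform the coordinates the way the axis scaling would and then check whether all consecutive slopes agree.

```python
import math


def is_straight(points, tol=1e-9):
    """True if the slopes between consecutive points agree within tol."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return all(math.isclose(s, slopes[0], rel_tol=tol, abs_tol=tol)
               for s in slopes)


# Sample points; they start at 2 so that log(log(x)) is defined.
xs = [2.0, 4.0, 8.0, 16.0, 32.0]


def views(f):
    """Check f for straightness under four axis scalings."""
    ys = [f(x) for x in xs]
    pairs = list(zip(xs, ys))
    return {
        "linear axes": is_straight(pairs),
        "log x-axis": is_straight([(math.log(x), y) for x, y in pairs]),
        "log y-axis": is_straight([(x, math.log(y)) for x, y in pairs]),
        "log-log": is_straight([(math.log(x), math.log(y)) for x, y in pairs]),
    }


if __name__ == "__main__":
    print(views(math.log))
    print(views(lambda x: 3 + x**1.5 + x**7 + x**11))
```

Note that "log plot" is ambiguous: the sketch distinguishes a logarithmic x-axis from a logarithmic y-axis because the two give different results.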

6 Which of these statements are true?

probabilities represent how often events have been observed in an experiment
a histogram represents how often events have been observed in an experiment
the values of a probability distribution add up to 1
the values of a histogram add up to 1
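
The difference between a histogram and a probability distribution can be illustrated with a small sketch. The observations below are made-up example data. The histogram holds raw counts, while dividing by the number of observations gives relative frequencies that add up to 1 and can serve as an estimate of the probabilities.

```python
from collections import Counter

# Hypothetical outcomes of an experiment.
observations = ["a", "b", "a", "c", "a", "b"]

# Histogram: how often each event was observed.
histogram = Counter(observations)
total = sum(histogram.values())

# Relative frequencies: counts divided by the number of observations.
relative = {event: count / total for event, count in histogram.items()}

print(dict(histogram), total)
print(relative, sum(relative.values()))
```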

7 Given a plot with a logarithmic y-axis, which of the following is true?

a curve that appears as a straight line in this plot represents a linear function
a curve that appears as a straight line in this plot represents a logarithmic function
going from one numerical axis label to the next one can be achieved by multiplying the first label by a constant number
going from one numerical axis label to the next one can be achieved by adding a constant number to the first label

8 have a look at the following plot of functions and . Which of the following statements is true?


modeling text in a vector space

1 You have 10 documents with a combined vocabulary of 20 distinct words. How many basis vectors do you need to represent the documents using TF-IDF scores?

10 (as many as documents)
20 (as many as words)
30 (number of documents plus number of words)
200 (number of documents times number of words)

2 map the following formulas to the metrics they represent

Tanimoto Coefficient Cosine Similarity
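
The formulas for this question are not reproduced in this text version. As a reference point, the sketch below implements the commonly used definitions of both metrics for real-valued vectors; it is an assumption that these match the formulas intended in the course. Cosine similarity divides the dot product by the product of the vector lengths, while the Tanimoto coefficient divides it by the sum of the squared lengths minus the dot product.

```python
import math


def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # a.b / (|a| * |b|)
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def tanimoto_coefficient(a, b):
    # a.b / (|a|^2 + |b|^2 - a.b)
    return dot(a, b) / (dot(a, a) + dot(b, b) - dot(a, b))


if __name__ == "__main__":
    a, b = [1, 1, 0], [1, 0, 1]
    print(cosine_similarity(a, b), tanimoto_coefficient(a, b))
```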

3 Given the following documents:

d1=the web science class is the most interesting class at the university
d2=we study web science all day long

map the following values

tf(the) df(the) idf(the) tf(web) df(web) idf(web)
1/2
1/3
1
2
3
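
Term and document frequencies for the two example documents can be counted as in the sketch below, which treats the second document as d2. It uses raw counts for tf and df, and log(N/df) for idf. These are common conventions but only an assumption here: the course may use normalised or otherwise different definitions, so the numbers need not match the answer options above.

```python
import math

documents = {
    "d1": "the web science class is the most interesting class at the university".split(),
    "d2": "we study web science all day long".split(),
}


def tf(term, doc):
    """Raw term frequency: occurrences of term in one document."""
    return documents[doc].count(term)


def df(term):
    """Document frequency: number of documents containing term."""
    return sum(1 for words in documents.values() if term in words)


def idf(term):
    """Assumed definition: log(N / df), with N the number of documents."""
    return math.log(len(documents) / df(term))


if __name__ == "__main__":
    for term in ("the", "web"):
        print(term, tf(term, "d1"), df(term), idf(term))
```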

4 Given the following probabilities, which of the following is true?

a a a b b is more likely to occur than b b a a a
c c c b b is more likely to occur than b b a a a

5 In the urn process for generating words, the following words have been generated: a a b a b a a c. Which of the following statements is true?

a new word d can be generated with a probability of 1/4
a new word d can be generated with a probability of 1/8
a new word d can be generated with a probability of 1/9
the probability of the next generated word to be a is 1 / 3
the probability of the next generated word to be a is 1 / 4
the probability of the next generated word to be a is 2 / 3
the probability of the next generated word to be a is 3 / 4


properties of the web graph

resources: http://www9.org/w9cdrom/160/160.html (Graph structure in the Web)

1 When modelling the World Wide Web as a graph, what are the nodes?

a single Web page that can be retrieved via an http request
a single Web site
a single IP-address
every URL
every URI

2 When creating the adjacency matrix of the web graph, what is true?

the matrix is dense.
the matrix is diagonalisable.
more than 95% of all entries in the matrix will be 0.
using the Bellman-Ford algorithm we would not detect cycles.
when squaring the matrix the number of zero entries will increase.

3 Which of the following statements are true about the graph of Web pages?

pages are represented as edges
the graph represents a scale free network
the strongly connected component represents a scale free network
the degree distribution is similar to that from an Erdős–Rényi graph
The Graph is connected
Eventually every web page can be reached by clicking and following links
The most central node can always be reached by clicking around
it consists of a bow-tie structure.

4 What is true for small world networks?

Subway systems or the street system are important examples of small world networks.
removing a random node decreases the average diameter of the network.
the diameter of the network grows proportional to the logarithm of the number of nodes.
the diameter of the network grows proportional to the logarithm of the logarithm of the number of nodes.
if the graph has nodes the average node degree is proportional to
when picking two random nodes the path between them is about

5 Which of the following is a definition of the diameter of a network?

the average path length of the shortest path between two nodes is called the diameter.
the longest value from calculating the shortest path between all pairs of nodes is called the diameter.
the highest node degree is called the diameter.
the number of edges in the graph divided by the number of nodes is called the diameter.
the average node degree is called the diameter.


working with graphs

1 You are given the adjacency matrix of the web graph. Which of the following statements hold true?

the number of rows equals the number of columns
the number of rows equals the number of web pages
the number of columns equals the number of links
squaring the matrix has no effect
there will be as many non zero entries as there are links
the matrix is not symmetric

2 What is correct about the in-degree and out-degree distributions of graphs?

the sum of all in-degrees is smaller than the sum of all out-degrees.
the sum of all in-degrees equals the sum of all out-degrees.
the sum of all in-degrees is bigger than the sum of all out-degrees.
the above statements depend on the kind of data that are modeled in the graph.
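
How in-degrees and out-degrees relate to the edges can be explored with a small directed edge list, as sketched below. The graph is a made-up example. Each directed edge leaves exactly one node and enters exactly one node, so it is counted once in each tally.

```python
from collections import Counter

# Hypothetical directed graph given as (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]

# Every edge adds one to the out-degree of its source ...
out_degree = Counter(src for src, _ in edges)
# ... and one to the in-degree of its target.
in_degree = Counter(dst for _, dst in edges)

print(sum(out_degree.values()), sum(in_degree.values()), len(edges))
```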

3 You are given the adjacency matrix A of the web graph. Which of the following statements about degree computations hold true?

encodes if there is a link from website to web site .
is the indegree of and the outdegree of web site .
is the indegree of and the outdegree of web site .
encodes if there is a link from website to web site .
none of the above.

4 You are given the adjacency matrix A of the web graph. Map the formulas to their interpretations.

and
gives the outdegree of website i
gives the outdegree of website j
gives the outdegree of website i
gives the indegree of website j
sum of entries in column i
sum of entries in column j
sum of entries in row i
sum of entries in row j
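
Row and column sums of an adjacency matrix can be computed as in the following sketch. It assumes the convention that the entry in row i and column j is 1 if page i links to page j; with the opposite convention the roles of rows and columns swap. The matrix is a made-up example.

```python
# Assumed convention: A[i][j] == 1 means page i links to page j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]

# Row sums: number of links leaving each page.
out_degree = [sum(row) for row in A]

# Column sums: number of links arriving at each page.
in_degree = [sum(column) for column in zip(*A)]

print(out_degree, in_degree)
```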

5 You are given the adjacency matrix A of the web graph, and let be the i-th basis vector. What holds true?

gives the -th column
gives the -th row
encodes the pages linking to page
encodes the pages page links to.
gives the -th column
gives the -th row
encodes the pages linking to page
encodes the pages page links to.
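
Multiplying the adjacency matrix with a basis vector selects a single column or row, which the sketch below demonstrates. The symbol `e_i` for the i-th basis vector, the helper names, and the convention that `A[i][j] == 1` means page i links to page j are assumptions, since the formulas are not reproduced in this text version.

```python
# Assumed convention: A[i][j] == 1 means page i links to page j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]


def basis_vector(i, n):
    """i-th basis vector of length n (0-based index)."""
    return [1 if k == i else 0 for k in range(n)]


def mat_vec(A, v):
    """Matrix times column vector: A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def vec_mat(v, A):
    """Row vector times matrix: v^T A."""
    return [sum(v[r] * A[r][c] for r in range(len(A)))
            for c in range(len(A[0]))]


e_2 = basis_vector(2, 4)
column_2 = mat_vec(A, e_2)  # i-th column: pages linking to page 2
row_2 = vec_mat(e_2, A)     # i-th row: pages that page 2 links to
print(column_2, row_2)
```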

6 You are given the adjacency matrix A of the web graph, and let be the i-th basis vector. Now you want to find websites that are similar to each other. Which of the following strategies would work?

Using Dijkstra's algorithm to calculate all shortest paths of length 2 starting from a given website and take them as similar.
Using breadth-first search to calculate all shortest paths of length 2 starting from a given website and take them as similar.
Using the Bellman-Ford algorithm to calculate all shortest paths of length 2 starting from a given website and take them as similar.
for all pairs of rows interpret them as vectors and calculate the cosine similarity
for all pairs of columns interpret them as vectors and calculate the cosine similarity
for all pairs of pages use the following formula
start from one website and do random walks

7 Take the adjacency matrix of the largest strongly connected component of the web graph. Now you do a random walk starting at the page which is encoded in your starting vector . Which of the following is true?

where is the -th base vector
where encodes the -th web page
gives you the probability of being at page after steps in the random walk
can be seen as being a probability distribution. Where each component is the probability to be at the page represented by this component after steps in the random walk.
can be seen as being a probability distribution. Where each component is the probability to be at the page represented by this component after steps in the random walk.
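
A random walk on a strongly connected graph can be simulated with repeated vector-matrix products, as in the sketch below. The formulas of this question are not reproduced in this text version, so the setup is an assumption: the walker picks one of the outgoing links of the current page uniformly at random, `A[i][j] == 1` means page i links to page j, and the function names and the example graph are illustrative.

```python
# Assumed convention: A[i][j] == 1 means page i links to page j.
# Small strongly connected example: 0->1, 0->2, 1->2, 2->0.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]


def transition_matrix(A):
    """Divide each row by its sum (every row has at least one link)."""
    return [[a / sum(row) for a in row] for row in A]


def step(p, P):
    """One step of the walk: row vector p times matrix P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def walk(start, A, steps):
    """Distribution over pages after the given number of steps."""
    P = transition_matrix(A)
    p = [1.0 if k == start else 0.0 for k in range(len(A))]
    for _ in range(steps):
        p = step(p, P)
    return p


if __name__ == "__main__":
    for t in range(4):
        print(t, walk(0, A, t))
```

Each resulting vector is a probability distribution: its components are non-negative and add up to 1.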

8 preferential attachment

cor

9 eigenvalues

cor


old questions

Who has created the Web?

Tim Berners-Lee
Vint Cerf
Al Gore
W3C
IEEE
Everyone
No one


Which of the following items may constitute forces that might bring structure to the Web?

Entropy
Collaborative work
Standards
Search engine behavior
Randomness of user behavior
Users imitating users


Which of the following statements are true?

Descriptive models cannot predict the future
Independent variables cannot be observed
The definition of dependent and independent variables depends on the creator of a model
Predictive models describe causality
observed correlations lead to good predictive models


1 What do people produce on the Web that can be the subject of scientific investigation and modelling?

Bookmarks
Classification
Colors
File extensions
Geolocations
Keywords, tags, hashtags
Likes
Metadata
Numbers
Playlists
Words
Other

2 How can hyperlinks be reasonably modelled?

As a directed pair of two Web pages (links from .... to ...)
As a triplet of anchor text and two Web pages (using text for hyperlink ... links from ... to ... )
As a quadruplet of anchor text and two Web pages and target anchor name (using text for hyperlink ... links from ... to ... targeting anchor of name ...)
As a quintuplet of anchor text and two Web pages and target anchor name and link creator (using text for hyperlink ... links from ... to ... targeting anchor of name ... created by ...)