Web Science/Part2: Emerging Web Properties/MoocIndex/Quizes

This is a collection of quiz questions for part 2 of the Web Science MOOC. Until that part is completely structured, we need to collect the questions here.

ideas for exercises

generate small-world graphs with 10, 100, 1000, 10000 and 100000 nodes and plot their diameters.
plot the degree distribution of Wikipedia
plot the word distribution of Wikipedia
calculate different similarity measures on Wikipedia articles
fill out a diagram that has no description.
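
A minimal sketch for the first exercise idea. The list above does not say which small-world model is meant, so this assumes a Watts-Strogatz-style construction (a ring lattice whose edges are rewired at random). The names `small_world_graph` and `diameter` and the parameters `k`, `p` and `seed` are illustrative, not part of the course material. The diameter is computed by breadth-first search from every node, which is only practical for the smaller sizes; the 10000- and 100000-node cases would need a graph library or a sampled estimate.

```python
import random
from collections import deque


def small_world_graph(n, k=4, p=0.1, seed=0):
    """Ring lattice on n nodes (k nearest neighbours each); every lattice
    edge is rewired to a random new target with probability p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    # Build the regular ring lattice.
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    # Rewire each lattice edge (u, v) with probability p.
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            if rng.random() < p and v in adj[u]:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def diameter(adj):
    """Longest shortest path between any two mutually reachable nodes."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, diameter(small_world_graph(n)))
```

Plotting the printed diameters against the number of nodes on a logarithmic x-axis is one way to turn this into the plot the exercise asks for.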

reading diagrams

modeling text in a vector space

	10 (as many as documents)
	20 (as many as words)
	30 (number of documents plus number of words)
	200 (number of documents times number of words)

Tanimoto Coefficient	Cosine Similarity
		${\frac {A\cap B}{A\cup B}}$
		${\frac {A\cup B}{A\cap B}}$
		${\frac {\langle {\vec {a}},{\vec {b}}\rangle }{\|{\vec {a}}\|\cdot \|{\vec {b}}\|}}$
		${\frac {|A\cap B|}{|A\cup B|}}$
		${\frac {|A\cup B|}{|A\cap B|}}$
		${\frac {\sum _{i=1}^{n}a_{i}b_{i}}{\sum _{i=1}^{n}b_{i}b_{i}\sum _{i=1}^{n}a_{i}a_{i}}}$
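
To make the two measures in the table concrete, here is a minimal sketch. It assumes the set-based form of the Tanimoto coefficient and the Euclidean norm for cosine similarity; the function names and the return value of 0.0 for empty inputs are choices made for this example only.

```python
import math


def tanimoto(a, b):
    """|A ∩ B| / |A ∪ B| for two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def cosine_similarity(a, b):
    """<a, b> / (||a|| * ||b||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(tanimoto({"a", "b", "c"}, {"b", "c", "d"}))  # 2 shared of 4 total
print(cosine_similarity([1, 0], [0, 1]))           # orthogonal vectors
```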

	a a a b b is more likely to occur than b b a a a
	c c c b b is more likely to occur than b b a a a

	a new word d can be generated with a probability of 1/4
	a new word d can be generated with a probability of 1/8
	a new word d can be generated with a probability of 1/9
	the probability of the next generated word to be a is 1/3
	the probability of the next generated word to be a is 1/4
	the probability of the next generated word to be a is 2/3
	the probability of the next generated word to be a is 3/4

properties of the web graph

resource: http://www9.org/w9cdrom/160/160.html (Graph structure in the Web)

working with graphs

	the number of rows equals the number of columns
	the number of rows equals the number of web pages
	the number of columns equals the number of links
	squaring the matrix has no effect
	there will be as many nonzero entries as there are links
	$a_{ij}=a_{ji}$
	the matrix is not symmetric

	the sum of all in-degrees is smaller than the sum of all out-degrees.
	the sum of all in-degrees equals the sum of all out-degrees.
	the sum of all in-degrees is bigger than the sum of all out-degrees.
	the above statements depend on the kind of data that is modeled in the graph.

	$a_{ij}$ encodes whether there is a link from website $i$ to website $j$.
	$a_{ij}$ is the indegree of website $i$ and the outdegree of website $j$.
	$a_{ij}$ encodes whether there is a link from website $j$ to website $i$.
	none of the above.

$f_{1}=\sum _{i}a_{ij}$ and	$f_{2}=\sum _{j}a_{ij}$
		gives the outdegree of website i
		gives the outdegree of website j
		gives the indegree of website i
		gives the indegree of website j
		sum of entries in column i
		sum of entries in column j
		sum of entries in row i
		sum of entries in row j
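
The row and column sums above can be tried out on a tiny example. This sketch assumes the convention that $a_{ij}=1$ means page $i$ links to page $j$; under the opposite convention the roles of rows and columns swap. The three-page matrix `A` and the function names are made up for illustration.

```python
def out_degrees(a):
    """Row sums: entry i is the sum over j of a[i][j]."""
    return [sum(row) for row in a]


def in_degrees(a):
    """Column sums: entry j is the sum over i of a[i][j]."""
    return [sum(col) for col in zip(*a)]


# Hypothetical 3-page web with links 0 -> 1, 0 -> 2 and 1 -> 2.
# Assumed convention: A[i][j] == 1 means page i links to page j.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

print(out_degrees(A))  # [2, 1, 0]
print(in_degrees(A))   # [0, 1, 2]
# Every link leaves one page and enters one page, so both totals
# equal the number of links.
print(sum(out_degrees(A)), sum(in_degrees(A)))  # 3 3
```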

	$A{\vec {e_{i}}}$ gives the $i$ -th column
	$A{\vec {e_{i}}}$ gives the $i$ -th row
	$A{\vec {e_{i}}}$ encodes the pages linking to page $i$
	$A{\vec {e_{i}}}$ encodes the pages page $i$ links to.
	${\vec {e_{i}}}^{T}A$ gives the $i$ -th column
	${\vec {e_{i}}}^{T}A$ gives the $i$ -th row
	${\vec {e_{i}}}^{T}A$ encodes the pages linking to page $i$
	${\vec {e_{i}}}^{T}A$ encodes the pages page $i$ links to.
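
The two products can be checked numerically. This is a small sketch that again assumes $a_{ij}=1$ means page $i$ links to page $j$; the helper names `mat_vec`, `vec_mat` and `unit_vector` and the example matrix are illustrative only.

```python
def mat_vec(a, v):
    """A v for a matrix a given as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def vec_mat(v, a):
    """v^T A for a matrix a given as a list of rows."""
    return [sum(v_i * a_ij for v_i, a_ij in zip(v, col)) for col in zip(*a)]


def unit_vector(i, n):
    """The i-th base vector of length n (0-based index)."""
    return [1 if k == i else 0 for k in range(n)]


# Hypothetical 3-page web with links 0 -> 1, 0 -> 2 and 1 -> 2.
# Assumed convention: A[i][j] == 1 means page i links to page j.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

# A e_2 picks out column 2: the pages that link to page 2.
print(mat_vec(A, unit_vector(2, 3)))  # [1, 1, 0]
# e_0^T A picks out row 0: the pages that page 0 links to.
print(vec_mat(unit_vector(0, 3), A))  # [0, 1, 1]
```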

	Using Dijkstra's algorithm to calculate all shortest paths of length 2 starting from a given website and taking the pages reached as similar.
	Using breadth-first search to calculate all shortest paths of length 2 starting from a given website and taking the pages reached as similar.
	Using the Bellman-Ford algorithm to calculate all shortest paths of length 2 starting from a given website and taking the pages reached as similar.
	for all pairs of rows, interpret them as vectors and calculate the cosine similarity
	for all pairs of columns, interpret them as vectors and calculate the cosine similarity
	for all pairs $i,j$ of pages, use the formula ${\frac {\sum _{k}a_{ki}a_{kj}}{(\sum _{k}a_{ki}a_{ki})(\sum _{k}a_{kj}a_{kj})}}$
	start from one website and do $k$ random walks

	${\vec {v}}={\vec {e_{i}}}$ where ${\vec {e_{i}}}$ is the $i$ -th base vector
	${\vec {v}}={\vec {e_{i}}}$ where ${\vec {e_{i}}}$ encodes the $i$ -th web page
	$(A^{n}{\vec {v}})_{j}$ gives you the probability of being at page $j$ after $n$ steps in the random walk
	$(A^{n}{\vec {v}})$ can be seen as a probability distribution, where each component is the probability of being at the page represented by that component after $n$ steps of the random walk.
	${\frac {A^{n}{\vec {v}}}{\|A^{n}{\vec {v}}\|}}$ can be seen as a probability distribution, where each component is the probability of being at the page represented by that component after $n$ steps of the random walk.
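
To experiment with these statements, the following sketch simulates the random walk by repeated matrix-vector multiplication. Note that powers of a raw 0/1 adjacency matrix count walks rather than probabilities, so the sketch first builds a column-stochastic transition matrix. It assumes that $a_{ij}=1$ means page $i$ links to page $j$ and that a page without out-links keeps the walker in place; both are assumptions of this example, as are the function names.

```python
def transition_matrix(a):
    """Column-stochastic P with P[j][i] = a[i][j] / outdegree(i).
    Pages without out-links keep the walker in place (an assumption)."""
    n = len(a)
    p = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(a):
        out = sum(row)
        if out == 0:
            p[i][i] = 1.0
        else:
            for j, a_ij in enumerate(row):
                p[j][i] = a_ij / out
    return p


def walk_distribution(a, start, steps):
    """Distribution over pages after `steps` steps, starting at `start`."""
    p = transition_matrix(a)
    v = [1.0 if k == start else 0.0 for k in range(len(a))]  # v = e_start
    for _ in range(steps):
        v = [sum(p_jk * v_k for p_jk, v_k in zip(row, v)) for row in p]
    return v


# Hypothetical 3-page web with links 0 -> 1, 0 -> 2, 1 -> 2 and 2 -> 0.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

print(walk_distribution(A, 0, 1))  # [0.0, 0.5, 0.5]
print(walk_distribution(A, 0, 2))  # [0.5, 0.0, 0.5]
```

Because every column of the transition matrix sums to 1, the components of the result sum to 1 after every step without any further normalisation.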


old questions

	9
	11
	36
	55
	184

	9
	11
	36
	55
	184

	a standard diagram
	a log plot
	a log-log plot
	an exponential plot
	no kind of diagram

	a standard diagram
	a log plot
	a log-log plot
	an exponential plot
	no kind of diagram

	probabilities represent how often events have been observed in an experiment
	a histogram represents how often events have been observed in an experiment
	the values of a probability distribution add up to 1
	the values of a histogram add up to 1
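
The relationship between the two notions can be explored with a short sketch. It counts observations into a histogram and then divides by the total to obtain relative frequencies, which serve as an empirical estimate of a probability distribution. The function names and the sample data are invented for this example.

```python
from collections import Counter


def histogram(observations):
    """Absolute counts: how often each event was observed."""
    return Counter(observations)


def relative_frequencies(observations):
    """Counts divided by the number of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


# Made-up outcomes of six die rolls.
rolls = [1, 2, 2, 3, 3, 3]

print(sum(histogram(rolls).values()))  # 6, the number of observations
print(relative_frequencies(rolls))
```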

	They are not different, but the same.
	Large out-degrees are less likely to occur than large in-degrees
	Large in-degrees are less likely to occur than large out-degrees

	a curve that appears as a straight line in this plot represents a linear function
	a curve that appears as a straight line in this plot represents a logarithmic function
	going from one numerical axis label to the next one can be achieved by multiplying the first label by a constant number
	going from one numerical axis label to the next one can be achieved by adding a constant number to the first label

	$g(x)=c\cdot f(x)$
	$g(x)=f(x)+f(x)$
	$g(x)=f(x)\cdot f(x)$
	$g(x)=f(x)+c$

tf(the)	df(the)	idf(the)	tf(web)	df(web)	idf(web)
						1/2
						1/3
						1
						2
						3