# Category:Web Science/Part2: Emerging Web Properties


- The web as a software system.
- The web as a collection of text documents.
- The web as a graph of interlinked documents.
- Even when choosing one point of view, we have fundamentally different ways of modelling.
- Understand that only the model is described.
- The description of the model can be used for interpretation.
- Within the descriptive model, one chooses measures to describe the object of study.
- Understand the notion of a modelling choice
- Be able to criticise a descriptive model and its modelling choices
- A generative model can be used to try to explain why something works.
- A generative model needs to be run more than once!
- Understand the notion of a modeling parameter
- The results of a generative model will be compared to the descriptive model of our object of study.


- Understand why we selected Simple English Wikipedia as a toy example for modeling the web
- Understand that even a task as simple as counting words involves modeling choices
- Be familiar with the term “unique word token”
- Know some basic tools to count words and documents
- Be familiar with some basic statistical objects like median, mean, and histograms
- Be able to relate a histogram to its cumulative distribution function

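
A minimal Python sketch of counting words and documents and of turning a histogram into a cumulative distribution function. The tokenizer (lowercase runs of letters and apostrophes) is our own assumption and is exactly the kind of modeling choice mentioned above; the names `tokenize`, `describe`, `histogram` and `cdf` are hypothetical.

```python
import re
from collections import Counter
from statistics import mean, median


def tokenize(text):
    # Modeling choice (assumption): lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def describe(documents):
    """Count documents, word tokens and unique word tokens; summarise document lengths."""
    lengths = []            # number of word tokens in each document
    vocabulary = Counter()  # how often each unique word token occurs in the corpus
    for doc in documents:
        tokens = tokenize(doc)
        lengths.append(len(tokens))
        vocabulary.update(tokens)
    return {
        "documents": len(documents),
        "tokens": sum(lengths),
        "unique_tokens": len(vocabulary),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "lengths": lengths,
    }


def histogram(values):
    # Maps each observed value to the number of times it occurs.
    return dict(sorted(Counter(values).items()))


def cdf(hist):
    # F(x) = share of observations <= x, i.e. the running sum of the histogram.
    total = sum(hist.values())
    running, result = 0, {}
    for value, count in sorted(hist.items()):
        running += count
        result[value] = running / total
    return result


stats = describe(["The cat sat.", "The cat sat on the mat.", "A dog!"])
print(stats["documents"], stats["tokens"], stats["unique_tokens"])  # 3 11 7
print(stats["median_length"], cdf(histogram(stats["lengths"])))
```

For the three toy documents the median document length is 3 tokens, and the CDF climbs in steps of 1/3 until it reaches 1.0 at the longest document (6 tokens).
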

- Understand the ongoing, cyclic process of research
- Know what falsifiable means and why every research hypothesis needs to be falsifiable
- Be able to formulate your own research hypothesis


- Understand what a log-log plot is
- Improve your skills in reading and interpreting diagrams
- Know about the word rank / frequency plot
- Be able to transform a histogram or curve into a cumulative distribution function
- Get a feeling for interdisciplinary research

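
The data behind a word rank / frequency plot can be produced as follows; a sketch assuming a list of tokens from some tokenizer. A log-log plot is then simply an ordinary plot of the logarithms of both coordinates, which the hypothetical helper `to_log_log` computes.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """(rank, frequency) pairs; rank 1 is the most frequent word token."""
    frequencies = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(frequencies, start=1))


def to_log_log(pairs, base=10):
    # A log-log plot is an ordinary plot of (log x, log y).
    return [(math.log(x, base), math.log(y, base)) for x, y in pairs]


pairs = rank_frequency("a b a c a b d a b c".split())
print(pairs)  # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(to_log_log(pairs))
```

Words with equal frequency get consecutive ranks in arbitrary order; this does not change the plotted frequencies.
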

- Know the Automated Readability Index
- Have a strong sense of support for our research hypothesis
- Be able to critically discuss the limits of our models

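
The Automated Readability Index is commonly given as ARI = 4.71 · (characters / words) + 0.5 · (words / sentences) - 21.43. The sketch below implements that formula; how words, characters and sentences are detected is our own simplifying assumption, and the function name is hypothetical.

```python
import re


def automated_readability_index(text):
    """ARI = 4.71 * characters/words + 0.5 * words/sentences - 21.43."""
    # Modeling choices (assumptions): a word is a whitespace-separated chunk
    # containing a letter or digit, only letters and digits count as characters,
    # and each run of '.', '!' or '?' ends a sentence.
    words = [w for w in text.split() if re.search(r"[A-Za-z0-9]", w)]
    if not words:
        raise ValueError("text contains no words")
    characters = sum(len(re.findall(r"[A-Za-z0-9]", w)) for w in words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43


print(round(automated_readability_index("The cat sat on the mat."), 3))  # -5.085
```

The example sentence has 17 characters, 6 words and 1 sentence; higher values indicate harder text. The naive sentence detection miscounts abbreviations such as "e.g.", which is one of the limits of this model.
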

- Be able to name some fundamental properties of how the frequencies of words in texts are distributed
- Be a little bit more cautious about visual impressions when looking at log-log plots
- Know both formulations of Zipf’s law


- Be able to do a coordinate transformation to change the scales of your plots
- Understand in which scenario power functions appear as straight lines
- Know in which scenarios exponential functions appear as straight lines
- Be even more cautious about your visual impressions

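
Taking logarithms shows when a curve becomes a straight line: y = c·x^a gives log y = log c + a·log x (straight on log-log axes), while y = c·e^(bx) gives log y = log c + b·x (straight on semi-log axes with linear x). The sketch fits a least-squares line in each transformed coordinate system to synthetic, noise-free data (an assumption) and recovers a = -1.5 and b = -0.5 as slopes. The helper name is hypothetical.

```python
import math


def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


xs = [1, 2, 4, 8, 16]
power = [3 * x ** -1.5 for x in xs]                 # y = 3 * x^(-1.5)
exponential = [3 * math.exp(-0.5 * x) for x in xs]  # y = 3 * e^(-0.5 x)

# Power function: straight line on log-log axes, slope = exponent a.
a, _ = least_squares_line([math.log(x) for x in xs], [math.log(y) for y in power])
# Exponential: straight line on semi-log axes (linear x, log y), slope = rate b.
b, _ = least_squares_line(xs, [math.log(y) for y in exponential])
print(round(a, 6), round(b, 6))  # -1.5 -0.5
```

On real, noisy data a good straight-line fit on a log-log plot is not by itself evidence of a power law, which is why visual impressions deserve caution.
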

- Know the axioms of a distance measure and how they relate to norms.
- Know at least two distance measures on function spaces.
- Understand why switching to the CDF makes sense when looking at the distance between functions.
- Understand the principle of the Kolmogorov-Smirnov test for fitting curves

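
The Kolmogorov-Smirnov statistic measures the distance between two distribution functions with the supremum norm, i.e. the largest vertical gap between their CDFs. A minimal sketch for two samples that compares their empirical CDFs; the helper names are ours, and turning the statistic into a p-value (the full test) is not shown.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F with F(x) = share of observations <= x."""
    data = sorted(sample)
    return lambda x: bisect_right(data, x) / len(data)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    f_a, f_b = empirical_cdf(sample_a), empirical_cdf(sample_b)
    # Both ECDFs are step functions that only jump at observed values,
    # so the supremum is attained at one of those values.
    return max(abs(f_a(x) - f_b(x)) for x in set(sample_a) | set(sample_b))


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

Comparing CDFs instead of histograms avoids choosing bin widths, and because every CDF lies between 0 and 1 the distance is always between 0 and 1.
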

- Know how to transform a rank-frequency diagram into a power-law plot.
- Understand how power-law and Pareto plots relate to each other.
- Be able to explain why a Pareto plot is just an inverted rank-frequency diagram
- Be able to transform the Zipf coefficient into the power-law and Pareto coefficients and vice versa.
- Understand that building the CDF is basically like building the integral.

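
Assuming the usual convention that the rank of a word with frequency x equals the number of words whose frequency is at least x, the three views are related as follows. The symbols β (Zipf), k (Pareto) and α (power law) are our own labels and may differ from the course's notation.

```latex
% Zipf (rank / frequency): the r-th most frequent word occurs f(r) times
f(r) \propto r^{-\beta}
% Swapping the axes: the rank of a word with frequency x counts the words with
% frequency >= x, which gives the Pareto form (complementary CDF)
P(X \ge x) \propto x^{-1/\beta} = x^{-k}, \qquad k = \frac{1}{\beta}
% Differentiating the cumulative distribution gives the power-law density
p(x) \propto x^{-(k+1)} = x^{-\alpha}, \qquad \alpha = 1 + k = 1 + \frac{1}{\beta}
```

Going back, β = 1/k = 1/(α - 1).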

- Know the properties of a similarity measure
- Be able to relate similarity and distance measures
- Know of two applications for modelling similarity


- Understand how text documents can be modeled as sets
- Know the Jaccard coefficient as a similarity measure on sets
- Know a trick for remembering the formula
- Be aware of the possible outcomes of the Jaccard index
- As always, be able to criticize your model

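
Treating each document as the set of its distinct words, the Jaccard coefficient is |A ∩ B| / |A ∪ B|. A minimal sketch; returning 1.0 for two empty sets is our own convention, since the formula is undefined there.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for the sets of distinct elements of a and b."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention (assumption): two empty documents count as identical
    return len(a & b) / len(a | b)


doc1 = "the cat sat on the mat".split()
doc2 = "the dog sat on the log".split()
print(jaccard(doc1, doc2))  # 3/7, about 0.43
```

The two sentences share 3 of 7 distinct words. Identical sets give 1 and disjoint sets give 0, the two extremes of the index. The set model ignores word order and how often a word occurs, which is one point to criticize.
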

- Be familiar with the vector space model for text documents
- Be aware of term frequency and (inverse) document frequency
- Have reviewed the definitions of base and dimension
- Realize that the angle between two vectors can be seen as a similarity measure

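
In the vector space model each distinct term is one dimension, and the cosine of the angle between two document vectors, cos θ = u·v / (|u| |v|), serves as a similarity. The sketch weights terms by raw term frequency times log(N / df); this is one common tf-idf variant chosen here as an assumption, not necessarily the course's. Vectors are stored sparsely as dicts.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} vector per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    # Weight (assumption): raw term frequency * log(N / document frequency).
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine_similarity(u, v):
    """cos(angle) = (u . v) / (|u| |v|) for sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


docs = [["web", "graph"], ["web", "graph", "graph"], ["text", "model"]]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # 3 / sqrt(10), about 0.95
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0, no shared terms
```

With non-negative weights the cosine lies between 0 and 1. A term that occurs in every document gets idf log(1) = 0 and therefore has no influence.
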

- Be aware of the unigram language model
- Know Laplace (aka add-one) smoothing
- Know the query likelihood model
- Know the Kullback-Leibler divergence
- See how a similarity measure can be derived from the Kullback-Leibler divergence

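
A sketch of a unigram language model with add-one (Laplace) smoothing over a fixed vocabulary, the query likelihood of a query under a document's model, and the Kullback-Leibler divergence D(P ‖ Q) = Σ_w P(w) log(P(w) / Q(w)). Smoothing matters because both the log-likelihood and the divergence are undefined when a model assigns probability 0 to a word. The vocabulary is assumed to contain every word of the documents and queries; mapping the divergence to a similarity via exp(-D) is only one hypothetical choice.

```python
import math
from collections import Counter


def unigram_model(tokens, vocabulary):
    """Add-one (Laplace) smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocabulary)  # every vocabulary word gets one extra count
    return {w: (counts[w] + 1) / total for w in vocabulary}


def query_log_likelihood(query, model):
    # Query likelihood model: log P(q | d) = sum of log P(w | d), words assumed independent.
    return sum(math.log(model[w]) for w in query)


def kl_divergence(p, q):
    # D(P || Q) = sum_w P(w) log(P(w) / Q(w)); smoothing guarantees Q(w) > 0.
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def kl_similarity(p, q):
    # Hypothetical choice: map divergence 0 (identical models) to similarity 1.
    return math.exp(-kl_divergence(p, q))


vocabulary = {"web", "graph", "text", "model"}
d1 = unigram_model(["web", "graph", "web"], vocabulary)
d2 = unigram_model(["text", "model", "web"], vocabulary)
query = ["web", "graph"]
print(query_log_likelihood(query, d1) > query_log_likelihood(query, d2))  # True
print(kl_divergence(d1, d2), kl_similarity(d1, d2))
```

The divergence is not symmetric, D(P ‖ Q) generally differs from D(Q ‖ P), so it is not a distance measure in the strict sense.
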

- Understand that different modeling choices can produce very different results.
- Have a feeling for how you could statistically compare the differences between the models.
- Know how you could extract keywords from documents with the tf-idf approach.
- Try to argue which model you like best in a given scenario.

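
One way to extract keywords is to rank a document's terms by their tf-idf weight: terms that are frequent in the document but rare in the collection score highest. A sketch using raw term frequency times log(N / df), one common tf-idf variant chosen here as an assumption; `top_keywords` is a hypothetical name.

```python
import math
from collections import Counter


def top_keywords(docs, index, k=3):
    """Return the k terms of docs[index] with the highest tf-idf weight."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    # Weight (assumption): raw term frequency * log(N / document frequency).
    scores = {t: c * math.log(n / df[t]) for t, c in Counter(docs[index]).items()}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
print(top_keywords(docs, 0))  # ['mat', 'cat', 'on']
```

"mat" occurs in only one document and ranks first, while "the" occurs in every document and scores 0 despite being the most frequent word.
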

- Understand the basic methodology for building generative models
- Remember why people are interested in generative models
- Know why descriptive models are needed when evaluating a generative model
- Be aware of one way to create a model for text generation


- Understand how to sample values from an arbitrary probability distribution
- Have seen yet another application of the cumulative distribution function
- Understand that sampling from a distribution is just a coordinate transformation of the uniform distribution

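
Sampling from an arbitrary discrete distribution can be done by inverse transform sampling: draw u uniformly from [0, 1) and return the first outcome whose cumulative probability exceeds u, so the CDF acts as a coordinate transformation of the uniform distribution. A minimal sketch; `make_sampler` is a hypothetical helper and the weights need not be normalised.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def make_sampler(distribution, rng=random):
    """distribution: {outcome: weight}. Returns a function that draws one outcome."""
    outcomes = list(distribution)
    cdf = list(accumulate(distribution[o] for o in outcomes))  # running sums of weights

    def sample():
        u = rng.random() * cdf[-1]  # uniform on [0, total weight)
        # Inverse CDF: first outcome whose cumulative weight exceeds u.
        i = bisect_right(cdf, u)
        return outcomes[min(i, len(outcomes) - 1)]  # guard against float rounding

    return sample


rng = random.Random(42)
draw = make_sampler({"the": 0.5, "web": 0.3, "graph": 0.2}, rng)
samples = [draw() for _ in range(10_000)]
print({w: samples.count(w) / len(samples) for w in ("the", "web", "graph")})
```

With a seeded `random.Random` the draws are reproducible, and over many draws the relative frequencies approach the given probabilities. Outcomes with weight 0 occupy an empty interval of the CDF and are never returned.
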

- See that it makes sense to compare statistics
- Understand that comparing statistics is not a well-defined task
- Be aware of the fact that very different models could lead to the same statistics
- See that one can always increase the number of model parameters
- Know that increasing the number of model parameters often yields a more accurate model
- Be aware of the bigram and mixed models as examples of our generative processes

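
A bigram model generates text by repeatedly drawing the next word in proportion to how often it followed the current word in the training text. A minimal sketch; the helper names are hypothetical, and the mixed model is not shown.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """followers[w][v] = how often word v directly follows word w."""
    followers = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        followers[current][following] += 1
    return followers


def generate(followers, start, length, rng=random):
    """Generate up to `length` words, drawing each next word by bigram count."""
    words = [start]
    while len(words) < length:
        options = followers.get(words[-1])
        if not options:  # dead end: this word was never followed by anything
            break
        candidates = list(options)
        words.append(rng.choices(candidates, weights=[options[c] for c in candidates])[0])
    return words


model = train_bigrams("the cat sat on the mat and the cat ran".split())
print(" ".join(generate(model, "the", 12, random.Random(7))))
```

Every generated word pair occurred in the training text, so the output reproduces local statistics while still being nonsense overall. Generation stops early at a word that was never followed by anything, such as the final "ran".
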

- Be familiar with a set-theoretic way of denoting a graph
- Know at least 4 different types of graphs
- Have practiced your abilities in reading and writing mathematical formulas
- Be able to model web pages as a graph
- Know that the authorship graph is bipartite
- Know what kind of graph the graph of web pages is
- As always, be aware of the fact that modeling is done by making choices

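
One common set-theoretic notation for graphs, written out for several graph types. The letters A and D for authors and documents are our own choice, and the course may use different symbols.

```latex
% A graph is a pair of a node set and an edge set
G = (V, E)
% Directed graph (e.g. web pages and hyperlinks): edges are ordered pairs
E \subseteq V \times V, \qquad (u, v) \in E \iff \text{page } u \text{ links to page } v
% Undirected graph: edges are unordered pairs
E \subseteq \{\, \{u, v\} : u, v \in V \,\}
% Weighted graph: an additional weight function on the edges
w : E \to \mathbb{R}
% Bipartite graph (e.g. authors A and documents D): edges only run between the parts
V = A \cup D, \qquad A \cap D = \emptyset, \qquad E \subseteq A \times D
```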

- Know terms like size and (unique) volume
- Be able to count the in- and out-degree of web pages
- Have an idea of what kind of law the (in- and out-) degree distributions follow
- Know that degree is not distributed in a fair way
- Know that the Gini coefficient can be used to measure fairness

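
Counting in- and out-degrees from a list of hyperlinks, and measuring how unequally degree is distributed with the Gini coefficient. A sketch using the standard formula on values sorted in ascending order, G = Σ_i (2i - n - 1) x_i / (n Σ_i x_i); the function names are hypothetical. Pages are passed explicitly as nodes, since otherwise pages with degree 0 would silently drop out of the statistics.

```python
from collections import Counter


def degrees(nodes, edges):
    """Return (in_degree, out_degree) Counters; edges are (source, target) links."""
    in_degree = Counter({v: 0 for v in nodes})   # keep pages without incoming links
    out_degree = Counter({v: 0 for v in nodes})  # keep pages without outgoing links
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def gini(values):
    """Gini coefficient: 0 = perfectly equal; larger = more concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with x sorted ascending, i = 1..n
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


in_deg, out_deg = degrees("abcd", [("a", "d"), ("b", "d"), ("c", "d"), ("d", "a")])
print(dict(in_deg), gini(in_deg.values()), gini(out_deg.values()))
# {'a': 1, 'b': 0, 'c': 0, 'd': 3} 0.625 0.0
```

Every page has out-degree 1, so the out-degree Gini coefficient is 0, while three of the four links point to d. With n values the coefficient is at most (n - 1)/n.
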

- Understand the notion of a path in a (directed) graph
- Know that shortest paths between nodes need not be unique
- Understand the notion of a strongly connected component
- Know about the diameter of a graph
- Be aware of the bow-tie structure of the Web

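
Shortest path lengths in an unweighted directed graph can be computed by breadth-first search (BFS). The sketch uses BFS for hop distances, for the diameter, and for the strongly connected component of a node, i.e. the nodes it can reach that can also reach it. Taking the diameter over connected pairs only is an assumption; some definitions make it infinite for graphs that are not strongly connected. The graph is a hypothetical dict mapping every node to its list of successors.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:  # first visit = shortest distance in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(graph):
    # Longest shortest path over all pairs that are connected by a path (assumption).
    return max(max(bfs_distances(graph, s).values()) for s in graph)


def strongly_connected_component(graph, node):
    # Reverse every edge, then intersect forward and backward reachability.
    reverse = {}
    for u, successors in graph.items():
        for v in successors:
            reverse.setdefault(v, []).append(u)
    return set(bfs_distances(graph, node)) & set(bfs_distances(reverse, node))


graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
print(diameter(graph), sorted(strongly_connected_component(graph, "a")))  # 3 ['a', 'b', 'c']
```

Here a, b and c form a strongly connected component, d can only be reached, and the diameter 3 is the path from a to d. BFS reports one distance per node even when several shortest paths of that length exist.
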

- Be able to read and build an adjacency matrix of a graph
- Know some basic matrix-vector multiplications to generate statistics from the adjacency matrix
- Understand what is encoded in the entries of the k-th power of the adjacency matrix of a graph

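
A plain-Python sketch of an adjacency matrix with A[i][j] = 1 for a link from node i to node j. Multiplying A by the all-ones vector gives the out-degrees, multiplying the all-ones row vector by A gives the in-degrees, and entry (i, j) of A^k counts the walks of length k from i to j (walks may revisit nodes). The helper names are hypothetical; in practice a library such as NumPy would usually be used instead.

```python
def adjacency_matrix(nodes, edges):
    """A[i][j] = 1 if there is an edge nodes[i] -> nodes[j], else 0."""
    index = {v: i for i, v in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1
    return A


def mat_vec(A, x):
    # (A x)_i = sum_j A[i][j] * x[j]
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def vec_mat(x, A):
    # (x^T A)_j = sum_i x[i] * A[i][j]
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(A, k):
    result = [[int(i == j) for j in range(len(A))] for i in range(len(A))]  # identity
    for _ in range(k):
        result = mat_mul(result, A)
    return result


nodes = ["a", "b", "c"]
A = adjacency_matrix(nodes, [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")])
ones = [1] * len(nodes)
print(mat_vec(A, ones))  # out-degrees: [2, 1, 1]
print(vec_mat(ones, A))  # in-degrees: [1, 1, 2]
print(mat_pow(A, 2))     # [[1, 0, 1], [1, 0, 0], [0, 1, 1]]
```

The first row of A^2 says that two-step walks lead from a back to a (via c) and from a to c (via b).
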

# Web Science/Part2: Emerging Web Properties

## Lessons

- The question of how big the Web is will remain unanswered during this lesson and the entire course.
- The question of size is underspecified because a measure is needed.
- The measure depends heavily on how we choose to model the Web.
- We have not yet defined what we mean when we say World Wide Web.

no learning goals defined

## Discussion

## Subcategories

This category has only the following subcategory.

### W

## Pages in category "Web Science/Part2: Emerging Web Properties"

The following 50 pages are in this category, out of 50 total.

### Q

- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Factors that have impact on advertisement campaigns/quiz
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Finding the true value of an advertisement/quiz
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Introduction to Online Advertisement/quiz
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Metrics for (online) advertisement/quiz
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/The Business Model of Search Engines/quiz
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Understanding the Problems with Click Fraud/quiz
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Understanding the problems with Web SPAM/quiz

### S

- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Factors that have impact on advertisement campaigns/script
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Finding the true value of an advertisement/script
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Introduction to Online Advertisement/script
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Understanding the Problems with Click Fraud/script
- Web Science/Part2: Emerging Web Properties/Web Search Ecosystem/Understanding the problems with Web SPAM/script
- Web Science/Part2: Emerging Web Properties/Search Engine Ecosystem

### W

- Web Science/Part2: Emerging Web Properties/Advanced statistical descriptive models for the Web
- Web Science/Part2: Emerging Web Properties/Advanced statistical descriptive models for the Web/Fitting a curve on a log log plot
- Web Science/Part2: Emerging Web Properties/Advanced statistical descriptive models for the Web/The Zipf law for text
- Web Science/Part2: Emerging Web Properties/Advanced statistical descriptive models for the Web/Visually straight lines on log log plots
- Web Science/Part2: Emerging Web Properties/Advanced statistical descriptive models for the Web/Zipf law powerlaw or pareto law.webm
- Web Science/Part2: Emerging Web Properties/Generative Models for the Web
- Web Science/Part2: Emerging Web Properties/Generative Models for the Web/Evaluating a generative model
- Web Science/Part2: Emerging Web Properties/Generative Models for the Web/Introduction to generative modelling.webm
- Web Science/Part2: Emerging Web Properties/Generative Models for the Web/Pittfalls when increasing the number of model parameters
- Web Science/Part2: Emerging Web Properties/Generative Models for the Web/Sampling from a probability distribution
- Web Science/Part2: Emerging Web Properties/How big is the World Wide Web/3 ways to study the Web
- Web Science/Part2: Emerging Web Properties/How big is the World Wide Web/A simplistic descriptive model
- Web Science/Part2: Emerging Web Properties/How big is the World Wide Web/An unrealistic, simplistic generative model
- Web Science/Part2: Emerging Web Properties/How big is the World Wide Web/Problems with the question about the size of the Web
- Web Science/Part2: Emerging Web Properties/How big is the World Wide Web/Summary, further reading, homework
- Web Science/Part2: Emerging Web Properties/Modeling the Web as a graph
- Web Science/Part2: Emerging Web Properties/Modeling the Web as a graph/Descriptive statistics of the web graph
- Web Science/Part2: Emerging Web Properties/Modeling the Web as a graph/Modelling-graphs-with-linear-algebra
- Web Science/Part2: Emerging Web Properties/Modeling the Web as a graph/Reviewing terms from graph theory
- Web Science/Part2: Emerging Web Properties/Modeling the Web as a graph/The standard web graph model
- Web Science/Part2: Emerging Web Properties/Modeling the Web as a graph/Topology of the web graph
- Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text
- Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Comparing Results of Similarity Merasures
- Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Cosine Similarity For Vectorspaces
- Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Jaccard Similarity for Sets
- Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Probabilistic Similarity Measures Kullback Leibler Divergence
- Web Science/Part2: Emerging Web Properties/Modelling Similarity of Text/Similarity Measures and their Applications
- Web Science/Part2: Emerging Web Properties/Simple statistical descriptive Models for the Web
- Web Science/Part2: Emerging Web Properties/Simple statistical descriptive Models for the Web/Counting Words And Documents
- Web Science/Part2: Emerging Web Properties/Simple statistical descriptive Models for the Web/How to formulate a research hypothesis
- Web Science/Part2: Emerging Web Properties/Simple statistical descriptive Models for the Web/Linguists way of checking simplicity of text
- Web Science/Part2: Emerging Web Properties/Simple statistical descriptive Models for the Web/Number of words needed to understand most of Wikipedia
- Web Science/Part2: Emerging Web Properties/Simple statistical descriptive Models for the Web/Typical length of a document