Web Science
From Wikiversity
Course elements
 PART1: Week1: Ethernet · Internet Protocol · Week2: Transmission Control Protocol · Domain Name System · Week3: Internet vs World Wide Web · HTTP · Week4: Web Content · Dynamic Web Content
 PART2: Week5: How big is the Web? · Descriptive Web Models · Week7: Advanced Statistic Models · Modelling Similarity · Week8: Generative Modelling of the Web · Graph theoretic Web Modelling
 PART3: Week8: Investigating Meme Spreading · Herding Behaviour · Week9: Online Advertising · User Modelling
 PART4: Week8: Copyright · Net neutrality · Week9: Internet governance · Privacy
Introduction (0th week)
New here? Follow this link to learn

Web Science/Part1: Foundations of the web
Lessons
 understand the basic problems when communicating over a shared medium
 understand the origins of ethernet
 be able to name the ethernet header fields
 be able to explain the reason for the preamble
 understand that the cable length has an influence on the transfer rate
 understand that speed of light is responsible for the connection between cable length and transfer rate
 be able to calculate the maximum cable length for a given transfer rate
 understand that the cable length is part of the Ethernet protocol
 Understand that Ethernet is a non-deterministic protocol
 Be able to reconstruct a collision detection and resolution algorithm
 Understand what happens if two computers send data at the same time
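The cable-length calculation named in the goals above can be sketched as follows. This is a minimal sketch under assumptions that are not part of the original list: collision detection only works if a sender is still transmitting when the collision signal comes back, so the round trip over the cable must fit into the time needed to send the smallest frame. The 512-bit minimum frame (classic 10 Mbit/s Ethernet) and the signal speed of 2e8 m/s (roughly two thirds of the speed of light) are illustrative values, and repeater or interface delays are ignored, so practical limits are shorter.

```python
def max_cable_length(rate_bps, min_frame_bits=512, signal_speed_mps=2.0e8):
    """Upper bound on cable length (in metres) for collision detection.

    Assumption: the sender must still be transmitting the smallest frame
    when a collision signal returns, so the round trip (2 * length) has to
    fit into the transmission time of min_frame_bits.
    """
    frame_time_s = min_frame_bits / rate_bps      # time to send the smallest frame
    return frame_time_s * signal_speed_mps / 2    # halve it: the signal travels there and back


if __name__ == "__main__":
    for rate in (10e6, 100e6, 1e9):
        print(f"{rate / 1e6:7.0f} Mbit/s -> {max_cable_length(rate):8.1f} m")
```

With a fixed minimum frame size, a ten times higher transfer rate yields a ten times shorter bound, which is the connection between cable length and transfer rate that the goals refer to.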
no learning goals defined
 get introduced to the concept of an IP network
 understand that networks can be interconnected
 learn about the importance of decentralization as a design principle
 realize that local area networks can be fragmented via IP networks
 understand that an IP network as an overlay network is an abstract thing that is not directly reflecting the hardware settings
 understand the notion of an IPv4 address and its components like network and host part
 understand why MAC addresses do not fulfill the requirements of IP addresses.
 get introduced to the notion of an IP router / gateway
 review the definition and concept of an IP network
 understand that IP routing works on the level of IP networks
 understand the concept of subnetting
 review network classes and understand Classless Inter-Domain Routing (CIDR).
 get a feeling for the IP header
 get a better understanding of how the protocol works
 understand which header fields are changed while routing
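The network part, host part and subnetting goals above can be illustrated with Python's standard `ipaddress` module. This is only a sketch: the network 192.168.1.0/24, the host address and the helper name `split_address` are example choices, not taken from the course material.

```python
import ipaddress

# Example values (assumptions for illustration only).
network = ipaddress.ip_network("192.168.1.0/24")
host = ipaddress.ip_address("192.168.1.77")


def split_address(address, net):
    """Split an IPv4 address into its network part and its host part."""
    network_part = int(address) & int(net.netmask)   # bits covered by the prefix
    host_part = int(address) & int(net.hostmask)     # remaining bits
    return ipaddress.ip_address(network_part), host_part


# Subnetting: cut the /24 into four /26 networks.
subnets = list(network.subnets(new_prefix=26))

if __name__ == "__main__":
    print(split_address(host, network))
    for subnet in subnets:
        print(subnet, host in subnet)
```

A router only needs the network part to choose the next hop, which is why routing works on the level of IP networks rather than individual hosts.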
no learning goals defined
 understand which problems of IP will be solved with the Transmission Control Protocol
 be aware of the limitations of the internet protocol and the internet architecture
 get to know the end-to-end principle, in which only sender and receiver take care that communication works properly
 understand the concept of a logical connection (virtual communication channel) between two computers on the internet
 understand the importance of acknowledging received messages
 be able to understand the process of establishing a TCP connection
 understand the concept of a socket in TCP/IP
 understand that ports are part of the TCP header
 be able to explain the difference between solicited and unsolicited TCP/IP traffic
 understand how ports can be used for multiplexing internet connections
 understand the concept of window size and sliding window
 understand how flow control can prevent TCP connections from overloading link layer protocols and slow networks
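The ideas of a logical connection, sockets and ports listed above can be tried out with Python's standard `socket` module. The sketch below is an illustration and not part of the course material: it opens a listening socket on the loopback address, lets the operating system pick a free port, and echoes one message back. The function names `echo_once` and `demo` are made up for this example.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept a single TCP connection and send the received bytes back."""
    conn, _addr = server_sock.accept()   # returns once the handshake is complete
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def demo(message=b"hello"):
    """Return (server port, client port, echoed reply) for one connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))    # port 0: let the OS choose a free port
        server.listen(1)
        server_port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # create_connection() performs the TCP connection setup.
        with socket.create_connection(("127.0.0.1", server_port)) as client:
            client_port = client.getsockname()[1]
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return server_port, client_port, reply


if __name__ == "__main__":
    print(demo(b"ping"))
```

Each side of the connection is identified by an IP address plus a port, so several connections can share one network interface; this is the multiplexing role of ports mentioned in the goals.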
no learning goals defined
 Understand why web content needs structure and proper markup
 Understand the Document Object Model and the DOM tree
 Understand that HTML is just a special dialect of XML
 Understand the relationship between HTML and XML
 Be able to write simple HTML code having learned a few example elements of HTML (headings, paragraphs, lists, tables, links, anchors, emphasis, input fields; but also a few dirty ones like italics, color, ...)
 See that HTML really is just another simple mark up and has nothing to do with programming
 Be able to structure web Content using HTML and create pages following a specified structure.
 Know about the style attribute and how to use it within HTML elements
 Realize that there are some limits to using the style attribute
 be able to create websites that follow a certain style guide
 See the problems with inline styles
 Understand that a style sheet gives you freedom
 be able to explain to people why they should use style sheets
 be able to name at least two important reasons to use style sheets
 know how the cascading process works
 know the basic syntax of cascading stylesheets
 know how to include a media file like a graphic in your web page.
 understand that images like jpg, gif and bitmaps are hard for machines to understand.
 Know how to use an XML-based format to create images that are easy to understand for machines and humans and can even make use of stylesheets.
 Understand that metadata is necessary to communicate the semantics of content
 See that using metadata for ranking in search results is a bad idea
 get introduced to modern ways of publishing metadata, such as RDFa
 Understand the separation between content, structure, layout and meta data
 Review HTML, CSS, XML, SVG and RDFa
 Understand what makes a clean HTML markup ("separation of concerns") vs. unclean one ("mixing responsibilities"); and implications (better or worse maintenance, better or worse personalization, better or worse accessibility)
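To make the DOM tree goal above more concrete, the following sketch parses a small HTML page with Python's standard `html.parser` module and records how deeply each element is nested. The sample page, the class name `OutlineParser` and the helper `dom_outline` are assumptions made for this example; the sketch also assumes well-nested markup without void elements such as `<br>`.

```python
from html.parser import HTMLParser

# A small page using a few of the elements named in the goals (example content).
PAGE = """<html><body>
<h1>Title</h1>
<p>Some <em>emphasized</em> text.</p>
<ul><li>one</li><li>two</li></ul>
</body></html>"""


class OutlineParser(HTMLParser):
    """Record (depth, tag) for every opening tag, i.e. the shape of the tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1          # children of this element are one level deeper

    def handle_endtag(self, tag):
        self.depth -= 1          # assumes every element is closed properly


def dom_outline(html_text):
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline


if __name__ == "__main__":
    for depth, tag in dom_outline(PAGE):
        print("  " * depth + tag)
```

The indentation of the printed tags mirrors the parent and child relationships that a browser stores in its DOM tree.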
 become aware of the possibilities to create dynamic content within a webserver
 see that you don't have to implement a webserver to be able to serve dynamic content
 understand some main issues like blocking I/O that one should keep in mind when doing server side programming
 see how the web server is the entry point for web applications
 understand whitelisting versus blacklisting of input as a method of preventing XSS
 understand the basics of HTTP POST requests
 become aware of security issues while transferring data to a web server
 be able to create a simple web form in HTML
 See how a POST request is handled in a Java Servlet
 get to know the Request object
 see how a database query and more advanced technology can be included in a servlet
 understand how JavaScript was intended to help people fill out web forms
 understand the issues and disadvantages that arise with JavaScript
 be aware of JavaScript APIs
 know some of the standard JavaScript libraries
 be able to understand the concept of Ajax requests.
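The whitelisting goal above can be illustrated independently of the servlet setting used in the lesson. The sketch below uses Python's standard `re` and `html` modules; the username rule and the function names are hypothetical examples, not the course's implementation.

```python
import html
import re

# Hypothetical whitelist rule: 1 to 20 letters, digits or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,20}")


def is_valid_username(value):
    """Whitelisting: accept only input that matches a known-good pattern."""
    return USERNAME_PATTERN.fullmatch(value) is not None


def render_comment(comment):
    """Escape free text before placing it into HTML, so markup is shown, not run."""
    return "<p>" + html.escape(comment) + "</p>"


if __name__ == "__main__":
    print(is_valid_username("alice_01"))
    print(is_valid_username("<script>"))
    print(render_comment("<script>alert(1)</script>"))
```

A whitelist states what is allowed and rejects everything else, whereas a blacklist has to anticipate every dangerous input, which is the reason whitelisting is usually the safer default.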
no learning goals defined
Discussion
Web Science/Part2: Emerging Web Properties
Lessons
 The question will remain unanswered during the lesson and the entire course.
 The question of size is underspecified because a measure is needed.
 The measure depends heavily on the choice of how we model the web.
 We have not yet defined what we mean when we say World Wide Web.
 The web as a software system.
 The web as a collection of text documents.
 The web as a graph of interlinked documents.
 Even when choosing one point of view we have fundamentally different ways of modelling.
 understand that only the model is described.
 description of the model can be used for interpretation.
 within the descriptive model one chooses measures to describe the object of study.
 understand the notion of a modelling choice
 be able to criticise a descriptive model and the modelling choices
 Can be used to try to give a reason why something works.
 need to be run more than once!
 understand the notion of a modeling parameter
 will be compared to the descriptive model of our object of study.
no learning goals defined
 Understand why we selected simple English Wikipedia as a toy example for modeling the web
 Understand that a task already as simple as counting words includes modeling choices
 Be familiar with the term “unique word token”
 Know some basic tools to count words and documents
 Be familiar with some basic statistical objects like Median, Mean, and Histograms
 Should be able to relate a histogram to its cumulative distribution function
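The counting and statistics goals above can be tried on a toy corpus. In this sketch the tokenizer (lowercase, split on whitespace) is one deliberately simple modelling choice among many, and the function names are made up for the example.

```python
from collections import Counter
from statistics import mean, median


def tokenize(text):
    """Modelling choice: lowercase the text and split on whitespace."""
    return text.lower().split()


def word_stats(documents):
    """Basic descriptive statistics for a list of text documents."""
    lengths = [len(tokenize(doc)) for doc in documents]
    counts = Counter(token for doc in documents for token in tokenize(doc))
    return {
        "tokens": sum(lengths),
        "unique_tokens": len(counts),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "length_histogram": Counter(lengths),   # document length -> number of documents
    }


def cumulative(histogram):
    """Turn a histogram into a cumulative distribution function (value -> share)."""
    total = sum(histogram.values())
    running = 0
    cdf = {}
    for value in sorted(histogram):
        running += histogram[value]
        cdf[value] = running / total
    return cdf


if __name__ == "__main__":
    docs = ["the web is big", "the web", "a graph of the web"]
    stats = word_stats(docs)
    print(stats)
    print(cumulative(stats["length_histogram"]))
```

Changing the tokenizer, for example by stripping punctuation, changes every number that follows, which is the sense in which even counting words involves modelling choices.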
 Understand the ongoing, cyclic process of research
 Know what falsifiable means and why every research hypothesis needs to be falsifiable
 Be able to formulate your own research hypothesis
 Understand what a loglog plot is
 Improve your skills in reading and interpreting diagrams
 Know about the word rank / frequency plot
 Should be able to transform a histogram or curve into a cumulative distribution function
 Get a feeling for interdisciplinary research
 Know the Automated Readability Index
 Have a strong sense of support for our research hypothesis
 Be able to critically discuss the limits of our models
 Be able to name some fundamental properties about how frequencies of words in texts are distributed
 Be a little bit more cautious about visual impressions when looking at loglog plots
 Know both formulations of Zipf’s law
 Be able to do a coordinate transformation to change the scales of your plots
 Understand in which scenario power functions appear as straight lines
 Know in which scenarios exponential functions appear as straight lines
 Be even more cautious about your visual impressions
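The two straight-line statements in the goals above follow from taking logarithms. The symbols C, \alpha and \beta below are notation chosen for this short derivation and do not come from the course material.

```latex
% Power function: a straight line when BOTH axes are logarithmic (log-log plot).
y = C\,x^{-\alpha}
\quad\Longrightarrow\quad
\log y = \log C - \alpha \log x
% slope -\alpha, intercept \log C, in the coordinates (\log x, \log y)

% Exponential function: a straight line when ONLY the y axis is logarithmic.
y = C\,e^{\beta x}
\quad\Longrightarrow\quad
\log y = \log C + \beta x
% slope \beta, intercept \log C, in the coordinates (x, \log y)
```

Reading a slope off such a plot is only meaningful if the points really lie on a line over a wide range, which is one reason for the caution about visual impressions.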
 Know the axioms for a distance measure and how they relate to norms.
 Know at least two distance measures on function spaces.
 Understand why changing to the CDF makes sense when looking at distance between functions.
 Understand the principle of the Kolmogorov-Smirnov test for fitting curves
 Know how to transform a rank frequency diagram to a power-law plot.
 Understand how power-law and Pareto plots relate to each other.
 Be able to explain why a Pareto plot is just an inverted rank frequency diagram
 Be able to transform the Zipf coefficient to the power-law and Pareto coefficient and vice versa.
 Understand that building the CDF is basically like building the integral.
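The principle behind the Kolmogorov-Smirnov test named above is to compare two cumulative distribution functions by their largest vertical gap. The sketch below computes that gap for two samples; it is a plain illustration of the distance only, without the significance test, and the function names are assumptions of this example.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a function x -> share of values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_distance(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs.

    Both CDFs are step functions that only change at observed values,
    so checking those values is enough to find the maximum gap.
    """
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


if __name__ == "__main__":
    print(ks_distance([1, 2, 3, 4], [1, 2, 3, 4]))
    print(ks_distance([1, 2, 3, 4], [3, 4, 5, 6]))
```

Working on the CDFs instead of the raw histograms makes the comparison independent of any bin width.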
 Know the properties of a similarity measure
 Be able to relate similarity and distance measures
 Know of two applications for modelling similarity
 Understand how text documents can be modeled as sets
 Know the Jaccard coefficient as a similarity measure on sets
 Know a trick for remembering the formula
 Be aware of the possible outcomes of the Jaccard index
 As always be able to criticize your model
 Be familiar with the vector space model for text documents
 Be aware of term frequency and (inverse) document frequency
 Have reviewed the definitions of basis and dimension
 Realize that the angle between two vectors can be seen as a similarity measure
 Be aware of a unigram Language Model
 Know Laplacian (aka +1) smoothing
 Know the query likelihood model
 Know the Kullback-Leibler divergence
 See how a similarity measure can be derived from the Kullback-Leibler divergence
 Understand that different modeling choices can produce very different results.
 Have a feeling for how you could statistically compare the differences of the models.
 Know how you could extract keywords from documents with the TF-IDF approach.
 Try to argue which model you like best in a certain scenario.
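Two of the similarity measures named above, the Jaccard coefficient on sets and the angle between term-frequency vectors, can be sketched in a few lines. The whitespace tokenization, the raw term frequencies without inverse document frequency weighting, and the conventions for empty inputs are simplifying assumptions of this example.

```python
import math
from collections import Counter


def jaccard(tokens_a, tokens_b):
    """Jaccard coefficient: size of the intersection divided by size of the union."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0               # convention chosen here for two empty documents
    return len(a & b) / len(a | b)


def cosine(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (Counters)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0               # convention chosen here for an empty document
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = "the web is a graph".split()
    d2 = "the web is big".split()
    print(jaccard(d1, d2))
    print(cosine(Counter(d1), Counter(d2)))
```

The two measures generally give different values for the same pair of documents, which illustrates how modelling choices affect the results.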
 Understand the principle methodology for building generative models
 Remember why people are interested in generative models
 Know why descriptive models are needed when evaluating a generative model
 Be aware of one way to create a model for text generation
 Understand how to sample values from an arbitrary probability distribution
 Have seen yet another application of the cumulative distribution function
 Understand that sampling from a distribution is just a coordinate transformation of the uniform distribution
 See that it makes sense to compare statistics
 Understand that comparing statistics is not a well defined task
 Be aware of the fact that very different models could lead to the same statistics
 See that one can always increase the model parameters
 Know that increasing model parameters often yields a more accurate model
 Be aware of the bigram and mixed models as examples for our generative processes
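Sampling from an arbitrary discrete distribution, as described in the goals above, can be sketched with the cumulative distribution function: draw a uniform number u in [0, 1) and pick the first value whose cumulative probability exceeds u. The toy vocabulary, its probabilities and the function names are assumptions made for this illustration of a unigram text generator.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def build_cdf(probabilities):
    """Running sums of the probabilities, i.e. the discrete CDF."""
    return list(accumulate(probabilities))


def sample_with(values, cdf, u):
    """Map a uniform number u in [0, 1) to a value via the CDF."""
    index = bisect_right(cdf, u)
    return values[min(index, len(values) - 1)]   # guard against rounding in the last sum


def generate_text(values, probabilities, length, rng=random):
    """Generate `length` independent draws (a unigram model)."""
    cdf = build_cdf(probabilities)
    return [sample_with(values, cdf, rng.random()) for _ in range(length)]


if __name__ == "__main__":
    words = ["the", "web", "graph"]          # toy vocabulary
    probs = [0.5, 0.3, 0.2]                  # toy word probabilities
    print(" ".join(generate_text(words, probs, 10, random.Random(42))))
```

Seen this way, sampling is a coordinate transformation: the uniform interval [0, 1) is cut into pieces whose lengths equal the probabilities.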
 Be familiar with a set theoretic way of denoting a graph
 Know at least 4 different types of graphs
 Have practiced your abilities in reading and writing mathematical formulas
 Be able to model web pages as a graph
 Know that the authorship graph is bipartite
 Know what kind of graph the graph of web pages is
 (as always) be aware of the fact that modeling is done by making choices
 Know terms like size and (unique) volume
 Be able to count the in-degree and out-degree of web pages
 Have an idea what kind of law (in & out) degree distributions follow
 Know that degree is not distributed in a fair way
 Know that the Gini coefficient can be used to measure fairness
 Understand the notion of a path in a (directed) graph
 Know that shortest paths between nodes need not be unique
 Understand the notion of a strongly connected component
 Know about the diameter of a graph
 Be aware of the bow tie structure of the Web
 Be able to read and build an adjacency matrix of a graph
 Know some basic matrix vector multiplications to generate some statistics out of the adjacency matrix
 Understand what is encoded in the components of the k-th power of the adjacency matrix of a graph
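The adjacency matrix goals above can be explored with a small directed graph. The sketch uses plain Python lists instead of a matrix library; the example graph (pages 0, 1, 2 with links 0→1, 0→2, 1→2, 2→0) and the function names are assumptions made for this illustration. Entry (i, j) of the k-th power counts the walks of length k from node i to node j.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(a, k):
    """k-th power of a square matrix (k >= 0), starting from the identity."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


def out_degrees(a):
    """Row sums: number of links leaving each node."""
    return [sum(row) for row in a]


def in_degrees(a):
    """Column sums: number of links pointing to each node."""
    return [sum(col) for col in zip(*a)]


# Example graph: A[i][j] == 1 means page i links to page j.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

if __name__ == "__main__":
    print(out_degrees(A), in_degrees(A))
    print(mat_power(A, 2))
```

In the example, the entry for (0, 2) in the second power is 1 because the only walk of length two from page 0 to page 2 is 0→1→2.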