Web Science/Part2: Emerging Web Properties/Ranking


December 17th: Ranking and Recommendations

Short info

3 dimensions of social capital:

Structural (links, PageRank and random surfer)
Cognitive (content, tf-idf)
Relational (retweets on Twitter)

What we will look at next are:

User Interest
Attention
Innovation rate (production of new hashtags)

--oleamm (discuss • contribs) 01:05, 15 February 2014 (UTC)

December 19th: Spreading Memes

Short info from the video

Meme implies

Information
Replication Mechanism (social mechanism, communication, incentive)

On Twitter, a #hashtag is a part (proxy) of a meme, while the meme itself is the full message.

Entropy measures how complex (or diverse) information (or user interests) is. Entropy formula.
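
The notes do not spell the formula out. Assuming the lecture means the usual Shannon entropy, and assuming a user interest vector x = (x_1, ..., x_n) of non-negative term counts (the symbols x_i and p_i are introduced here for illustration), it could be written as:

```latex
p_i = \frac{x_i}{\sum_{j=1}^{n} x_j}, \qquad
H(x) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{(with } 0 \log_2 0 := 0\text{)}
```

Under this definition H is 0 when all interest is concentrated on a single term, and reaches its maximum log_2 n when interest is spread evenly over all n terms.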

User interests as a vector. For each user interests vector, entropy can be calculated.

User interests entropy vs system entropy (example on a diagram). Note: all vectors should have the same size (the number of terms held in the vector).
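
A minimal sketch of how user entropy and system entropy could be computed, assuming Shannon entropy in bits and assuming that the system vector is the element-wise sum of all user vectors (both are assumptions, as are the example term counts):

```python
import math

def entropy(interests):
    """Shannon entropy (in bits) of a vector of non-negative term counts."""
    total = sum(interests)
    if total == 0:
        return 0.0  # an empty vector carries no diversity
    h = 0.0
    for count in interests:
        if count > 0:  # terms with zero use contribute nothing
            p = count / total
            h -= p * math.log2(p)
    return h

# Hypothetical users; both vectors cover the same four terms.
focused_user = [8, 0, 0, 0]   # all interest in one term
diverse_user = [2, 2, 2, 2]   # interest spread evenly

# Assumed system vector: element-wise sum of the user vectors.
system = [a + b for a, b in zip(focused_user, diverse_user)]

print(entropy(focused_user), entropy(diverse_user), entropy(system))
```

The focused user has entropy 0, the evenly spread user has the maximum for four terms (2 bits), and the system entropy lies in between.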

Similarity of user interests: cosine similarity between user interest vectors. The angle between two vectors can only lie between 0 and 90 degrees, since user interest vectors contain no negative values (a term cannot be used a negative number of times). Cosine similarity = 1 means the angle between the two vectors is 0, i.e. the interests are similar. Cosine similarity = 0 means the vectors (interests) are not similar at all. Example: is a particular tweet correlated with a user's interests? (This yields suggestions about what is interesting for the user.)
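
The comparison of a tweet with a user's interests can be sketched as follows. The vectors and their three terms are made-up examples, and treating a zero vector as "not similar" is a choice made here, not something stated in the lecture:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same size")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical term counts over the same three terms.
user = [3, 0, 1]
tweet_same_topic = [6, 0, 2]   # same direction as the user vector
tweet_other_topic = [0, 5, 0]  # shares no term with the user

print(cosine_similarity(user, tweet_same_topic))   # close to 1
print(cosine_similarity(user, tweet_other_topic))  # 0.0
```

Because all entries are non-negative, the result always lies between 0 and 1.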

We now have interests (without fading: for simplicity, interests do not depend on time) and entropy (diversity).

Meme diffusion model explanation: tweets, screen and user memory size. Interesting point: more memory causes fewer memes.
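
To make the role of a limited memory concrete, here is a deliberately simplified toy simulation. It is not the model from the discussed paper: there is no separate screen, the follower graph is random, and the function name `simulate` and all parameters (`p_new`, `followers_per_user`, ...) are assumptions made for illustration only.

```python
import random
from collections import deque

def simulate(num_users=50, followers_per_user=5, memory_size=3,
             p_new=0.2, steps=2000, seed=0):
    """Toy meme spreading with bounded user memory (illustrative only)."""
    rng = random.Random(seed)
    users = list(range(num_users))
    # Assumed random follower graph: each user has a fixed number of followers.
    followers = {
        u: rng.sample([v for v in users if v != u], followers_per_user)
        for u in users
    }
    # Bounded memory: the oldest meme is forgotten when a new one arrives.
    memory = {u: deque(maxlen=memory_size) for u in users}
    post_counts = {}
    next_meme_id = 0
    for _ in range(steps):
        u = rng.choice(users)
        if not memory[u] or rng.random() < p_new:
            meme = next_meme_id          # the user invents a new meme
            next_meme_id += 1
        else:
            meme = rng.choice(list(memory[u]))  # re-post a remembered meme
        post_counts[meme] = post_counts.get(meme, 0) + 1
        memory[u].append(meme)
        for f in followers[u]:
            memory[f].append(meme)       # the post reaches all followers
    return memory, post_counts

memory, post_counts = simulate()
print(len(post_counts), "distinct memes were posted")
```

Varying `memory_size` and comparing how the post counts are distributed is one way to explore the effect of memory on meme competition in this toy setting.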

Discussed paper: "Competition among memes in a world with limited attention" (L. Weng, A. Flammini, A. Vespignani, F. Menczer). --oleamm (discuss • contribs) 20:08, 11 February 2014 (UTC)

January 9th: User Modelling and Recommendations

Short info from the video

W3C Meta Framework consisting of:

Identity Framework. Authentication, OAuth: client (application), resource owner (end user), server (e.g. Facebook).
Profile Framework. Distributed user profile.
Policy Framework. W3C P3P: what data is stored, how and by whom it is used, and how long it is stored. Usability is still an issue.
Content Framework. Cross-posting content (e.g. from Twitter to Facebook).
Analytics Framework. Tracking the user.
Other

Second part.

Modelling user. User characteristics (info, interest - vectors and similarity). Elicitation, customization, stereotyping. --oleamm (discuss • contribs) 14:54, 15 February 2014 (UTC)

Web User Profiling and Recommendations

When you look back to the definition of Web Science, you see the users in the corresponding picture:

But, up to this point we have mainly looked at:

Information (e.g. Web page, tweet, meme)
- We have aggregated data about such information, e.g.
  - terms (words) used
  - (page)Rank
  - Number of retweets
Structure (e.g. page links, friendship links)
- We have aggregated data wrt structure, e.g.
  - indegree
  - outdegree
  - degree distribution
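
As a small illustration of these structural aggregates, the following sketch computes indegree, outdegree and the indegree distribution from a list of directed links. The edge list and the function name `degree_stats` are made up for this example:

```python
from collections import Counter

def degree_stats(edges):
    """Return indegree, outdegree and the indegree distribution of a directed graph."""
    indegree = Counter()
    outdegree = Counter()
    nodes = set()
    for src, dst in edges:
        outdegree[src] += 1   # link leaves src
        indegree[dst] += 1    # link arrives at dst
        nodes.update((src, dst))
    # Distribution: how many nodes have indegree 0, 1, 2, ...
    in_distribution = Counter(indegree[n] for n in nodes)
    return indegree, outdegree, in_distribution

# Hypothetical page links (source page, target page).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
indegree, outdegree, in_distribution = degree_stats(edges)
print(indegree["c"], outdegree["a"], dict(in_distribution))
```

The same function works for friendship links if each link is entered in the direction it was created.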

But we do not yet have much aggregated data about the *user*, although in a technical-social system the representation of the user and his/her activities is central indeed!

In fact, when we want to analyze the user from a Web Science perspective, or understand a user's behavior from a commercial point of view (e.g. for advertising, or for offering him/her a better Web service), or if we had unethical/illegal objectives in tracking a user (e.g. to steal his/her identity), we need models that capture the different aspects of a user.

And if we are the user and want to protect ourselves from unethical or illegal behavior of others, we need to know what they do.

And if we want to provide an ethical and legal service, we let the user choose to what extent he/she wants to be tracked and for which purposes. Providing an ethical and legal service is not just an altruistic objective (though it may be one): organizations that behave unethically and gain a bad reputation may quickly lose their business proposition. Or do you want your credit card number, your identity or your privacy to be stolen?

User Profile Framework

Identification

Content-based Recommendations

Collaborative Filtering

The Filter Bubble

Eli Pariser anecdotes

Herding: The Music Experiments Series

These experiments were performed as part of a larger set of experiments by the authors described at: http://www.princeton.edu/~mjs3/musiclab.shtml

The complement to actual herding are paid-for activities, e.g. Facebook like farms: http://www.technologyreview.com/view/530961/the-hidden-world-of-facebook-like-farms/

Rests

Web search typically happens in two steps:

1. Sorting into relevant and not relevant
2. Ranking according to
 * any kind of ranking function
  * page rank
  * ...
  * personalization
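
The two steps can be sketched as a filter followed by a sort. The page records, the `rank` values and the way personalization is mixed into the score are all assumptions chosen for illustration, not a description of any real search engine:

```python
def search(query_terms, pages, score):
    """Step 1: keep relevant pages. Step 2: rank them by the given score function."""
    query = set(query_terms)
    relevant = [page for page in pages if query & page["terms"]]
    return sorted(relevant, key=score, reverse=True)

# Hypothetical pages with a precomputed, made-up rank value.
pages = [
    {"url": "a.example", "terms": {"web", "science"}, "rank": 0.4},
    {"url": "b.example", "terms": {"web", "ranking"}, "rank": 0.5},
    {"url": "c.example", "terms": {"cooking"}, "rank": 0.9},
]

def by_rank(page):
    """Rank-only scoring, the same for every user."""
    return page["rank"]

def personalized(user_interests):
    """Assumed personalization: boost pages that share terms with the user."""
    def score(page):
        overlap = len(user_interests & page["terms"])
        return page["rank"] * (1 + overlap)
    return score

print([p["url"] for p in search(["web"], pages, by_rank)])
print([p["url"] for p in search(["web"], pages, personalized({"science"}))])
```

Note that c.example is never returned for the query "web", however high its rank, because it is dropped in step 1.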

Personalization and Recommendation

* User profile
 * User represented by vector
* Set of users represented by matrix
* Recommendations based on linear algebra
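
One possible reading of these points is a simple user-based collaborative filter: each row of a matrix is a user vector, users are compared with cosine similarity, and item scores are a similarity-weighted sum of the other users' rows. The matrix values and the function name `recommend` are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(matrix, user_index, top_n=2):
    """Return indices of unseen items, best first, for the given user."""
    target = matrix[user_index]
    # Weight every other user by similarity to the target user.
    weights = [
        0.0 if i == user_index else cosine(target, row)
        for i, row in enumerate(matrix)
    ]
    # Item score = similarity-weighted sum over users (a matrix-vector product).
    scores = [
        sum(weights[i] * matrix[i][j] for i in range(len(matrix)))
        for j in range(len(target))
    ]
    unseen = [j for j in range(len(target)) if target[j] == 0]
    unseen.sort(key=lambda j: scores[j], reverse=True)
    return unseen[:top_n]

# Hypothetical user-item matrix: rows are users, columns are items (1 = liked).
matrix = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1, similar to user 0 and also likes item 2
    [0, 0, 0, 1],  # user 2, shares nothing with user 0
]
print(recommend(matrix, 0, top_n=1))
```

User 0 is recommended item 2, because the only similar user (user 1) likes it, while item 3 is liked only by a user with no overlap.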

Examples

* product/music/... recommendations (slideshare: Steffen likes his own slides!)