Web Science/Part2: Emerging Web Properties/Ranking
December 17th: Ranking and Recommendations
Short info
3 dimensions of social capital:
- Structural (links, PageRank and the random surfer)
- Cognitive (content, tf-idf)
- Relational (retweets on Twitter)
What we will look at next are:
- User Interest
- Attention
- Innovation rate (production of new hashtags)
--oleamm (discuss • contribs) 01:05, 15 February 2014 (UTC)
December 19th: Spreading Memes
Short info from the video
Meme implies
- Information
- Replication Mechanism (social mechanism, communication, incentive)
On Twitter, a #hashtag is a part (proxy) of a meme, while the meme is the full message.
Entropy measures how complex (or diverse) information (or a user's interests) is. Entropy formula.
User interests as a vector. For each user interest vector, the entropy can be calculated.
User interest entropy vs. system entropy (example on a diagram). Note: all vectors should have the same size (the number of terms held in the vector).
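The entropy of an interest vector can be sketched as follows. The notes do not reproduce the formula from the video, so this assumes the usual Shannon entropy H = -Σ p_i log2(p_i), where p_i is the share of term i in the vector. The function name `entropy`, the example vectors, and the choice to build the system vector as the element-wise sum of the user vectors are illustrative assumptions, not taken from the lecture.

```python
import math


def entropy(interests):
    """Shannon entropy (in bits) of a vector of non-negative term counts."""
    total = sum(interests)
    if total == 0:
        return 0.0  # an empty vector carries no information
    h = 0.0
    for count in interests:
        if count > 0:  # terms with zero count contribute nothing
            p = count / total
            h -= p * math.log2(p)
    return h


# Hypothetical users over the same four terms (vectors must have equal size).
focused_user = [8, 0, 0, 0]   # talks about one term only
diverse_user = [2, 2, 2, 2]   # spreads attention evenly

# Assumption: the system vector is the element-wise sum of all user vectors.
system = [a + b for a, b in zip(focused_user, diverse_user)]

print(entropy(focused_user))  # lowest possible entropy
print(entropy(diverse_user))  # highest possible entropy for four terms
print(entropy(system))        # somewhere in between
```

A user who only ever uses one term has entropy 0, while a user who spreads attention evenly over n terms reaches the maximum of log2(n) bits.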
Similarity of user interests: cosine similarity between user interest vectors. The angle can only lie between 0 and 90 degrees, since there are no negative values in user interest vectors (a term's usage count cannot be negative). Cosine similarity = 1 means the angle between the two vectors is 0, i.e. the interests are similar. Cosine similarity = 0 means the vectors (interests) are not similar at all. Example: is a particular tweet correlated with a user's interests? (This can be used to suggest what is interesting for the user.)
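A minimal sketch of the cosine similarity between two interest vectors, using the standard definition (dot product divided by the product of the vector lengths). The function name `cosine_similarity` and the example vectors are hypothetical; returning 0 for an all-zero vector is a convention chosen here, not something stated in the lecture.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized, non-negative vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of terms")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical term counts over the same three terms.
user_interests = [3, 1, 0]
tweet_on_topic = [1, 0, 0]   # uses only the user's main term
tweet_off_topic = [0, 0, 2]  # uses only a term the user never used

print(cosine_similarity(user_interests, tweet_on_topic))
print(cosine_similarity(user_interests, tweet_off_topic))
```

Because all entries are non-negative, the result always lies between 0 and 1, matching the 0 to 90 degree range described above.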
We have interests (without fading: for simplicity, interests do not depend on time) and entropy (diversity).
Meme diffusion model explanation: tweets, screen and user memory size. Interesting point: more memory causes fewer memes.
Discussed paper: Competition among memes in a world with limited attention (L. Weng, A. Flammini, A. Vespignani, F. Menczer). Link. --oleamm (discuss • contribs) 20:08, 11 February 2014 (UTC)
January 9th: User Modelling and Recommendations
Short info from the video
W3C Meta Framework consisting of:
- Identity Framework. Authentication, OAuth: client (application), resource owner (end user), server (Facebook).
- Profile Framework. Distributed user profile.
- Policy Framework. W3C P3P: what data is stored, how and by whom it is used, how long it is stored. Usability is still an issue.
- Content Framework. Cross-posting content (e.g. from Twitter to Facebook).
- Analytics Framework. Tracking the user.
- Other
Second part.
Modelling the user. User characteristics (info, interest - vectors and similarity). Elicitation, customization, stereotyping. --oleamm (discuss • contribs) 14:54, 15 February 2014 (UTC)
Web User Profiling and Recommendations
When you look back to the definition of Web Science, you see the users in the corresponding picture:
But, up to this point we have mainly looked at:
- Information (e.g. Web page, tweet, meme)
- We have aggregated data about such information, e.g.
- terms (words) used
- (page)Rank
- Number of retweets
- Structure (e.g. page links, friendship links)
- We have aggregated data wrt structure, e.g.
- indegree
- outdegree
- degree distribution
But we do not have much aggregated data about the *user*, although in a technical-social system the representation of the user and his/her activities is central indeed!
In fact, when we want to analyze the user from a Web Science perspective, or when we want to understand a user's behavior from a commercial point of view (e.g. for advertising, or for offering him/her a better Web service), or if we had unethical/illegal objectives in tracking a user (e.g. to steal his/her identity), we need models that capture the different aspects of a user.
And if we are the user and we want to protect ourselves from unethical or illegal behavior of others, we need to know what they do in order to be able to protect ourselves.
And if we want to provide an ethical and legal service, we let the user choose to what extent he/she wants to be tracked and for which purposes. Providing an ethical and legal service is not just an altruistic objective (though it may be one): organizations that behave unethically and gain a bad reputation may quickly lose their business proposition. Or do you want your credit card number, your identity, or your privacy to be stolen?
User Profile Framework
Identification
Content-based Recommendations
Collaborative Filtering
The Filter Bubble
Eli Pariser anecdotes
Herding: The Music Experiments Series
These experiments were performed as part of a larger set of experiments by the authors described at: http://www.princeton.edu/~mjs3/musiclab.shtml
The complement to actual herding is paid-for activity, e.g. Facebook like farms: http://www.technologyreview.com/view/530961/the-hidden-world-of-facebook-like-farms/
Rests
- Web search typically happens in two steps:
  1. Sorting into relevant - not relevant
  2. Ranking according to
    * any kind of ranking function
    * page rank
    * ...
    * personalization
- Personalization and Recommendation
  * User profile
  * User represented by vector
  * Set of users represented by matrix
  * Recommendations based on linear algebra
- Examples
  * product/music/... recommendations (slideshare: Steffen likes his own slides!)
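The idea of representing each user as a vector, the set of users as a matrix, and deriving recommendations with linear algebra can be sketched as follows. This is only one possible reading of the bullet points: a simple user-based scheme in which other users' rows are weighted by their cosine similarity to the target user. The names `cosine`, `recommend`, and the `ratings` matrix are hypothetical and do not come from the lecture.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally sized, non-negative vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, user_index, top_n=2):
    """Return indices of items the user has not seen, best score first.

    ratings is a user-by-item matrix (list of rows); 0 means "not seen".
    Each item's score is the similarity-weighted sum of the other users' rows.
    """
    target = ratings[user_index]
    scores = [0.0] * len(target)
    for other_index, other in enumerate(ratings):
        if other_index == user_index:
            continue
        weight = cosine(target, other)
        for item, value in enumerate(other):
            scores[item] += weight * value
    unseen = [item for item, value in enumerate(target) if value == 0]
    unseen.sort(key=lambda item: scores[item], reverse=True)
    return unseen[:top_n]


# Hypothetical matrix: rows are users, columns are items (1 = liked).
ratings = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1 shares both of user 0's likes, plus item 2
    [0, 0, 0, 1],  # user 2 has nothing in common with user 0
]

print(recommend(ratings, 0, top_n=1))
```

In this toy matrix, item 2 is ranked above item 3 for user 0, because it is liked by the only user whose interests overlap with user 0's.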