Web Science/Part2: Emerging Web Properties/Search Engine Ecosystem
Survival of the fittest
- Fit for whom?
- Search engine operator, search users, advertisers
- Unfit for spammers
- Key performance indicators (multi-criteria optimization problem!)
- Value per click
- User: usability, relevance of search results, coverage of the Web
- Operator: advertising revenues, low-cost and scalable technical infrastructure, low personnel costs
- Advertiser: click-through and conversion rate
part 1
what is a search engine?
- why is it important?
- what is keyword search?
Search engine history
- Archie, 1990
- Gopher, 1991
- WebCrawler, Lycos, Yahoo search, 1994
- AltaVista search, 1995
- Google search, 1998
- Sequels: Baidu, Yandex, Bing
- Alternatives: ask.com, wolframalpha.com
- Vertical search: amazon.com for products, peoplefinder.com for people, garlik.com for egosearch (identity theft prevention), ...
Search system architecture
- what is a web crawler
- what is a search index (inverted index)
- (for now) black-box ranking
- binary search relevance
- interface (autocompletion, search results, ...)
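The inverted index and binary relevance named in the list above can be sketched as follows. This is a minimal illustration, not how any production engine works: the class and method names (`InvertedIndexSketch`, `build`, `search`) are hypothetical, tokenization is a plain split on non-word characters, and a document counts as relevant only if it contains every query term.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndexSketch {
    // Maps each lower-cased term to the sorted ids of the documents containing it.
    static Map<String, SortedSet<Integer>> build(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).toLowerCase().split("\\W+")) {
                if (token.isEmpty()) continue;
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Binary relevance: a document matches only if it contains all query terms.
    static SortedSet<Integer> search(Map<String, SortedSet<Integer>> index, String query) {
        SortedSet<Integer> result = null;
        for (String token : query.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            SortedSet<Integer> postings = index.getOrDefault(token, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings); // intersect posting lists
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "The web crawler fetches pages",
            "A search index maps terms to pages",
            "Web search ranking");
        Map<String, SortedSet<Integer>> index = build(docs);
        System.out.println(search(index, "web search"));
        System.out.println(search(index, "pages"));
    }
}
```

Here the index is built once over all documents, and a query only touches the posting lists of its own terms, which is why the structure is preferred over scanning every document per query.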
ranking in search I: application of tf-idf
- show how tf-idf can be used for ranking.
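One way tf-idf can be turned into a ranking is sketched below. The variant used here is an assumption, since several weighting schemes exist: term frequency is the share of a document's tokens equal to the term, inverse document frequency is ln(N / df), and a document's score is the sum of tf times idf over the query terms. The class name `TfIdfSketch` and the tiny corpus are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class TfIdfSketch {
    // Term frequency: fraction of the document's tokens that equal the term.
    static double tf(String term, List<String> doc) {
        long count = doc.stream().filter(term::equals).count();
        return doc.isEmpty() ? 0.0 : (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / df); 0 if no document contains the term.
    static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    // Score of one document for a query: sum of tf * idf over the query terms.
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double s = 0.0;
        for (String term : query) {
            s += tf(term, doc) * idf(term, corpus);
        }
        return s;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("web", "search", "engine"),
            Arrays.asList("web", "graph"),
            Arrays.asList("search", "search", "ranking"));
        List<String> query = Arrays.asList("search");
        for (int i = 0; i < corpus.size(); i++) {
            System.out.println("doc " + i + ": " + score(query, corpus.get(i), corpus));
        }
    }
}
```

Sorting the documents by descending score gives the result list. In this corpus the third document scores highest for the query "search" because two of its three tokens match.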
ranking in search II: random surfer model
public class RandomSurfer {
    public static void main(String[] args) {
        // Column-stochastic matrix: transitionMatrix[j][i] is the
        // probability of moving from page i to page j.
        double[][] transitionMatrix = {
            { 0., 1. / 3., 1., 1. / 3., 0. },
            { 1. / 2., 0., 0., 0., 0. },
            { 0., 1. / 3., 0., 1. / 3., 1. },
            { 1. / 2., 0., 0., 0., 0. },
            { 0., 1. / 3., 0., 1. / 3., 0. } };
        int numberOfNodes = 5;
        int steps = 100;
        int[] frequency = new int[numberOfNodes];
        int page = 0;
        for (int i = 0; i < steps; i++) {
            // Make one random move.
            double r = Math.random();
            double sum = 0.0;
            // Go through the column of the current page.
            for (int j = 0; j < numberOfNodes; j++) {
                sum += transitionMatrix[j][page];
                // Jump to the first page whose cumulative probability exceeds r.
                if (r < sum) {
                    System.out.println("Go from: " + page + " to: " + j);
                    page = j;
                    break;
                }
            }
            frequency[page]++;
        }
        // Print how often each page was visited.
        for (int j = 0; j < numberOfNodes; j++) {
            System.out.println("Page " + j + ": " + frequency[j] + " visits");
        }
    }
}
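The simulation estimates visit frequencies by sampling. As a hedged complement, the sketch below computes the same long-run distribution deterministically by repeatedly multiplying a rank vector with the transition matrix (power iteration). The class name `PowerIterationSketch` is hypothetical, and no damping factor or random jump is modelled, so this is the plain random surfer on the five-page example, not full PageRank.

```java
import java.util.Arrays;

class PowerIterationSketch {
    // Same column-stochastic example matrix as in the simulation:
    // EXAMPLE[j][i] is the probability of moving from page i to page j.
    static final double[][] EXAMPLE = {
        { 0., 1. / 3., 1., 1. / 3., 0. },
        { 1. / 2., 0., 0., 0., 0. },
        { 0., 1. / 3., 0., 1. / 3., 1. },
        { 1. / 2., 0., 0., 0., 0. },
        { 0., 1. / 3., 0., 1. / 3., 0. } };

    // Starts from the uniform distribution and applies rank = m * rank repeatedly.
    static double[] stationary(double[][] m, int iterations) {
        int n = m.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                for (int i = 0; i < n; i++) {
                    next[j] += m[j][i] * rank[i];
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stationary(EXAMPLE, 100)));
    }
}
```

For this matrix the values approach 1/3, 1/6, 2/9, 1/6 and 1/9 for pages 0 to 4, which is what the visit frequencies of the simulation converge to as the number of steps grows.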
comparison: tf-idf vs random surfer
- random surfer + tf-idf
- showing how to combine the two models
- even more methods can be included
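One simple way to combine a text score with a link score is a weighted sum, sketched below. The weighted sum and the weight `alpha` are assumptions made for illustration; the outline does not say which combination is meant, and other choices (for example multiplying the scores) are equally possible. All names are hypothetical, and both score arrays are assumed to be scaled to a comparable range beforehand.

```java
import java.util.Arrays;

class CombinedRankingSketch {
    // alpha = 1 ranks by text score only, alpha = 0 by link score only.
    static double combine(double textScore, double linkScore, double alpha) {
        return alpha * textScore + (1.0 - alpha) * linkScore;
    }

    // Returns document ids ordered by descending combined score.
    static Integer[] rank(double[] textScores, double[] linkScores, double alpha) {
        int n = textScores.length;
        double[] combined = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            combined[i] = combine(textScores[i], linkScores[i], alpha);
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(combined[b], combined[a]));
        return order;
    }

    public static void main(String[] args) {
        double[] text = { 0.9, 0.5, 0.0 };   // e.g. tf-idf scores per document
        double[] link = { 0.1, 0.6, 0.9 };   // e.g. random surfer scores per document
        System.out.println(Arrays.toString(rank(text, link, 0.5)));
        System.out.println(Arrays.toString(rank(text, link, 1.0)));
    }
}
```

Further signals can be added the same way, as extra terms with their own weights.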
relevance is a choice: Trust issues with search engines
- understand that algorithms are programmed by humans, and that it is up to us whether to trust a search engine and which one to choose
- manipulations will be hard to detect (magic keyword: Barack Obama)
- large search engines are among the most powerful institutions on the web (in terms of money, but also in terms of impact)
SPAM and SEO
- understand that search results can be manipulated
- metadata (schema.org)
part 2
multi-stakeholder system
- search engine
- end user
- web site owner
- advertiser
- (webmaster (SEO))
economics of a search engine
- understand the concept of keyword-based advertising
- understand the auction system for keywords
- understand the model of the sharing economy and man-in-the-middle business models
- taken from b:Strategy_for_Information_Markets/Search_engine_business_models and w:Vickrey_auction
- w:Generalized_second-price_auction
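The keyword auction referenced above can be illustrated with a stripped-down generalized second-price auction. This is a sketch under stated assumptions: bids are per click, ad slots are filled in order of descending bid, each winner pays the next-highest bid, and the last winner pays a reserve price if no lower bid exists. Quality scores and click probabilities, which real ad systems typically factor in, are left out, and every bid is assumed to be at least the reserve price. The names `GspAuctionSketch` and `pricesPerClick` are hypothetical.

```java
import java.util.Arrays;

class GspAuctionSketch {
    // Returns the price per click for each filled slot, best slot first.
    static double[] pricesPerClick(double[] bids, int slots, double reservePrice) {
        double[] sorted = bids.clone();
        Arrays.sort(sorted); // ascending; highest bid is at the end
        int n = sorted.length;
        int filled = Math.min(slots, n);
        double[] prices = new double[filled];
        for (int k = 0; k < filled; k++) {
            // The winner of slot k holds the (k+1)-th highest bid
            // and pays the bid directly below it.
            int next = n - 2 - k;
            prices[k] = next >= 0 ? sorted[next] : reservePrice;
        }
        return prices;
    }

    public static void main(String[] args) {
        double[] bids = { 4.0, 1.0, 3.0, 2.0 };
        System.out.println(Arrays.toString(pricesPerClick(bids, 2, 0.1)));
    }
}
```

With a single slot this reduces to a Vickrey (second-price) auction: the highest bidder wins and pays the second-highest bid.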
personalization of search results
- key methods of personalization (using a cookie)
- graph view of user interests
- collaborative filtering
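Collaborative filtering can be sketched as follows in its user-based form. The setup is an assumption for illustration: a small rating matrix where 0 means "no rating", cosine similarity between users' rating rows (with missing ratings counted as 0), and a prediction that is the similarity-weighted average of the other users' ratings for the item. The class name and data are hypothetical.

```java
class CollaborativeFilteringSketch {
    // Cosine similarity between two rating rows; 0 if either row is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Predicts ratings[user][item] from the users who did rate the item.
    static double predict(double[][] ratings, int user, int item) {
        double weighted = 0.0, total = 0.0;
        for (int other = 0; other < ratings.length; other++) {
            if (other == user || ratings[other][item] == 0.0) continue;
            double sim = cosine(ratings[user], ratings[other]);
            weighted += sim * ratings[other][item];
            total += sim;
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            { 5, 4, 0 },   // user 0 has not rated item 2
            { 5, 5, 4 },   // user 1 has similar taste to user 0
            { 1, 0, 1 } }; // user 2 has different taste
        System.out.println(predict(ratings, 0, 2));
    }
}
```

The prediction for user 0 lands between the two known ratings for item 2 and closer to the rating of user 1, the more similar user. Personalized search can use such predictions to reorder results, which is also one mechanism behind filter bubbles.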
filter bubble effects
Technologies for your own search engine
- Hadoop
- Solr
- Nutch
- Elasticsearch
The most successful search engines owe their success to winning two competitions: one for search users and one for advertising customers. Both competitions will be explained in the next two weeks.
Advertising
Stakeholders
- advertiser
- customer
- content owner/portal
- advertising network
Intermediaries:
- markets (eBay)
- advertising networks (DoubleClick, ...)
Moving the advertising service out of the portal and into an ad network benefits each party:
- customer: more exact profile, better ad targeting
- content owner/portal: better targeted ads lead to higher revenue
- advertiser: higher click-through rate/conversion rate
- ad network: valuable business model
- Technology
- Business model
- Pricing, auctions
- real-time bidding