Web Science/Part2: Emerging Web Properties/Search Engine Ecosystem
Survival of the fittest
- Fit for whom?
- Search engine operator, search users, advertisers
- Unfit for spammers
- Key performance indicators (multi-criteria optimization problem!)
- Value per click
- User: usability, relevance of search results, coverage of the Web
- Operator: advertising revenue, low-cost and scalable technical infrastructure, low personnel costs
- Advertiser: click-through and conversion rates
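The advertiser indicators can be made concrete with a small calculation. This sketch computes click-through rate, conversion rate and value per click from hypothetical campaign numbers. It assumes common conventions for the definitions: CTR = clicks / impressions, conversion rate = conversions / clicks, and value per click = conversion rate × value per conversion. The class name and all figures are illustrative.

```java
import java.util.Locale;

// Illustrative ad-campaign indicators; definitions are assumed conventions, figures are made up.
final class AdMetrics {
    // Fraction of ad impressions that were clicked.
    static double clickThroughRate(long clicks, long impressions) {
        return (double) clicks / impressions;
    }

    // Fraction of clicks that led to a conversion (e.g. a purchase).
    static double conversionRate(long conversions, long clicks) {
        return (double) conversions / clicks;
    }

    // Expected value of one click to the advertiser (assumed definition).
    static double valuePerClick(double conversionRate, double valuePerConversion) {
        return conversionRate * valuePerConversion;
    }

    public static void main(String[] args) {
        long impressions = 10_000, clicks = 200, conversions = 10;
        double ctr = clickThroughRate(clicks, impressions); // 0.02
        double cr = conversionRate(conversions, clicks);    // 0.05
        double vpc = valuePerClick(cr, 40.0);               // 2.0 currency units per click
        System.out.printf(Locale.ROOT, "CTR=%.3f conversion rate=%.3f value per click=%.2f%n", ctr, cr, vpc);
    }
}
```

Under this assumed definition, a click is only worth buying if its price stays below the value per click.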
Part 1
What is a search engine?
- Why is it important?
- What is keyword search?
Search engine history
- Archie, 1990
- Gopher, 1991
- WebCrawler, Lycos, Yahoo search, 1994
- AltaVista search, 1995
- Google search 1998
- Later entrants: Baidu, Yandex, Bing
- Alternatives: ask.com, wolframalpha.com
- Vertical search: products (amazon.com), people (peoplefinder.com), ego search for identity-theft prevention (garlik.com), ...
Search system architecture
- What is a web crawler?
- What is a search index (inverted index)?
- Ranking (treated as a black box for now)
- Binary relevance: a document either matches the query or it does not
- User interface (autocompletion, search results, ...)
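The crawler, the inverted index and binary relevance fit together in a simple way. The crawler fetches pages. The index maps each term to the documents that contain it. With binary relevance, a document either matches the query or does not. The Java sketch below assumes the pages have already been crawled into memory. The class name `InvertedIndex`, the tokenizer and the sample documents are hypothetical. The AND semantics (every query term must occur) is one common choice, not the only one.

```java
import java.util.*;

// Minimal in-memory inverted index with Boolean (binary-relevance) AND search.
// Documents, tokenizer and class name are illustrative assumptions.
final class InvertedIndex {
    // term -> sorted ids of the documents containing the term (its posting list)
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Naive tokenizer: lower-case, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // A document is relevant iff it contains every query term; otherwise it is not.
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(docs); // copy, never mutate the index
            else result.retainAll(docs);                      // intersect posting lists
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(0, "Web science studies the web");
        index.add(1, "Search engines crawl the web");
        index.add(2, "A search engine ranks results");
        System.out.println(index.search("web search")); // [1]
        System.out.println(index.search("search"));     // [1, 2]
    }
}
```

The results are unordered in the relevance sense. Every returned document counts as equally relevant, which is why a separate ranking step is needed.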
Ranking in search I: application of tf-idf
- Show how tf-idf can be used to rank documents for a query.
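One way to rank with tf-idf is to score each document by summing, over the query terms, the term's frequency in the document weighted by its inverse document frequency. This makes rare terms count more than ubiquitous ones such as "the". The sketch assumes one specific variant: tf = raw term count and idf = ln(N / df), where N is the number of documents and df the number of documents containing the term. Other weightings exist. The class name and the sample documents are hypothetical.

```java
import java.util.*;

// Ranks documents by the sum of tf-idf weights of the query terms.
// Assumed variant: tf = raw term count, idf = ln(N / df).
final class TfIdfRanker {
    private final List<Map<String, Integer>> termCounts = new ArrayList<>(); // one map per document
    private final Map<String, Integer> documentFrequency = new HashMap<>();

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void add(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : tokenize(text)) counts.merge(term, 1, Integer::sum);
        termCounts.add(counts);
        // each distinct term of this document raises its document frequency by one
        for (String term : counts.keySet()) documentFrequency.merge(term, 1, Integer::sum);
    }

    double idf(String term) {
        int df = documentFrequency.getOrDefault(term, 0);
        return df == 0 ? 0.0 : Math.log((double) termCounts.size() / df);
    }

    double score(String query, int docId) {
        double s = 0.0;
        for (String term : tokenize(query)) {
            s += termCounts.get(docId).getOrDefault(term, 0) * idf(term);
        }
        return s;
    }

    // Document ids sorted by descending score.
    List<Integer> rank(String query) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < termCounts.size(); i++) ids.add(i);
        ids.sort(Comparator.comparingDouble((Integer id) -> score(query, id)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        TfIdfRanker ranker = new TfIdfRanker();
        ranker.add("the web is a graph of pages");
        ranker.add("pagerank ranks pages of the web graph");
        ranker.add("the cat sat on the mat");
        System.out.println(ranker.rank("pagerank graph")); // [1, 0, 2]
    }
}
```

"pagerank" occurs in only one of three documents (idf = ln 3), while "graph" occurs in two (idf = ln 1.5). Document 1, which contains both, therefore ranks first. "the" occurs everywhere and gets idf = 0.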
Ranking in search II: random surfer model
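public class RandomSurfer {
    public static void main(String[] args) {
        // Column-stochastic transition matrix: transitionMatrix[j][i] is the
        // probability of moving from page i to page j (every column sums to 1).
        double[][] transitionMatrix = { { 0., 1. / 3., 1., 1. / 3., 0. },
            { 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 1. },
            { 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 0. } };
        int numberOfNodes = 5;
        int steps = 100;
        int[] frequency = new int[numberOfNodes];
        int page = 0; // the surfer starts on page 0
        for (int i = 0; i < steps; i++) {
            // Make one random move.
            double r = Math.random();
            double sum = 0.0;
            // Walk down the column of the current page, accumulating probabilities.
            for (int j = 0; j < numberOfNodes; j++) {
                sum += transitionMatrix[j][page];
                // The first page whose cumulative probability exceeds r is the jump target.
                if (r < sum) {
                    System.out.println("Go from: " + page + " to: " + j);
                    page = j;
                    break;
                }
            }
            frequency[page]++;
        }
        // Relative visit frequencies estimate how likely the surfer is to be on each page.
        for (int j = 0; j < numberOfNodes; j++) {
            System.out.println("Page " + j + ": " + (double) frequency[j] / steps);
        }
    }
}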
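Instead of simulating the surfer, the long-run visit probabilities can be computed directly by power iteration. This means repeatedly multiplying a probability vector by the transition matrix until it stops changing. The sketch reuses the matrix from the simulation above. The class name, tolerance and iteration limit are illustrative choices. The optional damping factor `d` is an added assumption taken from PageRank-style teleportation: the surfer follows a link with probability d and otherwise jumps to a random page. With d = 1 it is the plain random walk simulated above. For this matrix that gives (1/3, 1/6, 2/9, 1/6, 1/9), which the simulated frequencies should approach as the number of steps grows.

```java
import java.util.Arrays;

// Stationary distribution of the random surfer by power iteration.
// The damping factor d is an assumption (PageRank-style teleportation); d = 1 means no teleportation.
final class PowerIteration {
    static double[] rank(double[][] m, double d, double tolerance, int maxIterations) {
        int n = m.length;
        double[] r = new double[n];
        Arrays.fill(r, 1.0 / n); // start from the uniform distribution
        for (int it = 0; it < maxIterations; it++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) sum += m[j][i] * r[i]; // m[j][i]: probability of i -> j
                next[j] = d * sum + (1 - d) / n;
            }
            double change = 0.0;
            for (int j = 0; j < n; j++) change += Math.abs(next[j] - r[j]);
            r = next;
            if (change < tolerance) break; // converged
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] transitionMatrix = { { 0., 1. / 3., 1., 1. / 3., 0. },
            { 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 1. },
            { 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 0. } };
        System.out.println(Arrays.toString(rank(transitionMatrix, 1.0, 1e-12, 1000)));
        System.out.println(Arrays.toString(rank(transitionMatrix, 0.85, 1e-12, 1000)));
    }
}
```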
Comparison: tf-idf vs. random surfer
- Random surfer + tf-idf
- Show how the two models can be combined
- Even more ranking signals can be included
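tf-idf measures how well a page matches the query. The random surfer measures how important a page is regardless of the query. One simple way to combine them, assumed here for illustration, is a weighted sum of the two scores after scaling each to [0, 1]. The weight `alpha`, the max-normalisation and the score values are hypothetical.

```java
import java.util.Arrays;

// Combines a query-dependent score (tf-idf) with a query-independent one (random surfer)
// by a weighted sum after max-normalisation. alpha and the normalisation are assumptions.
final class ScoreCombiner {
    // Scale scores so that the largest becomes 1 (all zeros if every score is 0).
    static double[] normalize(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) out[i] = max > 0 ? scores[i] / max : 0.0;
        return out;
    }

    // alpha = 1 uses only tf-idf, alpha = 0 only the random-surfer score.
    static double[] combine(double[] tfidf, double[] surfer, double alpha) {
        double[] a = normalize(tfidf), b = normalize(surfer);
        double[] combined = new double[a.length];
        for (int i = 0; i < a.length; i++) combined[i] = alpha * a[i] + (1 - alpha) * b[i];
        return combined;
    }

    public static void main(String[] args) {
        double[] tfidf = { 0.4, 1.5, 0.0 };  // hypothetical tf-idf scores for one query
        double[] surfer = { 0.5, 0.2, 0.3 }; // hypothetical random-surfer probabilities
        System.out.println(Arrays.toString(combine(tfidf, surfer, 0.5)));
    }
}
```

With alpha = 0.5 the combined scores are roughly [0.633, 0.7, 0.3], so document 1 ranks first. With alpha = 0.2 the link-based score dominates and document 0 overtakes it (about 0.853 vs. 0.52).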
Relevance is a choice: trust issues with search engines
- Understand that ranking algorithms are programmed by humans, and that it is up to us whether to trust a search engine and which one to choose
- Manipulation is hard to detect (e.g. the "magic keyword" Barack Obama)
- Large search engines are among the most powerful institutions on the web, both financially and in terms of their impact
Spam and SEO
- Understand that search results can be manipulated
- Metadata (schema.org)
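A minimal illustration of why naive ranking invites manipulation: if pages were ranked only by how often the query keyword occurs, a spammer could simply repeat the keyword (keyword stuffing). The raw-count score is a deliberately naive assumption, not how any particular engine ranks. The class name and pages are hypothetical.

```java
// Scores pages by raw keyword count (a deliberately naive assumption)
// to show how keyword stuffing manipulates such a ranking. Pages are made up.
final class KeywordStuffing {
    // Number of tokens in the page equal to the (lower-case) keyword.
    static int rawCount(String page, String keyword) {
        int count = 0;
        for (String token : page.toLowerCase(java.util.Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(keyword)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String keyword = "hotel";
        String honest = "Our hotel offers rooms near the station and a good breakfast.";
        String stuffed = "hotel hotel cheap hotel best hotel hotel deals hotel";
        System.out.println("honest page: " + rawCount(honest, keyword));   // 1
        System.out.println("stuffed page: " + rawCount(stuffed, keyword)); // 6
    }
}
```

The stuffed page wins although it is useless to the searcher. Weighting by idf alone does not help, because both pages contain the same term. This is one motivation for combining text scores with link-based signals such as the random surfer model.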
The following video of the flipped classroom associated with this topic is available:
Part 2
Multi-stakeholder system
- search engine
- end user
- website owner
- advertiser
- (webmaster / SEO)
Economics of a search engine
- Understand the concept of keyword-based advertising
- Understand the auction system for keywords
- Understand the sharing economy and man-in-the-middle (intermediary) business models
- Based on b:Strategy_for_Information_Markets/Search_engine_business_models and w:Vickrey_auction
- w:Generalized_second-price_auction
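The generalized second-price (GSP) auction can be sketched as follows. Advertisers bid a maximum price per click for a keyword, and ad slots are assigned in order of the bids. Each winner pays the bid of the advertiser ranked just below it. With a single slot this reduces to a Vickrey (second-price) auction. This sketch is the bare textbook mechanism under stated simplifications: no reserve price and no minimum increment. Real systems typically also weight bids by an estimated ad quality, which is omitted here. The class name, advertisers and bids are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Bare generalized second-price auction for one keyword (no quality weighting, no reserve price).
final class GspAuction {
    // Indices of the winning bids, best slot first.
    static int[] winners(double[] bids, int slots) {
        Integer[] order = new Integer[bids.length];
        for (int i = 0; i < bids.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(bids[b], bids[a])); // descending bids
        int filled = Math.min(slots, bids.length);
        int[] result = new int[filled];
        for (int k = 0; k < filled; k++) result[k] = order[k];
        return result;
    }

    // Price per click for each filled slot: the next-highest bid (0 if there is none).
    static double[] prices(double[] bids, int slots) {
        double[] sorted = bids.clone();
        Arrays.sort(sorted); // ascending
        int filled = Math.min(slots, bids.length);
        double[] result = new double[filled];
        for (int k = 0; k < filled; k++) {
            int next = sorted.length - 2 - k; // position of the (k+2)-th highest bid
            result[k] = next >= 0 ? sorted[next] : 0.0;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] advertisers = { "A", "B", "C" };
        double[] bids = { 2.00, 3.50, 1.20 }; // hypothetical maximum bids per click
        int[] win = winners(bids, 2);
        double[] pay = prices(bids, 2);
        for (int k = 0; k < win.length; k++) {
            System.out.printf(Locale.ROOT, "slot %d: %s pays %.2f per click%n", k + 1, advertisers[win[k]], pay[k]);
        }
    }
}
```

Here B wins the top slot but pays A's bid of 2.00, and A takes the second slot at C's bid of 1.20.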
Personalization of search results
- Key methods of personalization (e.g. using a cookie)
- Graph view of user interests
- Collaborative filtering
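Collaborative filtering personalizes results by recommending what similar users liked. The sketch below is user-based. It works on a small user × category matrix of implicit feedback (1 = clicked, 0 = not), for example gathered via a cookie. Similarity between users is cosine similarity, and a user's predicted interest in a category is the similarity-weighted sum of other users' clicks. These are common choices assumed for illustration. The class name and data are hypothetical.

```java
// User-based collaborative filtering on implicit feedback (1 = clicked, 0 = not).
// Cosine similarity and similarity-weighted votes are assumed choices; data are made up.
final class CollaborativeFiltering {
    static double cosine(double[] u, double[] v) {
        double dot = 0, nu = 0, nv = 0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            nu += u[i] * u[i];
            nv += v[i] * v[i];
        }
        return (nu == 0 || nv == 0) ? 0.0 : dot / Math.sqrt(nu * nv);
    }

    // Predicted interest of `user` in each item: similarity-weighted clicks of all other users.
    static double[] predict(double[][] ratings, int user) {
        int items = ratings[user].length;
        double[] scores = new double[items];
        for (int other = 0; other < ratings.length; other++) {
            if (other == user) continue;
            double sim = cosine(ratings[user], ratings[other]);
            for (int i = 0; i < items; i++) scores[i] += sim * ratings[other][i];
        }
        return scores;
    }

    // Highest-scoring item the user has not clicked yet, or -1 if there is none.
    static int recommend(double[][] ratings, int user) {
        double[] scores = predict(ratings, user);
        int best = -1;
        for (int i = 0; i < scores.length; i++) {
            if (ratings[user][i] == 0 && (best == -1 || scores[i] > scores[best])) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // rows: users 0..2; columns: result categories, e.g. 0 = sports, 1 = news, 2 = tech, 3 = travel
        double[][] clicks = {
            { 1, 1, 0, 0 },
            { 1, 1, 1, 0 },
            { 0, 0, 0, 1 } };
        System.out.println(recommend(clicks, 0)); // 2
    }
}
```

User 0 shares two clicked categories with user 1 and none with user 2. The recommendation is therefore user 1's remaining category, 2 (tech).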
Filter bubble effects
Technologies for your own search engine
- Hadoop
- Solr
- Nutch
- Elasticsearch
The key to the most successful search engines was winning two competitions: one for search users and one for advertising customers. Both competitions are explained over the next two weeks.
Advertising
Stakeholders
- advertiser
- customer
- content owner/portal
- advertising network
Intermediaries:
- markets (eBay)
- advertising networks (DoubleClick, ...)
Moving the advertising service out of the portal and into an ad network benefits each stakeholder:
- customer: more exact profile, better ad targeting
- content owner/portal: better targeted ads lead to higher revenue
- advertiser: higher click-through rate/conversion rate
- ad network: valuable business model
- Technology
- Business model
- Pricing, auctions
- Real-time bidding