Web Science/Part2: Emerging Web Properties/Search Engine Ecosystem

From Wikiversity

Survival of the fittest

  • Fit for whom?
    • Search engine operator, search users, advertisers
    • Unfit for spammers
  • Key performance indicators (multi-criteria optimization problem!)
    • Value per click
      • User: usability, relevance of search results, coverage of the Web
      • Operator: advertising revenues, low-cost and scalable technical infrastructure, low personnel costs
      • Advertiser: click-through and conversion rate
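The click-based indicators above can be made concrete with their usual definitions. The pay-per-click pricing model and the symbols v (value of one conversion to the advertiser) and p (price the advertiser pays per click) are assumptions added for illustration; the notes above do not fix a pricing model.

```latex
\begin{aligned}
\mathrm{CTR} &= \frac{\text{clicks}}{\text{impressions}}, &
\mathrm{CR} &= \frac{\text{conversions}}{\text{clicks}},\\
\text{value per click (advertiser)} &= \mathrm{CR}\cdot v, &
\text{revenue per impression (operator)} &= \mathrm{CTR}\cdot p.
\end{aligned}
```

For example, with CTR = 2% and p = 0.50 per click, the operator earns 0.02 × 0.50 = 0.01 per impression; with CR = 5% and v = 40, a click is worth 0.05 × 40 = 2.00 to the advertiser, so paying more than that per click would not pay off.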

Part 1

What is a search engine?

  • why is it important?
  • what is keyword search?

Search engine history

  • Archie, 1990
  • Gopher, 1991
  • WebCrawler, Lycos, Yahoo search, 1994
  • AltaVista search, 1995
  • Google search, 1998
  • Later search engines: Baidu, Yandex, Bing
  • Alternatives: ask.com, wolframalpha.com
  • Vertical search: products (amazon.com), people (peoplefinder.com), ego search for identity theft prevention (garlik.com), ...

Search system architecture

  • what is a web crawler?
  • what is a search index (inverted index)?
  • ranking treated as a black box (for now)
  • binary relevance: a document either matches the query or it does not
  • interface (autocompletion, search results, ...)
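The index and binary-relevance bullets above can be illustrated with a minimal in-memory inverted index that answers Boolean AND queries. This is a sketch under stated assumptions: the class InvertedIndex, its methods, and the lower-casing tokenizer are hypothetical and chosen only for illustration; production indexes typically add stemming, compression, and on-disk posting lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndex {
	// term -> sorted set of ids of the documents containing the term (the posting list)
	private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

	// Hypothetical, deliberately simple tokenizer: lower-case, split on non-letters/digits.
	static List<String> tokenize(String text) {
		List<String> tokens = new ArrayList<>();
		for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
			if (!t.isEmpty()) {
				tokens.add(t);
			}
		}
		return tokens;
	}

	void addDocument(int docId, String text) {
		for (String term : tokenize(text)) {
			postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
		}
	}

	// Binary relevance: a document is returned only if it contains every query term.
	SortedSet<Integer> search(String query) {
		SortedSet<Integer> result = null;
		for (String term : tokenize(query)) {
			SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
			if (result == null) {
				result = new TreeSet<>(docs);
			} else {
				result.retainAll(docs); // intersect posting lists
			}
		}
		return result == null ? new TreeSet<>() : result;
	}

	public static void main(String[] args) {
		InvertedIndex index = new InvertedIndex();
		index.addDocument(0, "Web science studies the Web");
		index.addDocument(1, "A search engine crawls the Web");
		index.addDocument(2, "Search results are ranked");
		System.out.println(index.search("search web")); // prints [1]
	}
}
```

Every returned document counts as equally relevant here; ordering the matches is the job of the ranking methods discussed next.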

Ranking in search I: application of tf-idf

  • show how tf-idf can be used for ranking.
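A minimal sketch of tf-idf ranking, assuming one common variant: tf(t, d) is the raw count of term t in document d, idf(t) = log(N / df(t)) with N documents of which df(t) contain t, and a document's score is the sum of tf(t, d) · idf(t) over the query terms. The class TfIdfRanker and its methods are hypothetical; other weighting variants (log-scaled tf, cosine normalization) are equally common.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TfIdfRanker {
	private final List<Map<String, Integer>> termCounts = new ArrayList<>(); // tf per document
	private final Map<String, Integer> documentFrequency = new HashMap<>(); // df per term

	static List<String> tokenize(String text) {
		List<String> tokens = new ArrayList<>();
		for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
			if (!t.isEmpty()) {
				tokens.add(t);
			}
		}
		return tokens;
	}

	void addDocument(String text) {
		Map<String, Integer> counts = new HashMap<>();
		for (String term : tokenize(text)) {
			counts.merge(term, 1, Integer::sum);
		}
		for (String term : counts.keySet()) {
			documentFrequency.merge(term, 1, Integer::sum); // count each document once
		}
		termCounts.add(counts);
	}

	// idf(t) = log(N / df(t)); a term that occurs in no document contributes 0.
	double idf(String term) {
		int df = documentFrequency.getOrDefault(term, 0);
		return df == 0 ? 0.0 : Math.log((double) termCounts.size() / df);
	}

	// score(q, d) = sum over query terms t of tf(t, d) * idf(t)
	double score(String query, int docId) {
		double s = 0.0;
		for (String term : tokenize(query)) {
			s += termCounts.get(docId).getOrDefault(term, 0) * idf(term);
		}
		return s;
	}

	// Document ids ordered by descending tf-idf score.
	List<Integer> rank(String query) {
		List<Integer> ids = new ArrayList<>();
		for (int i = 0; i < termCounts.size(); i++) {
			ids.add(i);
		}
		ids.sort((a, b) -> Double.compare(score(query, b), score(query, a)));
		return ids;
	}

	public static void main(String[] args) {
		TfIdfRanker ranker = new TfIdfRanker();
		ranker.addDocument("the web is a graph of pages");
		ranker.addDocument("search engines rank pages of the web");
		ranker.addDocument("ranking pages by links and by terms");
		System.out.println(ranker.rank("rank web pages")); // prints [1, 0, 2]
	}
}
```

In this example "pages" occurs in every document and therefore gets idf 0, so it does not help to distinguish them; "ranking" in document 2 does not match "rank" because the sketch does no stemming.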

Ranking in search II: random surfer model

  • explaining the random surfer model
  • This graphic, used in the Web Science MOOC, is an animation of 100 steps of a random surfer on a small graph. The following Java code simulates such a walk:
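import java.util.Arrays;

public class RandomSurfer {
	public static void main(String[] args) {
		// Column-stochastic transition matrix: transitionMatrix[j][i] is the
		// probability of moving from page i to page j (each column sums to 1).
		double[][] transitionMatrix = { { 0., 1. / 3., 1., 1. / 3., 0. },
				{ 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 1. },
				{ 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 0. } };
		int numberOfNodes = 5;
		int steps = 100;

		int[] frequency = new int[numberOfNodes]; // visits per page
		int page = 0; // the surfer starts on page 0
		for (int i = 0; i < steps; i++) {
			// Make one random move.
			double r = Math.random();
			double sum = 0.0;
			// Go through the column of the current page, accumulating probabilities.
			for (int j = 0; j < numberOfNodes; j++) {
				sum += transitionMatrix[j][page];
				// The first page whose cumulative probability exceeds r is the jump target.
				if (r < sum) {
					System.out.println("Go from: " + page + " to: " + j);
					page = j;
					break;
				}
			}
			frequency[page]++;
		}
		// Relative visit frequencies approximate the stationary distribution of the walk.
		System.out.println("Visits per page: " + Arrays.toString(frequency));
	}
}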
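The simulation only estimates how often each page is visited. A minimal sketch of computing these proportions directly is power iteration on the same matrix, p ← (1 - d)/N + d · M p. The damping factor d (the surfer jumps to a uniformly random page with probability 1 - d) is an addition not present in the simulation; d = 0.85 is the value commonly cited from the original PageRank paper, and d = 1 reproduces the undamped surfer. The class PowerIteration is hypothetical, and the sketch assumes a column-stochastic matrix without dangling pages.

```java
import java.util.Arrays;

class PowerIteration {
	// Iterates p_{k+1} = (1 - d) / n + d * M p_k until the L1 change drops below tolerance.
	// Assumes m is column-stochastic: m[j][i] = probability of moving from page i to page j.
	static double[] pageRank(double[][] m, double d, double tolerance, int maxIterations) {
		int n = m.length;
		double[] p = new double[n];
		Arrays.fill(p, 1.0 / n); // start from the uniform distribution
		for (int iter = 0; iter < maxIterations; iter++) {
			double[] next = new double[n];
			for (int j = 0; j < n; j++) {
				double sum = 0.0;
				for (int i = 0; i < n; i++) {
					sum += m[j][i] * p[i];
				}
				next[j] = (1 - d) / n + d * sum;
			}
			double change = 0.0;
			for (int j = 0; j < n; j++) {
				change += Math.abs(next[j] - p[j]);
			}
			p = next;
			if (change < tolerance) {
				break;
			}
		}
		return p;
	}

	public static void main(String[] args) {
		double[][] transitionMatrix = { { 0., 1. / 3., 1., 1. / 3., 0. },
				{ 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 1. },
				{ 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 0. } };
		System.out.println("undamped: " + Arrays.toString(pageRank(transitionMatrix, 1.0, 1e-12, 10_000)));
		System.out.println("d = 0.85: " + Arrays.toString(pageRank(transitionMatrix, 0.85, 1e-12, 10_000)));
	}
}
```

Solving p = Mp by hand for this matrix gives (1/3, 1/6, 2/9, 1/6, 1/9), so for d = 1 the iteration should settle there, and the visit frequencies of a long simulated walk should drift toward the same proportions.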

Comparison: tf-idf vs. random surfer

  • Random surfer + tf-idf
  • showing how to combine the two models
  • even more ranking signals can be included
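One simple way to combine the two models, offered here as an assumption rather than the method the course uses, is a weighted sum: each score list is first rescaled to [0, 1] by dividing by its maximum, then content relevance is weighted by alpha and link-based importance by 1 - alpha. The class CombinedRanking, the parameter alpha, and the input scores are hypothetical, illustrative values, not real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinedRanking {
	// Weighted sum of max-normalized scores: alpha * tfidf + (1 - alpha) * pageRank.
	// Normalizing first keeps one signal from dominating merely because of its scale.
	static double[] combine(double[] tfidf, double[] pageRank, double alpha) {
		double maxT = Arrays.stream(tfidf).max().orElse(0.0);
		double maxP = Arrays.stream(pageRank).max().orElse(0.0);
		double[] combined = new double[tfidf.length];
		for (int i = 0; i < tfidf.length; i++) {
			double t = maxT > 0 ? tfidf[i] / maxT : 0.0;
			double p = maxP > 0 ? pageRank[i] / maxP : 0.0;
			combined[i] = alpha * t + (1 - alpha) * p;
		}
		return combined;
	}

	// Document ids sorted by descending score.
	static List<Integer> rank(double[] scores) {
		List<Integer> ids = new ArrayList<>();
		for (int i = 0; i < scores.length; i++) {
			ids.add(i);
		}
		ids.sort((a, b) -> Double.compare(scores[b], scores[a]));
		return ids;
	}

	public static void main(String[] args) {
		double[] tfidf = { 0.4, 1.5, 0.0 };     // illustrative content scores for three documents
		double[] pageRank = { 0.5, 0.1, 0.4 };  // illustrative link-based scores
		for (double alpha : new double[] { 1.0, 0.5, 0.0 }) {
			System.out.println("alpha=" + alpha + " -> " + rank(combine(tfidf, pageRank, alpha)));
		}
	}
}
```

With these inputs the order moves from [1, 0, 2] (pure tf-idf) through [0, 1, 2] (equal weights) to [0, 2, 1] (pure link score), which shows how the choice of weight alone changes what users see first.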

Relevance is a choice: trust issues with search engines

  • understand that ranking algorithms are programmed by humans, and that it is up to us whether to trust a search engine and which one to choose
  • manipulations are hard to detect (e.g., the "magic keyword" Barack Obama)
  • large search engines are among the most powerful institutions on the Web, both financially and in terms of their impact

Spam and SEO

  • understand that search results can be manipulated
  • metadata (schema.org)

A video of the flipped classroom associated with this topic is available on Wikimedia Commons, where the file can also be downloaded directly.

Part 2

Multi-stakeholder system

  • search engine
  • end user
  • web site owner
  • advertiser
  • (webmaster / SEO)

Economics of a search engine

Personalization of search results

  • key methods of personalization (e.g., using a cookie)
  • graph view of user interests
  • collaborative filtering
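Collaborative filtering can be sketched in its user-based form: a user's interest in an item is predicted as the similarity-weighted average of other users' values, with cosine similarity between users' interest vectors. Representing users by click counts per result topic, and the class UserBasedCf itself, are hypothetical choices for illustration; item-based and matrix-factorization variants are also widely used.

```java
class UserBasedCf {
	// Cosine similarity of two users' interest vectors (0 if either vector is all zeros).
	static double cosine(double[] a, double[] b) {
		double dot = 0, normA = 0, normB = 0;
		for (int i = 0; i < a.length; i++) {
			dot += a[i] * b[i];
			normA += a[i] * a[i];
			normB += b[i] * b[i];
		}
		return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
	}

	// Predicted interest of `user` in `item`: average of the other users' values
	// for that item, weighted by how similar each of them is to `user`.
	static double predict(double[][] interests, int user, int item) {
		double numerator = 0, denominator = 0;
		for (int other = 0; other < interests.length; other++) {
			if (other == user) {
				continue;
			}
			double sim = cosine(interests[user], interests[other]);
			numerator += sim * interests[other][item];
			denominator += Math.abs(sim);
		}
		return denominator == 0 ? 0.0 : numerator / denominator;
	}

	public static void main(String[] args) {
		// Illustrative data: rows are users, columns are topic categories of clicked results.
		double[][] clicks = {
				{ 5, 3, 0, 0 },
				{ 4, 3, 0, 1 },
				{ 0, 0, 4, 5 } };
		// User 0 resembles user 1 and shares nothing with user 2,
		// so the prediction for topic 3 follows user 1's value.
		System.out.println(predict(clicks, 0, 3)); // prints 1.0
	}
}
```

The prediction ignores user 2's strong interest in topic 3 because user 2's clicks have nothing in common with user 0's; this is also the mechanism by which such personalization can narrow what a user sees.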

Filter bubble effects

Technologies for your own search engine

  • Hadoop
  • Solr
  • Nutch
  • Elasticsearch

The key to the most successful search engines was winning two competitions at once: one for search users and one for advertising customers. Both competitions will be explained over the next two weeks.

Advertising

Stakeholders

  • advertiser
  • customer
  • content owner/portal
  • advertising network

Intermediaries:

  • markets (eBay)
  • advertising networks (DoubleClick, ...)

Pushing the advertising service out of the portal into an ad network:

  • customer: more exact profile, better ad targeting
  • content owner/portal: better targeted ads lead to higher revenue
  • advertiser: higher click-through rate/conversion rate
  • ad network: valuable business model
  • Technology
  • Business model
  • Pricing, auctions
  • real-time bidding
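For the pricing and auction bullets, a sketch of a generalized second-price (GSP) position auction, the mechanism long associated with sponsored search, may help. Assumptions: bidders are ranked by ad rank = bid × quality factor, and the winner of position k pays the smallest bid that would keep its position, i.e. the next bidder's ad rank divided by its own quality (quality must be positive). The classes GspAuction, Bidder, and Slot are hypothetical; real systems add reserve prices and minimum increments, and real-time bidding exchanges may use other auction formats.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class GspAuction {
	// Hypothetical bidder: maximum bid per click and a positive quality factor
	// (for example an estimated click-through rate).
	static final class Bidder {
		final String name;
		final double bid;
		final double quality;

		Bidder(String name, double bid, double quality) {
			this.name = name;
			this.bid = bid;
			this.quality = quality;
		}

		double adRank() {
			return bid * quality;
		}
	}

	// One won ad position and the price per click its winner pays.
	static final class Slot {
		final String name;
		final double pricePerClick;

		Slot(String name, double pricePerClick) {
			this.name = name;
			this.pricePerClick = pricePerClick;
		}

		@Override
		public String toString() {
			return name + String.format(Locale.ROOT, " pays %.2f per click", pricePerClick);
		}
	}

	// Ranks bidders by ad rank and fills up to `slots` positions; position k pays
	// adRank of the bidder ranked k+1 divided by its own quality (0 if nobody is below).
	static List<Slot> allocate(List<Bidder> bidders, int slots) {
		List<Bidder> sorted = new ArrayList<>(bidders);
		sorted.sort((a, b) -> Double.compare(b.adRank(), a.adRank()));
		List<Slot> result = new ArrayList<>();
		for (int k = 0; k < Math.min(slots, sorted.size()); k++) {
			Bidder winner = sorted.get(k);
			double price = k + 1 < sorted.size() ? sorted.get(k + 1).adRank() / winner.quality : 0.0;
			result.add(new Slot(winner.name, price));
		}
		return result;
	}

	public static void main(String[] args) {
		List<Bidder> bidders = Arrays.asList(
				new Bidder("A", 2.00, 0.5),
				new Bidder("B", 1.50, 0.8),
				new Bidder("C", 1.00, 0.6));
		for (Slot s : allocate(bidders, 2)) {
			System.out.println(s); // B pays 1.25 per click, then A pays 1.20 per click
		}
	}
}
```

In this example B bids less than A but wins the top position because of its higher quality factor, and each winner pays less than its own bid, which is the defining property of second-price pricing.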