Commercial Data Harvesting

From Wikiversity

The concept of Commercial Data Harvesting (CDH) has five basic constituents:

  • (Benefit/Incentive) An information or communication service or game that is attractive for users. The user is the provider of the data that the CDH company can sell. Users should perceive that they receive a benefit rather than that they are being harvested.
  • (USER GROUP) A large community of users of the service who generate the data (e.g. users of an information system or a messenger).
  • (CDH COMPANY: Service Provider) The company that performs commercial data harvesting.
  • (Method: User Data Analysis) Analysis of the data collected from and about the users, using data mining approaches to distill (digital) products[1] that can be sold to customers of the company.
  • (CDH CUSTOMERS: Buyers of User Data and derived products) Customers of the company that performs commercial data harvesting (CDH). The customers are willing to pay for knowledge about users, e.g. advertisements tailored to the profiles of the users. The payments for CDH user data and derived products make a free service possible (e.g. free e-mail account, free use of a messenger, ...).


This leads to the following definition:

Commercial Data Harvesting is a concept that
  • uses a communication and information service or game to collect data from a target user group and
  • sells the data or derived digital products to customers who expect a benefit from having the data or from using a digital service that is based on the harvested user data.

Value of Harvested Data

The value of the data and of the derived information services depends on:

  • Size: the size of the community determines the impact that CUSTOMERS can achieve with the harvested data.
  • Community Network: Who communicates to whom? What type of target group works in the network (educators/students, engineers/developers, researchers, administration)? What type of data can be harvested?
  • Content: What are the topics that are discussed?
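The "Community Network" question (who communicates with whom?) can be illustrated with a minimal sketch. It assumes that only message metadata in the form of (sender, recipient) pairs is available; the names and the log below are invented for illustration and are not taken from any real service.

```python
from collections import Counter

# Invented message metadata: (sender, recipient). No message content is needed.
message_log = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("alice", "carol"),
]

# Edge weights: how often each sender wrote to each recipient.
edge_counts = Counter(message_log)

# Partners: the distinct users each user communicates with (in either direction).
partners = {}
for sender, recipient in message_log:
    partners.setdefault(sender, set()).add(recipient)
    partners.setdefault(recipient, set()).add(sender)

for user in sorted(partners):
    print(user, "communicates with", sorted(partners[user]))
```

Even this small example shows that the structure of a community can be reconstructed from metadata alone, without reading any message content.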


Explain the requirements and constraints to avoid commercial data harvesting in critical infrastructure!

Derived Information

  • Create a user profile of knowledge and expertise, e.g. to derive tailored advertisements. The basic driver is that the probability of buying a product is higher if the advertisement matches the interests and background of the user.
  • Political opinions and attitudes: Political statements can be tailored to public opinions that are identified by data mining methods.
  • Leisure activities, used technology: Users can be guided to leisure activities that are of interest to them.
  • Health related information and fitness. Certain activities have a positive or negative impact on health. The knowledge about these activities may be of interest for health care and health insurance.
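How a user profile might be derived from collected text can be sketched as follows. This is a deliberately simple keyword count, not a description of how any real CDH company works; the categories, keyword lists, and sample messages are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical keyword lists per interest category (assumption for illustration).
CATEGORY_KEYWORDS = {
    "fitness": {"running", "marathon", "gym", "yoga"},
    "travel": {"flight", "hotel", "beach", "hiking"},
    "technology": {"laptop", "smartphone", "python", "router"},
}


def build_profile(messages):
    """Sum, over all messages, the distinct category keywords found in each one."""
    profile = Counter()
    for message in messages:
        words = {w.strip(".,!?").lower() for w in message.split()}
        for category, keywords in CATEGORY_KEYWORDS.items():
            profile[category] += len(words & keywords)
    return profile


def pick_advertisement_category(profile):
    """Return the category with the highest count, or None if nothing matched."""
    if not profile or max(profile.values()) == 0:
        return None
    return profile.most_common(1)[0][0]


# Invented sample messages, not real user data.
messages = [
    "Signed up for a marathon, so I am running every morning.",
    "Need a new laptop for work.",
    "The gym was crowded after my running session.",
]
profile = build_profile(messages)
best_category = pick_advertisement_category(profile)
print(dict(profile), best_category)
```

In this toy data the "fitness" category dominates, so an advertisement from that category would be selected. Real systems would presumably use far richer signals, but the principle of matching advertisements to a derived profile is the same.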

IT-Environments for Harvesting

  • Commercial data harvesting needs IT environments in which users leave a "large" Digital Footprint. Analyse your own online behavior! Where do you leave a digital footprint? Determine roughly the percentage of your total online time, or explicitly the time span, for each IT environment. Examples of IT environments that can serve as harvesting environments are:
    • Messengers (WhatsApp, Telegram, Signal, deltaChat, Mail, ...)
    • Social Media,
    • Office Products (e.g. writing project proposals, summaries, results, an analysis, ...)
    • GPS-Tracks and Navigation,
    • Voice Recognition,
    • Videoconferencing that runs on IT infrastructure that is not controlled by the company or the research and development unit,
    • ...
  • Analyse the benefits for you and perform a Risk Analysis
    • for yourself,
    • for a company or institution you work for or
    • in general for institutions, companies, ... you know (e.g. health care facilities, governmental administration, ...).
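For the self-analysis of the digital footprint, the percentage of total online time per IT environment can be computed with a few lines of code. The sketch below assumes that you have recorded your own minutes per week for each environment; the numbers are invented placeholders and should be replaced with your own estimates.

```python
def footprint_shares(minutes_per_environment):
    """Return each environment's share of total online time in percent."""
    total = sum(minutes_per_environment.values())
    if total == 0:
        return {name: 0.0 for name in minutes_per_environment}
    return {
        name: round(100 * minutes / total, 1)
        for name, minutes in minutes_per_environment.items()
    }


# Self-recorded minutes per week; the values are invented for illustration.
weekly_minutes = {
    "messenger": 300,
    "social media": 200,
    "office products": 400,
    "navigation": 50,
    "videoconferencing": 50,
}
shares = footprint_shares(weekly_minutes)

# List the environments, largest share first.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share}%")
```

The environments with the largest shares are natural starting points for the risk analysis.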

Learning Tasks

  • (Customer or Data Source) Most users think that they get a free digital service, e.g.
    • e-mail account,
    • fitness analysis,
    • routing and navigation support.
So users regard themselves as customers of a provider of a free digital service, instead of as the information source for digital products that are sold to someone who is willing to pay for the information or for services derived from the determined user profiles. Why is it important that users regard themselves as "customers of a free digital service" instead of as part of a sold digital product?
  • (Speech Recognition) Explain the role of speech recognition with mobile devices[2] for Commercial Data Harvesting. How is it possible to derive tailored advertisements by analysing conversations? What are the potential privacy concerns[3] of individuals, research or development units, health care facilities, ...?
  • (Competition with an Award) A company designs a competition with a first, second and third prize for providing a solution to a given problem.
    • Compare the PROs and CONs of a competition with those of a research and development unit of the company.
    • Many submissions to the competition have weaknesses and will not receive an award. Why do even unsuccessful submissions have a value for the company and for the solution of the given problem? Would you communicate the value of the submissions for the company to the participants?
What are the similarities and differences between a Competition with an Award and Commercial Data Harvesting?
Discuss the need to communicate "WHY" data is collected from a Neutral Point of View (NPOV), so that users can decide whether or not they are willing to share the data for a specific purpose.
  • (Task for Authors of the Learning Resource) How should this learning resource evolve so that the Neutral Point of View (NPOV) in Wikiversity is respected (use the talk/discuss page)?
  • (Artificial Intelligence) Commercial Data Harvesting, e.g. from mobile devices, fitness trackers, ..., generates user-specific data. Analyse the concepts of artificial intelligence and explain how AI can be applied for pattern recognition in collected data about users!
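As a starting point for the Artificial Intelligence task, the following minimal sketch groups users by similarity with k-means clustering. k-means is chosen here only as one simple, well-known pattern recognition technique; the learning resource itself does not prescribe a method. The features (daily steps and hours of sleep) and all values are invented, and the deterministic start from the first k points is a simplification for reproducibility.

```python
import math


def k_means(points, k, iterations=10):
    """Tiny k-means: returns (centroids, labels) for a list of 2D points."""
    centroids = [points[i] for i in range(k)]  # deterministic start: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Invented data: (daily steps in thousands, hours of sleep) per user.
users = [(2.0, 6.0), (11.0, 7.5), (3.0, 5.5), (12.0, 8.0), (2.5, 6.5), (10.5, 7.0)]
centroids, labels = k_means(users, k=2)
print(labels)
print(centroids)
```

With this toy data the users split into a less active and a more active group. Such a group label is the kind of derived information that could then feed a user profile, which is why the privacy questions in the tasks above matter.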

WikiJournal of Science

Do you want to create a paper for WikiJournal of Science? Extend the topic with state-of-the-art technology, IT strategies and an analysis of basic concepts of business plans for CDH, or write an encyclopedic paper for the WikiJournal of Science. Feel free to incorporate parts of the learning resource into the paper. Just use the "Cite this page..." feature for reference (see also Open Paper Development).

See also


  1. Silverstein, C., Marais, H., Henzinger, M., & Moricz, M. (1999, September). Analysis of a very large web search engine query log. In ACM SIGIR Forum (Vol. 33, No. 1, pp. 6-12). ACM.
  2. McGraw, I., Prabhavalkar, R., Alvarez, R., Arenas, M. G., Rao, K., Rybach, D., ... & Parada, C. (2016, March). Personalized speech recognition on mobile devices. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5955-5959). IEEE.
  3. Ramos, C., Augusto, J. C., & Shapiro, D. (2008). Ambient intelligence—the next step for artificial intelligence. IEEE Intelligent Systems, 23(2), 15-18.
  4. Humanitarian Open Street Map Team - Web Portal (accessed 2017/09/11) -