Data Mining and Modern Advertising

Some of the largest companies of the digital age are built on free services. The Google search engine and social media sites like Facebook and YouTube came to dominate the internet in the era known as Web 2.0 [10]. The profits of these companies rest on an economy of human attention and advertising. Such platforms are incentivized to retain users’ attention, and the most successful companies can achieve this through subtle psychological manipulation. While the systems developed to support this attention economy are impressive, the software professionals working on the systems that enable this manipulation of users are no longer complying with the ACM/IEEE-CS Software Engineering Code of Ethics. These systems cannot exist without violating Principle 1: “Software engineers shall act consistently with the public interest” [5]. Although these systems are currently used to lengthen session times and hold the end-user’s attention, research is also being conducted on applying them for the good of the general public, not only for company profits.

Data Mining

The term “Data Mining” as it is known today was introduced in the 1990s [13]. The concept became far more prevalent in the era of “Web 2.0”. Coined in the early 2000s to describe a turning point for the internet after the dot-com bubble [10], Web 2.0 refers to the period in which the internet became increasingly filled with user-generated content. This growth in user-generated content also gave internet-based companies enormous amounts of data. In 2018, Forbes estimated that 2.5 quintillion bytes of data were being produced every day [9], and YouTube reports that more than 500 hours of video are uploaded to its platform every minute [18].
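For a sense of scale, the two figures above can be converted into more familiar units. This is plain arithmetic on the cited numbers, assuming decimal prefixes (1 EB = 10^18 bytes, 1 TB = 10^12 bytes), a 365-day year, and exactly 500 hours per minute (the reported figure is a lower bound):

```latex
\begin{aligned}
2.5 \times 10^{18}\ \text{bytes/day} &= 2.5\ \text{EB/day} \\
\frac{2.5 \times 10^{18}\ \text{bytes/day}}{86\,400\ \text{s/day}} &\approx 2.9 \times 10^{13}\ \text{bytes/s} \approx 29\ \text{TB/s} \\
500\ \tfrac{\text{h}}{\text{min}} \times 60\ \tfrac{\text{min}}{\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}} &= 720\,000\ \text{h of video per day} \\
\frac{720\,000\ \text{h}}{24\ \text{h/day} \times 365\ \text{days/yr}} &\approx 82\ \text{years of video uploaded per day}
\end{aligned}
```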

What is Data Science?

Extraction alone is not enough to leverage this much data [13]; to take full advantage of it, the data must also be summarized and interpreted. The practice of building models to support this analysis falls under the field of “Data Science”. The process of extraction, summarization, and interpretation relies largely on two kinds of model: the “predictive model” and the “descriptive model” [2]. A predictive model is used to predict the value of a target variable from the other data in a dataset, while a descriptive model is used to gain a “better understanding of the data, without any single specific target variable” [2]. Simply put, the goal of data science, in Jeannette Wing’s words, is to “extract value from data” [17].
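To make the distinction concrete, the sketch below contrasts the two kinds of model on a handful of made-up user records. It is an illustration under stated assumptions, not a method from the cited sources: a nearest-centroid classifier stands in for a predictive model (its target variable, a hypothetical `clicked_ad` flag, is stored as each row’s label), and k-means clustering (Lloyd’s algorithm) stands in for a descriptive model that groups the same rows without using any target. The features `minutes_on_site` and `ads_seen` and all data values are hypothetical.

```python
"""Toy contrast between a predictive and a descriptive model.

All data is made up. Each row describes a hypothetical user by
(minutes_on_site, ads_seen).
"""
from math import dist
from statistics import mean

# Hypothetical labelled rows: the label is the target variable
# "clicked_ad" (1 = clicked an ad, 0 = did not).
labelled = [
    ((5.0, 2.0), 0), ((7.0, 3.0), 0), ((6.0, 1.0), 0),
    ((40.0, 20.0), 1), ((35.0, 18.0), 1), ((45.0, 25.0), 1),
]


def centroid(points):
    """Component-wise mean of a list of 2-D points."""
    return tuple(mean(axis) for axis in zip(*points))


def fit_nearest_centroid(rows):
    """Predictive model: learn one centroid per value of the target variable."""
    groups = {}
    for features, label in rows:
        groups.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in groups.items()}


def predict(model, features):
    """Predict the target for a new row: the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


def kmeans(points, k, iterations=10):
    """Descriptive model: group unlabelled points into k clusters (Lloyd's algorithm)."""
    centers = list(points[:k])  # naive initialisation: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


if __name__ == "__main__":
    model = fit_nearest_centroid(labelled)
    print(predict(model, (38.0, 19.0)))  # a new, unlabelled user
    centers, _ = kmeans([features for features, _ in labelled], k=2)
    print(sorted(centers))
```

Running it prints `1` (the new user is predicted to click an ad) and `[(6.0, 2.0), (40.0, 21.0)]`. The predictive model needed the labels to answer a specific question; the descriptive model recovered the same two groups from the features alone, without being told what to look for.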

Data Mining in Social Media

Throughout the modern internet, one can find applications of the adage “if you are not paying for the product, you are the product.” For a large tech company like Facebook, whose service is free to the average user, it is reasonable to ask how a company of that size can be profitable. Facebook has the technology in place to leverage its large user base for the benefit of the businesses that use its Pixel system [4].

The Facebook Pixel

The Facebook Pixel is a system that allows a company to track users’ interactions on a website, measure conversions (sales) from advertisements placed on one of Facebook’s platforms, and retarget users who have shown interest in a specific product or a related product [4]. The Pixel relies on placing “cookies” in the end-user’s browser [4]. Cookie-based tracking systems like this have recently come under fire because they give large tech platforms the ability to perform both deterministic and predictive analysis on the end-user. Deterministic analysis allows these systems to determine a user’s demographics (e.g., age or race) as well as their interests [11]. The results are then made available to businesses so they can see who makes up their user base. Once this analysis is complete, these systems can target end-users directly to “create more relevant advertisements” [4].
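The mechanism described above can be illustrated with a small, self-contained model. The sketch is hypothetical: the `ToyTracker` class, the `toy_tracker_id` cookie name, the 90-day lifetime, the `view`/`purchase` event names, and the last-click attribution rule are all assumptions made for illustration, not the Facebook Pixel’s actual implementation or API. It covers the three functions the paragraph lists: recognizing a returning browser by its cookie, counting conversions from ads, and building a retargeting audience of users who showed interest but did not buy.

```python
"""Toy model of cookie-based conversion tracking and retargeting.

Hypothetical sketch: the class, cookie name, lifetime, event names and
last-click attribution rule are invented for illustration and are not
how the Facebook Pixel is implemented.
"""
import uuid
from collections import defaultdict
from http.cookies import SimpleCookie

COOKIE_NAME = "toy_tracker_id"  # hypothetical cookie name
COOKIE_MAX_AGE = 90 * 24 * 60 * 60  # assumed lifetime: 90 days, in seconds


class ToyTracker:
    def __init__(self):
        # visitor id -> ordered list of (event, product, ad_id) tuples
        self.events = defaultdict(list)

    def identify(self, cookie_header):
        """Return (visitor_id, set_cookie_header) for an incoming request.

        A returning browser sends its cookie back and is recognised; a new
        browser gets a random id plus a Set-Cookie header to store it.
        """
        incoming = SimpleCookie()
        incoming.load(cookie_header or "")
        if COOKIE_NAME in incoming:
            return incoming[COOKIE_NAME].value, None
        visitor_id = uuid.uuid4().hex
        outgoing = SimpleCookie()
        outgoing[COOKIE_NAME] = visitor_id
        outgoing[COOKIE_NAME]["max-age"] = COOKIE_MAX_AGE
        return visitor_id, outgoing.output(header="Set-Cookie:")

    def track(self, visitor_id, event, product, ad_id=None):
        """Record an interaction, tagged with the ad that led to it, if any."""
        self.events[visitor_id].append((event, product, ad_id))

    def conversions_by_ad(self):
        """Credit each purchase to the most recent ad the visitor came from."""
        counts = defaultdict(int)
        for history in self.events.values():
            last_ad = None
            for event, _product, ad_id in history:
                if ad_id is not None:
                    last_ad = ad_id
                if event == "purchase" and last_ad is not None:
                    counts[last_ad] += 1
        return dict(counts)

    def retargeting_audience(self, product):
        """Visitors who viewed a product but have not bought it."""
        audience = set()
        for visitor, history in self.events.items():
            actions = {(event, item) for event, item, _ad in history}
            if ("view", product) in actions and ("purchase", product) not in actions:
                audience.add(visitor)
        return audience
```

In this model, a site would call `identify` on each request with the incoming `Cookie` header, send back any returned `Set-Cookie` header, and call `track` as the visitor browses. Because the same identifier comes back on every visit, interactions that happen days apart can be linked to one browser, which is what makes both the conversion count and the retargeting list possible.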

Algorithmically Building an Echo Chamber

These systems are highly beneficial to companies and advertisers, and social media platforms are all competing for their users’ attention at the same time. Companies like YouTube, Facebook, Reddit, and Netflix each want to maximize how long users stay engaged. One of the easiest ways to increase engagement time is to provide a continuous feed of content tailored to each user [11]. This feed is built from the predictive analysis described above, for the express purpose of retaining the user’s interest. Unless a user specifically searches for information with differing viewpoints, this process slowly builds an “echo chamber” in which the user’s beliefs are confirmed rather than challenged [15]. Digital echo chambers may be contributing to the growing polarization between political ideologies [12].
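The feedback loop described above can be sketched in a few lines. This is a deliberately simplified, hypothetical model rather than any platform’s recommender: each item carries a made-up viewpoint score, `predicted_engagement` is an assumed stand-in for the platform’s predictive analysis, and the simulated user always clicks the first item in the feed.

```python
"""Toy feedback loop between an engagement-ranked feed and a user's clicks.

Hypothetical model: every item has a made-up "viewpoint" score from -10 to 10,
and predicted engagement is assumed to fall as an item's score moves away
from the user's profile (the mean score of the items they have clicked).
"""
from statistics import mean

CATALOG = list(range(-10, 11))  # viewpoint scores of the available items


def predicted_engagement(viewpoint, profile):
    """Assumed model: items closer to the profile are predicted to be more engaging."""
    return 1 / (1 + abs(viewpoint - profile))


def build_feed(clicks, k=5):
    """Rank the catalog by predicted engagement (ties broken by score) and keep the top k."""
    profile = mean(clicks)
    ranked = sorted(CATALOG, key=lambda v: (-predicted_engagement(v, profile), v))
    return ranked[:k]


def simulate(first_click, rounds=10):
    """Each round, the user clicks the top item, and that click updates the profile."""
    clicks = [first_click]
    feeds = []
    for _ in range(rounds):
        feed = build_feed(clicks)
        feeds.append(feed)
        clicks.append(feed[0])  # engagement feeds straight back into the profile
    return feeds


if __name__ == "__main__":
    for start in (3, -6):
        feeds = simulate(start)
        shown = sorted({item for feed in feeds for item in feed})
        print(start, shown)
```

Running it prints `3 [1, 2, 3, 4, 5]` and `-6 [-8, -7, -6, -5, -4]`: over ten rounds each user sees only the five viewpoints nearest their starting point, and because every click comes from the feed, the profile never moves. In this model, only a click on something outside the feed (the user deliberately searching for a differing viewpoint) can shift what is shown.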

Data Mining Aided Suicide Prevention

Data mining can also serve altruistic ends, and Jeannette Wing of Columbia University has begun promoting the idea of “Data for Good” [17]. The concept rests on two ideas: first, that data science should be used to improve people’s lives, and second, that data should be used in a “good” manner [17]. One example of data used in a “good” manner is the proof of concept conducted by Seah and Shim, which shows that data science models can be applied to individual users of a social media site to detect language associated with suicide and self-harm [14]. Since suicide is the second leading cause of death among people aged 10 to 34 [1], Seah and Shim’s work demonstrates a valid use case for predictive models that flag users who may be at risk of self-harm [14].

Companies competing in the market for human attention are obligated, above all, to produce products that are good for the company and its shareholders. Because these companies build systems that compete for the same pool of attention, the designers of these systems are gradually creating technology with the potential to foster addictive behavior. As the Center for Humane Technology states on its website, “As long as social media companies profit from outrage, confusion, addiction, and depression, our well-being and democracy will continue to be at risk” [7]. The idea of addictive technology has become so widely discussed that even Twitter CEO Jack Dorsey acknowledged it when questioned by the U.S. Senate Judiciary Committee [6]:

I do think, like anything else, these tools can be addictive, and we should be aware of that, acknowledge it, and make sure that we are making our customers aware of better patterns of usage. The more information the better here.

References

1. Biernesser, C., Sewall, C. J. R., Brent, D., Bear, T., Mair, C., & Trauth, J. (2020). Social media use and deliberate self-harm among youth: A systematized narrative review. Children and Youth Services Review, 116.
2. Boonjing, V., & Pimchangthong, D. (2017). Data Mining for Customers’ Positive Reaction to Advertising in Social Media. Proceedings of the 2017 Federated Conference on Computer Science and Information Systems.
3. Curtin, S. C. (2020, September 11). National Vital Statistics Reports.
4. Facebook. (2020, February). Business Help Center. Facebook Business Help Center.
5. Gotterbarn, D., Miller, K., & Rogerson, S. (2018, December 19). Software Engineering Code. ACM Ethics.
6. Hartmans, A. (2020, November 17). Jack Dorsey says social media platforms like Twitter and Facebook can be addictive - Mark Zuckerberg says the research is 'inconclusive'. Business Insider.
7. Join the movement for Humane Technology. Center for Humane Technology. (2020).
8. Koch, R. (Ed.). (2019, May 9). Cookies, the GDPR, and the ePrivacy Directive.
9. Marr, B. (2018, May 21). How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read. Forbes.
10. O'Reilly, T. (2005, November 30). What Is Web 2.0. O'Reilly.
11. Rhodes, L. (Producer), & Orlowski, J. (Director). (2020). The Social Dilemma [Video file]. Retrieved from
12. Pew Research Center. (2017, October 5). The Partisan Divide on Political Values Grows Even Wider. Pew Research Center.
13. Ramzan, M., & Ahmad, M. (2014). Evolution of data mining: An overview. 2014 Conference on IT in Business, Industry and Government (CSIBIG).
14. Seah, J. H. K., & Shim, K. J. (2018). Data Mining Approach to the Detection of Suicide in Social Media: A Case Study of Singapore. 2018 IEEE International Conference on Big Data.
15. Seneca, C. (2020, September 17). How to Break Out of Your Social Media Echo Chamber. Wired.
16. Wing, J. M. (2018a). Data for Good: Abstract. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
17. Wing, J. M. (2018b, August 19). Data for Good: Keynote Address. KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Retrieved November 15, 2020, from
18. YouTube. (2020, September 24). YouTube for Press.