Trust and safety

Trust and Safety (T&S) refers to the organizational functions, teams, policies, and technologies that online platforms use to protect users from harmful content, abusive behavior, fraud, and security threats. The term originated in e-commerce contexts in the late 1990s, [1] where it described efforts to build trust between buyers and sellers in online marketplaces. [2] As social media platforms grew in the 2000s and 2010s, T&S expanded to address challenges related to user-generated content, including harassment, online child safety, hate speech, misinformation, and violent extremism. [3]

Trust and Safety work combines human review with automated detection systems to enforce platform policies. [4] The field has faced scrutiny over enforcement practices, labor conditions for moderators, [5] and questions about platform accountability, with regulatory frameworks increasingly mandating specific T&S requirements. [6] [7]

History

The concept of "Trust and Safety" (T&S) emerged as a critical function in the early days of e-commerce, driven by the inherent risks of transactions between physically separated and anonymous parties. [1] In a 1999 press release, the online auction site eBay used the term while introducing its "SafeHarbor trust and safety program," which included easy access to escrow services and customer support. [8] Initially, eBay built its security communication on a "community trust" model, encouraging users to regulate themselves. [2] "Trust" referred to trust among eBay users and between eBay users and eBay itself; "safety" referred to keeping platform users safe. [3] As internet platforms grew in scale and complexity, the scope of online harms increased markedly. Social media, app stores, and marketplaces faced distinct threats, including impersonation, the spread of malware, and sophisticated scams. The term soon spread throughout the tech industry, expanding from online marketplaces to social media, dating websites, and app stores. [9] Trust and Safety teams emerged as distinct entities across the tech sector, with increasing specialization in fields such as child protection, cyberbullying, and harassment prevention.

Regulatory response

While the term evolved within the private sector, judicial rulings began shaping a legislative framework that encouraged investment in online user protection teams. The landmark 1995 New York state court decision in Stratton Oakmont, Inc. v. Prodigy Services Co. held that the early online service provider Prodigy could be held liable as a "publisher" for defamatory content posted by a user because the service actively filtered and edited user posts. [10] The ruling created a disincentive for interactive computer services to engage in content moderation, as any attempt to regulate content could result in publisher liability for all user-generated material on their platforms. In response, Congress enacted Section 230 of the Communications Decency Act in 1996, which provided immunity for online platforms hosting third-party content. [11]

Professionalization

During the 2010s, Trust and Safety matured into a distinct professional discipline, with major technology firms investing heavily in expansive teams staffed by professionals from legal, policy, technical, and social science backgrounds. [1] In February 2018, Santa Clara University School of Law hosted the inaugural "Content Moderation & Removal at Scale" conference, where representatives from major technology companies, including Google, Facebook, Reddit, and Pinterest, publicly discussed their content moderation operations for the first time. [12] During this conference, a group of human rights organizations, advocates, and academic experts developed the Santa Clara Principles on Transparency and Accountability in Content Moderation, which outlined standards for meaningful transparency and due process in platform content moderation. [13]

The Trust & Safety Professional Association (TSPA) and Trust & Safety Foundation (TSF) were jointly launched in 2020 as a result of the Santa Clara conference. [14] [15] [16] TSPA was formed as a 501(c)(6) non-profit, membership-based organization supporting the global community of professionals working in T&S with resources, peer connections, and spaces for exchanging best practices. [14] TSF is a 501(c)(3) non-profit focused on research. [17] TSF co-hosted the inaugural Trust & Safety Research Conference, held at Stanford University in 2022. [18] [19]

Core functions

While organizational structures vary across companies, T&S teams typically focus on the following core functions:

Account integrity

Account integrity teams work to detect and prevent fraudulent accounts, fake profiles, account takeovers, and coordinated inauthentic behavior. [20] This function combines behavioral analysis, pattern recognition, and authentication systems to identify suspicious account activity, and focuses on ensuring that accounts represent real individuals or legitimate entities and operate within platform guidelines.
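
A minimal illustration of the velocity heuristics such systems may apply is sketched below in Python; the Signup structure, field names, and thresholds are hypothetical and chosen only for demonstration, and production systems combine many more signals (device fingerprints, email patterns, behavioral history).

    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class Signup:
        account_id: str
        ip_address: str
        timestamp: float  # seconds since epoch

    def flag_burst_signups(signups, window_seconds=3600, max_per_ip=5):
        """Flag accounts created from the same IP address in a short window.

        A crude velocity heuristic: flag every account in any one-hour span
        in which more than max_per_ip signups share an IP address.
        """
        by_ip = defaultdict(list)
        for s in sorted(signups, key=lambda s: s.timestamp):
            by_ip[s.ip_address].append(s)

        flagged = set()
        for ip, events in by_ip.items():
            start = 0
            for end in range(len(events)):
                # Shrink the window so it spans at most window_seconds.
                while events[end].timestamp - events[start].timestamp > window_seconds:
                    start += 1
                if end - start + 1 > max_per_ip:
                    flagged.update(e.account_id for e in events[start:end + 1])
        return flagged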

Fraud

Fraud prevention extends beyond fake accounts to include detection of financial scams, phishing attempts, and marketplace fraud, in order to protect users from financial harm and maintain platform trustworthiness for legitimate transactions. [21] This can include manual and automated detection systems that analyze transaction patterns, including velocity (frequency and volume), geographic anomalies, mismatched billing information, and connections to known fraudulent accounts or payment instruments. Machine learning is frequently used to assign risk levels to transactions based on multiple factors, enabling automated blocking of high-risk payments while allowing low-risk transactions to proceed smoothly. [22] [23] [24]
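
As a rough sketch of how such risk scoring and routing might look, the Python example below combines a few of the signals described above into a single score; the weights, thresholds, and transaction fields are illustrative assumptions, and real systems typically rely on trained machine learning models rather than hand-set rules.

    def score_transaction(txn, known_bad_instruments):
        """Assign an illustrative risk score (0.0-1.0) to a transaction dict.

        Expected keys (hypothetical schema): amount, billing_country,
        ip_country, txns_last_hour, payment_instrument_id.
        """
        score = 0.0
        if txn["txns_last_hour"] > 10:                   # velocity: unusual frequency
            score += 0.4
        if txn["billing_country"] != txn["ip_country"]:  # geographic mismatch
            score += 0.3
        if txn["payment_instrument_id"] in known_bad_instruments:
            score += 0.6                                 # linked to known fraud
        if txn["amount"] > 1000:                         # large amounts get extra scrutiny
            score += 0.2
        return min(score, 1.0)

    def route_transaction(score, block_at=0.8, review_at=0.5):
        """Block high-risk payments, queue medium risk for review, allow the rest."""
        if score >= block_at:
            return "block"
        if score >= review_at:
            return "manual_review"
        return "allow"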

Content moderation

Content moderation involves reviewing, classifying, and taking action on content (including user-generated content, advertisements, and company-generated content) that violates a platform's policies, such as explicit nudity, misinformation, graphic violence, and other non-compliant material. [25] [26] Platforms use a combination of automated systems and human reviewers to enforce content policies at scale, through review before content goes live, proactive review after content is live, and reactive review after content is live, typically triggered by user reports. [4] [27]
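
The division of labor between automated systems and human reviewers can be illustrated with the hypothetical routing logic below; the classifier score, thresholds, and queue names are assumptions made for the sketch rather than any specific platform's pipeline.

    def moderate(content_id, classifier_score, is_live, user_reports=0,
                 remove_threshold=0.95, review_threshold=0.70):
        """Return a (decision, queue) pair for a piece of content.

        classifier_score: probability of a policy violation from an automated model.
        is_live: False for pre-publication screening, True for content already posted.
        user_reports: number of reactive reports received after the content went live.
        """
        if classifier_score >= remove_threshold:
            # High-confidence violations can be actioned automatically.
            return ("remove", None)
        if classifier_score >= review_threshold:
            # Uncertain cases go to human reviewers; unreleased content is held back.
            queue = "post-publication-review" if is_live else "pre-publication-hold"
            return ("human_review", queue)
        if is_live and user_reports > 0:
            # Reactive review triggered by user reports rather than proactive detection.
            return ("human_review", "reported-content")
        return ("allow", None)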

Child safety

Child safety in Trust & Safety contexts typically refers to the prevention, detection, and reporting of online child sexual exploitation and abuse and represents a critical function in the field. [28] [29] [30] Platforms deploy specialized detection systems to identify child sexual abuse material (CSAM), including hash-matching technologies that detect known CSAM images and machine learning classifiers designed to discover previously unknown material. [31] Child safety also includes the detection and disruption of grooming and solicitation of minors by adults. [32]
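
Hash matching, at its simplest, compares a fingerprint of an uploaded file against a database of fingerprints of known material. The sketch below uses an exact SHA-256 digest for clarity; deployed systems such as PhotoDNA or PDQ instead use perceptual hashes that tolerate resizing and re-encoding, and the report_to_hotline step is a hypothetical stand-in for formal reporting channels.

    import hashlib

    def sha256_digest(image_bytes: bytes) -> str:
        return hashlib.sha256(image_bytes).hexdigest()

    def check_upload(image_bytes: bytes, known_hashes: set[str]) -> bool:
        """Return True if the upload matched a known hash and was escalated."""
        digest = sha256_digest(image_bytes)
        if digest in known_hashes:
            report_to_hotline(digest)  # hypothetical escalation/reporting step
            return True
        return False

    def report_to_hotline(digest: str) -> None:
        # Placeholder: real platforms file reports through dedicated channels
        # (for example, NCMEC's CyberTipline in the United States).
        print(f"escalating match for hash {digest}")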

Platform manipulation and coordinated behavior

Platform manipulation includes detecting bot networks, spam operations, and coordinated inauthentic behavior designed to artificially amplify content or manipulate public discourse. Detecting coordinated campaigns, including sock puppet attacks, requires understanding evolving adversarial tactics and identifying patterns across multiple accounts. [33] [34]
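
One simple coordination signal is many accounts posting identical text within a short window. The Python sketch below clusters accounts on that basis; the window length, account threshold, and input format are illustrative assumptions, and real investigations also weigh shared links, posting cadence, account age, and network structure.

    from collections import defaultdict

    def find_copy_paste_clusters(posts, window_seconds=300, min_accounts=5):
        """Group accounts that post identical text within a short time window.

        posts: iterable of (account_id, text, timestamp) tuples.
        Returns a list of account-id sets that may be acting in coordination.
        """
        by_text = defaultdict(list)
        for account_id, text, ts in posts:
            by_text[text.strip().lower()].append((ts, account_id))

        clusters = []
        for text, events in by_text.items():
            events.sort()
            start = 0
            for end in range(len(events)):
                while events[end][0] - events[start][0] > window_seconds:
                    start += 1
                accounts = {a for _, a in events[start:end + 1]}
                if len(accounts) >= min_accounts:
                    clusters.append(accounts)
                    break  # one cluster per identical text suffices for the sketch
        return clusters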

Regulatory compliance

Regulatory compliance has increasingly become a distinct T&S function as regulatory frameworks have expanded globally. This includes managing copyright takedown requests under laws like the Digital Millennium Copyright Act (DMCA), implementing requirements under the EU's Digital Services Act (DSA), and responding to law enforcement requests. [35] The organization and structure of this function differ by company, with some teams embedded within Trust & Safety and others housed in separate legal or compliance departments. [36]

Approaches to Trust and Safety

Policy development and enforcement

Platform policies are the rules and standards that govern user behavior and content on online platforms. Most platforms distinguish between public-facing documents such as community guidelines or terms of service that describe acceptable use, and internal enforcement materials, which provide moderators with detailed instructions for applying those policies. [37]

In the early stages of an online service, platform policies are often written by founders or early engineering and operations teams. [37] As platforms grow, responsibility for policy development typically shifts to dedicated Legal, Policy, or Trust & Safety departments. [38] Researchers have noted that these policies are shaped by a range of factors, [27] including the platform's stated purpose, regulatory requirements, and business priorities. [37] Policy development can also shift rapidly and involve continuous iteration in response to emerging circumstances. [4] Policy frameworks tend to evolve over time as companies develop new features and respond to user feedback, enforcement data, public controversies, and stakeholder pressure. [39] [40] [41] Approaches to policy-setting differ widely. On centralized commercial platforms, rules are typically written and enforced by internal staff, whereas some decentralized or community-based platforms distribute policymaking to volunteer moderators or user councils. [42]

Enforcement approaches vary across platforms. Some adopt rules-based systems with standardized responses to specific violations, while others implement context-dependent frameworks that consider user intent, cultural norms, and potential harm. [43] [44] Regardless of the model or approach, enforcement efforts generally have two goals: consistent application of rules and the ability to implement them at scale. [45]
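
A rules-based model can be as simple as a lookup from violation type and prior strike count to a standardized action, as in the hypothetical enforcement ladder below; the violation categories and penalties are invented for illustration and do not reflect any particular platform's policy.

    # Hypothetical enforcement ladder: violation type -> escalating actions.
    ENFORCEMENT_LADDER = {
        "spam":       ["warning", "7_day_suspension", "permanent_ban"],
        "harassment": ["content_removal", "30_day_suspension", "permanent_ban"],
        "csam":       ["permanent_ban"],  # zero tolerance, no ladder
    }

    def enforcement_action(violation_type: str, prior_strikes: int) -> str:
        ladder = ENFORCEMENT_LADDER[violation_type]
        # Repeat offenders move up the ladder; the last rung applies thereafter.
        return ladder[min(prior_strikes, len(ladder) - 1)]

Context-dependent frameworks replace this fixed mapping with case-by-case judgments about intent, severity, and likely harm, which is harder to apply consistently at scale.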

Human review and operations

Trust and Safety operations combine human reviewers with automated systems to evaluate content, accounts, and behavior against platform policies. [5] As digital platforms scaled globally and required broader language coverage, [46] Business Process Outsourcing (BPO) firms became instrumental, providing large teams of trained moderators usually based in regions such as Southeast Asia, Eastern Europe, and Latin America. This model of commercial content moderation is used by large companies such as Facebook, [47] TikTok, [48] and Google, as well as smaller platforms such as Pinterest, Snapchat, and Bluesky. Some platforms, like Discord and Reddit, rely on a mix of moderators employed by the platform and volunteer moderators. [49] The operating model differs by company, depending on the scale of moderation costs and the degree of brand risk. [50]

Studies on moderator labor conditions reveal significant psychological costs, [51] with reviewers experiencing trauma, burnout, and mental health impacts from sustained exposure to graphic violence, child abuse imagery, and other harmful content. [5] [52]

Automation and tooling

Automated detection systems enable platforms to identify potential policy violations at scales exceeding human capacity. [53] These technologies include hash-matching systems such as PhotoDNA, PDQ, and CSAI Match, which identify known illegal content, such as CSAM and terrorist and violent extremist material, through digital fingerprinting; [54] machine learning classifiers that analyze visual, textual, and behavioral patterns; [55] natural language processing tools for analyzing context and meaning; [56] and network analysis systems that detect coordinated behavior patterns. [57] Platforms integrate detection technologies with case management systems that route flagged content into review queues, assign priority levels, track enforcement decisions, and manage user appeals. [44]
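
How a hash-matching check and a case-management queue might fit together is sketched below; the Hamming-distance threshold, priority scheme, and ReviewQueue class are illustrative stand-ins rather than the actual interfaces of PDQ or any production review tool.

    import heapq

    def hamming_distance(a: int, b: int) -> int:
        """Number of differing bits between two perceptual hashes."""
        return (a ^ b).bit_count()

    def matches_known_hash(candidate: int, known_hashes: list[int],
                           max_distance: int = 31) -> bool:
        """Perceptual matching treats hashes within a small Hamming distance as the same image."""
        return any(hamming_distance(candidate, h) <= max_distance for h in known_hashes)

    class ReviewQueue:
        """Minimal priority queue standing in for a case-management system."""
        def __init__(self):
            self._heap = []
            self._counter = 0

        def enqueue(self, content_id: str, priority: int) -> None:
            # Lower numbers are reviewed first; the counter keeps ordering stable.
            heapq.heappush(self._heap, (priority, self._counter, content_id))
            self._counter += 1

        def next_case(self) -> str | None:
            return heapq.heappop(self._heap)[2] if self._heap else None

    # Example routing: confirmed hash matches outrank classifier-only flags.
    queue = ReviewQueue()
    queue.enqueue("post-123", priority=0)  # hash match against known illegal content
    queue.enqueue("post-456", priority=2)  # medium-confidence classifier flag
    print(queue.next_case())               # -> "post-123"

In practice such queues also encode reviewer specialization, language, service-level targets, and appeal status.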

Technical infrastructure also includes integration with external databases maintained by organizations including the National Center for Missing & Exploited Children (NCMEC) and intelligence sharing programs like Project Lantern of the Technology Coalition, facilitating information sharing across platforms and with dedicated nonprofit organizations tasked with investigating specific harms. [58] Internal enforcement guidelines are typically confidential, though leaked documents have occasionally provided public insight into implementation practices. [59] [60] [61]

References

  1. 1 2 3 "The Evolution of Trust & Safety". Trust & Safety Professional Association. Retrieved 2025-11-08.
  2. 1 2 Boyd, Josh (April 1, 2002). "In Community We Trust: Online Security Communication at eBay". Journal of Computer-Mediated Communication. 7 (3).
  3. 1 2 Cryst, Elena; Grossman, Shelby; Hancock, Jeff; Stamos, Alex; Thiel, David (2021). "Introducing the Journal of Online Trust and Safety". Journal of Online Trust and Safety. 1 (1). doi: 10.54501/jots.v1i1.8 . ISSN   2770-3142.
  4. 1 2 3 Gillespie, Tarleton (2018-06-26). Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press. ISBN   978-0-300-23502-9.
  5. 1 2 3 Roberts, Sarah T. (2019). Behind the Screen: Content Moderation in the Shadows of Social Media. Yale University Press. doi:10.2307/j.ctvhrcz0v. ISBN   978-0-300-23588-3. JSTOR   j.ctvhrcz0v.
  6. Langvardt, Kyle (2017–2018). "Regulating Online Content Moderation". Georgetown Law Journal. 106: 1353.
  7. "White Paper: Regulation, the Internet Way | Data-Smart City Solutions". datasmart.hks.harvard.edu. 2015-04-08. Retrieved 2025-11-09.
  8. "About eBay: Press Releases". pages.ebay.com. Archived from the original on 2000-08-15. Retrieved 2025-11-08.
  9. Keats Citron, Danielle; Waldman, Ari Ezra (August 23, 2025). "The Evolution of Trust and Safety". Emory Law Journal. Forthcoming. SSRN   5401604.
  10. "Stratton Oakmont, Inc. et al. v. Prodigy Services Company, et al - Internet Library of Law and Court Decisions". www.internetlibrary.com. Retrieved 2025-11-08.
  11. Dickinson, Gregory M. (2024). "Section 230: A Juridical History". Stanford Technology Law Review. 28: 1.
  12. "An Exercise in Moderation". Santa Clara University. Retrieved 2025-11-08.
  13. "Santa Clara Principles on Transparency and Accountability in Content Moderation". Santa Clara Principles. Retrieved 2025-11-08.
  14. Cai, Adelin; Tsao, Clara (2020-08-28). "The Trust & Safety Professional Association: Advancing The Trust And Safety Profession Through A Shared Community Of Practice". Techdirt. Retrieved 2025-11-09.
  15. "A Pre-History of the Trust & Safety Professional Association (TSPA)". Trust & Safety Professional Association. 2020-06-17. Retrieved 2025-11-09.
  16. "Databite No. 134: Origins of Trust and Safety with Alexander Macgillivray and Nicole Wong". Data & Society. Retrieved 2025-11-09.
  17. Menking, Amanda; Elswah, Mona; Grüning, David J.; Hansen, Lasse H.; Huang, Irene; Kamin, Julia; Normann, Catrine (2025-07-17), Bridging Boundaries: How to Foster Effective Research Collaborations Across Affiliations in the Field of Trust and Safety, arXiv: 2507.13008 , retrieved 2025-11-09
  18. "Trust and Safety Research Conference 2025". cyber.fsi.stanford.edu. Retrieved 2025-11-09.
  19. Hendrix, Justin (2022-09-25). "Trust and Safety Comes of Age? | TechPolicy.Press". Tech Policy Press. Retrieved 2025-11-09.
  20. Weedon, Jen; Nuland, William; Stamos, Alex (2017), Information operations and Facebook
  21. Castell, Michelle (April 2013), Mitigating Online Account Takeovers: The Case for Education (PDF), Retail Payments Risk Forum Survey Paper, Federal Reserve Bank of Atlanta, archived from the original (PDF) on 2021-09-25
  22. Bin Sulaiman, Rejwan; Schetinin, Vitaly; Sant, Paul (2022-06-01). "Review of Machine Learning Approach on Credit Card Fraud Detection". Human-Centric Intelligent Systems. 2 (1): 55–68. doi: 10.1007/s44230-022-00004-0 . ISSN   2667-1336.
  23. Ali, Abdulalem; Abd Razak, Shukor; Othman, Siti Hajar; Eisa, Taiseer Abdalla Elfadil; Al-Dhaqm, Arafat; Nasser, Maged; Elhassan, Tusneem; Elshafie, Hashim; Saif, Abdu (2022-09-26). "Financial Fraud Detection Based on Machine Learning: A Systematic Literature Review". Applied Sciences. 12 (19): 9637. doi: 10.3390/app12199637 . ISSN   2076-3417.
  24. Raghavan, Pradheepan; Gayar, Neamat El (December 2019). "Fraud Detection using Machine Learning and Deep Learning". 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE). pp. 334–339. doi:10.1109/ICCIKE47802.2019.9004231. ISBN   978-1-7281-3778-0.
  25. PricewaterhouseCoopers. "The quest for truth: Content moderation". PwC. Retrieved 2023-03-08.
  26. Cinelli, Matteo; Pelicon, Andraž; Mozetič, Igor; Quattrociocchi, Walter; Novak, Petra Kralj; Zollo, Fabiana (2021-11-11). "Dynamics of online hate and misinformation". Scientific Reports. 11 (1): 22083. Bibcode:2021NatSR..1122083C. doi:10.1038/s41598-021-01487-w. ISSN   2045-2322. PMC   8585974 . PMID   34764344.
  27. Klonick, Kate (2018-04-10). "The New Governors: The People, Rules, and Processes Governing Online Speech". Harvard Law Review. Retrieved 2025-11-09.
  28. "CyberTipline". National Center for Missing & Exploited Children. Archived from the original on 2025-08-05. Retrieved 2025-11-09.
  29. Jang, Yujin; Ko, Bomin (2023-08-19). "Online Safety for Children and Youth under the 4Cs Framework-A Focus on Digital Policies in Australia, Canada, and the UK". Children (Basel, Switzerland). 10 (8): 1415. doi: 10.3390/children10081415 . ISSN   2227-9067. PMC   10453252 . PMID   37628414.
  30. Thakur, Dhanaraj (2024-11-21). "Real Time Threats: Analysis of Trust and Safety Practices for Child Sexual Exploitation and Abuse (CSEA) Prevention on Livestreaming Platforms". Center for Democracy and Technology. Retrieved 2025-11-09.
  31. Sujay, Devangana; Kapoor, Vineet; Kumar Shandilya, Shishir (2024-12-27). "A Comprehensive Survey of Technological Approaches in the Detection of CSAM". Taylor & Francis: 30–43. doi:10.1201/9781003471103-3. ISBN   978-1-003-47110-3. Archived from the original on 2025-04-29.
  32. "Grooming in the Digital Age". National Center for Missing & Exploited Children. Archived from the original on 2025-09-24. Retrieved 2025-11-09.
  33. "Exposing Cross-Platform Coordinated Inauthentic Activity in the Run-Up to the 2024 U.S. Election". arxiv.org. Retrieved 2025-11-09.
  34. Cinelli, Matteo; Cresci, Stefano; Quattrociocchi, Walter; Tesconi, Maurizio; Zola, Paola (2025-03-19), "Coordinated inauthentic behavior and information spreading on Twitter", Decision Support Systems, 160 113819, arXiv: 2503.15720 , doi:10.1016/j.dss.2022.113819 , retrieved 2025-11-09
  35. Mir, Rory; Mackey, Aaron (2025-06-26). "How Cops Can Get Your Private Online Data". Electronic Frontier Foundation. Retrieved 2025-11-09.
  36. "Law Enforcement Response". Trust & Safety Professional Association. Retrieved 2025-11-09.
  37. 1 2 3 "Policy Development". Trust & Safety Professional Association. Retrieved 2025-11-11.
  38. "Key Functions and Roles". Trust & Safety Professional Association. Retrieved 2025-11-11.
  39. Suzor, Nicolas P.; West, Sarah Myers; Quodling, Andrew; York, Jillian (2019-03-27). "What Do We Mean When We Talk About Transparency? Toward Meaningful Transparency in Commercial Content Moderation". International Journal of Communication. 13: 18–18. ISSN   1932-8036.
  40. Gorwa, Robert (2024). The Politics of Platform Regulation: How Governments Shape Online Content Moderation. Oxford University Press. ISBN   978-0-19-769285-1.
  41. Newman, Lily Hay. "The Daily Stormer's Last Defender in Tech Just Dropped It". Wired. ISSN   1059-1028 . Retrieved 2025-11-11.
  42. Buckley, Nicole; Schafer, Joseph Scott (August 2, 2021). "Censorship-Free Platforms: Evaluating Content Moderation Policies and Practices of Alternative Social Media". for(e)dialogue.
  43. Edelson, Laura (2024), Hurwitz, Justin (Gus); Langvardt, Kyle (eds.), "Content Moderation in Practice", Media and Society After Technological Disruption, Cambridge: Cambridge University Press, pp. 150–160, ISBN   978-1-009-17441-1 , retrieved 2025-11-11
  44. François, Camille; Shen, Juliet; Roth, Yoel; Lai, Samantha; Povolny, Mariel (July 30, 2025). "Four Functional Quadrants for Trust & Safety Tools: Detection, Investigation, Review & Enforcement (DIRE)". Trust, Safety, and the Internet We Share: Multistakeholder Insights, Edited Volume, Taylor & Francis (Forthcoming). SSRN   5369158.
  45. Schaffner, Brennan; Bhagoji, Arjun Nitin; Cheng, Siyuan; Mei, Jacqueline; Shen, Jay L; Wang, Grace; Chetty, Marshini; Feamster, Nick; Lakier, Genevieve; Tan, Chenhao (2024-05-11). ""Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies". Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. CHI '24. New York, NY, USA: Association for Computing Machinery: 1–16. doi:10.1145/3613904.3642333. ISBN   979-8-4007-0330-0.
  46. Mukherjee, Sujata; Eissfeldt, Jan (2023-09-25). "Evaluating the Forces Shaping the Trust & Safety Industry | TechPolicy.Press". Tech Policy Press. Retrieved 2025-11-09.
  47. "The Silent Partner Cleaning Up Facebook for $500 Million a Year (Published 2021)". 2021-08-31. Retrieved 2025-11-09.
  48. Shead, Sam (2020-11-12). "TikTok is luring Facebook moderators to fill new trust and safety hubs". CNBC. Retrieved 2025-11-09.
  49. Seering, Joseph; Dym, Brianna; Kaufman, Geoff; Bernstein, Michael (2022-02-28). "Pride and Professionalization in Volunteer Moderation: Lessons for Effective Platform-User Collaboration". Journal of Online Trust and Safety. 1 (2). doi: 10.54501/jots.v1i2.34 . ISSN   2770-3142.
  50. Madio, Leonardo; Quinn, Martin (2025). "Content moderation and advertising in social media platforms". Journal of Economics & Management Strategy. 34 (2): 342–369. doi:10.1111/jems.12602. hdl: 11577/3516499 . ISSN   1530-9134.
  51. Pinchevski, Amit (2023-01-01). "Social media's canaries: content moderators between digital labor and mediated trauma". Media, Culture & Society. 45 (1): 212–221. doi:10.1177/01634437221122226. ISSN   0163-4437.
  52. Spence, Ruth; Bifulco, Antonia; Bradbury, Paula; Martellozzo, Elena; DeMarco, Jeffrey (2023-09-18). "The psychological impacts of content moderation on content moderators: A qualitative study". Cyberpsychology: Journal of Psychosocial Research on Cyberspace. 17 (4). doi: 10.5817/CP2023-4-8 . ISSN   1802-7962.
  53. Gorwa, Robert; Binns, Reuben; Katzenbach, Christian (2020-01-01). "Algorithmic content moderation: Technical and political challenges in the automation of platform governance". Big Data & Society. 7 (1) 2053951719897945. doi: 10.1177/2053951719897945 . ISSN   2053-9517.
  54. Teunissen, Coen; Napier, Sarah (July 2022). "Child sexual abuse material and end-to-end encryption on social media platforms: An overview". Trends and Issues in Crime and Criminal Justice (653): 1–19.
  55. Chen, Thomas M. (2021-10-10). "Automated Content Classification in Social Media Platforms". Taylor & Francis: 53–71. doi:10.1201/9781003134527-6. ISBN   978-1-003-13452-7. Archived from the original on 2025-05-06.
  56. Khan, Zeeshan (2025-02-28). "Natural Language Processing Techniques for Automated Content Moderation". International Journal of Web of Multidisciplinary Studies. 2 (2): 21–27. ISSN   3049-2424.
  57. Haythornthwaite, Caroline (2023-07-01). "Moderation, Networks, and Anti-Social Behavior Online". Social Media + Society. 9 (3) 20563051231196874. doi: 10.1177/20563051231196874 . ISSN   2056-3051.
  58. "What is Lantern?". inhope.org. Retrieved 2025-11-09.
  59. "Inside Facebook: Die geheimen Lösch-Regeln von Facebook". Süddeutsche.de (in German). 2016-12-16. Retrieved 2025-11-11.
  60. Köver, Chris; Reuter, Markus (2019-12-02). "Discrimination: TikTok curbed reach for people with disabilities". netzpolitik.org (in German). Retrieved 2025-11-11.
  61. "Inside Facebook's Secret Rulebook for Global Political Speech (Published 2018)". 2018-12-27. Retrieved 2025-11-11.