Trust and Safety (T&S) refers to the organizational functions, teams, policies, and technologies that online platforms use to protect users from harmful content, abusive behavior, fraud, and security threats. The term originated in e-commerce contexts in the late 1990s, [1] where it described efforts to build trust between buyers and sellers in online marketplaces. [2] As social media platforms grew in the 2000s and 2010s, T&S expanded to address challenges related to user-generated content, including harassment, online child safety, hate speech, misinformation, and violent extremism. [3]
Trust and Safety work combines human review with automated detection systems to enforce platform policies. [4] The field has faced scrutiny over enforcement practices, labor conditions for moderators, [5] and questions about platform accountability, with regulatory frameworks increasingly mandating specific T&S requirements. [6] [7]
The concept of "Trust and Safety" (T&S) emerged as a critical function in the early days of e-commerce, driven by the inherent risks of transactions between physically separated and anonymous parties. [1] In a 1999 press release, the online auction site eBay used the term while introducing its "SafeHarbor trust and safety program," which included easy access to escrow services and customer support. [8] Initially, eBay's strategy was built on a "community trust" model as its primary security communication, encouraging users to regulate themselves. [2] "Trust" referred to trust among eBay users and between eBay users and eBay itself; "safety" referred to keeping platform users safe. [3] As internet platforms grew in scale and complexity, the scope of online harms increased markedly. Social media, app stores, and marketplaces faced distinct threats, including impersonation, the spread of malware, and sophisticated scams. The term soon spread throughout the tech industry, expanding from online marketplaces to social media, dating websites, and app stores. [9] Trust and Safety teams emerged as distinct entities across the tech sector, with increasing specialization in fields such as child protection, cyberbullying, and harassment prevention.
While the term evolved within the private sector, judicial rulings began shaping a legislative framework that encouraged investment in online user protection teams. The landmark New York state court decision in Stratton Oakmont, Inc. v. Prodigy Services Co. in 1995 ruled that the early online service provider Prodigy could be liable as a "publisher" for defamatory content posted by a user because the service actively filtered and edited user posts. [10] The ruling created a disincentive for interactive computer services to engage in content moderation, as any attempt to regulate content could result in publisher liability for all user-generated material on their platforms. In response, Congress enacted Section 230 of the Communications Decency Act in 1996, which provided immunity for online platforms hosting third-party content. [11]
During the 2010s, the Trust & Safety field matured as a distinct professional discipline, with major technology firms investing heavily in expansive teams staffed by professionals from legal, policy, technical, and social science backgrounds. [1] In February 2018, Santa Clara University School of Law hosted the inaugural "Content Moderation & Removal at Scale" conference, where representatives from major technology companies, including Google, Facebook, Reddit, and Pinterest, publicly discussed their content moderation operations for the first time. [12] During this conference, a group of human rights organizations, advocates, and academic experts developed the Santa Clara Principles on Transparency and Accountability in Content Moderation, which outlined standards for meaningful transparency and due process in platform content moderation. [13]
The Trust & Safety Professional Association (TSPA) and Trust & Safety Foundation (TSF) were jointly launched in 2020 as a result of the Santa Clara conference. [14] [15] [16] TSPA was formed as a 501(c)(6) non-profit, membership-based organization supporting the global community of professionals working in T&S with resources, peer connections, and spaces for exchanging best practices. [14] TSF is a 501(c)(3) organization that focuses on research. [17] TSF co-hosted the inaugural Trust & Safety Research Conference, held at Stanford University. [18] [19]
While organizational structures vary across companies, T&S teams typically focus on the following core functions:
Account integrity teams work to detect and prevent fraudulent accounts, fake profiles, account takeovers, and coordinated inauthentic behavior. [20] This function combines behavioral analysis, pattern recognition, and authentication systems to identify suspicious account activity, and focuses on ensuring that accounts represent real individuals or legitimate entities and operate within platform guidelines.
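A minimal sketch of one such behavioral-analysis signal is shown below: flagging bursts of account registrations that share a common attribute (here, an IP address) within a short time window. The record format, window size, and threshold are illustrative assumptions; production systems combine many such signals.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical signup record: (account_id, ip_address, created_at).
Signup = tuple[str, str, datetime]

def flag_signup_bursts(signups: list[Signup],
                       window: timedelta = timedelta(minutes=10),
                       threshold: int = 5) -> set[str]:
    """Flag accounts created while many registrations shared one IP in a short window."""
    by_ip: dict[str, list[Signup]] = defaultdict(list)
    for record in signups:
        by_ip[record[1]].append(record)

    flagged: set[str] = set()
    for records in by_ip.values():
        records.sort(key=lambda r: r[2])
        start = 0
        for end in range(len(records)):
            # Advance the left edge until the window spans at most `window`.
            while records[end][2] - records[start][2] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.update(r[0] for r in records[start:end + 1])
    return flagged
```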
Fraud prevention extends beyond fake accounts to include the detection of financial scams, phishing attempts, and marketplace fraud, in order to protect users from financial harm and maintain platform trustworthiness for legitimate transactions. [21] This can include manual and automated detection systems that analyze transaction patterns, including velocity (frequency and volume), geographic anomalies, mismatched billing information, and connections to known fraudulent accounts or payment instruments. Machine learning is frequently used to assign risk levels to transactions based on multiple factors, enabling automated blocking of high-risk payments while allowing low-risk transactions to proceed smoothly. [22] [23] [24]
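A minimal sketch of this kind of transaction risk scoring is shown below. The signals mirror those described above (velocity, geographic and billing mismatches, links to known-bad payment instruments), but the weights, thresholds, and denylist are illustrative placeholders; in practice the score would typically come from a trained machine learning model rather than hand-set rules.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    txns_last_hour: int       # velocity: recent transaction count for this account
    ip_country: str
    billing_country: str
    card_fingerprint: str

KNOWN_BAD_CARDS = {"fp_9f3a"}      # hypothetical denylist of payment instruments

def risk_score(txn: Transaction) -> float:
    score = 0.0
    if txn.txns_last_hour > 10:
        score += 0.4               # unusual velocity
    if txn.ip_country != txn.billing_country:
        score += 0.3               # geographic / billing mismatch
    if txn.card_fingerprint in KNOWN_BAD_CARDS:
        score += 0.6               # linked to known fraud
    if txn.amount > 1000:
        score += 0.2               # high-value orders carry more risk
    return min(score, 1.0)

def decide(txn: Transaction) -> str:
    score = risk_score(txn)
    if score >= 0.8:
        return "block"             # automated blocking of high-risk payments
    if score >= 0.4:
        return "manual_review"     # route to a human analyst
    return "allow"                 # low-risk transactions proceed
```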
Content moderation involves reviewing, classifying, and taking action on content (including user-generated content, advertisements, and company-generated content) that violates a platform's policies, such as explicit nudity, misinformation, graphic violence, and other non-compliant material. [25] [26] Platforms use a combination of automated systems and human reviewers to enforce content policies at scale, through pre-publication screening before content goes live, proactive review of content after it is live, and reactive review in response to reports after it is live. [4] [27]
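The routing logic can be sketched as below, where an (assumed) automated classifier's label and confidence decide whether content is published, blocked before going live, or published and queued for human review. The labels and thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClassifierResult:
    label: str         # e.g. "graphic_violence", "nudity", or "none"
    confidence: float  # model confidence between 0.0 and 1.0

def route_new_post(result: ClassifierResult) -> str:
    """Decide what happens to a post given an automated classifier's output."""
    if result.label == "none":
        return "publish"                  # nothing detected: content goes live
    if result.confidence >= 0.95:
        return "block_pre_publication"    # high-confidence violation caught before going live
    return "publish_and_queue"            # borderline: goes live but is routed to human review

# Example: a borderline score is published but queued for a reviewer.
print(route_new_post(ClassifierResult(label="graphic_violence", confidence=0.62)))
```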
Child safety in Trust & Safety contexts typically refers to the prevention, detection, and reporting of online child sexual exploitation and abuse and represents a critical function in the field. [28] [29] [30] Platforms deploy specialized detection systems to identify child sexual abuse material (CSAM), including hash-matching technologies that detect known CSAM images and machine learning classifiers designed to discover previously unknown material. [31] Child safety also includes the detection and disruption of grooming and solicitation of minors by adults. [32]
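The hash-matching step can be sketched as below: an uploaded image's perceptual hash is compared against a list of hashes of known material, with near matches (small Hamming distance) treated as hits because perceptual hashes change only slightly under resizing or re-encoding. The hash list, threshold, and function names are placeholders; proprietary systems such as PhotoDNA are not reproduced here.

```python
# Hashes of known material, e.g. supplied through industry hash-sharing programs (placeholder).
KNOWN_HASHES: set[int] = set()

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fixed-length perceptual hashes."""
    return bin(a ^ b).count("1")

def matches_known_material(upload_hash: int, max_distance: int = 8) -> bool:
    # Perceptual hashes change only slightly when an image is resized or re-encoded,
    # so uploads within a small Hamming distance of a known hash are treated as matches.
    return any(hamming_distance(upload_hash, known) <= max_distance
               for known in KNOWN_HASHES)
```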
Platform manipulation includes detecting bot networks, spam operations, and coordinated inauthentic behavior designed to artificially amplify content or manipulate public discourse. Detecting coordinated campaigns, including sock puppet attacks, requires understanding evolving adversarial tactics and identifying patterns across multiple accounts. [33] [34]
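One simple coordination signal can be sketched as below: grouping accounts that post identical text within a tight time window. Real systems combine many such weak signals and must adapt as adversarial tactics evolve; the window and account threshold here are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical post record: (account_id, normalized_text, posted_at).
Post = tuple[str, str, datetime]

def coordinated_groups(posts: list[Post],
                       window: timedelta = timedelta(minutes=5),
                       min_accounts: int = 10) -> list[set[str]]:
    """Group accounts that posted the same text within a short time window."""
    by_text: dict[str, list[Post]] = defaultdict(list)
    for post in posts:
        by_text[post[1]].append(post)

    groups: list[set[str]] = []
    for same_text_posts in by_text.values():
        same_text_posts.sort(key=lambda p: p[2])
        first, last = same_text_posts[0][2], same_text_posts[-1][2]
        accounts = {p[0] for p in same_text_posts}
        if last - first <= window and len(accounts) >= min_accounts:
            groups.append(accounts)   # many distinct accounts, same text, tight timing
    return groups
```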
Regulatory compliance has increasingly become a distinct T&S function as regulatory frameworks have expanded globally. This includes managing copyright takedown requests under laws like the Digital Millennium Copyright Act (DMCA), implementing requirements under the EU's Digital Services Act (DSA), and responding to law enforcement requests. [35] The organization and structure of this function differ by company, with some teams embedded in Trust & Safety and others kept separate in legal or compliance departments. [36]
Platform policies are the rules and standards that govern user behavior and content on online platforms. Most platforms distinguish between public-facing documents such as community guidelines or terms of service that describe acceptable use, and internal enforcement materials, which provide moderators with detailed instructions for applying those policies. [37]
In the early stages of an online service, platform policies are often written by founders or early engineering and operations teams. [37] As platforms grow, responsibility for policy development typically shifts to dedicated Legal, Policy, or Trust & Safety departments. [38] Researchers have noted that these policies are shaped by a range of factors, [27] including the platform's stated purpose, regulatory requirements, and business priorities. [37] Policy development can also shift rapidly and involve continuous iteration in response to emerging circumstances. [4] Policy frameworks tend to evolve over time as companies develop new features and respond to user feedback, enforcement data, public controversies, and stakeholder pressure. [39] [40] [41] Approaches to policy-setting differ widely. On centralized commercial platforms, rules are typically written and enforced by internal staff, whereas some decentralized or community-based platforms distribute policymaking to volunteer moderators or user councils. [42]
Enforcement approaches vary across platforms. Some adopt rules-based systems with standardized responses to specific violations, while others implement context-dependent frameworks that consider user intent, cultural norms, and potential harm. [43] [44] Regardless of the model or approach, enforcement efforts generally have two goals: consistent application of rules and the ability to implement them at scale. [45]
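A rules-based approach can be sketched as a fixed mapping from violation type and prior strike count to a standardized action, as below; the categories, ladders, and escalation steps are illustrative assumptions, not any particular platform's policy.

```python
# Illustrative rules-based enforcement ladders: violation type and prior strikes -> action.
ENFORCEMENT_LADDERS = {
    "spam":       ["warning", "temporary_suspension", "permanent_ban"],
    "harassment": ["content_removal", "temporary_suspension", "permanent_ban"],
    "csam":       ["permanent_ban"],   # some violations escalate immediately
}

def enforcement_action(violation: str, prior_strikes: int) -> str:
    ladder = ENFORCEMENT_LADDERS[violation]
    # Repeat offenders move up the ladder; the final rung applies to all further violations.
    return ladder[min(prior_strikes, len(ladder) - 1)]

print(enforcement_action("harassment", 1))   # -> temporary_suspension
```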
Trust and Safety operations combine human reviewers with automated systems to evaluate content, accounts, and behavior against platform policies. [5] As digital platforms scaled globally and required support for more languages, [46] Business Process Outsourcing (BPO) firms became instrumental, providing large teams of trained moderators usually based in regions such as Southeast Asia, Eastern Europe, and Latin America. This model of commercial content moderation is used by large companies such as Facebook, [47] TikTok, [48] and Google, as well as smaller platforms such as Pinterest, Snapchat, and Bluesky. Some platforms, like Discord and Reddit, rely on a mix of moderators employed by the platform and volunteer moderators. [49] The operating model differs by company, depending on factors such as moderation costs and the potential impact of brand risk. [50]
Studies on moderator labor conditions reveal significant psychological costs, [51] with reviewers experiencing trauma, burnout, and mental health impacts from sustained exposure to graphic violence, child abuse imagery, and other harmful content. [5] [52]
Automated detection systems enable platforms to identify potential policy violations at scales exceeding human capacity. [53] These technologies include hash-matching systems such as PhotoDNA, PDQ, and CSAI Match, which identify known illegal content, such as CSAM and terrorist and violent extremist material, through digital fingerprinting; [54] machine learning classifiers that analyze visual, textual, and behavioral patterns; [55] natural language processing tools for analyzing context and meaning; [56] and network analysis systems that detect coordinated behavior patterns. [57] Platforms integrate detection technologies with case management systems that route flagged content into review queues, assign priority levels, track enforcement decisions, and manage user appeals. [44]
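The case-management routing step can be sketched as a priority queue in which flagged items are ordered by an assigned severity so that the most serious cases reach reviewers first. The severity ranking and record fields below are illustrative.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Lower number = reviewed sooner. The ranking is illustrative.
SEVERITY = {"child_safety": 0, "violent_threat": 1, "hate_speech": 2, "spam": 3}

@dataclass(order=True)
class Case:
    priority: int
    seq: int                             # tie-breaker preserving flag order
    content_id: str = field(compare=False)
    flag_type: str = field(compare=False)

class ReviewQueue:
    def __init__(self):
        self._heap: list[Case] = []
        self._counter = itertools.count()

    def add_flag(self, content_id: str, flag_type: str) -> None:
        case = Case(SEVERITY[flag_type], next(self._counter), content_id, flag_type)
        heapq.heappush(self._heap, case)

    def next_case(self) -> Case:
        return heapq.heappop(self._heap)  # highest-severity case goes to a reviewer first

queue = ReviewQueue()
queue.add_flag("post_123", "spam")
queue.add_flag("img_456", "child_safety")
print(queue.next_case().content_id)       # -> img_456
```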
Technical infrastructure also includes integration with external databases maintained by organizations including the National Center for Missing & Exploited Children (NCMEC) and intelligence sharing programs like Project Lantern of the Technology Coalition, facilitating information sharing across platforms and with dedicated nonprofit organizations tasked with investigating specific harms. [58] Internal enforcement guidelines are typically confidential, though leaked documents have occasionally provided public insight into implementation practices. [59] [60] [61]