MAC address anonymization

Last updated

MAC address anonymization performs a one-way function on a MAC address so that the result may be used in tracking systems for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like Google, [1] Apple [2] and CrowdVision [3] - which track users movements via computer hardware to simultaneously preserve the identities of the people they are tracking, as well as the hardware itself.

Contents

Flawed approaches

Simple hashing

An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1 (32 hexadecimal digits). [4] An address only one character different (11:22:33:44:55:67) produces 391907146439938c9821856fa181052e, [5] an entirely different hash due to the avalanche effect.

The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using rainbow table compression, the index can be made small enough to be portable. Building the index is an embarrassingly parallel problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily.

For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form.

In 2018, academics found that with modern computing equipment with the ability to calculate 6 Giga MD5 hashes and 844 Mega SHA-256 hashes per second the authors are able to recover 100% of 1 million hashes in: [6]

Truncation

Another approach that has been tested is truncating the MAC Address by removing the Organizationally unique identifier (the first 24 bits of the 48 bit MAC Address). [7] However, as only 0.1% of the total Organizationally unique identifier space has been allocated and not all manufacturers fully utilise their allocated MAC Address space, this fails to offer any meaningful privacy benefit. [8] Furthermore, manufacturers will frequently assign contiguous address blocks to specific devices allows for fine-grained mapping of the devices in use - allowing the device type to be identified with only a small part of the MAC Address. [9]

Ali and Dyo approach

Due to the pitfalls of existing approaches, more robust anonymization approaches have been developed by academics. [10] In particular, Junade Ali and Vladimir Dyo developed an approach which works by: [11]

  1. Using computationally expensive hash functions like Bcrypt to prevent background knowledge attacks
  2. Truncating the resulting hash to achieve K-anonymity

The degree to which a resulting hash is truncated is a balancing act between the privacy offered and the desired collision rate (the probability that one anonymised MAC Address will overlap with another). Previous work has suggested that it is therefore difficult to control the anonymity set size when using approximations of the Birthday Paradox. [12] Instead, Ali and Dyo use the overall rate of collision in the dataset and provide that the probability of there being a collision p can be calculated by where there are m MAC Addresses and n possible hash digests. Therefore "for digests of 24 bits it is possible to store up to 168,617 MAC addresses with the rate of collisions less than 1%".

Related Research Articles

<span class="mw-page-title-main">HMAC</span> Computer communications hash algorithm

In cryptography, an HMAC is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. As with any MAC, it may be used to simultaneously verify both the data integrity and authenticity of a message.

<span class="mw-page-title-main">Cryptographic hash function</span> Hash function that is suitable for use in cryptography

A cryptographic hash function (CHF) is a hash algorithm that has special properties desirable for a cryptographic application:

In computer security, a side-channel attack is any attack based on extra information that can be gathered because of the fundamental way a computer protocol or algorithm is implemented, rather than flaws in the design of the protocol or algorithm itself or minor, but potentially devastating, mistakes or oversights in the implementation. Timing information, power consumption, electromagnetic leaks, and sound are examples of extra information which could be exploited to facilitate side-channel attacks.

In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

A Sybil attack is a type of attack on a computer network service in which an attacker subverts the service's reputation system by creating a large number of pseudonymous identities and uses them to gain a disproportionately large influence. It is named after the subject of the book Sybil, a case study of a woman diagnosed with dissociative identity disorder. The name was suggested in or before 2002 by Brian Zill at Microsoft Research. The term pseudospoofing had previously been coined by L. Detweiler on the Cypherpunks mailing list and used in the literature on peer-to-peer systems for the same class of attacks prior to 2002, but this term did not gain as much influence as "Sybil attack".

Peer-to-peer file sharing (P2P) systems like Gnutella, KaZaA, and eDonkey/eMule, have become extremely popular in recent years, with the estimated user population in the millions. An academic research paper analyzed Gnutella and eMule protocols and found weaknesses in the protocol; many of the issues found in these networks are fundamental and probably common on other P2P networks. Users of file sharing networks, such as eMule and Gnutella, are subject to monitoring of their activity. Clients may be tracked by IP address, DNS name, software version they use, files they share, queries they initiate, and queries they answer to. Clients may also share their private files to the network without notice due to inappropriate settings.

The Internet of things (IoT) describes physical objects with sensors, processing ability, software and other technologies that connect and exchange data with other devices and systems over the Internet or other communications networks. The Internet of things (IoT) falls under the Electronics & Communication and Computer Science Engineering. Internet of things has been considered a misnomer because devices do not need to be connected to the public internet, they only need to be connected to a network, and be individually addressable.

Privacy-enhancing technologies (PET) are technologies that embody fundamental data protection principles by minimizing personal data use, maximizing data security, and empowering individuals. PETs allow online users to protect the privacy of their personally identifiable information (PII), which is often provided to and handled by services or applications. PETs use techniques to minimize an information system's possession of personal data without losing functionality. Generally speaking, PETs can be categorized as hard and soft privacy technologies.

<span class="mw-page-title-main">Fingerprint (computing)</span> Digital identifier derived from the data by an algorithm

In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

<span class="mw-page-title-main">Google Safe Browsing</span> Service that warns about malicious URLs

Google Safe Browsing is a service from Google that warns users when they attempt to navigate to a dangerous website or download dangerous files. Safe Browsing also notifies webmasters when their websites are compromised by malicious actors and helps them diagnose and resolve the problem. This protection works across Google products and is claimed to “power safer browsing experiences across the Internet”. It lists URLs for web resources that contain malware or phishing content. Browsers like Google Chrome, Safari, Firefox, Vivaldi, Brave and GNOME Web use these lists from Google Safe Browsing to check pages against potential threats. Google also provides a public API for the service.

Zerocoin is a privacy protocol proposed in 2013 by Johns Hopkins University professor Matthew D. Green and his graduate students, Ian Miers and Christina Garman. It was designed as an extension to the Bitcoin protocol that would improve Bitcoin transactions' anonymity by having coin-mixing capabilities natively built into the protocol. Zerocoin is not currently compatible with Bitcoin.

k-anonymity is a property possessed by certain anonymized data. The term k-anonymity was first introduced by Pierangela Samarati and Latanya Sweeney in a paper published in 1998, although the concept dates to a 1986 paper by Tore Dalenius.

l-diversity, also written as -diversity, is a form of group based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade off that results in some loss of effectiveness of data management or mining algorithms in order to gain some privacy. The l-diversity model is an extension of the k-anonymity model which reduces the granularity of data representation using techniques including generalization and suppression such that any given record maps onto at least k-1 other records in the data. The l-diversity model handles some of the weaknesses in the k-anonymity model where protected identities to the level of k-individuals is not equivalent to protecting the corresponding sensitive values that were generalized or suppressed, especially when the sensitive values within a group exhibit homogeneity. The l-diversity model adds the promotion of intra-group diversity for sensitive values in the anonymization mechanism.

Monero is a cryptocurrency which uses a blockchain with privacy-enhancing technologies to obfuscate transactions to achieve anonymity and fungibility. Observers cannot decipher addresses trading Monero, transaction amounts, address balances, or transaction histories.

<span class="mw-page-title-main">Yuval Elovici</span>

Yuval Elovici is a computer scientist. He is a professor in the Department of Software and Information Systems Engineering at Ben-Gurion University of the Negev (BGU), where he is the incumbent of the Davide and Irene Sala Chair in Homeland Security Research. He is the director of the Cyber Security Research Center at BGU and the founder and director of the Telekom Innovation Laboratories at Ben-Gurion University. In addition to his roles at BGU, he also serves as the lab director of Singapore University of Technology and Design’s (SUTD) ST Electronics-SUTD Cyber Security Laboratory, as well as the research director of iTrust. In 2014 he co-founded Morphisec, a start-up company, that develops cyber security mechanisms related to moving target defense.

A wireless onion router is a router that uses Tor to connect securely to a network. The onion router allows the user to connect to the internet anonymously creating an anonymous connection. Tor works using an overlaid network which is free throughout the world, this overlay network is created by using numerous relay points created using volunteer which helps the user hide personal information behind layers of encrypted data like layers of an onion. Routers are being created using Raspberry Pi adding a wireless module or using its own inbuilt wireless module in the later versions.

Ride sharing networks face issues of user privacy like other online platforms do. Concerns surrounding the apps include the security of financial details, and privacy of personal details and location. Privacy concerns can also rise during the ride as some drivers choose to use passenger facing cameras for their own security. As the use of ride sharing services become more widespread so do the privacy issues associated with them.

Spatial cloaking is a privacy mechanism that is used to satisfy specific privacy requirements by blurring users’ exact locations into cloaked regions. This technique is usually integrated into applications in various environments to minimize the disclosure of private information when users request location-based service. Since the database server does not receive the accurate location information, a set including the satisfying solution would be sent back to the user. General privacy requirements include K-anonymity, maximum area, and minimum area.

<span class="mw-page-title-main">Junade Ali</span> British computer scientist and cybersecurity researcher

Junade Ali is a British computer scientist known for research in cybersecurity.

References

  1. "Google Maps Has Been Tracking Your Every Move, And There's A Website To Prove It". Junkee. 15 August 2014. Retrieved 2016-04-10.
  2. "How your iPhone has been tracking your every move in secret | Metro News". Metro.co.uk. 2014-09-28. Retrieved 2016-04-11.
  3. "iInside retail brochure: Leading the market in indoor location techno…". 2014-03-10.{{cite journal}}: Cite journal requires |journal= (help)
  4. echo -n "112233445566"|md5sum = eb341820cd3a3485461a61b1e97d31b1
  5. echo -n "112233445567"|md5sum = 391907146439938c9821856fa181052e
  6. Marx, Matthias; Zimmer, Ephraim; Mueller, Tobias; Blochberger, Maximilian; Federrath, Hannes (2018). Hashing of personally identifiable information is not sufficient. Gesellschaft für Informatik e.V. ISBN   978-3-88579-675-6.
  7. Fuxjaeger, P.; Ruehrup, S.; Paulin, T.; Rainer, B. (Fall 2016). "Towards Privacy-Preserving Wi-Fi Monitoring for Road Traffic Analysis". IEEE Intelligent Transportation Systems Magazine. 8 (3): 63–74. doi:10.1109/MITS.2016.2573341. ISSN   1941-1197. S2CID   2646906.
  8. Demir, Levent; Cunche, Mathieu; Lauradoux, Cédric (2014-06-11). "Analysing the privacy policies of Wi-Fi trackers". Proceedings of the 2014 workshop on physical analytics. Association for Computing Machinery. pp. 39–44. doi:10.1145/2611264.2611266. ISBN   978-1-4503-2825-8. S2CID   2624491.
  9. Martin, Jeremy; Rye, Erik; Beverly, Robert (2016-12-05). "Decomposition of MAC address structure for granular device inference". Proceedings of the 32nd Annual Conference on Computer Security Applications. Los Angeles, California, US: Association for Computing Machinery. pp. 78–88. doi: 10.1145/2991079.2991098 . ISBN   978-1-4503-4771-6.
  10. Feng, X.; Feng, Y.; Dawam, E. S. (August 2020). "Artificial Intelligence Cyber Security Strategy". 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). pp. 328–333. doi: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00064 . ISBN   978-1-7281-6609-4.
  11. Ali, Junade; Dyo, Vladimir (2020-12-25). Practical Hash-based Anonymity for MAC Addresses. pp. 572–579. doi: 10.5220/0009825105720579 . ISBN   978-989-758-446-6.
  12. Demir, L.; Kumar, A.; Cunche, M.; Lauradoux, C. (2018). "The Pitfalls of Hashing for Privacy". IEEE Communications Surveys and Tutorials. 20 (1): 551–565. doi:10.1109/COMST.2017.2747598. ISSN   1553-877X. S2CID   3571244.