MAC address anonymization

Last updated

MAC address anonymization performs a one-way function on a MAC address so that the result may be used in tracking systems for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like Google, [1] Apple [2] and CrowdVision [3] - which track users' movements via their computer hardware - to simultaneously preserve the identities of the people they are tracking, while tracking the hardware itself.

Contents

Flawed approaches

Simple hashing

An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1 (32 hexadecimal digits). [4] An address only one character different (11:22:33:44:55:67) produces 391907146439938c9821856fa181052e, [5] an entirely different hash due to the avalanche effect.

The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using rainbow table compression, the index can be made small enough to be portable. Building the index is an embarrassingly parallel problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily.

For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form.

In 2018, academics found that with modern computing equipment with the ability to calculate 6 billion MD5 hashes and 844 million SHA-256 hashes per second the authors are able to recover 100% of 1 million hashes in: [6]

Truncation

Another approach that has been tested is truncating the MAC Address by removing the Organizationally unique identifier (the first 24 bits of the 48 bit MAC Address). [7] However, as only 0.1% of the total Organizationally unique identifier space has been allocated and not all manufacturers fully utilise their allocated MAC Address space, this fails to offer any meaningful privacy benefit. [8] Furthermore, manufacturers will frequently assign contiguous address blocks to specific devices allows for fine-grained mapping of the devices in use - allowing the device type to be identified with only a small part of the MAC Address. [9]

Ali and Dyo approach

Due to the pitfalls of existing approaches, more robust anonymization approaches have been developed by academics. [10] In particular, Junade Ali and Vladimir Dyo developed an approach which works by: [11]

  1. Using computationally expensive hash functions like Bcrypt to prevent background knowledge attacks
  2. Truncating the resulting hash to achieve K-anonymity

The degree to which a resulting hash is truncated is a balancing act between the privacy offered and the desired collision rate (the probability that one anonymised MAC Address will overlap with another). Previous work has suggested that it is therefore difficult to control the anonymity set size when using approximations of the Birthday Paradox. [12] Instead, Ali and Dyo use the overall rate of collision in the dataset and provide that the probability of there being a collision p can be calculated by where there are m MAC Addresses and n possible hash digests. Therefore "for digests of 24 bits it is possible to store up to 168,617 MAC addresses with the rate of collisions less than 1%".

Related Research Articles

<span class="mw-page-title-main">HMAC</span> Computer communications authentication algorithm

In cryptography, an HMAC is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. As with any MAC, it may be used to simultaneously verify both the data integrity and authenticity of a message. An HMAC is a type of keyed hash function that can also be used in a key derivation scheme or a key stretching scheme.

The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as RFC 1321.

<span class="mw-page-title-main">Ron Rivest</span> American cryptographer

Ronald Linn Rivest is an American cryptographer and computer scientist whose work has spanned the fields of algorithms and combinatorics, cryptography, machine learning, and election integrity. He is an Institute Professor at the Massachusetts Institute of Technology (MIT), and a member of MIT's Department of Electrical Engineering and Computer Science and its Computer Science and Artificial Intelligence Laboratory.

<span class="mw-page-title-main">Universally unique identifier</span> Label used for information in computer systems

A Universally Unique Identifier (UUID) is a 128-bit label used for information in computer systems. The term Globally Unique Identifier (GUID) is also used, mostly in Microsoft systems.

<span class="mw-page-title-main">Cryptographic hash function</span> Hash function that is suitable for use in cryptography

A cryptographic hash function (CHF) is a hash algorithm that has special properties desirable for a cryptographic application:

Simple file verification (SFV) is a file format for storing CRC32 checksums of files to verify the integrity of files. SFV is used to verify that a file has not been corrupted, but it does not otherwise verify the file's authenticity. The .sfv file extension is usually used for SFV files.

File verification is the process of using an algorithm for verifying the integrity of a computer file, usually by checksum. This can be done by comparing two files bit-by-bit, but requires two copies of the same file, and may miss systematic corruptions which might occur to both files. A more popular approach is to generate a hash of the copied file and comparing that to the hash of the original file.

In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.

A rainbow table is a precomputed table for caching the outputs of a cryptographic hash function, usually for cracking password hashes. Passwords are typically stored not in plain text form, but as hash values. If such a database of hashed passwords falls into the hands of attackers, they can use a precomputed rainbow table to recover the plaintext passwords. A common defense against this attack is to compute the hashes using a key derivation function that adds a "salt" to each password before hashing it, with different passwords receiving different salts, which are stored in plain text along with the hash.

<span class="mw-page-title-main">Aircrack-ng</span> Software suite

Aircrack-ng is a network software suite consisting of a detector, packet sniffer, WEP and WPA/WPA2-PSK cracker and analysis tool for 802.11 wireless LANs. It works with any wireless network interface controller whose driver supports raw monitoring mode and can sniff 802.11a, 802.11b and 802.11g traffic. Packages are released for Linux and Windows.

In public-key cryptography, a public key fingerprint is a short sequence of bytes used to identify a longer public key. Fingerprints are created by applying a cryptographic hash function to a public key. Since fingerprints are shorter than the keys they refer to, they can be used to simplify certain key management tasks. In Microsoft software, "thumbprint" is used instead of "fingerprint."

Peer-to-peer file sharing (P2P) systems like Gnutella, KaZaA, and eDonkey/eMule, have become extremely popular in recent years, with the estimated user population in the millions. An academic research paper analyzed Gnutella and eMule protocols and found weaknesses in the protocol; many of the issues found in these networks are fundamental and probably common on other P2P networks. Users of file sharing networks, such as eMule and Gnutella, are subject to monitoring of their activity. Clients may be tracked by IP address, DNS name, software version they use, files they share, queries they initiate, and queries they answer to. Clients may also share their private files to the network without notice due to inappropriate settings.

Privacy-enhancing technologies (PET) are technologies that embody fundamental data protection principles by minimizing personal data use, maximizing data security, and empowering individuals. PETs allow online users to protect the privacy of their personally identifiable information (PII), which is often provided to and handled by services or applications. PETs use techniques to minimize an information system's possession of personal data without losing functionality. Generally speaking, PETs can be categorized as either hard or soft privacy technologies.

<span class="mw-page-title-main">Fingerprint (computing)</span> Digital identifier derived from the data by an algorithm


In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

SHA-3 is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like structure of SHA-1 and SHA-2.

<span class="mw-page-title-main">Google Safe Browsing</span> Service that warns about malicious URLs

Google Safe Browsing is a service from Google that warns users when they attempt to navigate to a dangerous website or download dangerous files. Safe Browsing also notifies webmasters when their websites are compromised by malicious actors and helps them diagnose and resolve the problem. This protection works across Google products and is claimed to “power safer browsing experiences across the Internet”. It lists URLs for web resources that contain malware or phishing content. Browsers like Google Chrome, Safari, Firefox, Vivaldi, Brave, and GNOME Web use these lists from Google Safe Browsing to check pages against potential threats. Google also provides a public API for the service.

k-anonymity is a property possessed by certain anonymized data. The term k-anonymity was first introduced by Pierangela Samarati and Latanya Sweeney in a paper published in 1998, although the concept dates to a 1986 paper by Tore Dalenius.

Monero is a cryptocurrency which uses a blockchain with privacy-enhancing technologies to obfuscate transactions to achieve anonymity and fungibility. Observers cannot decipher addresses trading Monero, transaction amounts, address balances, or transaction histories.

<span class="mw-page-title-main">Yuval Elovici</span>

Yuval Elovici is a computer scientist. He is a professor in the Department of Software and Information Systems Engineering at Ben-Gurion University of the Negev (BGU), where he is the incumbent of the Davide and Irene Sala Chair in Homeland Security Research. He is the director of the Cyber Security Research Center at BGU and the founder and director of the Telekom Innovation Laboratories at Ben-Gurion University. In addition to his roles at BGU, he also serves as the lab director of Singapore University of Technology and Design’s (SUTD) ST Electronics-SUTD Cyber Security Laboratory, as well as the research director of iTrust. In 2014 he co-founded Morphisec, a start-up company, that develops cyber security mechanisms related to moving target defense.

Spatial cloaking is a privacy mechanism that is used to satisfy specific privacy requirements by blurring users’ exact locations into cloaked regions. This technique is usually integrated into applications in various environments to minimize the disclosure of private information when users request location-based service. Since the database server does not receive the accurate location information, a set including the satisfying solution would be sent back to the user. General privacy requirements include K-anonymity, maximum area, and minimum area.

References

  1. "Google Maps Has Been Tracking Your Every Move, And There's A Website To Prove It". Junkee. 15 August 2014. Retrieved 2016-04-10.
  2. "How your iPhone has been tracking your every move in secret | Metro News". Metro.co.uk. 2014-09-28. Retrieved 2016-04-11.
  3. "iInside retail brochure: Leading the market in indoor location techno…". 2014-03-10.{{cite journal}}: Cite journal requires |journal= (help)
  4. echo -n "112233445566"|md5sum = eb341820cd3a3485461a61b1e97d31b1
  5. echo -n "112233445567"|md5sum = 391907146439938c9821856fa181052e
  6. Marx, Matthias; Zimmer, Ephraim; Mueller, Tobias; Blochberger, Maximilian; Federrath, Hannes (2018). Hashing of personally identifiable information is not sufficient. Gesellschaft für Informatik e.V. ISBN   978-3-88579-675-6.
  7. Fuxjaeger, P.; Ruehrup, S.; Paulin, T.; Rainer, B. (Fall 2016). "Towards Privacy-Preserving Wi-Fi Monitoring for Road Traffic Analysis". IEEE Intelligent Transportation Systems Magazine. 8 (3): 63–74. doi:10.1109/MITS.2016.2573341. ISSN   1941-1197. S2CID   2646906.
  8. Demir, Levent; Cunche, Mathieu; Lauradoux, Cédric (2014-06-11). "Analysing the privacy policies of Wi-Fi trackers". Proceedings of the 2014 workshop on physical analytics. Association for Computing Machinery. pp. 39–44. doi:10.1145/2611264.2611266. ISBN   978-1-4503-2825-8. S2CID   2624491.
  9. Martin, Jeremy; Rye, Erik; Beverly, Robert (2016-12-05). "Decomposition of MAC address structure for granular device inference". Proceedings of the 32nd Annual Conference on Computer Security Applications. Los Angeles, California, US: Association for Computing Machinery. pp. 78–88. doi: 10.1145/2991079.2991098 . ISBN   978-1-4503-4771-6.
  10. Feng, X.; Feng, Y.; Dawam, E. S. (August 2020). "Artificial Intelligence Cyber Security Strategy". 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). pp. 328–333. doi: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00064 . ISBN   978-1-7281-6609-4.
  11. Ali, Junade; Dyo, Vladimir (2020-12-25). Practical Hash-based Anonymity for MAC Addresses. pp. 572–579. doi: 10.5220/0009825105720579 . ISBN   978-989-758-446-6.
  12. Demir, L.; Kumar, A.; Cunche, M.; Lauradoux, C. (2018). "The Pitfalls of Hashing for Privacy". IEEE Communications Surveys and Tutorials. 20 (1): 551–565. doi:10.1109/COMST.2017.2747598. ISSN   1553-877X. S2CID   3571244.