MAC address anonymization

Last updated

MAC address anonymization performs a one-way function on a MAC address so that the result may be used in tracking systems for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like Google, [1] Apple [2] and CrowdVision [3] - which track users movements via computer hardware to simultaneously preserve the identities of the people they are tracking, as well as the hardware itself.

Contents

Flawed approaches

Simple hashing

An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1 (32 hexadecimal digits). [4] An address only one character different (11:22:33:44:55:67) produces 391907146439938c9821856fa181052e, [5] an entirely different hash due to the avalanche effect.

The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using rainbow table compression, the index can be made small enough to be portable. Building the index is an embarrassingly parallel problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily.

For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form.

In 2018, academics found that with modern computing equipment with the ability to calculate 6 Giga MD5 hashes and 844 Mega SHA-256 hashes per second the authors are able to recover 100% of 1 million hashes in: [6]

Truncation

Another approach that has been tested is truncating the MAC Address by removing the Organizationally unique identifier (the first 24 bits of the 48 bit MAC Address). [7] However, as only 0.1% of the total Organizationally unique identifier space has been allocated and not all manufacturers fully utilise their allocated MAC Address space, this fails to offer any meaningful privacy benefit. [8] Furthermore, manufacturers will frequently assign contiguous address blocks to specific devices allows for fine-grained mapping of the devices in use - allowing the device type to be identified with only a small part of the MAC Address. [9]

Ali and Dyo approach

Due to the pitfalls of existing approaches, more robust anonymization approaches have been developed by academics. [10] In particular, Junade Ali and Vladimir Dyo developed an approach which works by: [11]

  1. Using computationally expensive hash functions like Bcrypt to prevent background knowledge attacks
  2. Truncating the resulting hash to achieve K-anonymity

The degree to which a resulting hash is truncated is a balancing act between the privacy offered and the desired collision rate (the probability that one anonymised MAC Address will overlap with another). Previous work has suggested that it is therefore difficult to control the anonymity set size when using approximations of the Birthday Paradox. [12] Instead, Ali and Dyo use the overall rate of collision in the dataset and provide that the probability of there being a collision p can be calculated by where there are m MAC Addresses and n possible hash digests. Therefore "for digests of 24 bits it is possible to store up to 168,617 MAC addresses with the rate of collisions less than 1%".

Related Research Articles

<span class="mw-page-title-main">HMAC</span> Computer communications hash algorithm

In cryptography, an HMAC is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. As with any MAC, it may be used to simultaneously verify both the data integrity and authenticity of a message. An HMAC is a type of keyed hash function that can also be used in a key derivation scheme or a key stretching scheme.

The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as RFC 1321.

<span class="mw-page-title-main">Hash collision</span> Hash function phenomenon

In computer science, a hash collision or hash clash is when two distinct pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of bits.

<span class="mw-page-title-main">Cryptographic hash function</span> Hash function that is suitable for use in cryptography

A cryptographic hash function (CHF) is a hash algorithm that has special properties desirable for a cryptographic application:

In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.

A rainbow table is a precomputed table for caching the outputs of a cryptographic hash function, usually for cracking password hashes. Passwords are typically stored not in plain text form, but as hash values. If such a database of hashed passwords falls into the hands of attackers, they can use a precomputed rainbow table to recover the plaintext passwords. A common defense against this attack is to compute the hashes using a key derivation function that adds a "salt" to each password before hashing it, with different passwords receiving different salts, which are stored in plain text along with the hash.

<span class="mw-page-title-main">Aircrack-ng</span> Software suite

Aircrack-ng is a network software suite consisting of a detector, packet sniffer, WEP and WPA/WPA2-PSK cracker and analysis tool for 802.11 wireless LANs. It works with any wireless network interface controller whose driver supports raw monitoring mode and can sniff 802.11a, 802.11b and 802.11g traffic. Packages are released for Linux and Windows.

In public-key cryptography, a public key fingerprint is a short sequence of bytes used to identify a longer public key. Fingerprints are created by applying a cryptographic hash function to a public key. Since fingerprints are shorter than the keys they refer to, they can be used to simplify certain key management tasks. In Microsoft software, "thumbprint" is used instead of "fingerprint."

A Sybil attack is a type of attack on a computer network service in which an attacker subverts the service's reputation system by creating a large number of pseudonymous identities and uses them to gain a disproportionately large influence. It is named after the subject of the book Sybil, a case study of a woman diagnosed with dissociative identity disorder. The name was suggested in or before 2002 by Brian Zill at Microsoft Research. The term pseudospoofing had previously been coined by L. Detweiler on the Cypherpunks mailing list and used in the literature on peer-to-peer systems for the same class of attacks prior to 2002, but this term did not gain as much influence as "Sybil attack".

Peer-to-peer file sharing (P2P) systems like Gnutella, KaZaA, and eDonkey/eMule, have become extremely popular in recent years, with the estimated user population in the millions. An academic research paper analyzed Gnutella and eMule protocols and found weaknesses in the protocol; many of the issues found in these networks are fundamental and probably common on other P2P networks. Users of file sharing networks, such as eMule and Gnutella, are subject to monitoring of their activity. Clients may be tracked by IP address, DNS name, software version they use, files they share, queries they initiate, and queries they answer to. Clients may also share their private files to the network without notice due to inappropriate settings.

Privacy-enhancing technologies (PET) are technologies that embody fundamental data protection principles by minimizing personal data use, maximizing data security, and empowering individuals. PETs allow online users to protect the privacy of their personally identifiable information (PII), which is often provided to and handled by services or applications. PETs use techniques to minimize an information system's possession of personal data without losing functionality. Generally speaking, PETs can be categorized as either hard or soft privacy technologies.

Wi-Fi positioning system is a geolocation system that uses the characteristics of nearby Wi‑Fi access points to discover where a device is located.

<span class="mw-page-title-main">Fingerprint (computing)</span> Digital identifier derived from the data by an algorithm

In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

Hashcat is a password recovery tool. It had a proprietary code base until 2015, but was then released as open source software. Versions are available for Linux, macOS, and Windows. Examples of hashcat-supported hashing algorithms are LM hashes, MD4, MD5, SHA-family and Unix Crypt formats as well as algorithms used in MySQL and Cisco PIX.

k-anonymity is a property possessed by certain anonymized data. The term k-anonymity was first introduced by Pierangela Samarati and Latanya Sweeney in a paper published in 1998, although the concept dates to a 1986 paper by Tore Dalenius.

l-diversity, also written as -diversity, is a form of group based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade off that results in some loss of effectiveness of data management or mining algorithms in order to gain some privacy. The l-diversity model is an extension of the k-anonymity model which reduces the granularity of data representation using techniques including generalization and suppression such that any given record maps onto at least k-1 other records in the data. The l-diversity model handles some of the weaknesses in the k-anonymity model where protected identities to the level of k-individuals is not equivalent to protecting the corresponding sensitive values that were generalized or suppressed, especially when the sensitive values within a group exhibit homogeneity. The l-diversity model adds the promotion of intra-group diversity for sensitive values in the anonymization mechanism.

Monero is a cryptocurrency which uses a blockchain with privacy-enhancing technologies to obfuscate transactions to achieve anonymity and fungibility. Observers cannot decipher addresses trading Monero, transaction amounts, address balances, or transaction histories.

<span class="mw-page-title-main">Yuval Elovici</span>

Yuval Elovici is a computer scientist. He is a professor in the Department of Software and Information Systems Engineering at Ben-Gurion University of the Negev (BGU), where he is the incumbent of the Davide and Irene Sala Chair in Homeland Security Research. He is the director of the Cyber Security Research Center at BGU and the founder and director of the Telekom Innovation Laboratories at Ben-Gurion University. In addition to his roles at BGU, he also serves as the lab director of Singapore University of Technology and Design’s (SUTD) ST Electronics-SUTD Cyber Security Laboratory, as well as the research director of iTrust. In 2014 he co-founded Morphisec, a start-up company, that develops cyber security mechanisms related to moving target defense.

Ride sharing networks face issues of user privacy like other online platforms do. Concerns surrounding the apps include the security of financial details, and privacy of personal details and location. Privacy concerns can also rise during the ride as some drivers choose to use passenger facing cameras for their own security. As the use of ride sharing services become more widespread so do the privacy issues associated with them.

Spatial cloaking is a privacy mechanism that is used to satisfy specific privacy requirements by blurring users’ exact locations into cloaked regions. This technique is usually integrated into applications in various environments to minimize the disclosure of private information when users request location-based service. Since the database server does not receive the accurate location information, a set including the satisfying solution would be sent back to the user. General privacy requirements include K-anonymity, maximum area, and minimum area.

References

  1. "Google Maps Has Been Tracking Your Every Move, And There's A Website To Prove It". Junkee. 15 August 2014. Retrieved 2016-04-10.
  2. "How your iPhone has been tracking your every move in secret | Metro News". Metro.co.uk. 2014-09-28. Retrieved 2016-04-11.
  3. "iInside retail brochure: Leading the market in indoor location techno…". 2014-03-10.{{cite journal}}: Cite journal requires |journal= (help)
  4. echo -n "112233445566"|md5sum = eb341820cd3a3485461a61b1e97d31b1
  5. echo -n "112233445567"|md5sum = 391907146439938c9821856fa181052e
  6. Marx, Matthias; Zimmer, Ephraim; Mueller, Tobias; Blochberger, Maximilian; Federrath, Hannes (2018). Hashing of personally identifiable information is not sufficient. Gesellschaft für Informatik e.V. ISBN   978-3-88579-675-6.
  7. Fuxjaeger, P.; Ruehrup, S.; Paulin, T.; Rainer, B. (Fall 2016). "Towards Privacy-Preserving Wi-Fi Monitoring for Road Traffic Analysis". IEEE Intelligent Transportation Systems Magazine. 8 (3): 63–74. doi:10.1109/MITS.2016.2573341. ISSN   1941-1197. S2CID   2646906.
  8. Demir, Levent; Cunche, Mathieu; Lauradoux, Cédric (2014-06-11). "Analysing the privacy policies of Wi-Fi trackers". Proceedings of the 2014 workshop on physical analytics. Association for Computing Machinery. pp. 39–44. doi:10.1145/2611264.2611266. ISBN   978-1-4503-2825-8. S2CID   2624491.
  9. Martin, Jeremy; Rye, Erik; Beverly, Robert (2016-12-05). "Decomposition of MAC address structure for granular device inference". Proceedings of the 32nd Annual Conference on Computer Security Applications. Los Angeles, California, US: Association for Computing Machinery. pp. 78–88. doi: 10.1145/2991079.2991098 . ISBN   978-1-4503-4771-6.
  10. Feng, X.; Feng, Y.; Dawam, E. S. (August 2020). "Artificial Intelligence Cyber Security Strategy". 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). pp. 328–333. doi: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00064 . ISBN   978-1-7281-6609-4.
  11. Ali, Junade; Dyo, Vladimir (2020-12-25). Practical Hash-based Anonymity for MAC Addresses. pp. 572–579. doi: 10.5220/0009825105720579 . ISBN   978-989-758-446-6.
  12. Demir, L.; Kumar, A.; Cunche, M.; Lauradoux, C. (2018). "The Pitfalls of Hashing for Privacy". IEEE Communications Surveys and Tutorials. 20 (1): 551–565. doi:10.1109/COMST.2017.2747598. ISSN   1553-877X. S2CID   3571244.