| YARA | |
|---|---|
| Designed by | Victor Alvarez |
| First appeared | 2013 |
| Stable release | |
| Filename extensions | .yara |
| Website | virustotal |
YARA is a tool primarily used in malware research and detection.
It provides a rule-based approach to describing malware families in terms of textual patterns, binary patterns, or regular expressions. Each description is a named YARA rule, which consists of a set of strings and a Boolean expression that determines when the rule matches. [2]
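The way a rule's strings and Boolean expression combine can be illustrated with a minimal Python sketch. This is a toy model and not the YARA engine: the `Rule` class, its `matches` method, and the sample patterns are hypothetical names chosen for illustration, and real YARA conditions support a much richer grammar.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Rule:
    """Toy model of a rule: named patterns plus a Boolean condition."""
    name: str
    strings: Dict[str, "re.Pattern"]          # identifier -> compiled bytes pattern
    condition: Callable[[Dict[str, bool]], bool]

    def matches(self, data: bytes) -> bool:
        # Record which named patterns occur anywhere in the data.
        hits = {ident: pat.search(data) is not None
                for ident, pat in self.strings.items()}
        # The condition decides how the individual hits combine.
        return self.condition(hits)


# Hypothetical rule: a text string, a binary pattern, and a regular expression.
example_rule = Rule(
    name="ExampleFamily",
    strings={
        "text": re.compile(re.escape(b"evil_payload")),
        "binary": re.compile(re.escape(bytes.fromhex("4d5a9000"))),
        "regex": re.compile(rb"https?://[a-z0-9.]+/gate\.php"),
    },
    # Match when the binary pattern is present together with either other string.
    condition=lambda h: h["binary"] and (h["text"] or h["regex"]),
)

sample = bytes.fromhex("4d5a9000") + b"...http://c2.example.com/gate.php..."
print(example_rule.matches(sample))
print(example_rule.matches(b"harmless text"))
```

Separating the patterns from the condition is what lets one set of strings express several detection policies: changing only the condition changes how strictly the rule matches.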
YARA was originally developed by Victor Alvarez of VirusTotal and released on GitHub in 2013. [3] The name stands for either YARA: Another Recursive Acronym or Yet Another Ridiculous Acronym. [4] In 2024, Alvarez announced that YARA would be superseded by YARA-X, a rewrite in Rust. [5] The first stable version of YARA-X was released in June 2025, at which point the original YARA entered maintenance mode. [6]
By default, YARA ships with modules for analyzing PE and ELF files, as well as support for the open-source Cuckoo sandbox.
Research has explored automatic construction of YARA rules from examples. One system, AutoYara, builds rules from a small set of malware files by extracting large byte n-grams and pruning generic features using entropy filters trained on a mixed benign and malicious corpus. [7]
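The feature-extraction step can be sketched in Python as follows. This is a simplified illustration and not AutoYara's implementation: the function names, the n-gram length of 8, the `min_files` cutoff, and the fixed entropy threshold are all assumptions, and the fixed threshold stands in for the filters that the source describes as trained on a corpus.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def byte_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(chunk)
    total = len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngrams(data: bytes, n: int) -> Set[bytes]:
    """All distinct byte n-grams in one file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def candidate_features(samples: Iterable[bytes], n: int = 8,
                       min_files: int = 2,
                       min_entropy: float = 1.0) -> List[bytes]:
    """Keep n-grams shared by several samples, dropping low-entropy ones."""
    file_counts: Counter = Counter()
    for data in samples:
        file_counts.update(ngrams(data, n))   # count each n-gram once per file
    return sorted(g for g, c in file_counts.items()
                  if c >= min_files and byte_entropy(g) >= min_entropy)


# Two hypothetical samples sharing zero padding and a distinctive marker.
samples = [b"\x00" * 16 + b"ABCDEFGHIJ",
           b"\x00" * 16 + b"xxABCDEFGHIJyy"]
features = candidate_features(samples)
print(features)
```

In this toy input the run of zero bytes is shared by both samples but has zero entropy, so it is discarded as generic, while the n-grams drawn from the shared marker are kept.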
AutoYara then applies an adaptive spectral co-clustering procedure to files and features, followed by probabilistic clustering to select groups of n-grams that tend to co-occur within subsets of the samples. The resulting rule is expressed as a disjunction of conjunctions, where strings inside each bicluster are combined with logical AND, and the biclusters are combined with logical OR. The method also chooses a threshold so that a subset of strings in a clause may be sufficient to match, which helps control false positives while keeping useful coverage. [7]
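The shape of the resulting rule, a disjunction of thresholded conjunctions, can be modelled with a short Python sketch. The names `Clause`, `clause_matches`, and `rule_matches`, along with the example strings and thresholds, are hypothetical; the sketch only evaluates such a rule and does not perform the clustering that selects the strings.

```python
from typing import List, Sequence, Tuple

# Each clause is (strings, threshold): the clause fires when at least
# `threshold` of its strings occur in the data.
Clause = Tuple[Sequence[bytes], int]


def clause_matches(data: bytes, strings: Sequence[bytes], threshold: int) -> bool:
    """Thresholded AND: count how many of the clause's strings are present."""
    hits = sum(1 for s in strings if s in data)
    return hits >= threshold


def rule_matches(data: bytes, clauses: List[Clause]) -> bool:
    """OR across clauses (one per bicluster)."""
    return any(clause_matches(data, strings, k) for strings, k in clauses)


# Hypothetical rule with two biclusters.
clauses: List[Clause] = [
    ([b"AAAA1111", b"BBBB2222", b"CCCC3333"], 2),   # any 2 of 3 suffice
    ([b"DDDD4444", b"EEEE5555"], 2),                # both required
]

print(rule_matches(b"..AAAA1111..CCCC3333..", clauses))
print(rule_matches(b"..AAAA1111..DDDD4444..", clauses))
```

Setting a clause's threshold equal to its number of strings gives a strict AND, while a lower threshold tolerates variants of a family that lack some of the strings. In YARA's own condition syntax, a comparable rule could be written along the lines of `2 of ($a*) or all of ($b*)`.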
In evaluations on large malware families and on VirusTotal Retro Hunt, the approach produced rules with very low false positive rates and practical true positive rates, and in several cases matched or exceeded human-written rules. A study with industry analysts reported time savings of 44 to 86 percent when the automatically generated rules were used as starting points for rule development. [7]