The h-index is an author-level metric that measures both the productivity and the citation impact of a set of publications; it was initially applied to individual scientists or scholars. The h-index correlates with success indicators such as winning the Nobel Prize, being accepted for research fellowships and holding positions at top universities. [1] The index is based on the set of the scientist's most cited papers and the number of citations that they have received in other publications. The index has more recently been applied to the productivity and impact of a scholarly journal [2] as well as of a group of scientists, such as a department, university or country. [3] The index was suggested in 2005 by Jorge E. Hirsch, a physicist at UC San Diego, as a tool for determining theoretical physicists' relative quality [4] and is sometimes called the Hirsch index or Hirsch number.
The h-index is defined as the maximum value of h such that the given author/journal has published at least h papers that have each been cited at least h times. [4] [5] The index is designed to improve upon simpler measures such as the total number of citations or publications. The index works best when comparing scholars working in the same field, since citation conventions differ widely among different fields. [6]
The h-index is the largest h such that h articles have at least h citations each. For example, if an author has five publications, with 9, 7, 6, 2, and 1 citations (ordered from greatest to least), then the author's h-index is 3, because the author has three publications with 3 or more citations. However, the author does not have four publications with 4 or more citations.
Clearly, an author's h-index can only be as great as their number of publications. For example, an author with only one publication can have a maximum h-index of 1 (if their publication has 1 or more citations). On the other hand, an author with many publications, each with only 1 citation, would also have an h-index of 1.
Formally, if f is the function that gives the number of citations for each publication, we compute the h-index as follows: first we order the values of f from largest to smallest. Then we look for the last position at which the value of f is greater than or equal to the position itself; we call this position h. For example, if a researcher has 5 publications A, B, C, D, and E with 10, 8, 5, 4, and 3 citations, respectively, the h-index is 4, because the 4th publication has 4 citations and the 5th has only 3. In contrast, if the same publications have 25, 8, 5, 3, and 3 citations, then the index is 3 (i.e. the 3rd position), because the 4th paper has only 3 citations.
If f is arranged in decreasing order, from the largest value to the smallest, so that f(i) is the number of citations of the paper in position i, the h-index can be written as $h = \max\{\, i \in \mathbb{N} : f(i) \ge i \,\}$.
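The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `h_index` and the choice of a plain list of citation counts as input are assumptions made for the example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    `citations` is a list with one citation count per publication
    (an assumed input format, chosen for this sketch).
    """
    # Order the citation counts from largest to smallest.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Walk down the ranking; the last position whose count is at least
    # as large as the position itself is the h-index.
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# The worked examples from the text:
print(h_index([9, 7, 6, 2, 1]))    # 3
print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([25, 8, 5, 3, 3]))   # 3
```

An author with no cited papers gets an h-index of 0 under this sketch, and the result can never exceed the number of publications.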
The Hirsch index is analogous to the Eddington number, an earlier metric used for evaluating cyclists. [7] The h-index is also related to the Sugeno integral and the Ky Fan metric. [8] The h-index serves as an alternative to more traditional journal impact factor metrics in evaluating the impact of a particular researcher's work. Because only the most highly cited articles contribute to the h-index, determining it is a comparatively simple process. Hirsch has demonstrated that h has high predictive value for whether a scientist has won honors like National Academy membership or the Nobel Prize. The h-index grows as citations accumulate and thus depends on the "academic age" of a researcher.
The h-index can be determined manually from citation databases or calculated with automatic tools. Subscription-based databases such as Scopus and the Web of Science provide automated calculators. Since July 2011, Google has provided an automatically calculated h-index and i10-index within Google Scholar profiles. [9] In addition, specific databases, such as the INSPIRE-HEP database, can automatically calculate the h-index for researchers working in high energy physics.
Each database is likely to produce a different h for the same scholar, because of different coverage. [10] A detailed study showed that the Web of Science has strong coverage of journal publications, but poor coverage of high impact conferences. Scopus has better coverage of conferences, but poor coverage of publications prior to 1996; Google Scholar has the best coverage of conferences and most journals (though not all), but like Scopus has limited coverage of pre-1990 publications. [11] [12] The exclusion of conference proceedings papers is a particular problem for scholars in computer science, where conference proceedings are considered an important part of the literature. [13] Google Scholar has been criticized for producing "phantom citations", including gray literature in its citation counts, and failing to follow the rules of Boolean logic when combining search terms. [14] For example, the Meho and Yang study found that Google Scholar identified 53% more citations than Web of Science and Scopus combined, but noted that because most of the additional citations reported by Google Scholar were from low-impact journals or conference proceedings, they did not significantly alter the relative ranking of the individuals. It has been suggested that in order to deal with the sometimes wide variation in h for a single academic measured across the possible citation databases, one should assume false negatives in the databases are more problematic than false positives and take the maximum h measured for an academic. [15]
Little systematic investigation has been done on how the h-index behaves over different institutions, nations, times and academic fields. [16] Hirsch suggested that, for physicists, a value for h of about 12 might be typical for advancement to tenure (associate professor) at major [US] research universities. A value of about 18 could mean a full professorship, 15–20 could mean a fellowship in the American Physical Society, and 45 or higher could mean membership in the United States National Academy of Sciences. [17] Hirsch estimated that after 20 years a "successful scientist" would have an h-index of 20, an "outstanding scientist" would have an h-index of 40, and a "truly unique" individual would have an h-index of 60. [4]
For the most highly cited scientists in the period 1983–2002, Hirsch identified the top 10 in the life sciences (in order of decreasing h): Solomon H. Snyder, h = 191; David Baltimore, h = 160; Robert C. Gallo, h = 154; Pierre Chambon, h = 153; Bert Vogelstein, h = 151; Salvador Moncada, h = 143; Charles A. Dinarello, h = 138; Tadamitsu Kishimoto, h = 134; Ronald M. Evans, h = 127; and Ralph L. Brinster, h = 126. Among 36 new inductees in the National Academy of Sciences in biological and biomedical sciences in 2005, the median h-index was 57. [4] However, Hirsch noted that values of h will vary among disparate fields. [4]
Among the 22 scientific disciplines listed in the Essential Science Indicators citation thresholds (thus excluding non-science academics), physics has the second most citations after space science. [18] During the period January 1, 2000 – February 28, 2010, a physicist had to receive 2073 citations to be among the most cited 1% of physicists in the world. [18] The threshold for space science is the highest (2236 citations), and physics is followed by clinical medicine (1390) and molecular biology & genetics (1229). Most disciplines, such as environment/ecology (390), have fewer scientists, fewer papers, and fewer citations. [18] Therefore, these disciplines have lower citation thresholds in the Essential Science Indicators, with the lowest citation thresholds observed in social sciences (154), computer science (149), and multidisciplinary sciences (147). [18]
Numbers are very different in social science disciplines: the Impact of the Social Sciences team at the London School of Economics found that social scientists in the United Kingdom had lower average h-indices. The h-indices for ("full") professors, based on Google Scholar data, ranged from 2.8 (in law), through 3.4 (in political science), 3.7 (in sociology) and 6.5 (in geography), to 7.6 (in economics). On average across the disciplines, a professor in the social sciences had an h-index about twice that of a lecturer or a senior lecturer, though the difference was smallest in geography. [19]
Hirsch intended the h-index to address the main disadvantages of other bibliometric indicators. The total number of papers metric does not account for the quality of scientific publications. The total number of citations metric, on the other hand, can be heavily affected by participation in a single publication of major influence (for instance, methodological papers proposing successful new techniques, methods or approximations, which can generate a large number of citations). The h-index is intended to measure simultaneously the quality and quantity of scientific output. Until 2010 the h-index showed a Kendall's correlation of 0.3 to 0.4 with scientific awards. [20]
There are a number of situations in which h may provide misleading information about a scientist's output. [21] The correlation between the h-index and scientific awards has dropped significantly since 2010, after the h-index came into widespread use, [20] in line with Goodhart's law. The decrease in correlation is partially attributed to the spread of hyperauthorship, with more than 100 coauthors per paper.
Some of the following failures are not exclusive to the h-index but rather shared with other author-level metrics:
Weaknesses apply to the purely quantitative calculation of scientific or academic output. Like other metrics that count citations, the h-index can be manipulated by coercive citation, a practice in which a journal editor forces authors to add spurious citations to their own articles before the journal will agree to publish them. [27] [28] The h-index can be manipulated through self-citations, [29] [30] [31] [32] [33] and, if it is based on Google Scholar output, even computer-generated documents can be used for that purpose, e.g. using SCIgen. [34] The h-index can also be manipulated by hyperauthorship. Recent research shows that the correlation of the h-index with awards that indicate recognition by the scientific community has substantially declined. [35]
The h-index has been found in one study to have slightly less predictive accuracy and precision than the simpler measure of mean citations per paper. [36] However, this finding was contradicted by another study by Hirsch. [37] The h-index does not provide a significantly more accurate measure of impact than the total number of citations for a given scholar. In particular, by modeling the distribution of citations among papers as a random integer partition and the h-index as the Durfee square of the partition, Yong [38] arrived at the formula $h \approx 0.54\sqrt{N}$, where N is the total number of citations. For mathematics members of the National Academy of Sciences, this formula turns out to approximate the h-index accurately in most cases, with errors typically within 10–20 percent.
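Yong's rule of thumb is usually quoted as h ≈ 0.54 √N, and it can be tried out with a short Python sketch. The function name below is hypothetical. The constant √6 · ln 2 / π is included only as background: it is the known asymptotic coefficient for the Durfee square of a random partition, which rounds to 0.54.

```python
import math

# Asymptotic coefficient for the Durfee-square side of a random partition
# of N; it rounds to the 0.54 used in the rule of thumb. Background only.
DURFEE_CONSTANT = math.sqrt(6) * math.log(2) / math.pi


def estimate_h_from_total_citations(total_citations):
    """Rule-of-thumb estimate h ~ 0.54 * sqrt(N) (hypothetical helper name)."""
    return 0.54 * math.sqrt(total_citations)


# Illustrative input only: an author with 900 citations in total.
print(round(estimate_h_from_total_citations(900), 1))  # 16.2
print(round(DURFEE_CONSTANT, 2))                        # 0.54
```

The estimate uses only the total citation count, which is why the passage treats it as evidence that h adds little information beyond N.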
Various proposals to modify the h-index in order to emphasize different features have been made. [39] [40] [41] [42] [43] [44] [20] Many of these variants, such as the g-index, are highly correlated with the original h-index, which has led some researchers to consider them redundant. [45] One metric that is not highly correlated with the h-index and is correlated with scientific awards is h-frac. [20]
Indices similar to the h-index have been applied outside of author or journal evaluation.
The h-index has been applied to Internet media, such as YouTube channels. It is defined as the largest h such that h videos each have at least h × 10^5 views. When compared with a video creator's total view count, the h-index and g-index better capture both productivity and impact in a single metric. [46]
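The video variant can be sketched in the same way as the citation version, taking the threshold to be h × 10^5 (100,000) views per video. The function name, the `views_per_unit` parameter and the sample view counts are all assumptions made for illustration.

```python
def video_h_index(view_counts, views_per_unit=100_000):
    """Largest h such that h videos each have at least h * views_per_unit views.

    Hypothetical helper: the name and the `views_per_unit` parameter are
    illustrative and not taken from the cited study.
    """
    ranked = sorted(view_counts, reverse=True)
    h = 0
    for position, views in enumerate(ranked, start=1):
        # The bar rises with each rank: the video in position p needs
        # at least p * views_per_unit views.
        if views >= position * views_per_unit:
            h = position
        else:
            break
    return h


# Made-up channel with five videos.
channel = [1_000_000, 450_000, 300_000, 250_000, 90_000]
print(video_h_index(channel))  # 3
```

In this example the fourth video would need 400,000 views to raise the index to 4, so the channel's index is 3 even though its total view count is over two million.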
A successive Hirsch-type index for institutions has also been devised. [47] [48] A scientific institution has a successive Hirsch-type index of i when at least i researchers from that institution have an h-index of at least i.
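Because the successive index applies the same rule one level up, it can be sketched by running the h-index calculation twice: once per researcher, then once over the resulting h values. The function names and the sample data below are hypothetical and serve only to illustrate the definition.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    ranked = sorted(values, reverse=True)
    h = 0
    for position, value in enumerate(ranked, start=1):
        if value >= position:
            h = position
        else:
            break
    return h


def successive_h_index(citations_by_researcher):
    """Institution-level index i: at least i researchers have h-index >= i.

    `citations_by_researcher` is an assumed input format: one list of
    per-paper citation counts for each researcher at the institution.
    """
    researcher_h_values = [h_index(papers) for papers in citations_by_researcher]
    # Apply the same rule again, this time to the researchers' h-indices.
    return h_index(researcher_h_values)


# Made-up institution with three researchers (h-indices 3, 4 and 1).
institution = [
    [9, 7, 6, 2, 1],
    [10, 8, 5, 4, 3],
    [1, 1, 1],
]
print(successive_h_index(institution))  # 2
```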