Gonzalo Navarro

Last updated
Gonzalo Navarro
Born (1969-06-09) June 9, 1969 (age 52)
Alma mater University of Chile

National University of La Plata

Escuela Superior Latinoamericana de Informática
Scientific career
Fields Computer science
Algorithms
Data structures
Data compression
Text searching
Institutions University of Chile
Thesis Approximate Text Searching  (1998)
Doctoral advisor Ricardo Baeza-Yates
Website users.dcc.uchile.cl/~gnavarro/

Gonzalo Navarro Badino (born June 9, 1969) is a full professor of computer science at the University of Chile and ACM Distinguished Member, whose interests include algorithms and data structures, data compression and text searching. He also participates in the Center for Biotechnology and Bioengineering (CeBiB) and the Millennium Institute for Foundational Research on Data (IMFD).. He obtained his PhD at the University of Chile in 1998 under the supervision of Ricardo Baeza-Yates with the thesis Approximate Text Searching, [1] then worked as a post-doctoral researcher with Esko Ukkonen and Maxime Crochemore.

Contents

He is one of the most prolific and highly cited researchers in Latin America, having authored the books Flexible Pattern Matching in Strings [2] and Compact Data Structures, [3] around 25 book chapters, over 160 journal articles and over 240 conference papers. He is editor in chief of the ACM Journal of Experimental Algorithmics (JEA) and a member of the editorial board of Information Systems, and has been guest editor of special issues of ACM SIGSPATIAL, the Journal of Discrete Algorithms, Information Systems and Algorithmica.

He created the Workshop on Compression, Text and Algorithms (WCTA) in 2005 and co-created the conference SISAP in 2008; has chaired or co-chaired SPIRE 2001, SCCC 2004, SPIRE 2005, SIGIR 2005 (posters), IFIP TCS 2006, SISAP 2008, SISAP 2012, LATIN 2016, SPIRE 2018 and CPM 2018; served on the steering committees of SPIRE, LATIN and SISAP; and has given around 50 invited talks, including 12 plenary talks and 5 tutorials in international conferences.

Education

He studied for his Licenciate in Informatics (1989–1992) (5 years plus thesis) from Latin American School of Informatics (ESLAI, Argentina). His thesis was: “A Study on Control Structures”. His advisor was Prof. Jorge Aguirre (ESLAI and Universidad de Buenos Aires, Argentina).

He studied for his Licenciate in Informatics (1986–1993) (5 years plus thesis), at the Faculty of Exact Sciences, Universidad Nacional de La Plata (UNLP, Argentina). His thesis was: “MediaCore: A Multimedia Interface Composition Toolkit”, Advisor: Prof. Jorge Sanz (IBM Argentina and Almaden Research Center).

He received a MSc. in computer science (1994–1995), from Faculty of Physics and Mathematical Sciences, Universidad de Chile with Prof. Ricardo Baeza-Yates (Universidad de Chile) as his advisor. His thesis was: “A Language for Queries on Structure and Contents of Textual Databases”.

He received his PhD in computer science (1995–1998), from Faculty of Physics and Mathematical Sciences, Universidad de Chile under advisor: Prof. Ricardo Baeza-Yates (Universidad de Chile). His thesis was: “Approximate Text Searching”.

Awards and distinctions

SPIRE 2001

Although Professor Navarro has organized and participated in a large number of conferences and seminars, his best effort in this direction was without doubt the organization of the 13th International Symposium on String Processing and Information Retrieval (SPIRE 2001), with the support of Ricardo Baeza-Yates, which brought together many professors and students for three days of talks on a boat of the company Skorpios heading to the Laguna San Rafael in Chilean Patagonia. The welcome speech included local tales of pirates and sailors, starting with the sayings neither marry nor depart on a Tuesday (because it brings bad luck) and Tuesday the 13th is a cursed day (with the conference starting on Tuesday, November 13). The conference featured high-quality works and is still known as one of the best of the SPIRE series.

Related Research Articles

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings are found within a larger string or text.

Trie K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

Heaps law

In linguistics, Heaps' law is an empirical law which describes the number of distinct words in a document as a function of the document length. It can be formulated as

Suffix tree Tree containing all suffixes of a given text

In computer science, a suffix tree is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.

The bitap algorithm is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance – if the substring and pattern are within a given distance k of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast.

Ricardo Baeza-Yates

Ricardo A. Baeza-Yates is a Chilean-Catalan computer scientist that currently is a Research Professor at the Institute for Experiential AI of Northeastern University in the Silicon Valley campus. He is also part-time professor at Universitat Pompeu Fabra in Barcelona and Universidad de Chile in Santiago. He is an expert member of the Global Partnership on Artificial Intelligence, a member of Spain's Advisory Council on AI, and a member of the Association for Computing Machinery's US Technology Policy Subcommittee on AI and Algorithms.

In computer science, an inverted index is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents. The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems, used on a large scale for example in search engines. Additionally, several significant general-purpose mainframe-based database management systems have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204.

Jeffrey Vitter

Jeffrey Scott Vitter is a U.S. computer scientist and academic administrator. Born in 1955 in New Orleans, Vitter has served in several senior higher education administration posts. He is a former chancellor of the University of Mississippi. He assumed the chancellor position on January 1, 2016. His formal investiture to the chancellorship took place on November 10, 2016, at the University of Mississippi's Oxford Campus.

Prof. Athanasios K. Tsakalidis is a Greek computer scientist, a professor at the Graphics, Multimedia and GIS Laboratory, Computer Engineering and Informatics Department (CEID), University of Patras, Greece.

Approximate string matching finding strings that approximately match a pattern

In computer science, approximate string matching is the technique of finding strings that match a pattern approximately. The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.

ACM SIGACT or SIGACT is the Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory, whose purpose is support of research in theoretical computer science. It was founded in 1968 by Patrick C. Fischer.

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.

In computer science, the all nearest smaller values problem is the following task: for each position in a sequence of numbers, search among the previous positions for the last position that contains a smaller value. This problem can be solved efficiently both by parallel and non-parallel algorithms: Berkman, Schieber & Vishkin (1993), who first identified the procedure as a useful subroutine for other parallel programs, developed efficient algorithms to solve it in the Parallel Random Access Machine model; it may also be solved in linear time on a non-parallel computer using a stack-based algorithm. Later researchers have studied algorithms to solve it in other models of parallel computation.

In computer science, a compressed suffix array is a compressed data structure for pattern matching. Compressed suffix arrays are a general class of data structure that improve on the suffix array. These data structures enable quick search for an arbitrary string with a comparatively small index.

Infobox Template used to collect and present a subset of information about a subject

On wikis, an infobox is a table used to collect and present a subset of information about its subject, such as a document. It is a structured document containing a set of attribute–value pairs, and in Wikipedia represents a summary of information about the subject of an article. In this way, they are comparable to data tables in some aspects. When presented within the larger document it summarizes, an infobox is often presented in a sidebar format.

In data mining, intention mining or intent mining is the problem of determining a user's intention from logs of his/her behavior in interaction with a computer system, such as in search engines, where there has been research on user intent or query intent prediction since 2002 ; and commercial intents expressed in social media posts.

Gad Landau Israeli computer scientist

Gad Menahem Landau is an Israeli computer scientist noted for his contributions to combinatorial pattern matching and string algorithms and is the founding department chair of the Computer Science Department at the University of Haifa.

Martin Farach-Colton American computer scientist

Martin Farach-Colton is an American computer scientist, known for his work in streaming algorithms, suffix tree construction, pattern matching in compressed data, cache-oblivious algorithms, and lowest common ancestor data structures. He is a professor of computer science at Rutgers University, and a co-founder of storage technology startup company Tokutek.

Jewels of Stringology: Text Algorithms is a book on algorithms for pattern matching in strings and related problems. It was written by Maxime Crochemore and Wojciech Rytter, and published by World Scientific in 2003.

References

  1. "Approximate Text Searching" (PDF).
  2. Navarro, Gonzalo; Raffinot, Mathieu (2002). Flexible pattern matching in strings : practical on-line search algorithms for texts and biological sequences. Cambridge: Cambridge University Press. ISBN   0521813077. OCLC   47930721.
  3. Navarro, Gonzalo (2016-09-08). Compact data structures : a practical approach. New York, NY. ISBN   9781107152380. OCLC   952389252.
  4. "ACM Distinguished Member". 2018 ACM Distinguished Members. Association for Computing Machinery.
  5. 1 2 Kreft, Sebastian; Navarro, Gonzalo (2013). "On compressing and indexing repetitive sequences". Theoretical Computer Science. 483: 115–113. doi: 10.1016/j.tcs.2012.02.006 .
  6. Gagie, Travis; Kärkkäinen, Juha; Navarro, Gonzalo; Simon J., Puglisi (2013). "Colored range queries and document retrieval". Theoretical Computer Science. 483: 36–50. doi: 10.1016/j.tcs.2012.08.004 .
  7. Brisaboa, Nieves R.; Ladra, Susana; Navarro, Gonzalo (2013). "DACs: Bringing direct access to variable-length codes". Information Processing & Management. 49: 392–404. doi:10.1016/j.ipm.2012.08.003. hdl: 10533/130014 .
  8. Belazzougui, Djamal; Navarro, Gonzalo; Valenzuela, Daniel (2013). "Improved compressed indexes for full-text document retrieval". Journal of Discrete Algorithms. 13: 3–13. doi: 10.1016/j.jda.2012.07.005 .
  9. "70 Historias exitosas de Innovación y Ciencia" (PDF). Ministry of Economy, Government of Chile.