Patent visualisation

Last updated August 23, 2024

Patent visualisation is an application of information visualisation. The number of patents has been increasing,^[1] encouraging companies to consider intellectual property as a part of their strategy.^[2] Patent visualisation, like patent mapping, is used to quickly view a patent portfolio.

Software dedicated to patent visualisation began to appear in 2000, for example Aureka from Aurigin (now owned by Thomson Reuters).^[3] Many patent and portfolio analytics platforms, such as Questel,^[4] Patent Forecast, PatSnap, Patentcloud, Relecura, and Patent iNSIGHT Pro,^[5] offer options to visualise specific data within patent documents by creating topic maps,^[6] priority maps, IP Landscape reports,^[7] etc. Software converts patents into infographics or maps, to allow the analyst to "get insight into the data" and draw conclusions.^[8] Also called patinformatics,^[9] it is the "science of analysing patent information to discover relationships and trends that would be difficult to see when working with patent documents on a one-and-one basis".^{[ citation needed ]}

Patents contain structured data (like publication numbers) and unstructured text (like title, abstract, claims and visual info). Structured data are processed by data-mining and unstructured data are processed with text-mining.^[10]

Data mining

The main step in processing structured information is data-mining,^[11] which emerged in the late 1980s. Data mining involves statistics, artificial intelligence, and machine learning.^[12] Patent data mining extracts information from the structured data of the patent document.^[13] These structured data are bibliographic fields such as location, date or status.

Structured fields

Structured data	Description	Business Intelligence use
Data	Patents contain identifying data including priority, publication data and the issue date. Priority data regroup priority number assigned for the first application, the corresponding date and priority country. The publication data encompasses the publication number given when the patent is published, 18 months after filling and the publication date. The issue date is the data the patent is granted, usually 3.5 years after filling depending on the patent office.	Crossing dates and locations fields offer a global vision of a technology in time and space.
Assignee	Patent assignees are organizations or individuals - the owners of the patent.	The field can offer a ranking of the principal actors of the environment, thus allowing us to visualise potential competitors or partners.
Inventor	Inventors develop the invention/patent.	Inventors' field combined with the assignee field can create a social network and provide a method to follow field experts.
Classification	The classification can regroup inventions with similar technologies. The most commonly used is the International Patent Classification (IPC). However, patent organizations have their own classification; for instance, the European Patent Office has framed the ECLA.	Grouping patents by theme offers an overview of the corpus and the potential applications of studied technology.
Status	The legal status indicates whether an application is filed, approved, or rejected.	Patent family and legal status searching is important for litigation and competitive intelligence.

Advantages

Data mining allows study of filing patterns of competitors and locates main patent filers within a specific area of technology. This approach can be helpful to monitor competitors' environments, moves and innovation trends and gives a macro view of a technology status.^{[ citation needed ]}

Text-mining

Principle

Text mining is used to search through unstructured text documents.^[14]^[15] This technique is widely used on the Internet, it has had success in bioinformatics and now in the intellectual property environment.^[16]

Text mining is based on a statistical analysis of word recurrence in a corpus.^[17] An algorithm extracts words and expressions from title, summary and claims and gathers them by declension. "And" and "if" are labeled as non-information bearing words and are stored in the stopword list. Stoplists can be specialised in order to create an accurate analysis. Next, the algorithm ranks the words by weight, according to their frequency in the patent's corpus and the document frequency containing this word. The score for each word is calculated using a formula such as:^[18]^[19]

$Weight={\frac {Term\ Frequency}{Document\ Frequency}}={\frac {Frequency\ of\ the\ word\ or\ expression\ in\ the\ Text\ Sea}{Number\ of\ documents\ containing\ the\ expression\ or\ word}}$

A frequently-used word in several documents has less weight than a word used frequently in a few patents. Words under a minimum weight are eliminated, leaving a list of pertinent words or descriptors. Each patent is associated to the descriptors found in the selected document. Further, in the process of clusterisation, these descriptors are used as subsets, in which the patent are regrouped or as tags to place the patents in predetermined categories, for example keywords from International Patent Classifications.

Four text parts can be processed with text-mining :

Title
Abstract
Claim
Patent Full-Text

Software offer different combinations but title, abstract and claim are generally the most used, providing a good balance between interferences and relevancy.

Advantages

Text-mining can be used to narrow a search or quickly evaluate a patent corpus. For instance, if a query produces irrelevant documents, a multi-level clustering hierarchy identifies them in order to delete them and refine the search. Text-mining can also be used to create internal taxonomies specific to a corpus for possible mapping.^{[ citation needed ]}

Visualisations

Allying patent analysis and informatic tools offers an overview of the environment through value-added visualisations. As patents contain structured and unstructured information, visualisations fall in two categories. Structured data can be rendered with data mining in macrothematic maps and statistical analysis. Unstructured information can be shown in like clouds, cluster maps and 2D keyword maps.

Data mining visualisation

Visualisation	Picture	Description	Business Intelligence use
Matrix chart	Picture	Graphic organizer used to summarize a multidimensional data set in a grid	Data comparison
Location map	Picture	Map with overlaid data values on geographic regions	Spatial patterns Find innovative jurisdictions
Bar chart	Picture	Graph with rectangular bars proportional to the values that they represent, useful for numerical comparisons.	Data evolution
Line graph	Picture	Graph used to summarize how two parameters are related and how they vary.	Data evolution and relationships
Pie chart	Picture	Circular chart divided into sections, to illustrate proportions.	Data comparison
Bubble chart	Picture	3-axis 2D chart which enables visualization similar to the Magic quadrant chart.	Market maturity Competitive analysis Licensing opportunities

Text mining visualisation

Visualisation	Description	Business Intelligence use
Tree list	Hierarchy list	Evaluating relevance Taxonomy Concept relationships
Tag cloud	Full text of concepts. The size of each word is determined by its frequency in the corpus	Evaluating relevance More visual than the tree list
2D keyword map ^[20]	Tomographic map with quantitative representation of relief, usually using contour lines and colors. Distance on map is proportional to the difference between themes.^[13]	Landscape vision of thematics Similarity vision with SOM Monitoring competitors
	2D hierarchical cluster map with quantitative and qualitative representation of document set association to topic, usually using quantized cells and colors. Size of topic cells may represent patent count per topic relative to overall document set. Density and distribution inside of a topic cell may be proportional to document count relative to association to the topic and strength of association, respectively.	Landscape vision of thematics Monitoring competitors or a technology space Identifying trends in a defined patent set
	Text is decomposed into logical groupings and sub-groupings, then represented as a navigable hierarchy of those groupings by means of proportionate circle arcs.	Landscape vision of thematics Monitoring a technology space Interactive navigation and granularity

Visualisation for both data-mining and text-mining

Mapping visualisations can be used for both text-mining and data-mining results.

Visualisation	Picture	Description	Business Intelligence use
Tree map	Picture	Visualization of hierarchical structures. Each data item, or row in the data set is represented by a rectangle, whose area is proportional to selected parameters.	Landscape vision of hierarchical thematics Position of competitors or technology by thematics
Network map	Picture	In a network diagram, entities are connected to each other in the form of a node and link diagram.	Relationship visions Monitoring similar competitors or technologies
Citation map	Picture	In the citation map, the date of citation is visualized on the x axis and each individual citation takes an entry on the y axis. A strong vertical line indicates the filing date, showing which citations are cited by the patent as opposed to those which cite the patent.	Qualitative and quantitative view of citation history and density

Uses

What patent visualisation can highlight:^[21]^[22]

Competitors
Partners
New innovations
Technologic environment description^[23]
Networks

Field application:^[24]^[22]

R&D strategy management
Competitive intelligence
Licensing
Strategy

Related Research Articles

Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

Natural language processing (NLP) is an interdisciplinary subfield of computer science and artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Typically data is collected in text corpora, using either rule-based, statistical or neural-based approaches in machine learning and deep learning.

Business intelligence (BI) consists of strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.

Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005) we can distinguish between three different perspectives of text mining: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction.

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per document is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents.

A document-term matrix is a mathematical matrix that describes the frequency of terms that occur in each document in a collection. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. This matrix is a specific instance of a document-feature matrix where "features" may refer to other properties of a document besides terms. It is also common to encounter the transpose, or term-document matrix where documents are the columns and terms are the rows. They are useful in the field of natural language processing and computational text analysis.

<span class="mw-page-title-main">Parallel coordinates</span> Chart displaying multivariate data

Parallel Coordinates plots are a common method of visualizing high-dimensional datasets to analyze multivariate data having multiple variables, or attributes.

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated in documents.

Software visualization or software visualisation refers to the visualization of information of and related to software systems—either the architecture of its source code or metrics of their runtime behavior—and their development process by means of static, interactive or animated 2-D or 3-D visual representations of their structure, execution, behavior, and evolution.

<span class="mw-page-title-main">Data and information visualization</span> Visual representation of data

Data and information visualization is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. Typically based on data and information collected from a certain domain of expertise, these visualizations are intended for a broader audience to help them visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data. When intended for the general public to convey a concise version of known, specific information in a clear and engaging manner, it is typically called information graphics.

Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols, the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents.

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.

Enterprise search is software technology for searching data sources internal to a company, typically intranet and database content. The search is generally offered only to users internal to the company. Enterprise search can be contrasted with web search, which applies search technology to documents on the open web, and desktop search, which applies search technology to the content on a single computer.

Document clustering is the application of cluster analysis to textual documents. It has applications in automatic document organization, topic extraction and fast information retrieval or filtering.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

WordStat is a content analysis and text mining software. It was first released in 1998 after being developed by Normand Peladeau from Provalis Research. The latest version 9 was released in 2021.

Patent analysis is the process of analyzing the texts of patent disclosures and other information from the patent lifecycle. Patent analysis is used to obtain deeper insights into different technologies and innovation. Other terms are sometimes used as synonyms for patent analytics: patent landscape, patent mapping, or cartography. However, there is no harmonized terminology in different languages, including in French and Spanish. Patent analytics encompasses the analysis of patent data, analysis of the scientific literature, data cleaning, text mining, machine learning, geographic mapping, and data visualisation.

KH Coder is an open source software for computer assisted qualitative data analysis, particularly quantitative content analysis and text mining. It can be also used for computational linguistics. It supports processing and etymological information of text in several languages, such as Japanese, English, French, German, Italian, Portuguese and Spanish. Specifically, it can contribute factual examination co-event system hub structure, computerized arranging guide, multidimensional scaling and comparative calculations. Word frequency statistics, part-of-speech analysis, grouping, correlation analysis, and visualization are among the features offered by KH Coder.

References

↑ ^[dead link ]
↑ Kevin G. Rivette, David Kline, "Discovering new value in intellectual property", Harvard Business Review (January–February 2000)
↑ "Thomson Reuters | Aureka | Intellectual Property". Archived from the original on 4 February 2013.
↑ "Patent Analysis, Mapping, and Visualization Tools - PIUG Space - Global Site".
↑ "Patent iNSIGHT Pro". Archived from the original on 2014-02-21. Retrieved 2014-02-07.
↑ Conduct patent portfolio analysis using comparative Topic Maps
↑ Graphene Technology Insight Report
↑ Daniel A Keim et IEEE Computer Society, "Information visualization and visual data mining," IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 8 (2002): 1--8.
↑ Anthony J. Trippe, "Patinformatics: Tasks to tools," World Patent Information 25, n°. 3 (September 2003): 211-221.
↑ Laura Ruotsalainen, "Data mining tools for technology and competitive intelligence" VTT Research Notes 2451(October 2008)
↑ Archived June 12, 2010, at the Wayback Machine
↑ "How Data Mining is Evolving".
1 2 Sungjoo Lee, Byungun Yoon, et Yongtae Park, "An approach to discovering new technology opportunities: Keyword-based patent map approach," Technovation 29, n°. 6 (Juin): 481-497.
↑ Archived October 17, 2010, at the Wayback Machine
↑ Bonino, Dario; Ciaramella, Alberto; Corno, Fulvio (2010). "Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics". World Patent Information. 32: 30–38. doi:10.1016/j.wpi.2009.05.008.
↑ Sholom Weiss and al, Text Mining : Predictive Methods for Analyzing Unstructured Information, 1er ed. (Springer 2004).
↑ Antoine Blanchard "La cartographie des brevets" La Recherche n°.398 (2006) : 82-83
↑ Gerard Salton et Christopher Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management 24, n°. 5 (1988): 513-523.
↑ Y Kim, J Suh, et S Park, "Visualization of patent analysis for emerging technology," Expert Systems with Applications 34, no. 3 (4, 2008): 1804–1812.
↑ "Newsmap". Archived from the original on July 8, 2010. Retrieved April 28, 2017.
↑ Miyake, M., Mune, Y. and Himeno, K. "Strategic Intellectual Property Portfolio Management: Technology Appraisal by Using the 'Technology Heat Map'", Nomura Research Institute (NRI) Papers, n°. 83, (December 2004).
1 2 Charles Boulakia "Patent mapping" Archived 2011-03-13 at the Wayback Machine
↑ Richard Seymour, "Platinum Group Metals Patent Analysis and Mapping," Platinum Metals Review 52, n°. 4 (10, 2008): 231-240.
↑ Susan E Cullen, "Introduction, From acorns to oak trees : how patent audits help innovations reach their full potential" IP Value 2010 - An International Guide for the Boardroom : 26--30

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] [dead link ]

[2] Kevin G. Rivette, David Kline, "Discovering new value in intellectual property", Harvard Business Review (January–February 2000)

[3] "Thomson Reuters | Aureka | Intellectual Property". Archived from the original on 4 February 2013.

[4] "Patent Analysis, Mapping, and Visualization Tools - PIUG Space - Global Site".

[5] "Patent iNSIGHT Pro". Archived from the original on 2014-02-21. Retrieved 2014-02-07.

[6] Conduct patent portfolio analysis using comparative Topic Maps

[7] Graphene Technology Insight Report

[8] Daniel A Keim et IEEE Computer Society, "Information visualization and visual data mining," IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 8 (2002): 1--8.

[9] Anthony J. Trippe, "Patinformatics: Tasks to tools," World Patent Information 25, n°. 3 (September 2003): 211-221.

[10] Laura Ruotsalainen, "Data mining tools for technology and competitive intelligence" VTT Research Notes 2451(October 2008)

[11] Archived June 12, 2010, at the Wayback Machine

[12] "How Data Mining is Evolving".

[auto-13] 1 2 Sungjoo Lee, Byungun Yoon, et Yongtae Park, "An approach to discovering new technology opportunities: Keyword-based patent map approach," Technovation 29, n°. 6 (Juin): 481-497.

[14] Archived October 17, 2010, at the Wayback Machine

[15] Bonino, Dario; Ciaramella, Alberto; Corno, Fulvio (2010). "Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics". World Patent Information. 32: 30–38. doi:10.1016/j.wpi.2009.05.008.

[16] Sholom Weiss and al, Text Mining : Predictive Methods for Analyzing Unstructured Information, 1er ed. (Springer 2004).

[17] Antoine Blanchard "La cartographie des brevets" La Recherche n°.398 (2006) : 82-83

[18] Gerard Salton et Christopher Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management 24, n°. 5 (1988): 513-523.

[19] Y Kim, J Suh, et S Park, "Visualization of patent analysis for emerging technology," Expert Systems with Applications 34, no. 3 (4, 2008): 1804–1812.

[20] "Newsmap". Archived from the original on July 8, 2010. Retrieved April 28, 2017.

[21] Miyake, M., Mune, Y. and Himeno, K. "Strategic Intellectual Property Portfolio Management: Technology Appraisal by Using the 'Technology Heat Map'", Nomura Research Institute (NRI) Papers, n°. 83, (December 2004).

[auto1-22] 1 2 Charles Boulakia "Patent mapping" Archived 2011-03-13 at the Wayback Machine

[23] Richard Seymour, "Platinum Group Metals Patent Analysis and Mapping," Platinum Metals Review 52, n°. 4 (10, 2008): 231-240.

[24] Susan E Cullen, "Introduction, From acorns to oak trees : how patent audits help innovations reach their full potential" IP Value 2010 - An International Guide for the Boardroom : 26--30

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

Patent visualisation

Contents

Data mining

Structured fields

Advantages

Text-mining

Principle

Advantages

Visualisations

Data mining visualisation

Text mining visualisation

Visualisation for both data-mining and text-mining

Uses

Related Research Articles

References