SPIF (patent identification format)

Last updated April 29, 2021

SPIF is a standard format for representing patent identification information.^[1] SPIF stands for (Simple/Standard/Specific/Special) Patent Identification Format. SPIF restricts the use of non-standard characters in listing patent publication and application numbers. At its core, SPIF is a simple asset identification and disambiguation solution: it does not seek to guarantee that an asset exists, but rather to identify each asset uniquely.

SPIF is designed to be:

Human-readable
Backwards compatible with existing systems
Applicable to patents and utility models that are currently in force (primarily those with filing dates on or after January 1, 2000)
Available for a broad list of jurisdictions, with an initial focus on patents from the USPTO, EPO, JPO, KIPO, and CNIPA, and on Patent Cooperation Treaty (PCT) filings

The first version of SPIF was released January 29, 2021. SPIF was launched publicly on March 22, 2021.^[2] SPIF is managed by the Linux Foundation's Joint Development Foundation Projects, LLC, SPIF Series, Project Name: SPIF Patent Identification Format.^[3]

Problems of non-standard patent identification formats

Lists of patent numbers are important in facilitating a variety of patent management and monetization activities, for example:

corporate mergers and acquisitions
patent licensing
patent sales
standard-setting declarations of essential patents
academic research
transfer of patent prosecution responsibility from one law firm to another

These lists can be difficult to work with because the tool or system used to generate a list may not be the same tool used to ingest it, and inconsistencies between tools cause patent assets to be mismatched. As a result, in-house teams, outside counsel, and others spend significant effort, often repeatedly, cleaning a list and making it suitable for use across tools (e.g., Cipher, Derwent, Questel, Innography, Unified Portal).

Some samples of problematic entries:

Sample Input	Likely Meaning	Type (app or pub)	Comments
CNZL201480022610.X	CN201480022610	app	Extra check digit and prefix
ZL03827150.8	CN03827150	app	Missing country code
GB3123328	EP3123328A1	pub	Wrong country code (there's no UK patent with that number)
CH,2420637	EP2420637A2	pub	Wrong country code (there's no Swiss patent with that number)
ZA 2015/000715	ZA201500715	app	Extra zero
WO002/001258	WO02002/001258	app	Missing year digits
GB2405228,319405.7	GB2405228B8	pub	Extra stuff, app number?
US2014214418	US20140214418A1	pub	Missing zero
US7123456BB	US7123456B2	pub	Made up kind code “BB”
KR1341015B1	KR101341015B1	pub	Missing 10 prefix
20067013095,Korea	KR20067013095	app	Country name spelled out as suffix
US2017163019A1	WO02017163019A1	app	There is a US app with that number, but they meant WIPO
GB2568035	GB2568035B	pub	Ambiguous
GB2568035	EP2568035B1	pub	Ambiguous
US10229419	US10/229,419	app	Ambiguous
US10229419	US10229419B2	pub	Ambiguous
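Most of the problems in the table above are semantic rather than cosmetic. The following Python sketch applies only cosmetic cleanup: it removes whitespace and common separators and converts the result to uppercase. It shows that this kind of cleanup handles stray punctuation but cannot repair wrong country codes, digit errors, or invented kind codes. The function name `superficial_clean` and the first sample input are hypothetical and are not part of SPIF.

```python
import re

# Cosmetic noise often found in hand-edited lists: any whitespace
# (including non-breaking spaces), commas, slashes, periods and hyphens.
_SEPARATORS = re.compile(r"[\s,./\-]+")


def superficial_clean(raw: str) -> str:
    """Strip separator characters and uppercase a raw patent identifier.

    This does NOT fix wrong country codes, missing or extra digits,
    check digits or made-up kind codes; those need human or database review.
    """
    return _SEPARATORS.sub("", raw).upper()


# (raw input, likely SPIF meaning). The first pair is a hypothetical input;
# the others come from the table of problematic entries.
samples = [
    (" us 9,123,456 b2 ", "US9123456B2"),
    ("ZA 2015/000715", "ZA201500715"),
    ("CH,2420637", "EP2420637A2"),
    ("US7123456BB", "US7123456B2"),
]

for raw, likely in samples:
    cleaned = superficial_clean(raw)
    print(f"{raw!r} -> {cleaned} (matches likely meaning: {cleaned == likely})")
```

Only the first sample comes out matching its likely meaning. The others are still wrong after cleanup, which is why lists are better produced in a standard form from the start.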

Importantly, SPIF does not identify patent characteristics, priority dates, or live/dead status; it is strictly an asset identification and disambiguation solution.

Format details

The following tables define the format of the required and optional columns in a SPIF-compatible list of patent assets.^[4] The required columns must be named exactly as shown. The number formats for each supported patent office and for the PCT are given in the Country Specific Guidance section of the SPIF documentation. These formats have been selected for broad compatibility with existing tools.

Column Name	Description	Examples	Priority
Application Number - SPIF	The patent-office-assigned application/serial number, including the country code and omitting spaces and check digits. There should be nothing else in the field (e.g., no leading/trailing whitespace). The country code is included because it is needed for lookup regardless, it eliminates confusion between the actual serial number and the designated country for EP, it removes the need to merge columns for matching, and it prevents Excel from reformatting the field as a number.	US13624395 EP11759439 KR1020127027195 CN201180015433 JP2010549365 WO2011JP056984	Required where there is no Publication Number; recommended otherwise, but can be blank
Publication Number - SPIF	The patent-office-assigned patent number (when available) or publication number (when available), including the country code and the kind code. There should be no spaces, punctuation, or other characters, and nothing else in the field (e.g., no leading/trailing whitespace). Check digits should be eliminated.	US9123456B2 EP2551856B1 KR101487211B1 CN102822907B JP4879373B2 US20130014973A1 EP2551856A1 KR1020127027195A CN102822907A JP2011118054A1 WO2011118054A1 (Blank is ok if no publication number is available)	Required when issued or published; otherwise blank
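The Python sketch below illustrates the two required columns by checking a single row. The regular expressions are assumptions inferred from the examples in the table above. An application number is taken to be two uppercase letters followed by digits, or for WO, a four-digit year, a two-letter office code, and digits. A publication number is taken to end in a kind code such as `A1`, `B`, or `B2`. These patterns are not SPIF's official grammar, and the per-office rules in the Country Specific Guidance are not encoded. `check_row` is a hypothetical name.

```python
import re
from typing import List, Optional

# Assumed shapes, inferred from the SPIF examples (e.g. US13624395, WO2011JP056984)
APP_RE = re.compile(r"WO[0-9]{4}[A-Z]{2}[0-9]+|[A-Z]{2}[0-9]+")
# e.g. US9123456B2, CN102822907B, KR1020127027195A
PUB_RE = re.compile(r"[A-Z]{2}[0-9]+[A-Z][0-9]?")

APP_COL = "Application Number - SPIF"
PUB_COL = "Publication Number - SPIF"


def check_row(app: Optional[str], pub: Optional[str]) -> List[str]:
    """Return a list of problems with one row's SPIF number columns (empty if none)."""
    problems: List[str] = []
    app, pub = app or "", pub or ""
    if not app and not pub:
        problems.append(f"{APP_COL} is required when there is no {PUB_COL}")
    # fullmatch rejects leading/trailing whitespace, punctuation and
    # trailing check characters such as the "X" in CN201480022610X
    if app and not APP_RE.fullmatch(app):
        problems.append(f"{APP_COL} {app!r} is not in SPIF form")
    if pub and not PUB_RE.fullmatch(pub):
        problems.append(f"{PUB_COL} {pub!r} is not in SPIF form")
    return problems


print(check_row("US13624395", "US9123456B2"))          # []
print(len(check_row("US10/229,419", "US7123456BB")))  # 2
```

A shape check like this only catches formatting errors. A well-formed number can still point to the wrong asset, as the ambiguous rows in the earlier table show.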

Additional fields

Additional fields may be provided; parsers supporting SPIF are not required to evaluate these columns when performing matching. However, if any of the column names below are present, the data in those columns must conform to the descriptions given.
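Because matching relies only on the two number columns, two SPIF lists can in principle be joined by exact string comparison. The text here does not specify a matching procedure, so the Python sketch below shows one hypothetical approach. It looks up each row's Publication Number first and falls back to the Application Number, so a pre-grant publication and a later granted patent can pair up through their shared application. All other columns are ignored. The function name `match_lists` is illustrative. The sample numbers are borrowed from the format table, but the pairings are made up and do not describe real patent families.

```python
from typing import Dict, List, Optional, Tuple

APP = "Application Number - SPIF"
PUB = "Publication Number - SPIF"
Row = Dict[str, str]


def match_lists(left: List[Row], right: List[Row]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Pair rows of `left` with rows of `right` by exact SPIF number values.

    Returns (matched (left_index, right_index) pairs, unmatched left indices).
    Columns other than the two SPIF number columns are ignored.
    """
    by_pub: Dict[str, int] = {}
    by_app: Dict[str, int] = {}
    for j, row in enumerate(right):
        if row.get(PUB):
            by_pub.setdefault(row[PUB], j)  # keep the first row if a value repeats
        if row.get(APP):
            by_app.setdefault(row[APP], j)

    matched: List[Tuple[int, int]] = []
    unmatched: List[int] = []
    for i, row in enumerate(left):
        j: Optional[int] = by_pub.get(row.get(PUB) or "")
        if j is None:  # no publication match: fall back to the application number
            j = by_app.get(row.get(APP) or "")
        if j is None:
            unmatched.append(i)
        else:
            matched.append((i, j))
    return matched, unmatched


left = [
    {APP: "US13624395", PUB: "US20130014973A1"},
    {APP: "", PUB: "EP2551856B1"},
    {APP: "KR1020127027195", PUB: ""},
]
right = [
    {APP: "EP11759439", PUB: "EP2551856B1", "Title - SPIF": "ignored"},
    {APP: "US13624395", PUB: "US9123456B2"},
]
print(match_lists(left, right))  # ([(0, 1), (1, 0)], [2])
```

Exact comparison is only reliable because SPIF forbids whitespace, punctuation, and check digits. Applied to unnormalized lists, the same lookup would silently miss assets.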

Column Name	Description	Examples	Priority
Title - SPIF	The title of the patent. This makes human verification of a file easier.	High frequency cable, high frequency coil, and method for manufacturing high frequency cable	Recommended
Filing Date - SPIF	The filing date of the patent. This makes human verification and some machine verification simpler. Store it as an Excel date (not text), with the Excel date format set to "yyyy-mm-dd" (ISO 8601).	2012-09-21	Recommended
Country - SPIF	Two-letter country code. This should already be present in the numbers but may also be provided as a separate column. For EP assets, this is not the validation country.	EP US DE GB WO	Optional
Family identifiers (multiple column names are permitted, as shown in the examples)	It is often helpful to know that multiple assets belong to the same family. Family identifiers are not mandatory; however, if they are provided, the columns must be named according to the pattern “Family - <Type>”, where <Type> is INPADOC, DocDB, Internal, or a product-defined string, e.g. “Family - XYZTool”. These columns can be used to improve matching and/or spot common problems with the data.	Family - Internal: 2011-01; Family - INPADOC: 20110929WO2011118054A1; Family - DocDB: 44672642	Recommended that at least one family column be provided; multiple columns are permitted if properly named
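Some of the optional columns above lend themselves to simple consistency checks. The Python helpers below perform three of them. `family_types` extracts the `<Type>` from “Family - <Type>” column names. `country_consistent` checks that a Country - SPIF value is a two-letter code matching the country prefix of the number columns, so an EP asset carries EP rather than a validation country such as DE. `filing_date_ok` checks that a Filing Date - SPIF value was read as a date rather than as text. That last check assumes the sheet was read with a library such as openpyxl, which returns date-formatted Excel cells as Python `datetime` objects. All three helper names are hypothetical and are not part of SPIF.

```python
import datetime
import re
from typing import List, Optional

FAMILY_RE = re.compile(r"Family - (.+)")
COUNTRY_RE = re.compile(r"[A-Z]{2}")


def family_types(headers: List[str]) -> List[str]:
    """Return the <Type> of every column named 'Family - <Type>', in header order."""
    types: List[str] = []
    for header in headers:
        m = FAMILY_RE.fullmatch(header)
        if m:
            types.append(m.group(1))
    return types


def country_consistent(country: Optional[str], app: str = "", pub: str = "") -> bool:
    """True if the optional Country - SPIF value agrees with the number columns."""
    if not country:
        return True  # the column is optional, so a blank value is acceptable
    if not COUNTRY_RE.fullmatch(country):
        return False  # must be a two-letter code such as EP, US or WO
    # Each non-empty number column should start with the same two letters.
    return all(number[:2] == country for number in (app, pub) if number)


def filing_date_ok(value: object) -> bool:
    """True if the cell value is a real date (datetime is a subclass of date), not text."""
    return isinstance(value, datetime.date)


headers = ["Application Number - SPIF", "Family - INPADOC", "Family - XYZTool", "Notes"]
print(family_types(headers))                                  # ['INPADOC', 'XYZTool']
print(country_consistent("DE", "EP11759439", "EP2551856B1"))  # False
print(filing_date_ok(datetime.datetime(2012, 9, 21)))         # True
print(filing_date_ok("2012-09-21"))                           # False
```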

File format requirements and formatting

Microsoft Excel 2007+/OOXML (e.g. “.xlsx” file format)
1. Not CSV or any other format
2. Not “classic” Excel, e.g. “.xls”
3. Note: one goal is for the format to be both human-readable and machine-readable, which helps build trust in the results.
  1. We recognize that this is error-prone (people will get it wrong). The alternative is that people cannot check and correct their files, so they end up with a CSV version, an Excel version, and so on.
The sheet containing the data is called “Master Data - SPIF”
The first row has the names of the columns only (Row 1 in Excel) and starts in Column A
One row per asset only (Rows 2 and up)
1. To the extent practical, each asset should only appear once, e.g. do not list both the publication and the patent as two rows.
No merged cells anywhere in the Master Data Sheet
Recommended column order (any order is allowed, provided the mandatory columns are named exactly as shown):
1. Application Number - SPIF
2. Publication Number - SPIF
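The layout rules above are mechanical enough to check automatically. The Python sketch below is one possible checker, not an official SPIF tool, and both function names are hypothetical. `check_layout` works on plain Python values, so it can be tested without Excel. `read_workbook` uses the third-party openpyxl library (`load_workbook`, `Worksheet.iter_rows`, `Worksheet.merged_cells`) to extract those values from a real `.xlsx` file. The sketch makes two simplifying assumptions: only the `.xlsx` extension is accepted, and duplicates are checked only in the Publication Number column.

```python
from typing import List, Sequence, Tuple

SHEET_NAME = "Master Data - SPIF"
APP = "Application Number - SPIF"
PUB = "Publication Number - SPIF"


def check_layout(filename: str, sheet_names: Sequence[str],
                 rows: List[list], merged: Sequence[str]) -> List[str]:
    """Return layout problems for a SPIF workbook described as plain values."""
    problems: List[str] = []
    if not filename.lower().endswith(".xlsx"):  # assumption: only .xlsx accepted
        problems.append("file should be an Excel 2007+ (.xlsx) workbook")
    if SHEET_NAME not in sheet_names:
        problems.append(f"no sheet named {SHEET_NAME!r}")
        return problems
    header = list(rows[0]) if rows else []
    if not header or header[0] in (None, ""):
        problems.append("column names must be in row 1, starting in column A")
    for name in (APP, PUB):
        if name not in header:
            problems.append(f"required column {name!r} is missing")
    if merged:
        problems.append("merged cells are not allowed: " + ", ".join(merged))
    # One row per asset: flag publication numbers that appear more than once.
    if PUB in header:
        col = header.index(PUB)
        seen = set()
        for r, row in enumerate(rows[1:], start=2):  # Excel row numbers
            value = row[col] if col < len(row) else None
            if value and value in seen:
                problems.append(f"duplicate {PUB} {value!r} in row {r}")
            elif value:
                seen.add(value)
    return problems


def read_workbook(path: str) -> Tuple[List[str], List[list], List[str]]:
    """Read sheet names, the SPIF sheet's cell values and merged ranges (needs openpyxl)."""
    from openpyxl import load_workbook  # third-party; imported here so check_layout has no dependency

    wb = load_workbook(path)  # regular load (not read_only=True), since merged ranges are needed
    names = list(wb.sheetnames)
    if SHEET_NAME not in names:
        return names, [], []
    ws = wb[SHEET_NAME]
    rows = [list(r) for r in ws.iter_rows(min_row=1, min_col=1, values_only=True)]
    merged = [str(rng) for rng in ws.merged_cells.ranges]
    return names, rows, merged
```

A typical call is `check_layout(path, *read_workbook(path))`. An empty result means only that the layout rules pass. It says nothing about whether the numbers themselves are correct.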

Adoption and Industry Acceptance

SPIF launched with support from Cipher, IAM Marketplace, Richardson Oliver Insights, RPX, and Unified Patents, all of which announced support for the SPIF format in 2021. Additionally, notable members of the SPIF project include

The first commercial trial of SPIF occurred between RPX and Google in March of 2021 (announced during the keynote speech at IPBC Connect, March 22, 2021).^[5]


References

[1] "SPIF Standard Launched to Help Reliably Identify Patent Assets in M&A and Other Transactions". IPWatchdog.com | Patents & Patent Law. 2021-03-31. Retrieved 2021-04-28.
[2] "Another brick in the wall towards patents becoming fully-fledged assets | IAM". www.iam-media.com. Retrieved 2021-03-30.
[3] "SPIF". SPIF. Retrieved 2021-03-30.
[4] "FAQ". SPIF. Retrieved 2021-03-30.
[5] "AGENDA - IPBC Connect 2021". web.cvent.com. Retrieved 2021-04-28.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
