SPIF (patent identification format)

SPIF is a standard format for representing patent identification information. [1] SPIF stands for (Simple/Standard/Specific/Special) Patent Identification Format. SPIF restricts the use of non-standard characters in listings of patent publication and patent application numbers. At its core, SPIF is a simple asset identification and disambiguation solution. SPIF does not seek to guarantee the existence of an asset; rather, it seeks to uniquely identify assets.

SPIF was created with the following principles in mind:

The first version of SPIF was released January 29, 2021. SPIF was launched publicly on March 22, 2021. [2] SPIF is managed by the Linux Foundation's Joint Development Foundation Projects, LLC, SPIF Series, Project Name: SPIF Patent Identification Format. [3]

Problems of non-standard patent identification formats

Lists of patent numbers are important for facilitating a variety of patent management and monetization activities. For example

These lists can be difficult to work with because the tool or system used to generate the list may not be the same tool used to ingest it, and inconsistencies between tools cause mismatches of patent assets. As a result, in-house teams, outside counsel, and others spend significant effort, often repeatedly, cleaning a list and making it suitable for use across tools (e.g., Cipher, Derwent, Questel, Innography, Unified Portal, etc.).

Some samples of problematic entries:

Sample Input | Likely Meaning | Type (app or pub) | Comments
--- | --- | --- | ---
CNZL201480022610.X | CN201480022610 | app | Extra check digit and prefix
ZL03827150.8 | CN03827150 | app | Missing country code
GB3123328 | EP3123328A1 | pub | Wrong country code (there's no UK patent with that number)
CH,2420637 | EP2420637A2 | pub | Wrong country code (there's no Swiss patent with that number)
ZA 2015/000715 | ZA201500715 | app | Extra zero
WO002/001258 | WO02002/001258 | app | Missing year digits
GB2405228,319405.7 | GB2405228B8 | pub | Extra stuff, app number?
US2014214418 | US20140214418A1 | pub | Missing zero
US7123456BB | US7123456B2 | pub | Made up kind code “BB”
KR1341015B1 | KR101341015B1 | pub | Missing 10 prefix
20067013095,Korea | KR20067013095 | app | Country name spelled out as suffix
US2017163019A1 | WO02017163019A1 | app | There is a US app with that number, but they meant WIPO
GB2568035 | GB2568035B | pub | Ambiguous
GB2568035 | EP2568035B1 | pub |
US10229419 | US10/229,419 | app | Ambiguous
US10229419 | US10229419B2 | pub |

Importantly, SPIF is not a solution for identifying patent characteristics, priority dates, or live/dead status. SPIF is a simple asset identification and disambiguation solution.

Format details

The following defines the format of the required and optional columns in a SPIF-compatible list of patent assets. [4] The required columns must be named exactly as shown. The formats of the numbers for the supported patent offices and the PCT are shown in the Country Specific Guidance Section. These formats have been selected for broad compatibility with existing tools.

Column Name | Description | Examples | Priority
--- | --- | --- | ---
Application Number - SPIF | The patent-office assigned application/serial # including the country code and omitting spaces and check digits. There should be nothing else in the field (e.g., no leading/trailing whitespace). Reasons to include country code: Need it to look up regardless. Eliminates confusion between the actual serial number and designated country for EP. It eliminates the need to merge columns for matching and also solves issues with Excel reformatting the fields into numbers. Check digits should be eliminated. | US13624395; EP11759439; KR1020127027195; CN201180015433; JP2010549365; WO2011JP056984 | Required where there is no Publication Number. Recommended otherwise but can be blank.
Publication Number - SPIF | The patent office assigned patent number (when available), or publication number (when available), including the country code and the kind code. There should be no spaces, punctuation, or other characters. There should be nothing else in the field (e.g. no leading/trailing whitespace). Check digits should be eliminated. | US9123456B2; EP2551856B1; KR101487211B1; CN102822907B; JP4879373B2; US20130014973A1; EP2551856A1; KR1020127027195A; CN102822907A; JP2011118054A1; WO2011118054A1 (Blank is ok if no publication number is available) | Required when issued or published, otherwise blank
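The rules for the two required columns can be approximated with a small shape check. The following Python sketch is illustrative only: the regular expressions and the function name `check_row` are assumptions inferred from the examples above, not part of the SPIF specification, and the per-office formats in the Country Specific Guidance Section are stricter than this.

```python
import re

# Assumed shape, inferred only from the examples above: a two-letter
# office code (e.g. US, EP, WO) followed by letters and digits.
_APPLICATION_RE = re.compile(r"[A-Z]{2}[A-Z0-9]+")
# Assumed shape: office code, digits, then a kind code made of one
# letter and an optional digit (e.g. B2, A1, B).
_PUBLICATION_RE = re.compile(r"[A-Z]{2}[0-9]+[A-Z][0-9]?")


def check_row(application_number: str, publication_number: str) -> list[str]:
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    if not application_number and not publication_number:
        problems.append("row has neither an application nor a publication number")
    # fullmatch rejects spaces, punctuation, and leading/trailing whitespace.
    if application_number and not _APPLICATION_RE.fullmatch(application_number):
        problems.append(f"unexpected application number: {application_number!r}")
    if publication_number and not _PUBLICATION_RE.fullmatch(publication_number):
        problems.append(f"unexpected publication number: {publication_number!r}")
    return problems


if __name__ == "__main__":
    print(check_row("US13624395", "US9123456B2"))
    print(check_row("ZA 2015/000715", ""))
    print(check_row("", "US7123456BB"))
```

A check like this only confirms that a field looks like a SPIF entry; it cannot tell whether the asset exists or whether the wrong office code was used.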

Additional fields

Additional fields may be provided, and parsers supporting SPIF are not required to evaluate these columns when performing matching. However, if these column names are present, the data in the columns must conform to the provided descriptions.

Column Name | Description | Examples | Priority
--- | --- | --- | ---
Title - SPIF | The title of the patent. This makes human verification of a file easier. | High frequency cable, high frequency coil, and method for manufacturing high frequency cable | Recommended
Filing Date - SPIF | The filing date of the patent. This makes human verification and some machine verification simpler. Excel date (not text), set Excel Date format to: "yyyy-mm-dd" (ISO-8601) | 2012-09-21 | Recommended
Country - SPIF | Two-letter country code. This should be present in the numbers already but may be provided as a separate column. This is not the validation country for EP assets. | EP; US; DE; GB; WO | Optional
Family identifiers (multiple potential column names are permitted, as shown in the Examples column) | It is often helpful to be able to realize that multiple assets are all in the same family. Family identifiers are not mandatory; however, if they are provided the columns must be named according to the following pattern: “Family - <Type>” where <Type> is replaced with: INPADOC, DocDB, Internal, or a product-defined string, e.g. “Family - XYZTool”. This can be used to improve matching and/or spot common problems with the data. | Family - Internal: 2011-01; Family - INPADOC: 20110929WO2011118054A1; Family - DocDB: 44672642 | Recommend that at least one family column be provided; multiple columns are permitted if properly named
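The column naming rules lend themselves to a simple header check. The sketch below is a hypothetical helper, not part of SPIF: the function name `review_headers`, the returned dictionary keys, and the regular expression for the “Family - <Type>” pattern are all assumptions made for illustration.

```python
import re

# The two required column names, which must match exactly.
REQUIRED_COLUMNS = ("Application Number - SPIF", "Publication Number - SPIF")
# Assumed reading of the "Family - <Type>" pattern: the literal prefix
# followed by a non-empty type string.
FAMILY_PATTERN = re.compile(r"Family - \S.*")


def review_headers(headers):
    """Classify the header row of a candidate SPIF sheet."""
    missing = [name for name in REQUIRED_COLUMNS if name not in headers]
    family = [h for h in headers if FAMILY_PATTERN.fullmatch(h)]
    # Headers that look like family columns but do not follow the pattern.
    suspicious = [
        h for h in headers
        if h.strip().lower().startswith("family") and h not in family
    ]
    return {
        "missing_required": missing,
        "family_columns": family,
        "suspicious_family_columns": suspicious,
    }


if __name__ == "__main__":
    row_one = [
        "Application Number - SPIF",
        "Publication Number - SPIF",
        "Title - SPIF",
        "Family - INPADOC",
        "Family-Internal",
    ]
    print(review_headers(row_one))
```

Reporting near-miss names such as "Family-Internal" separately is a design choice of this sketch; a parser could equally ignore them, since optional columns need not be evaluated.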

File format requirements and formatting

  1. Microsoft Excel 2007+/OOXML (e.g. “.xlsx” file format)
    1. Not CSV, not anything else
    2. Not “classic” Excel, e.g. “.xls”
    3. Note, a goal is to have the format be human-readable and machine-readable. This will help build trust in the results.
      1. We recognize the problem of doing this (people will screw it up). The alternative is people can’t check and correct their files so they will end up with a CSV version and an Excel version and ...
  2. The sheet containing the data is called “Master Data - SPIF”
  3. The first row has the names of the columns only (Row 1 in Excel) and starts in Column A
  4. One row per asset only (Rows 2 and up)
    1. To the extent practical, each asset should only appear once, e.g. do not list both the publication and the patent as two rows.
  5. No merged cells anywhere in the Master Data Sheet
  6. Recommended column order (any order is allowed, provided the mandatory columns are named exactly):
    1. Application Number - SPIF
    2. Publication Number - SPIF
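Two of the file requirements above (an OOXML container and a sheet named “Master Data - SPIF”) can be checked without opening Excel, because an .xlsx file is a ZIP archive whose sheet names are listed in `xl/workbook.xml`. The following Python sketch uses only the standard library; the function name `has_spif_sheet` is hypothetical, and the check does not cover the other requirements, such as merged cells or header placement.

```python
import xml.etree.ElementTree as ET
import zipfile

SHEET_NAME = "Master Data - SPIF"
# Namespace used by SpreadsheetML workbook parts.
_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def has_spif_sheet(xlsx_file) -> bool:
    """Return True if the workbook contains the SPIF master data sheet.

    xlsx_file may be a path or a binary file-like object. Files that are
    not ZIP-based workbooks (e.g. classic .xls or CSV) yield False.
    """
    try:
        with zipfile.ZipFile(xlsx_file) as archive:
            workbook_xml = archive.read("xl/workbook.xml")
    except (zipfile.BadZipFile, KeyError):
        return False
    root = ET.fromstring(workbook_xml)
    names = [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", _NS)]
    return SHEET_NAME in names
```

A call such as `has_spif_sheet("portfolio.xlsx")` (the file name is a placeholder) would return False both for a workbook with the wrong sheet name and for a file that is not an OOXML workbook at all.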

Adoption and Industry Acceptance

SPIF launched with support from Cipher, IAM Marketplace, Richardson Oliver Insights, RPX, and Unified Patents, all of which announced support for the SPIF format in 2021. Additionally, notable members of the SPIF project include

The first commercial trial of SPIF occurred between RPX and Google in March of 2021 (announced during the keynote speech at IPBC Connect, March 22, 2021). [5]

References

  1. "SPIF Standard Launched to Help Reliably Identify Patent Assets in M&A and Other Transactions". IPWatchdog.com | Patents & Patent Law. 2021-03-31. Retrieved 2021-04-28.
  2. "Another brick in the wall towards patents becoming fully-fledged assets | IAM". www.iam-media.com. Retrieved 2021-03-30.
  3. "SPIF". SPIF. Retrieved 2021-03-30.
  4. "FAQ". SPIF. Retrieved 2021-03-30.
  5. "AGENDA - IPBC Connect 2021". web.cvent.com. Retrieved 2021-04-28.