This article was first published on IAM. Please contact us for permission to reprint.
Patent specifications and drawings are important carriers of technical information. Once they are published by the China National Intellectual Property Administration (CNIPA), they become publicly available knowledge resources. With the rapid development of generative artificial intelligence, a key question has emerged: May published patent specifications and drawings be used as training data for AI models? In China, what legal and compliance risks would such use entail?
The copyright nature of patent specifications and drawings and the requirements for publication
According to the Copyright Law, a work must possess originality and be expressed in a tangible form. Although the drafting of patent specifications and drawings is subject to formal requirements under the Patent Law and they must be examined by the patent administrative authority and published by way of announcement, in essence they are not administrative documents but technical literature created by the applicant. As long as they meet the constitutive elements of a work, they should be protected by the Copyright Law.
There remains room for creativity in the wording and sentence arrangement of a specification and in the design of its drawings, which can reflect the author’s individualized choices and the “beauty of science”. Therefore, patent specifications and drawings that meet the minimum standard of creativity may well constitute works within the meaning of the Copyright Law.
Judicial practice has already provided clear recognition in this regard. For example, in (2021) Jing 73 Min Zhong No. 4384, the Beijing Intellectual Property Court held that patent specifications and drawings do not fall within the category of administrative documents, and that examination and publication by the national patent administrative authority do not alter their nature. The court found that the drawings in the specification demonstrate originality in their structural arrangement and choice of lines and should be protected as graphic works. Similarly, in (2022) Shan Zhi Min Zhong No. 112, the Shaanxi High People’s Court emphasized that the patent specification possesses originality in the expression of the technical solution, the choice of wording and the arrangement of sentences, and constitutes a literary work; its copyright nature cannot be denied on the ground of formal requirements.
These cases indicate that courts generally recognize that patent specifications and drawings should be protected by copyright when they satisfy the requirement of originality, rather than being regarded merely as technical literature. However, the purpose of publishing patent specifications and drawings is to disseminate technical knowledge, and this function naturally limits the exercise of copyright. The acts of the national patent administrative authority in the course of examination and publication, as well as those of the public in reproducing and disseminating these documents to obtain technical information, are generally regarded as reasonable conduct, provided that they do not affect the normal exploitation of the works by the copyright holder or cause unreasonable prejudice. The academic community has also pointed out that the technical disclosure nature of patent specifications and drawings determines that the scope of their copyright protection should be appropriately narrowed so as to avoid impeding the circulation and utilization of knowledge.
Legal risks and compliance framework for AI training
In the development of artificial intelligence, the lawfulness of training data has always been one of the core issues. Including published patent specifications and drawings in training data reflects the reuse of technical resources, but also raises complex considerations under the Copyright Law. Where patent specifications and drawings possess originality and are thus protected as works under the Copyright Law, training activities may involve the right of reproduction or the right of communication through information networks, thereby giving rise to potential infringement risks.
How to strike a balance between copyright protection and technological innovation remains a challenge faced by both the judiciary and policymakers. The “three-step test” established under China’s Copyright Law requires that the use fall within the statutory circumstances, not conflict with the normal exploitation of the work, and not unreasonably prejudice the legitimate interests of the copyright holder. There is not yet a uniform answer as to whether AI training satisfies this test. Although training is essentially an analytical use of works rather than a direct substitute for the originals, the boundaries of “normal exploitation” and “unreasonable prejudice” still require further clarification by the courts.
In judicial practice, the Shanghai Intellectual Property Court, in the 2023 LoRA/Altman case, introduced the concept of “analytical use”, holding that the purpose of generative AI training is to analyze the ideas and patterns of expression in works rather than to reproduce the originals, and that such use may therefore be recognized as fair use. However, the court also stressed that AI service providers must take measures to prevent users from generating infringing content, and that a finding of fair use does not amount to an exemption from liability. This decision provides an important reference for AI training, while also highlighting the responsibilities that a compliance framework must address.
Meanwhile, policy-level regulations are being continuously refined. The Interim Measures for the Administration of Generative Artificial Intelligence Services explicitly require AI service providers to use training data from lawful sources during the training process, to respect intellectual property rights, to enhance the authenticity and diversity of data, and to establish transparent mechanisms to prevent the generation of infringing content. These provisions not only provide enterprises with a compliance framework, but also mean that even if judicial practice adopts a tolerant attitude toward training activities, enterprises must still assume compliance responsibilities at the institutional level.
In industry practice, some developers of AI patent tools have indicated that their products are built on large models (such as those of Google and OpenAI), and that the training data for those models cover a large amount of publicly available information. Whether patent specifications and drawings are included often depends on data collection strategies and copyright compliance measures. This shows that industry practice, while exploring what is technically possible, is also continuously adjusting its compliance pathways. Although copyright risks cannot be completely eliminated, a use that is limited to internal analysis and model optimization, and that neither affects the normal exploitation of patent specifications and drawings nor causes unreasonable prejudice to the rights holders, may meet the requirements of fair use. To this end, enterprises should adopt several measures in practice: keep records of where training data came from and of the compliance checks applied to them, establish internal review mechanisms, and clearly define the scope of data use and the allocation of responsibilities in cooperation agreements.
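As an illustration only, record-keeping of this kind could be sketched as a simple provenance log kept alongside the training corpus. The field names, the `make_record` and `to_json_line` helpers, and the sample values below are hypothetical and are not drawn from any regulation or from any particular company's practice; what a record should actually contain is a question for legal and compliance review.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class ProvenanceRecord:
    """One log entry per document added to a training corpus (hypothetical schema)."""
    document_id: str    # internal identifier, e.g. a publication number
    source: str         # where the text was obtained
    retrieved_at: str   # ISO 8601 timestamp in UTC
    sha256: str         # hash of the exact text that was ingested
    permitted_use: str  # scope of use agreed internally or by contract
    reviewed_by: str    # person or team that signed off on the inclusion


def make_record(document_id, source, text, permitted_use, reviewed_by, now=None):
    """Build a record; the hash ties the entry to the exact text ingested."""
    now = now or datetime.now(timezone.utc)
    return ProvenanceRecord(
        document_id=document_id,
        source=source,
        retrieved_at=now.isoformat(),
        sha256=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        permitted_use=permitted_use,
        reviewed_by=reviewed_by,
    )


def to_json_line(record):
    """Serialize a record as one JSON line, suitable for an append-only log file."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    record = make_record(
        document_id="EXAMPLE-0001",           # placeholder, not a real publication number
        source="published patent gazette",    # placeholder description
        text="Sample specification text.",
        permitted_use="internal analysis and model optimization",
        reviewed_by="compliance-team",
    )
    print(to_json_line(record))
```

Storing a hash rather than only a title makes it possible to show later which version of a document was used, and an append-only log of JSON lines is easy to hand to a reviewer or attach to a cooperation agreement.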
Conclusion and international comparison
Published patent specifications and drawings provide an important data resource for the development of artificial intelligence. In China, their use for AI training lies at the intersection of the Copyright Law, the doctrine of fair use, and emerging regulatory requirements. Going forward, the boundaries of fair use are likely to be clarified gradually through judicial decisions, while policies and industry norms establish a workable compliance framework, so that innovation is promoted and the order of intellectual property protection is maintained.
By way of international comparison, the third part of the U.S. Copyright Office’s report on copyright and artificial intelligence, issued in May 2025, focuses on the training of generative AI models. The United States relies on the doctrine of “fair use”, and in a number of cases involving search engines and data mining, courts have tended to find that analytical rather than substitutive use may constitute fair use.
The European Union, through the Copyright Directive (Directive (EU) 2019/790), has established a “text and data mining exception”, which allows research institutions and certain commercial entities to use protected works under specific conditions, while giving rights holders the option of a “reservation of rights” to exclude their works from data mining or AI training.
This divergence suggests that, in designing its own system, China may draw on both the case-law approach of the United States and the legislative model of the European Union, and, in light of its own needs for intellectual property protection and industrial development, explore an appropriate balancing mechanism.







