Read Pdf Metadata

This post in the example tutorial series describes how to read metadata from a PDF document using the Java iText library. For beginners to the concept of metadata, a short definition follows. When you create a PDF document, you might want people to be able to find information about it. You can do this by adding metadata to the document, such as the title, the subject, and the author.

Some image files contain metadata that you can read to determine features of the image. For example, a digital photograph might contain metadata that you can read to determine the make and model of the camera used to capture the image. With GDI+, you can read existing metadata, and you can also write new metadata to image files.

GDI+ stores an individual piece of metadata in a PropertyItem object. You can read the PropertyItems property of an Image object to retrieve all the metadata from a file. The PropertyItems property returns an array of PropertyItem objects.

A PropertyItem object has the following four properties: Id, Value, Len, and Type.

Id

A tag that identifies the metadata item. Some values that can be assigned to Id are shown in the following table.

Hexadecimal value    Description
0x0320               Image title
0x010F               Equipment manufacturer
0x0110               Equipment model
0x9003               ExifDTOriginal
0x829A               Exif exposure time
0x5090               Luminance table
0x5091               Chrominance table
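
As a small illustration of how these Id values might be used, the following Python sketch maps the tags listed above to their descriptions. The names PROPERTY_TAGS and describe_tag are hypothetical and are not part of GDI+; the IDs and descriptions are copied from the table.

```python
# Tag IDs and descriptions copied from the table above.
# PROPERTY_TAGS and describe_tag are illustrative names, not GDI+ APIs.
PROPERTY_TAGS = {
    0x0320: "Image title",
    0x010F: "Equipment manufacturer",
    0x0110: "Equipment model",
    0x9003: "ExifDTOriginal",
    0x829A: "Exif exposure time",
    0x5090: "Luminance table",
    0x5091: "Chrominance table",
}


def describe_tag(tag_id: int) -> str:
    """Return the tag as four hex digits followed by its description."""
    name = PROPERTY_TAGS.get(tag_id, "Unknown tag")
    return f"0x{tag_id:04X}: {name}"


print(describe_tag(0x010F))  # 0x010F: Equipment manufacturer
```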

Value

An array of values. The format of the values is determined by the Type property.

Len

The length (in bytes) of the array of values pointed to by the Value property.

Type

The data type of the values in the array pointed to by the Value property. The formats indicated by the Type property values are shown in the following table.

Numeric value    Description
1                A Byte
2                An array of Byte objects encoded as ASCII
3                A 16-bit integer
4                A 32-bit integer
5                An array of two Byte objects that represent a rational number
6                Not used
7                Undefined
8                Not used
9                SLong
10               SRational
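
To make the relationship between Type and Value more concrete, here is a minimal Python sketch that decodes a raw byte array according to the type codes above. It is not GDI+ code: the function name decode_property_value is hypothetical, and the sketch assumes little-endian byte order and the EXIF convention that a rational is a numerator/denominator pair of 32-bit integers (unsigned for type 5, signed for type 10).

```python
import struct


def decode_property_value(type_code: int, value: bytes):
    """Decode a PropertyItem-style Value according to its Type code.

    Assumptions (not stated in the text above): little-endian data, and
    rationals stored as (numerator, denominator) pairs of 32-bit integers.
    """
    if type_code == 1:    # array of bytes
        return list(value)
    if type_code == 2:    # ASCII text; drop the trailing NUL terminator
        return value.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    if type_code == 3:    # unsigned 16-bit integers
        return [v for (v,) in struct.iter_unpack("<H", value)]
    if type_code == 4:    # unsigned 32-bit integers
        return [v for (v,) in struct.iter_unpack("<I", value)]
    if type_code == 5:    # unsigned rationals as (numerator, denominator)
        return list(struct.iter_unpack("<II", value))
    if type_code == 9:    # SLong: signed 32-bit integers
        return [v for (v,) in struct.iter_unpack("<i", value)]
    if type_code == 10:   # SRational: signed (numerator, denominator) pairs
        return list(struct.iter_unpack("<ii", value))
    return value          # type 7 (undefined) and unused codes: raw bytes


# Made-up sample values, for illustration only.
print(decode_property_value(2, b"ExampleMaker\x00"))         # ExampleMaker
print(decode_property_value(5, struct.pack("<II", 1, 250)))  # [(1, 250)]
```

The numeric branches raise struct.error if the byte array is not a whole multiple of the element size, which is a reasonable signal that Len and Type disagree.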

Example

Description

The following code example reads and displays the seven pieces of metadata in the file FakePhoto.jpg. The second (index 1) property item in the list has Id 0x010F (equipment manufacturer) and Type 2 (ASCII-encoded byte array). The code example displays the value of that property item.

The code produces output similar to the following:

Code

Compiling the Code

The preceding example is designed for use with Windows Forms, and it requires PaintEventArgs e, which is a parameter of the Paint event handler. Handle the form's Paint event and paste this code into the Paint event handler. You must replace FakePhoto.jpg with an image name and path valid on your system and import the System.Drawing.Imaging namespace.

I'm trying to read metadata attached to arbitrary PDFs: title, author, subject, and keywords.

Is there a PHP library, preferably open-source, that can read PDF metadata? If so, or if there isn't, how would one use the library (or lack thereof) to extract the metadata?

To be clear, I'm not interested in creating or modifying PDFs or their metadata, and I don't care about the PDF bodies. I've looked at a number of libraries, including FPDF (which everyone seems to recommend), but it appears only to be for PDF creation, not metadata extraction.

user113292

6 Answers

The Zend framework includes Zend_Pdf, which makes this really easy:

Limitations: works only on unencrypted files smaller than 16 MB.

I don't know of any libraries, but a simple way to achieve the same result might be to fopen the file and parse everything that comes after the last 'endstream'.

Try opening a PDF in a text editor; a parser shouldn't take more than five lines.
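
A rough Python sketch of this idea follows. It is a naive illustration, not a real PDF parser: the function name read_pdf_info is hypothetical, and it assumes the file is unencrypted, the Info dictionary is stored uncompressed, and the values are plain literal strings in parentheses (hex strings, UTF-16 strings, and octal escapes are not handled). It looks after the last 'endstream' first and falls back to the whole file, since some PDF writers place the Info dictionary earlier.

```python
import re

FIELDS = ("Title", "Author", "Subject", "Keywords")


def read_pdf_info(data: bytes) -> dict:
    """Naively pull literal-string Info entries out of raw PDF bytes."""
    cut = data.rfind(b"endstream")
    tail = data[cut:] if cut != -1 else data
    info = {}
    for field in FIELDS:
        # /Field ( ... ) where the string may contain backslash escapes
        # but no unescaped nested parentheses.
        pattern = rb"/" + field.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, tail) or re.search(pattern, data)
        if match:
            # Simplification: turn "\x" into "x"; octal escapes are ignored.
            raw = re.sub(rb"\\(.)", rb"\1", match.group(1))
            # Latin-1 only approximates PDFDocEncoding.
            info[field] = raw.decode("latin-1")
    return info
```

To try it on a file, read the file in binary mode and pass the bytes, for example read_pdf_info(open(path, "rb").read()), where path is a PDF on your system. Fields that cannot be found are simply left out of the result.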

cbrandolino

PDF Parser does exactly what you want and it's pretty straightforward to use:

You can try it in the demo page.

Alessandro Cosentino

I was looking for the same thing today. And I came across a small PHP class over at http://de77.com/ that offers a quick and dirty solution. You can download the class directly. Output is UTF-8 encoded.

The creator says:

Here’s a PHP class I wrote which can be used to get title & author and a number of pages of any PDF file. It does not use any external application - just pure PHP.

For me, it works! All thanks go solely to the creator of the class ... well, maybe just a little bit of thanks to me too for finding the class ;)

maxpower9000

You may use PDFtk to extract the page count:
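
A minimal sketch of the PDFtk approach, written here in Python rather than PHP. It assumes pdftk is installed and on the PATH, and it relies on the NumberOfPages, InfoKey, and InfoValue lines that the pdftk dump_data operation prints. The function names parse_dump_data and read_with_pdftk are hypothetical.

```python
import re
import subprocess


def parse_dump_data(text: str) -> dict:
    """Parse the text printed by `pdftk <file> dump_data`."""
    result = {}
    pages = re.search(r"^NumberOfPages:[ \t]*(\d+)", text, flags=re.M)
    if pages:
        result["NumberOfPages"] = int(pages.group(1))
    # Info entries are printed as an InfoKey line followed by an InfoValue line.
    pairs = re.findall(
        r"^InfoKey:[ \t]*(.+)\nInfoValue:[ \t]*(.*)$", text, flags=re.M
    )
    for key, value in pairs:
        result[key.strip()] = value.strip()
    return result


def read_with_pdftk(path: str) -> dict:
    """Run pdftk on the given file and parse its report (requires pdftk)."""
    completed = subprocess.run(
        ["pdftk", path, "dump_data"],
        capture_output=True, text=True, check=True,
    )
    return parse_dump_data(completed.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be checked against saved pdftk output without having pdftk installed.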

If ImageMagick is available you may also use:

Execute in PHP via shell_exec():

maxpower9000