Read PDF Metadata

This post in the example tutorial series describes how to read metadata from a PDF document using the Java iText library. For readers who are new to the concept of metadata, a short definition is provided below to get started.

PDF metadata. When creating a PDF document, you might want to make sure that people can find information about it. You can accomplish this by adding metadata to the PDF document. The code shown below adds the title, the subject, the author, and its.

Some image files contain metadata that you can read to determine features of the image. For example, a digital photograph might contain metadata that you can read to determine the make and model of the camera used to capture the image. With GDI+, you can read existing metadata, and you can also write new metadata to image files.

GDI+ stores an individual piece of metadata in a PropertyItem object. You can read the PropertyItems property of an Image object to retrieve all the metadata from a file. The PropertyItems property returns an array of PropertyItem objects.

A PropertyItem object has the following four properties: Id, Value, Len, and Type.

Id

A tag that identifies the metadata item. Some values that can be assigned to Id are shown in the following table.

Hexadecimal value	Description
0x0320	Image title
0x010F	Equipment manufacturer
0x0110	Equipment model
0x9003	ExifDTOriginal
0x829A	Exif exposure time
0x5090	Luminance table
0x5091	Chrominance table

Value

An array of values. The format of the values is determined by the Type property.

Len

The length (in bytes) of the array of values pointed to by the Value property.

Type

The data type of the values in the array pointed to by the Value property. The formats indicated by the Type property values are shown in the following table.

Numeric value	Description
1	A `Byte`
2	An array of `Byte` objects encoded as ASCII
3	A 16-bit integer
4	A 32-bit integer
5	An array of two `Byte` objects that represent a rational number
6	Not used
7	Undefined
8	Not used
9	`SLong`
10	`SRational`
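
As a rough illustration of how Id, Type, and Value fit together, the following Python sketch decodes the raw bytes of one metadata item according to its Type and labels it using the Id table above. It is not GDI+ code. The names `TAG_NAMES`, `decode_property`, and `describe` are hypothetical, and the sketch assumes little-endian byte order and unsigned values for types 3 and 4. Rational types are left undecoded.

```python
import struct

# Tag names copied from the Id table above.
TAG_NAMES = {
    0x0320: "Image title",
    0x010F: "Equipment manufacturer",
    0x0110: "Equipment model",
    0x9003: "ExifDTOriginal",
    0x829A: "Exif exposure time",
    0x5090: "Luminance table",
    0x5091: "Chrominance table",
}


def decode_property(type_id, value):
    """Decode the raw bytes of one metadata item according to its Type.

    Assumption: multi-byte integers are little-endian and, for types
    3 and 4, unsigned. Types not handled here are returned as raw bytes.
    """
    if type_id == 1:  # individual bytes
        return list(value)
    if type_id == 2:  # ASCII text, cut at the first NUL terminator if present
        return value.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    if type_id == 3:  # 16-bit integers
        count = len(value) // 2
        return list(struct.unpack("<%dH" % count, value[: count * 2]))
    if type_id == 4:  # 32-bit integers
        count = len(value) // 4
        return list(struct.unpack("<%dI" % count, value[: count * 4]))
    if type_id == 9:  # SLong: signed 32-bit integers
        count = len(value) // 4
        return list(struct.unpack("<%di" % count, value[: count * 4]))
    return value


def describe(item_id, type_id, value):
    """Format one item as 'Id (name): decoded value'."""
    name = TAG_NAMES.get(item_id, "Unknown tag")
    return "0x%04X (%s): %r" % (item_id, name, decode_property(type_id, value))
```

For instance, calling `describe(0x010F, 2, b"Acme\x00")` with the made-up manufacturer string "Acme" treats the value as ASCII text because the Type is 2.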

Example

Description

The following code example reads and displays the seven pieces of metadata in the file FakePhoto.jpg. The second (index 1) property item in the list has Id 0x010F (equipment manufacturer) and Type 2 (ASCII-encoded byte array). The code example displays the value of that property item.

The code produces output similar to the following:

Code

Compiling the Code

The preceding example is designed for use with Windows Forms, and it requires PaintEventArgs e, which is a parameter of the Paint event handler. Handle the form's Paint event and paste this code into the Paint event handler. You must replace FakePhoto.jpg with an image name and path valid on your system and import the System.Drawing.Imaging namespace.

6 Answers

The Zend framework includes Zend_Pdf, which makes this really easy:

Limitations: Works only on unencrypted files smaller than 16 MB.

user113292

Don't know about libraries, but a simple way to achieve the same result might be fopening the file and parsing everything that comes after the last 'endstream'.

Try opening a PDF in a text editor; a parser shouldn't take more than five lines.
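
A minimal sketch of this idea, in Python rather than PHP, might look like the following. The names `INFO_ENTRY`, `read_info`, and `read_info_from_file` are hypothetical. The sketch assumes the Info dictionary is stored uncompressed and unencrypted after the last `endstream`, with simple literal strings (no nested or escaped parentheses, no hex or UTF-16 strings). Many real PDFs will not meet those assumptions.

```python
import re

# Matches entries such as "/Title (My Report)" written as simple literal strings.
# Assumption: values contain no nested or escaped parentheses.
INFO_ENTRY = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer)\s*\(([^()]*)\)"
)


def read_info(pdf_bytes):
    """Return Info-dictionary entries found after the last 'endstream'."""
    pos = pdf_bytes.rfind(b"endstream")
    # If there is no stream at all, search the whole file.
    tail = pdf_bytes if pos == -1 else pdf_bytes[pos:]
    return {
        key.decode("ascii"): value.decode("latin-1")
        for key, value in INFO_ENTRY.findall(tail)
    }


def read_info_from_file(path):
    """Convenience wrapper: read a file in binary mode and parse it."""
    with open(path, "rb") as handle:
        return read_info(handle.read())
```

The parsing is kept separate from the file access so it can be tried on an in-memory byte string first. For anything beyond simple files, a real PDF library is the safer choice.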

user113292

cbrandolino

PDF Parser does exactly what you want and it's pretty straightforward to use:

You can try it in the demo page.

Alessandro Cosentino

I was looking for the same thing today. And I came across a small PHP class over at http://de77.com/ that offers a quick and dirty solution. You can download the class directly. Output is UTF-8 encoded.

The creator says:

Here’s a PHP class I wrote which can be used to get title & author and a number of pages of any PDF file. It does not use any external application - just pure PHP.

For me, it works! All thanks go solely to the creator of the class ... well, maybe just a little bit of thanks to me too for finding the class ;)

maxpower9000

joan16v

ved uniyalas

You may use PDFtk to extract the page count:
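
A minimal sketch of the PDFtk approach, written here in Python rather than PHP. It assumes `pdftk` is installed and on the PATH, and relies on the `NumberOfPages:` line that `pdftk <file> dump_data` prints. The function names `parse_page_count` and `pdf_page_count` are hypothetical.

```python
import re
import subprocess


def parse_page_count(dump_data_output):
    """Extract the page count from the text printed by 'pdftk <file> dump_data'."""
    match = re.search(r"^NumberOfPages:\s*(\d+)\s*$", dump_data_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def pdf_page_count(path):
    """Run pdftk on the given file and return its page count, or None."""
    # Passing the arguments as a list avoids shell quoting problems in the path.
    result = subprocess.run(
        ["pdftk", path, "dump_data"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_page_count(result.stdout)
```

Keeping the parsing in its own function means it can be checked against captured output without having PDFtk installed.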

If ImageMagick is available you may also use:

Execute in PHP via shell_exec():