Mapping PDF fields
-
I’m having trouble understanding what needs to be done to import two pieces of metadata from the PDFs I’m uploading to the Media Library. Each PDF includes a title and an author (an example of a PDF metadata entry is shown below):
pdf:Title => The Role of Politics in the Appointment of Board Members at Sydney Organizing Committee of the Olympic Games (SOCOG)
pdf:Author => Stylianos Daskalakis, Dimitrios Gargalianos & Evangelos Albanidis
When I upload a PDF, the TITLE field, by default, contains the file name, which is useless to users when they execute a search. I want this TITLE field to be replaced by the pdf:Title.
I would also like the new “AUTHOR” field (created using the Custom Fields plugin) to contain the data from the pdf:Author metadata.
I’m assuming that I somehow have to map the IPTC fields so MLA knows what metadata to harvest from the PDFs I upload to my Media Library, but I’m at a loss to know how/where to introduce that mapping and what the code should look like.
Thanks for helping.
-
Thanks for your question, and thanks as well for including the “metadata entry” for the fields you’re interested in; very helpful. It looks like you found the “Attachment File Metadata” box MLA adds to the Media/Edit Media admin page. Note that the two fields you give as examples are not IPTC fields; they are PDF “Document Information Dictionary” properties.
You can use mapping rules to accomplish your goal. For the Title field:
- Go to the Settings/Media Library Assistant IPTC/EXIF tab.
- If you want to apply the rule to new items as they are uploaded, check the “Enable IPTC/EXIF Mapping when adding new media” and “Enable IPTC/EXIF Mapping when updating media metadata” boxes.
- Locate the “Title” rule entry in the table and click the “Edit” rollover action.
- In the “IPTC Value” dropdown control leave the default, “- None (select a value) -” value in place.
- In the “EXIF/Template Value” text box, enter:
template:([+pdf:Title+])
- In the “Priority” dropdown, select “EXIF”.
- In the “Existing Text” dropdown, select “Replace” to replace the WordPress default (file name) value.
- In the “Status” dropdown, select “Active”.
- Scroll down to the bottom of the screen and click “Update”.
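As a rough mental model of the Title rule above (not MLA's actual code), the sketch below treats an item's "Attachment File Metadata" as a Python dict. It assumes, as described later in this thread, that a template wrapped in parentheses produces nothing when the field is missing, and that an empty result leaves the existing title alone. The names expand_template and apply_title_rule and the sample values are hypothetical.

```python
import re

# One MLA-style substitution parameter, e.g. [+pdf:Title+] (illustrative pattern only).
PARAM = re.compile(r"\[\+([^+\]]+)\+\]")
# One parenthesized group, e.g. ([+pdf:Title+]); nested groups are not handled.
GROUP = re.compile(r"\(([^()]*)\)")


def expand_template(template: str, metadata: dict) -> str:
    """Expand a simple 'template:(...)' value against a metadata dict.

    Assumption: a group collapses to '' when any parameter inside it has
    no value. Only parenthesized groups are expanded in this sketch.
    """
    body = template.removeprefix("template:")

    def expand_group(match: re.Match) -> str:
        inner = match.group(1)
        names = PARAM.findall(inner)
        # Keys are looked up exactly, so field names are case sensitive.
        if any(not metadata.get(name) for name in names):
            return ""
        return PARAM.sub(lambda m: str(metadata[m.group(1)]), inner)

    return GROUP.sub(expand_group, body)


def apply_title_rule(current_title: str, metadata: dict, existing_text: str = "replace") -> str:
    """Return the title one item would get under the hypothetical Title rule."""
    new_value = expand_template("template:([+pdf:Title+])", metadata)
    if not new_value:
        return current_title  # no pdf:Title value: item left unchanged
    if existing_text == "keep" and current_title:
        return current_title  # "Keep" never overwrites the file-name title
    return new_value          # "Replace" swaps the file name for pdf:Title
```

Under these assumptions, "Keep" would never touch a title that WordPress has already filled with the file name, which is why the steps above select "Replace".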
Once you define the rule you can apply it to a single item, multiple items or all items:
- To map a single item, go to the Media/Assistant submenu and click the thumbnail of the item you want (or click the “Edit” rollover action) to get the Media/Edit Media screen. You can click the “Map IPTC/EXIF metadata” link to run your rules on this item, then look at the “Title” text box to inspect the results.
- To map two or more items, go to the Media/Assistant submenu and click the checkbox next to the items you want. Then, select “Edit” from the “Bulk Actions” dropdown above the checkboxes and click “Apply” to open the Bulk Edit area. Click the “Map IPTC/EXIF metadata” button to run your rule on the selected items.
- To map all of your items, stay on the Settings/Media Library Assistant IPTC/EXIF tab and click the “Execute” rollover action for the Title rule. This may take a while.
These methods will apply all IPTC/EXIF rules to the selected item(s). This won’t be a problem if you have only the one rule. The third method might be the best for your application. Because your template includes parentheses around the ([+pdf:Title+]) value, the rule will only affect PDF items that actually contain a pdf:Title value.
For your “AUTHOR” custom field (created by Advanced Custom Fields?):
- Navigate to the Settings/Media Library Assistant “Custom Fields” tab.
- Scroll down to the “Add New Custom Field Rule” area below the “Enable” checkboxes.
- From the “Name” dropdown control, select the name of your custom field, something like “author”. For ACF fields this will be the “Field Name”, not the “Field Label”.
- If you don’t see your field, click “Enter new field” to enter the custom field name manually. In the “Name” text box, enter your field name, e.g., “author”.
- From the “Data Source” dropdown list, select “– Template (see below) –”.
- In the “Meta/Template” text box, enter:
([+pdf:Author+])
- Click the “MLA Column” check box if you want to make the field available in the Media/Assistant submenu table. You can also click the “Quick Edit” and “Bulk Edit” check boxes to make the field available in the Media/Assistant submenu table Quick Edit and Bulk Edit areas if that’s useful for you.
- In the “Existing Text” dropdown, select “Replace” to replace existing values or “Keep” to retain any existing values.
- In the “Format” dropdown list, select “Native”.
- In the “Option:” dropdown list, select “Text”.
- Click the “Delete NULL Values” checkbox.
- Leave the “Status” set to “Active”.
- Click the “Add Rule” button to save your work.
You can apply the rule with any of the three methods outlined above; use the “Map Custom Field metadata” links for this rule.
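The custom field rule's options can be pictured with a small pure function. This is a hypothetical sketch, not MLA's implementation: it assumes "Keep" leaves a non-empty stored value alone, "Replace" overwrites it, and "Delete NULL Values" means an empty template result removes the field instead of storing an empty string. The function name apply_custom_field_rule and the "author" key are illustrative, and a plain dictionary lookup stands in for the ([+pdf:Author+]) template.

```python
def apply_custom_field_rule(
    custom_fields: dict,
    field_name: str,
    metadata: dict,
    existing_text: str = "replace",
    delete_null: bool = True,
) -> dict:
    """Return the updated custom fields for one item (no side effects)."""
    updated = dict(custom_fields)
    # Stand-in for the ([+pdf:Author+]) template: '' when pdf:Author is absent.
    value = metadata.get("pdf:Author", "")
    if existing_text == "keep" and updated.get(field_name):
        return updated                 # "Keep": an existing value wins
    if value:
        updated[field_name] = value    # store the harvested author
    elif delete_null:
        updated.pop(field_name, None)  # assumed meaning of "Delete NULL Values"
    else:
        updated[field_name] = ""       # assumed: otherwise an empty value is saved
    return updated
```

For example, apply_custom_field_rule({}, "author", {"pdf:Author": "Philip Barker"}) would return {"author": "Philip Barker"} under these assumptions.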
I hope that gives you the details you need to complete your application. I will leave this topic unresolved and hope to hear back from you soon.
Hi David. Your directions were perfect and easy to follow. Exactly what I was looking for. Only one problem: when I click the “Map IPTC/EXIF metadata” link on a PDF in my Library, the Title field still shows the file name, and the Author field remains unpopulated. I double-checked the “Attachment File Metadata” field at the bottom of the record to make sure there were entries for both Title and Author … there were. Can’t figure out where I went wrong.
Thanks for following the steps I outlined and for your positive feedback.
For the Title issue, carefully review your rule, especially these settings:
- “EXIF/Template Value”: template:([+pdf:Title+])
- “Existing Text”: Replace
- “Status”: Active
This rule is working on my system. You can also try sourcing the title from another field, such as template:([+xmp:title+]). Note that the field names are case sensitive, i.e., “Title” and “title” are different.
For the author issue, remember you must click the “Map Custom Field metadata” link to update this field. The IPTC/EXIF and Custom Field rules are separate (for mostly historical reasons).
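Because field names are matched exactly, a near-miss such as "pdf:title" for "pdf:Title" silently finds nothing. The hypothetical helper below (a diagnostic sketch, not an MLA feature) lists the exact spellings present in a dict of "Attachment File Metadata" values copied by hand; the sample values come from the second file listed later in this thread.

```python
def find_key_variants(metadata: dict, wanted: str) -> list:
    """List keys that equal `wanted` when case is ignored.

    Template lookups are exact, so this only helps spot spelling
    mismatches such as 'pdf:title' versus 'pdf:Title'.
    """
    target = wanted.casefold()
    return sorted(key for key in metadata if key.casefold() == target)


# Values copied from the SECOND FILE metadata listing in this thread.
sample = {
    "pdf:Title": "The Hidden Legacies of Moscow ’80: Changes in Ceremonial and Attitudes",
    "pdf:Author": "Philip Barker",
}

print(find_key_variants(sample, "pdf:title"))  # ['pdf:Title']
print(find_key_variants(sample, "xmp:title"))  # []
print("pdf:title" in sample)                   # False: an exact lookup misses
```

An empty list for a name you expect means the file simply does not provide that field, whatever its capitalization.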
Let me know if the above suggestions are helpful. Thanks.
Saw my error with respect to the Title … I forgot to include “template:” before ([+pdf:Title+]). This one works fine. Do I also need to include “template:” before the Author code?
Still having problems with the Author field. It does seem to be recording the Author field in the Media/Assistant table. But the Author field on the media page is not being completed. Is it possible that I didn’t set up this field properly? Perhaps I should delete that field and set it up again using MLA, if that’s possible?
Thanks again for your patience!
Thanks for your update with the good news on your progress.
No, you do not need template: in the custom field rule, which knows it’s a template because of the value in the “Data Source” dropdown list.
You asked “It does seem to be recording the Author field in the Media/Assistant table. But the Author field on the media page is not being completed. Is it possible that I didn’t set up this field properly?” Yes, it’s possible that you now have two separate “author” fields. When you set up the rule, did you find the field name in the dropdown list or did you have to enter it as a new field? If you go to the Media/Edit Media screen, do you see more than one “author” field? When you write “the media page”, do you mean the front end page for the item or something else? How does the author value end up on the media page; is it a function of your theme?
These are some of the clues you can look for. Any additional details you can provide will help me be more specific. Thanks.
I think there’s a conflict as there was already an “author” field which records the name of the person who posted a page, post, or some other WP element. So I deleted the second author field completely to start fresh.
What I would like to do is create 2 new metadata fields for this grouping of about 1500 PDFs. Each PDF is a magazine article. The metadata that I’m trying to capture from the PDFs is each article’s “Title” (which we’ve already been successful in doing), and each article’s “Author”. Perhaps rather than naming this field “Author” it might be better to name it “Article Author” to differentiate it from the existing WP author field.
Can MLA set up this new field for me, or do I still need to create the new field using the Custom Fields plugin?
Again thanks for sticking with me on this. I think we’re nearly there!
Thanks for your update on your progress. Yes, WordPress does define an “author” of its own and the confusion is understandable. Creating a new field with a different name, e.g., Article Author, is an excellent idea.
There are at least two ways you can define a new custom field with a simple text value that do not require the Advanced Custom Fields or another similar plugin:
- You can use WordPress itself. Navigate to the Media/Edit Media screen for one of your Media Library items. Make sure the Screen Options box for “Custom Fields” is checked. Scroll to the bottom of the Custom Fields box and find “Add New Custom Field:”. Click the “Enter new” link below the Name dropdown control. Enter your field name, e.g., “Article Author”, in the text box that shows up. Give your field a value, and click “Add Custom Field”.
- You can use MLA’s Custom Fields tab to define a new mapping rule, following the steps I gave in my first response. Step 4 tells you how to define a new field using this method. The field will be created when you execute the new rule for one or more items.
If you create the field with WordPress it will show up in the “Name” dropdown list, but it must have a value defined in one or more items to do so. If you select “Replace” for the “Existing Text” setting of your rule it won’t matter what you enter in the WordPress text box.
I hope that gets you the results you are seeking. I am marking this topic resolved, but please update it if you have any problems or further questions regarding MLA’s custom field support. Thanks for working with me on this topic.
Finally got it all working! Very much appreciate your help.
One further question: is it possible to add another column to the Media Library Assistant page to display my “categories” so that I can do a Bulk edit on them at the same time I’m double checking my entries? “Categories” is not an option in the “Screen Options” pull down at the top of the page.
Thanks!
Thanks for your update with the good news regarding your progress.
Displaying your “Categories” depends on where they are stored. If you mean the WordPress Categories taxonomy the answer is simple. Navigate to the Settings/Media Library Assistant General tab and scroll down to the Taxonomy Support section. There you can check the appropriate boxes to add any of your taxonomies to the Media/Assistant submenu table.
If your “categories” are stored somewhere else, give me more details so I can be specifically helpful. Thanks.
David, everything was going fine for a while. Now, not all the metadata is being read by MLA. I know the Adobe Acrobat Properties metadata is being stored because I can view it. But for some reason many property fields in quite a few files are being ignored by MLA. For example, below is metadata for two sequentially numbered PDF files. You can see that a lot of metadata fields are not being read for the first file. The second file shows all the fields being read by MLA, including the crucial “pdf:Author” and “pdf:Title” fields. Is there any way to force MLA to read all the metadata fields? Thanks!
FIRST FILE
post_id => 5356
pdf:PDF_Version => PDF-1.6
pdf:PDF_VersionNumber => 1.6
pdf:First => 121
pdf:N => 17
pdf:Type => XObject
pdf:BitsPerComponent => 8
pdf:ColorSpace => DeviceGray
pdf:Height => 1656
pdf:Interpolate => true
pdf:Mask => 23 0 R
pdf:Subtype => Image
pdf:Width => 2339
pdf:ImageMask => true
*************
SECOND FILE
post_id => 5357
pdf:PDF_Version => PDF-1.6
pdf:PDF_VersionNumber => 1.6
pdf:Author => Philip Barker
pdf:CreationDate => 2014-10-28 22:08:37
pdf:Creator => Canon iR-ADV C5250
pdf:Keywords => 2010 July Vol. 18 No.2 p. 32-37
pdf:ModDate => 2017-01-18 15:23:56
pdf:Producer => ABBYY FineReader 11
pdf:Subject => Journal of Olympic History
pdf:Title => The Hidden Legacies of Moscow ’80: Changes in Ceremonial and Attitudes
pdf:BitsPerComponent => 8
pdf:ColorSpace => DeviceGray
pdf:Height => 2341
pdf:Interpolate => true
pdf:Mask => 43 0 R
pdf:Subtype => Image
pdf:Type => XObject
pdf:Width => 1630
Thanks for your report; I regret the trouble you are having.
If you can post a link to one or more of the files that are not being processed correctly I can investigate further. You can also contact me at my web site if sending the file(s) by email would be easier. Thanks for any examples you can provide.
Thanks David. Here are links to two PDFs which are not being processed correctly:
https://isoh.org/wp-content/uploads/JOH-Archives/JOHv18n3g.pdf
https://isoh.org/wp-content/uploads/JOH-Archives/JOHv18n2j.pdf
For comparison, here are links to two PDFs which were correctly processed:
https://isoh.org/wp-content/uploads/JOH-Archives/JOHv18n1k.pdf
https://isoh.org/wp-content/uploads/JOH-Archives/JOHv18n1i.pdf
Thanks for the links to good and bad documents. I have downloaded them and run some quick tests.
The bad documents have values for Author, etc. embedded in XMP metadata blocks, but these blocks are not referenced in the “Root Dictionary” of the document. I will have to investigate further to see how MLA can access and extract this data. I will post an update here when I have progress to report. Thanks for your help and your patience.
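One generic way to confirm that a file carries such orphaned XMP data is to scan the raw bytes for x:xmpmeta elements and read the Dublin Core dc:title and dc:creator entries, as sketched below with only the Python standard library. This is an illustration of packet scanning, not how MLA works or will be fixed; it assumes the packet is stored uncompressed in the file (a packet inside a compressed stream would be missed), and the function name scan_xmp is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# One complete x:xmpmeta element anywhere in the raw file bytes.
XMP_BLOCK = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def scan_xmp(raw: bytes) -> list:
    """Return the dc:title and dc:creator values of every XMP packet in raw."""
    results = []
    for block in XMP_BLOCK.findall(raw):
        try:
            root = ET.fromstring(block)
        except ET.ParseError:
            continue  # skip damaged or partial packets
        titles = [li.text or "" for li in root.iterfind(".//dc:title//rdf:li", NS)]
        creators = [li.text or "" for li in root.iterfind(".//dc:creator//rdf:li", NS)]
        results.append({"title": titles[0] if titles else None, "creators": creators})
    return results
```

To try it on a downloaded file, pass open(path, "rb").read() to scan_xmp. A result with a title and creators for a file whose MLA metadata shows no pdf:Title would be consistent with the explanation above.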
Hi David. For some unknown reason, MLA seems to be reading the PDF metadata with no problem now! If you made any changes, I guess they worked! Thanks for all your help on this. MLA saved me so much time.
You wrote “For some unknown reason”. Indeed; I have not changed anything and I don’t believe in spontaneous fixes.
If you go to the Media/Edit Media screen for one of the two “bad” files you posted a link to, does it now show more metadata than it did a few days ago?
I have reproduced the problem in my copies of your two files and I am working on a more deliberate fix. What an interesting issue…