Feeds: Consume everything
Date: 2 August 2016
Authored by: Toby Wild

This post comes in two parts.  First, a bit of a blog about why Feeds is great, and second, a tutorial on how to tamper with and import a CSV file.

One of my favourite Drupal modules of all time is the Feeds module.

It’s not only a brilliant module during the development phase of a website as a powerful content importer to take the pain out of manual migration, but it’s also a great on-going live site module due to its ability to consume content and create nodes and users on a schedule.

Its basic functionality is that it can import structured data and create nodes, users or taxonomy terms.

This means you can take a CSV or XML file, or point to an RSS feed, and use Feeds to create, update and delete nodes or users on your site.

And once a piece of content is a Node, you have full control over every aspect of it, including using other modules to manipulate the data, and Views to display it, or even export it back out.

More than once I have used Drupal Feeds and Views to import an Excel file, mess around with the data, and then export another Excel file, only to throw away the Drupal installation because it was no longer needed.

Feeds can be used for one-off imports of 10,000 news items you’re importing from an old website, for hourly imports of JSON data to keep your website up to date with external systems, or to sync users between two sites, ensuring that only new records are created and old users aren’t touched.

It even has a robust API and hook system inside Drupal which allows you to create and manage your feed importers without even using the friendly (but slightly complicated) GUI that comes with it.
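For instance, here’s a rough sketch of kicking an import off from code in Drupal 7 (Feeds 7.x-2.x assumed); the importer machine name and the URL are made up for illustration, and the importer is assumed to be a standalone one using the HTTP fetcher:

  <?php
  // Load the standalone source for the 'my_importer' importer (name is illustrative).
  $source = feeds_source('my_importer');
  // Point the HTTP fetcher at a remote file.
  $source->addConfig(array(
    'FeedsHTTPFetcher' => array('source' => 'http://example.com/data.csv'),
  ));
  $source->save();
  // Keep running until the batch reports it is complete.
  while (FEEDS_BATCH_COMPLETE != $source->import());

From there, hooks such as hook_feeds_presave() and hook_feeds_after_import() let you adjust each item just before it is saved, or react once a run has finished.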

But the truly great thing about Feeds is a few of its companion modules, namely Feeds Tamper and Feeds extensible parsers.

Feeds extensible parsers (Feeds Ex) adds some additional parsers to the basic Feeds.  It includes the ability to import both XML and JSON structured files.

Feeds Tamper is a must-have addition.  It allows you to ‘tamper’ with the data between the source and the database.  This is great if you need to automate an import and the output from a third-party system isn’t quite right.  You can use re-writes, find/replace, convert URLs, combine fields and much more.

The one feature I use from Tamper the most is Explode, which allows you to take a comma separated list, such as a list of taxonomy terms, and import them as multiple separate terms, rather than one long string.
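For example (values made up), Explode turns one cell into several values:

  Before Explode:  "Cats, Dogs, Birds"  ->  one long term
  After Explode:   "Cats", "Dogs", "Birds"  ->  three separate terms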

And now, the tutorial on how to actually use Feeds…

What you need

Aside from a Drupal installation, and a source of structured data, at the very least, you will need:

  • Feeds, the core module you need
  • CTools, which I’m sure you have already
  • Job Scheduler, which allows you to create the repeating imports

While those three are the core of what you WILL need, for this example I will also be using Feeds Tamper.

How you can put it all together

In this example, we will go through how to take a simple CSV file with a few columns, and import its rows into the website as Nodes.

In order to complete this task, we need three things.

  1. Structured Data saved as a CSV file
  2. A content type for the data to be inserted into
  3. A Feed Importer

Here is some fake data that should work well for this example.

*Note: I have created two Basic Pages in the site, one called Website and one called Test Item.

  • 1 heading row, which will allow us to map which column goes into which node field.
  • 8 data rows, which will turn into 8 nodes.
  • 3 special fields: taxonomy terms, Image URL and Entity Reference.
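The original screenshot of the file isn’t reproduced here, but a CSV along these lines fits the description; the column headings are illustrative and only need to match the source names used in the mapping later, and the entity column points at the two Basic Pages noted above (the real file carries on to eight rows):

  guid,title,tax,image,entity
  GUID001,First article,"News, Updates",http://example.com/one.jpg,Website
  GUID002,Second article,Announcements,http://example.com/two.jpg,Test Item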

The first column is our GUID, our unique ID.  Every row you import, across all your Feeds importers, requires a unique GUID.  This allows Feeds to track which content can be “Updated” or “Replaced” without worrying about Drupal’s inbuilt Node IDs.

Now, we need the content type that we want to import this content into.

I created a new content type with the following fields:

Label       Machine name      Field type          Settings
Title       title             Text
Taxonomy    field_taxonomy    Term Reference      Unlimited values
Image       field_image       Image
Reference   field_reference   Entity Reference    Node: Basic Page

Note how the field machine names don’t match the Excel file’s column headings; that’s fine.  The actual mapping between the fields and the source data is done when we set up the Feed.

Once you have your modules in the file system, go ahead and activate:

  • Feeds
  • Feeds Admin UI
  • Feeds Tamper
  • Feeds Tamper Admin UI

You will be prompted to enable Job Scheduler, which is fine.  You will notice we skipped Feeds Import and Feeds News; those are just two example modules.
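If you prefer the command line, the Drush equivalent would look something like this (project and module machine names assumed from the standard drupal.org packaging):

  drush dl feeds feeds_tamper ctools job_scheduler
  drush en feeds feeds_ui feeds_tamper feeds_tamper_ui -y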

Open up [site:url]/admin/structure/feeds and click “Add Importer”

Provide a name and description and click “Create”

Now, there are quite a number of configuration options in a Feed, so I will break down each field below, and how I configured it for this one-off data import.

Feeds Importer Fields

The settings are held in groups, so I will break each section down into its own group.

Basic Settings

Click on ‘Settings’ under the Basic Settings heading on the left.

Name and Description are self-explanatory.

Attach to content type should pretty much always be on ‘Use standalone form’.  Attaching a Feed to a content type would only be useful if you were creating a node for every news website you read, and the Feed could be that site’s RSS feed.

Periodic Import can be used if you want an import to be run on a recurring schedule.  It will only run as often as your Cron runs.  This example is a one-off import, so I will turn this off.

Import on submission is pretty obvious and should be checked, unless you’re doing some funky API, Cron or Job Scheduler shenanigans.

Process in background allows you to import HUGE numbers of nodes, processed by Drupal’s Batch API.  I’ve never had much luck with this; instead, you’re better off breaking your Excel file down into multiple files with a few thousand rows each and running multiple imports.
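If you do need to split a file, a quick throwaway script along these lines will do it; the file names and chunk size here are made up for illustration:

  <?php
  // Split a large CSV into chunks of 5,000 data rows each,
  // repeating the header row at the top of every chunk.
  $rowsPerChunk = 5000;
  $in = fopen('big-export.csv', 'r');
  $header = fgetcsv($in);
  $out = NULL;
  $chunk = 0;
  $count = 0;
  while (($row = fgetcsv($in)) !== FALSE) {
    // Start a new chunk on the first data row, or when the current chunk is full.
    if ($out === NULL || $count >= $rowsPerChunk) {
      if ($out) {
        fclose($out);
      }
      $chunk++;
      $count = 0;
      $out = fopen('import-part-' . $chunk . '.csv', 'w');
      fputcsv($out, $header);
    }
    fputcsv($out, $row);
    $count++;
  }
  if ($out) {
    fclose($out);
  }
  fclose($in);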

Don’t forget to Save between each settings page.

Fetcher

Here we can set whether we will be importing this content from a file uploaded manually by a person, or from a web URL.

For the most part, and in this example, we will be providing a File Upload.

Depending on your choice, you have different settings.

File Upload Settings allows you to set where your file will be temporarily uploaded to, and what formats it allows.

HTTP Fetcher Settings allows you to configure a few options regarding RSS Feeds.

Parser

Select a Parser allows you to choose how your file will be dissected, or “parsed”, by the importer.  In this example, we will be providing a CSV file, so we need the CSV Parser.  But this is where you would select the XPath XML parser, for example, if you had installed the Feeds extensible parsers module.

CSV Parser Settings allows you to select your delimiter, which is the character that separates the fields of data.  Since we are using CSV, our delimiter is a Comma.

No Headers should be left unticked, as we will be providing headings.

File encoding I usually leave as UTF-8; however, if you have encoding troubles, try Auto instead.

Processor

Select a processor allows you to choose what you are importing, be it Nodes, Terms or Users.

Node Processor Settings is where the first set of important settings is held.

First up, select the ‘Bundle’ or ‘Content Type’ we will be importing into.

Insert new nodes allows you to either import all your items, or skip new items and only touch those whose unique ID (GUID) has been imported previously.

Update existing nodes allows you to overwrite existing imported data.  Handy if you are periodically updating the same nodes.  In this instance, it’s a one-off import, so I leave it on “Do not update existing nodes”.

If Skip hash check is unchecked, then trying to import the same file twice will do nothing the second time, because the data hasn’t changed.  It doesn’t take into account whether you have tweaked the importer or a Tamper setting.  That is handy if you are pulling content from a remote URL and only want to bother updating the nodes when something has changed.  However, since no import ever runs smoothly, we want to be able to import the same file a few times while we tweak the importer, so I usually tick this to skip the hash check.

Text format governs the default format for any text areas that you import.

Action to take when previously imported nodes are missing in the feed means that if you have previously imported nodes with GUID001, GUID002 and GUID003, and you now import only GUID001 and GUID003, this setting decides what Feeds does with GUID002.

This can be handy if items are being removed from your master import list: anything that was previously imported but isn’t in the most recent list can be deleted.

Author is which Drupal user should be assigned as the author of the nodes.  I usually set this to my user account.

Authorize checks that the selected user has the correct permission to create the content.  Assuming the Author you have chosen is an Admin, it’s best to uncheck this, as it is one less function the import process has to run.

Expire nodes sets a timer for when the nodes should be automatically deleted.

Mapping

This is the final settings button, and the one where we assign which Excel column will match into which Node field.

The text in the Source column needs to match, letter for letter, the column name in your Excel file.

The target is where your field data should be saved.

Some Target fields have multiple sub-fields.  For example, if you are importing an Image, you have a chance to include the Alt text field.  For the file itself, point it to the :uri target, as this will take any accessible URL, and download the file into Drupal.

As mentioned before, we need a unique identifier for each row, and this has to be flagged in the mapping.  In this example I have clicked the cog for the guid row and selected it as my unique identifier.  And for taxonomy, I clicked its cog and ticked ‘Auto create’ so that any missing terms are automatically generated.

And the target I selected for the Entity Reference field is a simple Entity Label lookup, meaning it will look for a simple text match between what I provide and what it can assign.
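Pulled together, the mapping for this example ends up looking something like this (the source names on the left are the illustrative column headings from earlier):

  guid    ->  GUID (flagged as unique)
  title   ->  Title
  tax     ->  Taxonomy: field_taxonomy (‘Auto create’ ticked)
  image   ->  Image: URI (field_image)
  entity  ->  Reference: field_reference (Entity Label lookup)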

Now, we are almost done, except we need to Tamper with the data before it’s imported.

Click on the Tamper button in the top right of the page.

This allows us to alter the data in various different ways.

The GUID and Title fields are fine, they are text into text.

The Taxonomy field, however, if we import it as is, will end up with the entire Excel cell imported as a single term.

Instead, what we need to do is Explode the list so we get an array of multiple terms.

So for the entry below tax -> Taxonomy, click ‘Add Plugin’.

Here you can see the list of various plugins you can add to tamper with the data.

Select Explode.  The default options it provides are suitable for a standard comma separated list, so click Add.

Image -> Image: URI is also absolutely fine.  Feeds is smart enough that if given a URL, it will fetch the file and download it.  This goes for Images and Documents.

Finally

We have our Excel file, saved as a .CSV file.

We have our Content Type with our fields.

We have our Feeds importer setup with the field mapping.

And finally, we have our Feeds Tamper setup to alter the data between the CSV file and Drupal.

Importing

Now, the final step is to put them all together.

Open up [site:url]/import

Select the importer we created.

Use the ‘Choose File’ button and select your CSV file and click Import.

A progress bar will fly along, and then your content will be imported.

If anything goes wrong, you can select the ‘Delete Items’ button in the top right, then alter the various settings and try again.