The National Institute of Standards and Technology (NIST) is one of the nation's oldest physical science laboratories, with the largest number of physics Nobel Prizes (4) of any U.S. government laboratory. It’s a combination government agency and research university, with all the complexity that entails: numerous authors, lots of content of all quality levels, lots of rogue systems, etc.
NIST has been planning to move nist.gov from a proprietary CMS to Drupal for about a year. A large part of that was figuring out how to move the large amount of content that exists in the current system to Drupal. Our migration as currently configured is pulling:
- 50,000+ nodes
- 20,000 images
- 16,000 uploaded docs
- 4,600 Staff/Users
- 3 systems-of-record
This session will go over our path from staring at the mountain of content, to starting to tame it, to making sense of it. We’ll cover the basics of the Migrate framework and go over some techniques we implemented such as:
- database and JSON sources
- maintaining source data relationships
- fixing broken links
- high-water marks to import just deltas
- using Migrate to pull content from outside systems-of-record
- massaging data before insert
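To make the high-water mark idea from the list above concrete, here is a minimal, language-neutral sketch in Python. It is not the Migrate API and not our project code; the `import_deltas` function and the `changed` field are hypothetical names chosen for illustration. The idea is to remember the largest "last changed" value seen so far and, on the next run, import only rows newer than that.

```python
def import_deltas(rows, highwater):
    """Return the rows changed after `highwater`, plus the new mark.

    `rows` is any iterable of dicts with a numeric `changed` field
    (a hypothetical stand-in for a source "last modified" timestamp).
    """
    fresh = [row for row in rows if row["changed"] > highwater]
    # If nothing is new, keep the old mark so it never moves backwards.
    new_mark = max((row["changed"] for row in fresh), default=highwater)
    return fresh, new_mark


source = [
    {"id": 1, "changed": 100},
    {"id": 2, "changed": 250},
    {"id": 3, "changed": 300},
]

# First run: the mark starts at 0, so every row is imported.
first_batch, mark = import_deltas(source, 0)

# Second run with no source changes: nothing is re-imported.
second_batch, mark = import_deltas(source, mark)

# A row is edited upstream; only that row comes through on the next run.
source[0]["changed"] = 400
third_batch, mark = import_deltas(source, mark)
```

The trade-off is that the source has to expose a reliable, monotonically increasing change marker; rows whose marker never updates are skipped after the first import.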
We’ll have some code examples and demos to show and will field questions, with the caveat that Migrate is incredibly deep: we are by no means experts, just some folks who grokked it enough to get our project rolling.
Presentation: https://github.com/johnnykrisma/migrate-presentation
Source Code: https://github.com/johnnykrisma/nist-migrate-package