National Academies Press Presentation
Discovery, Exploration, Distillation: Implicit Connections, Extractive Power

Michael Jensen
Director of Publishing Technologies, the National Academies Press
Director of Web Communications, the National Academies

Presentational Plan

  • Overview of NAP -- why we do what we do
  • Description/Demonstration of Knowledge Discovery/Exploration/Distillation (KD/E/D) tools in NAP's controlled environment
  • Discussion of Possible Applications
  • Summary of KD/E/D Implications
National Academies Press

  • Publisher for National Academy of Sciences, National Academy of Engineering, Institute of Medicine, National Research Council
  • Publishes ~200 reports/year advising the nation on science, engineering, technology, medicine, and health policy
  • > 3,200 reports fully, freely browsable online
    (> 550,000 pages available, each printable)
  • > 12,000,000 visitors/year
    (~ 250,000 per week, 2003)
  • 110,000,000 page views/year (65 million Openbook pages, 45 million other)
    (~ 2,000,000 per week, 2003)
  • NAP has been digitizing publications for free online dissemination since 1996 (page images, page-based HTML, PDFs, TEI XML)
Overall Missions of the National Academies Press

Dual, Competing Missions:

Dissemination: generate the most influence and impact by getting reports into the most hands and minds

Cost Recovery/Self-Sustainability: NAP is required to be self-sustaining through sales of content

These two missions are drivers for all NAP activities.
Sample Web Postings from Sept 6 through Sept 10, 2004

How does this pertain to current discussion?
  • Search, exploration, and navigation tools for a diverse resource set need more than precision
  • A visual demonstration of what users are likely to expect in five or so years
  • Examples of extractive power (distillations, content reuse, etc.) as "brain food" while you continue to develop your applications
Knowledge Discovery Tools and Presentational Models

  • Discovery Engine, an integrated search results set with intrinsic exploration tools
  • Find More Like Anything (in this case, the book "Meeting the Energy Needs of Future Warriors"), designed to let people "home in" on resources. Uses algorithmically extracted key terms from any chapter, document, or book
  • Research Dashboard (for the same book), a means of targeted exploration that reuses resources to build targeted searches of the National Academies Press and Google, with other search engines to follow soon.
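
To make the idea of "algorithmically extracted key terms" concrete, here is a minimal sketch of one common approach: rank a chapter's words by TF-IDF against the rest of the collection, then reuse the top terms as a search query. This is an illustration only, not NAP's actual algorithm. The function names, the tiny stop-word list, and the use of a Google search URL as the target are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "for", "is", "on", "with", "that", "by", "are"}

def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def key_terms(target, corpus, top_n=5):
    """Rank the terms of `target` by TF-IDF against `corpus` (a list of texts)."""
    docs = [set(tokenize(doc)) for doc in corpus]
    counts = Counter(tokenize(target))
    scores = {}
    for term, tf in counts.items():
        df = sum(1 for doc in docs if term in doc)  # documents containing the term
        idf = math.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed inverse document frequency
        scores[term] = tf * idf
    # Highest score first; ties are broken alphabetically so results are deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

def search_url(terms, base="https://www.google.com/search"):
    """Turn key terms into a query URL (the base URL is an assumption)."""
    return base + "?" + urlencode({"q": " ".join(terms)})
```

Calling `key_terms(chapter_text, all_other_chapters)` and passing the result to `search_url` gives the kind of "find more like this" link that a dashboard could offer for any chapter, document, or book.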
Knowledge Discovery Tools and Presentational Models, II

  • Reference Finder, a Web form into which one can drop a rough draft or an article to "find more like" it.
  • Within-Pub Options, to point to appropriate pages of a document
  • Skim View of any chapter: identifies the most relevant sequential sentences on every page and displays them for the user.
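
A skim view of this kind can be approximated with very little machinery. The sketch below is an assumption about how such a feature might work, not a description of NAP's implementation: it splits a page into sentences, scores each sentence by how many key terms it mentions, and returns the best run of consecutive sentences. The names `split_sentences` and `skim_page`, and the default window of two sentences, are hypothetical.

```python
import re

def split_sentences(page_text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]

def skim_page(page_text, key_terms, window=2):
    """Return the `window` consecutive sentences that mention the most key terms."""
    sentences = split_sentences(page_text)
    if len(sentences) <= window:
        return sentences
    terms = {t.lower() for t in key_terms}
    # Score each sentence by the number of its words that are key terms.
    scores = [
        sum(1 for w in re.findall(r"[a-z]+", s.lower()) if w in terms)
        for s in sentences
    ]
    # Pick the window with the highest total score; earlier windows win ties.
    starts = range(len(sentences) - window + 1)
    best = max(starts, key=lambda i: (sum(scores[i:i + window]), -i))
    return sentences[best:best + window]
```

Running `skim_page` over every page of a chapter and concatenating the results yields a short, readable digest. The same key terms could also drive a Reference Finder: extract terms from a pasted draft, then search the collection for them.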
Knowledge Discovery Tools and Presentational Models, III
(Innovations in Alpha and Beta Testing)

  • Search must be more than search
  • Precision has value, but can be limiting
  • Knowledge discovery (allowing serendipitous discovery) of resources must be considered
  • Once content is found, enable exploration via abstract, skim, or targeted search
  • Every document should be a "portal" to existing similar resources
  • Skim options are valuable
  • Many possibilities for Knowledge Discovery, Knowledge Exploration, and Content Distillation