An Interagency Model for Collaboration and Operation
Background - Relationship to CENDI - Funding - Operations - Milestones
Slide 1: An Interagency Model for Collaboration and Operation
Background - Relationship to CENDI - Funding - Operations - Milestones
CENDI Meeting, Nov. 4, 2010
Sharon Jordan
Assistant Director
DOE Office of Scientific & Technical Information
(Operating Agent for Science.gov)
Slide 2: What Is Science.gov?
A Unique Collaboration with Tangible Results!
- An interagency science discovery tool, providing single-query access to multiple government-sponsored R&D results and other S&T information
- A cross-agency search that integrates and simplifies access to 200 million pages of content from 14 U.S. science agencies
- The "USA.gov" science portal (formerly "FirstGov for Science")
- A voluntary large-scale collaboration of U.S. government agencies
Drills down to selected databases and websites in parallel, then presents relevancy-ranked search results
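The parallel drill-down described above can be illustrated with a minimal sketch. It is not the Science.gov implementation: the two in-memory sources, the function names, and the word-overlap score are all assumptions chosen for illustration, and real sources would be remote search endpoints.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for agency databases; real sources
# would be remote search endpoints.
SOURCES = {
    "source_a": ["solar cell efficiency report", "wind turbine noise study"],
    "source_b": ["solar flare observations", "soil moisture survey"],
}

def search_source(name, query):
    """Return (source, title, score) for titles sharing words with the query."""
    terms = set(query.lower().split())
    hits = []
    for title in SOURCES[name]:
        # Toy relevance score (an assumption): number of shared words.
        score = len(terms & set(title.lower().split()))
        if score:
            hits.append((name, title, score))
    return hits

def federated_search(query):
    """Query every source in parallel, then rank the combined hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        per_source = pool.map(lambda n: search_source(n, query), SOURCES)
        combined = [hit for hits in per_source for hit in hits]
    # Highest score first; ties keep their source order.
    return sorted(combined, key=lambda h: h[2], reverse=True)

results = federated_search("solar cell")
```

A thread pool suits this sketch because each source lookup would mostly wait on network I/O in practice, so the slowest source rather than the sum of all sources bounds the response time.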
Slide 3: How Did It Begin?
- Science.gov originated from two workshops:
- 2000: Blue-ribbon panel explored concept of a physical science information infrastructure. http://www.osti.gov/physicalsciences/wkshprpt.pdf This prompted interagency involvement.
- 2001: "Strengthening the Public Information Infrastructure for Science" http://www.science.gov/workshop/index.html Here the interagency Science.gov Alliance was formed
- Participants included federal agencies, academia, information professionals and science experts.
- Science.gov gained approval as "FirstGov for Science" in early 2002.
- Science.gov was launched in December 2002.
Slide 4:
Founding Agencies in 2001
- Department of Agriculture
- Department of Commerce
- Department of Defense
- Department of Education
- Department of Energy
- Department of Health and Human Services
- Department of Interior
- Environmental Protection Agency
- National Aeronautics and Space Administration
- National Science Foundation
- Department of Transportation
- Library of Congress
- United States Government Printing Office
- National Archives and Records Administration
- United States Forest Service
- National Institute of Standards and Technology
Support and coordination by CENDI – an interagency forum of senior information managers
Slide 5:
Shared Premises
- Science is not bounded by agency, organization or geography
- Each agency has vast stores of information that fulfill its mission
- A single web gateway is the tool of choice*
- A commitment to voluntary collaboration is necessary
*OCLC's Perceptions of Libraries and Information Resources reported that 84% of the public began their searches with search engines; only 1% began with online databases. Thus a "Google-like" easy search of authoritative sources with relevant results was desired.
Slide 6:
Integration Challenges
- Broad scope of Federal science and technology research and development missions
- Wide-ranging interest of potential audiences
- Information organization (taxonomy) issues given the broad scope of disciplines and audiences
- Blending information resources from different agencies into cohesive functionality and page design
- Politics, human resources, funding, sustainability
Slide 7: Guiding Principles for Content
√ Select authoritative web-based government-sponsored information resources
√ Rich science content, not merely organization pages
√ Databases contain primarily R&D results in the form of STI (bibliographic data and/or full documents)
√ Supplemented by websites for currency
√ Only freely available content that is well maintained
√ Our audience is "the science-attentive citizen"!
Slide 8:
Agency Potluck
- Agencies brought to the Internet table their unique information specialties and resources
- Flagship service a commitment
- Notable contributions of many:
- Science.gov Alliance and CENDI - seized opportunity without mandate
- FirstGov.gov - supported the early stages with advice and two grants
- Member agencies - provided participation of 200 staff members to working teams
- NLM – provided usability testing prior to initial launch
- USGS – managed original website search engine (surface web search)
- NTIS - created initial catalog of S&T websites
- IIa Inc. – provided secretariat support (CENDI special task)
- DOE/OSTI - conceived idea, developed technologies/deep web search and hosted website
- NAL and USGS – provided Science.gov Alliance co-chairs
Slide 9: Collaboration Is Key
- Alliance enjoyed extraordinary voluntary collaboration
- Vision and strategic direction provided by Alliance principals
- Administration provided by Chair(s) selected from Alliance
- Technical team provided original technical direction and recommendations
- Major support provided by CENDI
- Additional task groups formed as needed
- Science.gov taxonomy
- Content guidance and development
- Website management and redesign
- Outreach activities
- Enhancement development
- Subject expansion
- Image library
Slide 10: The Funding Approach
- Built and maintained with "in-kind" contributions: each agency's staff time and existing information resources
- Initial development benefitted from CIO Council e-gov grants for catalog + initial deep web search
- Alliance annual dues help fund routine operations
- CENDI support leverages resources
- In-kind contributions supported special events
- SBIR R&D resulted in innovations that were implemented in subsequent versions
- "Pass the hat" contributions to take advantage of an opportunity, such as Version 3.0 development
Slide 11: Science.gov Funding
Doing "a lot with a little" by implementing creative funding methods
- 2001: Cross agency portal grants: $170,000
- 2002: DOE SBIR conducts relevancy ranking research
- 2003-2004: Voluntary Pass-the-Hat contributions: $200,000
- 2001-Present: Participating agencies and in-kind support develop and maintain Science.gov. Average since 2005 = approx $180K annually (fees plus in-kind support)
Slide 12: CENDI
- CENDI promotes the productive intersection of science content, technology and interrelationships
- The Alliance, made up of CENDI agencies plus others, provides direction and support for this intersection in the form of Science.gov
- Through financial and in-kind commitments from its agencies, CENDI provides the ongoing infrastructure needed to offer a large-scale collaboration across organizational boundaries
Slide 13: Overview of CENDI Finances
Total Membership Funds Are Combined into One "Pot"
CENDI Reserve
Executive Secretariat for CENDI Includes Science.gov Support
Maintenance Costs include Alliance Only dues*
A portion of Secretariat effort is used for Science.gov Tasks
*Science.gov Alliance Only dues are deposited into the CENDI treasury, with option of being used for direct costs/purchases for Science.gov (such as exhibit expenses) or being included in funding for overall Secretariat support of Science.gov.
Slide 14: Content Management Is Distributed
- NTIS developed the original "catalog" with input from agencies
- CENDI Secretariat now maintains catalog with agency participation
- Agency content managers submit and edit their information via a web form
- Websites identified in the catalog were indexed by USGS; now done by OSTI
- Deep web databases are identified by agencies and reviewed by team for suitability
- Real-time search of content in large databases is maintained by OSTI, which continues to host the website and serve as operations manager
Slide 15: The Alliance Members' Page
Provides administrative information, meeting minutes, usage statistics, content selection and cataloging guidelines, subject category information, and outreach materials such as presentations and flyers.
Slide 16: Metadata Input System: For Websites in Searchable Index ("Surface Web" portion of Science.gov)
Provides Alliance members and content managers a secure tool to quickly retrieve Agency metadata, add or edit resource records, and expedite the maintenance and quality control of the metadata and URLs.
Slide 17: Development Milestones
- Science.gov Phase 1 (2001-2002)
- Established policy & governance, technical design teams
- Agreed on goals, policies, website look & feel
- Created taxonomy
- Selected, cataloged and indexed agency resources
- Version 2.0 launched May 2004
- Introduced relevancy ranking of metasearch results
- One-step search across ALL databases
- Added advanced search
- Version 3.0
- Enhanced precision searching, metarank and Boolean/fielded searching
- Other types of science content explored
- Version 4.0
- Enhanced relevancy ranking, also full-text relevancy ranking
Slide 18: Development Milestones
- Version 5.0 (Sept 2008)
- Clustering of results by subtopics or dates to help target your search
- Wikipedia results related to your search terms
- EurekAlert News results related to your search terms
- Mark-and-send option to email results to friends and colleagues
- More science sources for a more thorough search
- Enhanced information related to your real-time search
- New look and feel
- Updated Alerts Service
- Standardized citation formats available for download
- Version 5.1
- Aggregated news feeds from 11 science agencies
- Internships and Fellowships section made searchable
- Image Search Library (Coming soon!)
Slide 19: Science.gov Today
Science.gov: Finds Content from 200 Million Pages at 2100+ Websites and 42 Databases with One Query
- Searches selected websites ("surface web") and databases ("deep web") from one search point
- Combines results from all sources, ranks and displays by relevance and clusters
- Sends weekly "alerts" for user-defined topics of interest
- Displays related Wikipedia and EurekAlert items
- Provides browsing of selected websites
- Displays an integrated news feed from science agencies
- Links to special collections and other information
- Featured search and sites highlight hot topics
Slide 20: 42 Large Scientific Databases
Slide 21: Easy-to-Use Search
Get the simplicity of a "Google-type" search box; get results that are not "Google-like" at all.
Less than 1% overlap with Google; approximately 3.2% overlap with Google Scholar
Slide 22: Precise, Accurate Results
Slide 23: More About Science.gov You May Not Know
- Goes where traditional search engines cannot go. Full-text documents that are searchable on the target site are also searchable via Science.gov.
- Real-time search: If a target database adds a document or record, it is available on Science.gov immediately
- During the query, the most-relevant documents or records from each source are gathered – approx 100-200 from each source – and then the combined set is relevancy ranked
- Topic and date clusters for search results – subtopics, publication years displayed on-the-fly to enable efficient drilling down
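The gather-then-rank step and the date clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the production algorithm: the record fields (`title`, `score`, `year`), the function names, and the use of a precomputed numeric score are hypothetical, and the per-source cap `n` stands in for the roughly 100-200 records mentioned above.

```python
from collections import Counter
import heapq

def top_n(records, n):
    """Keep the n highest-scoring records from one source."""
    return heapq.nlargest(n, records, key=lambda r: r["score"])

def merge_and_cluster(per_source, n=100):
    """Cap each source at n records, rank the combined set, count by year."""
    combined = []
    for source, records in per_source.items():
        for rec in top_n(records, n):
            combined.append({**rec, "source": source})
    # Rank the combined set by score, highest first.
    ranked = sorted(combined, key=lambda r: r["score"], reverse=True)
    # Publication-year clusters computed on the fly from the ranked set.
    year_clusters = Counter(r["year"] for r in ranked)
    return ranked, year_clusters

# Hypothetical sample records; scores and years are made up for illustration.
per_source = {
    "db_one": [{"title": "A", "score": 0.9, "year": 2008},
               {"title": "B", "score": 0.4, "year": 2009},
               {"title": "C", "score": 0.1, "year": 2009}],
    "db_two": [{"title": "D", "score": 0.7, "year": 2009}],
}
ranked, year_clusters = merge_and_cluster(per_source, n=2)
```

Capping each source before merging keeps the combined set small enough to rank at query time, at the cost of never seeing a source's lower-ranked records.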
Slide 24: Usage Continues to Grow
Science.gov Page View Totals (Dec 02 - Sep 10)
FY10 - 5,166,126
FY09 - 4,074,747
FY08 - 2,946,801
FY07 - 2,591,717
FY06 - 2,593,449
FY05 - 1,793,483
FY04 - 965,146
FY03 - 751,180
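The growth trend in these totals can be checked with a short calculation. The sketch below uses only the page-view figures listed above; the variable names are illustrative.

```python
# Page-view totals by fiscal year, copied from the slide.
page_views = {
    "FY03": 751_180,
    "FY04": 965_146,
    "FY05": 1_793_483,
    "FY06": 2_593_449,
    "FY07": 2_591_717,
    "FY08": 2_946_801,
    "FY09": 4_074_747,
    "FY10": 5_166_126,
}

years = sorted(page_views)  # "FY03" .. "FY10" sort correctly as strings

# Year-over-year percentage change, keyed by the later year.
growth = {
    later: 100.0 * (page_views[later] - page_views[earlier]) / page_views[earlier]
    for earlier, later in zip(years, years[1:])
}

# Overall multiple from the first to the last fiscal year.
multiple = page_views[years[-1]] / page_views[years[0]]

for year in years[1:]:
    print(f"{year}: {growth[year]:+.1f}%")
print(f"FY10 is {multiple:.1f}x FY03")
```

The figures show growth in every year except a slight dip between FY06 and FY07.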
Slide 25: Notable Achievements
- Large voluntary collaboration between agencies is often cited as a model
- Collaboration AND infrastructure served as model for WorldWideScience.org; Science.gov then became the U.S. contribution of content to it
- Also a model for ScienceEducation.gov
- A top 10 Google result for "science" with other major science outlets
- Provides core project for spin-offs such as Science Internships, Aggregated Science News, Science Image Search – and more!
Slide 26: Science.gov In the News
Science.gov is among 10 government websites "meeting and exceeding" the Obama Administration's transparency goals, according to a special report by Government Computer News, released July 27, 2009.
Slide 27: U.S. Department of Energy Office of Science
osti.gov
Source | Real Time Search? | Relevancy Ranked? | All Govt. Science? | Known Sources? | Scholarly Info? | Ads? |
science.gov 5.0 | X | X | X | X | X | |
WorldWideScience.org | X | X | X | X | X | |
Google Scholar BETA | X | X | X | |||
X | X |
Slide 28: Content and Purpose: Science.gov vs Data.gov
Science.gov:
- Searches for science topics at the full record level
- Ease of searching, with immediate, useful results
- For the science-attentive citizen including researchers, teachers, students, business people, and the general public
- A Google-like interface with an advanced option for power users
- Drills down into the "deep web"
Examples:
- 2668 results for diabetes from 35 sources;
- 2772 results for climate change from 38 sources
Data.gov:
- Searches at the source level only, not at the record level
- Interface with search results pointing only to sources or databases
- Emphasizes machine-readable datasets, available in raw formats; some files are quite large, ranging up to hundreds of megabytes
- Data generally requires additional manipulation; of limited use to general public. Expect public interest groups, reporters, academics, and others to review information, build interfaces, and report on findings
Examples:
- Zero results for specific terms such as diabetes
- One result (database pointer) for climate change
Slide 29: U.S. Department of Energy Office of Science
Feature | science.gov | data.gov |
Ready to use info. with user friendly interface? | X | |
Record level information? | X | |
Science research and results only? | X | |
Information from multiple agencies? | X | X |
Repository of datasets and tools? | X | |
Provides pointer to database/source? | X | X |
Slide 30:
√ A perfect platform on which to launch new technologies
- Access to new forms of STI
- Translation
- Precision searches
- Image searching
Current Science.gov Prototype
Slide 31: Future Opportunities
What will Science.gov 10.0 look like?