SAIC

1710 Goodridge Drive, MS T2-5-1

McLean, VA 22102


Science Applications International Corporation

Intelligence Community Extensible Markup Language (XML)

Final Report

 


Prepared for the Office of Advanced Analytical Tools, Central Intelligence Agency




3 November 1999

Intelligence Community Extensible Markup Language (XML) Prototype

Final Report

Table of Contents

Executive Summary
    Background
    Findings (Design)
    Shortcomings/Issues
    Recommendations
1  Scope
    1.1  Identification and Purpose
    1.2  Overview
2  Referenced Documents
3  Project Background
    3.1  Project Scope
    3.2  Problem Statement
    3.3  Requirements Overview
        3.3.1  Content Manipulation
        3.3.2  Content Storage
        3.3.3  Content Delivery
        3.3.4  Security and Other Enhancements
    3.4  Design Overview
        3.4.1  Design Concept Architecture
        3.4.2  Content Manipulation
        3.4.3  Content Storage
        3.4.4  Content Delivery
4  Configuration Management
5  Design and Development Process
    5.1  Content Manipulation
        5.1.1  Structured Authoring
            5.1.1.1  Candidate Tools
            5.1.1.2  Technical Highlights
        5.1.2  Unstructured Authoring
            5.1.2.1  Candidate Tools
        5.1.3  Comment and Review
        5.1.4  Lessons Learned (Recommendations)
    5.2  Content Storage
        5.2.1  Design Rationale
        5.2.2  Candidate Tools
        5.2.3  Technical Highlights
            5.2.3.1  Content Management System
            5.2.3.2  Database Access
        5.2.4  Lessons Learned (Recommendations)
    5.3  Content Delivery
        5.3.1  Design Rationale
        5.3.2  Candidate Tools
        5.3.3  Technical Highlights
        5.3.4  Lessons Learned (Recommendations)
    5.4  Security and Other Enhancements
        5.4.1  Security
        5.4.2  Metadata
        5.4.3  Searching
        5.4.4  Linking
        5.4.5  Geographic Information Systems
6  Conclusions
    6.1  Project Highlights
    6.2  Lessons Learned
    6.3  XML’s Applicability to the Intelligence Community
Appendix A  Software Cost

 

List of Figures

Figure 1: IC Functional Design Architecture


 

Executive Summary

Background

SAIC coordinated the activities of the Intelligence Community (IC) Extensible Markup Language (XML) Study Group to demonstrate the capabilities of a comprehensive publishing management prototype based on XML.  SAIC assisted the Study Group in defining requirements for an XML prototype that would satisfy many of the needs expressed by Study Group members for daily production and dissemination of intelligence data.  This prototype was to utilize the latest and best commercial off-the-shelf XML technology and to explore and develop this technology into a system allowing users to author, store, and disseminate disparate types of data in XML format.  The Study Group was sponsored by the Community Management Staff (CMS); the CIA Office of Advanced Analytical Tools (AAT) acted as Executive Agent.

SAIC was charged with demonstrating what applicability, if any, XML could have to the creation, management, and dissemination of content for the Intelligence Community.  AAT directed SAIC to design and develop a prototype that integrated an XML publishing environment with a content management system for documents (or fragments thereof) and other data sources.  The system was to include dynamic linking of data and modularized conversion of discrete elements for fast, efficient XML and HTML delivery via Intelink.

Findings (Design)

Through the IC XML Prototype, SAIC makes the following findings:

•  XML is well suited for creation of intelligence content.

   -  There are several approaches to authoring.

   -  Structured authoring tools are the most effective, but require the greatest amount of institutional change.

   -  Unstructured authoring tools, while requiring less learning by analysts, require additional steps for data validation and integration.

•  XML is well suited for management of intelligence content.

   -  XML enables management of data at a highly granular level, if desired.

   -  XML enables management of many disparate sources of data.

•  XML is well suited for dissemination of intelligence content.

   -  XML enables separation of content from format.

   -  XML enables content to be written once and used many times, by applying CD-ROM, print, and Web style sheets to the same data.

   -  XML enables dissemination of data based on target security domain.

   -  XML enables dissemination of user-specific views.

   -  XML enables reuse of XML-tagged content across organizations or agencies.

Shortcomings/Issues

Though the project showed many advantages of XML, several shortfalls were also discovered:

•  XML standards for linking are immature and are not immediately usable.

•  Software vendors do not uniformly implement the XML specification and its related recommendations.

•  Integration of tools remains the most costly, time-consuming portion of system development.  Data analysis also requires considerable resource investment.

•  Vendors do not handle common XML/SGML files such as catalog, entity, or initialization files in a uniform manner.

Recommendations

Through the activities of the IC XML Study Group, the following recommendations are made:

•  The IC should move to a structured authoring environment using XML.

•  The IC should move to an object-oriented content management/storage environment using XML.

•  The IC should move to an XML environment for the delivery of content, primarily via Intelink, but also for hardcopy and CD-ROM media delivery.

 

 

1  Scope

1.1  Identification and Purpose

This Final Report is prepared under the direction of the Office of Advanced Analytical Tools, Central Intelligence Agency (hereafter referred to as AAT), by the Science Applications International Corporation Applied Content Technologies Team (hereafter referred to as SAIC) for the Intelligence Community (IC) XML Study Group.  The purpose of this Final Report is to present the findings of the prototype development, with emphasis on the technical highlights, advantages, disadvantages, and shortfalls of utilizing XML for production, storage, and dissemination of intelligence data.  Additionally, this report provides recommendations for utilizing XML technology in the Intelligence Community.

1.2  Overview

This Final Report summarizes the activity of the SAIC team during the design and development of the XML prototype for the IC.  This report includes discussion of tools considered and selected during the design and development process.  Issues, concerns, and technical shortcomings of the prototype are discussed.  In addition, the future of XML technology, particularly how it is relevant to the IC, is presented and recommendations are provided with respect to development and deployment of an XML-based environment for authoring, storage, and dissemination of intelligence data.

2  Referenced Documents

•  Intelligence Community Extensible Markup Language (XML) Study and Technical Demonstrations, Technical Proposal, August 1998.

•  Prototype Requirements, February 1999.

•  Intelligence Community Program Plan, Revised 26 March 1999.

•  Preliminary Design Review Briefing (PowerPoint), March 1999.

•  Critical Design Review Briefing (PowerPoint), April 1999.

•  Science Applications International Corporation Applied Content Technologies Team, Prototype System Design Document, May 1999, and Revision dated 21 September 1999.

•  The IC Extensible Markup Language (XML) Study and Technical Demonstrations Prototype Delivery Demonstrations (PowerPoint), 7 July 1999.

•  Science Applications International Corporation, Intelligence Community Extensible Markup Language (XML) Prototype System Design Document Addendum, 3 November 1999.

•  Science Applications International Corporation Applied Content Technologies Team, Prototype Interface Control Document, August 1999, and Revision dated 3 November 1999.

•  Science Applications International Corporation Applied Content Technologies Team, Prototype System Test Plan, 3 November 1999.

3  Project Background

3.1  Project Scope

SAIC worked with the IC XML Study Group to demonstrate the capabilities of a comprehensive publishing management prototype based on XML.  XML technology provides an environment, a structure, and a methodology for creating, managing, and producing documents for printed, CD-ROM, and electronic Web/Intelink distribution.  Working with the Study Group, SAIC developed requirements for an XML prototype that would satisfy many of the needs of Study Group members for their daily production and dissemination of intelligence data.  This prototype was to utilize the latest and best available XML technology and explore and develop this technology into a system that would allow users to author, store, and disseminate disparate types of data in XML format.

3.2  Problem Statement

AAT charged SAIC with designing and developing a prototype that integrated an XML publishing environment with a content management system for documents (or fragments thereof) and other data sources.  The system would include dynamic linking of data and modularized conversion of discrete elements for fast, efficient XML and HTML delivery via Intelink.  Much of the initial conceptualization for this phase was laid out in the August 1998 Technical Proposal, Intelligence Community Extensible Markup Language (XML) Study and Technical Demonstrations.

3.3  Requirements Overview

The Study Group, composed of members of the IC, along with SAIC, developed the requirements for the prototype.  These requirements were based on needs that community members knew existed within the IC and on the known capabilities of XML/SGML technology.  The complete set of requirements agreed upon by SAIC and the Study Group may be found in the Prototype System Design Document.  A summary of those requirements, organized by the three functional areas defined for the prototype, is provided below.

3.3.1  Content Manipulation

SAIC was required to select a native XML authoring tool for creation and modification of the prototype data set.  Ideally, the XML authoring tool would interface with the selected content management system; to simplify this process, the selected candidate had this functionality already established.  Additionally, a separate unstructured word processing tool was required.  The word processor chosen must either save documents directly as XML or save them in a format that allows for conversion to XML through a post process.  Finally, Extensible Stylesheet Language (XSL) style sheets would be created using the XML authoring tool or a separate third-party application.  The authoring environment must also support an XML version of the CALS table model and provide support for mathematical equations, possibly through Mathematical Markup Language (MathML, an XML application).
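
For reference, the CALS table model expresses tables through a small set of wrapper elements.  A minimal sketch of the XML form the authoring environment would need to support follows; the cell content is illustrative only.

    <!-- Element names follow the CALS table model; data is illustrative -->
    <table frame="all">
      <tgroup cols="2">
        <thead>
          <row><entry>System</entry><entry>Country</entry></row>
        </thead>
        <tbody>
          <row><entry>T-72</entry><entry>Russia</entry></row>
          <row><entry>M1A1</entry><entry>United States</entry></row>
        </tbody>
      </tgroup>
    </table>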

3.3.2  Content Storage

SAIC was required to select a content management system to integrate with the XML authoring tool and delivery system.  The system selected would be capable of document type definition (DTD) management as well as information management at a granular level (e.g., the chapter level of the CIA World Factbook).  The system would be capable of managing both document and fragment versions, and of managing and identifying dynamic data.  Additionally, the content repository must interface with various other data sources including, but not limited to, graphics, multimedia, other databases, and Geographic Information Systems (GIS).
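
To illustrate what chapter-level granularity implies, consider a simplified DTD fragment along the following lines (the element names are hypothetical, not the prototype's actual DTD); each chapter element becomes a separately managed, versioned object in the repository.

    <!-- Hypothetical element names, for illustration only -->
    <!ELEMENT factbook (country+)>
    <!ELEMENT country  (name, chapter+)>
    <!ELEMENT chapter  (title, para+)>
    <!ELEMENT name     (#PCDATA)>
    <!ELEMENT title    (#PCDATA)>
    <!ELEMENT para     (#PCDATA)>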

3.3.3  Content Delivery

The main delivery mechanism involves a means to deliver XML documents via the Web (Intelink).  In addition, multiple style sheets will be developed for XSL control of imaging.  These formatting capabilities will address the issue of security markings for both viewing and convenience printing documents from a Web browser.  Additionally, XSL transformation of XML will be used for conversion to HTML, enabling dynamic information delivery to Intelink with minimal user intervention.  The prototype will take maximum advantage of existing Internet browser technology to seamlessly display textual data in XML while supporting a myriad of graphics formats (JPEG, TIFF, GIF, etc.).
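
The XSL transformation step can be sketched as follows, assuming a hypothetical report DTD (the element names are illustrative); the server applies templates such as these to turn XML elements into HTML that any standard browser can render.

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Assumes a hypothetical report/title/para DTD -->
      <!-- Wrap the document root in a minimal HTML page -->
      <xsl:template match="report">
        <html><body><xsl:apply-templates/></body></html>
      </xsl:template>
      <!-- Map content-oriented elements to HTML formatting -->
      <xsl:template match="title">
        <h1><xsl:value-of select="."/></h1>
      </xsl:template>
      <xsl:template match="para">
        <p><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>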

3.3.4  Security and Other Enhancements

In addition to the above three areas, the Study Group formed a Security and Other Enhancements Working Group to address matters not directly subordinate to any of the other areas.  This Working Group directed SAIC to ensure the prototype followed CAPCO security markings, incorporated the IC metadata standards, and demonstrated enhanced XML linking capabilities.

3.4  Design Overview

3.4.1  Design Concept Architecture

As described in the Prototype System Design Document, the design concept architecture consists of three main functional areas: Authoring or Content Manipulation, Content Storage, and Content Delivery.  The goal of the prototype development effort was to integrate these areas, providing a seamless collection of environments for performing specialized functions within the IC Prototype.  Figure 1 depicts the functions and interfaces used in the prototype, as well as the applications used to provide the functionality.  The sections that follow provide an overview of the functionality of the prototype.


Figure 1: IC Functional Design Architecture

 

3.4.2  Content Manipulation

Content manipulation consists of two distinct sub-functions: an authoring environment (for both structured XML and unstructured non-XML authoring) and comment/review.

An analyst using a structured XML editor creates XML documents directly, based on the rules in a DTD.  The DTD provides the structure and business rules that must be followed as XML content is created or modified.  The interactive parser ensures that instances created in this environment are valid.  The nominal result of a structured XML authoring environment is “valid” XML content.

Most structured XML editors appear as a cross between a common word processor and a Web page authoring tool.  Analysts require training in the proper use of the application.  The training most likely will center on the rules of the content being created and, to a lesser degree, on the actual workings of the software.

The structured XML editor may also interface with the dissemination/delivery environment by serving as an on-call editor that can be spawned directly from a web browser by specified users.

An analyst using a non-XML editor creates "structured" documents, using formatting styles, from templates (.dot files) that are pre-generated from the DTD.  This provides some form of control over free-flowing content, but does not ensure that a valid XML instance will be produced.  Validation of the created instance occurs after the instance is saved to the file system in some non-proprietary format and then converted to XML.  If the instance is valid, the process can proceed.  If it is invalid, the original saved file must be returned to the non-XML editor for modification and then re-tested for XML compliance.  This process continues until the document is valid XML.

It is assumed that once documents are in XML, either natively or after conversion, they are treated uniformly.

Ideally, XML and non-XML structured editors would have tight interfaces with a content management system (CMS); however, in reality only the XML editor has this functionality.  An analyst using a structured XML editor has the capability of interfacing (through a “bridge”) with the content management system to check-in/check-out objects from the repository and to query and retrieve various repositories of information.  From the structured XML editor, an analyst can query both internal and external sources of information and import various types of media into the XML content.

The comment/review process allows reviewers outside of the authoring environment to access, review, and comment on document content.  Reviewers may access the XML documents or fragments through a Web browser application that allows them to view and comment on the selected document.  Comments are stored as custom attributes and maintained in the content repository separate from the original document.  This allows multiple reviewers to provide comments without altering the original document.

3.4.3  Content Storage

The content storage functional area consists of the content management system (CMS), an object-oriented content repository, and all other internal and external data sources.

The CMS object-oriented architecture allows an organization to manage structured data such as SGML or XML, as well as other types of documents, at the component level.  The CMS provides a variety of functions.  It enables general document management capabilities such as security, access control, check-in/check-out, and version control/history.  It provides the interface that allows system administrators to set up user access privileges for databases and various services, database applications, and SGML or XML applications (DTDs and XML instances).  The CMS provides the interface through which the XML structured editor accesses the repository (through the “bridge”).  Users can manage and reuse document components using fine-grained access and version control.  The CMS also controls access to documents in the repository for services such as comment/review, and for modification of documents through a Web browser initiation of the XML structured editor.

The majority of the prototype content will be stored in the content repository.  The content will include text in XML format as well as other files.  Examples of the other files include graphics such as TIFF, GIF, and JPEG, streaming video, and GIS information. In addition to the data sources stored in the XML content repository and accessed through the CMS, additional data sources may be accessed through the Web.  Additional data sources accessed by the prototype include an RDBMS that contains unclassified sample intelligence information and a GIS data source and application.

3.4.4  Content Delivery

The content delivery mechanism is separated into three functional areas: domain filtering, internal user functions, and external user functions.  Internal users are production management, staff, and authors/analysts.  Their job is to fuse information to create and distribute intelligence products stored in the content repository.  External users are anyone outside of the production environment who does not contribute directly to the production of the distributed content.

The domain filter is a series of scripting filters designed to separate XML instances into various versions based on security classification metadata.  On export from the content repository, the XML data is processed through the appropriate filter, which separates information by security class and distributes that information to the appropriate domain servers.
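
One way to express such a filter is as an XSL transformation built on the identity transform, sketched here with a hypothetical classification attribute (the prototype's actual CAPCO-based markup is not reproduced): everything is copied except elements marked above the target domain.

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- The classification attribute is illustrative only -->
      <!-- Copy every node and attribute by default -->
      <xsl:template match="@*|node()">
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
      </xsl:template>
      <!-- Suppress elements marked SECRET when publishing to a lower domain -->
      <xsl:template match="*[@classification='S']"/>
    </xsl:stylesheet>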

Internal user functions include comment/review of repository content, convenience printing capabilities, search and retrieval of distributed content and external repositories, and the capability to edit source content in the XML content repository through a link to the repository database and a structured XML editor.

External user functions include search and retrieval of distributed content and external repositories, and the capability to capture distributed content locally (as XML), spawn an XML or non-XML editor of choice, or initiate a convenience print. 

For both internal and external users, these functions are provided through a Web browser interface of various pages that are based on user profiles and security access level privileges.  The customized Web pages are designed around user types to provide specific views into the distributed content repository and enable specific functions.  Because XML is still a work in progress, the software support to develop, distribute, and browse XML content is limited.  For this reason, the products chosen to distribute our XML content via the Web convert the XML into HTML using XSL style sheets.  The XSL Transformation language (XSLT), a new standard in its own right, allows the XML to be managed and maintained in the repository while styled HTML is sent to the end user.  This method allows the information to be viewed today through any standard World Wide Web browser and allows direct application of the XML data in the future.

After the domain filter has separated content by classification markings, different style sheets format the XML data for HEADLINES, POLICY MAKER, and THEATER views.  Each view filters out detail not needed by a particular viewer.  An additional Print style sheet formats the information for a convenience print.  A final style sheet sends the raw XML to the viewer, allowing reuse of that element in any standard structured or non-structured editor.

Internal Users have access to the repository through an EDITOR Style, which allows access to particular elements in the repository through a unique ID number.  This feature currently allows comment and review and in the future will allow for structured editing through the browser.

4  Configuration Management

By definition, the IC XML Prototype strove to use as much commercial off-the-shelf (COTS) technology as possible.  This was as much an effort to evaluate what current vendors are able to support as it was to find cost-effective applications.  In the course of this system integration, SAIC learned many lessons that have little to do with XML, but everything to do with basic tool integration.  More specifically, SAIC discovered that, though XML vendors may be standardizing on XML, their implementation of it is anything but standard.

Configuration management is the blanket term SAIC used to address these non-XML system integration issues.  Collectively, they show the inherent problems associated with selecting “best-of-breed” tools by category and then integrating them.

For instance, XML (and SGML) applications may use catalog files.  Catalog files are used by applications as a reference to external resources.  Many catalog files contain references to International Organization for Standardization (ISO) publication standards for items such as generic representation of alphanumeric characters and special symbols.  In the course of integration, SAIC discovered that, although there are standards for what goes into a catalog file, there are no standards for how one should name the file, or even where it is located on the file system.  For instance, one application might require an “isopub1.cat” file reference, whereas another application would refer to the same file as “iso-pub1.cat”.  The content of the file is the same, but the name and absolute file location are different.  This makes trading XML DTDs and instances tricky, if not outright impossible in some cases.  If one application is looking for isopub1.cat, but the DTD calls for iso-pub1.cat, the file will not be located.  Public identifiers, which provide a naming convention that is not tied to a file name or specific location, can help reduce this dependence.  However, not all vendors support the use of public identifiers.
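
For illustration, a catalog in the SGML Open format maps location-independent public identifiers to local file names, so a DTD can reference an entity set without hard-coding either local name above.  The second entry's identifier is a made-up example.

    -- Illustrative mapping; the second public identifier is invented --
    PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN"   "isopub1.ent"
    PUBLIC "-//IC XML Prototype//DTD Report//EN"      "report.dtd"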

The lack of a uniform manner of naming and addressing standardized, common files amongst applications created difficulties during the integration effort.  SAIC eventually created and enforced its own schema for naming and addressing these files, which necessitated a change in every installed application.  The greatest lesson learned is that COTS does not automatically equal out-of-the-box functionality when combining applications.  Though the vendor community is making great strides in moving towards standardization, they still lack uniformity at a very basic level.

5  Design and Development Process

The sections that follow provide an overview of the design and development process for the prototype and identify the COTS products selected.  Each section includes a discussion of the candidate tools, technical highlights, and lessons learned/recommendations.  Appendix A provides list prices for selected software.  As previously described, the design concept architecture consists of three main functional areas: Authoring or Content Manipulation, Content Storage, and Content Delivery.

5.1  Content Manipulation

5.1.1  Structured Authoring

5.1.1.1  Candidate Tools

The current commercial market has two basic categories of structured XML authoring tools: tools that were originally designed for SGML and have been modified to accept XML, and native XML tools.

In selecting an authoring tool, SAIC rated candidates on degree of actual XML support, ease-of-use, and integration with content management systems.  Tools evaluated included:

•  ArborText Adept 8

•  SoftQuad XMetaL

•  Adobe FrameMaker+SGML 5.5 (XML support)

The selected tool was ArborText Adept 8.  Adept was originally an SGML-only editor but has evolved to support XML natively as well.  This is unlike Adobe FrameMaker+SGML, which still works in SGML but can only “Save As XML”.  Both FrameMaker+SGML and Adept have a “bridge” that integrates them with the Chrystal Astoria content management system.

SoftQuad’s XMetaL was originally not selected due to its immaturity (the product was in early beta testing).  However, SoftQuad embarked on a rapid development cycle and was able to release XMetaL 1.0 just prior to the end of the system integration effort by SAIC.  As a result, SAIC was able to do a simple integration of XMetaL with the content management system.

5.1.1.2  Technical Highlights

ArborText Adept 8 proved itself a powerful XML authoring tool.  The application validates content as analysts enter it by enforcing the DTD structure and business rules.  The end result is a valid XML instance.

More importantly, the use of the bridge between Chrystal Astoria (the selected content management environment, see below) and Adept has many advantages.  The user, by learning the Adept environment, uses the CMS without ever realizing it.  The CMS environment appears as nothing more than another menu selection to the author.  For instance, to edit a piece of content, an analyst would go to the pull-down "Astoria" menu in Adept, navigate to the item to be edited in the repository, and then check the item out into Adept.  When editing is complete, the analyst would save and check the item in to the repository.  Thus, the user learns one application, but in reality benefits from two.

Although Adept 8 has a Microsoft Windows environment look and feel, it is a content creation environment unlike any common word processor.  Analysts will have to become familiar with the DTD for the content they are creating to maximize their ability to rapidly create content.  Adept 8 has many features including an interactive environment showing legal elements for insertion and hints for attribute entry.

One technical obstacle uncovered in Adept comes from its SGML legacy.  SGML is not, by default, case sensitive, whereas XML is.  As installed, Adept 8 will automatically normalize all element tags and attribute values in an instance to lowercase.  This created problems when Adept 8 went to save the instance into the repository, which checked it against a mixed-case DTD; the instance was deemed invalid.  A simple setting made during the Adept 8 installation process, allowing for mixed case, solves the problem.  However, this incident is an example of the issues that surround adaptation of SGML legacy applications to the XML environment.  Both SGML and XML are structured languages, but the two are just different enough that it is difficult for a single application to be in total compliance with both standards.
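
A small illustration of the mismatch, using a hypothetical element name: the declaration and the normalized instance differ only in case, yet the instance fails XML validation.

    <!-- Hypothetical element name; the DTD declares it in mixed case: -->
    <!ELEMENT WeaponSystem (#PCDATA)>

    <!-- After lowercase normalization by an SGML-legacy editor: -->
    <weaponsystem>T-72</weaponsystem>  <!-- invalid against the DTD above -->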

SoftQuad’s XMetaL bills itself as the world’s first true XML editor.  XMetaL owes its roots not to the word processor world but to the world of Web page editors.  Specifically, XMetaL can be seen as an outgrowth of SoftQuad’s HoTMetaL HTML editor.  XMetaL is as easy to learn as an HTML-only editor like HoTMetaL (indeed, it can do SGML and HTML).  Given the IC’s emphasis today on Intelink delivery, many organizations are already authoring natively in HTML.  For these organizations, an XML editor is a natural evolution.

The integration of XMetaL took a more traditional line.  Since there is no bridge for this application, one must open the content manager and find the content piece to be edited.  At that point, XMetaL can be used as a designated editor: the CMS checks out the item, sends it to XMetaL, and the editing session begins.  Once editing is complete, the author saves the item, but must then reenter the CMS and check the item in.  This integration was very simple (adding XMetaL to the .ini file for Chrystal Astoria) but not as clean as a bridge.  The user is forced to learn and utilize two applications: the CMS to store and/or locate content, and the editor to modify or create content.

5.1.2  Unstructured Authoring

The second authoring environment the Study Group directed to be demonstrated was the use of an unstructured editor for creation or modification of XML content.  Through the course of the prototype development, this actually took two sub-forms, both embedded in Microsoft Word: a highly modified structured editor (i4i S4TEXT), and the use of templates and macros to simulate a structured editor.

5.1.2.1  Candidate Tools

When the Study Group directed an unstructured editor be demonstrated for XML use, what they really meant was “show us how Microsoft Word can do XML.”   To this end, SAIC looked at three tools:

•  i4i S4TEXT

•  Interleaf Bladerunner content creation module

•  SAIC-created Visual Basic macros for Microsoft Word, and ArborText Epic Author for conversion

S4TEXT is a set of macros that provide a highly modified Microsoft Word environment.  S4TEXT uses the DTD to create document templates, which in turn create a structured XML authoring environment.  The result is an authoring environment nearly identical to a native XML editor.  S4TEXT includes an interactive parser that checks validity as the author enters content.  Indeed, S4TEXT so extensively modifies Microsoft Word that several Study Group members classified it as a structured editor, more akin to Adept or XMetaL.  S4TEXT is such a modification to Word that it becomes a new application, thus creating training requirements like any other new piece of software.  S4TEXT tries to position itself as a low-cost alternative to purchasing a whole new authoring application, and as recently as a year ago, when professional XML editors cost thousands of dollars, this was true.  However, the recent trend is toward more cost-efficient XML tools (XMetaL is $499), which makes the $199 S4TEXT seem a less appealing alternative, because XMetaL's capabilities far exceed those of S4TEXT.

The Interleaf Bladerunner content creation module is part of the Bladerunner suite, which includes DTD creation, authoring, content storage, manipulation, and dissemination.  It is a set of pre-built macros that create document templates (.dot files) to guide authoring from within Microsoft Word.  Files are saved as RTF, then converted to XML in accordance with the DTD.  Validation occurs at the point the RTF becomes XML.

Due to Interleaf’s unwillingness to separate the content creation module from the rest of the suite, SAIC was forced to develop its own version.  This is a laborious process that includes creating templates that utilize Word formatting styles closely matching the element names used in the selected DTD.  Visual Basic (VB) macros are then developed to assist the author in the document creation process by applying rules to the formatting styles.  This is an attempt to provide a form of structure to an otherwise unstructured environment.

The end result is that when an author wants to create XML content, he or she opens Word and selects the XML document template, just as one would open Word and go to a fax or cover letter template.  The macros embedded in the template walk the author through the creation of the document, inserting formatting styles in their proper order.  Although the authoring of the content is just as easy as using out-of-the-box Microsoft Word, the end result is a styled Word document that can be more easily converted to XML.

Several products were evaluated for conversion of Word (or RTF) documents to XML.  Our goal was to find a product that already had this conversion capability versus developing our own using Omnimark or similar conversion language.  Unfortunately, most of the available products were either in Alpha or Beta form and most of those products had very limited capabilities.  SAIC selected ArborText's Epic Author to perform the Word-to-XML conversion process.

Epic Author is actually a simplified version of ArborText's Adept Editor that has been tailored to handle XML for authoring and for publishing to paper, CD-ROM and the World Wide Web (WWW).  It also happens to have a good Word (or RTF) to XML converter built-in.  The converter allows a user to map Word styles to DTD elements for both import and export of documents.  In other words, the converter allows authors to create their content in Word and import it to Epic Author for conversion to valid XML, or create the content in Epic Author as XML and convert to a Word document on export.  We found the converter works well for simple DTDs.  Its major drawback is that it cannot process elements that have more than one attribute associated with them.  This feature is to be added in a later release of the product.
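
For example (the element is hypothetical), a single-attribute element converts cleanly, while an element carrying two attributes would trip the converter limitation noted above:

    <!-- Hypothetical element, for illustration of the limitation -->
    <graphic fileref="map01.gif"/>              <!-- one attribute: converts -->
    <graphic fileref="map01.gif" scale="50"/>   <!-- two attributes: not handled -->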

The document is checked for XML validity on import into Epic Author.  Invalid portions of the document are converted to comments.  The author then has the choice of making the corrections in Epic Author, where the document can be checked for validity interactively, or making the changes to the original Word document in Word and then performing the import process again.  Once converted to valid XML, the document must be saved to the file system.  The author must then open the CMS and check the file in to the content repository.

This approach to authoring could work if the vast majority of analysts have only Microsoft Word and the organization is willing to have a group of editors perform the conversion and cleanup process.

Two major drawbacks to this approach stand out: the conversion and cleanup process, and the creation and maintenance of the Word templates.  The structured editor approach eliminates these drawbacks.

5.1.3  Comment and Review

Another major component of content manipulation is comment and review.  SAIC selected the Chrystal Web Services application to demonstrate the potential use of XML for comment and review.  Web Services is an application that integrates with the Chrystal Astoria content management system.  It is a Web-based (Java-driven) application that allows users to attach comments to various nodes of content.  The comments can be tracked and reviewed as necessary from a user's Web browser.  Though not directly XML, this comment and review technology takes advantage of the object-oriented database used for the content repository.  XML, being object-oriented, enables the comment and review application to easily attach information to nodes (objects) and ensure that the data travels with the object regardless of where it is used.

The Study Group explicitly directed that a workflow process was NOT to be part of the prototype effort.  However, the comment and review activity shows how object-oriented databases can be used within a workflow environment.  A reviewer can access a node of content and add comments after reviewing the item.  These comments are associated with the node and stored within the repository.  When the node is accessed, the comments may also be viewed.
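
The underlying idea can be sketched as follows, with hypothetical attribute names; a comment rides along as metadata attached to the content node, leaving the element content untouched.  (In the prototype these attributes are maintained by the repository separately from the original document, rather than written into the instance itself.)

    <!-- Attribute names are illustrative, not the prototype's actual schema -->
    <para id="p-042"
          reviewer="janalyst"
          reviewdate="1999-10-12"
          comment="Verify the source date for this claim.">
      The subject facility resumed operation in August.
    </para>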

5.1.4  Lessons Learned (Recommendations)

In the question of structured versus unstructured, the structured editors clearly win.  Native creation, editing, and storage of data as XML; an integrated parser that validates content as it is entered; and, in some cases, tight integration with a CMS make structured editors a clear choice over unstructured editors.

Unstructured editors, by forcing reliance on a conversion process, make for a less dynamic, less responsive environment.  One can easily envision a lone analyst, under pressures of time and management, finishing a document and running it through a conversion process, only to find that it is invalid and must be reworked.  With luck, the system may tell the analyst why it is invalid.  More likely, the analyst would be on his or her own to discover the errors and make the necessary corrections.

Training also becomes an issue.  Many members of the Study Group expressed the concern that “the analyst is not going to learn another application!”  The inference, therefore, is that Word must somehow be made to do XML.  But in reality, how many users already use the template functions in Microsoft Word?  Though no empirical studies exist for the IC, anecdotal appraisals lead one to conclude that the number is small, and therefore users will have to be trained anyway.  In the end, training becomes a non-issue.  Regardless of the application chosen, training will be necessary.  Authors will have to learn a new editor (or at least learn more about the one they may already have).  If the editor does not have a bridge to the CMS, then authors will have to learn at least some functionality of the CMS application too, if for no other reason than to navigate to the content to be edited.  Furthermore, some members of the IC have already made the move to authoring in HTML.  For these members, the transition to XML is more evolutionary than revolutionary; they are also proof that analysts can learn a new application.  The key is not getting analysts to accept learning a new application, but making management understand that this new way of doing business will improve product quality and user productivity.

Finally, comment and review, though not strictly an XML activity, is enabled under an object-oriented environment.  The XML objects can easily have additional information, such as comments, attached to them and retained by the database for both tracking and utilization.

5.2  Content Storage

5.2.1  Design Rationale

The content storage module of the prototype is an object-oriented database with a content management system controlling user interaction with the database.  This content repository is used to house the XML content.  The repository has the ability to store XML content at a very granular level while allowing for check-in, check-out, and versioning of the content.  Additionally, the content repository must have the ability to connect to other data sources, especially relational databases (e.g., Oracle).

5.2.2  Candidate Tools

Selection criteria included an object-oriented database and a content management system allowing for element or fragment (sub-element level) management and manipulation.  Additionally, a content repository that was already integrated with an editor was viewed as favorable.  Candidates included:

•  Chrystal Astoria

•  Bladerunner Content Management Suite (formerly Texcel Information Manager)

•  POET CMS

The selected tool was Chrystal Astoria.  The Chrystal content management system sits atop an ObjectStore database.  Astoria allows for element or fragment check-in and check-out, as well as version control of content.  Most importantly, Astoria is available with a bridge to ArborText Adept (the authoring environment).  This bridge integrates Astoria CMS functions within Adept, making the repository accessible through the editor's GUI.

5.2.3  Technical Highlights

5.2.3.1  Content Management System

Chrystal Astoria, being object-based, is well suited for manipulation of XML content.  The database naturally handles elements or fragments thereof.  The object-oriented architecture allows one to manipulate objects or, as in the case of comment and review, attach additional items to an object.

One absolute requirement of a content management system is the ability to perform versioning of the content.  This is all the more important in the world of intelligence where analysts must be able to explain their decisions and verify their information sources.  A major issue facing today’s webmasters is how to keep a dynamic, living site operational with the latest information, yet be able to show what “has been” that may have contributed to a decision. The content management system must have the ability to recreate the view not only as it is, but also as it was.

Astoria addresses this issue through the use of versioning and editions.  Every time content is modified (or created) a version is created.  These versions are retained in the system and can be recalled or reused as necessary.  Astoria also has a process called “editioning” where a snapshot of the content can be taken as required.  Editions can then be saved (archived) and recalled as needed.

5.2.3.2  Database Access

The Study Group was adamant in its requirement that the content repository (Astoria) have the ability to connect to existing storehouses of information.  SAIC addressed this requirement using a two-pronged approach: showing access to a relational database (Oracle) and access to a Web page (HTML).  Both of these approaches demonstrate how a simple HTTP (Web) server can be used as a conduit between the repository and external data sources.  An Oracle relational database containing weapons reference data for tanks, helicopters, etc., was used as the source data for the database access portion of this exercise.  The HTML data source was the United Nations resolutions page located at http://www.un.org/Docs/scres/1999/sc99.htm.

To perform either of these functions, a user would locate the desired content in Astoria through the bridge functions in Adept and load the content into the editor.  In the case of the UN resolution portion of the demonstration, a user at the allowed location inserted a UNRESOLUTION tag set into the instance, and its associated attributes were filled in with the UN Web address, etc.  The instance was then checked into the repository and checked back out into the editor.  On checkout, the UNRESOLUTION tags and subelements were automatically inserted into the instance with content pulled from the UN Web page.  For the Oracle database access portion of the demonstration, a WPNSREF tag set was inserted into the instance by a user at the allowed location, the desired subelement category ("TANK", for example) was inserted, and its associated attributes were filled in with the Web address of the Oracle database.  A query parameter ("T-", for example) was also entered into the content of the WPNSYSTEM tag set.  The instance was then checked into the repository and checked back out into the editor.  On checkout, the WPNSREF tags and subelements were filled in with content pulled directly from the Oracle database tables.  The data was displayed in the editor as a FOSI-generated table.  Both of these data extractions provide the user with a snapshot of the data at the time of insertion.  To save this content, the user would have to remove the attribute values entered initially; otherwise, a new data extraction would be attempted when the instance was checked back out into the editor.
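
As a sketch, the instance fragment a user inserts before check-in might look like the following (the attribute names are illustrative; the prototype's exact DTD is not reproduced here).  On checkout, the repository would replace this query skeleton with subelements populated from the matching Oracle rows.

    <!-- Attribute names are illustrative -->
    <WPNSREF>
      <WPNSYSTEM category="TANK"
                 source="http://dbserver/cgi-bin/wpnsquery">T-</WPNSYSTEM>
    </WPNSREF>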

In addition, the prototype demonstrated RDBMS retrieval into XML format via the database import wizard available in SoftQuad's XMetaL authoring tool.  The wizard provides connectivity to any ODBC-compliant database.  For the prototype, data was pulled from a Microsoft Access database and from a Microsoft Excel spreadsheet.  The Excel pull to XML-formatted data demonstrated access to a different form of legacy data organization.

The XMetaL database import macro allows a user to select the desired data extraction driver, be it for an Access database, Excel spreadsheet, etc.  The extracted data is displayed in the table format required by the DTD, but is selectable by the user prior to import.  In addition, a second macro provides an update capability that allows users to update the data in the instance at any time.  So if the original database, spreadsheet, etc., changes, the user can perform regular updates on the instance to reflect those changes without having to perform the entire data extraction process from scratch.  This process provides near-real-time capabilities without the associated overhead.

5.2.4  Lessons Learned (Recommendations)

The major drawback to an object-oriented system is performance.  Relational databases, with a table-row-field schema, are easily indexed and therefore easy to query for rapid information retrieval.  Object-oriented databases, lacking an easily cross-referenced indexing schema, require more processing time to accomplish the same tasks.  As the absolute size of the data collection grows, so too does the system effort required to accomplish search and retrieval.

Astoria does not offer translation to an RDBMS.  Any desired RDBMS connectivity must occur through Astoria’s SDK programming interface.  Astoria’s API provides the foundation on which applications are built that access the Astoria database, or external databases, from within Astoria’s XML model.  The API provides references to the database objects (XD_Attribute, XD_Entity, XD_Element, and many other object classes) and executes on the client.  As a result, all API-based DLLs and EXEs execute on the client.  Therefore, Astoria can be very client-heavy, depending on how much functionality is desired.  The more customization developed, the more each client will have to be updated to perform the new functions.

SAIC attempted to use existing vendor tools to provide the desired translation capabilities.  One candidate for translation was Bluestone’s XML Server with its Visual Mapper application.  This Java-based mapping tool provides the user the ability to connect the user’s DTD with the rows and columns of an RDBMS.  The result set is either an XML document or a fragment thereof that can be inserted into the viewer’s document.  Unfortunately, we were unable to get Bluestone's product to perform properly.  Even with the vendor's assistance, too many bugs and incompatibilities existed to justify Bluestone's involvement in the prototype.  We settled on developing our own homegrown application using Astoria's SDK.

XMetaL's database import wizard performed quite well for the prototype.  It is easy to set up and use.  The only drawback is that this functionality is available only for object linking and embedding (OLE) applications; in other words, it is a Microsoft Windows-only capability.

XML lends itself well to data translation.  Due to its structured, but neutral, text-based nature, mapping XML elements to database table fields is fairly straightforward, as the sketch below suggests.  The prototype also proved that several different communication techniques could be employed to accomplish the interconnection between applications.  For IC analysts, the interconnection to dynamic or legacy data will allow them to gather information from sources that may have been difficult or impossible to include in the past.  As better commercial products become available, this capability will be further enhanced.
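
A minimal sketch of such a mapping, using hypothetical table and element names: each row of an RDBMS table becomes one element, and each column becomes a child element.

    <!-- Hypothetical: one row of a WEAPONS table (columns NAME, TYPE) -->
    <WPNSYSTEM>
      <NAME>T-72</NAME>    <!-- from column NAME -->
      <TYPE>TANK</TYPE>    <!-- from column TYPE -->
    </WPNSYSTEM>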

5.3        Content Delivery

5.3.1       Design Rationale

One of the goals for the delivery portion of the prototype was to add formatting characteristics to content-specific XML elements using XSL.  To further illustrate the capabilities of XML, the prototype used several different XSL style sheets for user- and purpose-specific views.  Additionally, a convenience printing capability was illustrated through the use of an additional XSL style sheet.

5.3.2       Candidate Tools

When the Study Group started defining requirements, one of the first laid down for the dissemination side was “deliver intelligence content via Intelink.”  This dictated a Web environment, and at first both the Study Group and SAIC fixated on Web browsers with XML capability.  However, in the course of looking at XML browsers, SAIC determined that the best method of showing how XML could be delivered was to show how a middleware application could transform the XML content and deliver it to Intelink, regardless of the browser’s XML capability.  LivePage Enterprise, the product selected for dissemination, is a unique tool that stores XML data and converts it to HTML using standard XSL style sheets.  This server-side conversion allows the end user to view the information through any of the current Internet browsers.

5.3.3       Technical Highlights

LivePage takes XML (or SGML or HTML) content and stores it within a Web server.  Utilizing the power of Extensible Stylesheet Language Transformations (XSL-T), LivePage transforms XML content to HTML for delivery to the Web.  The browser never “sees” the XML, and therefore any HTML browser (Internet Explorer, Netscape Communicator, Opera, etc.) can be used to view “XML” content.
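
A minimal XSL-T sketch of the kind of server-side transformation involved (the PARA element name is a hypothetical stand-in for an element in the prototype DTD):

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- render each PARA element as an HTML paragraph -->
      <xsl:template match="PARA">
        <p><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>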

LivePage allows multiple views of the same document by using server-side style sheets.  This is important to the IC; SAIC views server-side style sheets as critical to the security environment.  Server-side style sheets ensure that the content being served out is appropriate to the user, whereas client-side style sheets introduce security dangers.  Take, for instance, a hypothetical foreign disclosure item.  In a server-side model, the viewer receives a piece of intelligence content styled to the appropriate security level; because the style sheet on the server separated the content, the item received at the client is the appropriate content and nothing more.  In a client-side model, however, all the content is sent to the user’s machine even though only a part of it may be displayed, and the full content can easily be saved and viewed through the “view source” feature of the browser.
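
The security argument can be sketched in style-sheet terms, assuming a hypothetical classification attribute: run on the server, a template like the one below ensures restricted elements never leave the server; run on the client, the restricted content has already traveled to the user's machine whether or not it is displayed.

    <!-- copy through only unclassified paragraphs; everything else
         is withheld before delivery -->
    <xsl:template match="PARA">
      <xsl:if test="@classification = 'U'">
        <p><xsl:apply-templates/></p>
      </xsl:if>
    </xsl:template>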

LivePage demonstrated the ability to deliver XML content in nearly every manner HTML is delivered.  Embedded links to graphics or multimedia objects work just as they do in HTML, and Java applets can also be embedded.  By converting XML to HTML, the power and support given to HTML become available to the XML style author.

LivePage also has a built-in repository search engine that can be tailored to search by elements, attributes, ancestors, keywords, or full text.  This built-in engine demonstrates the power of XML-enabled searching, especially the use of metadata tagging.  In addition, Internet users have access to a text-based search engine that returns results ranked by probable relevance, an additional feature made possible by the relational database underlying this software.

The major shortcoming of LivePage is that it does not dynamically pull content from the content repository; the content in LivePage is a snapshot export of that repository.  This requires that users have updating policies and procedures in place.  Fortunately, LivePage is capable of handling granular objects, so a tailored export update routine is possible, and much more desirable than exporting and importing the entire content repository.

5.3.4       Lessons Learned (Recommendations)

Given that browsers do not uniformly support XML, the use of a middleware application becomes highly desirable, since it offers the ability to use XML content yet deliver it to a browser-independent audience.  Even when the day arrives that all browsers support XML, server-side style sheets will still have definite security advantages and might by themselves force continued reliance on a middleware application.

LivePage is a great demonstration of the power a middleware application brings to delivery.  It is both a content delivery engine and an information portal application.  Its ability to use any markup content (XML, SGML, or HTML) could make it a logical candidate to assist organizations as they transition to XML.

5.4        Security and Other Enhancements

5.4.1       Security

Security is always a major factor in the IC.  The Study Group charged SAIC with demonstrating how XML could work within the current security environment as well as its usefulness for future ones.

There is nothing in XML that inherently prohibits its use in the current security environment.  Indeed, XML’s ability to attach security attributes (and hence, markings) to granular elements exceeds what is currently possible in the HTML Intelink environment.  Additionally, if security markings are stored as attributes rather than actual content, then changes to marking guides, like the latest CAPCO release, become a matter of updating style sheets rather than content.  Security markings are a great example of how XML helps separate content from format: in an XML future, if one decides to change how security markings are formatted, one changes a style sheet that potentially affects every piece of delivered content.  This is much better than today’s environment, which is often exemplified by content marked up according to how it is to look (format) rather than what it is, and it illustrates why XML is better than HTML for storage of intelligence content.
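
A sketch of the idea, with hypothetical element and attribute names: the marking lives in an attribute, and the style sheet, not the author, decides how it is rendered.

    <!-- the rendered portion marking, e.g. "(S)", is generated by the
         style sheet from the attribute; it is not typed into the text -->
    <PARA classification="S">The text itself carries no hand-entered marking.</PARA>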

5.4.2       Metadata

XML elements, if properly named, become your metadata.  Thus, while Intelink tries to capture data such as creation date and authoring agency using metadata, XML allows you to go a step further and capture metadata not only about the properties of the data but also about the nature of the data itself.  The ability to search on element names is potentially far more powerful than a search of keywords associated with a file.  XML Knowledge Maps, metadata by another name, could be the practical implementation of XML metadata that enables users to find the relevant piece of information the first time.
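
The contrast, sketched with hypothetical names and values: file-level metadata describes properties of a document, while XML element names describe the data itself.

    <!-- HTML-style metadata: properties of the file -->
    <meta name="keywords" content="Ruritania, economy"/>

    <!-- XML: the markup is the metadata -->
    <COUNTRY name="Ruritania">
      <GDP units="USD">1,000,000,000</GDP>
    </COUNTRY>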

5.4.3       Searching

Metadata goes hand in hand with searching.  Full-text retrieval, though comprehensive, requires the content to be fully indexed, and even then the need for relevancy rankings makes finding the desired item questionable.  XML searching, which is really nothing more than searching against elements or attributes, promises to help solve the problem of irrelevant search results.  To be effective, however, the elements must be appropriately named or contain appropriate content.  In the case of the prototype, the data set used was the CIA World Factbook.  The World Factbook was originally an SGML publication, and its data is laid out in a very generic manner (FIELD/SUBFIELD names); thus, searches must look at the content of elements.
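
The difference is easy to sketch (content values hypothetical): with generic names, the search engine must examine element content, whereas descriptive names would let it search the markup itself.

    <!-- generic layout: element names say nothing about the data -->
    <FIELD>
      <SUBFIELD>Population</SUBFIELD>
      <SUBFIELD>5,000,000</SUBFIELD>
    </FIELD>

    <!-- descriptive layout: the element name is the metadata -->
    <POPULATION>5,000,000</POPULATION>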

5.4.4       Linking

Linking is definitely the low point of the prototype.  XML holds great promise in XLink and XPointer, but neither of these recommendations is past a very early stage of development, so the prototype was unable to demonstrate the advantages of XML linking.  Currently, the XSL-T standard allows only simple hyperlinking.  As the XLink and XPointer recommendations mature, and applications are developed to take advantage of them, their true usefulness will become apparent.

5.4.5       Geographic Information Systems

The Study Group directed SAIC to explore GIS integration.  The ability to visualize data is of high importance to the IC, and a demonstration of how XML and GIS work together was deemed interesting enough to warrant exploration.

At first, SAIC talked with an outside vendor of GIS products and viewed a prototype GIS-enabled World Factbook.  The system allowed a user to navigate the Factbook via a GIS-aware globe; as countries were selected, relevant entries from the Factbook were displayed.  Though interesting, SAIC did not see any real technological breakthrough and opted to keep looking.

Fortunately, the prototype development team talked to a sister SAIC division responsible for GIS applications.  That division saw great potential in XML for GIS information exchange.  One problem plaguing the GIS community is the rapid exchange of data.  Often the data, which starts from a standard (like the vector product format (VPF) from the National Imagery and Mapping Agency (NIMA)), is converted to a proprietary format for use by a specific application or application family, and there is a lag time between production and dissemination, sometimes measured in months.  XML seemed a natural answer as a rapid, vendor-neutral, non-proprietary format for data interchange.

The resultant activities show how rapidly XML can be applied.  Within the span of one week, the spatial technologies division had taken VPF data and converted it to valid XML.  They were then able to save this data to a Web page (in XML) and run a Java applet using the XML Query Language (XQL) against the data to generate maps.  The data could also be queried and results returned to the user.  There is nothing revolutionary in the way the content was created or stored, but it opens up a potential revolution in how it is applied.
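
A sketch of what one converted vector feature might look like in XML (the feature and attribute names, and the coordinate values, are illustrative assumptions; the actual conversion schema used by the spatial technologies division is not documented here):

    <!-- hypothetical road feature converted from VPF -->
    <FEATURE type="road" class="primary">
      <COORDINATES>44.01,36.12 44.07,36.15 44.11,36.21</COORDINATES>
    </FEATURE>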

The Spatial Technology team envisions a future where data is posted to a common site in XML.  A responsible agency ensures that this sole data source is updated with the latest information.  Users of the data can either use it in its native XML format or, if they choose, convert it from XML to their proprietary format for their discrete application.

The paradigm demonstrated, a sole data source in a neutral format usable as-is or with conversion, shows at once the greatest potential of XML.  XML enables data producers to create and store their data in a recognized, standardized format against which a wide variety of applications can work.  Rather than needlessly duplicating data in proprietary formats, the producer can “produce once,” and users can “apply many.”

6        Conclusions

6.1        Project Highlights

Briefly put, the highlight of the project was that an end-to-end production, storage, and dissemination system using XML is possible with COTS technology.  However, there are fundamental system integration issues (collectively referred to as configuration management) that must be addressed.

6.2        Lessons Learned

Structured versus Unstructured Authoring.  Structured authoring is the clear winner.  Though an unstructured editor (e.g., Microsoft Word) can be made to produce XML, the necessity of transforming from one format (native Word or RTF) to another (XML) makes it less desirable than a native XML authoring environment.  The old SGML worry of high costs for seat licenses is also disappearing, making these applications nearly as affordable as a word processor.  The greatest advantage is the ability of structured editors to integrate with the content management environment.  Training, often pointed to as a great determinant, is really nothing more than a convenient excuse to avoid adopting new methods of creating content.  Indeed, the rapid rise of HTML editors within the IC already belies the claim that nothing can be written without Microsoft Word!

Data translation to XML.  Data translation has been practiced for many years in the RDBMS world; however, data translation into XML is a rather new area of interest.  The prototype proved that simple translations are quite possible, due to the structured, neutral nature of XML.  Homegrown applications can be developed to meet specific needs, but maintenance of the translators could prove unwieldy if there are constant structure changes.  A few vendors are currently developing generic tools for creating these translation products, and inevitably more COTS packages will appear to provide data extraction or translation functionality, probably through the work being done in e-commerce applications.

Dissemination of XML.  Too often this is viewed as another front in the never-ending browser wars.  In reality, a smart middleware application is probably the best long-term solution for the IC.  Security concerns will drive the community to server-side style sheets.  Until the day arrives when Web servers easily handle XML and XSL natively, a middleware application is the best solution.

6.3        XML’s Applicability to the Intelligence Community

Taken as a whole, XML shows great applicability by providing a vendor-neutral, non-proprietary format for exchanging content.  This has great application to the IC as it strives not only to enable better communications amongst its own members but also to achieve better interoperability with its customers.

XML has definite advantages for an authoring environment.  The ability to divorce format (appearance) from content and manipulate that content as needed has tremendous potential.  XML-tagged data is “smarter” and much more readily searched and retrieved.  The ability to add security markings as attributes is a prime example of how XML captures data about content that machines can then be programmed to operate against.  The primary complaint regarding XML authoring tools, that they are too difficult to learn, stems from a lack of understanding more than from physical or system limitations.  XML editors are more akin to HTML editors than to word processors, and in today’s Internet-dominated world, the reluctance of new users to adopt cutting-edge applications is less than it was just a few years ago.  If the predictions are right, as much as 50% of current intelligence analysts will be retiring or otherwise leaving in the next 3-5 years.  Which do you choose: an editor for the current generation or one for the future?

XML content repositories are also much maligned by critics, who point out that object-oriented databases are not as responsive or speedy as older relational designs.  But one must realize that relational databases have had 15 years to mature, and speed is just one of many factors to be considered.  Where XML excels is in how it can take advantage of object-oriented databases.  Objects can be elements, collections of elements, or fragments thereof, and object-oriented databases are uniquely suited to exploit the smart data created by XML.  Element check-in and checkout, fragment editing, reuse and repurposing of content, attachment of comments and reviewers’ remarks, not to mention version control, make an object-oriented database just as useful as a speedy relational database.  Besides, performance issues can be addressed up front through logical subdivision of data collections and smart data archiving or indexing procedures.  Furthermore, there is nothing in the design of object-oriented systems that prevents them from interfacing with relational legacy systems where necessary.

The real shining glory of XML is in the delivery of content.  Smart tags lay the foundation for a future multi-domain (and maybe even multi-level) security environment.  Server-side processes can take advantage of XML tags to ensure that only the right content is delivered given security constraints.  Furthermore, views can be tailored based on what information the user wants to see.  Most importantly, the delivered content can be made reusable while its XML richness is preserved.  Regardless of what critics may lead one to believe, XML is usable today in any browser; all that is needed is a middleware application.  While a middleware application best serves the needs of server-side processing right now, the future will eventually see this same functionality embedded in the Web server.

XML data tagging provides a degree of security control over content never before deemed possible.  Rather than having the security markings as part of the text, the security data can be carried as an attribute that a computer can understand.  This security mark can be used to sort data by classification as well as to mark it appropriately.  The XML separation of content and format means that when the next generation of CAPCO arrives, the change in an XML environment would simply be a change to the style sheet (appearance).

XML tags as metadata also hold great potential for addressing the problem of full-text searching.  Searches can be directed at elements or attributes, reducing what has to be indexed and searched at a time when the explosive growth of Web content threatens to overwhelm search engines.  In the commercial world, the search engine Northern Light was recognized as the most comprehensive because it reached an estimated 16% of the Web pages in the world.  Though Intelink currently has great coverage, what will happen on the SIPRNET side when “operations” and “intelligence” come together in a common environment?  Is 16% coverage good enough?  Even if it is 32%, or even 64%, is that enough?  XML-tagged content is the way to help users find the right information at a time when the amount of content grows at exponential rates.

The XML GIS application demonstrates the power of XML for data interchange.  Using XML, data can be hosted in a neutral manner that any application can use.  It can be used in its native XML format or converted.  The key is that the data can be rapidly updated and disseminated.

XML is real and ready for use today.  The tools are available and capable of accomplishing tasks while meeting stringent security requirements (indeed, with XML some of these security requirements may actually be met for the first time!).  XML allows for the creation and manipulation of smart content.  XML enables storage of content in reusable, retrievable, and searchable forms.  XML can be used to disseminate content as needed while enforcing security constraints or catering to user preferences.  The online revolution is here, and HTML is not good enough.  XML is more than good enough; it is also ready.  Only one question remains: “Is the Intelligence Community ready for XML?”


APPENDIX A   Software Cost

The following prices are list prices for quantities of one.  Please note that all prices are usually negotiable.  Each software vendor offers additional components as well, and the actual component mixture and quantity required would need analysis per community/organization.  Maintenance costs are additional.

 

Software                                            No. of Units    Unit Price       Total Price
--------------------------------------------------------------------------------------------------
STRUCTURED CONTENT AUTHORING
ArborText ADEPT Components
    ADEPT Editor (Fixed/Concurrent*)                     1          1,100 / 2,200
    ADEPT Publisher (Fixed/Concurrent*)
        (Required if producing paper)                    1          2,100 / 4,200
    Document Architect                                   1          4,950
    ACL Designer                                         1          3,495
    Total – ArborText ADEPT Components                                             11,645 / 14,845

XMetaL                                                   1          495                       495

STRUCTURED NON-XML AUTHORING
ArborText EPIC Components
    EPIC Editor (Fixed/Concurrent*)                      1          1,400 / 2,800
    EPIC Publisher (Fixed/Concurrent*)
        (Required if producing paper)                    1          4,000 / 8,000
    Word Interchange (Fixed/Concurrent*)                 1            400 / 800
    Total – ArborText EPIC Components                                               5,800 / 11,600

CONTENT STORAGE / MANAGEMENT
Chrystal Astoria Components
    Astoria Server                                       1          42,500
    Astoria SGML/XML Client                              1           1,940
    Adept Editor Bridge                                  1           8,500
    Web Services Server                                  1          15,000
    Web Services Additional Capacity Licenses            1           7,500
    Web Services Advisor Server                          1           7,500
    Total – Chrystal Astoria Components                                                      82,940

CONTENT DELIVERY
LivePage Enterprise Components
    LivePage Enterprise (includes 5 Mgr units)           1           8,000
    LivePage Enterprise Internet Connector               1          10,000
    LivePage Manager 5-pack                              1           3,000
    Total – LivePage Enterprise Components                                                   21,000

*  A fixed license is fixed to a single PC.  A concurrent license is loaded from a server so that multiple individuals may use it, but only one at a time.
