SensorGrid Background

Advances in Internet and distributed systems technologies have helped academia, governments and businesses provide access to a substantial amount of geospatial data. The GIS community must now face the following challenges:

  • Adoption of universal standards: Over the years organizations have produced geospatial data in proprietary formats and developed services by adhering to differing methodologies;
  • Distributed nature of geospatial data: Because the data sources are owned and operated by individual groups or organizations, geospatial data resides in widely distributed repositories;
  • Service interoperability: Computational resources used to analyze geospatial data are also distributed and must be able to be integrated when necessary.

The Open Geospatial Consortium, Inc (OGC) represents a major effort to address some of these problems. The OGC is an international industry consortium of more than 270 companies, government agencies and universities participating in a consensus process to develop publicly available interface specifications. OGC Specifications support interoperable solutions that "geo-enable" the Web, wireless and location-based services, and mainstream IT. OGC has produced many specifications for web-based GIS applications, such as the Web Feature Service (WFS) and the Web Map Service (WMS). Geography Markup Language (GML) is widely accepted as the universal encoding for geo-referenced data. In addition to the more traditional HTTP request/response style services, the OGC is also defining the SensorML family of services.

The GIS community represents a major sub-domain in the “Grid of Grids” [Building a Grid of Grids- Messaging Substrates and Information Management] picture. By architecting GIS services using Web Services, and by placing these services within a SOA messaging substrate, we may integrate GIS Grid Services with other applications. Our work on GIS services as Web Services is described in more detail in these papers: "Complexity Computational Environment: Data Assimilation SERVOGrid" and "Numerical Simulations for Active Tectonic Processes".

    GIS applications developed by various vendors and academic institutions have become more complex as they are required to process larger data sets, use more computing power and, in some cases, collect data from distributed sources. Traditionally GIS applications are data centric: they deal with archived data. However, as sensor-based applications gain momentum, the need to couple real-time data sources such as sensors, radars, or satellites to high-end computing platforms such as simulation, visualization or data mining applications introduces several important distributed computing challenges to the GIS community.

    Although commercial GIS applications provide various solutions to these problems, most of the solutions are based on more traditional distributed computing paradigms such as static server-client approaches. Traditional point-to-point communication approaches tend to result in more centralized, tightly coupled and synchronous applications, which makes large-scale systems harder to manage. Modern large-scale systems, on the other hand, require more flexible asynchronous communication models to cope with the high number of participants and the transfer of larger data sets between them.

    Defining a Common Data Format

    The first step in building such services is to decide on appropriate encodings for describing the data. The data format matters because it becomes the basic building block of the system, which in turn determines the level of interoperability. Use of a universal standard like XML greatly increases the number of users from different backgrounds and platforms who can easily incorporate our data products into their systems. Furthermore, services and applications are built to parse, understand and use this format to support various operations on data. So in a sense the type and variety of the tools used in the development and data assimilation processes depend on the format initially agreed upon.

    For these reasons we use GML, a commonly accepted XML-based encoding for geospatial data, as our data format in GIS-related applications. One important fact about GML is that, although it offers particular complex types for various geospatial phenomena, users can employ a variety of XML Schema development techniques to describe their data using GML types. This provides a certain degree of flexibility both in the development process and in the resulting data products. For instance, depending on the capabilities of the environment, schema developers may use only certain XML Schema types and choose not to incorporate more obscure ones because of incompatibility issues. As a result, a particular geospatial phenomenon can be described by different valid GML schemas.

    By incorporating GML in our systems as our de facto data format we gain several advantages:

    1. It allows us to unify different data formats. For instance, various organizations offer different formats for position information collected from GPS stations. GML provides suitable geospatial and temporal types for this information, and by using these types a common GML schema can be produced. (See http://www.crisisgrid.org/html/servo.html for sample GML schemas for GPS and Seismic data)

    2. As more GIS vendors release compatible products and more academic institutions use OGC standards in their research and implementations, OGC specifications are becoming universal standards in the GIS community, and GML is rapidly emerging as the standard XML encoding for geographic information. By using GML we open the door to interoperability with this growing community.

    3. GML and related technologies allow us to build a general set of tools to access and manipulate data. Since GML is an XML dialect, any XML-related technology can be used for application development. In most cases the technologies for collecting data, and consequently the nature of the collected data products, stay the same for a long period of time, so the interfaces we create for sharing data will not change either. This ensures stable interfaces and libraries.
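
    As a rough illustration of the first advantage, the sketch below normalizes two made-up GPS position formats into one record and encodes it with GML geometry elements. The `gml:Point`, `gml:pos` and `gml:name` elements come from GML 3; the `gps` namespace, the `StationPosition`, `observedAt` and `location` elements, both input formats, and the latitude/longitude axis order are assumptions made for this example and are not taken from the project's actual schemas.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"   # GML 3.1 namespace
APP_NS = "http://example.org/gps"       # hypothetical application namespace
ET.register_namespace("gml", GML_NS)
ET.register_namespace("gps", APP_NS)


def from_csv_line(line):
    """Hypothetical provider A: 'station,lat,lon,iso-timestamp'."""
    station, lat, lon, when = line.strip().split(",")
    return {"station": station, "lat": float(lat), "lon": float(lon), "time": when}


def from_key_values(text):
    """Hypothetical provider B: 'id=... t=... lon=... lat=...'."""
    fields = dict(item.split("=", 1) for item in text.split())
    return {"station": fields["id"], "lat": float(fields["lat"]),
            "lon": float(fields["lon"]), "time": fields["t"]}


def to_gml(record):
    """Encode one normalized record as a GML-style feature element."""
    feature = ET.Element(f"{{{APP_NS}}}StationPosition")
    ET.SubElement(feature, f"{{{GML_NS}}}name").text = record["station"]
    ET.SubElement(feature, f"{{{APP_NS}}}observedAt").text = record["time"]
    location = ET.SubElement(feature, f"{{{APP_NS}}}location")
    point = ET.SubElement(location, f"{{{GML_NS}}}Point", srsName="EPSG:4326")
    # Axis order (latitude first) is an assumption of this sketch.
    ET.SubElement(point, f"{{{GML_NS}}}pos").text = f"{record['lat']} {record['lon']}"
    return ET.tostring(feature, encoding="unicode")


# Two differently formatted inputs yield the same GML document.
a = to_gml(from_csv_line("P123,34.05,-118.25,2005-06-01T00:00:00Z"))
b = to_gml(from_key_values("id=P123 t=2005-06-01T00:00:00Z lon=-118.25 lat=34.05"))
assert a == b
```

    Once both sources are reduced to the same record, every downstream tool only needs to understand the single GML encoding.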

    Data Binding

    Establishing XML or some flavor of it as the default message/data format for the global system requires consideration of a Data Binding Framework (DBF) for generating, parsing, marshalling and un-marshalling XML messages. Marshalling and un-marshalling operations convert between XML-encoded formats and (typically Java) binding classes that can be used to simplify data manipulation.
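
    The marshalling and un-marshalling round trip can be sketched in a few lines. The example below is written in Python rather than Java for brevity and is hand-written rather than generated by a binding framework; the `Observation` class and the `<observation>` message layout are hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Observation:
    """Binding class for a hypothetical <observation> message."""
    sensor_id: str
    value: float
    unit: str


def marshal(obs):
    """Binding object -> XML text."""
    root = ET.Element("observation", sensorId=obs.sensor_id)
    value = ET.SubElement(root, "value", unit=obs.unit)
    value.text = repr(obs.value)
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text):
    """XML text -> binding object."""
    root = ET.fromstring(xml_text)
    value = root.find("value")
    return Observation(root.get("sensorId"), float(value.text), value.get("unit"))


# A round trip through XML should give back an equal object.
original = Observation("gps-07", 12.5, "mm")
assert unmarshal(marshal(original)) == original
```

    A real data binding framework generates classes like `Observation` from the schema, so application code works with typed fields instead of raw XML nodes.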

    The ability to generate and parse XML instances in a tolerable amount of time is one criterion for choosing such a framework, because message processing time affects overall system performance as well as the performance of the individual XML processing component.

    Another criterion is the ability of the binding framework to generate valid instances according to the Schema definitions. This is a major problem for DBFs, since not all XML Schema constructs can be mapped directly to object-oriented programming constructs. Some of them (such as substitution groups, which are heavily used in GML schemas) have no direct counterpart in the object-oriented world, and this causes difficulties when processing XML documents. Data binding frameworks offer different solutions, some more elaborate than others, and a suitable framework must be chosen depending on the nature of the data.
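
    To make the substitution group difficulty concrete: a GML property can contain any element that substitutes for the abstract geometry head, so binding code cannot know the child's type in advance. One common workaround, sketched below, is to dispatch on the actual element name at parse time. This is only an illustration of the idea, not how any particular framework solves it; the wrapper element `location`, the parser functions and the registry are assumptions of the example.

```python
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml}"


def parse_point(element):
    """gml:Point holds a single coordinate tuple in gml:pos."""
    numbers = [float(v) for v in element.find(GML + "pos").text.split()]
    return ("Point", [tuple(numbers)])


def parse_line_string(element):
    """gml:LineString holds a flat list of 2D coordinates in gml:posList."""
    numbers = [float(v) for v in element.find(GML + "posList").text.split()]
    return ("LineString", list(zip(numbers[0::2], numbers[1::2])))


# Registry of elements allowed to stand in for the abstract geometry.
GEOMETRY_PARSERS = {
    GML + "Point": parse_point,
    GML + "LineString": parse_line_string,
}


def parse_geometry_property(prop):
    """Pick the parser from the tag of whatever geometry element is present."""
    child = next(iter(prop))
    parser = GEOMETRY_PARSERS.get(child.tag)
    if parser is None:
        raise ValueError(f"unsupported geometry element: {child.tag}")
    return parser(child)


doc = ET.fromstring(
    '<location xmlns:gml="http://www.opengis.net/gml">'
    "<gml:Point><gml:pos>34.05 -118.25</gml:pos></gml:Point>"
    "</location>"
)
kind, coords = parse_geometry_property(doc)
assert kind == "Point" and coords == [(34.05, -118.25)]
```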

    Data Services

    GIS systems are expected to provide data access tools to users and data manipulation tools to administrators. In principle, serving data in a particular format is fairly simple when it is made accessible as files on an HTTP or FTP server. But additional features such as query capabilities or real-time streaming access require more complicated services. As the complexity of the services grows, the client’s chance of easily accessing data products decreases, because every proprietary application developed for some type of data requires its own specialized clients. Web Services help us overcome this difficulty by providing standard interfaces to the tools or applications we develop.

    No matter how complex the application itself, its WSDL interface will have standard elements and attributes, and the clients using this interface can easily generate methods for invoking the service and receiving the results. This method allows providers to make their applications available to others in a standard way.

    The usefulness of Web Services is constrained by several factors. They are suitable when:

    · The volume of data transferred between the server and the client is not high. The actual amount of data that can be transferred depends on a number of factors, such as the protocol used to communicate or the maximum size allowed by HTTP;

    · Time is not a determining factor. Despite their obvious advantages, current HTTP-based implementations do not provide desirable results for systems that require fast response and high performance. This is due to the delays caused by data transfer over the network, network constraints, and HTTP request-response overhead.

    Most scientific applications that couple high performance computing, simulation or visualization codes with databases or real-time data sources require more than mere remote procedure call message patterns. These applications are sometimes composite systems in which some components require output from others. They are also asynchronous and may take hours or days to complete. Such properties require additional layers of control and capabilities beyond plain Web Services, which introduces the need for a messaging substrate that can provide these extra features.

    Streaming with NaradaBrokering

    Community Grids Lab has been developing NaradaBrokering, a distributed messaging infrastructure that goes beyond the remote procedure call methodology on which the pure Web Services approach is based. It provides two related capabilities. First, it provides a message-oriented middleware (MoM) which facilitates communications between entities (which include clients, resources, services and proxies) through the exchange of messages. Second, it provides a notification framework by efficiently routing messages from the originators to only the registered consumers of the message in question.

    NaradaBrokering supports loosely coupled systems through asynchronous communication, and it can be used to support different interactions by encapsulating them in specialized messages called events. Events can encapsulate information pertaining to transactions, data interchange, method invocations, system conditions and finally the search, discovery and subsequent sharing of resources.
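
    The publish/subscribe pattern described here can be illustrated with a toy in-memory broker. This is a generic sketch of topic-based routing and asynchronous hand-off, not the NaradaBrokering API; the `Broker` class, its method names and the topic strings are all hypothetical.

```python
from collections import defaultdict, deque


class Broker:
    """Toy single-process broker: routes events only to registered consumers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._queue = deque()                  # events waiting for delivery

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Queue an event and return at once; the publisher never waits."""
        self._queue.append((topic, payload))

    def deliver_pending(self):
        """Deliver queued events in publication order; return the delivery count."""
        delivered = 0
        while self._queue:
            topic, payload = self._queue.popleft()
            for callback in self._subscribers.get(topic, []):
                callback(topic, payload)
                delivered += 1
        return delivered


broker = Broker()
gps_log, seismic_log = [], []
broker.subscribe("gps/positions", lambda topic, event: gps_log.append(event))
broker.subscribe("seismic/events", lambda topic, event: seismic_log.append(event))

broker.publish("gps/positions", {"station": "P123", "lat": 34.05, "lon": -118.25})
assert gps_log == []          # nothing is delivered until the broker runs
broker.deliver_pending()
assert len(gps_log) == 1 and seismic_log == []
```

    The publisher and the consumers never reference each other directly; they share only a topic name, which is what keeps the components loosely coupled.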

    Some of the important features of NaradaBrokering can be summarized as follows:

    • Ensures reliable delivery of events in the case of broker or client failures and prolonged entity disconnects.
    • Provides compressing and decompressing services to deal with events with large payloads. Additionally there is also a fragmentation service which fragments large file-based payloads into smaller ones. A coalescing service then merges these fragments into the large file at the receiver side.
    • Provides support for multiple transport protocols such as TCP (blocking and non-blocking), UDP, SSL, HTTP, RTP, HHMS (optimized for PDA and cell-phone access) and GridFTP, with the protocol chosen independently at each link.
    • Implements high-performance protocols (message transit time of 1 to 2 ms per hop)
    • Order-preserving optimized message delivery
    • Quality of Service (QoS) and security profiles for sent and received messages
    • Interface with reliable storage for persistent events, reliable delivery via WS-Reliable Messaging.
    • Discovery Service to find nearest brokers /resources
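
    The compression, fragmentation and coalescing services listed above can be pictured with the following sketch. It uses Python's standard `zlib` module and an invented `(index, total, chunk)` fragment layout; it shows the general technique only and does not reflect NaradaBrokering's actual wire format or API.

```python
import zlib


def fragment(payload, fragment_size):
    """Compress a payload and split it into (index, total, chunk) fragments."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    compressed = zlib.compress(payload)
    total = (len(compressed) + fragment_size - 1) // fragment_size
    return [
        (i, total, compressed[i * fragment_size:(i + 1) * fragment_size])
        for i in range(total)
    ]


def coalesce(fragments):
    """Reassemble fragments (in any arrival order) and decompress the result."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    total = ordered[0][1]
    if [frag[0] for frag in ordered] != list(range(total)):
        raise ValueError("missing or duplicate fragments")
    return zlib.decompress(b"".join(frag[2] for frag in ordered))


payload = b"station=P123 lat=34.05 lon=-118.25\n" * 500
pieces = fragment(payload, fragment_size=64)
# Fragments may arrive out of order; the index lets the receiver restore them.
assert coalesce(list(reversed(pieces))) == payload
```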
