Data-Intensive Computing

Data-Intensive Computing

(from "The Grid")

Cat: ICT
Pub: 1999
#: 9907b

Reagan W. Moore, Chaitanya Baru, et al.

99331u/18219r

Title

Data-Intensive Computing

データ・インテンシブ・コンピューティング

Subtitle

from "The Grid", edited by Ian Foster et al.

I.フォスター他編のTHE GRIDから

Author

Reagan W. Moore, Chaitanya Baru, Richard Marciano, Arcot Rajasekar, and Michael Wan
R.W.ムーア、C.バルー、R. マルチアノ、A.ラジャセカル、M.ワン

Published

1999

1999年

Index

Data-Intensive applications:

Distributed Data Analysis:

Information Discovery:

Data-Naming Systems:

Data-Storage Systems:

Data-Handling Systems:

Data Service Systems:

Data Publication Systems:

Data Presentation Systems:

Data-Handling Paradigms:

データ集中アプリケーション:

分散データ解析:

情報の発見:

データネーミングシステム:

データ蓄積システム:

データ処理システム:

データサービスシステム:

データ発行システム:

データプレゼンテーションシステム:

データ処理のパラダイム:

Why this report?

What is Data, Information, Contents, Database, Archive, Library, Resources, also metadata, and so on?

All computers and networks are roots, trunks, branches, and leaves. The data or the contents are the fruit.

This chapter of the edited volume "The Grid" describes the information discovery environment.

When we want to eat a fruit, we must know where the fruit is, what it is made of, and whether it is edible. Only then do we try to reach it. We primates have worked out how to get fruit dexterously over our long evolutionary history.

データとは何か。情報、コンテンツ、データベース、アーカイブ、ライブラリ、リソース、およびメタデータなどである。

全てのコンピュータおよびネットワークは根、幹、枝および葉である。データやコンテンツは果実である。

これは、"The Grid"の本の中の１章として、情報を見出す環境について書かれてものである。

我々が果実を食べようと考える場合、まずその果実がどこにあり、どのような構成になっているか、食べられるのかどうかを知らなければならない。そうしてその果実に近づこうとするのである。我々霊長類は、進化の長い間にどうしたら果実を巧みに得られるのかをずーと考え続けてきたのだ。

Summary

要約

1. Data-Intensive applications:

Computational grids provide access to distributed compute resources and distributed data resources, creating unique opportunities for improved access to information. Users access data that has been turned into information through the addition of metadata that describes its origin and quality. Information-based computing within computational grids will enable collective advances in knowledge.

1. データ集中アプリケーション:

コンピュータ・グリッドは分散コンピュータ資源や分散データ資源にユニークにアクセスすることで、情報を更新する。ユーザのアクセスするデータはその起源や品質を記述したメタデータを付加することで情報に変える。コンピュータ・グリッド内の情報ベースのコンピューティングは知識の集約的な進歩をもたらす。

Data-intensive applications require the manipulation of terabytes of data aggregated across hundreds of files, with data transfer rates of 10 GB/s (roughly a petabyte of data per day).

For data sets to be remotely accessible, metadata must be provided. A metadata catalog constitutes a form of publication.

Peer-reviewed publication of data addresses both access and quality: it provides metadata that can be used to reach high-quality, curated data globally, effectively turning data into information.

Several scientific disciplines (e.g. high-energy physics, molecular science, astronomy, and fluid dynamics) are aggregating domain-specific results into data repositories. The data is reviewed, annotated, and made accessible through a common, uniform interface. (Digital library)

Computational bandwidth:
the number of bytes of data processed per floating-point operation. (Cf. supercomputers sustain 7 bytes per floating-point operation.)

For well-balanced applications, this ratio should match the memory bandwidth divided by the CPU execution rate. When data transmission rates between computer memory and local disk are examined, memory acts as a cache that greatly reduces the required disk bandwidth. (Cf. supercomputers: 1 byte per 70 floating-point operations, a factor of 490 smaller.)

This ratio should match the disk bandwidth divided by CPU execution rate.
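The factor of 490 quoted above follows directly from the two balance figures. A minimal sketch of the arithmetic, where the constant and function names are illustrative and not taken from the chapter:

```python
# Figures quoted in the text, in bytes moved per floating-point operation.
MEMORY_BYTES_PER_FLOP = 7.0        # memory <-> CPU path
DISK_BYTES_PER_FLOP = 1.0 / 70.0   # local disk <-> memory path

def balance_ratio(bandwidth_bytes_per_s: float, flop_per_s: float) -> float:
    """Bytes available per floating-point operation on a given data path."""
    return bandwidth_bytes_per_s / flop_per_s

# How much smaller the disk figure is than the memory figure.
memory_to_disk_factor = MEMORY_BYTES_PER_FLOP / DISK_BYTES_PER_FLOP
print(round(memory_to_disk_factor))  # 490
```

The same `balance_ratio` helper can be applied to a network path to see how far a remote repository falls below the local-disk figure.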

Data-intensive applications need high-bandwidth access to a remote data repository.

Network bandwidth performance tends to be smaller than local disk bandwidth performance.

Distributed processing:
preferred when data passes through the CPU only once.

Distributed caching:
preferred when data is read multiple times by the application.

データ集中アプリケーションは、何百というファイルの保管されたテラバイト規模のデータ操作を前提とする。この場合のデータ転送速度は10 GB/sである。（毎日のペタバイトの転送量。Peta = 10の15乗= 1,000 tera ）

データセットをリモートでアクセスするためにメタデータが付加される。メタデータのカタログは発行フォームを構成する。

高品質のデータにグローバルな環境でアクセスするにはメタデータを付加することでPeerベースでデータの発行が可能、そうすることデータを情報にする。

いくつかの科学専門分野（例：高エネルギー物理、分子科学、天文学、流体力学）では、その分野の計算結果をデータセンタに集中している。データは吟味、評価され共通のインターフェイスを通して共用される。（デジタルライブラリ）

コンピュータの帯域幅：
膨大なデータが浮動点少数計算で処理されている。（スーパーコンピュータは7バイト/FLOPS）

バランスのとれたアプリケーションでは、この比率はメモリー帯域幅をCPU処理能力で割った数に相当する。コンピュータメモリとローカルディスク間のデータ転送レートでは、メモリはキャッシュのようにふるまい、ディスクの帯域幅を減らす。（スパコンでは、1バイト/FLOPS、490より小さな因数）

この比率はディスク帯域幅をCPU処理能力で割った数に対応すべきである。

データ集中アプリケーションは沿革のデータセンタへの広帯域のアクセスを必要とする。

ネットワークの帯域幅はローカルディスクの帯域幅よりも小さい。

分散処理：
CPUによるデータ処理として有効。

分散キャッシング：
アプリケーションによりデータが何回も読み出される場合に有効。

Data Assimilation:

A good example is assimilating remote satellite observations into a comprehensive data set for weather forecasting. Satellite observations provide measurements only for a portion of the Earth and only for the time period spent over a given area.

At NASA, the Goddard Earth Observing System (GEOS) Data Assimilation System (DAS) is used.
About 2 GB of data per day are collected at a NASA center in Maryland. The data is sent to a NASA computing facility in California for analysis (6 GB per day) and then sent back.

The analysis requires running a General Circulation Model (GCM) to predict the global weather patterns and calculating the discrepancies between the observed and predicted data. The assimilation cycle is repeated every 6 hours.

A T3 network can transmit 400 GB of data per day.

Eventually, the data assimilation will require scheduling of data transmission and of disk cache space utilization, in addition to scheduling of CPU access.

Data-handling process:

Identify the required raw data sets.

Retrieve the data from a data repository.

Subdivide the data to generate the required input set.

Cache the data at the site where the computation will take place.

Analyze the data, and generate new data products.

Transmit the results back to the data repository.

Publish (register) the new data sets in the repository.
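The seven steps above can be sketched as a single function over a toy in-memory repository. This is only an illustration of the sequence: `Repository`, `run_data_handling`, and the example data set names are hypothetical, and a real grid deployment would move data across a network rather than between dictionaries.

```python
from dataclasses import dataclass, field

@dataclass
class Repository:
    """Toy in-memory stand-in for a remote data repository."""
    data_sets: dict = field(default_factory=dict)  # name -> list of numbers
    catalog: list = field(default_factory=list)    # names of published data sets

def run_data_handling(repo, wanted, select, analyze, result_name):
    names = [n for n in repo.data_sets if n in wanted]      # 1. identify raw data sets
    raw = {n: repo.data_sets[n] for n in names}             # 2. retrieve from repository
    inputs = {n: [v for v in vals if select(v)]             # 3. subdivide into input set
              for n, vals in raw.items()}
    cache = dict(inputs)                                    # 4. cache at the compute site
    product = analyze(cache)                                # 5. analyze, make new product
    repo.data_sets[result_name] = product                   # 6. transmit results back
    repo.catalog.append(result_name)                        # 7. publish (register)
    return product

repo = Repository({"obs_a": [1, -2, 3], "obs_b": [4, 5]})
product = run_data_handling(
    repo,
    wanted={"obs_a"},
    select=lambda v: v > 0,
    analyze=lambda cache: [sum(vals) for vals in cache.values()],
    result_name="obs_a_sum",
)
print(product, repo.catalog)  # [4] ['obs_a_sum']
```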

Upper limit for data transmission:

Network connection    Bandwidth (Mb/s)    Daily transfer (GB/day)
T1                    1.4                 15
T3                    45                  486
OC-3                  155                 1,670
OC-12                 622                 6,720
OC-48                 2,488               26,870
OC-192                9,952               107,480
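The daily figures in the table are the link rate multiplied by the number of seconds in a day. A short sketch that recomputes them, assuming decimal units (1 Mb = 10^6 bits, 1 GB = 10^9 bytes) and a fully utilized link; the 45 Mb/s row is labeled here as a T3-class rate:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_transfer_gb(bandwidth_mbps: float) -> float:
    """Upper limit in GB/day for a link rate given in Mb/s."""
    bytes_per_second = bandwidth_mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e9

LINKS_MBPS = {"T1": 1.4, "T3": 45, "OC-3": 155,
              "OC-12": 622, "OC-48": 2488, "OC-192": 9952}

for name, mbps in LINKS_MBPS.items():
    print(f"{name:7s} {mbps:>8} Mb/s  {daily_transfer_gb(mbps):>10,.0f} GB/day")
```

The computed values agree with the table once rounded to three or four significant figures. Sustained rates on a real link are lower, which is consistent with the 400 GB per day quoted for T3 in the text.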

A second component of the DAS mission is to support reanalysis of the data. The reanalysis will be done using data over 10-year periods, requiring the movement of 29 TB of data stored in more than 47,000 files. This can be automated if general interfaces are designed.

One difficulty occurs in creating a logical handle for the input file.

A second difficulty occurs in determining where the input file is located, since in general it will not be on local disk.

データの同期化：

典型的な例としては、気象予報のための遠隔の衛星による観察データを同期して活用する。衛星観測データは地表の一部分でかつ限定した時刻におけるデータである。

NASAでは、ゴダード地球観測システム(GEOS）データ統合システム(DAs）が利用されている。
毎日2 GBのデータがメリーランド州のNASAセンタに集められている。そのデータはカルフォルニア州のNASAのコンピュータセンタに送られ、毎日6 GBも処理され、その結果を返送している。

解析には地球規模の気候パターンを予測するために一般循環モデル（GCM）を使い、観測値と予測値の間の誤差を計算している。データの同期化は６時間毎に繰り返している。

T3ネットワークを利用しえ400 GB/日を伝送している。

結果として、データの同期化には、データ伝送のスケジューリング、データキャッシュスペースの利用、さらにCPUアクセスのスケジューリングが必要となる。

データ処理プロセス：

必要な生データセットを特定

データリポジトリからのデータ検索

データを細分化し、必要なインプットセットを生成

計算の実行される場所にデータをキャッシュする。

データを解析し新たな結果のデータを生成

その結果を再びデータレポシトリに伝送

リポジトリに新たなデータセットを発行（登録）する。

データ伝送の上限（左表参照）

DAsの別の役割としてはデータの再解析がある。再解析は10年間に亘るデータ、47,000ファイルの中の29 TB以上のデータを利用して行う。一般的なインタフェース設計がされていれば自動処理が可能。

一番困難なのは、インプットデータに対する論理的な処理をする時に発生する。

次に困難なのは、インプットデータがどこにあるのか決める時に起こる。それは通常はローカルディスク上には存在しないからである。

2. Distributed Data Analysis:

What if the total amount of data becomes much larger than the transmission capacity? Then additional data-handling systems are needed to support processing of the data within the repository.

Recent astronomy research has created digital images of large areas of the sky by digitizing existing astronomical photographic plates; the surveys range in size from 3 to 40 TB of pixel data. This reduces to 2 billion pixels where light is detected. The size is still large: 250 GB for the object metadata. The pixel images of each object must be saved to allow reanalysis of each object to verify whether it is a star or a galaxy.

Such analyses require the ability to generate database queries based on object attributes or image specifications. Through the use of advanced database and archival storage technology, the goal is to do the analysis in days instead of years.
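A query based on object attributes might look like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for the survey database. The table name, column names, and sample rows are invented for illustration and do not come from any actual sky survey schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sky_object (
        object_id INTEGER PRIMARY KEY,
        ra_deg    REAL,   -- right ascension
        dec_deg   REAL,   -- declination
        magnitude REAL,
        kind      TEXT,   -- 'star' or 'galaxy'
        image_ref TEXT    -- handle of the saved pixel image
    )""")
conn.executemany(
    "INSERT INTO sky_object VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10.5, -5.0, 14.2, "star", "plate_017/obj_1"),
        (2, 10.7, -5.1, 18.9, "galaxy", "plate_017/obj_2"),
        (3, 200.0, 30.0, 17.5, "galaxy", "plate_233/obj_3"),
    ],
)

def find_objects(kind, ra_min, ra_max, max_magnitude):
    """Select objects by attribute and return handles to their pixel images."""
    rows = conn.execute(
        "SELECT object_id, image_ref FROM sky_object "
        "WHERE kind = ? AND ra_deg BETWEEN ? AND ? AND magnitude <= ? "
        "ORDER BY object_id",
        (kind, ra_min, ra_max, max_magnitude),
    )
    return rows.fetchall()

print(find_objects("galaxy", 0.0, 50.0, 20.0))  # [(2, 'plate_017/obj_2')]
```

Only the small metadata table is scanned; the `image_ref` column points back to the pixel data that would stay in archival storage until an object needs reanalysis.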

A second requirement on the metadata comes from the need to integrate data from multiple digital sky survey repositories.
In the Digital Sky project, it will be necessary to integrate access to combinations of relational databases, object-oriented databases, and archival storage systems. This can be done only if system-level metadata is kept that describes the access protocol, network location, and local data-set identifiers for each of the data repositories.

A third requirement is the need to support multiple copies of a data set. Having multiple distributed copies of data sets minimizes network traffic, ensures fault tolerance, improves disaster recovery, and allows the data to be stored in different formats to minimize presentation time. If a copy becomes inaccessible for any reason, the data-handling system can automatically redirect a data retrieval request to a backup site.
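The system-level metadata and the automatic redirect can be sketched together. In this hypothetical example each replica record carries the three fields named in the text (access protocol, network location, local data-set identifier), and a retrieval tries the replicas in order until one succeeds. The class and function names, hosts, and the fake fetch function are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    protocol: str   # access protocol of the repository
    location: str   # network location
    local_id: str   # data-set identifier local to that repository

class ReplicaUnavailable(Exception):
    """Raised when a copy cannot be reached."""

def fetch_with_failover(replicas, fetch):
    """Try each copy in turn; return (replica used, data) from the first that works."""
    errors = []
    for replica in replicas:
        try:
            return replica, fetch(replica)
        except ReplicaUnavailable as exc:
            errors.append((replica.location, str(exc)))
    raise ReplicaUnavailable(f"all replicas failed: {errors}")

# Usage with a fake fetch function in which the primary site is down.
copies = [
    Replica("sql", "primary.example.org", "sky.objects"),
    Replica("archive", "backup.example.org", "/sky/objects.dat"),
]

def fake_fetch(replica):
    if replica.location.startswith("primary"):
        raise ReplicaUnavailable("connection refused")
    return b"object catalog bytes"

used, data = fetch_with_failover(copies, fake_fetch)
print(used.location)  # backup.example.org
```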

Multiple data repositories are also being created in neuroscience. Detailed images of primate brains are being collected in repositories at USCG, UCLA and Washington University. The size of a current brain image is on the order of 50 GB, but high-resolution images will be 1 TB per image; a collection of 1,000 brain images could be as large as 1 PB.

2. 分散データ解析:

インターネットの歴史は25年前からある。
データ量の全体が伝送容量より大きくなるとどうすればいいか？　その場合は、レポシトリ内のデータを処理できる追加のデータ処理システムが必要となる。

最近の天文学の研究では、既存の天文写真版をデジタル化して全天の大部分のデジタルイメージを作成した。（3-40 TBのサイズのデータ）これは、光が検知される場所は、2 Bピクセルに変える。サイズはまだ大きくて対象のメタデータとして250 GBになる。各対象のピクセルイメージは、それが恒星か銀河かを検証するために各オブジェクトの再解析ができるように保存される。

この分析には、オブジェクト属性やイメージ仕様に基づくデータベースクエリーを生成できることが必要となる。先進データベースやアーカイブストレッジ技術を活用して、目標としては年の単位ではなく日の単位で分析を行うことである。

メタデータに対する第二の要求は複数のデジタル全天探査レポシトリからのデータを統合する必要があるからでである。

デジタルスカイプロジェクトでは、リレーショナルデータベース、オブジェクト指向データベース、およびアーカイブストレッジシステムの組合せへのアクセルを統合する必要がでてくる。それはシステムレベルのメタデータがそれぞれのデータリポジトリに関するアクセルプロトコル、ネットワークロケーションおよびローカルデータセット識別子を保持している場合にのみ可能となる。

第三の要求は、データセットの複数のコピーをサポートする必要があるということである。複数の分散データセットｇあることでネットワークトラフィックを最小化し、フォールトトレランスを保証し、大障害からの復旧を改善し、データを異なるフォーマットで保管することでプレゼンテーション時間を最小化できる。もしそのコピーが何らかの理由でアクセスできない場合には、データ処理システムは自動的にデータアクセス検索をバックアップサイトに対して行うことができる。

複数のデータリポジトリは神経科学の分野でも作られている。霊長目の詳細な大脳画像はUSCG、UCLAおよびWashington大学のリポジトリで収集されている。現在の大脳画像は50 GBのサイズである。もしこれが1 TBの各々高精細画像となると、1000枚の大脳画像で合計1 PBにもなる。

3. Information Discovery:

Grid users will include citizens who want access to information and decision makers who need access to scientific knowledge. The focus is on how to turn data into information and how to use information to support predictive modeling, problem analysis, and decision making.

Knowledge Networks:
defined to represent the multiple sets of discipline expertise, information, and knowledge that can be aggregated to analyze a problem of scientific or societal interest.

Individual researchers comprise a knowledge network enclave that includes their expertise, data collections, and analysis tools.

Each knowledge network enclave may impose unique requirements on the data-handling environment of computational grids.

Some may keep data private until it can be analyzed for new effects.

Some may publish immediately to establish precedence.

Some may organize information for either scientists' or the public's use.

Digital library technology:

Attributes are defined that encapsulate the information as metadata, which are then organized into a schema.

Definitions of each attribute are specified by semantics that have a common meaning within the discipline.

When world views evolve, ontology, schema, metadata, and semantics may all need to change.

Semantics: the vocabulary used to describe the metadata. Persons without that cultural background may not understand the underlying ontology used within the domain or the vocabulary used to convey meaning.

3. 情報の発見:

グリッドユーザには、情報にアクセスする必要がある市民や科学的な知識にアクセスする必要がある意思決定者が含まれる。彼らは、データをいかに情報に変え、そしそれをて予測モデリングや、問題分析や意思決定に役立てることに着目しているからである。

ナレッジ・ネットワーク：
科学的な問題あるいは興味を分析するために集積し得る多数の専門分野のノウハウ、情報、知識セットを表現するものと定義される。

個々の研究者のナレッジネットワークには、彼らの専門知識、データ収集、分析ツールを含む他分野の知識が含まれる。

それぞれのナレッジネットワークの他分野知識はコンピュータ・グリッド環境でのデータ処理には固有の要件を必要とすることになろう。

あるデータは新たな効果が分析されるまでは秘密にしておくことになろう。

他のあるデータは即刻発表して優位に立とうとするかも知れない。

またあるデータは科学者あるいは公開のために情報の整理が必要となろう。

デジタルライブラリ技術：

属性はメタデータとしての情報をカプセル化して定義され、一覧表（schema）として組織化されている。

各属性の定義は、専門分野中の共有の意味をもつ意味論（semantics）として分類される。

世界の認識が進化する都度、存在論(ontology)、一覧表(schema)、メタデータおよび意味論(semantics)もすべて変化することになろう。

意味論：メタデータを記述するためのボキャブラリ。文化的な背景を持たない人は、意味を伝えるために使われた特定分野やそのボキャブラリの底流にある存在論を理解できない。

4. Data-Naming Systems:

The evolution of data-handling environments has been driven by the need to develop:

storage systems to hold data,

information discovery mechanisms to locate data sets,

data-handling mechanisms to retrieve data sets,

publication mechanisms to populate high-quality data repositories, and

systems to support data manipulation services.

Traditionally, applications use input data sets to define the problem of interest and store results in output files written to local disk. Identifying the data sets is handled by specifying a unique UNIX pathname on a given host. The user maintains a private metadata catalog to equate the pathname with the unique attributes that identify the contents of the data set.

With the advent of the Web, a URL specifies both the Internet address of a server and the pathname of each object. URNs are unique names across the Web that can map to multiple URLs. Users must still individually learn the URN that corresponds to a given object and build their own metadata catalog of interesting data object names.
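The relationship between a private metadata catalog, URNs, and URLs can be sketched with two dictionaries. Every name, URN, and URL below is made up for illustration; this is not a real resolution service.

```python
# One location-independent name maps to several concrete locations.
URN_TO_URLS = {
    "urn:example:survey:plate-017": [
        "http://archive-a.example.org/plates/017.fits",
        "http://mirror-b.example.org/data/plate017.fits",
    ],
}

# The user's private catalog: descriptive attributes -> URN.
PRIVATE_CATALOG = {
    ("sky survey", "plate 17"): "urn:example:survey:plate-017",
}

def resolve(urn: str) -> list:
    """Return every URL registered for a URN (empty list if unknown)."""
    return list(URN_TO_URLS.get(urn, []))

def locate(*attributes: str) -> list:
    """Go from the attributes a user remembers to candidate URLs."""
    urn = PRIVATE_CATALOG.get(tuple(attributes))
    return resolve(urn) if urn else []

print(locate("sky survey", "plate 17"))
```

The weakness described in the text is visible here: `PRIVATE_CATALOG` lives with one user, so every other user has to rebuild the same mapping by hand.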

The Lightweight Directory Access Protocol (LDAP) organizes entities in a hierarchical structure. LDAP is a protocol for accessing online directory services and is used by X.500 directory clients.

4. データネーミングシステム:

データ処理環境の進化は必要性に応じて行われてきた。

データを保有するためにストレッジシステムを進化させる。

データセットを特定するための情報発見メカニズム

データセットを取り出すためのデータ処理メカニズム

高品質のデータリポジトリを移植するための発行メカニズム

データ操作サービスをサポートするシステム

伝統的には、アプリケーションはインプットデータセットを使ってインターネットの問題を定義し、アウトプットファイルとして結果をローカルディスクに保存している。データセットを特定するには、あるホスト上にユニークなUNIXパスネームを指定することで操作してきた。ユーザはそのデータセットの内容を特定するようなユニークな属性をもつパスネームに対応するプライベートメタデータのカタログを保持する。

Webの登場によって、URLはサービスのインターネットアドレスおよび各オブジェクトのパスネームの両方を特定する。URNは、複数のURLをマッピングするように、Web間を超えたユニークな名前である。ユーザは、個別に特定のオブジェクトに対応するURNを学ぶ必要があり、そうすることで興味のあるデータオブジェクト名のメタデータカタログを構築できる。

Lightweight Directory Access Protocol (LDAP) が階層構造の中のエンティティ(entity）を編成する。LDAPは、オンラインディレクトリサービスにアクセスするためのプロトコルで、X.500のディレクトリクライアントによって使用される。

5. Data-Storage Systems:

At supercomputer centers, archival storage systems are used to maintain copies of data sets on tape robots that back-end large disk caches. Data written to the archive migrates from the cache to the robot based on the frequency of access. Archives typically store millions of files and terabytes of data.
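Frequency-based migration between the disk cache and the tape robot can be sketched as follows. This is a simplified illustration under assumed rules (migrate the least-accessed file when the cache is over capacity, stage a file back to disk on read); the class `ArchiveCache` is hypothetical, and real archival systems use more elaborate policies.

```python
from collections import Counter

class ArchiveCache:
    """Toy disk cache in front of a tape robot, tracking file names only."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.disk = set()
        self.tape = set()
        self.hits = Counter()   # access count per file

    def write(self, name: str) -> None:
        self.disk.add(name)
        self._migrate(keep=name)

    def read(self, name: str) -> None:
        if name not in self.disk and name not in self.tape:
            raise KeyError(name)
        self.hits[name] += 1
        if name in self.tape:           # stage from tape back to disk
            self.tape.remove(name)
            self.disk.add(name)
            self._migrate(keep=name)

    def _migrate(self, keep: str) -> None:
        # Move the least-frequently accessed files to tape until the cache fits.
        while len(self.disk) > self.capacity:
            candidates = [n for n in self.disk if n != keep]
            victim = min(candidates, key=lambda n: (self.hits[n], n))
            self.disk.remove(victim)
            self.tape.add(victim)

cache = ArchiveCache(capacity=2)
for name in ("a", "b"):
    cache.write(name)
cache.read("a"); cache.read("a"); cache.read("b")
cache.write("c")                        # "b" has fewer hits than "a", so it migrates
print(sorted(cache.disk), sorted(cache.tape))  # ['a', 'c'] ['b']
```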

A current research topic is how to integrate object-relational database technology with archival storage systems to enable attribute-based access to the data sets within the archive.

Database technology is being considered for its ability to aggregate data into containers, which are the entities stored in the archive.

The storage of data within archives needs to be controlled by clustering algorithms that automatically aggregate jointly accessed data sets in the same database container.

Some archival storage systems support movement of a single data set across parallel I/O channels from tape to disk.

Current technology supports an access rate of 1 GB/s per terabyte of disk cache in the archive. A 10 TB disk cache therefore enables data-intensive problems requiring the movement of a petabyte of data per day.
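The step from cache size to daily volume is simple arithmetic. A sketch assuming decimal units and that the access rate scales linearly with cache size, as the sentence above implies:

```python
GB_PER_S_PER_TB_CACHE = 1.0   # access rate quoted in the text
SECONDS_PER_DAY = 86_400

def daily_volume_pb(cache_tb: float) -> float:
    """Petabytes per day that a disk cache of the given size could move."""
    rate_gb_per_s = cache_tb * GB_PER_S_PER_TB_CACHE
    return rate_gb_per_s * SECONDS_PER_DAY / 1e6   # 1 PB = 10**6 GB

print(f"{daily_volume_pb(10):.3f} PB/day")  # 0.864 PB/day
```

A 10 TB cache sustains 10 GB/s, which comes to about 0.86 PB per day, in line with the "petabyte of data per day" figure.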

5. データ蓄積システム:

スーパーコンピュータでは、アーカイブ蓄積システムは、バックエンドの大型ディスクがキャッシュするテープロボット上のデータセットのコピーを維持するようになっていた。アーカイブに書かれたデータは、アクセスの頻度に基づいてキャッシュからロボットへ転送される。

現在の検索の話題としては、おいかにオブジェクト・リレーショナルデータベース技術とアーカイブストーレッジシステムとを統合して、アーカイブ内のデータセットに対して属性に基づくアクセスをできるようにするかである。

データベース技術とは、データをアーカイブの中に保管されるエンティティであるコンテナに統合できる能力と見なされている。

アーカイブ中のデータストーレッジは、同じデータベースコンテナに同時にアクセスしたデータセットを自動的に統合するクラスタリング・アルゴリズムによって管理される必要がある。

一部のアーカイブストーレッジシステムは、テープからディスクへの併行的I/Oチャネルを通じて単一のデータセットの移動をサポートする。

現在の技術では、アーカイブ中のテラバイト・ディスクのアクセスレートは1 GB/sである。10TBのディスクキャッシュとなると、ペタバイト／日のデータの移動が必要となるようなデータ集中の課題が可能となる。

6. Data-Handling Systems:

Data-intensive applications are labor-intensive, requiring manual intervention to identify the data sets and to move the data from a remote repository to a cache on the local disk. In addition, the application must be told the local filenames before execution.

Distributed file systems provide a global name space; examples are the Distributed File System (DFS) and the Network File System (NFS). In each case, the user must know the unique UNIX pathname for each data set. Data repositories that do not provide an NFS or DFS interface must be accessed separately.

Retrieving data from a data repository requires the user to generate SQL syntax to identify the data set.

SRB (Storage Resource Broker) supports protocol conversion between the UNIX-style streaming interface used by most applications and the protocols used by the storage resources. In the SRB, storage interface definitions (SID) provide alternate interfaces that the application may choose to access. The SRB then converts the data request into the format needed by a particular storage system.
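The idea of protocol conversion can be illustrated with a small adapter pattern: the application always receives a UNIX-style byte stream, and a broker translates the request for whichever storage system holds the data. This is a sketch of the concept only. It is not the SRB API, and every class and method name here is hypothetical.

```python
import io
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    """Converts a generic read request into one storage system's own access method."""

    @abstractmethod
    def read_bytes(self, local_id: str) -> bytes:
        ...

class InMemoryArchive(StorageDriver):
    """Stand-in for an archival store that holds raw byte objects."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def read_bytes(self, local_id: str) -> bytes:
        return self._objects[local_id]

class RecordStore(StorageDriver):
    """Stand-in for a database that holds text records."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def read_bytes(self, local_id: str) -> bytes:
        return self._rows[local_id].encode("utf-8")

class Broker:
    """Gives applications one streaming interface over many storage resources."""

    def __init__(self):
        self._drivers = {}

    def register(self, resource: str, driver: StorageDriver) -> None:
        self._drivers[resource] = driver

    def open(self, resource: str, local_id: str) -> io.BytesIO:
        data = self._drivers[resource].read_bytes(local_id)
        return io.BytesIO(data)   # file-like object, read like a local file

broker = Broker()
broker.register("archive", InMemoryArchive({"run42": b"\x00\x01"}))
broker.register("db", RecordStore({"row7": "temperature=281.5"}))

print(broker.open("db", "row7").read())  # b'temperature=281.5'
```

The application code is the same for both resources; only the registered driver differs.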

6. データ処理システム:

データ集中アプリケーションは、労働集約的であり、データセットの確認やリモートリポジトリからローカルディスクへキャッシュ移動するためにマニュアルでの介入が必要となる。さらに、アプリケーションは実行の前にローカルファイルネームで指定しなければならない。

分散ファイルシステムは、グローバルネーム君官を提供する。Distributed File System (DFS)、およびNetwork File System (NFS)である。いずれも、ユーザはそれぞれのデータセットに対し、ユニークなUNIXパスネームを知る必要がある。NFSまたはDFSを提供しないデータリポジトリは、個別にアクセスしなければならない。

データリポジトリからデータを抽出するためには、ユーザは、そのデータセットを特定するSQL文を生成する必要がある。

SRB (Storage Resource Broker)は、多くのアプリケーションで使われるUNIXスタイルのストリーミングインターフェイスとストーレッジリソースで使われるプロトコルとの間のプロトコル変換をサポートする。SRBでは、ストーレッジインターフェイスの定義（SID）はアプリケーションがアクセスするために選ぶ代替のインターフェイスを提供する。SRBはデータリクエストを特定のストーレッジシステムが必要とするフォーマットに変革する。

7. Data Service Systems:

Data sets may require preprocessing before they are accessed by an application. A data-handling infrastructure is needed to encapsulate the services as methods that can be applied within computational grids:

One is CORBA (Common Object Request Broker Architecture), in which data sets are encapsulated within objects that provide the required manipulation.

Digital libraries provide a more comprehensive and powerful set of tools to manipulate data sets, by supporting services on top of data repositories.

7. データサービスシステム:

データセットはアプリケーションによるアクセスの事前処理が必要となる。データ処理インフラストラクチャはサービスをカプセル化が必要で、それによってでコンピュータ・グリッド内で適用可能な方法となる。

一つは、CORBA (Common Object Request Broker Architecture)であって、この中ではデータセットは必要とされる操作を提供するオブジェクトの中でカプセル化される。

デジタルライブラリは、データリポジトリ上のサービスをサポートすることによって、複数のデータセットを操作するためのさらに普遍的かつ強力なツールを提供する。

8. Data Publication Systems:

Data publication provides the quality assessment needed to turn data into information. This capability is provided by the experts who assemble data repositories for a given discipline.

For the large data sets accessed by data-intensive applications, research topics include how to generate statistical properties when the size of the data set exceeds the storage capacity of the local resources.
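One well-known way to compute statistical properties without holding the data set in local storage is a single-pass (streaming) algorithm. The sketch below uses Welford's method for mean and variance. It is offered as an example of the kind of technique this research topic covers, not as the approach described in the chapter.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

# Data arrives in chunks; no chunk is kept after it has been processed.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
stats = RunningStats()
for chunk in chunks:
    for value in chunk:
        stats.update(value)

print(stats.n, stats.mean, round(stats.variance, 4))
```

Memory use is constant regardless of how many values are streamed, which is the property that matters when the data set exceeds local capacity.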

For publishing data across multiple repositories, the coordination of data privacy requires cryptographically guaranteed properties for authorship and modification records.
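A tamper-evident record of authorship and modifications can be sketched with a hash chain in which each entry is authenticated and linked to the previous one. This is a simplified illustration using only the standard library: it assumes a single shared secret key and uses HMAC, whereas a real multi-repository system would more likely use per-author public-key signatures. The function names are hypothetical.

```python
import hashlib
import hmac
import json

def append_record(log, author, action, data: bytes, key: bytes):
    """Add an authenticated entry that is chained to the previous entry."""
    body = {
        "author": author,
        "action": action,
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "prev": log[-1]["mac"] if log else "",
    }
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    body["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    log.append(body)
    return body

def verify_log(log, key: bytes) -> bool:
    """Recompute every MAC and check that the chain links are intact."""
    prev = ""
    for record in log:
        body = {k: v for k, v in record.items() if k != "mac"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if body["prev"] != prev or not hmac.compare_digest(expected, record["mac"]):
            return False
        prev = record["mac"]
    return True

key = b"demo-key-not-for-production"
log = []
append_record(log, "alice", "publish", b"v1 of the data set", key)
append_record(log, "bob", "modify", b"v2 of the data set", key)
print(verify_log(log, key))   # True

log[0]["author"] = "mallory"  # tampering with authorship is detected
print(verify_log(log, key))   # False
```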

Publication also involves developing peer-review mechanisms to validate the worth of the data set.

8. データ発行システム:

データ発行は、データを情報に変えるための必要な品質評価を提供する。この能力は、特定の専門分野ためにデータリポジトリをアセンブルする専門家によって提供される。

データ集中アプリケーションによってアクセスされる大きなデータセットにとっては、データセットのサイズがローカルリソースのストレッジ容量を超えた場合には、どのように統計的な特性を生成するのかといった検索上の課題がある。

多数のリポジトリに渡ってデータを発行するためには、データプライバシィの対応には、著作権やレコードの修正を保証するための暗号化が必要となる。

発行はまたデータセットの有効性を確認するためにピアレビューメカニズミを開発することが必要となる。

9. Data Presentation Systems:

Unifying data presentation architectures are needed to enable collaborative examination of results. Data presentation may involve teleinstrumentation, with data streaming from instruments in real time for display on collaborators' workstations.

The associated data-handling systems require support for asynchronous communication mechanisms, so that processing of the data stream will not impede the executing application or interrupt the instrument-driven stream.
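The decoupling described above can be sketched with Python's `asyncio`: the instrument offers samples to a bounded queue without blocking, and the display consumes them at its own pace. The choice to drop samples when the buffer is full is an assumption made for this illustration, and all function names are hypothetical.

```python
import asyncio

async def instrument(queue, samples):
    """Producer: offers each sample without blocking; only the end marker waits."""
    dropped = 0
    for sample in samples:
        try:
            queue.put_nowait(sample)
        except asyncio.QueueFull:
            dropped += 1          # shed load rather than stall the instrument
        await asyncio.sleep(0)    # yield so other tasks can run
    await queue.put(None)         # end-of-stream marker
    return dropped

async def display(queue):
    """Consumer: stands in for a collaborator's workstation."""
    shown = []
    while True:
        item = await queue.get()
        if item is None:
            return shown
        shown.append(item)

async def run_session(samples, buffer_size=4):
    queue = asyncio.Queue(maxsize=buffer_size)
    dropped, shown = await asyncio.gather(
        instrument(queue, samples), display(queue)
    )
    return shown, dropped

shown, dropped = asyncio.run(run_session(list(range(10))))
print(len(shown), dropped)
```

Every sample is either displayed or counted as dropped, so a slow display can never hold up the instrument-driven stream.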

The real-time constraints associated with collaborative examination of data will also affect the design of grid data-caching infrastructure. Multiple representations of the data sets at different resolutions may be needed to maintain interactive response.

For data-intensive applications, the resolution of the data set can be finer than the resolution of the display device.

Presentation environments will need to be interpreted across a combination of computational grids, CORBA object services, Java presentation services, and digital library publication services.

9. データプレゼンテーションシステム:

データプレゼンテーション・アーキテクチャを統一するためには結果の協同検証が可能になることが必要となる。データプレゼンテーションは遠隔操作の革新が必要となる。これは協同作業者のワークステーション上に、リアルタイムでの操作によるデータストリーミングを伴う。

関連づけたデータ処理システムは、非同期通信メカニズムのサポートが必要となり、その結果、データストリームの処理が既存のアプリケーションをじゃましたり、機械中心のストリームを妨害したりはしない。

データの協同作業での検証に関連するリアルタイムの制約は、コンピュータデータキャッシュのインフラストラクチャ設計に影響する。異なる精度でのデータセットを複数のデータセットで代表させるには双方向のレスポンスを維持する必要がある。

データ集中アプリケーションのためには、データセットの精度はディスプレイデバイスの精度よりも高精細である必要がある。

プレゼンテーション環境はコンピュータグリッド、CORBAオブジェクトサービス、Javaプレゼンテーションサービスおよびデジタルライブラリ発行サービスの組合せによって解釈される必要がある。

10. Data-Handling Paradigms:

Data-handling environments are evolving from local systems that can interact only with local data peripherals to distributed systems that integrate access to multiple heterogeneous data resources. Data access via user-defined data set names is evolving to information access based on data set attributes.

This shift from local to global resource access will enable new paradigms for data handling: from local resources, to distributed resources, to an environment that supports ubiquitous access to computing and information resources.

The emergence of ubiquitous access to data is revolutionizing the conduct of science. Researchers are publishing scientific results on the Web and providing Web-based access mechanisms to query data repositories and apply analysis algorithms.

This infrastructure is expected to enable analyses, resulting in faster progress in scientific research through the nonlinear feedback made possible when new information is used to improve data analysis. The publication of the results of computations, followed by the dynamic discovery of information when new applications are run, forms a feedback loop that can rapidly accelerate the generation of knowledge.

Digital library technology is evolving to include the capability to analyze data in associated workspaces through application of published algorithms.

User interfaces to these systems are evolving into dynamic collaboration environments in which researchers simultaneously view and interact with data.

10. データ処理のパラダイム:

データ処理環境はローカルなデータ周辺機器とやりとりできるローカルシステムから、複数の異質なデータリソースにアクセスして統合する分散システムへと進化してきている。ユーザが定義するデータセットの名前でアクセスするデータはデータセット属性に基づく情報アクセスへと進化している。

このようにローカルリソースからグローバルリソースへとシフトすることは、データ処理にとっては新たなパラダイムを可能にする。それはローカルリソースから分散リソースへ、またどこからでも分散したコンピュータと情報リソースにアクセスできることをサポートする。

データに対するユビキタスアクセスの登場は、科学の遂行を革命的に進化させた。研究者は、科学上の成果をWebに発表し、データリポジトリへの問い合わせ、分析アルゴリズムを適用するWebベースのアクセスメカニズムを提供する。

このインフラストラクチャは分析を可能となり、新たな情報によってデータ解析に利用されることでノンリニアなフィードバックが可能となり、結果としてより急速な科学研究の進歩をもたらす。
計算結果のを公表することの結果として、新たなアプリケーションを走らせ、知識の生成を急に加速化できるような循環ループを作ることによって情報のダイナミックな発見をもたらす。

デジタルライブラリ技術は、公表されたアルゴリズムの適用を通じたワークスペースに関連するデータの分析する能力を含んで進化している。

これらのシステムのユーザインターフェイスは、研究者がデータを同時に見て反応できるようなダイナミックな協同環境へと進化している。

Comment

In database-related technologies, we found that very few Japanese terms correspond to the original English. This shows that database management is one of those rapidly growing fields that leave too little time for cultural digestion.

データベース関連の技術では、我々は日本に元の英語に対応する日本語の表現に非常に限界があることを認識した。このことはとりもなおさずデータベース管理が文化的な消化のための十分な時間を残さないような急成長の分野の一つであることを示している。
