
Data warehouse appliance

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Rstackowiak (talk | contribs) at 16:42, 2 August 2008 (History: Updated the Oracle section to reflect current naming, vendor list, and delivery mechanism). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


A data warehouse (DW) appliance is an integrated set of servers, storage, operating system (OS), database management system (DBMS) and software, pre-installed and pre-optimized specifically for data warehousing. The term is also used for similar software-only systems[1] that are purportedly very easy to install on specific recommended hardware configurations.[2] DW appliances target the mid-to-large-volume data warehouse market, offering low-cost performance most commonly on data volumes in the terabyte to petabyte range.

Technology Primer

Most DW appliance vendors use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a “shared nothing architecture”, in which each server is self-sufficient and controls its own memory and disk. Shared nothing architectures have a proven record of high scalability and low contention. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance, which allows them to resolve a relational query by scanning the data on every server in parallel. This divide-and-conquer approach delivers high performance and scales nearly linearly as new servers are added to the architecture.
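The shared-nothing, divide-and-conquer pattern can be sketched as a toy Python model. It is an illustration under stated assumptions, not any vendor's implementation: the `Node` and `Appliance` classes, the hash-distribution rule and the `query_sum` method are hypothetical names, and threads stand in for independent servers (in CPython they show the scatter/gather structure rather than true CPU parallelism).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical shared-nothing server: it owns its rows and reads no other node's data."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # stands in for the node's dedicated local disk

    def scan_sum(self, predicate, group_key, value_key):
        # Local step: scan only this node's rows and build a partial GROUP BY result.
        partial = defaultdict(int)
        for row in self.rows:
            if predicate(row):
                partial[row[group_key]] += row[value_key]
        return dict(partial)


class Appliance:
    """Hypothetical MPP appliance: a coordinator in front of independent nodes."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def load(self, rows, dist_key):
        # Hash-distribute each row to exactly one node by its distribution key.
        for row in rows:
            self.nodes[hash(row[dist_key]) % len(self.nodes)].rows.append(row)

    def query_sum(self, predicate, group_key, value_key):
        # Scatter: every node scans its own slice of the data in parallel.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(
                lambda node: node.scan_sum(predicate, group_key, value_key),
                self.nodes))
        # Gather: merge the small per-node partial results on the coordinator.
        totals = defaultdict(int)
        for partial in partials:
            for group, subtotal in partial.items():
                totals[group] += subtotal
        return dict(totals)


sales = [
    {"id": 0, "region": "east", "amount": 10},
    {"id": 1, "region": "west", "amount": 5},
    {"id": 2, "region": "east", "amount": 7},
    {"id": 3, "region": "north", "amount": 3},
]
dw = Appliance(num_nodes=4)
dw.load(sales, dist_key="id")
result = dw.query_sum(lambda r: r["amount"] > 4, "region", "amount")
print(sorted(result.items()))  # [('east', 17), ('west', 5)]
```

Each node returns only a small partial aggregate, so the coordinator's merge work stays small while the heavy scanning is split across the nodes.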

MPP database architectures are not new. Teradata, Tandem, Britton Lee, and Sequent offered MPP SQL-based architectures in the 1980s. The re-emergence of MPP data warehouses has been aided by open source and commodity components. Advances in technology have reduced costs and improved performance in storage devices, multi-core CPUs and networking components. Open source RDBMS products, such as Ingres and PostgreSQL, reduce software license costs and allow DW appliance vendors to focus on optimization rather than providing basic database functionality. Open source Linux provides a stable, well-implemented OS for DW appliances.

History

Many consider Teradata’s initial product the first DW appliance (or Britton Lee's, but Britton Lee, renamed ShareBase, was acquired by Teradata in June 1990[3]). Some regard Teradata's current offerings as still being appliances, while others argue that they fall short in ease of installation or administration. Interest in the data warehouse appliance category is generally dated to the emergence of Netezza in the early 2000s.

More recently, a second generation of DW appliances has emerged, marking the move to integration by mainstream vendors. IBM integrated its InfoSphere Warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Other DW appliance vendors have partnered with major hardware vendors to help bring their appliances to market. DATAllegro partners with EMC and Dell and runs open source Ingres on Linux. Greenplum has a partnership with Sun Microsystems and runs Bizgres (a PostgreSQL derivative) on Solaris with the ZFS file system. HP Neoview is an entirely HP-owned solution built on HP NonStop SQL.

Kognitio offers a row-based “virtual” data warehouse appliance, while Vertica and ParAccel offer column-based “virtual” data warehouse appliances. Like Greenplum, ParAccel partners with Sun Microsystems. These are software-only solutions deployed on clusters of commodity hardware. Kognitio’s homegrown WX2 database runs on several blade configurations. Other players in the DW appliance space include Calpont and Dataupia.
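The row-based versus column-based distinction can be illustrated with a toy Python sketch. It is a simplification under stated assumptions: the table, both layouts and the functions `sum_amount_row_store` and `sum_amount_column_store` are hypothetical, and real column stores add compression and other techniques not modeled here. What it shows is that an aggregate over one column reads only that column in a column-based layout, but touches every value of every record in a row-based one.

```python
# Hypothetical three-column sales table: (region, order_date, amount).
rows = [
    ("east", "2008-01-05", 10),
    ("west", "2008-01-20", 5),
    ("east", "2008-02-03", 7),
]

# Row-based layout: all values of a record are stored together.
row_store = list(rows)

# Column-based layout: all values of a column are stored together.
column_store = {
    "region": [r[0] for r in rows],
    "order_date": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def sum_amount_row_store(store):
    """SUM(amount) over a row layout; returns (total, values read)."""
    total, values_read = 0, 0
    for record in store:
        values_read += len(record)  # fetching a record brings in every column
        total += record[2]
    return total, values_read


def sum_amount_column_store(store):
    """SUM(amount) over a column layout; returns (total, values read)."""
    amounts = store["amount"]  # only the queried column is read
    return sum(amounts), len(amounts)


print(sum_amount_row_store(row_store))        # (22, 9)
print(sum_amount_column_store(column_store))  # (22, 3)
```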

Recently, the market has seen the emergence of data warehouse bundles, in which vendors combine hardware and database software into a data warehouse platform. The Oracle Optimized Warehouse Initiative combines Oracle Database with hardware from the industry’s leading computer manufacturers: Dell, EMC, HP, IBM, SGI and Sun Microsystems. Oracle's Optimized Warehouses are pre-validated configurations with the database software pre-installed, though analysts disagree about whether they should be regarded as appliances.

Benefits

Reduction in Costs
The total cost of ownership (TCO) of a data warehouse consists of initial entry costs, ongoing maintenance costs and the cost of increasing capacity as the data warehouse grows. DW appliances offer low entry and maintenance costs. Initial costs range from $10,000 to $150,000 per terabyte, depending on the size of the DW appliance installed.

The resource cost for monitoring and tuning the data warehouse makes up a large part of the TCO, often as much as 80%. DW appliances reduce administration for day-to-day operations, setup and integration. Many also offer low costs for expanding processing power and capacity.
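The cost components above can be combined in a simple model. This is a hypothetical sketch, not a vendor pricing method: apart from the $10,000 to $150,000 per-terabyte entry range quoted in the text, every figure (size, staff cost, growth rate, expansion price) is an assumed input chosen only to show the arithmetic, and `DwCostModel` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class DwCostModel:
    """Hypothetical TCO model: entry cost + ongoing maintenance + capacity growth."""
    size_tb: float
    entry_cost_per_tb: float          # the text cites roughly $10,000 to $150,000 per TB
    annual_admin_cost: float          # assumed staff cost for monitoring and tuning
    annual_other_maintenance: float   # assumed support, power, floor space, etc.
    growth_tb_per_year: float         # assumed data growth
    expansion_cost_per_tb: float      # assumed price of added capacity

    def tco(self, years: int) -> float:
        entry = self.size_tb * self.entry_cost_per_tb
        maintenance = years * (self.annual_admin_cost + self.annual_other_maintenance)
        growth = years * self.growth_tb_per_year * self.expansion_cost_per_tb
        return entry + maintenance + growth

    def admin_share(self, years: int) -> float:
        """Fraction of TCO spent on monitoring and tuning over the period."""
        return years * self.annual_admin_cost / self.tco(years)


# Entry-cost range for an assumed 10 TB installation at the quoted per-TB prices.
entry_range = [10 * price for price in (10_000, 150_000)]
print(entry_range)  # [100000, 1500000]

model = DwCostModel(size_tb=10, entry_cost_per_tb=50_000,
                    annual_admin_cost=200_000, annual_other_maintenance=20_000,
                    growth_tb_per_year=2, expansion_cost_per_tb=50_000)
print(model.tco(3))                    # 1460000
print(round(model.admin_share(3), 2))  # 0.41
```

With these assumed inputs, monitoring and tuning account for about 41% of the three-year TCO; raising the assumed administration cost shows how quickly that share can approach the 80% figure mentioned in the text.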

With increased focus on controlling costs and tight IT budgets, data warehouse managers need to reduce and manage expenses while getting as much as possible out of their technology, which makes DW appliances a natural solution.

Parallel Performance
DW appliances provide a compelling price/performance ratio. Many support mixed workloads, in which a broad range of ad hoc queries and reports run simultaneously with data loading. DW appliance vendors use several data distribution and partitioning methods to provide parallel performance. Some DW appliances scan data using partitioning and sequential I/O instead of indexes; others use standard database indexing.
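The partition-and-scan approach can be sketched as follows. This is a toy model under assumptions: the `RangePartitionedTable` class, the monthly partitioning scheme and the `sum_between` method are hypothetical. It shows partition pruning (skipping partitions that cannot contain matching rows) followed by a sequential scan of the remaining partitions, with no index involved.

```python
from collections import defaultdict
from datetime import date


class RangePartitionedTable:
    """Hypothetical fact table range-partitioned by order month."""

    def __init__(self):
        # (year, month) -> list of (order_date, amount); each list models one contiguous partition.
        self.partitions = defaultdict(list)

    def insert(self, order_date, amount):
        self.partitions[(order_date.year, order_date.month)].append((order_date, amount))

    def sum_between(self, start, end):
        """Return (sum of amounts with start <= order_date <= end, rows scanned)."""
        first, last = (start.year, start.month), (end.year, end.month)
        total, rows_scanned = 0, 0
        for month, rows in self.partitions.items():
            if month < first or month > last:
                continue  # partition pruning: no row in this month can qualify
            for order_date, amount in rows:  # sequential scan instead of an index lookup
                rows_scanned += 1
                if start <= order_date <= end:
                    total += amount
        return total, rows_scanned


table = RangePartitionedTable()
table.insert(date(2008, 1, 5), 10)
table.insert(date(2008, 1, 20), 20)
table.insert(date(2008, 2, 3), 5)
table.insert(date(2008, 3, 1), 7)
print(table.sum_between(date(2008, 1, 10), date(2008, 2, 28)))  # (25, 3)
```

The March partition is never read, and the rows that are read come from contiguous partitions, which suits the sequential I/O the text describes.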

Because they perform well on highly granular (detail-level) data, DW appliances can support analytics that previously could not meet performance requirements.

Reduced Administration
DW appliances provide a single-vendor solution, and the vendor takes responsibility for optimizing the hardware and software within the appliance. This eliminates the customer’s costs for integrating and regression testing the DBMS, storage and OS at terabyte scale, and avoids some of the compatibility issues that arise with multi-vendor solutions. A single support point also provides a single source for problem resolution and a simplified upgrade path for software and hardware.

DW appliances require less care and feeding than many alternative data warehouse solutions. They reduce administration through automated space allocation, reduced index maintenance and, in most cases, reduced tuning and performance analysis.

Built-in High Availability
DW appliance vendors provide built-in high availability through redundant components within the appliance. Many offer warm-standby servers, dual networks, dual power supplies, disk mirroring with robust failover, and mechanisms for handling server failure.
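Disk mirroring with failover can be sketched as a toy Python model. The `MirroredCluster` class and its placement rule (each slice mirrored on the next node) are hypothetical assumptions for illustration, not how any particular appliance lays out its mirrors.

```python
class MirroredCluster:
    """Hypothetical cluster where each data slice has a primary copy and one mirror."""

    def __init__(self, num_nodes):
        if num_nodes < 2:
            raise ValueError("mirroring needs at least two nodes")
        self.num_nodes = num_nodes
        self.alive = [True] * num_nodes
        # storage[node][slice_id] -> rows; node i holds slice i and a mirror of
        # slice i - 1 (modulo the node count).
        self.storage = [{} for _ in range(num_nodes)]

    def mirror_node(self, slice_id):
        return (slice_id + 1) % self.num_nodes

    def write_slice(self, slice_id, rows):
        # Write both copies before the write is considered complete.
        self.storage[slice_id][slice_id] = list(rows)
        self.storage[self.mirror_node(slice_id)][slice_id] = list(rows)

    def fail(self, node):
        self.alive[node] = False

    def read_slice(self, slice_id):
        # Read from the primary if it is up; otherwise fail over to the mirror.
        for node in (slice_id, self.mirror_node(slice_id)):
            if self.alive[node]:
                return self.storage[node][slice_id]
        raise RuntimeError(f"slice {slice_id} lost: primary and mirror are both down")

    def scan_all(self):
        return [row for s in range(self.num_nodes) for row in self.read_slice(s)]


cluster = MirroredCluster(3)
for slice_id, rows in enumerate([[1, 2], [3], [4, 5]]):
    cluster.write_slice(slice_id, rows)
cluster.fail(1)
print(sorted(cluster.scan_all()))  # [1, 2, 3, 4, 5]
```

Because the two copies of a slice always sit on different nodes, this toy survives any single node failure; losing both nodes that hold a slice makes that slice unavailable.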

Scalability
DW appliances scale for both capacity and performance. Many implement a modular design to which database administrators can add incrementally, eliminating the up-front cost of over-provisioning. In contrast, architectures that do not support incremental expansion can require hours of production downtime while database administrators export and reload terabytes of data. In MPP architectures, adding servers increases performance as well as capacity; this is not always the case with alternative solutions.
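The claim that adding servers raises both capacity and performance can be shown with idealized arithmetic. The functions and figures here (2 TB of disk and a 200 MB/s scan rate per node, a 10 TB table) are assumptions for illustration. The model ignores coordination overhead, data skew and the cost of redistributing existing data when a node is added, so real scaling will be somewhat less than linear.

```python
def capacity_tb(nodes, tb_per_node):
    """Total storage grows with every node added, because each node brings its own disks."""
    return nodes * tb_per_node


def ideal_scan_seconds(data_tb, nodes, mb_per_sec_per_node):
    """Ideal full-scan time when each node scans only its 1/nodes share in parallel."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / (nodes * mb_per_sec_per_node)


for nodes in (8, 16):
    print(nodes, capacity_tb(nodes, tb_per_node=2),
          ideal_scan_seconds(10, nodes, mb_per_sec_per_node=200))
# 8 16 6250.0
# 16 32 3125.0
```

Doubling the node count doubles capacity and, in this idealized model, halves the full-scan time.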

Rapid Time-to-Value
Companies increasingly expect to use business analytics to improve the current business cycle. DW appliances can be implemented quickly, without the need for customer-side regression and integration testing. Rapid prototyping is possible because of reduced tuning and index creation, fast loading and, in some cases, a reduced need for aggregation.

Application Uses

DW appliances provide solutions for many analytic application uses, including:

  • Enterprise data warehousing
  • Super-sized sandboxes that isolate power users running resource-intensive queries
  • Pilot projects or projects requiring rapid prototyping and rapid time-to-value
  • Off-loading projects from the enterprise data warehouse, i.e., large analytical query projects that affect the overall workload of the enterprise data warehouse
  • Applications with specific performance or loading requirements
  • Data marts that have outgrown their present environment
  • Turnkey data warehouses or data marts
  • Solutions for applications with high data growth and high performance requirements
  • Applications requiring data warehouse encryption

As the DW appliance market evolves, trends are shifting in several areas:

  • Vendors are moving toward using commodity technologies rather than proprietary assembly of commodity components.
  • Implemented applications show usage expanding from tactical and data mart solutions to strategic and enterprise data warehouse use.
  • Mainstream vendor participation is now apparent.
  • With a lower total cost of ownership, reduced maintenance and high performance to address business analytics on growing data volumes, most analysts believe that DW appliances will gain market share.


References

  1. "When is an appliance not an appliance?". Queries From Hell (blog).
  2. "Data warehouse appliances – fact and fiction". DBMS2: DataBase Management System Services (blog archive).
  3. White, Todd (5 November 1990). "Teradata Corp. suffers first quarterly loss in four years". Los Angeles Business Journal.