SAP HANA® Performance
Efficient Speed and Scale-Out for Real-Time Business Intelligence
Table of Contents
Introduction
The Test Environment
Database Schema
Test Data
System Configuration
Setup
Queries
Test Results
Baseline Test
Throughput
Ad Hoc Historical Queries
Results Summary
Real-World Experiences
Analyzing Large Amounts of Data in Real Time
SAP HANA® appliance software enables organizations to optimize their business operations by analyzing large amounts of data in real time. It runs on inexpensive commodity hardware, requires no proprietary add-on components, and achieves very high performance without requiring any tuning.

A 100 TB performance test was developed to demonstrate that SAP HANA is efficient and scalable and can deliver breakthrough analytic performance for real-time business intelligence (BI) on a very large database that is representative of the data businesses use to analyze their operations. A 100 TB [1] data set was generated in the same format as would be extracted from the SAP® ERP application (for example, data records with multiple fields) for analysis in the SAP NetWeaver® Business Warehouse (SAP NetWeaver BW) component [2]. This paper describes the test environment, then presents and analyzes the test results.
Sales and Distribution BI Queries
The Test Environment
The performance test environment was developed to represent a sales and distribution (SD) BI query environment that supports a broad range of Structured Query Language (SQL) queries and business users.

Database Schema
The star-schema design in Figure 1 shows the SD test data environment and each table’s cardinality. No manual tuning structures were used in this design; there were no indexes, materialized views, summary tables, or other redundant structures added for the purposes of achieving faster query performance.

Test Data
The test data consists of one large fact table and several smaller dimension tables, as seen in Figure 1. There are 100 billion records in the fact table alone, representing 5 years’ worth of SD data. The data is hash partitioned equally across the 16 nodes using “Customer_ID.” Within a node, the data is further partitioned into one-month intervals. This results in 60 partitions per node and approximately 104 million records per partition. (See also Figure 2 and the “Loading” section.)

System Configuration
The test system configuration (Figure 2) is a 16-node cluster of IBM X5 servers with 8 TB of total RAM. Each server has:
• 4 CPUs with 10 cores and 2 hyperthreads per core, totaling:
  – 40 cores
  – 80 hyperthreads
• 512 GB of RAM
• 3.3 TB of disk storage

Setup
These performance tests did not use precaching of results or manual tuning structures of any kind and therefore validated the SAP HANA load-then-query ability. A design that is completely free of tuning structures (internal or external) is important for building a sustainable, real-time BI environment; it not only speeds implementation but also provides ongoing flexibility for ad hoc querying while eliminating the maintenance cost that tuning structures require.

Loading
The loading was done in parallel using the “IMPORT” command in SAP HANA, which is a single SQL statement that names the file to load.
Loading is automatically parallelized across all of the nodes and uses the distribution-and-partitioning scheme defined for each table [3] (in this case, hash distribution and monthly range partitions). The load rate was measured at 16 million records per minute, or 1 million records per minute per node. This load performance is sufficient to load 100 million records (representing one business day’s activity) in just over six minutes....
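
The sizing and load figures quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the constants come from the text, but the function names, the CRC32 hash, and the idea that the monthly range is keyed on an order date starting in a given first month are assumptions made for the example, not a description of how SAP HANA places records internally.

```python
from datetime import date
from typing import Tuple
import zlib

# Figures taken from the test description.
NODES = 16
YEARS_OF_DATA = 5
FACT_RECORDS = 100_000_000_000
RAM_PER_NODE_GB = 512
CPUS_PER_NODE = 4
CORES_PER_CPU = 10
HYPERTHREADS_PER_CORE = 2
LOAD_RATE_PER_NODE_PER_MIN = 1_000_000


def partitions_per_node() -> int:
    """One range partition per month of data held on each node."""
    return YEARS_OF_DATA * 12


def records_per_partition() -> float:
    """Average fact records per partition, assuming an even hash spread."""
    return FACT_RECORDS / (NODES * partitions_per_node())


def cluster_load_rate() -> int:
    """Records per minute across the whole cluster."""
    return NODES * LOAD_RATE_PER_NODE_PER_MIN


def load_minutes(records: int) -> float:
    """Minutes needed to load `records` at the measured cluster rate."""
    return records / cluster_load_rate()


def place_record(customer_id: int, order_date: date,
                 first_month: date) -> Tuple[int, int]:
    """Hypothetical placement of one record as (node, month partition).

    CRC32 is a stand-in hash chosen for the example; the actual hash
    function and the range-partitioning column are not given in the text.
    """
    node = zlib.crc32(str(customer_id).encode()) % NODES
    month_index = ((order_date.year - first_month.year) * 12
                   + (order_date.month - first_month.month))
    if not 0 <= month_index < partitions_per_node():
        raise ValueError("order_date falls outside the 5-year window")
    return node, month_index


if __name__ == "__main__":
    print(partitions_per_node())                       # 60
    print(round(records_per_partition() / 1e6, 1))     # 104.2 (million)
    print(NODES * RAM_PER_NODE_GB / 1024)              # 8.0 (TB of RAM)
    print(CPUS_PER_NODE * CORES_PER_CPU)               # 40 cores per server
    print(load_minutes(100_000_000))                   # 6.25 minutes
```

The results agree with the text: 60 partitions per node, roughly 104 million records per partition, 8 TB of total RAM, and a little over six minutes to load one business day of 100 million records.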