Tuesday 8 May 2012

ABAP FOR BI


ABAP Recommendations


Table of Contents:


 

SELECT Statements

The most frequently used method of extracting data from the database is by using the SELECT statement which makes it important that this is correctly written, the database optimizer selects and uses database indexes depending partly on the WHERE conditions specified in the SELECT statement. The order of the WHERE conditions in the SQL statement is not important.

Select Statement Recommendations

  1. Try to avoid using SELECT…ENDSELECT loops since they are almost always less efficient than set processing using a SELECT statement.

  1. It’s usually faster to use INTO TABLE rather than APPENDING table if the receiving internal table is empty.

  1. Use a SELECT list instead of SELECT * if you are only interested in specific columns of the table, the network load is considerably less.

  1. Use explicit field lists rather than INTO CORRESPONDING FIELDS OF since it is more efficient and easier for maintenance to see which fields are moved where.

  1. Use aggregate functions (COUNT, SUM, MAX, MIN, and AVG) to perform calculations if they significantly reduce the quantity of data records to be transferred, they are usually used with a GROUP BY clause.

  1. To apply a logical condition to the groups defined in the GROUP BY clause, use HAVING. The data quantity to be transferred depends on the selectivity of the HAVING clause.  Using HAVING clauses is similar to using aggregate functions; they create an extra load on the database so only use them if they reduce the quantity of data records to be transferred. Before you use a HAVING clause, check whether the condition can be specified in the WHERE clause instead.

  1. If you are interested if there exists at least one row of a database table or view with a certain condition, use SELECT ... UP TO 1 ROWS.  If all primary key fields are supplied in the WHERE condition you can also use SELECT SINGLE.

  1. Always specify your conditions in the WHERE-clause instead of checking them yourself with CHECK statements.  The database system is then given the opportunity to use an index (if possible) and the network load is considerably less.

  1. Try to use the full primary index in the WHERE clause of SELECT statements for optimum performance.  For Outbound BADI processing, this may involve pulling the LOGSYS field into the field list of an extract so that it can be used in the BADI.

  1. Formulate WHERE statements positively wherever possible.  If fields in the WHERE clause are specified with operators NOT or <>, these conditions cannot be used for a search over a database index.  If a positive formulation cannot be used, for example because the IN list would be too long, you should still specify the WHERE condition with NOT to reduce the amount of data to be transferred.  Use WHERE with BETWEEN and WHERE with LIKE, >, or <  and WHERE with OR only if  necessary.

  1. A field specified with LIKE can narrow the selection of database indexes and be used for a search over a database index only if it does not begin with the wildcard character _ or %. In ABAP, the wildcard + is used for any character, and * is used for any character string. For database accesses, however, the characters _ and % are used.

  1. If you want the results of a SELECT statement to be returned in the sorted order of the table index, use an ORDER BY clause on the database.  Otherwise use SORT <internal table> in the program.

  1. RANGES tables can be defined using the ABAP statements SELECT-OPTIONS and RANGES.  They implicitly define an internal table with the SIGN, OPTION, LOW, and HIGH fields. The RANGES table can be populated by the end user at the selection screen dynamically or by the program.  The database interface converts the rows in a RANGES table so that they can be read by the DBMS, and links them with OR. Since the RANGES table can contain rows with complex expressions (e.g. BT, CP). The SQL statement that results can be complex and difficult for the DBMS to process so USE WITH CAUTION.


Database Joins

In ABAP, there are various ways to implement the Join operator:-

v  Nested SELECTs – These have several disadvantages and should be avoided:
-       Data records of the inner table are transferred row-by-row to the driving table
-       Lots of small fetches, instead of fewer and more compact fetches.
-       The data records of the inner table are read multiple times.

v  SELECT FOR ALL ENTRIES - Enables you to create a join between an internal table and a database table, when using this, the following problems can occur:-
-       If the internal table is empty selection is not limited and the WHERE clause conditions are not evaluated so the whole Database table is likely to be selected.
-       Duplicate entries should be deleted from the internal table before executing the SELECT... FOR ALL ENTRIES, otherwise, identical data is read unnecessarily from the database.

v  DB VIEWS – These are defined in the ABAP dictionary and activated to create them in the database. DB views created by other developers can also be used (reusability).

v  ABAP JOINs - These are defined in an ABAP program.  To read data distributed over various tables, you must create a Join between tables using the JOIN operator.
-       An INNER JOIN produces a result set that contains only the records that match in both sides of the Join.
-       A LEFT or OUTER JOIN produces a resulting set containing all data records from the outer table regardless of whether matching records exist in the inner table, if there are no matching data records in the inner table, NULL values are returned for the fields in the inner table.

Join Recommendations

1.     Instead of a SELECT into one internal table and LOOP AT …. READ to check the contents of a second internal table, use DataBase Join between the two tables wherever possible.

2.     If using a DataBase Join ensure that it uses indexes on both sides of the join.

3.     If it’s possible to select all data from database tables then the ABAP JOIN method is preferable, if it’s necessary to select data from the database based on the contents of an internal table then the FOR ALL ENTRIES must be used.  One significant difference between these two methods is that FOR ALL ENTRIES will not reflect duplicates in the internal table, so duplicates in the itab matching a single row in the database table will return a single record.  A database JOIN will reflect duplicates in both sides of the.

4.     If the data is to be selected from more than 2 database tables then it may be efficient to write a multi-way join or better to break it down into several steps, this depends on the circumstances so some trial and error will be necessary.




Other Statements


Recommendations

1.     Avoid using MOVE-CORRESPONDING since it can be more expensive in processing than explicit MOVE statements.

2.     Avoid unnecessary MOVEs by using the following explicit work area operations where appropriate:-
APPEND wa TO itab.
INSERT wa INTO itab.
COLLECT wa INTO itab.
MODIFY itab FROM wa.
READ TABLE itab INTO wa.
LOOP AT itab INTO wa.

5.     LOOP ... WHERE is faster than LOOP/CHECK because LOOP ... WHERE evaluates the specified condition internally.  

6.     The performance of logical expressions is better if the operands being compared have the same data type.

7.     CASE statements are clearer and a little faster than IF-constructs.

8.     If you can use WHILE instead of a DO+EXIT-construction, then do so.  WHILE is easier to understand and faster to execute.





Internal Tables

The rows of internal tables can be identified either via the work area from which the key values are taken (in an implicit key access); or via an explicit key specification using WITH [TABLE] KEY. Key accesses are possible for all table types, but they show different levels of
performance for each table type.

There are three types of internal tables:
v  In a STANDARD table, you can access data using either the table index or the key.  If access is the primary key, the response time is linearly related to the number of table entries. The key of a standard table is always non-unique.

v  SORTED tables are always stored in ascending order according to their key. You can access them using either the table index or the key. If you use the key the response time is in logarithmic relation to the number of table entries because the system uses a binary search to access the table. The key can be either UNIQUE or NON-UNIQUE.  STANDARD tables and SORTED tables are generically known as index tables, since their values can be accessed via an index.

v  HASHED tables can only be accessed via the primary key. The response time is constant, regardless of the number of table entries, because access proceeds via a hash algorithm which is determined internally so they are useful for reading a single data record via the key. The key of a HASHED table must be unique and you cannot access hash tables through indexes.

Internal Table Recommendations

1.     INSERT for a standard table or hashed table using the key has the same effect as an APPEND statement, for hashed tables, however, the address of the data row to be entered must also be calculated.  For inserts in SORTED tables, the sort sequence must be strictly observed, which means the access costs grow with increasing table size.

2.     For a STANDARD table, it is generally best performance to sort according to the relevant fields and subsequently execute a READ... BINARY SEARCH.

3.     With the COLLECT statement, the contents of the work area is added to an entry with the same key or added to the table as a new entry. This allows you to create aggregated internal tables, so reducing data volume.

4.     If you need the COLLECT semantics, DO use COLLECT.  When using READ BINARY the internal table's index must be adjusted with each INSERT.  COLLECT, however, uses a hash algorithm and is therefore independent of the number of entries and does not need to maintain a table index. If you need the final data sorted, sort it after all data has been collected.  If the amount of data is small, the READ/INSERT approach isn't bad, but for large amounts of data (> 1000), COLLECT is much faster. 

5.     Do not use COLLECT to populate an internal table in combination with any other table-filling statements (APPEND, INSERT, MODIFY, SELECT * INTO TABLE and/or SELECT * APPENDING TABLE) because then the COLLECT cannot use its hash algorithm but resorts to a normal linear search, which is dramatically slower.

6.     Use APPEND LINES OF instead of LOOP AT…APPEND…ENDLOOP, if possible because the former statement is far more efficient.

7.     Mass data processing should not be used if the internal table has a deep row structure, that is, contains nested tables. In this case, you require single data record processing with SELECT ... ENDSELECT. 

8.     The internal table should have a structure similar to the field list, so that it is unnecessary to use INTO CORRESPONDING FIELDS (which must perform the additional work of comparing the field names). 

9.     If a target table is of type SORTED, it automatically sorts the inserted data. If APPENDING is used, the new data records are added in the sorted order.

10.  If an internal table key is PART of the primary key of the database table, it cannot be defined as UNIQUE because duplicates may occur. These duplicates cause runtime errors.

11.  If you are populating an internal table from an internal table, use mass processing instructions APPEND/INSERT LINES.  They can be used to extend an existing internal table, you can copy an interval of lines from one internal table to another using a FROM...TO. 

12.  If you want all key fields of the database table to appear in the internal table, but not in the key, you cannot use a SELECT DISTINCT.  Instead, this can be optimally realized using DELETE ADJACENT DUPLICATES FROM This checks whether adjacent rows have the same key. The internal table should be sorted.  COMPARING lets you define duplicates in terms of other keys.



Index Considerations

An index should not consist of too many fields. Having a few very selective fields increases the chance of reusability, and reduces the chance of the database optimizer selecting an unsuitable access path.

There are several types of Index Scan
v  INDEX UNIQUE SCAN: The index selected is unique (primary index or unique secondary index) and fully specified. This type of access is very effective and will be used if all fields in the index are specified with an equal (=) sign in the WHERE clause.  If a further WHERE condition is specified in the SELECT statement for a field that is not in the index, this is evaluated after the table records had been read from the table blocks.

v  INDEX RANGE SCAN: The index selected is unique or non-unique. For a non-unique index this means that not all index fields are specified in the WHERE clause and a range of the index is read and checked. An index range scan may not be as effective as a full table scanOutdated update statistics, database optimizer errors, missing indexes, or inappropriate ABAP coding can all mean that the database optimizer may select a completely unsuitable index and then perform an unselective index range scan.

v  FULL TABLE SCAN: The whole table is read sequentially. Each table block is read once.  Since no index is used, no index blocks are read.  The full table scan access strategy is very effective if a large part of a table (for example, 5% of all table records) needs to be read.

Index Recommendations

1.     Create as few indexes as possible per table (as few as possible but as many as necessary).

2.     Selection of index fields
-       As few fields as possible in index (as a general rule – 4 fields)
-       Fields that are as selective as possible
-       Selective fields as near to the beginning as possible

3.     Do not change SAP standard indexes






Slow SELECT Statement Checklist

If a SELECT statement is running slowly consider the following:-

v  Changing the WHERE clause to use an existing index on the table

v  Adding a secondary index to the table to fulfill the needs of the selection

v  Alter the WHERE clause to reduce the number of rows returned by limiting the selection or by aggregate functions

v  Reduce the columns to be transferred by formulating a suitable field list.



Slow LOOP Checklist

If a LOOP is taking a long time consider the following:
           
v  If internal tables being READ inside the loop are HASHED consider changing them to STANDARD, if they are STANDARD consider changing them to HASHED.

v  Explore the possibility of taking complex calculations out of the loop.

v  Populating some of the internal tables in one go using a SELECT statement with a JOIN before the loop so that fewer or no READs of internal tables are needed in the loop. 

No comments:

Post a Comment